Thyroid Disease

Donated on 12/31/1986

10 separate databases from Garavan Institute

Dataset Characteristics

Multivariate, Domain-Theory

Subject Area

Health and Medicine

Associated Tasks

Classification

Feature Type

Categorical, Real

# Instances

7200

# Features

Dataset Information

Additional Information

From Garavan Institute
Documentation: as given by Ross Quinlan

- 6 databases from the Garavan Institute in Sydney, Australia. Approximately the following for each database:
  - 2800 training (data) instances and 972 test instances
  - Plenty of missing data
  - 29 or so attributes, either Boolean or continuously-valued
- 2 additional databases, also from Ross Quinlan, are also here:
  - Hypothyroid.data and sick-euthyroid.data
  - Quinlan believes that these databases have been corrupted
  - Their format is highly similar to the other databases
- 1 more database of 9172 instances that cover 20 classes, and a related domain theory
- Another thyroid database from Stefan Aeberhard:
  - 3 classes, 215 instances, 5 attributes
  - No missing values
- A thyroid database suited for training ANNs:
  - 3 classes
  - 3772 training instances, 3428 testing instances
  - Includes cost data (donated by Peter Turney)
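As a rough illustration of working with the ANN-oriented database described above, the following sketch parses a whitespace-separated table whose last column is the class label and checks that the stated split sizes add up to the listed instance count. The file layout (whitespace-delimited, numeric features, integer class in the final column) is an assumption not stated on this page, the function name `load_whitespace_table` is hypothetical, and the sample rows are made up for the example rather than taken from the dataset.

```python
import io

# Split sizes as stated in the dataset description.
TRAIN_ROWS = 3772
TEST_ROWS = 3428
TOTAL_ROWS = TRAIN_ROWS + TEST_ROWS  # matches the 7200 instances listed for the page


def load_whitespace_table(handle):
    """Parse whitespace-separated rows.

    Assumption: every non-empty line holds numeric features followed by
    an integer class label in the last column.
    """
    features, labels = [], []
    for line in handle:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        *values, label = parts
        features.append([float(v) for v in values])
        labels.append(int(label))
    return features, labels


# Made-up rows for illustration only; not real records from the dataset.
sample = io.StringIO(
    "0.73 0 1 0.0006 3\n"
    "0.24 0 0 0.00025 3\n"
    "0.47 1 0 0.0019 1\n"
)
X, y = load_whitespace_table(sample)
```

To try this on a downloaded file, an open file handle (for example from `open("ann-train.data")`) could be passed in place of the in-memory sample, provided the file really follows the assumed layout.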

Has Missing Values?

Variables Table

Variable Name	Role	Type	Missing Values
Class	Target	Categorical	no
Attribute1	Feature	Integer	no
Attribute2	Feature	Continuous	no
Attribute3	Feature	Continuous	no
Attribute4	Feature	Continuous	no
Attribute5	Feature	Continuous	no
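The variables table above can be mirrored in a small record type. The sketch below assumes a comma-separated file with the class in the first column followed by Attribute1 through Attribute5 in order; neither the delimiter nor the column order is confirmed on this page. The names `ThyroidRecord` and `read_records` are hypothetical, and the sample rows are invented for illustration.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class ThyroidRecord:
    label: str         # Class (target, categorical)
    attribute1: int    # Attribute1 (integer)
    attributes: tuple  # Attribute2..Attribute5 (continuous)


def read_records(handle):
    """Read rows of the assumed form: class, Attribute1, ..., Attribute5."""
    records = []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        if len(row) != 6:
            raise ValueError(f"expected 6 columns, got {len(row)}")
        records.append(
            ThyroidRecord(
                label=row[0].strip(),
                attribute1=int(row[1]),
                attributes=tuple(float(v) for v in row[2:]),
            )
        )
    return records


# Invented rows for illustration only; not real records from the dataset.
sample = io.StringIO("1,100,10.0,2.0,1.0,2.5\n3,120,3.5,0.5,20.0,40.0\n")
records = read_records(sample)
```

Keeping the class as a string avoids implying any ordering between the categorical labels.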

Baseline Model Performance

Dataset Files

File	Size
thyroid0387.data	754.5 KB
ann-train.data	258.5 KB
allhypo.data	237.5 KB
allbp.data	236.8 KB
ann-test.data	235.5 KB

Showing 5 of 39 files.

Papers Citing this Dataset

Support vector machine with quantile hyper-spheres for pattern classification

By Maoxiang Chu, Xiaoping Liu, Rongfen Gong, Jie Zhao. 2019

Published in PloS one.

Extreme Value Theory for Open Set Classification -- GPD and GEV Classifiers

By Edoardo Vignotto, Sebastian Engelke. 2018

Published in ArXiv.

DOPING: Generative Data Augmentation for Unsupervised Anomaly Detection with GAN

By Swee Lim, Yi Loo, Ngoc-Trung Tran, Ngai-Man Cheung, Gemma Roig, Yuval Elovici. 2018

Published in ArXiv.

Entity Attribute Value Style Modeling Approach for Archetype Based Data

By Shivani Batra, Shelly Sachdeva, Subhash Bhalla. 2018

Published in Information.

Credit Card Fraud Detection in e-Commerce: An Outlier Detection Approach

By Utkarsh Porwal, Smruthi Mukund. 2018

Published in ArXiv.

Showing 5 of 18 citing papers.

Download (610.3 KB)

18 citations

79613 views

Creators

Ross Quinlan

DOI

10.24432/C5D010

License

This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.
