Center for Machine Learning and Intelligent Systems

Browse Through:

Default Task - Undo

Classification (66)
Regression (15)
Clustering (13)
Other (2)

Attribute Type

Categorical (0)
Numerical (14)
Mixed (0)

Data Type

Multivariate (14)
Univariate (2)
Sequential (0)
Time-Series (3)
Text (2)
Domain-Theory (0)
Other (0)

Area - Undo

Life Sciences (15)
Physical Sciences (8)
CS / Engineering (28)
Social Sciences (5)
Business (15)
Game (1)
Other (6)

# Attributes - Undo

Less than 10 (7)
10 to 100 (15)
Greater than 100 (0)

# Instances

Less than 100 (0)
100 to 1000 (8)
Greater than 1000 (7)

Format Type

Matrix (14)
Non-Matrix (1)

15 Data Sets



1. Algerian Forest Fires Dataset: The dataset includes 244 instances that combine data from two regions of Algeria.

2. Bone marrow transplant: children: The data set describes pediatric patients with several hematologic diseases who underwent unmanipulated allogeneic unrelated donor hematopoietic stem cell transplantation.

3. Breast Cancer Wisconsin (Prognostic): Prognostic Wisconsin Breast Cancer Database

4. Early biomarkers of Parkinson’s disease based on natural connected speech: Predict a pattern of neurodegeneration in the dataset of speech features obtained from patients with early untreated Parkinson’s disease and patients at high risk of developing Parkinson’s disease.

5. EEG Steady-State Visual Evoked Potential Signals: This database consists of 30 subjects performing Brain Computer Interface for Steady State Visual Evoked Potentials (BCI-SSVEP).

6. Estimation of obesity levels based on eating habits and physical condition: This dataset includes data for the estimation of obesity levels in individuals from Mexico, Peru and Colombia, based on their eating habits and physical condition.

7. Fertility: 100 volunteers provide a semen sample analyzed according to the WHO 2010 criteria. Sperm concentration is related to socio-demographic data, environmental factors, health status, and life habits.

8. Heart failure clinical records: This dataset contains the medical records of 299 patients who had heart failure, collected during their follow-up period, where each patient profile has 13 clinical features.

9. Hungarian Chickenpox Cases: A spatio-temporal dataset of weekly chickenpox cases from Hungary. The dataset consists of a county-level adjacency matrix and time series of the county-level reported cases between 2005 and 2015.

10. KEGG Metabolic Reaction Network (Undirected): KEGG Metabolic pathways modeled as an undirected reaction network. A variety of graphical features are presented.

11. KEGG Metabolic Relation Network (Directed): KEGG Metabolic pathways modeled as a directed relation network. A variety of graphical features are presented.

12. Parkinson Speech Dataset with Multiple Types of Sound Recordings: The training data belong to 20 Parkinson's Disease (PD) patients and 20 healthy subjects. Multiple types of sound recordings (26) are taken from all subjects.

13. Parkinsons Telemonitoring: Oxford Parkinson's Disease Telemonitoring Dataset

14. QSAR Bioconcentration classes dataset: Dataset of manually-curated Bioconcentration factor (BCF, fish) and mechanistic classes for QSAR modeling.

15. Simulated data for survival modelling: A variety of survival data, with carefully controlled event and censor rates, is available for developing and testing new approaches to survival modelling.

