Center for Machine Learning and Intelligent Systems

Browse Through:

Default Task

Classification (66)
Regression (14)
Clustering (10)
Other (3)

Attribute Type

Categorical (0)
Numerical (14)
Mixed (0)

Data Type

Multivariate (14)
Univariate (2)
Sequential (0)
Time-Series (2)
Text (3)
Domain-Theory (0)
Other (0)

Area

Life Sciences (14)
Physical Sciences (15)
CS / Engineering (49)
Social Sciences (6)
Business (11)
Game (1)
Other (6)

Number of Attributes

Less than 10 (6)
10 to 100 (8)
Greater than 100 (0)

Number of Instances

Less than 100 (0)
100 to 1000 (3)
Greater than 1000 (11)

Format Type

Matrix (12)
Non-Matrix (2)

14 Data Sets



1. Bar Crawl: Detecting Heavy Drinking: Accelerometer and transdermal alcohol content data from a college bar crawl. Used to predict heavy drinking episodes via mobile data.

2. Breast Cancer Wisconsin (Prognostic): Prognostic Wisconsin Breast Cancer Database

3. Cuff-Less Blood Pressure Estimation: This data set provides preprocessed and cleaned vital signals that can be used to design algorithms for cuff-less estimation of blood pressure.

4. Drug Review Dataset (Drugs.com): The dataset provides patient reviews of specific drugs, along with the related conditions and a 10-star patient rating reflecting overall patient satisfaction.

5. Early biomarkers of Parkinson’s disease based on natural connected speech: Predict a pattern of neurodegeneration from speech features obtained from patients with early untreated Parkinson’s disease and from patients at high risk of developing Parkinson’s disease.

6. EEG Steady-State Visual Evoked Potential Signals: This database consists of data from 30 subjects performing a Brain-Computer Interface task based on Steady-State Visual Evoked Potentials (BCI-SSVEP).

7. Fertility: 100 volunteers provided semen samples, which were analyzed according to the WHO 2010 criteria. Sperm concentration is related to socio-demographic data, environmental factors, health status, and life habits.

8. KEGG Metabolic Reaction Network (Undirected): KEGG metabolic pathways modeled as an undirected reaction network. A variety of graphical features are provided.

9. KEGG Metabolic Relation Network (Directed): KEGG metabolic pathways modeled as a directed relation network. A variety of graphical features are provided.

10. Parkinson Speech Dataset with Multiple Types of Sound Recordings: The training data belong to 20 Parkinson's Disease (PD) patients and 20 healthy subjects. Multiple types of sound recordings (26 in all) were taken from each subject.

11. Parkinsons Telemonitoring: Oxford Parkinson's Disease Telemonitoring Dataset

12. Physicochemical Properties of Protein Tertiary Structure: This data set describes physicochemical properties of protein tertiary structures, taken from CASP 5-9. There are 45,730 decoys, with sizes varying from 0 to 21 angstroms.

13. QSAR fish bioconcentration factor (BCF): Experimental bioconcentration factor (BCF) for 1056 molecules and binary fingerprints (extended connectivity) to be used for QSAR modeling.

14. Tamilnadu Electricity Board Hourly Readings: These hourly readings can be used to describe the load profile with fewer parameters, reducing the amount of data stored in the database.

