Center for Machine Learning and Intelligent Systems



13 Data Sets


1. Diabetes 130-US hospitals for years 1999-2008: This data has been prepared to analyze factors related to readmission as well as other outcomes pertaining to patients with diabetes.

2. Gesture Phase Segmentation: The dataset is composed of features extracted from 7 videos of people gesticulating, intended for studying gesture phase segmentation. It contains 50 attributes divided into two files for each video.

3. Dow Jones Index: This dataset contains weekly data for the Dow Jones Industrial Index. It has been used in computational investing research.

4. Grammatical Facial Expressions: This dataset supports the development of models that make it possible to interpret Grammatical Facial Expressions from Brazilian Sign Language (Libras).

5. Mice Protein Expression: Expression levels of 77 proteins measured in the cerebral cortex of 8 classes of control and Down syndrome mice exposed to context fear conditioning, a task used to assess associative learning.

6. Heterogeneity Activity Recognition: The Heterogeneity Human Activity Recognition (HHAR) dataset from Smartphones and Smartwatches is devised to benchmark human activity recognition algorithms (classification, automatic data segmentation, sensor fusion, feature extraction, etc.) in real-world contexts. Specifically, the dataset is gathered with a variety of device models and use scenarios, in order to reflect the sensing heterogeneities to be expected in real deployments.

7. Libras Movement: The data set contains 15 classes of 24 instances each. Each class refers to a hand movement type in LIBRAS (Portuguese name 'LÍngua BRAsileira de Sinais', the official Brazilian sign language).

8. Tennis Major Tournament Match Statistics: This is a collection of 8 files containing the match statistics for both women and men at the four major tennis tournaments of the year 2013. Each file has 42 columns and a minimum of 76 rows.

9. UJIIndoorLoc-Mag: The UJIIndoorLoc-Mag is an indoor localization database for testing indoor positioning systems that rely on variations in the Earth's magnetic field.

10. Educational Process Mining (EPM): A Learning Analytics Data Set: The Educational Process Mining data set is built from recordings of 115 subjects' activities, captured through a logging application while they learned with an educational simulator.

11. KEGG Metabolic Relation Network (Directed): KEGG metabolic pathways modeled as a directed relation network. A variety of graphical features is presented.

12. KEGG Metabolic Reaction Network (Undirected): KEGG metabolic pathways modeled as an undirected reaction network. A variety of graphical features is presented.

13. Water Treatment Plant: Multiple classes are used to predict the state of the plant.
