Center for Machine Learning and Intelligent Systems

21 Data Sets

1. UJIIndoorLoc-Mag: The UJIIndoorLoc-Mag is an indoor localization database for testing indoor positioning systems that rely on variations in the Earth's magnetic field.

2. Educational Process Mining (EPM): A Learning Analytics Data Set: The Educational Process Mining data set is built from recordings of 115 subjects' activities, captured through a logging application while they learned with an educational simulator.

3. Pedestrian in Traffic Dataset: This dataset contains a number of pedestrian tracks recorded from a vehicle driving in a town in southern Germany. The data are particularly well suited to multi-agent motion prediction tasks.

4. clickstream data for online shopping: The dataset contains clickstream information from an online store offering clothing for pregnant women.

5. EEG Eye State: The data set consists of 14 EEG values and a value indicating the eye state.

6. BLE RSSI Dataset for Indoor localization and Navigation: This dataset contains RSSI readings gathered from an array of Bluetooth Low Energy (BLE) iBeacons in a real-world and operational indoor environment for localization and navigation purposes.

7. Bach Choral Harmony: The data set is composed of 60 chorales (5665 events) by J.S. Bach (1685-1750). Each event of each chorale is labelled with one of 101 chord labels and described through 14 features.

8. Wearable Computing: Classification of Body Postures and Movements (PUC-Rio): A dataset with 5 classes (sitting-down, standing-up, standing, walking, and sitting) collected over 8 hours of activities by 4 healthy subjects. We also established a baseline performance index.

9. microblogPCU: The MicroblogPCU data were crawled from the Sina Weibo microblog service. The data can be used to study machine learning methods as well as for social network research.

10. SML2010: This dataset was collected from a monitoring system installed in a domotic house. It corresponds to approximately 40 days of monitoring data.

11. Wall-Following Robot Navigation Data: The data were collected as the SCITOS G5 robot navigates through the room following the wall in a clockwise direction, for 4 rounds, using 24 ultrasound sensors arranged circularly around its 'waist'.

12. Geo-Magnetic field and WLAN dataset for indoor localisation from wristband and smartphone: A multisource and multivariate dataset for indoor localisation methods based on WLAN and Geo-Magnetic field fingerprinting.

13. Incident management process enriched event log: This event log was extracted from data gathered from the audit system of an instance of the ServiceNow platform used by an IT company and enriched with data loaded from a relational database.

14. Human Activity Recognition from Continuous Ambient Sensor Data: This dataset represents ambient data collected in homes with volunteer residents. Data are collected continuously while residents perform their normal routines.

15. Gesture Phase Segmentation: The dataset is composed of features extracted from 7 videos of people gesticulating, and is intended for studying Gesture Phase Segmentation. It contains 50 attributes divided into two files for each video.

16. Molecular Biology (Splice-junction Gene Sequences): Primate splice-junction gene sequences (DNA) with associated imperfect domain theory.

17. Hybrid Indoor Positioning Dataset from WiFi RSSI, Bluetooth and magnetometer: The dataset was created for the comparison and evaluation of hybrid indoor positioning methods. It contains data from W-LAN and Bluetooth interfaces and from a magnetometer.

18. Ozone Level Detection: Two ground ozone level data sets are included in this collection. One is the eight-hour peak set; the other is the one-hour peak set. The data were collected from 1998 to 2004 in the Houston, Galveston and Brazoria area.

19. CNNpred: CNN-based stock market prediction using a diverse set of variables: This dataset contains several daily features of S&P 500, NASDAQ Composite, Dow Jones Industrial Average, RUSSELL 2000, and NYSE Composite from 2010 to 2017.

20. Cargo 2000 Freight Tracking and Tracing: Sanitized and anonymized Cargo 2000 (C2K) airfreight tracking and tracing events, covering five months of business execution (3,942 process instances, 7,932 transport legs, 56,082 activities).

21. Grammatical Facial Expressions: This dataset supports the development of models that make it possible to interpret Grammatical Facial Expressions from Brazilian Sign Language (Libras).
