Center for Machine Learning and Intelligent Systems




24 Data Sets



1. Educational Process Mining (EPM): A Learning Analytics Data Set: The Educational Process Mining data set is built from recordings of 115 subjects' activities, captured through a logging application while they learned with an educational simulator.

2. UJI Pen Characters: Data consists of written characters in a UNIPEN-like format

3. UJI Pen Characters (Version 2): A pen-based database with more than 11k isolated handwritten characters

4. BLE RSSI Dataset for Indoor localization and Navigation: This dataset contains RSSI readings gathered from an array of Bluetooth Low Energy (BLE) iBeacons in a real-world and operational indoor environment for localization and navigation purposes.

5. Online Handwritten Assamese Characters Dataset: This is a dataset of 8235 online handwritten Assamese characters. The “online” process involves capturing data as text is written on a digitizing tablet with an electronic pen.

6. microblogPCU: MicroblogPCU data is crawled from the Sina Weibo microblog (http://weibo.com/). This data can be used to study machine learning methods as well as for social network research.

7. UJIIndoorLoc-Mag: The UJIIndoorLoc-Mag is an indoor localization database for testing Indoor Positioning Systems that rely on variations in Earth's magnetic field.

8. Geo-Magnetic field and WLAN dataset for indoor localisation from wristband and smartphone: A multisource and multivariate dataset for indoor localisation methods based on WLAN and Geo-Magnetic field fingerprinting

9. Wearable Computing: Classification of Body Postures and Movements (PUC-Rio): A dataset with 5 classes (sitting-down, standing-up, standing, walking, and sitting) collected over 8 hours of activities performed by 4 healthy subjects. We also established a baseline performance index.

10. Kitsune Network Attack Dataset: A cybersecurity dataset containing nine different network attacks on a commercial IP-based surveillance system and an IoT network. The dataset includes reconnaissance, MitM, DoS, and botnet attacks.

11. SML2010: This dataset was collected from a monitoring system mounted in a domotic house. It corresponds to approximately 40 days of monitoring data.

12. Pedestrian in Traffic Dataset: This data-set contains a number of pedestrian tracks recorded from a vehicle driving in a town in southern Germany. The data is particularly well-suited for multi-agent motion prediction tasks.

13. CNNpred: CNN-based stock market prediction using a diverse set of variables: This dataset contains several daily features of the S&P 500, NASDAQ Composite, Dow Jones Industrial Average, Russell 2000, and NYSE Composite from 2010 to 2017.

14. Grammatical Facial Expressions: This dataset supports the development of models that make it possible to interpret Grammatical Facial Expressions from Brazilian Sign Language (Libras).

15. Taxi Service Trajectory - Prediction Challenge, ECML PKDD 2015: An accurate dataset describing the trajectories of all 442 taxis running in the city of Porto, Portugal.

16. Indoor User Movement Prediction from RSS data: This dataset contains temporal data from a Wireless Sensor Network deployed in real-world office environments. The task is intended as a real-life benchmark in the area of Ambient Assisted Living.

17. Activity Recognition system based on Multisensor data fusion (AReM): This dataset contains temporal data from a Wireless Sensor Network worn by an actor performing the following activities: bending, cycling, lying down, sitting, standing, and walking.

18. Hybrid Indoor Positioning Dataset from WiFi RSSI, Bluetooth and magnetometer: The dataset was created for the comparison and evaluation of hybrid indoor positioning methods. It contains data from WLAN and Bluetooth interfaces and from a magnetometer.

19. DSRC Vehicle Communications: This set provides data on wireless communications between vehicles and roadside units. Two separate data sets are provided: one for the normal scenario and one in the presence of an attacker (jammer).

20. detection_of_IoT_botnet_attacks_N_BaIoT: This dataset addresses the lack of public botnet datasets, especially for the IoT. It provides *real* traffic data, gathered from 9 commercial IoT devices authentically infected by Mirai and BASHLITE.

21. Wall-Following Robot Navigation Data: The data were collected as the SCITOS G5 robot navigates through the room following the wall in a clockwise direction, for 4 rounds, using 24 ultrasound sensors arranged circularly around its 'waist'.

22. GNFUV Unmanned Surface Vehicles Sensor Data Set 2: The data set contains eight (2x4) data sets of mobile sensor readings (humidity, temperature) from a swarm of four Unmanned Surface Vehicles (USVs) in a test bed in Athens, Greece.

23. Parking Birmingham: Data collected from car parks in Birmingham that are operated by NCP from Birmingham City Council. UK Open Government Licence (OGL). https://data.birmingham.gov.uk/dataset/birmingham-parking

24. 3D Road Network (North Jutland, Denmark): 3D road network with highly accurate elevation information (±20 cm) from Denmark, used in eco-routing and fuel/CO2-estimation routing algorithms.

