Center for Machine Learning and Intelligent Systems

9 Data Sets

1. Bach Chorales: Time-series data based on chorales; the challenge is to learn a generative grammar. The data are in Lisp format.

2. CalIt2 Building People Counts: This data comes from counts of people passing through the main door of the CalIt2 building at UCI.

3. Connectionist Bench (Nettalk Corpus): The file "" contains a list of 20,008 English words, along with a phonetic transcription for each word. The task is to train a network to produce the proper phonemes.

4. Dodgers Loop Sensor: Loop sensor data were collected at the Glendale on-ramp to the 101 North freeway in Los Angeles.

5. Eco-hotel: This dataset includes online textual reviews from both online (e.g., TripAdvisor) and offline (e.g., the guests' book) sources about the Areias do Seixo Eco-Resort.

6. EEG Database: This data arises from a large study examining EEG correlates of genetic predisposition to alcoholism. It contains measurements from 64 electrodes placed on the scalp, sampled at 256 Hz.

7. EMG dataset in Lower Limb: Electromyography (EMG) recordings from four muscles (biceps femoris, vastus medialis, rectus femoris and semitendinosus) during three exercises (sitting, standing and walking), in addition to goniometry recorded during the exercises.

8. Liver Disorders: BUPA Medical Research Ltd. database donated by Richard S. Forsyth

9. QtyT40I10D100K: Because no numerical sequential data stream is available in the standard data sets, this data set was generated from the original T40I10D100K data set.
