Center for Machine Learning and Intelligent Systems



Browse Through:

Default Task

Classification (45)
Regression (8)
Clustering (11)
Other (15)

Attribute Type

Categorical (7)
Numerical (31)
Mixed (13)

Data Type

Multivariate (43)
Univariate (5)
Sequential (7)
Time-Series (13)
Text (8)
Domain-Theory (2)
Other (7)

Area

Life Sciences (89)
Physical Sciences (47)
CS / Engineering (129)
Social Sciences (23)
Business (25)
Game (10)
Other (67)

# Attributes

Less than 10 (14)
10 to 100 (30)
Greater than 100 (5)

# Instances

Less than 100 (3)
100 to 1000 (21)
Greater than 1000 (33)

Format Type

Matrix (41)
Non-Matrix (26)

67 Data Sets


| Name | Data Types | Default Task | Attribute Types | # Instances | # Attributes | Year |
|---|---|---|---|---|---|---|
| Activity Recognition from Single Chest-Mounted Accelerometer | Univariate, Sequential, Time-Series | Classification, Clustering | Real | | | 2014 |
| Air quality | Multivariate, Time-Series | Regression | Real | 9358 | 15 | 2016 |
| Australian Sign Language signs | Multivariate, Time-Series | Classification | Categorical, Real | 6650 | 15 | 1999 |
| Australian Sign Language signs (High Quality) | Multivariate, Time-Series | Classification | Real | 2565 | 22 | 2002 |
| Auto MPG | Multivariate | Regression | Categorical, Real | 398 | | 1993 |
| Automobile | Multivariate | Regression | Categorical, Integer, Real | 205 | 26 | 1987 |
| AutoUniv | Multivariate | Classification | Categorical, Integer, Real | | | 2010 |
| Bach Choral Harmony | Sequential | Classification | | 5665 | 17 | 2014 |
| Bach Chorales | Univariate, Time-Series | | Categorical, Integer | 100 | | |
| Badges | Univariate, Text | Classification | | 294 | | 1994 |
| Bag of Words | Text | Clustering | Integer | 8000000 | 100000 | 2008 |
| CalIt2 Building People Counts | Multivariate, Time-Series | | Categorical, Integer | 10080 | | 2006 |
| Car Evaluation | Multivariate | Classification | Categorical | 1728 | | 1997 |
| Chronic_Kidney_Disease | Multivariate | Classification | Real | 400 | 25 | 2015 |
| CMU Face Images | Image | Classification | Integer | 640 | | 1999 |
| Connectionist Bench (Nettalk Corpus) | Multivariate | | Categorical | 20008 | | |
| Connectionist Bench (Vowel Recognition - Deterding Data) | | Classification | Real | 528 | 10 | |
| Corel Image Features | Multivariate | | Real | 68040 | 89 | 1999 |
| Dexter | Multivariate | Classification | Integer | 2600 | 20000 | 2008 |
| DGP2 - The Second Data Generation Program | Data-Generator | | Real | | | |
| Document Understanding | | | | | | 1994 |
| Dodgers Loop Sensor | Multivariate, Time-Series | | Categorical, Integer | 50400 | | 2006 |
| Entree Chicago Recommendation Data | Transactional, Sequential | Recommender-Systems | Categorical | 50672 | | 2000 |
| Facebook Comment Volume Dataset | Multivariate | Regression | Integer, Real | 40949 | 54 | 2016 |
| Firm-Teacher_Clave-Direction_Classification | Multivariate | Classification | | 10800 | 20 | 2015 |
| Flags | Multivariate | Classification | Categorical, Integer | 194 | 30 | 1990 |
| Folio | Multivariate | Classification, Clustering | | 637 | 20 | 2015 |
| Geographical Original of Music | Multivariate | Classification, Regression | Real | 1059 | 68 | 2014 |
| Gesture Phase Segmentation | Multivariate, Sequential, Time-Series | Classification, Clustering | Real | 9900 | 50 | 2014 |
| Hill-Valley | Sequential | Classification | Real | 606 | 101 | 2008 |
| Image Segmentation | Multivariate | Classification | Real | 2310 | 19 | 1990 |
| Japanese Vowels | Multivariate, Time-Series | Classification | Real | 640 | 12 | |
| KDD Cup 1998 Data | Multivariate | Regression | Categorical, Integer | 191779 | 481 | 1998 |
| Legal Case Reports | Text | Classification | | | | 2012 |
| Lenses | Multivariate | Classification | Categorical | 24 | | 1990 |
| Libras Movement | Multivariate, Sequential | Classification, Clustering | Real | 360 | 91 | 2009 |
| Madelon | Multivariate | Classification | Real | 4400 | 500 | 2008 |
| Meta-data | Multivariate | Classification | Categorical, Integer, Real | 528 | 22 | 1996 |
| MONK's Problems | Multivariate | Classification | Categorical | 432 | | 1992 |
| Movie | Multivariate, Relational | | | 10000 | | 1999 |
| News Aggregator | Multivariate | Classification, Clustering | | 422937 | | 2016 |
| NSF Research Award Abstracts 1990-2003 | Text | | | 129000 | | 2003 |
| Pittsburgh Bridges | Multivariate | Classification | Categorical, Integer | 108 | 13 | 1990 |
| Prodigy | Domain-Theory | | | | | |
| Pseudo Periodic Synthetic Time Series | Univariate, Time-Series | | | 100000 | | 1999 |
| QSAR biodegradation | Multivariate | Classification | Integer, Real | 1055 | 41 | 2013 |
| Record Linkage Comparison Patterns | Multivariate | Classification | Real | 5749132 | 12 | 2011 |
| Reuters-21578 Text Categorization Collection | Text | Classification | Categorical | 21578 | | 1997 |
| seismic-bumps | Multivariate | Classification | Real | 2584 | 19 | 2013 |
| Sentence Classification | Text | Classification | Integer | | | 2014 |
| Sentiment Labelled Sentences | Text | Classification | | 3000 | | 2015 |
| Spoken Arabic Digit | Multivariate, Time-Series | Classification | Real | 8800 | 13 | 2010 |
| Statlog (Image Segmentation) | Multivariate | Classification | Real | 2310 | 19 | 1990 |
| Statlog (Vehicle Silhouettes) | Multivariate | Classification | Integer | 946 | 18 | |
| Statlog Project | | | | | | 1992 |
| StoneFlakes | Multivariate | Classification, Clustering, Causal-Discovery | Real | 79 | | 2014 |
| Synthetic Control Chart Time Series | Time-Series | Classification, Clustering | Real | 600 | | 1999 |
| Teaching Assistant Evaluation | Multivariate | Classification | Categorical, Integer | 151 | | 1997 |
| Tennis Major Tournament Match Statistics | Multivariate | Classification, Regression, Clustering | Integer, Real | 127 | 42 | 2014 |
| Trains | Multivariate | Classification | Categorical | 10 | 32 | 1994 |
| Turkiye Student Evaluation | Multivariate | Classification, Clustering | | 5820 | 33 | 2013 |
| Twenty Newsgroups | Text | | | 20000 | | 1999 |
| Undocumented | | | | | | |
| University | Multivariate | Classification | Categorical, Integer | 285 | 17 | 1988 |
| User Identification From Walking Activity | Univariate, Sequential, Time-Series | Classification, Clustering | Real | | | 2014 |
| USPTO Algorithm Challenge, run by NASA-Harvard Tournament Lab and TopCoder Problem: Pat | Domain-Theory | Classification | Integer | 306 | | 2013 |
| YearPredictionMSD | Multivariate | Regression | Real | 515345 | 90 | 2011 |
