Center for Machine Learning and Intelligent Systems



Welcome to the UC Irvine Machine Learning Repository!

We currently maintain 557 data sets as a service to the machine learning community. You may view all data sets through our searchable interface. For a general overview of the Repository, please visit our About page. For information about citing data sets in publications, please read our citation policy. If you wish to donate a data set, please consult our donation policy. For any other questions, feel free to contact the Repository librarians.

Latest News:
09-24-2018: Welcome to the new Repository admins Dheeru Dua and Efi Karra Taniskidou!
04-04-2013: Welcome to the new Repository admins Kevin Bache and Moshe Lichman!
03-01-2010: Note from donor regarding Netflix data
10-16-2009: Two new data sets have been added.
09-14-2009: Several data sets have been added.
03-24-2008: New data sets have been added!
06-25-2007: Two new data sets have been added: UJI Pen Characters, MAGIC Gamma Telescope


Featured Data Set:  Madelon

Task: Classification
Data Type: Multivariate
# Attributes: 500
# Instances: 4400

MADELON is an artificial dataset that was part of the NIPS 2003 feature selection challenge. It is a two-class classification problem with continuous input variables. The difficulty is that the problem is multivariate and highly non-linear.

Newest Data Sets:
07-22-2020: Facebook Large Page-Page Network
07-17-2020: Amphibians
07-12-2020: Early stage diabetes risk prediction dataset.
06-28-2020: Taiwanese Bankruptcy Prediction
06-20-2020: South German Credit (UPDATE)
06-17-2020: BitcoinHeistRansomwareAddressDataset
06-16-2020: Crop mapping using fused optical-radar data set
06-16-2020: Swarm Behaviour
06-15-2020: selfBACK
06-10-2020: HCV data
06-09-2020: IIWA14-R820-Gazebo-Dataset-10Trajectories
06-05-2020: Guitar Chords finger positions

Most Popular Data Sets (hits since 2007):
3536399: Iris
1924703: Adult
1485077: Wine
1326870: Breast Cancer Wisconsin (Diagnostic)
1310207: Heart Disease
1304936: Wine Quality
1276094: Bank Marketing
1238732: Car Evaluation
1033176: Human Activity Recognition Using Smartphones
976384: Abalone
916546: Forest Fires
767021: Student Performance
