Welcome to the UC Irvine Machine Learning Repository
We currently maintain 644 datasets as a service to the machine learning community. Here, you can donate and find datasets used by millions of people all around the world!
Popular Datasets
Iris
A small classic dataset from Fisher, 1936. One of the earliest known datasets used for evaluating classification methods.
Heart Disease
4 databases: Cleveland, Hungary, Switzerland, and the VA Long Beach
Adult
Predict whether income exceeds $50K/yr based on census data. Also known as "Census Income" dataset.
Dry Bean Dataset
Images of 13,611 grains of 7 different registered dry beans were taken with a high-resolution camera. A total of 16 features (12 dimensions and 4 shape forms) were obtained from the grains.
Diabetes
This diabetes dataset is from AIM '94
Wine
Using chemical analysis to determine the origin of wines
New Datasets
9mers from cullpdb
The dataset consists of protein fragments of length nine, called 9mers, derived from 3,733 proteins selected by cullpdb [1]. All proteins have 1) resolution less than 1.6 angstrom, 2) R-factor less than 0.25, 3) sequence identity below 20%. In addition, all proteins with identity above 20% to CASP13 targets are removed. All torsion angle-pairs are in the allowed region of the Ramachandran plot (fragments containing outliers were detected by the Ramalyze function of the crystallography software PHENIX [1] and removed). The dataset has ~158,000 entries randomly split into train, test, and validation sets with a 60/20/20 split.
Room Occupancy Estimation
Data set for estimating the precise number of occupants in a room using multiple non-intrusive environmental sensors like temperature, light, sound, CO2 and PIR.
Image Recognition Task Execution Times in Mobile Edge Computing
Recorded execution times of image recognition tasks submitted by an edge node to four edge servers. The node sends images to the servers, which perform the recognition tasks and return the results to the node.
Rocket League Skillshots
This dataset contains data on players of the game Rocket League performing different skillshots.
TUANDROMD (Tezpur University Android Malware Dataset)
The TUANDROMD dataset contains 4,465 instances and 241 attributes. The target attribute for classification is a category (malware vs. goodware). (N.B. This is the preprocessed version of TUANDROMD.)
REJAFADA
REJAFADA (Retrieval of Jar Files Applied to Dynamic Analysis) is intended as a benchmark for checking the quality of Jar malware detection.
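The 9mers from cullpdb entry above mentions a random 60/20/20 split into train, test, and validation sets. As a rough illustration of what such a split might look like, here is a minimal Python sketch. The function name `split_60_20_20` and the `seed` parameter are hypothetical, and this is not the procedure the dataset's authors used, only one plausible way to produce a split with those proportions.

```python
import random


def split_60_20_20(items, seed=0):
    """Shuffle a copy of items and cut it into train/test/validation parts.

    Hypothetical helper for illustration only; not the dataset authors' code.
    """
    shuffled = list(items)
    # A dedicated Random instance keeps the shuffle reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = n * 6 // 10
    n_test = n * 2 // 10

    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    # The validation set takes whatever remains, so no entry is dropped.
    validation = shuffled[n_train + n_test:]
    return train, test, validation
```

Calling `split_60_20_20(entries)` on a list of entries returns three disjoint lists that together contain every entry exactly once. When the entry count is not a multiple of 10, the leftover entries land in the validation set.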

