Welcome to the UC Irvine Machine Learning Repository

We currently maintain 644 datasets as a service to the machine learning community. Here, you can donate and find datasets used by millions of people all around the world!

Popular Datasets

Iris

A small classic dataset from Fisher, 1936. One of the earliest known datasets used for evaluating classification methods.

Heart Disease

4 databases: Cleveland, Hungary, Switzerland, and the VA Long Beach

Adult

Predict whether income exceeds $50K/yr based on census data. Also known as the "Census Income" dataset.

Dry Bean Dataset

Images of 13,611 grains of 7 different registered dry beans were taken with a high-resolution camera. A total of 16 features (12 dimensions and 4 shape forms) were obtained from the grains.

Diabetes

This diabetes dataset is from AIM '94

Wine

Using chemical analysis to determine the origin of wines

New Datasets

9mers from cullpdb

The dataset consists of protein fragments of length nine, called 9mers, derived from 3,733 proteins selected by cullpdb [1]. All proteins have 1) resolution less than 1.6 angstroms, 2) R-factor less than 0.25, and 3) sequence identity below 20%. In addition, all proteins with identity above 20% to CASP13 targets are removed. All torsion-angle pairs are in the allowed region of the Ramachandran plot (fragments containing outliers were detected by the Ramalyze function of the crystallography software PHENIX [1] and removed). The dataset has ~158,000 entries randomly split into train, test, and validation sets with a 60/20/20 split.

Room Occupancy Estimation

Data set for estimating the precise number of occupants in a room using multiple non-intrusive environmental sensors like temperature, light, sound, CO2 and PIR.

Image Recognition Task Execution Times in Mobile Edge Computing

Recorded execution times for image recognition tasks submitted by an edge node to four edge servers. The node sends images to the servers, which perform the tasks and return the results to the node.

Rocket League Skillshots

This dataset contains data from players of the game Rocket League performing different skillshots.

TUANDROMD (Tezpur University Android Malware Dataset)

The TUANDROMD dataset contains 4465 instances and 241 attributes. The target attribute for classification is a category (malware vs goodware). (N.B. This is the preprocessed version of TUANDROMD.)

REJAFADA

REJAFADA (Retrieval of Jar Files Applied to Dynamic Analysis) is intended as a benchmark for checking the quality of Jar malware detection.
