Browse Datasets
Iris
A small classic dataset from Fisher, 1936. One of the earliest known datasets used for evaluating classification methods.
Heart Disease
Combines four databases: Cleveland, Hungary, Switzerland, and VA Long Beach.
Wine Quality
Two datasets are included, one for red and one for white vinho verde wine samples from northern Portugal. The goal is to model wine quality based on physicochemical tests (see [Cortez et al., 2009], http://www3.dsi.uminho.pt/pcortez/wine/).
Adult
Predict whether the annual income of an individual exceeds $50K/yr based on census data. Also known as the "Census Income" dataset.
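Because the target is a yes/no question about a $50K threshold, a common first step is mapping the income label to 0/1. The sketch below assumes the labels appear as the strings "<=50K" and ">50K", possibly with surrounding whitespace or a trailing period; the helper name income_label_to_int and the sample labels are hypothetical.

```python
def income_label_to_int(label: str) -> int:
    """Map an Adult income label to 1 (>50K) or 0 (<=50K)."""
    # Assumed label variants: strip whitespace and an optional trailing period.
    cleaned = label.strip().rstrip(".")
    if cleaned == ">50K":
        return 1
    if cleaned == "<=50K":
        return 0
    raise ValueError(f"unexpected income label: {label!r}")


# Hypothetical raw labels, for illustration only.
raw = [" <=50K", " >50K", "<=50K.", ">50K."]
print([income_label_to_int(x) for x in raw])  # [0, 1, 0, 1]
```

Raising on unknown labels, rather than silently defaulting to 0, surfaces unexpected label formats early.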
Breast Cancer Wisconsin (Diagnostic)
Diagnostic Wisconsin Breast Cancer Database.
Bank Marketing
The data relate to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit (variable y).
Wine
Using chemical analysis to determine the origin of wines
Student Performance
Predict student performance in secondary education (high school).
Online Retail
This is a transactional dataset containing all the transactions that occurred between 01/12/2010 and 09/12/2011 (day/month/year, i.e., 1 December 2010 to 9 December 2011) for a UK-based, registered, non-store online retailer.
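The range of dates in this description follows the UK day/month/year convention. Reading them month-first would silently produce January 12, 2010 and September 12, 2011. A minimal sketch using only the standard library shows the day-first parse and the length of the covered period:

```python
from datetime import datetime

# Dates in the description are written day/month/year (UK convention).
start = datetime.strptime("01/12/2010", "%d/%m/%Y")
end = datetime.strptime("09/12/2011", "%d/%m/%Y")

print(start.date(), end.date())  # 2010-12-01 2011-12-09
print((end - start).days)        # 373
```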
Car Evaluation
Derived from a simple hierarchical decision model, this database may be useful for testing constructive induction and structure discovery methods.
Predict Students' Dropout and Academic Success
A dataset created from a higher education institution (acquired from several disjoint databases) related to students enrolled in different undergraduate degrees, such as agronomy, design, education, nursing, journalism, management, social service, and technologies. The dataset includes information known at the time of student enrollment (academic path, demographics, and socioeconomic factors) and the students' academic performance at the end of the first and second semesters. The data are used to build classification models to predict students' dropout and academic success. The problem is formulated as a three-category classification task with a strong imbalance towards one of the classes.
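One common way to handle a strongly imbalanced three-class target is to weight each class inversely to its frequency, so that errors on rare classes count for more during training. The sketch below uses the n_samples / (n_classes * class_count) heuristic, the same formula scikit-learn applies for class_weight="balanced". The class names and counts are hypothetical and are not the dataset's actual distribution.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


# Hypothetical, deliberately imbalanced labels for illustration only.
labels = ["Graduate"] * 6 + ["Dropout"] * 3 + ["Enrolled"] * 1
weights = inverse_frequency_weights(labels)
for cls, w in weights.items():
    print(cls, round(w, 3))
```

With this formula, the rarest class receives the largest weight, and a perfectly balanced label set would give every class a weight of 1.0.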
Diabetes
This diabetes dataset is from AIM '94
Automobile
From 1985 Ward's Automotive Yearbook
Air Quality
Contains the responses of a gas multisensor device deployed in the field in an Italian city. Hourly averages of the responses are recorded along with reference gas concentrations from a certified analyzer.
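The hourly averaging mentioned above can be sketched as bucketing timestamped readings by clock hour and taking the mean of each bucket. The function name hourly_averages and the readings are hypothetical. This only illustrates the aggregation step and does not describe how the dataset's authors processed their raw signals.

```python
from collections import defaultdict
from datetime import datetime


def hourly_averages(readings):
    """Average (timestamp, value) readings within each clock hour."""
    buckets = defaultdict(list)
    for ts, value in readings:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


# Hypothetical raw sensor readings (not taken from the dataset).
readings = [
    (datetime(2004, 3, 10, 18, 5), 1360.0),
    (datetime(2004, 3, 10, 18, 35), 1300.0),
    (datetime(2004, 3, 10, 19, 10), 1292.0),
]
for hour, avg in hourly_averages(readings).items():
    print(hour.isoformat(), avg)
```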
Abalone
Predict the age of abalone from physical measurements
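The dataset's documentation defines age in terms of shell rings: the age in years is the ring count plus 1.5. Counting rings by hand requires cutting and staining the shell, which is why predicting age from other physical measurements is useful. A trivial sketch of that conversion follows; the function name is hypothetical.

```python
def abalone_age_years(rings: int) -> float:
    """Approximate age in years from the ring count (rings + 1.5)."""
    return rings + 1.5


print(abalone_age_years(9))  # 10.5
```

A model can therefore predict either the ring count or the age directly, because the two differ only by a constant offset.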