Browse Datasets

Iris

A small classic dataset from Fisher, 1936. One of the earliest known datasets used for evaluating classification methods.

Heart Disease

4 databases: Cleveland, Hungary, Switzerland, and the VA Long Beach

Wine Quality

Two datasets are included, related to red and white vinho verde wine samples, from the north of Portugal. The goal is to model wine quality based on physicochemical tests (see [Cortez et al., 2009], http://www3.dsi.uminho.pt/pcortez/wine/).

Breast Cancer Wisconsin (Diagnostic)

Diagnostic Wisconsin Breast Cancer Database.

Adult

Predict whether annual income of an individual exceeds $50K/yr based on census data. Also known as "Census Income" dataset.

Bank Marketing

The data is related with direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict if the client will subscribe a term deposit (variable y).

Student Performance

Predict student performance in secondary education (high school).

Wine

Using chemical analysis to determine the origin of wines

Online Retail

This is a transactional data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retail.

Car Evaluation

Derived from simple hierarchical decision model, this database may be useful for testing constructive induction and structure discovery methods.

Predict Students' Dropout and Academic Success

A dataset created from a higher education institution (acquired from several disjoint databases) related to students enrolled in different undergraduate degrees, such as agronomy, design, education, nursing, journalism, management, social service, and technologies. The dataset includes information known at the time of student enrollment (academic path, demographics, and social-economic factors) and the students' academic performance at the end of the first and second semesters. The data is used to build classification models to predict students' dropout and academic sucess. The problem is formulated as a three category classification task, in which there is a strong imbalance towards one of the classes.

Diabetes

This diabetes dataset is from AIM '94

Air Quality

Contains the responses of a gas multisensor device deployed on the field in an Italian city. Hourly responses averages are recorded along with gas concentrations references from a certified analyzer.

Automobile

From 1985 Ward's Automotive Yearbook

Mushroom

From Audobon Society Field Guide; mushrooms described in terms of physical characteristics; classification: poisonous or edible

Abalone

Predict the age of abalone from physical measurements

Estimation of Obesity Levels Based On Eating Habits and Physical Condition

This dataset include data for the estimation of obesity levels in individuals from the countries of Mexico, Peru and Colombia, based on their eating habits and physical condition.

Default of Credit Card Clients

This research aimed at the case of customers' default payments in Taiwan and compares the predictive accuracy of probability of default among six data mining methods.

Human Activity Recognition Using Smartphones

Human Activity Recognition database built from the recordings of 30 subjects performing activities of daily living (ADL) while carrying a waist-mounted smartphone with embedded inertial sensors.

Statlog (German Credit Data)

This dataset classifies people described by a set of attributes as good or bad credit risks. Comes in two formats (one all numeric). Also comes with a cost matrix

Bike Sharing

This dataset contains the hourly and daily count of rental bikes between years 2011 and 2012 in Capital bikeshare system with the corresponding weather and seasonal information.

Auto MPG

Revised from CMU StatLib library, data concerns city-cycle fuel consumption

Spambase

Classifying Email as Spam or Non-Spam

Individual Household Electric Power Consumption

Measurements of electric power consumption in one household with a one-minute sampling rate over a period of almost 4 years. Different electrical quantities and some sub-metering values are available.

CDC Diabetes Health Indicators

The Diabetes Health Indicators Dataset contains healthcare statistics and lifestyle survey information about people in general along with their diagnosis of diabetes. The 35 features consist of some demographics, lab test results, and answers to survey questions for each patient. The target variable for classification is whether a patient has diabetes, is pre-diabetic, or healthy.

0 to 25 of 677

By using the UCI Machine Learning Repository, you acknowledge and accept the cookies and privacy practices used by the UCI Machine Learning Repository.

Read Policy