Default Task

Classification (8)
Regression (1)
Clustering (1)
Other (3)

Attribute Type

Categorical (5)
Numerical (3)
Mixed (3)

Data Type

Multivariate (11)
Univariate (0)
Sequential (1)
Time-Series (1)
Text (2)
Domain-Theory (0)
Other (0)

Area

Life Sciences (4)
Physical Sciences (2)
CS / Engineering (3)
Social Sciences (2)
Business (0)
Game (0)
Other (2)

# Attributes

Less than 10 (5)
10 to 100 (6)
Greater than 100 (1)

# Instances

Less than 100 (13)
100 to 1000 (109)
Greater than 1000 (132)

Format Type

Matrix (10)
Non-Matrix (3)

13 Data Sets


1. Opinosis Opinion / Review: This dataset contains sentences extracted from user reviews on a given topic. Example topics are “performance of Toyota Camry” and “sound quality of iPod Nano”.

2. Balloons: Data previously used in a cognitive psychology experiment; 4 data sets represent different conditions of the experiment

3. Lenses: Database for fitting contact lenses

4. Challenger USA Space Shuttle O-Ring: Task: predict the number of O-rings that experience thermal distress on a flight at 31 degrees F given data on the previous 23 shuttle flights

5. Shuttle Landing Control: Tiny database; all nominal values

6. Post-Operative Patient: Dataset of patient features

7. Labor Relations: From Collective Bargaining Review

8. Trains: 2 data formats (structured, one-instance-per-line)

9. Predict keywords activities in a online social media: The data was collected from Twitter over 360 consecutive days by querying 1497 English keywords sampled from Wikipedia. The dataset is proposed for a learning-to-rank setting.

10. Soybean (Small): Michalski's famous soybean disease database

11. Sponge: Data on sponges; attributes in Spanish

12. Lung Cancer: Lung cancer data; no attribute definitions

13. DBWorld e-mails: It contains 64 e-mails which I have manually collected from the DBWorld mailing list. They are classified as either 'announces of conferences' or 'everything else'.

