Center for Machine Learning and Intelligent Systems

Browse Through:

Default Task

Classification (8)
Regression (2)
Clustering (0)
Other (0)

Attribute Type

Categorical (0)
Numerical (7)
Mixed (0)

Data Type

Multivariate (6)
Univariate (1)
Sequential (1)
Time-Series (1)
Text (3)
Domain-Theory (2)
Other (0)

Area

Life Sciences (0)
Physical Sciences (3)
CS / Engineering (4)
Social Sciences (0)
Business (1)
Game (0)
Other (1)

# Attributes

Less than 10 (14)
10 to 100 (21)
Greater than 100 (9)

# Instances

Less than 100 (0)
100 to 1000 (5)
Greater than 1000 (4)

Format Type

Matrix (41)
Non-Matrix (9)

9 Data Sets


1. Hill-Valley: Each record represents 100 points on a two-dimensional graph. When plotted in order (from 1 through 100) as the Y co-ordinate, the points will create either a Hill (a “bump” in the terrain) or a Valley (a “dip” in the terrain).

2. Low Resolution Spectrometer: From IRAS data -- NASA Ames Research Center

3. Urban Land Cover: Classification of urban land cover using high resolution aerial imagery. Intended to assist sustainable urban planning efforts.

4. Northix: Northix is designed to be a schema-matching benchmark problem for data integration of two entity-relationship databases.

5. NoisyOffice: Corpus intended for cleaning (or binarization) and enhancement of noisy grayscale printed-text images using supervised learning methods. Noisy images and their corresponding ground truth are provided.

6. Relative location of CT slices on axial axis: The dataset consists of 384 features extracted from CT images. The class variable is numeric and denotes the relative location of the CT slice on the axial axis of the human body.

7. Amazon Commerce reviews set: The dataset is used for authorship identification in online Writeprint, which is a new research field of pattern recognition.

8. Farm Ads: This data was collected from text ads found on twelve websites that deal with various farm-animal-related topics. The binary labels are based on whether or not the content owner approves of the ad.

9. Gas sensor arrays in open sampling settings: The dataset contains 18000 time-series recordings from a chemical detection platform at six different locations in a wind tunnel facility in response to ten high-priority chemical gaseous substances.
