Center for Machine Learning and Intelligent Systems




4 Data Sets



1. Turkish Spam V01: The TurkishSpam data set contains spam and normal emails written in Turkish.

2. Syskill and Webert Web Page Ratings: This database contains the HTML source of web pages plus the ratings of a single user on these web pages. The web pages cover four separate subjects: Bands (recording artists), Goats, Sheep, and BioMedical.

3. Russian Corpus of Biographical Texts: Sentence classification (Russian). The corpus contains Wikipedia texts split into sentences. Each sentence has a topic label.

4. Labeled Text Forum Threads Dataset: The dataset is a collection of text forum threads with class labels that reflect the quality of each reply to the initial post: 3 for completely relevant, 2 for partially relevant, and 1 for irrelevant.

