Center for Machine Learning and Intelligent Systems

Twenty Newsgroups Data Set
Download: Data Folder, Data Set Description

Abstract: This data set consists of 20000 messages taken from 20 newsgroups.
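
As a quick sanity check on the counts in the abstract, the messages per newsgroup can be tallied after downloading and unpacking the data. The sketch below assumes an on-disk layout that this page does not describe: one subdirectory per newsgroup, with one file per message. The function name `count_messages` and any path passed to it are hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_messages(root):
    """Count the files in each immediate subdirectory of `root`.

    Assumption (not stated on this page): each subdirectory is one
    newsgroup and each file inside it is one message.
    """
    counts = Counter()
    for group_dir in sorted(Path(root).iterdir()):
        if group_dir.is_dir():
            # Count only regular files; nested directories are ignored.
            counts[group_dir.name] = sum(
                1 for entry in group_dir.iterdir() if entry.is_file()
            )
    return counts
```

If the layout assumption holds, calling `count_messages` on the unpacked folder should return about 20 keys whose values sum to roughly the 20000 instances listed here.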

Data Set Characteristics: Text
Number of Instances: 20000
Area: N/A
Attribute Characteristics: N/A
Number of Attributes: N/A
Date Donated: 1999-09-09
Associated Tasks: N/A
Missing Values? No
Number of Web Hits: 129054


Source:

Original Owner and Donor:

Tom Mitchell
School of Computer Science
Carnegie Mellon University
tom.mitchell '@' cmu.edu
http://www.cs.cmu.edu/~tom/


Data Set Information:

N/A


Attribute Information:

N/A


Relevant Papers:

T. Mitchell. Machine Learning, McGraw Hill, 1997.

T. Joachims. A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization, Computer Science Technical Report CMU-CS-96-118, Carnegie Mellon University, 1996.



Citation Request:

You may use this material free of charge for any educational purpose, provided attribution is given in any lectures or publications that make use of this material.

