Center for Machine Learning and Intelligent Systems

Twenty Newsgroups Data Set

Abstract: This data set consists of 20000 messages taken from 20 newsgroups.

Data Set Characteristics: Text
Number of Instances: 20000
Area: N/A
Attribute Characteristics: N/A
Number of Attributes: N/A
Date Donated: 1999-09-09
Associated Tasks: N/A
Missing Values? No
Number of Web Hits: 52593
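
The abstract states 20000 messages across 20 newsgroups, but this page does not document how the downloaded archive is laid out. As a minimal sketch for checking a local copy against those figures, the Python below assumes (hypothetically) that the extracted data has one subdirectory per newsgroup and one file per message. The function names and the directory layout are illustrative assumptions, not something this page specifies.

```python
from collections import Counter
from pathlib import Path


def count_messages_per_group(root):
    """Return a Counter mapping each subdirectory name to its file count.

    Assumption (not documented on the source page): `root` contains one
    subdirectory per newsgroup, and each message is a single file in it.
    """
    counts = Counter()
    for group_dir in sorted(Path(root).iterdir()):
        if group_dir.is_dir():
            # Count only regular files; nested directories are ignored.
            counts[group_dir.name] = sum(
                1 for entry in group_dir.iterdir() if entry.is_file()
            )
    return counts


def summarize(counts):
    """Return (number of groups, total number of messages)."""
    return len(counts), sum(counts.values())
```

If the layout assumption holds, the result of `summarize(count_messages_per_group(root))` for the extracted directory could be compared with the 20 newsgroups and 20000 instances listed above. A mismatch would suggest either a different layout or an incomplete download.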


Source:

Original Owner and Donor:

Tom Mitchell
School of Computer Science
Carnegie Mellon University
tom.mitchell '@' cmu.edu
http://www.cs.cmu.edu/~tom/


Data Set Information:

N/A


Attribute Information:

N/A


Relevant Papers:

T. Mitchell (1997). Machine Learning. McGraw Hill.

T. Joachims (1996). A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization. Computer Science Technical Report CMU-CS-96-118, Carnegie Mellon University.



Citation Request:

You may use this material free of charge for any educational purpose, provided attribution is given in any lectures or publications that make use of this material.

