
Twenty Newsgroups
Donated on 9/8/1999
This data set consists of 20,000 messages taken from 20 newsgroups.
Dataset Characteristics
Text
Subject Area
Other
Associated Tasks
-
Feature Type
-
# Instances
20000
# Features
-
Dataset Information
Has Missing Values?
No
Dataset Files
| File | Size |
|---|---|
| 20_newsgroups.tar.gz | 16.5 MB |
| mini_newsgroups.tar.gz | 1.8 MB |
| 20newsgroups.data.html | 4.3 KB |
| 20newsgroups.html | 895 Bytes |
Install the ucimlrepo package
pip install ucimlrepo
Import the dataset into your code
from ucimlrepo import fetch_ucirepo

# fetch dataset
twenty_newsgroups = fetch_ucirepo(id=113)

# data (as pandas dataframes)
X = twenty_newsgroups.data.features
y = twenty_newsgroups.data.targets

# metadata
print(twenty_newsgroups.metadata)

# variable information
print(twenty_newsgroups.variables)
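The archives listed under Dataset Files can also be read without ucimlrepo. The following is a minimal sketch, assuming that 20_newsgroups.tar.gz has already been extracted and that it unpacks to one subdirectory per newsgroup with one file per message. That layout is an assumption, not something stated on this page, so check it after extracting. The function name load_messages and the path in the usage line are hypothetical.

```python
from pathlib import Path


def load_messages(root_dir):
    """Return (texts, labels) read from a tree of <newsgroup>/<message file>.

    Assumption: every immediate subdirectory of root_dir is one newsgroup
    and every regular file inside it is one message.
    """
    texts, labels = [], []
    group_dirs = sorted(p for p in Path(root_dir).iterdir() if p.is_dir())
    for group_dir in group_dirs:
        message_paths = sorted(p for p in group_dir.iterdir() if p.is_file())
        for message_path in message_paths:
            # latin-1 maps every byte to a character, so decoding cannot fail
            # on messages whose original encoding is unknown.
            texts.append(message_path.read_text(encoding="latin-1"))
            # The subdirectory name serves as the class label.
            labels.append(group_dir.name)
    return texts, labels
```

For example, texts, labels = load_messages("20_newsgroups") would return two parallel lists, with each label being the name of the subdirectory its message came from.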
Citation
Mitchell, T. (1997). Twenty Newsgroups [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5C323.
Creators
Tom Mitchell
DOI
10.24432/C5C323
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.