Twenty Newsgroups
Donated on 9/8/1999
This dataset consists of 20,000 messages taken from 20 newsgroups.
Property | Value |
---|---|
Dataset Characteristics | Text |
Subject Area | Other |
Associated Tasks | - |
Feature Type | - |
# Instances | 20000 |
# Features | - |
Dataset Information
Has Missing Values? No
Dataset Files
File | Size |
---|---|
20_newsgroups.tar.gz | 16.5 MB |
mini_newsgroups.tar.gz | 1.8 MB |
20newsgroups.data.html | 4.3 KB |
20newsgroups.html | 895 Bytes |
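As a rough illustration of working with the archives listed above, the Python sketch below counts messages per newsgroup inside a `.tar.gz` file using only the standard library. It assumes, without confirmation from this page, that each message is stored as a separate file at `<top-level directory>/<newsgroup>/<message file>`. The function name `count_messages_per_group` is hypothetical.

```python
import tarfile
from collections import Counter


def count_messages_per_group(archive_path: str) -> Counter:
    """Count regular files per newsgroup directory inside a .tar.gz archive.

    ASSUMPTION (not confirmed by the dataset page): each message is a
    separate file stored at <top-level dir>/<newsgroup>/<message file>.
    """
    counts: Counter = Counter()
    # Stream over archive members without extracting anything to disk.
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar:
            # Skip directories, links and other non-file entries.
            if not member.isfile():
                continue
            parts = member.name.strip("/").split("/")
            # Expect at least <top-level dir>/<newsgroup>/<message file>;
            # the parent directory name is taken as the newsgroup.
            if len(parts) >= 3:
                counts[parts[-2]] += 1
    return counts
```

For example, `count_messages_per_group("mini_newsgroups.tar.gz")` should return a `Counter` keyed by newsgroup name if the assumed layout holds, and summing its values gives the total message count. Iterating over the archive members keeps memory use low and avoids unpacking thousands of small files.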
Download (18.3 MB)
Install the ucimlrepo package:

    pip install ucimlrepo

Import the dataset into your code:

    from ucimlrepo import fetch_ucirepo

    # fetch dataset
    twenty_newsgroups = fetch_ucirepo(id=113)

    # data (as pandas dataframes)
    X = twenty_newsgroups.data.features
    y = twenty_newsgroups.data.targets

    # metadata
    print(twenty_newsgroups.metadata)

    # variable information
    print(twenty_newsgroups.variables)
Citation
Mitchell, T. (1997). Twenty Newsgroups [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5C323.
Creators
Tom Mitchell
DOI
10.24432/C5C323
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows the dataset to be shared and adapted for any purpose, provided that appropriate credit is given.