
Health News in Twitter
Donated on 2/18/2018
The data was collected in 2015 using the Twitter API. This dataset contains health news tweets from more than 15 major health news agencies, such as BBC, CNN, and NYT.
Dataset Characteristics
Text
Subject Area
Computer Science
Associated Tasks
Clustering
Feature Type
Real
# Instances
58000
# Features
25000
Dataset Information
Additional Information
Each file corresponds to the Twitter account of one news agency. For example, bbchealth.txt contains BBC health news tweets. Each line has three fields separated by '|': tweet id|date and time|tweet. This text data has been used to evaluate the performance of topic models on short text; it can also be used for other tasks, such as clustering.
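The line format described above can be parsed with a few lines of Python. This is a minimal sketch, not the dataset author's code: the names `Tweet` and `parse_tweet_line` are hypothetical, the sample line is invented for illustration, and the split is limited to the first two separators on the assumption that the tweet text itself may contain a '|' character. The date is kept as raw text because its format is not specified here.

```python
from typing import NamedTuple


class Tweet(NamedTuple):
    """One record from a Health-Tweets file (hypothetical container)."""
    tweet_id: str
    timestamp: str  # raw text; the date format is not documented above
    text: str


def parse_tweet_line(line: str) -> Tweet:
    """Split a 'tweet id|date and time|tweet' line into its three fields."""
    # maxsplit=2 leaves any '|' inside the tweet text untouched
    # (assumption: only the first two separators delimit fields).
    parts = line.rstrip("\r\n").split("|", 2)
    if len(parts) != 3:
        raise ValueError(f"expected 3 '|'-separated fields, got {len(parts)}")
    return Tweet(*parts)


# Invented sample line, for illustration only.
sample = "123456|2015-04-09 01:31:50|Study links sleep to health | example.com"
tweet = parse_tweet_line(sample)
print(tweet.tweet_id)
print(tweet.text)
```

Limiting the split keeps the third field intact even when a headline or link contains the separator character.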
Has Missing Values?
No
Dataset Files
| File | Size | 
|---|---|
| Health-Tweets/goodhealth.txt | 1.2 MB | 
| Health-Tweets/nytimeshealth.txt | 880.2 KB | 
| Health-Tweets/cbchealth.txt | 663.6 KB | 
| Health-Tweets/cnnhealth.txt | 637.8 KB | 
| Health-Tweets/reuters_health.txt | 633.9 KB | 
Showing 5 of 33 files.
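Because each file corresponds to one account, a loader can label every tweet with the account name taken from the file name. The following is a rough sketch under stated assumptions: the helper `load_accounts` is hypothetical, the files are assumed to sit in a local directory such as `Health-Tweets` (as in the file list above), and decoding with `errors="replace"` is a defensive guess, since the file encoding is not documented here.

```python
from pathlib import Path


def load_accounts(root):
    """Map account name (file stem) -> list of (tweet_id, timestamp, text)."""
    accounts = {}
    for path in sorted(Path(root).glob("*.txt")):
        rows = []
        # Encoding is not documented; replace undecodable bytes rather than fail.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                # Split on the first two '|' only, so a '|' in the tweet survives.
                parts = line.rstrip("\r\n").split("|", 2)
                if len(parts) == 3:  # skip blank or malformed lines
                    rows.append(tuple(parts))
        accounts[path.stem] = rows
    return accounts
```

Calling `load_accounts("Health-Tweets")` would then return a dictionary keyed by names such as `bbchealth`, assuming the archive has been extracted to that directory.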
pip install ucimlrepo
    from ucimlrepo import fetch_ucirepo

    # fetch dataset
    health_news_in_twitter = fetch_ucirepo(id=438)

    # data (as pandas dataframes)
    X = health_news_in_twitter.data.features
    y = health_news_in_twitter.data.targets

    # metadata
    print(health_news_in_twitter.metadata)

    # variable information
    print(health_news_in_twitter.variables)
Karami, A. (2017). Health News in Twitter [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5BW2Q.
Creators
Amir Karami
DOI
10.24432/C5BW2Q
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.