Center for Machine Learning and Intelligent Systems

Health News in Twitter Data Set

Abstract: The data was collected in 2015 using the Twitter API. The dataset contains health news from more than 15 major health news agencies, such as BBC, CNN, and NYT.

Data Set Characteristics: Text
Number of Instances: 58000
Area: Computer
Attribute Characteristics: Real
Number of Attributes: 25000
Date Donated: 2018-02-19
Associated Tasks: Clustering
Missing Values? N/A
Number of Web Hits: 19263


Source:

Amir Karami
karami '@' sc.edu
University of South Carolina


Data Set Information:

Each file corresponds to the Twitter account of one news agency; for example, bbchealth.txt contains the tweets of BBC health news. Each line holds one tweet in the form tweet id|date and time|tweet, with '|' as the field separator. The data has been used to evaluate the performance of topic models on short text, but it can also be used for other tasks such as clustering.
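
The line format described above can be read with a few lines of Python. The following is only a minimal sketch, not an official loader: the names Tweet, parse_line, parse_lines, and load_file are hypothetical, the date and time field is kept as a raw string because its exact format is not stated here, and the file encoding is assumed to be UTF-8. Splitting is limited to the first two separators on the assumption that the tweet text itself may contain '|'.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Tweet(NamedTuple):
    """One record: tweet id|date and time|tweet (hypothetical container)."""
    tweet_id: str
    timestamp: str  # kept as raw text; the date format is not documented here
    text: str


def parse_line(line: str) -> Tweet:
    """Split one line on the first two '|' separators only."""
    parts = line.rstrip("\r\n").split("|", 2)
    if len(parts) != 3:
        raise ValueError(f"expected 3 '|'-separated fields, got {len(parts)}")
    return Tweet(*parts)


def parse_lines(lines: Iterable[str]) -> Iterator[Tweet]:
    """Parse an iterable of lines, skipping blank ones."""
    for line in lines:
        if line.strip():
            yield parse_line(line)


def load_file(path: str, encoding: str = "utf-8") -> List[Tweet]:
    """Read one agency file, e.g. bbchealth.txt (encoding is an assumption)."""
    # errors="replace" keeps reading if a file is not valid in the assumed encoding.
    with open(path, encoding=encoding, errors="replace") as handle:
        return list(parse_lines(handle))
```

For example, load_file("bbchealth.txt") would return one Tweet per non-empty line of that file; the text field can then be passed to a topic model or a clustering pipeline.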


Attribute Information:

N/A


Relevant Papers:

Karami, A., Gangopadhyay, A., Zhou, B., & Kharrazi, H. (2017). Fuzzy approach topic discovery in health and medical corpora. International Journal of Fuzzy Systems, 1-12.



Citation Request:

Karami, A., Gangopadhyay, A., Zhou, B., & Kharrazi, H. (2017). Fuzzy approach topic discovery in health and medical corpora. International Journal of Fuzzy Systems, 1-12.

