Center for Machine Learning and Intelligent Systems

Influenza outbreak event prediction via Twitter data Data Set

Abstract: The goal is to forecast the spatiotemporal patterns of influenza outbreaks across locations and dates by identifying influenza-related tweets.

Attribute Characteristics: Integer, Real

Source:

Liang Zhao, liang.zhao '@', Emory University

Data Set Information:

The data were collected in the United States and cover different states over different weeks. For each state and week, the task is to predict whether an influenza outbreak occurs in the following week. Influenza activity is graded on four levels, from minimal to high, according to the CDC Flu Activity Map. An outbreak is recorded when the activity level is high.
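The labeling rule described above can be sketched as follows. This is only an illustration under stated assumptions: the integer encoding of the four activity levels, the constant `HIGH`, and the function name `next_week_outbreak_labels` are hypothetical and do not come from the data set.

```python
from typing import List

# Hypothetical integer encoding of the four activity levels (0 = minimal,
# 3 = high). The data set description does not specify an encoding.
HIGH = 3


def next_week_outbreak_labels(levels: List[int], high: int = HIGH) -> List[int]:
    """For each week t, return 1 if the activity level in week t + 1 is high.

    The final week has no following week, so the result is one element
    shorter than the input.
    """
    return [1 if following == high else 0 for following in levels[1:]]


# Made-up weekly activity levels for a single state.
weekly_levels = [0, 1, 3, 3, 2]
labels = next_week_outbreak_labels(weekly_levels)
print(labels)  # [0, 1, 1, 0]
```

Each label is paired with the week before the one it describes, which is why five weeks of levels yield four labels.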

The input of the prediction task is the set of keyword counts over all tweets posted in a state during a given week. The output is the occurrence of an influenza outbreak in that state in the next week: zero if no outbreak occurs, and one otherwise. The variables are briefly described below:

'flu_locations': a list of states.
'flu_keywords': keyword list.
'flu_X_*': input data for all the locations and all the weeks.
'flu_Y_*': output data for all the locations and all the weeks.
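One way to organize the variables listed above is to group them by name prefix. The sketch below assumes the variables have already been loaded into a name-to-value mapping (for example, the kind of dictionary a MATLAB-file loader returns; the source does not state the file format). The helper `split_flu_variables`, the suffix `a`, and all example values are hypothetical.

```python
from typing import Any, Dict, Mapping


def split_flu_variables(variables: Mapping[str, Any]) -> Dict[str, Any]:
    """Group loaded variables by the name patterns 'flu_X_*' and 'flu_Y_*'."""
    x_prefix, y_prefix = "flu_X_", "flu_Y_"
    return {
        "locations": variables.get("flu_locations"),
        "keywords": variables.get("flu_keywords"),
        # Key each input/output array by whatever follows the prefix.
        "X": {name[len(x_prefix):]: value
              for name, value in variables.items() if name.startswith(x_prefix)},
        "Y": {name[len(y_prefix):]: value
              for name, value in variables.items() if name.startswith(y_prefix)},
    }


# Made-up stand-in for the loaded data; the real suffixes and array
# layouts are not given in the description.
loaded = {
    "flu_locations": ["state_1", "state_2"],
    "flu_keywords": ["kw_1", "kw_2"],
    "flu_X_a": [[3, 0], [1, 2]],
    "flu_Y_a": [0, 1],
}
groups = split_flu_variables(loaded)
print(sorted(groups["X"]), sorted(groups["Y"]))
```

Keying the inputs and outputs by the same suffix makes it straightforward to pair each 'flu_X_*' array with its matching 'flu_Y_*' array.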

Attribute Information:

The attributes are the counts of the 525 keywords listed in the variable 'flu_keywords' in the data.
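To illustrate what a keyword-count feature vector looks like, here is a minimal sketch. It assumes single-word keywords and case-insensitive token matching; the source does not describe how the counts were actually computed, and the function `keyword_counts`, the tweets, and the keywords below are made up for the example.

```python
import re
from collections import Counter
from typing import List


def keyword_counts(tweets: List[str], keywords: List[str]) -> List[int]:
    """Count occurrences of each keyword across all tweets.

    Returns one count per keyword, in the same order as `keywords`.
    """
    wanted = set(keywords)
    counts: Counter = Counter()
    for tweet in tweets:
        # Lowercase and split into simple word tokens (an assumed tokenization).
        for token in re.findall(r"[a-z0-9']+", tweet.lower()):
            if token in wanted:
                counts[token] += 1
    return [counts[keyword] for keyword in keywords]


# Made-up tweets for one state in one week.
tweets = ["I have the flu and a fever", "Flu shot today", "nice weather"]
features = keyword_counts(tweets, ["flu", "fever", "cough"])
print(features)  # [2, 1, 0]
```

With the real keyword list, the same procedure would yield a vector of 525 counts per state and week.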

Relevant Papers:

Liang Zhao, Jiangzhuo Chen, Feng Chen, Wei Wang, Chang-Tien Lu, and Naren Ramakrishnan. 'SimNest: Social Media Nested Epidemic Simulation via Online Semi-supervised Deep Learning.' In Proceedings of the IEEE International Conference on Data Mining (ICDM 2015), regular paper (acceptance rate: 8.4%), Atlantic City, NJ, pp. 639-648, Nov 2015.

Liang Zhao. 'Event Prediction in the Big Data Era: A Systematic Survey.' arXiv preprint [Web Link].

Citation Request:

Please cite our paper when using the dataset.
