News Popularity in Multiple Social Media Platforms Data Set

Abstract: Large data set of news items and their respective social feedback on multiple platforms: Facebook, Google+ and LinkedIn.

Data Set Characteristics: Multivariate, Time-Series, Text
Number of Instances: 93239
Area: Computer
Attribute Characteristics: Integer, Real
Number of Attributes: 11
Date Donated: 2018-02-20
Associated Tasks: Regression
Missing Values? N/A
Number of Web Hits: 27096


Source:

Nuno Moniz
LIAAD - INESC Tec; Sciences College, University of Porto
Email: nmmoniz '@' inesctec.pt

Luís Torgo
LIAAD - INESC Tec; Sciences College, University of Porto
Email: ltorgo '@' dcc.fc.up.pt


Data Set Information:

This is a large data set of news items and their respective social feedback on multiple platforms: Facebook, Google+ and LinkedIn.
The collected data relates to a period of 8 months, between November 2015 and July 2016, accounting for about 100,000 news items on four different topics: economy, microsoft, obama and palestine.
This data set is tailored for evaluative comparisons in predictive analytics tasks, but it also supports tasks in other research areas, such as topic detection and tracking, sentiment analysis in short text, first story detection, and news recommendation.

Further details on the process of building the data set are provided in the article mentioned in the 'Relevant Papers' section.

An .R file is included as a simple introduction to handling the data set.


Attribute Information:

##########################
# VARIABLES OF NEWS DATA #
##########################

IDLink (numeric): Unique identifier of news items
Title (string): Title of the news item according to the official media sources
Headline (string): Headline of the news item according to the official media sources
Source (string): Original news outlet that published the news item
Topic (string): Query topic used to obtain the items in the official media sources
PublishDate (timestamp): Date and time of the news items' publication
SentimentTitle (numeric): Sentiment score of the text in the news items' title
SentimentHeadline (numeric): Sentiment score of the text in the news items' headline
Facebook (numeric): Final value of the news items' popularity according to the social media source Facebook
GooglePlus (numeric): Final value of the news items' popularity according to the social media source Google+
LinkedIn (numeric): Final value of the news items' popularity according to the social media source LinkedIn
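
As a minimal sketch of how the news data variables listed above might be read in Python, the following parses CSV text with those eleven columns using only the standard library. The two sample rows are invented placeholders, not records from the data set. The comma delimiter, the timestamp format `%Y-%m-%d %H:%M:%S`, and the `NewsItem` and `parse_news` names are all assumptions made for illustration; adjust them to match the actual files.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime

# Column names as listed in the attribute information above.
NEWS_COLUMNS = [
    "IDLink", "Title", "Headline", "Source", "Topic", "PublishDate",
    "SentimentTitle", "SentimentHeadline", "Facebook", "GooglePlus", "LinkedIn",
]

# Assumed timestamp layout; not confirmed by the data set description.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


@dataclass
class NewsItem:  # hypothetical record type, one field per listed variable
    id_link: int
    title: str
    headline: str
    source: str
    topic: str
    publish_date: datetime
    sentiment_title: float
    sentiment_headline: float
    facebook: float
    google_plus: float
    linkedin: float


def parse_news(text):
    """Parse CSV text with the news data columns into NewsItem records."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(NEWS_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    items = []
    for row in reader:
        items.append(NewsItem(
            # IDLink is described as numeric, so accept "1" as well as "1.0".
            id_link=int(float(row["IDLink"])),
            title=row["Title"],
            headline=row["Headline"],
            source=row["Source"],
            topic=row["Topic"],
            publish_date=datetime.strptime(row["PublishDate"], TIMESTAMP_FORMAT),
            sentiment_title=float(row["SentimentTitle"]),
            sentiment_headline=float(row["SentimentHeadline"]),
            facebook=float(row["Facebook"]),
            google_plus=float(row["GooglePlus"]),
            linkedin=float(row["LinkedIn"]),
        ))
    return items


# Invented placeholder rows, for illustration only.
SAMPLE = (
    "IDLink,Title,Headline,Source,Topic,PublishDate,SentimentTitle,"
    "SentimentHeadline,Facebook,GooglePlus,LinkedIn\n"
    "1,Placeholder title A,Placeholder headline A,Outlet A,economy,"
    "2016-01-01 08:00:00,0.1,-0.2,10,2,3\n"
    "2,Placeholder title B,Placeholder headline B,Outlet B,obama,"
    "2016-01-02 09:30:00,0.0,0.05,0,1,4\n"
)

if __name__ == "__main__":
    for item in parse_news(SAMPLE):
        print(item.id_link, item.topic, item.publish_date, item.facebook)
```

Parsing into typed records up front makes it easier to spot rows whose timestamps or popularity values do not match the documented types before any modelling is attempted.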

#####################################
# VARIABLES OF SOCIAL FEEDBACK DATA #
#####################################

IDLink (numeric): Unique identifier of news items
TS1 (numeric): Level of popularity in time slice 1 (0-20 minutes after publication)
TS2 (numeric): Level of popularity in time slice 2 (20-40 minutes after publication)
TS... (numeric): Level of popularity in time slice ...
TS144 (numeric): Final level of popularity, 2 days after publication
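
The time-slice layout above implies simple arithmetic: 144 slices of 20 minutes each cover 2,880 minutes, which is exactly 2 days. The sketch below maps a slice index to its minute window and back. The function names are hypothetical, and treating each window as half-open (start included, end excluded) is an assumption, since the description does not say which slice a boundary minute belongs to.

```python
SLICE_MINUTES = 20   # width of each time slice, from the TS1/TS2 descriptions
NUM_SLICES = 144     # TS1 .. TS144


def slice_window(k):
    """Return (start, end) in minutes after publication for time slice k."""
    if not 1 <= k <= NUM_SLICES:
        raise ValueError(f"time slice must be in 1..{NUM_SLICES}, got {k}")
    return ((k - 1) * SLICE_MINUTES, k * SLICE_MINUTES)


def slice_column(minutes_after_publication):
    """Return the TS column name whose window contains the given minute.

    Assumes half-open windows [start, end), so minute 20 falls in TS2.
    """
    total = NUM_SLICES * SLICE_MINUTES
    if not 0 <= minutes_after_publication < total:
        raise ValueError(f"minute must be in 0..{total - 1}")
    return f"TS{minutes_after_publication // SLICE_MINUTES + 1}"


if __name__ == "__main__":
    print(slice_window(1))      # first slice
    print(slice_window(144))    # last slice
    print(slice_column(25))     # column covering minute 25
```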


Relevant Papers:

Nuno Moniz and Luís Torgo (2018), “Multi-Source Social Feedback of Online News Feeds”, CoRR, [Web Link]



Citation Request:

When using this data set, please cite the following article.

Nuno Moniz and Luís Torgo (2018), “Multi-Source Social Feedback of Online News Feeds”, CoRR, [Web Link]

@Article{Moniz2018,
title = {Multi-Source Social Feedback of Online News Feeds},
author = {Nuno Moniz and Lu{\'i}s Torgo},
year = {2018},
ee = {[Web Link]},
volume = {[Web Link]},
journal = {CoRR},
}

