News Popularity in Multiple Social Media Platforms
Donated on 2/19/2018
Large data set of news items and their respective social feedback on multiple platforms: Facebook, Google+ and LinkedIn.
Dataset Characteristics
Multivariate, Time-Series, Text
Subject Area
Computer Science
Associated Tasks
Regression
Feature Type
Integer, Real
# Instances
93239
# Features
-
Dataset Information
Additional Information
This is a large data set of news items and their respective social feedback on multiple platforms: Facebook, Google+ and LinkedIn. The data was collected over a period of 8 months, between November 2015 and July 2016, and covers about 100,000 news items on four topics: economy, microsoft, obama and palestine. The data set is tailored for evaluative comparisons in predictive analytics tasks, though it also supports tasks in other research areas such as topic detection and tracking, sentiment analysis in short text, first story detection or news recommendation. Further details on how the data set was built are provided in the article mentioned in the 'Relevant Papers' section. An .R file is included as a simple introduction to handling the data set.
Has Missing Values?
No
Introductory Paper
By Nuno Moniz, Luís Torgo. 2018
Published in arXiv.org
Additional Variable Information
Variables of news data
IDLink (numeric): Unique identifier of news items
Title (string): Title of the news item according to the official media sources
Headline (string): Headline of the news item according to the official media sources
Source (string): Original news outlet that published the news item
Topic (string): Query topic used to obtain the items in the official media sources
PublishDate (timestamp): Date and time of the news items' publication
SentimentTitle (numeric): Sentiment score of the text in the news items' title
SentimentHeadline (numeric): Sentiment score of the text in the news items' headline
Facebook (numeric): Final value of the news items' popularity according to the social media source Facebook
GooglePlus (numeric): Final value of the news items' popularity according to the social media source Google+
LinkedIn (numeric): Final value of the news items' popularity according to the social media source LinkedIn

Variables of social feedback data
IDLink (numeric): Unique identifier of news items
TS1 (numeric): Level of popularity in time slice 1 (0-20 minutes upon publication)
TS2 (numeric): Level of popularity in time slice 2 (20-40 minutes upon publication)
TS... (numeric): Level of popularity in time slice ...
TS144 (numeric): Final level of popularity after 2 days upon publication
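The time-slice description above implies that each of the 144 slices covers 20 minutes, so TS144 closes at 2,880 minutes, which matches the stated 2 days. The following is a minimal sketch, using only the Python standard library, of how a slice index maps to its minute window and how the TS columns of one feedback row could be read in order. The helper names `slice_window` and `popularity_series` are hypothetical, the sample row is synthetic, and the sketch assumes the CSV header uses the column names exactly as listed (IDLink, TS1 through TS144).

```python
import csv
import io

SLICE_MINUTES = 20   # each time slice spans 20 minutes, per the description
NUM_SLICES = 144     # TS1 .. TS144, i.e. 2 days after publication


def slice_window(k: int) -> tuple[int, int]:
    """Return (start, end) in minutes after publication for time slice k."""
    if not 1 <= k <= NUM_SLICES:
        raise ValueError(f"time slice must be in 1..{NUM_SLICES}, got {k}")
    return ((k - 1) * SLICE_MINUTES, k * SLICE_MINUTES)


def popularity_series(row: dict[str, str]) -> list[float]:
    """Extract the TS1..TS144 values of one feedback row, in slice order."""
    return [float(row[f"TS{k}"]) for k in range(1, NUM_SLICES + 1)]


# Tiny synthetic row standing in for a line of a feedback file (not real data).
header = ["IDLink"] + [f"TS{k}" for k in range(1, NUM_SLICES + 1)]
values = ["1"] + [str(k) for k in range(1, NUM_SLICES + 1)]
sample = io.StringIO(",".join(header) + "\n" + ",".join(values) + "\n")

row = next(csv.DictReader(sample))
series = popularity_series(row)
print(slice_window(1), slice_window(NUM_SLICES), series[-1])
```

With a real feedback file, the same `csv.DictReader` call could be applied to the opened file instead of the in-memory sample, provided the header matches the assumption above.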
Dataset Files
File | Size |
---|---|
Data/News_Final.csv | 28.9 MB |
Data/Facebook_Obama.csv | 11.3 MB |
Data/LinkedIn_Economy.csv | 10.4 MB |
Data/Facebook_Economy.csv | 10.3 MB |
Data/GooglePlus_Economy.csv | 9.9 MB |
Showing 5 of 17 files.
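Because IDLink appears in both the news file and the social feedback files, the two can be combined on that key. The sketch below shows one way to do this with the Python standard library; the function names `index_by_idlink` and `join_news_with_feedback` are hypothetical, the CSV content is synthetic, and the code assumes that identifiers are written identically in both files.

```python
import csv
import io


def index_by_idlink(rows):
    """Map IDLink -> row dict; IDLink is the shared key in both file types."""
    return {row["IDLink"]: row for row in rows}


def join_news_with_feedback(news_rows, feedback_rows):
    """Inner-join news items with their feedback rows on IDLink."""
    feedback = index_by_idlink(feedback_rows)
    joined = []
    for item in news_rows:
        match = feedback.get(item["IDLink"])
        if match is not None:
            # Feedback columns are added next to the news columns.
            joined.append({**item, **match})
    return joined


# Synthetic stand-ins for Data/News_Final.csv and Data/Facebook_Economy.csv;
# with the real files, pass open(path, newline="") to csv.DictReader instead.
news_csv = io.StringIO("IDLink,Topic,Facebook\n1,economy,12\n2,obama,7\n")
feedback_csv = io.StringIO("IDLink,TS1,TS2\n1,0,3\n")

joined = join_news_with_feedback(
    list(csv.DictReader(news_csv)), list(csv.DictReader(feedback_csv))
)
print(joined)
```

An inner join is used here so that only news items with a feedback row are kept; whether every news item has a matching feedback row in the real files is not stated on this page.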
pip install ucimlrepo
    from ucimlrepo import fetch_ucirepo

    # fetch dataset
    news_popularity_in_multiple_social_media_platforms = fetch_ucirepo(id=432)

    # data (as pandas dataframes)
    X = news_popularity_in_multiple_social_media_platforms.data.features
    y = news_popularity_in_multiple_social_media_platforms.data.targets

    # metadata
    print(news_popularity_in_multiple_social_media_platforms.metadata)

    # variable information
    print(news_popularity_in_multiple_social_media_platforms.variables)
Torgo, L. & Moniz, N. (2018). News Popularity in Multiple Social Media Platforms [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5H029.
Creators
Luís Torgo
Nuno Moniz
DOI
10.24432/C5H029
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.