microblogPCU
Donated on 3/16/2015
The microblogPCU data was crawled from the Sina Weibo microblogging service (http://weibo.com/). It can be used to study machine learning methods and to support social network research.
Dataset Characteristics
Multivariate, Univariate, Sequential, Text
Subject Area
Computer Science
Associated Tasks
Classification, Causal-Discovery
Feature Type
Integer, Real
# Instances
221579
# Features
20
Dataset Information
Additional Information
We use this dataset to study spammers on microblogs. A demo system is available at http://sd.skyclass.net/Spammer/dia.jsp; add :8080 after the domain name as the port, because the repository webpage fails to parse the link when the port is included in the source (under inspection).
Has Missing Values?
Yes
Variable Information
weibo_user.csv has the following attributes:
- user_id: account ID in Sina Weibo
- user_name: account nickname
- gender: gender given at account registration (male, female, or other)
- class: account level assigned by Sina Weibo
- message: account registration location or other personal information
- post_num: number of posts by this account to date
- follower_num: number of followers of this account
- followee_num: number of accounts this account follows
- follow ratio: followee_num/follower_num
- is_spammer: manually annotated label; 1 means spammer and -1 means non-spammer

user_post.csv has the following attributes:
- post_id: post ID assigned by Sina Weibo
- post_time: time when the post was published
- poster_id: user ID of the account that published the post
- repost_num: number of reposts by other users
- commnet_num: number of comments by other users

follower_followee.csv has the following attributes:
- follower: nickname of the follower
- follower_id: user ID of the follower
- followee: nickname of the followee
- followee_id: user ID of the followee

post.csv is almost the same as user_post.csv; its posts were retrieved by a keyword related to a topic. It also has:
- content: the post text (mostly in Chinese; adjust the encoding settings in Microsoft Office to make it readable)
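The follow ratio and the -1/1 label described above can be derived or checked with a few lines of pandas. The following is a minimal sketch, not the authors' code: the sample rows are invented, the header names are assumed to match the attribute names listed above, and `load_users`, `follow_ratio`, and `spammer` are hypothetical names introduced for illustration.

```python
import io
import pandas as pd

# Invented sample rows that mimic the weibo_user.csv attributes (assumed header names).
SAMPLE_CSV = """user_id,user_name,gender,class,post_num,follower_num,followee_num,is_spammer
1001,alice,female,3,120,400,100,-1
1002,bob,male,1,15,0,900,1
1003,carol,other,2,,50,200,1
"""

def load_users(source):
    """Read the user table and derive follow_ratio = followee_num / follower_num."""
    users = pd.read_csv(source)
    # Accounts with zero followers would divide by zero, so mask them as missing.
    followers = users["follower_num"].where(users["follower_num"] > 0)
    users["follow_ratio"] = users["followee_num"] / followers
    # Map the -1/1 annotation onto a boolean column for convenience.
    users["spammer"] = users["is_spammer"] == 1
    return users

users = load_users(io.StringIO(SAMPLE_CSV))
print(users[["user_id", "follow_ratio", "spammer"]])
print(users["spammer"].value_counts())
```

Because the dataset is reported to contain missing values, the sketch leaves gaps (such as the empty post_num) as NaN rather than filling them; whether to drop or impute them depends on the analysis.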
Dataset Files
File | Size |
---|---|
microblogPCU/follower_followee.csv | 11.3 MB |
microblogPCU/user_post.csv | 5.8 MB |
microblogPCU/weibo_user.csv | 153.4 KB |
microblogPCU/post.csv | 9.4 KB |
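Since the files contain Chinese text and link to each other through user IDs, a loader usually needs to handle the text encoding and then join the tables. The sketch below is only an illustration under stated assumptions: the actual encoding of the files is not documented here, so it tries UTF-8 and then GB18030 (a guess), and the sample bytes, the helper `read_csv_bytes`, and the reduced column sets are invented for the example.

```python
import io
import pandas as pd

def read_csv_bytes(raw, encodings=("utf-8", "gb18030")):
    """Decode raw CSV bytes with the first encoding that works, then parse them."""
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
        return pd.read_csv(io.StringIO(text))
    raise ValueError(f"none of {encodings} could decode the data")

# Invented stand-ins for weibo_user.csv and user_post.csv (assumed header names).
users_raw = "user_id,user_name,is_spammer\n1,小明,-1\n2,广告号,1\n".encode("gb18030")
posts_raw = (
    "post_id,poster_id,repost_num,commnet_num\n"
    "10,1,5,2\n11,2,0,0\n12,2,1,0\n13,3,7,1\n"
).encode("utf-8")

users = read_csv_bytes(users_raw)
posts = read_csv_bytes(posts_raw)

# Attach the spammer label to each post; posts by unknown users keep a missing label.
labelled = posts.merge(users, left_on="poster_id", right_on="user_id", how="left")
posts_per_label = labelled.groupby("is_spammer", dropna=False)["post_id"].count()
print(posts_per_label)
```

For the real files, the bytes could come from something like `pathlib.Path("microblogPCU/weibo_user.csv").read_bytes()`, assuming the archive is unpacked with the layout shown in the table above.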
    pip install ucimlrepo

    from ucimlrepo import fetch_ucirepo

    # fetch dataset
    microblogpcu = fetch_ucirepo(id=323)

    # data (as pandas dataframes)
    X = microblogpcu.data.features
    y = microblogpcu.data.targets

    # metadata
    print(microblogpcu.metadata)

    # variable information
    print(microblogpcu.variables)
Chen, H., Zhan, M., Mi, J., Lv, Y., & Liu, J. (2015). microblogPCU [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5JG8Q.
Creators
Hao Chen
Mengting Zhan
Jianhong Mi
Yanzhang Lv
Jun Liu
DOI
10.24432/C5JG8Q
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.