MicroblogPCU data was crawled from the Sina Weibo microblogging service (http://weibo.com/). It can be used to study machine learning methods and to support social network research.

Dataset Characteristics

Multivariate, Univariate, Sequential, Text

Subject Area

Computer Science

Associated Tasks

Classification, Causal-Discovery

Feature Type

Integer, Real

# Instances


# Features


Dataset Information

Additional Information

We use this dataset to explore spammers on the microblog. Our demo system is available at http://sd.skyclass.net/Spammer/dia.jsp. Please add :8080 after the domain name as the port; the repository webpage fails to parse the link when the port is included in the source (under inspection).

Has Missing Values?


Variable Information

weibo_user.csv has the following attributes:
-user_id: account ID in Sina Weibo;
-user_name: account nickname;
-gender: gender given at account registration (male, female, or other);
-class: account level assigned by Sina Weibo;
-message: account registration location or other personal information;
-post_num: the number of posts made by this account so far;
-follower_num: the number of followers of this account;
-followee_num: the number of accounts this account follows;
-follow ratio: followee_num/follower_num;
-is_spammer: manually annotated label, where 1 means spammer and -1 means non-spammer.

user_post.csv has the following attributes:
-post_id: post ID assigned by Sina Weibo;
-post_time: the time at which the post was published;
-poster_id: the user ID of the account that published the post;
-repost_num: the number of reposts by other users;
-commnet_num: the number of comments by other users.

follower_followee.csv has the following attributes:
-follower: the nickname of the follower;
-follower_id: the user ID of the follower;
-followee: the nickname of the followee;
-followee_id: the user ID of the followee.

post.csv is almost the same as user_post.csv; its posts were retrieved by a keyword related to a topic. It has one additional attribute:
-content: the post text (mostly in Chinese; configure Microsoft Office accordingly to make it readable).
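As an illustration of how the attributes above fit together, the following minimal Python sketch recomputes the follow ratio (followee_num/follower_num), converts the 1/-1 is_spammer label into a boolean, and counts posts per user by matching poster_id against user_id. The sample rows, the post_time format, the helper names (follow_ratio, load_users, posts_per_user), and the choice to return None for accounts with zero followers are all assumptions made for this example; they are not taken from the dataset or its authors.

```python
import csv
import io
from collections import Counter

# Made-up sample rows that only mimic the documented column names;
# they are NOT real records from the dataset.
USERS_CSV = """user_id,user_name,follower_num,followee_num,is_spammer
1001,alice,200,50,-1
1002,bob,0,900,1
"""

POSTS_CSV = """post_id,post_time,poster_id,repost_num,commnet_num
p1,2014-10-01 08:00,1001,3,1
p2,2014-10-01 09:30,1002,0,0
p3,2014-10-02 10:15,1002,0,0
"""


def follow_ratio(followee_num, follower_num):
    """Return followee_num / follower_num, or None if there are no followers.

    Returning None for zero followers is an assumption of this sketch.
    """
    if follower_num == 0:
        return None
    return followee_num / follower_num


def load_users(handle):
    """Read weibo_user.csv-style rows into a dict keyed by user_id."""
    users = {}
    for row in csv.DictReader(handle):
        follower_num = int(row["follower_num"])
        followee_num = int(row["followee_num"])
        users[row["user_id"]] = {
            "user_name": row["user_name"],
            "follow_ratio": follow_ratio(followee_num, follower_num),
            # The label is documented as 1 = spammer, -1 = non-spammer.
            "is_spammer": int(row["is_spammer"]) == 1,
        }
    return users


def posts_per_user(handle):
    """Count user_post.csv-style rows per poster_id."""
    return Counter(row["poster_id"] for row in csv.DictReader(handle))


users = load_users(io.StringIO(USERS_CSV))
post_counts = posts_per_user(io.StringIO(POSTS_CSV))

for user_id, info in users.items():
    print(user_id, info["follow_ratio"], info["is_spammer"], post_counts[user_id])
```

To run this against the real files, the io.StringIO objects would be replaced with file handles opened with newline="" and a suitable encoding argument. The source does not state the file encoding, so it would need to be determined before reading the Chinese text.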

Dataset Files

microblogPCU/follower_followee.csv (11.3 MB)
microblogPCU/user_post.csv (5.8 MB)
microblogPCU/weibo_user.csv (153.4 KB)
microblogPCU/post.csv (9.4 KB)


Hao Chen

Mengting Zhan

Jianhong Mi

Yanzhang Lv

Jun Liu


