microblogPCU Data Set

Abstract: The microblogPCU data was crawled from the Sina Weibo microblogging service [Web Link]. It can be used to study machine learning methods and to carry out social network research.

Data Set Characteristics: Multivariate, Univariate, Sequential, Text
Number of Instances:
Attribute Characteristics: Integer, Real
Number of Attributes:
Date Donated:
Associated Tasks: Classification, Causal-Discovery
Missing Values?
Number of Web Hits:

Jun Liu (liukeen '@'), Hao Chen (lechenhao '@'), Mengting Zhan, Jianhong Mi, Yanzhang Lv
MOEKLINNS Lab, Department of Computer Science, Xi'an Jiaotong University, China

Data Set Information:

We use this dataset to study spammers on microblogs. Our demo system is available at
[Web Link]

Append :8080 to the domain name as the port. The repository web page fails to parse the link when the port is included in the source (under inspection).

Attribute Information:

weibo_user.csv has the following attributes:
-user_id: account ID on Sina Weibo;
-user_name: account nickname;
-gender: gender given at account registration (male, female, or other);
-class: account level assigned by Sina Weibo;
-message: registration location or other personal information of the account;
-post_num: number of posts made by this account to date;
-follower_num: number of followers of this account;
-followee_num: number of accounts that this account follows (followees);
-follow ratio: followee_num/follower_num;
-is_spammer: manually annotated label; 1 means spammer and -1 means non-spammer.
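
The follow ratio and the label encoding above can be illustrated with a small sketch. It assumes the CSV header uses the attribute names listed here; the sample rows, the helper names follow_ratio and load_users, and the choice to return None when follower_num is 0 are illustrative assumptions, not part of the dataset.

```python
import csv
import io

# Made-up sample rows for illustration only; the real weibo_user.csv
# may differ in column order, encoding, and values.
SAMPLE_USERS = """user_id,user_name,gender,class,message,post_num,follower_num,followee_num,is_spammer
1001,alice,female,3,Beijing,120,400,100,-1
1002,bot42,other,1,,5000,0,1999,1
"""


def follow_ratio(followee_num, follower_num):
    """Return followee_num / follower_num, or None if there are no followers."""
    if follower_num == 0:
        return None  # assumption: treat an undefined ratio as missing
    return followee_num / follower_num


def load_users(text):
    """Parse user rows and recompute the follow ratio from the raw counts."""
    users = []
    for row in csv.DictReader(io.StringIO(text)):
        followers = int(row["follower_num"])
        followees = int(row["followee_num"])
        users.append({
            "user_id": row["user_id"],
            "follow_ratio": follow_ratio(followees, followers),
            # The label is 1 for spammer and -1 for non-spammer.
            "is_spammer": int(row["is_spammer"]) == 1,
        })
    return users


users = load_users(SAMPLE_USERS)
```

Recomputing the ratio from the two counts, rather than reading a stored column, sidesteps any question of how accounts with zero followers are represented in the file.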

user_post.csv has the following attributes:
-post_id: post ID assigned by Sina Weibo;
-post_time: time at which the post was published;
-poster_id: user ID of the account that published the post;
-repost_num: number of times the post was reposted by others;
-commnet_num: number of comments made by others.
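
As a rough sketch of how these post attributes might be aggregated per account, the example below counts posts and sums reposts and comments by poster_id. The sample rows, the post_time format, and the function name summarize_posts are made up for illustration; the column name commnet_num is spelled as it appears in the attribute list.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows; the timestamp format in the real file is not
# documented here and may differ.
SAMPLE_POSTS = """post_id,post_time,poster_id,repost_num,commnet_num
p1,2014-10-01 08:00,1001,3,1
p2,2014-10-01 09:30,1002,0,0
p3,2014-10-02 10:15,1001,7,2
"""


def summarize_posts(text):
    """Aggregate post, repost, and comment counts per poster_id."""
    stats = defaultdict(lambda: {"posts": 0, "reposts": 0, "comments": 0})
    for row in csv.DictReader(io.StringIO(text)):
        entry = stats[row["poster_id"]]
        entry["posts"] += 1
        entry["reposts"] += int(row["repost_num"])
        entry["comments"] += int(row["commnet_num"])
    return dict(stats)


summary = summarize_posts(SAMPLE_POSTS)
```

The resulting dictionary is keyed by poster_id, which matches user_id in weibo_user.csv, so the per-account totals can be joined to the spammer labels.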

followe-followee.csv has the following attributes:
-follower: nickname of the follower;
-follower_id: user ID of the follower;
-followee: nickname of the followee;
-followee_id: user ID of the followee.
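
Each row of this file is a directed edge from a follower to a followee, so the file can be read as a graph. The sketch below builds two adjacency maps from such rows; the sample edges and the function name build_graph are illustrative assumptions.

```python
import csv
import io
from collections import defaultdict

# Made-up sample edges for illustration only.
SAMPLE_EDGES = """follower,follower_id,followee,followee_id
alice,1001,carol,1003
bot42,1002,carol,1003
bot42,1002,alice,1001
"""


def build_graph(text):
    """Return (following, followers) adjacency maps keyed by user ID."""
    following = defaultdict(set)  # user ID -> IDs that this user follows
    followers = defaultdict(set)  # user ID -> IDs that follow this user
    for row in csv.DictReader(io.StringIO(text)):
        following[row["follower_id"]].add(row["followee_id"])
        followers[row["followee_id"]].add(row["follower_id"])
    return dict(following), dict(followers)


following, followers = build_graph(SAMPLE_EDGES)
```

The sizes of the two sets for a given user ID give its out-degree and in-degree within the crawled sample, which need not equal followee_num and follower_num in weibo_user.csv if the crawl covers only part of the network.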

post.csv is almost the same as user_post.csv, but the posts in it were retrieved using a keyword related to a specific topic;

-content: the post text (mostly in Chinese; set Microsoft Office to an encoding or language setting that supports Chinese so that it is readable)
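
When reading the Chinese post text programmatically rather than in Microsoft Office, the file encoding has to be chosen explicitly. The encoding of the CSV files is not stated here, so the sketch below simply tries a list of candidate encodings in order; the candidates (UTF-8 with optional byte order mark, then GB18030) and the function name decode_post_bytes are assumptions.

```python
def decode_post_bytes(raw, encodings=("utf-8-sig", "gb18030")):
    """Decode raw bytes with the first candidate encoding that succeeds.

    Returns a (text, encoding_name) pair.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# Made-up sample text (the Chinese word for "garbage"), encoded two ways.
sample = "\u5783\u573e"
text_gb, used_gb = decode_post_bytes(sample.encode("gb18030"))
text_u8, used_u8 = decode_post_bytes(sample.encode("utf-8"))
```

Trying UTF-8 first is a heuristic: most GB18030-encoded Chinese text is not valid UTF-8, so the fallback usually triggers when needed, but short inputs can be ambiguous. In practice the whole file would be read in binary mode and decoded once.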

Relevant Papers:


Citation Request:

Thanks to MOEKLINNS Lab [Web Link], especially the Spammer Detection Group, for making its data available.

