Center for Machine Learning and Intelligent Systems

microblogPCU Data Set
Download: Data Folder, Data Set Description

Abstract: The MicroblogPCU data was crawled from the Sina Weibo microblogging service [[Web Link]]. It can be used to study machine learning methods as well as for social network research.

Data Set Characteristics: Multivariate, Univariate, Sequential, Text
Number of Instances:
Attribute Characteristics: Integer, Real
Number of Attributes:
Date Donated:
Associated Tasks: Classification, Causal-Discovery
Missing Values?

Source:

Jun Liu (liukeen '@'), Hao Chen (lechenhao '@'), Mengting Zhan, Jianhong Mi, Yanzhang Lv
MOEKLINNS Lab, Department of Computer Science, Xi'an Jiaotong University, China

Data Set Information:

We use this dataset to study spammers on microblogs. Our demo system is available at
[Web Link]

Note: add :8080 after the domain name to specify the port. The repository webpage fails to parse the link when the port is included in the source (under inspection).
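Appending the port is a small URL rewrite. The sketch below uses Python's standard `urllib.parse`; the helper name `with_port` and the demo address are hypothetical placeholders, since the actual link is not reproduced on this page.

```python
from urllib.parse import urlsplit, urlunsplit


def with_port(url: str, port: int = 8080) -> str:
    """Return url with an explicit port added after the host name.

    Assumes a plain domain name (no IPv6 literal, no user:password part).
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # host without any existing port
    return urlunsplit(parts._replace(netloc=f"{host}:{port}"))


# Hypothetical placeholder, not the real demo address
demo_url = with_port("http://demo.example.org/")
```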

Attribute Information:

weibo_user.csv has the following attributes:
-user_id: account ID in Sina Weibo;
-user_name: account nickname;
-gender: gender given at account registration (male, female, or other);
-class: account level assigned by Sina Weibo;
-message: registration location or other personal information of the account;
-post_num: the number of posts by this account to date;
-follower_num: the number of followers of this account;
-followee_num: the number of accounts this account follows (followees);
-follow ratio: followee_num/follower_num;
-is_spammer: manually annotated label; 1 means spammer and -1 means non-spammer;
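A minimal sketch of turning weibo_user.csv rows into classifier features. It assumes the file has a header row whose column names match the attributes above and that it has already been opened as text with the correct encoding; neither is confirmed here. It recomputes the follow ratio as followee_num/follower_num, using 0.0 for accounts with no followers (a hypothetical convention the description does not specify), and remaps the is_spammer label from {1, -1} to {1, 0}. The function `load_user_features` and the toy rows are made up for illustration.

```python
import csv
import io
from typing import Dict, List, TextIO


def load_user_features(handle: TextIO) -> List[Dict[str, object]]:
    """Build numeric features per account from weibo_user.csv-style rows.

    Assumes a header row named after the documented attributes
    (hypothetical: the real file layout has not been verified).
    """
    rows = []
    for record in csv.DictReader(handle):
        followers = int(record["follower_num"])
        followees = int(record["followee_num"])
        rows.append({
            "user_id": record["user_id"],
            "post_num": int(record["post_num"]),
            "follower_num": followers,
            "followee_num": followees,
            # follow ratio = followee_num / follower_num; zero followers -> 0.0
            # (an assumed convention, not taken from the dataset description)
            "follow_ratio": followees / followers if followers else 0.0,
            # is_spammer: 1 = spammer, -1 = non-spammer; remapped to 1 / 0
            "label": 1 if int(record["is_spammer"]) == 1 else 0,
        })
    return rows


# Toy rows for illustration only (not taken from the dataset)
sample = io.StringIO(
    "user_id,post_num,follower_num,followee_num,is_spammer\n"
    "1001,50,10,200,1\n"
    "1002,300,400,100,-1\n"
    "1003,5,0,50,1\n"
)
features = load_user_features(sample)
```

Remapping the label to 1/0 is only a convenience for libraries that expect non-negative class indices; models that accept {1, -1} directly can use the original values.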
user_post.csv has the following attributes:
-post_id: post ID assigned by Sina Weibo;
-post_time: the time at which the post was published;
-poster_id: the user ID of the account that published the post;
-repost_num: the number of times the post was reposted (retweeted) by others;
-commnet_num: the number of comments on the post by others;
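Since poster_id is the user ID of the author, user_post.csv can be aggregated per account and joined to weibo_user.csv on user_id. The sketch below assumes a header row with the column names exactly as listed above, including the spelling `commnet_num`; adjust the key if the actual header differs. The helper `summarize_posts` and the toy rows are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO


def summarize_posts(handle: TextIO) -> Dict[str, Dict[str, float]]:
    """Group posts by poster_id and compute simple engagement statistics."""
    totals = defaultdict(lambda: {"posts": 0, "reposts": 0, "comments": 0})
    for record in csv.DictReader(handle):
        stats = totals[record["poster_id"]]
        stats["posts"] += 1
        stats["reposts"] += int(record["repost_num"])
        # column name spelled as in the attribute list (commnet_num)
        stats["comments"] += int(record["commnet_num"])
    summary = {}
    for poster, stats in totals.items():
        summary[poster] = {
            "posts": stats["posts"],
            "mean_reposts": stats["reposts"] / stats["posts"],
            "mean_comments": stats["comments"] / stats["posts"],
        }
    return summary


# Toy rows for illustration only; post_time values are placeholders
sample = io.StringIO(
    "post_id,post_time,poster_id,repost_num,commnet_num\n"
    "p1,t1,1001,4,2\n"
    "p2,t2,1001,0,0\n"
    "p3,t3,1002,10,5\n"
)
summary = summarize_posts(sample)
```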
followe-followee.csv has the following attributes:
-follower: the nickname of the follower;
-follower_id: the user ID of the follower;
-followee: the nickname of the followee;
-followee_id: the user ID of the followee;
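Each row of followe-followee.csv can be read as a directed edge from follower_id to followee_id. A minimal sketch, assuming a header row with the documented names, counts out-degree (accounts followed) and in-degree (followers) over the edges in the file. These counts cover only the crawled edges, so they may not match the follower_num and followee_num totals in weibo_user.csv. The helper `degree_counts` and the toy rows are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import TextIO, Tuple


def degree_counts(handle: TextIO) -> Tuple[Counter, Counter]:
    """Return (out_degree, in_degree) keyed by user ID.

    out_degree[u]: how many accounts u follows within this edge list
    in_degree[u]:  how many followers u has within this edge list
    """
    out_degree: Counter = Counter()
    in_degree: Counter = Counter()
    for record in csv.DictReader(handle):
        # directed edge: follower_id -> followee_id
        out_degree[record["follower_id"]] += 1
        in_degree[record["followee_id"]] += 1
    return out_degree, in_degree


# Toy edge list for illustration only
sample = io.StringIO(
    "follower,follower_id,followee,followee_id\n"
    "alice,1,bob,2\n"
    "alice,1,carol,3\n"
    "bob,2,carol,3\n"
)
out_deg, in_deg = degree_counts(sample)
```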
post.csv has almost the same attributes as user_post.csv; its posts were retrieved by searching for a keyword related to a particular topic:
-content: the post text (mostly in Chinese; you may need to adjust the text encoding settings in Microsoft Office for it to display correctly).
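The description does not state which encoding the Chinese text uses. As an alternative to configuring Microsoft Office, the sketch below tries a short list of candidate encodings when decoding the raw bytes. The candidates (UTF-8 with an optional BOM, then GB18030, a superset of GBK/GB2312) are an assumption, not something the source specifies, and `decode_text` is a hypothetical helper.

```python
from typing import Sequence, Tuple

# Assumed candidates, not confirmed by the dataset description:
# UTF-8 (with or without BOM), then GB18030 (superset of GBK/GB2312).
CANDIDATE_ENCODINGS = ("utf-8-sig", "gb18030")


def decode_text(raw: bytes,
                encodings: Sequence[str] = CANDIDATE_ENCODINGS) -> Tuple[str, str]:
    """Return (decoded_text, encoding_used), trying each encoding in order."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # this encoding does not fit; try the next one
    raise ValueError("none of the candidate encodings could decode the data")


text, used = decode_text("微博".encode("gb18030"))
```

For a file on disk, the bytes can come from `open(path, "rb").read()`, and the decoded string can then be passed to `csv.reader` through `io.StringIO`.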

Relevant Papers:


Citation Request:

Thanks to MOEKLINNS Lab [[Web Link]], especially its Spammer Detection Group, for making its data publicly available.
