Center for Machine Learning and Intelligent Systems

Amazon Commerce reviews set Data Set

Abstract: The dataset is used for authorship identification in online Writeprint, a new research field within pattern recognition.

Data Set Characteristics: Multivariate, Text, Domain-Theory
Number of Instances:
Attribute Characteristics:
Number of Attributes:
Date Donated:
Associated Tasks:
Missing Values?
Number of Web Hits:

Dataset creator and donor: Zhi Liu, e-mail: liuzhi8673 '@', institution: National Engineering Research Center for E-Learning, Wuhan, Hubei, China

Data Set Information:

The dataset is derived from customer reviews on the Amazon Commerce website and is intended for authorship identification. Most previous studies ran identification experiments with two to ten authors. In an online context, however, a review to be identified usually has many more potential authors, and classification algorithms are not normally adapted to a large number of target classes. To examine the robustness of classification algorithms, we identified 50 of the most active users (each represented by a unique ID and username) who frequently posted reviews in these newsgroups. We collected 30 reviews for each author.
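
To make the scale of the task concrete, the short sketch below works out the total number of reviews and the accuracy expected from uniform random guessing for 2, 10, and 50 authors. It assumes balanced classes (which the 30-reviews-per-author design implies) and a guesser that picks authors uniformly at random. The name `chance_accuracy` is illustrative and not part of the dataset or its papers.

```python
N_AUTHORS = 50            # number of authors, from the description above
REVIEWS_PER_AUTHOR = 30   # reviews collected per author, from the description above


def chance_accuracy(n_classes):
    """Expected accuracy of uniform random guessing over balanced classes."""
    return 1.0 / n_classes


total_reviews = N_AUTHORS * REVIEWS_PER_AUTHOR
print(f"total reviews: {total_reviews}")

# Compare the small-author settings of earlier studies with the 50-author setting.
for n in (2, 10, N_AUTHORS):
    print(f"{n:>2} authors -> chance accuracy {chance_accuracy(n):.1%}")
```

Moving from ten authors to fifty lowers the random-guess baseline from 10% to 2%, which is one simple way to see why the many-author setting is harder.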

Attribute Information:

The attributes capture each author's linguistic style, such as the usage of digits and punctuation, word and sentence length, and the usage frequency of words.
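
As a rough illustration of the kinds of style features listed above, the sketch below computes a handful of them from raw review text using only the Python standard library. It is not the feature extraction used to build this dataset. The function name `stylometric_features`, the feature names, the tokenization rules, and the `top_k` parameter are all assumptions made for the example.

```python
import re
import string
from collections import Counter


def stylometric_features(text, top_k=3):
    """Compute a few illustrative writing-style features for one review."""
    # Words: runs of letters and apostrophes, lowercased.
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Sentences: split on sentence-ending punctuation, drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_chars = len(text) or 1  # avoid division by zero on empty input
    return {
        "digit_ratio": sum(c.isdigit() for c in text) / n_chars,
        "punct_ratio": sum(c in string.punctuation for c in text) / n_chars,
        "avg_word_len": sum(map(len, words)) / len(words) if words else 0.0,
        "avg_sentence_len": len(words) / len(sentences) if sentences else 0.0,
        "top_words": Counter(words).most_common(top_k),
    }


features = stylometric_features("Great phone. Great price!")
print(features)
```

Each review would yield one such feature dictionary. A real pipeline would likely use a fixed vocabulary so that every review maps to a vector of the same length.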

Relevant Papers:

Sanya Liu, Zhi Liu, Jianwen Sun, Lin Liu, 'Application of Synergetic Neural Network in Online Writeprint Identification', JDCTA: International Journal of Digital Content Technology and its Applications, Vol. 5, No. 3, pp. 126-135, 2011
Jianwen Sun, Zongkai Yang, Pei Wang, Sanya Liu, 'Variable Length Character N-Gram Approach for Online Writeprint Identification', 2010 International Conference on Multimedia Information Networking and Security (MINES), pp. 486-490, 2010

Citation Request:

Please refer to the Machine Learning Repository's citation policy.
