
Amazon Commerce reviews set
Donated on 6/10/2011
The dataset is used for authorship identification based on online writeprints, a new research field in pattern recognition.
Dataset Characteristics
Multivariate, Text, Domain-Theory
Subject Area
Other
Associated Tasks
Classification
Feature Type
Real
# Instances
1500
# Features
10000
Dataset Information
Additional Information
The dataset is derived from customer reviews on the Amazon Commerce website and is intended for authorship identification. Most previous studies ran identification experiments with two to ten authors. In an online context, however, the reviews to be identified usually have many more potential authors, and classification algorithms are normally not adapted to a large number of target classes. To examine the robustness of classification algorithms, we identified 50 of the most active users (each represented by a unique ID and username) who frequently posted reviews in these newsgroups. We collected 30 reviews for each author.
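Because every author contributes the same number of reviews, an evaluation split can hold out a fixed number of reviews per author so that all 50 classes stay balanced. The following is a minimal sketch of such a per-author holdout, not a procedure described by the dataset creators. The function name `stratified_split`, the placeholder author labels, and the choice of 6 held-out reviews per author are assumptions made for illustration.

```python
import random
from collections import defaultdict

N_AUTHORS = 50           # number of authors, from the dataset description
REVIEWS_PER_AUTHOR = 30  # reviews per author, from the dataset description


def stratified_split(labels, test_per_class, seed=0):
    """Return (train_idx, test_idx), holding out test_per_class items per label."""
    # Group row indices by their author label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label][:]
        rng.shuffle(indices)
        test_idx.extend(indices[:test_per_class])
        train_idx.extend(indices[test_per_class:])
    return sorted(train_idx), sorted(test_idx)


# Placeholder labels with the documented shape: 50 authors x 30 reviews.
# These author names are hypothetical, not the dataset's real IDs.
labels = [
    f"author_{a:02d}"
    for a in range(N_AUTHORS)
    for _ in range(REVIEWS_PER_AUTHOR)
]

train_idx, test_idx = stratified_split(labels, test_per_class=6)
print(len(labels), len(train_idx), len(test_idx))  # 1500 1200 300
```

With 50 equally sized classes, a classifier that guesses uniformly at random is correct about 2% of the time, which is a useful baseline when reading accuracy figures on this dataset.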
Has Missing Values?
No
Variable Information
The attributes capture each author's linguistic style, such as the usage of digits and punctuation, word and sentence length, the usage frequency of words, and so on.
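To make the kinds of attributes listed above concrete, here is a minimal sketch of how style features of this sort could be computed from raw review text. It is not the feature extraction used to build this dataset, whose exact definitions are not given here. The function `style_features`, the feature names, the tokenization rules, and the sample text are all assumptions made for illustration.

```python
import re
import string
from collections import Counter


def style_features(text, vocabulary):
    """Compute a few illustrative style features for one piece of text."""
    # Words are runs of letters/apostrophes; sentences end at ., ! or ?.
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    # Guard against division by zero on empty input.
    n_chars = max(len(text), 1)
    n_words = max(len(words), 1)
    counts = Counter(words)

    features = {
        "digit_ratio": sum(c.isdigit() for c in text) / n_chars,
        "punct_ratio": sum(c in string.punctuation for c in text) / n_chars,
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "avg_sentence_len": len(words) / max(len(sentences), 1),
    }
    # Relative usage frequency of each word in a chosen vocabulary.
    for term in vocabulary:
        features[f"freq_{term}"] = counts[term] / n_words
    return features


# Hypothetical review text and vocabulary, for demonstration only.
sample = "I bought 2 books. Both were great!"
feats = style_features(sample, vocabulary=["both", "the"])
for name, value in feats.items():
    print(f"{name}: {value:.4f}")
```

Each review then becomes one numeric vector, which matches the dataset's real-valued feature type. The actual dataset has 10000 such features per review.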
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
amazon_commerce_reviews_set = fetch_ucirepo(id=215)

# data (as pandas dataframes)
X = amazon_commerce_reviews_set.data.features
y = amazon_commerce_reviews_set.data.targets

# metadata
print(amazon_commerce_reviews_set.metadata)

# variable information
print(amazon_commerce_reviews_set.variables)
Liu, Zhi. (2011). Amazon Commerce reviews set. UCI Machine Learning Repository. https://doi.org/10.24432/C55C88.
@misc{misc_amazon_commerce_reviews_set_215,
  author       = {Liu,Zhi},
  title        = {{Amazon Commerce reviews set}},
  year         = {2011},
  howpublished = {UCI Machine Learning Repository},
  note         = {{DOI}: https://doi.org/10.24432/C55C88}
}
Creators
Zhi Liu
DOI
10.24432/C55C88
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.