Amazon Commerce Reviews
Donated on 6/10/2011
This dataset is used for authorship identification of online texts (Writeprint), a relatively new research area in pattern recognition.
Dataset Characteristics
Multivariate, Text, Domain-Theory
Subject Area
Other
Associated Tasks
Classification
Feature Type
Real
# Instances
1500
# Features
10000
Dataset Information
Additional Information
The dataset is derived from customer reviews on the Amazon Commerce website and is intended for authorship identification. Most previous studies ran identification experiments with two to ten authors. In an online context, however, the reviews to be identified usually have many more potential authors, and classification algorithms are normally not adapted to a large number of target classes. To examine the robustness of classification algorithms, we identified 50 of the most active users (each represented by a unique ID and username) who frequently posted reviews in these newsgroups. We collected 30 reviews per author, giving 1500 reviews in total.
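With 50 authors and only 30 reviews each, a random split can easily leave some authors under-represented in the test set. The sketch below shows one way to hold out a fixed number of reviews per author. It is illustrative only: the `author_XX` labels are placeholders rather than the dataset's real target values, and `stratified_split` and `test_per_class=6` are hypothetical choices, not part of the original study.

```python
import random
from collections import defaultdict

N_AUTHORS = 50            # from the dataset description
REVIEWS_PER_AUTHOR = 30   # from the dataset description

# Placeholder labels (assumption): the real labels come from the target column.
labels = [
    f"author_{a:02d}"
    for a in range(N_AUTHORS)
    for _ in range(REVIEWS_PER_AUTHOR)
]


def stratified_split(labels, test_per_class, seed=0):
    """Hold out `test_per_class` randomly chosen items of every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        test.extend(idxs[:test_per_class])
        train.extend(idxs[test_per_class:])
    return train, test


train_idx, test_idx = stratified_split(labels, test_per_class=6)

# With balanced classes, uniform random guessing is right 1 time in 50.
chance_accuracy = 1 / N_AUTHORS
```

Because the classes are balanced, `chance_accuracy` is a natural baseline when judging a 50-class classifier on this data.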
Has Missing Values?
No
Variable Information
The attributes capture each author's linguistic style, such as the use of digits and punctuation, word and sentence length, and word usage frequency.
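To make these feature families concrete, here is a minimal sketch that computes a few style features of this kind from raw text. It is not the extraction procedure used to build the dataset, which the page does not document. The function `stylometric_features`, the feature names, and the two-word vocabulary are all hypothetical.

```python
import re
import string
from collections import Counter


def stylometric_features(text, vocabulary):
    """Return a dict of simple writing-style features for one review."""
    # Words are runs of letters/apostrophes; digits are counted separately.
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Naive sentence split on terminal punctuation (an assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_chars = max(len(text), 1)
    n_words = max(len(words), 1)

    features = {
        "digit_ratio": sum(c.isdigit() for c in text) / n_chars,
        "punct_ratio": sum(c in string.punctuation for c in text) / n_chars,
        "avg_word_len": sum(map(len, words)) / n_words,
        "avg_sentence_len": len(words) / max(len(sentences), 1),
    }

    # Relative usage frequency of selected words.
    counts = Counter(words)
    for w in vocabulary:
        features[f"freq_{w}"] = counts[w] / n_words
    return features


review = "I bought 2 of these. Great value, and the battery lasts!"
feats = stylometric_features(review, vocabulary=["the", "and"])
```

Each review becomes one real-valued feature vector, which matches the "Real" feature type listed above. The actual dataset has 10000 such columns.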
Dataset Files
File | Size |
---|---|
Amazon_initial_50_30_10000.rar | 2.1 MB |
    pip install ucimlrepo

    from ucimlrepo import fetch_ucirepo

    # fetch dataset
    amazon_commerce_reviews = fetch_ucirepo(id=215)

    # data (as pandas dataframes)
    X = amazon_commerce_reviews.data.features
    y = amazon_commerce_reviews.data.targets

    # metadata
    print(amazon_commerce_reviews.metadata)

    # variable information
    print(amazon_commerce_reviews.variables)

Liu, Z. (2011). Amazon Commerce Reviews [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C55C88.
Creators
Zhi Liu
DOI
10.24432/C55C88
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.