Product Classification and Clustering
Donated on 8/6/2023
This dataset was collected from PriceRunner, a popular product comparison platform. It includes 35,311 product offers from 10 categories, provided by 306 different merchants. It is well suited to evaluating classification, clustering, and entity matching algorithms. Although the data is product-related, it can also be applied to other problems involving text or short-text mining.
Dataset Characteristics
Tabular, Text
Subject Area
Business
Associated Tasks
Classification, Clustering, Other
Feature Type
Categorical, Integer
# Instances
35311
# Features
7
Dataset Information
For what purpose was the dataset created?
Evaluating product classification, clustering, and entity matching methods, as well as short-text clustering algorithms.
Who funded the creation of the dataset?
No funding
What do the instances in this dataset represent?
Product offers by various merchants
Are there recommended data splits?
No
Does the dataset contain data that might be considered sensitive in any way?
No
Was there any data preprocessing performed?
Case folding and punctuation removal were applied to the product titles (column 2).
Has Missing Values?
No
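The preprocessing described above (case folding and punctuation removal on product titles) can be approximated as follows. This is a minimal sketch, not the authors' actual pipeline: the function name `normalize_title` is hypothetical, and the choices to delete only ASCII punctuation and to collapse whitespace are assumptions, since the page does not state the exact rules.

```python
import string

# Translation table that deletes every ASCII punctuation character.
# Assumption: the page does not say which characters count as punctuation.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize_title(title: str) -> str:
    """Case-fold a title, delete ASCII punctuation, and collapse whitespace."""
    folded = title.casefold()
    stripped = folded.translate(_PUNCT_TABLE)
    # split() with no arguments also trims leading and trailing whitespace.
    return " ".join(stripped.split())


# Made-up title, for illustration only.
print(normalize_title("Apple iPhone 8 Plus, 64GB (Silver)"))
# apple iphone 8 plus 64gb silver
```

Deleting punctuation outright joins hyphenated tokens (for example, "wi-fi" becomes "wifi"). Replacing punctuation with a space instead is an equally plausible reading of the description, so new titles that should be comparable with the dataset may need whichever variant matches the published titles more closely.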
Introductory Paper
By Leonidas Akritidis, Athanasios Fevgas, Panayiotis Bozanis, C. Makris. 2020
Published in Artificial Intelligence Review
Variables Table
Variable Name | Role | Type | Description | Units | Missing Values |
---|---|---|---|---|---|
Product ID | Feature | Integer | | | no |
Product Title | Feature | Categorical | | | no |
Merchant ID | Feature | Integer | | | no |
Cluster ID | Feature | Integer | | | no |
Cluster Label | Feature | Categorical | | | no |
Category ID | Feature | Integer | | | no |
Category Label | Feature | Categorical | | | no |
Dataset Files
File | Size |
---|---|
pricerunner_aggregate.csv | 3.7 MB |
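As a rough illustration of working with the file directly, the sketch below reads offers with the standard `csv` module, counts offers per category, and groups titles by cluster. It assumes the CSV header uses the names from the Variables Table (possibly padded with spaces, hence the stripping). The helper names and the three sample rows are made up for the demonstration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter, defaultdict


def load_offers(handle):
    """Read offers from an open CSV handle, stripping spaces around headers and values."""
    reader = csv.DictReader(handle, skipinitialspace=True)
    offers = []
    for row in reader:
        # Skip the None key that DictReader uses for surplus fields.
        offers.append(
            {k.strip(): (v or "").strip() for k, v in row.items() if k is not None}
        )
    return offers


def offers_per_category(offers):
    """Count how many offers fall under each category label."""
    return Counter(offer["Category Label"] for offer in offers)


def titles_by_cluster(offers):
    """Group product titles by cluster ID (offers for the same product)."""
    groups = defaultdict(list)
    for offer in offers:
        groups[offer["Cluster ID"]].append(offer["Product Title"])
    return dict(groups)


# Hypothetical rows standing in for pricerunner_aggregate.csv.
SAMPLE = """Product ID, Product Title, Merchant ID, Cluster ID, Cluster Label, Category ID, Category Label
1,example phone 64gb black,10,100,Example Phone 64GB,1,Category A
2,example phone 64 gb black sim free,11,100,Example Phone 64GB,1,Category A
3,example fridge 300l white,12,200,Example Fridge 300L,2,Category B
"""

offers = load_offers(io.StringIO(SAMPLE))
print(offers_per_category(offers))
print(titles_by_cluster(offers))
```

To run this against the real file, replace the `io.StringIO(SAMPLE)` handle with `open("pricerunner_aggregate.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption.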
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
product_classification_and_clustering = fetch_ucirepo(id=837)

# data (as pandas dataframes)
X = product_classification_and_clustering.data.features
y = product_classification_and_clustering.data.targets

# metadata
print(product_classification_and_clustering.metadata)

# variable information
print(product_classification_and_clustering.variables)
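Because this page lists no recommended data splits and gives every variable the role Feature, anyone using the data for classification will likely need to pick a label column and create a hold-out split themselves. The following is a minimal sketch of a seeded, stratified split in plain Python. It assumes rows are dictionaries keyed by variable name (for instance from `X.to_dict("records")` on the DataFrame above). The function name and the toy rows are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key, test_fraction=0.2, seed=0):
    """Split rows into (train, test) so each label keeps roughly the same share."""
    rng = random.Random(seed)  # local generator keeps the split reproducible
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    train, test = [], []
    for label in sorted(by_label):  # sorted for a deterministic iteration order
        group = list(by_label[label])
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy rows: 10 offers in each of two made-up categories.
rows = [
    {"Product Title": f"item {i}", "Category Label": "A" if i < 10 else "B"}
    for i in range(20)
]
train, test = stratified_split(rows, "Category Label", test_fraction=0.2, seed=42)
print(len(train), len(test))
# 16 4
```

Note that a split stratified by category can still place offers for the same product (same Cluster ID) on both sides. For entity matching experiments, splitting by Cluster ID instead may be more appropriate.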
Akritidis, L. (2020). Product Classification and Clustering [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5M91Z.
Creators
Leonidas Akritidis
lakritidis@ihu.gr
International Hellenic University
DOI
10.24432/C5M91Z
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows the dataset to be shared and adapted for any purpose, provided that appropriate credit is given.