Repeat Consumption Matrices
Donated on 3/21/2018
This collection contains 7 User-Item matrices, where each entry records how many times a user consumed an item. "Item" is used as an umbrella term for the categories being consumed.
Dataset Characteristics
Multivariate
Subject Area
Computer Science
Associated Tasks
Clustering
Feature Type
Real
# Instances
130000
# Features
21000
Dataset Information
Additional Information
There are 7 datasets from Reddit, Twitter, Gowalla and Lastfm. Each matrix records how many times a user 'consumed' an item. Items can be locations, artists, or subreddits. Details about each dataset are presented below, with the number of Users x Items in parentheses.
tw_oc (13k x 11k): Tweets with geolocation from the Orange County, CA area. Items are the locations a user visits.
tw_ny (30k x 11k): Same as tw_oc, but from the New York area.
go_sf (2k x 7k): Check-ins from the app Gowalla, from the San Francisco area. Full dataset here: https://snap.stanford.edu/data/loc-gowalla.html
go_ny (1k x 7k): Same as go_sf, but from the New York area.
lastfm (992 x 15k): How many times a user listened to each artist. Covers 3 years of listening habits. Full dataset here: http://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/lastfm-1K.html
reddit_top (113k x 21k): How many times a user posted in a subreddit. These are the 130k most active users from 2015 and 20k most subscribed subreddits. This dataset is very large and can take a long time to load and use.
reddit_sample (20k x 21k): Same as reddit_top, but a sample of 20k users.
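To make the matrix layout concrete, here is a minimal sketch of how a User x Item count matrix could be assembled from a raw log of consumption events. The function name build_count_matrix and the toy users and subreddits are hypothetical and used only for illustration; this is not the procedure used to build the files in this dataset, and it says nothing about their on-disk format.

```python
from collections import Counter


def build_count_matrix(events):
    """Turn (user, item) consumption events into a User x Item count matrix.

    Returns the sorted user labels (rows), the sorted item labels (columns),
    and a dense list-of-lists matrix of consumption counts.
    """
    users = sorted({user for user, _ in events})
    items = sorted({item for _, item in events})
    counts = Counter(events)  # (user, item) -> number of consumptions
    # Pairs that never occur in the log get a count of 0.
    matrix = [[counts[(user, item)] for item in items] for user in users]
    return users, items, matrix


# Hypothetical toy log: each tuple is one consumption event.
events = [
    ("alice", "r/python"), ("alice", "r/python"), ("alice", "r/news"),
    ("bob", "r/news"), ("bob", "r/news"), ("bob", "r/news"),
]

users, items, matrix = build_count_matrix(events)
for user, row in zip(users, matrix):
    print(user, dict(zip(items, row)))
```

A dense list of lists is only practical at toy scale. For matrices the size of reddit_top, a sparse representation would likely be a better fit.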
Has Missing Values?
No
Variable Information
The attributes represent items (categories) that users tend to select multiple times. These can be music artists, subreddits, or locations on a map.
Dataset Files
File | Size |
---|---|
data/reddit_top/train.csv | 97.4 MB |
data/reddit_top/validation.csv | 35.3 MB |
data/reddit_top/test.csv | 33.2 MB |
data/lastfm/train.csv | 5.5 MB |
data/reddit_sample/train.csv | 4.8 MB |
Showing 5 of 26 files.
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
repeat_consumption_matrices = fetch_ucirepo(id=441)

# data (as pandas dataframes)
X = repeat_consumption_matrices.data.features
y = repeat_consumption_matrices.data.targets

# metadata
print(repeat_consumption_matrices.metadata)

# variable information
print(repeat_consumption_matrices.variables)
Kotzias, D. (2018). Repeat Consumption Matrices [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5KG7N.
Creators
Dimitrios Kotzias
DOI
10.24432/C5KG7N
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.