Center for Machine Learning and Intelligent Systems

Repeat Consumption Matrices Data Set

Abstract: The dataset contains 7 datasets of User - Item matrices, where each entry represents how many times a user consumed an item. Item is used as an umbrella term for various categories.

Dimitrios Kotzias, dkotzias '@', University of California Irvine

Data Set Information:

There are 7 datasets from Reddit, Twitter, Gowalla and Lastfm.
Each matrix records how many times a user 'consumed' an item. Items can be locations, artists, or subreddits.
Details about each dataset are presented below. (The dimensions in parentheses are the number of Users x Items.)

tw_oc (13k x 11k): Tweets with geolocation from the Orange County, CA area. Here, items are the locations a user visits.
tw_ny (30k x 11k): Same as tw_oc but from the New York area.

go_sf (2k x 7k): Check-ins from the app Gowalla, from the San Francisco area. Full dataset here: [Web Link]
go_ny (1k x 7k): Same as go_sf, but from the New York area.

lastfm (992 x 15k): How many times a user listened to each artist. Covers 3 years of listening habits. Full dataset here: [Web Link]∼ocelma/[Web Link]

reddit_top (113k x 21k): How many times a user posted in a subreddit. These are the 130k most active users from 2015 and the 20k most subscribed subreddits. This dataset is very large and can take a long time to load and use.
reddit_sample (20k x 21k): Same as reddit_top, but a sample of 20k users.

Attribute Information:

The attributes represent items (categories) that users tend to select multiple times. These can be music artists, subreddits, or locations on the map.
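
To make the User x Item structure concrete, here is a minimal Python sketch that builds a small count matrix from a list of consumption events and measures how many entries are repeats. It is illustrative only: the event list, the user and item names, and the helpers `build_count_matrix` and `repeat_share` are hypothetical and are not part of the dataset or its file format, which this page does not specify.

```python
from collections import Counter


def build_count_matrix(events):
    """Count how often each (user, item) pair occurs; only nonzero entries are stored."""
    return Counter(events)


def repeat_share(counts):
    """Fraction of nonzero (user, item) entries with a count greater than one."""
    if not counts:
        return 0.0
    repeated = sum(1 for c in counts.values() if c > 1)
    return repeated / len(counts)


# Hypothetical toy events: each tuple is one consumption of an item by a user.
events = [
    ("u1", "artist_a"), ("u1", "artist_a"), ("u1", "artist_b"),
    ("u2", "artist_a"), ("u2", "artist_c"), ("u2", "artist_c"), ("u2", "artist_c"),
]

counts = build_count_matrix(events)

# Fix a row order (users) and a column order (items) to lay the counts out densely.
users = sorted({user for user, _ in counts})
items = sorted({item for _, item in counts})
dense = [[counts.get((user, item), 0) for item in items] for user in users]

print(dense)                 # rows are users, columns are items
print(repeat_share(counts))  # share of entries consumed more than once
```

Storing only the nonzero entries, as the dictionary of counts does here, is one way to keep memory use manageable for the larger matrices such as reddit_top; the dense layout is shown only because the toy example is small.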

Relevant Papers:

Predicting Consumption Patterns with Repeated and Novel Events by Dimitrios Kotzias, Moshe Lichman and Padhraic Smyth.

