Center for Machine Learning and Intelligent Systems

Netflix Prize Data Set

Abstract: This is the official data set used in the Netflix Prize competition. The data consists of about 100 million movie ratings, and the goal is to predict missing entries in the movie-user rating matrix.

Data Set Characteristics: Multivariate, Time-Series
Number of Instances: 100480507
Area: N/A
Attribute Characteristics: Integer
Number of Attributes: 17770
Date Donated: 2009-09-21
Associated Tasks: Clustering, Recommender-Systems
Missing Values? Yes
Number of Web Hits: 12075


Source:

The Netflix Prize

http://www.netflixprize.com


Data Set Information:

This dataset was constructed to support participants in the Netflix Prize. See
[Web Link] for details about the prize.

There are over 480,000 customers in the dataset, each identified by a
unique integer id.

The title and release year of each movie are also provided. There are over
17,000 movies in the dataset, each identified by a unique integer id.

The dataset contains over 100 million ratings. The ratings were collected
between October 1998 and December 2005 and reflect the distribution of all
ratings received during this period. Each rating has a customer id, a movie id,
the date of the rating, and the value of the rating.
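
To give a sense of how sparse the movie-user rating matrix is, the counts quoted on this page can be combined into a rough density estimate. The sketch below is illustrative only: it takes 480,000 as a lower bound on the number of customers (the page says only "over 480,000"), so the result is an upper bound on the density. The function and constant names are hypothetical.

```python
# Rough density of the movie-user rating matrix, from figures on this page.
NUM_RATINGS = 100_480_507   # "Number of Instances"
NUM_MOVIES = 17_770         # upper end of the MovieID range
MIN_CUSTOMERS = 480_000     # assumption: lower bound ("over 480,000")


def matrix_density(num_ratings: int, num_movies: int, num_customers: int) -> float:
    """Return the fraction of movie-customer cells that hold a rating."""
    return num_ratings / (num_movies * num_customers)


# Using a lower bound on customers makes this an upper bound on density.
density_upper_bound = matrix_density(NUM_RATINGS, NUM_MOVIES, MIN_CUSTOMERS)
print(f"At most {density_upper_bound:.2%} of the matrix is filled")
```

Under these assumptions, a little over 1% of the matrix is observed; the remaining entries are the missing values a recommender has to predict.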

As part of the original Netflix Prize, a set of ratings was identified whose
rating values were not provided in the original dataset. The object of the
Prize was to accurately predict the ratings from this 'qualifying' set. These
missing ratings are now available in the grand_prize.tar.gz dataset file.
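
The Netflix Prize measured prediction accuracy on the qualifying set by root mean squared error (RMSE). A minimal sketch of that metric follows. The function name and the sample values are made up for illustration and are not taken from the dataset.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between predicted and actual ratings."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one rating is required")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predictions and true star ratings (not real data).
example_predicted = [3.5, 4.0, 2.0, 5.0]
example_actual = [4, 4, 1, 5]
example_score = rmse(example_predicted, example_actual)
print(f"RMSE: {example_score:.4f}")
```

Lower values are better; a score of 0 would mean every rating was predicted exactly.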


Attribute Information:

The format of the data is described fully in the README files contained in the
dataset tar files.

MovieID:
Arbitrarily assigned unique integer in the range [1..17770].

CustomerID:
Arbitrarily assigned unique integer in the range [1..2649429] (with gaps).

Rating:
Number of 'stars' assigned to a movie by a customer; an integer from 1 to 5.

Title:
English language title of the movie on the Netflix website.

YearOfRelease:
Year the movie was released, in the range [1890..2005]. May correspond to the
release of the corresponding DVD, not necessarily its theatrical release.

Date:
Timestamp of a rating in the form YYYY-MM-DD, in the range 1998-11-01 to
2005-12-31.

NetflixID:
Integer ID of a movie as currently used in the Netflix developer API
[Web Link]
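
The ranges listed above can be turned into a simple sanity check for a single rating. The following is a sketch only: the `Rating` class and `parse_rating` function are hypothetical names, the bounds are copied from the attribute descriptions on this page, and the code does not model the on-disk file layout, which is described in the README files.

```python
from dataclasses import dataclass
from datetime import date

# Bounds copied from the attribute descriptions on this page.
MAX_MOVIE_ID = 17770
MAX_CUSTOMER_ID = 2649429
FIRST_DATE = date(1998, 11, 1)
LAST_DATE = date(2005, 12, 31)


@dataclass(frozen=True)
class Rating:
    movie_id: int
    customer_id: int
    stars: int
    rated_on: date


def parse_rating(movie_id: str, customer_id: str, stars: str, rated_on: str) -> Rating:
    """Convert string fields to a Rating, rejecting out-of-range values."""
    record = Rating(
        movie_id=int(movie_id),
        customer_id=int(customer_id),
        stars=int(stars),
        rated_on=date.fromisoformat(rated_on),  # expects YYYY-MM-DD
    )
    if not 1 <= record.movie_id <= MAX_MOVIE_ID:
        raise ValueError(f"MovieID out of range: {record.movie_id}")
    # CustomerIDs have gaps, so only the outer bounds can be checked here.
    if not 1 <= record.customer_id <= MAX_CUSTOMER_ID:
        raise ValueError(f"CustomerID out of range: {record.customer_id}")
    if not 1 <= record.stars <= 5:
        raise ValueError(f"Rating out of range: {record.stars}")
    if not FIRST_DATE <= record.rated_on <= LAST_DATE:
        raise ValueError(f"Date out of range: {record.rated_on}")
    return record


# Made-up field values for illustration.
sample = parse_rating("1", "1488844", "3", "2005-09-06")
print(sample)
```

A check like this catches malformed rows early, before they reach a model.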


Relevant Papers:

James Bennett and Stan Lanning. 'The Netflix Prize', 2007.
[Web Link]


Papers That Cite This Data Set [1]:

Arvind Narayanan and Vitaly Shmatikov. Robust De-anonymization of Large Sparse Datasets. IEEE Symposium on Security and Privacy. 2008.

Robert Bell and Yehuda Koren and Chris Volinsky. Modeling relationships at multiple scales to improve accuracy of large recommender systems. KDD. 2007.

Ruslan Salakhutdinov and Andriy Mnih. Probabilistic Matrix Factorization. NIPS. 2007.

Tapani Raiko and Alexander Ilin and Juha Karhunen. Principal Component Analysis for Large Scale Problems with Lots of Missing Values. ECML. 2007.

Robert M Bell and Yehuda Koren. Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights. ICDM. 2007.

Ruslan Salakhutdinov and Andriy Mnih and Geoffrey E Hinton. Restricted Boltzmann machines for collaborative filtering. ICML. 2007.

Robert M Bell and Yehuda Koren. Improved Neighborhood-based Collaborative Filtering. KDDCup. 2007.

Jorge Sueiras and Alfonso Salafranca and Jose Luis Florez. A classical predictive modeling approach for task "Who rated what?" of the KDD CUP 2007. SIGKDD Explorations. 2007.

Dhiraj Goel and Dhruv Batra. Predicting User Preference for Movies using NetFlix database. Department of Electrical and Computer Engineering Carnegie Mellon University.

Gabor Tak. On the Gravity Recommendation System. Dept. of Measurement and Information Systems Budapest University of Technology and Economics.

Yew Jin Lim. Variational Bayesian Approach to Movie Rating Prediction. School of Computing National University of Singapore.

Martin Szomszor and Ciro Cattuto and Harith Alani and Kieron O'Hara and Andrea Baldassarri and Vittorio Loreto and Vito D. P Servedio. Folksonomies, the Semantic Web, and Movie Recommendation. School of Electronics and Computer Science University of Southampton.

Arkadiusz Paterek. Improving regularized singular value decomposition for collaborative filtering. Institute of Informatics, Warsaw University.

Mingrui Wu. Collaborative Filtering via Ensembles of Matrix Factorizations. Max Planck Institute for Biological Cybernetics.


Citation Request:

USAGE LICENSE:

Netflix cannot guarantee the correctness of the data, its suitability for any
particular purpose, or the validity of results based on the use of the data set.
The data set may be used for any research purposes under the following
conditions:

* The user may not state or imply any endorsement from Netflix.

* The user must acknowledge the use of the data set in
publications resulting from the use of the data set, and must
send us an electronic or paper copy of those publications.

* The user may not redistribute the data without separate
permission.

* The user may not use this information for any commercial or
revenue-bearing purposes without first obtaining permission
from Netflix.

If you have any further questions or comments, please contact the Prize
administrator: prizemaster '@' netflix.com


[1] Papers were automatically harvested and associated with this data set, in collaboration with Rexa.info
