Center for Machine Learning and Intelligent Systems

YearPredictionMSD Data Set
Download: Data Folder, Data Set Description

Abstract: Prediction of the release year of a song from audio features. Songs are mostly Western, commercial tracks ranging from 1922 to 2011, with a peak in the 2000s.

Data Set Characteristics: Multivariate
Number of Instances: 515345
Area: N/A
Attribute Characteristics: Real
Number of Attributes: 90
Date Donated: 2011-02-07
Associated Tasks: Regression
Missing Values?: N/A



Source:

This data is a subset of the Million Song Dataset:
http://labrosa.ee.columbia.edu/millionsong/
a collaboration between LabROSA (Columbia University) and The Echo Nest.
Prepared by T. Bertin-Mahieux <tb2332 '@' columbia.edu>


Data Set Information:

You should respect the following train/test split:
train: first 463,715 examples
test: last 51,630 examples
This split avoids the 'producer effect' by ensuring that no artist
has songs in both the train and test sets.
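Because the split is defined by position in the file, it should be applied by slicing rather than by random shuffling, which would reintroduce the producer effect. The sketch below is a minimal illustration, assuming the data is a comma-separated text file with one song per line laid out as described under Attribute Information (year first, then the 90 attributes). The helper names `parse_row`, `load_rows`, and `split_train_test` are hypothetical, and the file name in the usage note is an assumption.

```python
import csv
from typing import List, Tuple

N_TRAIN = 463_715  # first 463,715 rows form the training set
N_TEST = 51_630    # last 51,630 rows form the test set
N_FEATURES = 90    # 12 timbre averages + 78 timbre covariances

Row = Tuple[int, List[float]]


def parse_row(fields: List[str]) -> Row:
    """Split one comma-separated line into (year, features); the year comes first."""
    values = [float(v) for v in fields]
    if len(values) != 1 + N_FEATURES:
        raise ValueError(f"expected {1 + N_FEATURES} values, got {len(values)}")
    return int(values[0]), values[1:]


def load_rows(path: str) -> List[Row]:
    """Read every non-empty line of the (assumed) comma-separated data file."""
    with open(path, newline="") as f:
        return [parse_row(fields) for fields in csv.reader(f) if fields]


def split_train_test(rows: list) -> Tuple[list, list]:
    """Apply the prescribed split by position; never shuffle before splitting."""
    if len(rows) != N_TRAIN + N_TEST:
        raise ValueError(f"expected {N_TRAIN + N_TEST} rows, got {len(rows)}")
    return rows[:N_TRAIN], rows[N_TRAIN:]
```

Usage, assuming the file has been saved locally as `YearPredictionMSD.txt`: `train, test = split_train_test(load_rows("YearPredictionMSD.txt"))`. Any shuffling for mini-batch training should happen only within `train`, after the split.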


Attribute Information:

90 attributes: 12 timbre averages and 78 timbre covariances (a symmetric
12 × 12 covariance matrix has 12·13/2 = 78 distinct entries).
The first value on each line is the year (target), ranging from 1922 to 2011,
followed by the 90 attributes.
The features are derived from the 'timbre' features of The Echo Nest API.
We take the average and covariance over all 'segments', each segment
being described by a 12-dimensional timbre vector.
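The feature construction above can be sketched as follows: given a song's segments as rows of 12 timbre values, take the 12 per-dimension means and the 78 distinct entries of the 12 × 12 covariance matrix. This is a minimal illustration, not the preparer's code. The function name `timbre_features` is hypothetical, and two details are assumptions because the description does not specify them: covariance entries are emitted in row-major upper-triangle order (diagonal included), and the sample normalisation (dividing by n - 1) is used. The output is therefore not guaranteed to match the released values exactly.

```python
from typing import List, Sequence


def timbre_features(segments: Sequence[Sequence[float]]) -> List[float]:
    """Return per-dimension means followed by upper-triangle covariances.

    Assumptions: row-major upper triangle including the diagonal, and
    sample covariance (divide by n - 1). With 12-dimensional segments
    this yields 12 + 78 = 90 values.
    """
    n = len(segments)
    if n < 2:
        raise ValueError("need at least two segments to estimate a covariance")
    d = len(segments[0])  # 12 timbre dimensions for Echo Nest segments
    if any(len(seg) != d for seg in segments):
        raise ValueError("all segments must have the same dimensionality")

    # Average of each timbre dimension over all segments.
    means = [sum(seg[j] for seg in segments) / n for j in range(d)]

    # Distinct entries of the symmetric covariance matrix: (i, j) with j >= i.
    covs = []
    for i in range(d):
        for j in range(i, d):
            s = sum((seg[i] - means[i]) * (seg[j] - means[j]) for seg in segments)
            covs.append(s / (n - 1))
    return means + covs
```

For a song with 12-dimensional segments the result has 12 + 12·13/2 = 90 values, matching the attribute count; the year would then be prepended as the target to form one line of the data file.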


Relevant Papers:

see the website: [Web Link]



Citation Request:

Please refer to the Machine Learning Repository's citation policy.

