Center for Machine Learning and Intelligent Systems



NIPS Conference Papers 1987-2015 Data Set
Download: Data Folder, Data Set Description

Abstract: This data set contains the distribution of words in the full text of the NIPS conference papers published from 1987 to 2015.

Data Set Characteristics: Text
Number of Instances: 11463
Area: Computer
Attribute Characteristics: Integer
Number of Attributes: 5812
Date Donated: 2016-11-23
Associated Tasks: Clustering
Missing Values? N/A
Number of Web Hits: 19133


Source:

Valerio Perrone
v.perrone '@' warwick.ac.uk
Department of Statistics
University of Warwick
Coventry (UK)


Data Set Information:

The dataset is an 11463 x 5812 matrix of word counts covering 11463 words and 5811 NIPS conference papers; the first column contains the list of words, which accounts for the extra column. Each remaining column contains the number of times each word appears in the corresponding document. The column names identify each document and its timestamp in the format Xyear_paperID.
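
The layout described above can be read with a few lines of standard-library Python. This is a minimal sketch, not part of the original data set: it assumes the file is comma-separated with a header row, that the year is four digits, and that paper IDs contain only word characters. The helper names (parse_column_name, read_word_counts) and the three-paper sample are hypothetical and invented for illustration.

```python
import csv
import io
import re

# Column names follow the documented pattern Xyear_paperID, e.g. "X1987_1".
# Assumption: four-digit year, paper ID made of word characters.
COLUMN_PATTERN = re.compile(r"^X(\d{4})_(\w+)$")


def parse_column_name(name):
    """Split a column name such as 'X1987_1' into (year, paper_id)."""
    match = COLUMN_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected column name: {name!r}")
    return int(match.group(1)), match.group(2)


def read_word_counts(handle):
    """Read a words-by-papers CSV into (paper_columns, {word: [counts]})."""
    reader = csv.reader(handle)
    header = next(reader)
    papers = header[1:]  # the first column holds the list of words
    counts = {row[0]: [int(value) for value in row[1:]] for row in reader}
    return papers, counts


# Tiny made-up sample in the documented layout (not real data).
SAMPLE = "X,X1987_1,X1987_2,X2015_10\nkernel,3,0,7\nneuron,5,2,0\n"

papers, counts = read_word_counts(io.StringIO(SAMPLE))
years = [parse_column_name(name)[0] for name in papers]
```

With the real file, the io.StringIO sample would be replaced by an open file handle; at the full 11463 x 5812 size a dedicated matrix library may be more practical than nested lists.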

The matrix of word counts was obtained using the R package 'tm' to process the raw .txt files of the full text of the NIPS conference papers published between 1987 and 2015. The document-term matrix was constructed after tokenization, removal of stopwords and truncation of the vocabulary, keeping only words occurring more than 50 times.
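
The vocabulary truncation step can be illustrated with a short Python sketch. It is not the authors' pipeline, which used the R package 'tm'; it assumes that "occurring more than 50 times" refers to a word's total count across all documents, and that tokenization and stopword removal have already been done. The function name truncate_vocabulary and the toy corpus are hypothetical.

```python
from collections import Counter


def truncate_vocabulary(tokenized_docs, min_count=50):
    """Keep only words whose total corpus count exceeds min_count.

    Assumption: the threshold applies to the count summed over all documents.
    Returns the kept words and a words-by-documents count matrix.
    """
    totals = Counter()
    for tokens in tokenized_docs:
        totals.update(tokens)
    vocabulary = sorted(word for word, total in totals.items() if total > min_count)
    # One row per word, one column per document, matching the data set layout.
    matrix = [[tokens.count(word) for tokens in tokenized_docs] for word in vocabulary]
    return vocabulary, matrix


# Toy corpus (made up): "model" occurs 3 times in total, "rare" once.
docs = [["model", "model", "rare"], ["model"]]
vocabulary, matrix = truncate_vocabulary(docs, min_count=2)
```

A low threshold is used here only so the toy corpus produces a non-empty result; the description above states a threshold of 50.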


Attribute Information:

Column 1: 'X' (list of words)
Columns 2-5812: 'Xyear_ID' (timestamp and paper ID)


Relevant Papers:

Perrone V., Jenkins P. A., Spano D., Teh Y. W. (2016). Poisson Random Fields for Dynamic Feature Models.



Citation Request:

If you use this data, please cite: Perrone V., Jenkins P. A., Spano D., Teh Y. W. (2016). Poisson Random Fields for Dynamic Feature Models.

