NIPS Conference Papers 1987-2015

Donated on 11/22/2016

This data set contains the distribution of words in the full text of the NIPS conference papers published from 1987 to 2015.

Dataset Characteristics

Text

Subject Area

Computer Science

Associated Tasks

Clustering

Feature Type

Integer

# Instances

11463

# Features

5812

Dataset Information

Additional Information

The dataset is in the form of an 11463 x 5812 matrix of word counts, containing 11463 words and 5811 NIPS conference papers (the first column contains the list of words). Each remaining column corresponds to one document and gives the number of times each word appears in that document. The column names identify each document and its timestamp in the following format: Xyear_paperID. The matrix of word counts was obtained using the R package 'tm' to process the raw .txt files of the full text of the NIPS conference papers published between 1987 and 2015. The document-term matrix was constructed after tokenization, removal of stopwords, and truncation of the vocabulary to keep only words occurring more than 50 times.
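The layout described above can be read with a few lines of Python. The following is a minimal sketch, not the creator's own tooling: it assumes the file is comma-separated with a header row, and it uses a tiny made-up sample in place of the real file. The names `SAMPLE`, `parse_column` and `load_counts` are hypothetical, and the leading `X` in column names is treated as optional in case the actual header omits it.

```python
import csv
import io
import re

# Made-up stand-in for the real file, using the layout described above:
# first column = words, every other column = one paper named Xyear_paperID.
SAMPLE = (
    "X,X1987_1,X1987_2,X2015_403\n"
    "network,12,0,7\n"
    "kernel,0,3,9\n"
)

# The leading "X" is optional here (assumption: some exports may drop it).
COLUMN_RE = re.compile(r"X?(\d{4})_(\d+)")


def parse_column(name):
    """Split a column name such as 'X1987_1' into (year, paper_id)."""
    match = COLUMN_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected column name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def load_counts(handle):
    """Return (words, documents, counts).

    counts[i][j] is the number of times words[i] occurs in documents[j],
    where each document is a (year, paper_id) pair.
    """
    reader = csv.reader(handle)
    header = next(reader)
    documents = [parse_column(name) for name in header[1:]]
    words, counts = [], []
    for row in reader:
        words.append(row[0])
        counts.append([int(value) for value in row[1:]])
    return words, documents, counts


words, documents, counts = load_counts(io.StringIO(SAMPLE))

# Total number of counted words per paper: sum down each column.
tokens_per_paper = [sum(column) for column in zip(*counts)]

# Number of papers per publication year.
papers_per_year = {}
for year, _paper_id in documents:
    papers_per_year[year] = papers_per_year.get(year, 0) + 1
```

To read the downloaded file instead of the sample, pass an open file handle to `load_counts`. Because words are stored as rows, clustering papers usually requires transposing `counts` first, for example with `list(zip(*counts))`.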

Has Missing Values?

No

Variable Information

Column 1: 'X' (list of words)
Columns 2-5812: 'Xyear_ID' (timestamp and paper ID)

Creators

Valerio Perrone
