NIPS Conference Papers 1987-2015
Donated on 11/22/2016
This data set contains the distribution of words in the full text of the NIPS conference papers published from 1987 to 2015.
Dataset Characteristics
Text
Subject Area
Computer Science
Associated Tasks
Clustering
Feature Type
Integer
# Instances
11463
# Features
5812
Dataset Information
Additional Information
The dataset is in the form of an 11463 x 5812 matrix of word counts, containing 11463 words and 5811 NIPS conference papers (the first column contains the list of words). Each of the remaining columns corresponds to one document and contains the number of times each word appears in that document. The names of the columns give information about each document and its timestamp in the following format: Xyear_paperID. The matrix of word counts was obtained using the R package 'tm' to process the raw .txt files of the full text of the NIPS conference papers published between 1987 and 2015. The document-term matrix was constructed after tokenization, removal of stopwords and truncation of the vocabulary by only keeping words occurring more than 50 times.
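The column-name format described above (Xyear_paperID) can be split into a year and a paper ID. The following is a minimal sketch, assuming the year is always four digits and that the leading "X" may or may not be present depending on how the file was written; the helper name `parse_column_name` is hypothetical and not part of the dataset or of any library.

```python
import re
from typing import Tuple

# Assumed pattern: optional leading "X", a four-digit year, "_", then the paper ID.
_COLUMN_RE = re.compile(r"^X?(\d{4})_(.+)$")


def parse_column_name(name: str) -> Tuple[int, str]:
    """Split a document column name into (year, paper_id).

    Hypothetical helper: raises ValueError when the name does not match
    the assumed Xyear_paperID pattern (for example, the word column 'X').
    """
    match = _COLUMN_RE.match(name)
    if match is None:
        raise ValueError(f"not a document column name: {name!r}")
    return int(match.group(1)), match.group(2)


# Illustrative column names, not taken from the actual file.
year, paper_id = parse_column_name("X1987_1")
```

With the year extracted this way, documents can be grouped by publication year, for example to compare word usage across time.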
Has Missing Values?
No
Variable Information
Column 1: 'X' (list of words)
Columns 2-5812: 'Xyear_ID' (timestamp and paper ID)
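As a rough illustration of this layout (one row per word, one column per paper), the sketch below reads a file of that shape using only the Python standard library. The function name `read_word_counts` and the three-document sample are made up for the example; the real file is far larger and its exact quoting and header may differ.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def read_word_counts(handle: TextIO) -> Tuple[List[str], Dict[str, List[int]]]:
    """Read a words-by-documents count matrix from CSV text.

    Assumes the first column holds the word and every other column
    holds integer counts for one document.
    """
    reader = csv.reader(handle)
    header = next(reader)
    doc_names = header[1:]  # skip the word column
    counts: Dict[str, List[int]] = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        counts[row[0]] = [int(value) for value in row[1:]]
    return doc_names, counts


# Tiny made-up sample in the same shape as the dataset.
sample = io.StringIO(
    '"X","X1987_1","X1987_2","X1988_3"\n'
    '"neural",4,0,7\n'
    '"kernel",0,2,1\n'
)
docs, counts = read_word_counts(sample)

# Total occurrences of each word across the sample documents.
totals = {word: sum(row) for word, row in counts.items()}
```

For the full 127.4 MB file, a dataframe library may be more convenient than plain lists, but the row and column roles stay the same.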
Dataset Files
File | Size |
---|---|
NIPS_1987-2015.csv | 127.4 MB |
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
nips_conference_papers_1987_2015 = fetch_ucirepo(id=371)

# data (as pandas dataframes)
X = nips_conference_papers_1987_2015.data.features
y = nips_conference_papers_1987_2015.data.targets

# metadata
print(nips_conference_papers_1987_2015.metadata)

# variable information
print(nips_conference_papers_1987_2015.variables)
Perrone, V. (2016). NIPS Conference Papers 1987-2015 [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5KC80.
Creators
Valerio Perrone
DOI
10.24432/C5KC80
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.