NSF Research Award Abstracts 1990-2003
Donated on 11/17/2003
This data set consists of (a) 129,000 abstracts describing NSF awards for basic research, (b) bag-of-word data files extracted from the abstracts, (c) a list of words used for indexing the bag-of-word
Dataset Characteristics
Text
Subject Area
Other
Associated Tasks
-
Feature Type
-
# Instances
129000
# Features
-
Dataset Information
Additional Information
The abstracts, one per file, were furnished by the NSF (National Science Foundation). A sample abstract is shown in the next section. The bag-of-word data was produced by automatically processing the abstracts with a text analyzer called NSFAbst, built using VisualText. While most fields of the output are very accurate, the authors were not extracted from the Investigator: field with 100% accuracy, due to wide variability in that field. The word list came from a separate process, and may not include all the words of interest in the abstracts.
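As a rough illustration of how a word list can index bag-of-word data, here is a minimal Python sketch. It is not the NSFAbst analyzer and does not reproduce the dataset's actual file format. The function names, the 1-based index numbering, the tiny vocabulary, and the sample sentence are all assumptions made up for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text, vocabulary):
    """Return {word_index: count} for the words of `text` found in `vocabulary`.

    Indices are 1-based positions in `vocabulary` (an assumed numbering,
    not necessarily the one used in the dataset's files).
    """
    index = {word: i for i, word in enumerate(vocabulary, start=1)}
    counts = Counter(tok for tok in tokenize(text) if tok in index)
    # Order the result by word index so the output is easy to read.
    return {index[w]: n for w, n in sorted(counts.items(), key=lambda kv: index[kv[0]])}


# Hypothetical word list and abstract text, invented for illustration only.
vocabulary = ["research", "award", "protein", "model"]
abstract = "This award supports research on protein folding. The research uses a new model."

bow = bag_of_words(abstract, vocabulary)
print(bow)  # {1: 2, 2: 1, 3: 1, 4: 1}
```

Words absent from the word list, such as "folding" here, are silently dropped. That is the practical effect of a word list that does not include every word of interest in the abstracts.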
Has Missing Values?
No
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
nsf_research_award_abstracts_1990_2003 = fetch_ucirepo(id=134)

# data (as pandas dataframes)
X = nsf_research_award_abstracts_1990_2003.data.features
y = nsf_research_award_abstracts_1990_2003.data.targets

# metadata
print(nsf_research_award_abstracts_1990_2003.metadata)

# variable information
print(nsf_research_award_abstracts_1990_2003.variables)
Pazzani, Michael and Meyers, Amnon. (2003). NSF Research Award Abstracts 1990-2003. UCI Machine Learning Repository. https://doi.org/10.24432/C55C9N.
@misc{misc_nsf_research_award_abstracts_1990-2003_134,
  author       = {Pazzani, Michael and Meyers, Amnon},
  title        = {{NSF Research Award Abstracts 1990-2003}},
  year         = {2003},
  howpublished = {UCI Machine Learning Repository},
  note         = {{DOI}: https://doi.org/10.24432/C55C9N}
}
Creators
Michael Pazzani
Amnon Meyers
DOI
10.24432/C55C9N
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.