Center for Machine Learning and Intelligent Systems

NSF Research Award Abstracts 1990-2003 Data Set

Abstract: This data set consists of (a) 129,000 abstracts describing NSF awards for basic research, (b) bag-of-word data files extracted from the abstracts, and (c) a list of words used for indexing the bag-of-word data.
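The relationship between the abstracts, the word list, and the bag-of-word data can be illustrated with a minimal Python sketch that turns one abstract into word counts keyed by a word list. This is not the data set's actual file format or the NSFAbst pipeline. The tokenizer (lowercase alphabetic runs), the 1-based word ids, and the names `build_index` and `bag_of_words` are hypothetical assumptions chosen for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase runs of ASCII letters only.
    return re.findall(r"[a-z]+", text.lower())


def build_index(word_list: list[str]) -> dict[str, int]:
    # Assign each indexing word a 1-based integer id (an assumed convention).
    return {word: i for i, word in enumerate(word_list, start=1)}


def bag_of_words(text: str, index: dict[str, int]) -> dict[int, int]:
    # Count only tokens that appear in the word list; all other tokens are dropped.
    counts = Counter(tok for tok in tokenize(text) if tok in index)
    # Return a sparse {word id: count} mapping, ordered by word id.
    return {
        index[word]: n
        for word, n in sorted(counts.items(), key=lambda kv: index[kv[0]])
    }


if __name__ == "__main__":
    index = build_index(["learning", "neural", "network"])
    print(bag_of_words("Neural network learning; network pruning.", index))
    # prints {1: 1, 2: 1, 3: 2}  ("pruning" is not in the word list)
```

A sparse id-to-count mapping is used here because each abstract contains only a small fraction of the indexing words.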

Number of Instances: 129,000

Original Owner and Donor

Abstracts provided by:

Michael J. Pazzani
ICS Department, School of Computer Science, UCI, Irvine CA, 92697, USA
pazzani '@'

Bag-of-word data provided by:

Amnon Meyers
ICS Department, School of Computer Science, UCI, Irvine CA, 92697, USA
ameyers '@'

Data Set Information:

The abstracts, one per file, were furnished by the NSF (National Science Foundation). A sample abstract is shown in the next section.

The bag-of-word data was produced by automatically processing the abstracts with a text analyzer called NSFAbst, built using VisualText. Most fields of the output are highly accurate, but the authors were not extracted from the Investigator: field with 100% accuracy because the contents of that field vary widely.
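Why a free-form field such as Investigator: resists perfectly accurate extraction can be shown with a deliberately naive alternative. The sketch below does not reproduce NSFAbst or use VisualText. It assumes, hypothetically, that each header line has the form `Label: value`, and it splits investigator names on commas, semicolons, and the word "and". The sample header and the function names `parse_fields` and `naive_investigators` are invented for illustration.

```python
import re

# Assumed header layout: "Label<optional spaces>: value", one field per line.
FIELD_RE = re.compile(r"^([A-Za-z][A-Za-z ]*?)\s*:\s*(.*)$")


def parse_fields(header: str) -> dict[str, str]:
    # Collect every line that looks like "Label: value"; other lines are skipped.
    fields: dict[str, str] = {}
    for line in header.splitlines():
        m = FIELD_RE.match(line)
        if m:
            fields[m.group(1)] = m.group(2).strip()
    return fields


def naive_investigators(value: str) -> list[str]:
    # Naive heuristic: treat ";", "," and the word "and" as name separators.
    parts = re.split(r"\s*(?:;|,|\band\b)\s*", value)
    return [p for p in parts if p]


if __name__ == "__main__":
    header = "Title       : Learning Systems\nInvestigator: Jane Doe and John Smith"
    fields = parse_fields(header)
    print(naive_investigators(fields["Investigator"]))  # ['Jane Doe', 'John Smith']
    # A single name written "Last, First" is wrongly split into two people:
    print(naive_investigators("Smith, John"))  # ['Smith', 'John']
```

The second call is the point of the example: a rule that works for one way of writing names fails for another, which is the kind of variability the description above attributes to the Investigator: field.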

The word list was produced by a separate process and may not include every word of interest in the abstracts.
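Because the word list may miss words of interest, one way to gauge its coverage on a given abstract is to measure the fraction of tokens it contains and list the tokens it misses. The following is a minimal sketch under assumptions: the tokenizer and the function name `coverage` are hypothetical, and the word list is taken to be an in-memory set of lowercase words.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase runs of ASCII letters only.
    return re.findall(r"[a-z]+", text.lower())


def coverage(text: str, word_list: set[str]) -> tuple[float, Counter]:
    # Return (fraction of tokens found in the word list, counts of missing tokens).
    tokens = tokenize(text)
    missing = Counter(t for t in tokens if t not in word_list)
    covered = len(tokens) - sum(missing.values())
    ratio = covered / len(tokens) if tokens else 1.0
    return ratio, missing


if __name__ == "__main__":
    ratio, missing = coverage(
        "Neural network pruning for network models",
        {"neural", "network", "models"},
    )
    print(round(ratio, 3), dict(missing))  # 0.667 {'pruning': 1, 'for': 1}
```

Function words such as "for" show up as missing here, so filtering a stop-word list before computing coverage would give a more meaningful picture of which content words the list lacks.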

Attribute Information:



Citation Request:

Please refer to the UCI Machine Learning Repository's citation policy.
