1. USPTO Algorithm Challenge, run by NASA-Harvard Tournament Lab and TopCoder Problem: Pat: Data used for USPTO Algorithm Competition. Contains drawing pages from US patents with manually labeled figure and part labels.
2. Teaching Assistant Evaluation: The data consist of evaluations of teaching performance; scores are "low", "medium", or "high"
3. Russian Corpus of Biographical Texts: Sentence classification (Russian). The corpus contains Wikipedia texts split into sentences. Each sentence has a topic label.
4. MONK's Problems: A set of three artificial domains over the same attribute space; used to test a wide range of induction algorithms
5. ICMLA 2014 Accepted Papers Data Set: This data set comprises the metadata for the 2014 ICMLA conference's accepted papers, including ID, paper titles, authors' keywords, abstracts, and the sessions in which they were presented.
6. BuddyMove Data Set: User interest information extracted from user reviews published on holidayiq.com about various types of points of interest in South India
7. Badges: Badges labeled with a "+" or "-" as a function of a person's name
8. Bach Chorales: Time-series data based on chorales; challenge is to learn generative grammar; data in Lisp
9. Auto MPG: Revised from the CMU StatLib library; the data concern city-cycle fuel consumption