1. BuddyMove Data Set: User interest information extracted from user reviews published on holidayiq.com about various types of points of interest in South India
2. USPTO Algorithm Challenge, run by NASA-Harvard Tournament Lab and TopCoder Problem: Pat: Data used for USPTO Algorithm Competition. Contains drawing pages from US patents with manually labeled figure and part labels.
3. Auto MPG: Revised from CMU StatLib library, data concerns city-cycle fuel consumption
4. Bach Chorales: Time-series data based on chorales; challenge is to learn generative grammar; data in Lisp
5. Teaching Assistant Evaluation: The data consist of evaluations of teaching performance; scores are "low", "medium", or "high"
6. MONK's Problems: A set of three artificial domains over the same attribute space; used to test a wide range of induction algorithms
7. Russian Corpus of Biographical Texts: Sentence classification (Russian). The corpus contains Wikipedia texts split into sentences. Each sentence has a topic label.
8. ICMLA 2014 Accepted Papers Data Set: This data set comprises the metadata for the 2014 ICMLA conference's accepted papers, including ID, paper titles, authors' keywords, abstracts, and the sessions in which they were presented.
9. Badges: Badges labeled with a "+" or "-" as a function of a person's name