1. Wisesight Sentiment Corpus: Social media messages in Thai language with sentiment label (positive, neutral, negative, question).
2. Roman Urdu Data Set: Roman Urdu (the scripting style for Urdu language) is one of the limited resource languages.A data corpus comprising of more than 20000 records was collected.
3. Online Retail II: A real online retail transaction data set of two years.
4. Guitar Chords finger positions: Position of the fingers for 2633 guitar chords in standard tuning (double checked with software)
5. Gender by Name: This dataset attributes first names to genders, giving counts and probabilities. It combines open-source government data from the US, UK, Canada, and Australia.
6. Drug Review Dataset (Drugs.com): The dataset provides patient reviews on specific drugs along with related conditions and a 10 star patient rating reflecting overall patient satisfaction.
7. Drug Review Dataset (Druglib.com): The dataset provides patient reviews on specific drugs along with related conditions. Reviews and ratings are grouped into reports on the three aspects benefits, side effects and overall comment.
8. BuddyMove Data Set: User interest information extracted from user reviews published in holidayiq.com about various types of point of interests in South India
9. Badges: Badges labeled with a "+" or "-" as a function of a person's name