1. Badges: Badges labeled with a "+" or "-" as a function of a person's name.
2. Roman Urdu Data Set: Roman Urdu (a scripting style for the Urdu language) is a limited-resource language. A data corpus comprising more than 20,000 records was collected.
3. Gender by Name: This dataset attributes first names to genders, giving counts and probabilities. It combines open-source government data from the US, UK, Canada, and Australia.
4. Wisesight Sentiment Corpus: Social media messages in the Thai language with sentiment labels (positive, neutral, negative, question).
5. Guitar Chords finger positions: Finger positions for 2633 guitar chords in standard tuning (double-checked with software).
6. Drug Review Dataset (Drugs.com): The dataset provides patient reviews on specific drugs along with related conditions and a 10-star patient rating reflecting overall patient satisfaction.
7. BuddyMove Data Set: User interest information extracted from user reviews published on holidayiq.com about various types of points of interest in South India.
8. Drug Review Dataset (Druglib.com): The dataset provides patient reviews on specific drugs along with related conditions. Reviews and ratings are grouped into reports on three aspects: benefits, side effects, and overall comment.
9. Online Retail II: A real online retail transaction data set covering two years.