1. Roman Urdu Data Set: Roman Urdu (Urdu written in the Roman script) is a low-resource language. A data corpus comprising more than 20,000 records was collected.
2. Gender by Name: This dataset attributes first names to genders, giving counts and probabilities. It combines open-source government data from the US, UK, Canada, and Australia.
3. Wisesight Sentiment Corpus: Social media messages in Thai, each with a sentiment label (positive, neutral, negative, or question).
4. Guitar Chords finger positions: Finger positions for 2,633 guitar chords in standard tuning (double-checked with software).
5. Drug Review Dataset (Drugs.com): The dataset provides patient reviews of specific drugs, along with the related conditions and a 10-star patient rating reflecting overall patient satisfaction.
6. Drug Review Dataset (Druglib.com): The dataset provides patient reviews of specific drugs, along with the related conditions. Reviews and ratings are grouped into reports on three aspects: benefits, side effects, and overall comment.
7. Online Retail II: Real transaction data from an online retailer, covering two years.
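The Gender by Name entry pairs counts with probabilities drawn from several national sources. The sketch below shows one way such probabilities can be derived. It assumes, without confirmation from the dataset description, that a name's probability for a gender is that gender's share of the name's count pooled across all sources. The record layout `(name, gender, count)`, the function `gender_probabilities`, and the sample rows are hypothetical and are not taken from the actual dataset.

```python
from collections import defaultdict


def gender_probabilities(records):
    """Pool counts per (name, gender) across sources and return
    P(gender | name) as count / total count for that name.

    Hypothetical input: an iterable of (name, gender, count) tuples.
    """
    # name -> gender -> pooled count; names are lowercased so that
    # "Alex" and "alex" from different sources are merged (an illustrative choice).
    counts = defaultdict(lambda: defaultdict(int))
    for name, gender, count in records:
        counts[name.lower()][gender] += count

    probs = {}
    for name, by_gender in counts.items():
        total = sum(by_gender.values())
        # Each gender's share of the name's total; shares sum to 1.
        probs[name] = {g: c / total for g, c in by_gender.items()}
    return probs


# Illustrative rows only (not real data): the same name from two hypothetical sources.
records = [
    ("Alex", "M", 300),
    ("alex", "F", 100),
    ("Alex", "M", 100),
]
print(gender_probabilities(records))
```

Pooling raw counts before dividing weights each source by its size; averaging per-country probabilities instead would give every source equal weight, which is a different (and here unassumed) design.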