1. Wisesight Sentiment Corpus: Social media messages in Thai, each labeled with a sentiment (positive, neutral, negative, or question).
2. Twitter Data set for Arabic Sentiment Analysis: Sentiment analysis (SA) has been studied extensively for English but much less for Arabic. Two main approaches have been devised: corpus-based and lexicon-based.
3. Multimodal Damage Identification for Humanitarian Computing: 5,879 captioned images (image and text) from social media depicting damage caused by natural disasters or wars, each belonging to one of 6 classes: Fires, Floods, Natural landscape, Infrastructural, Human, Non-damage.
4. Gender by Name: This dataset attributes first names to genders, giving counts and probabilities. It combines open-source government data from the US, UK, Canada, and Australia.