1. Turkish Spam V01: The TurkishSpam data set contains spam and normal emails written in Turkish.
2. Twitter Data set for Arabic Sentiment Analysis: Sentiment Analysis (SA) has been studied extensively for English but much less for Arabic. Two main approaches have been devised: corpus-based and lexicon-based.
3. GitHub MUSAE: A social network of GitHub users with user-level attributes, connectivity data and a binary target variable.
4. Autism Screening Adult: Autistic Spectrum Disorder screening data for adults. The dataset is intended for classification and predictive tasks.
5. Multimodal Damage Identification for Humanitarian Computing: 5879 captioned images (image and text) from social media related to damage during natural disasters and wars, belonging to 6 classes: Fires, Floods, Natural landscape, Infrastructural, Human, Non-damage.