1. Multimodal Damage Identification for Humanitarian Computing: 5879 captioned images (image and text) from social media related to damage during natural disasters and wars, belonging to 6 classes: Fires, Floods, Natural landscape, Infrastructural, Human, Non-damage.
2. Speaker Accent Recognition: Data set featuring single English words read by speakers from six different countries for accent detection and recognition.
3. Gender Gap in Spanish WP: Data set used to estimate the number of women editors and their editing practices in the Spanish Wikipedia.
4. Autism Screening Adult: Autistic Spectrum Disorder screening data for adults, intended for classification and prediction tasks.
5. Drug consumption (quantified): Classify the type of drug consumer from personality data.
6. Student Performance: Predict student performance in secondary education (high school).
7. Higher Education Students Performance Evaluation Dataset: The data were collected in 2019 from students of the Faculty of Engineering and the Faculty of Educational Sciences. The purpose is to predict students' end-of-term performance using ML techniques.
8. Sports articles for objectivity analysis: 1000 sports articles were labeled using Amazon Mechanical Turk as objective or subjective. The raw texts, extracted features, and the URLs from which the articles were retrieved are provided.
9. A study of Asian Religious and Biblical Texts: Drawing mainly on Project Gutenberg, we combine the Upanishads, Yoga Sutras, Buddha Sutras, and Tao Te Ching with the Book of Wisdom, Book of Proverbs, Book of Ecclesiastes, and Book of Ecclesiasticus.