1. Gender by Name: This dataset attributes first names to genders, giving counts and probabilities. It combines open-source government data from the US, UK, Canada, and Australia.
2. Wisesight Sentiment Corpus: Social media messages in Thai language with sentiment label (positive, neutral, negative, question).
3. Real-time Election Results: Portugal 2019: Data set of the real-time election results of the 2019 Portuguese Parliamentary Election.
4. Sports articles for objectivity analysis: 1000 sports articles were labeled using Amazon Mechanical Turk as objective or subjective. The raw texts, extracted features, and the URLs from which the articles were retrieved are provided.
5. A study of Asian Religious and Biblical Texts: Mainly from Project Gutenberg, we combine Upanishads, Yoga Sutras, Buddha Sutras, Tao Te Ching and Book of Wisdom, Book of Proverbs, Book of Ecclesiastes and Book of Ecclesiasticus