1. Reuters Transcribed Subset: This dataset was created by reading aloud 200 files from the 10 largest Reuters classes and using an automatic speech recognition (ASR) system to create the transcriptions.
2. Farm Ads: This dataset was collected from text ads found on twelve websites that deal with various farm-animal-related topics. The binary labels indicate whether or not the content owner approves of the ad.
3. CNAE-9: This dataset contains 1080 documents of free-text business descriptions of Brazilian companies, categorized into a subset of 9 categories.
4. Eco-hotel: This dataset includes textual reviews of the Areias do Seixo Eco-Resort from both online (e.g., TripAdvisor) and offline (e.g., the guest book) sources.
5. Online Retail II: A real-world online retail transaction dataset covering two years.