1. Reuters Transcribed Subset: This dataset was created by reading aloud 200 files from the 10 largest Reuters
classes and transcribing the recordings with an Automatic Speech Recognition system.
2. Farm Ads: This dataset was collected from text ads found on twelve websites that deal with various farm-animal-related topics. The binary labels indicate whether or not the content owner approves of the ad.
3. CNAE-9: This dataset contains 1080 documents of free-text business descriptions of Brazilian companies, categorized into a
subset of 9 categories.