1. Thyroid Disease: Ten separate databases from the Garavan Institute. 2. Amazon Commerce reviews set: The dataset is used for authorship identification in online Writeprint, a new research field of pattern recognition. 3. Reuter_50_50: The dataset is used for authorship identification in online Writeprint, a new research field of pattern recognition. 4. SMS Spam Collection: A public set of labeled SMS messages collected for mobile phone spam research. 5. Molecular Biology (Splice-junction Gene Sequences): Primate splice-junction gene sequences (DNA) with an associated imperfect domain theory. |