1. Reuters RCV1 RCV2 Multilingual, Multiview Text Categorization Test collection: This test collection contains feature characteristics of documents originally written in five different languages and their translations, over a common set of 6 categories.
2. chipseq: ChIP-seq experiments characterize protein modifications or binding at specific genomic locations in specific samples. The machine learning problem in these data is structured binary classification.
3. HIV-1 protease cleavage: The data contains lists of octamers (8 amino acids) and a flag (-1 or 1) indicating whether HIV-1 protease will cleave at the central position (between amino acids 4 and 5).
4. Bar Crawl: Detecting Heavy Drinking: Accelerometer and transdermal alcohol content data from a college bar crawl. Used to predict heavy drinking episodes via mobile data.
5. Bar Crawl: Detecting Heavy Drinking: Accelerometer and transdermal alcohol content data from a college bar crawl. Used to predict heavy drinking episodes via mobile data.
6. Localization Data for Person Activity: Data contains recordings of five people performing different activities. Each person wore four sensors (tags) while performing the same scenario five times.
7. Activity recognition with healthy older people using a batteryless wearable sensor: Sequential motion data from 14 healthy older people aged 66 to 86, collected with a batteryless wearable sensor worn on top of their clothing, for the recognition of activities in clinical environments.
8. Diabetic Retinopathy Debrecen Data Set: This dataset contains features extracted from the Messidor image set to predict whether an image contains signs of diabetic retinopathy or not.
9. Thyroid Disease: 10 separate databases from the Garavan Institute.
10. One-hundred plant species leaves data set: Sixteen leaf samples for each of one hundred plant species. For each sample, a shape descriptor, a fine-scale margin histogram and a texture histogram are given.
11. Codon usage: DNA codon usage frequencies of a large sample of diverse biological organisms from different taxa.
12. Simulated Falls and Daily Living Activities Data Set: 20 falls and 16 daily living activities were performed by 17 volunteers with 5 repetitions each (3,060 instances) while wearing 6 sensors attached to their head, chest, waist, wrist, thigh and ankle.
13. KASANDR: KASANDR is a novel, publicly available collection for recommendation systems that records the behavior of customers of the European leader in e-Commerce advertising, Kelkoo.
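
As an illustration of the record structure described for the HIV-1 protease cleavage data (item 3), the following minimal sketch parses octamer/flag pairs and one-hot encodes each octamer. The "octamer,label" line format, the helper names parse_records and one_hot, and the two sample strings are assumptions made for illustration only; the sample strings are made up and are not records from the dataset.

```python
# Made-up sample records in an ASSUMED "octamer,label" text format.
# These strings and labels are illustrative only, not real dataset entries.
SAMPLE = """AAAKFERQ,-1
SLNLRETN,1
"""

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids


def parse_records(text):
    """Parse 'octamer,label' lines into (octamer, label) pairs (hypothetical helper)."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        octamer, flag = line.split(",")
        label = int(flag)
        # Each record must be 8 residues long with a flag of -1 or 1.
        if len(octamer) != 8 or label not in (-1, 1):
            raise ValueError(f"unexpected record: {line!r}")
        records.append((octamer, label))
    return records


def one_hot(octamer):
    """Encode an octamer as a flat 8 x 20 = 160-element 0/1 vector (hypothetical helper)."""
    width = len(AMINO_ACIDS)
    vector = [0] * (len(octamer) * width)
    for position, residue in enumerate(octamer):
        vector[position * width + AMINO_ACIDS.index(residue)] = 1
    return vector


records = parse_records(SAMPLE)
features = [one_hot(octamer) for octamer, _ in records]
labels = [label for _, label in records]
```

Each feature vector has exactly eight non-zero entries, one per residue position, so the candidate cleavage site between amino acids 4 and 5 falls between the fourth and fifth blocks of 20 entries.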