1. Quadruped Mammals: The file animals.c is a data generator of structured instances representing quadruped animals.
2. Lung Cancer: Lung cancer data; no attribute definitions.
3. Soybean (Small): Michalski's famous soybean disease database.
4. Multi-view Brain Networks: Multi-layer brain network datasets derived from resting-state electroencephalography (EEG) data.
5. Cervical Cancer Behavior Risk: The dataset contains 19 attributes regarding cervical cancer (ca cervix) behavior risk. The class label is ca_cervix, whose values 1 and 0 indicate respondents with and without cervical cancer, respectively.
6. Fertility: 100 volunteers provided a semen sample analyzed according to the WHO 2010 criteria. Sperm concentration is related to socio-demographic data, environmental factors, health status, and life habits.
7. Zoo: Artificial, 7 classes of animals.
8. Autistic Spectrum Disorder Screening Data for Adolescent: Autism screening data for adolescents, suitable for classification and predictive tasks.
9. Molecular Biology (Promoter Gene Sequences): E. coli promoter gene sequences (DNA) with partial domain theory.
10. Breast Tissue: Dataset with electrical impedance measurements of freshly excised breast tissue samples.
11. Breast Cancer Coimbra: Clinical features were observed or measured for 64 patients with breast cancer and 52 healthy controls.
12. Early biomarkers of Parkinson’s disease based on natural connected speech: Predict a pattern of neurodegeneration from speech features obtained from patients with early untreated Parkinson’s disease and patients at high risk of developing Parkinson’s disease.
13. Echocardiogram: Data for classifying whether patients will survive for at least one year after a heart attack.
14. Lymphography: This lymphography domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. (Restricted access)
15. Nasarian CAD Dataset: This dataset comprises records of 150 subjects (all male employees in Iran who visited the Abadan Occupational (Industrial) Medicine Clinic) and 52 features.
16. Hepatitis: From G. Gong (CMU); mostly Boolean or numeric-valued attribute types; includes cost data (donated by Peter Turney).
17. HCC Survival: The Hepatocellular Carcinoma (HCC) dataset was collected at a university hospital in Portugal. It contains real clinical data of 165 patients diagnosed with HCC.
18. Divorce Predictors data set: Participants completed the “Personal Information Form” and the “Divorce Predictors Scale”.
19. Divorce Predictors data set: Participants completed the Personal Information Form and the Divorce Predictors Scale.
20. Bone marrow transplant: children: The data set describes pediatric patients with several hematologic diseases who underwent unmanipulated allogeneic unrelated-donor hematopoietic stem cell transplantation.
21. Amphibians: A multilabel classification problem. The goal is to predict the presence of amphibian species near water reservoirs based on features obtained from GIS systems and satellite images.
22. Parkinsons: Oxford Parkinson's Disease Detection Dataset.
23. Breast Cancer Wisconsin (Prognostic): Prognostic Wisconsin Breast Cancer Database.
24. Risk Factor prediction of Chronic Kidney Disease: Chronic kidney disease (CKD) is a growing medical problem that reduces renal function and subsequently damages the kidneys.
25. Audiology (Standardized): Standardized version of the original audiology database.
26. Parkinson Dataset with replicated acoustic features: Contains acoustic features extracted from 3 voice recording replications of the sustained /a/ phonation for each of the 80 subjects (40 of them with Parkinson's disease).
27. Algerian Forest Fires Dataset: The dataset includes 244 instances that combine data from two regions of Algeria.
28. SPECT Heart: Data on cardiac Single Photon Emission Computed Tomography (SPECT) images. Each patient is classified into one of two categories: normal or abnormal.
29. SPECTF Heart: Data on cardiac Single Photon Emission Computed Tomography (SPECT) images. Each patient is classified into one of two categories: normal or abnormal.
30. Statlog (Heart): A heart disease database similar to one already present in the repository (Heart Disease databases), but in a slightly different form.
31. Quality Assessment of Digital Colposcopies: This dataset explores the subjective quality assessment of digital colposcopies.
32. Autistic Spectrum Disorder Screening Data for Children: Autism screening data for children, suitable for classification and predictive tasks.
33. Heart failure clinical records: This dataset contains the medical records of 299 patients with heart failure, collected during their follow-up period; each patient profile has 13 clinical features.
34. Heart Disease: 4 databases: Cleveland, Hungary, Switzerland, and the VA Long Beach.
35. Extension of Z-Alizadeh Sani dataset: It was collected for coronary artery disease (CAD) diagnosis.
36. Z-Alizadeh Sani: It was collected for CAD diagnosis.
37. Soybean (Large): Michalski's famous soybean disease database.
38. Forest type mapping: Multi-temporal remote sensing data of a forested area in Japan. The goal is to map different forest types using spectral data.
39. Primary Tumor: From the Ljubljana Oncology Institute.
40. Dermatology: The aim of this dataset is to determine the type of erythemato-squamous disease.
41. Horse Colic: Well-documented attributes; 368 instances with 28 attributes (continuous, discrete, and nominal); 30% missing values.
42. Refractive errors: Effect of lifestyle and genetics on refractive errors of the eye.
43. Thoracic Surgery Data: The data is dedicated to a classification problem related to post-operative life expectancy in lung cancer patients: class 1, death within one year after surgery; class 2, survival.
44. Breast Cancer Wisconsin (Diagnostic): Diagnostic Wisconsin Breast Cancer Database.
45. ILPD (Indian Liver Patient Dataset): This data set contains 10 variables: age, gender, total bilirubin, direct bilirubin, total proteins, albumin, A/G ratio, SGPT, SGOT, and Alkphos.
46. HCV data: The data set contains laboratory values of blood donors and hepatitis C patients, along with demographic values such as age.
47. Breast Cancer Wisconsin (Original): Original Wisconsin Breast Cancer Database.
48. QSAR Bioconcentration classes dataset: Dataset of manually curated bioconcentration factors (BCF, fish) and mechanistic classes for QSAR modeling.
49. Cervical cancer (Risk Factors): This dataset focuses on predicting indicators/diagnosis of cervical cancer. The features cover demographic information, habits, and historic medical records.
50. Parkinson Speech Dataset with Multiple Types of Sound Recordings: The training data belongs to 20 Parkinson's disease (PD) patients and 20 healthy subjects. Multiple types of sound recordings (26) are taken from all subjects.
51. Mice Protein Expression: Expression levels of 77 proteins measured in the cerebral cortex of 8 classes of control and Down syndrome mice exposed to context fear conditioning, a task used to assess associative learning.
52. Diabetic Retinopathy Debrecen Data Set: This dataset contains features extracted from the Messidor image set to predict whether an image contains signs of diabetic retinopathy.
53. Hepatitis C Virus (HCV) for Egyptian patients: Egyptian patients who received HCV treatment for about 18 months. Discretization should be applied based on expert recommendations; an attached file shows how.
54. One-hundred plant species leaves data set: Sixteen leaf samples from each of one hundred plant species. For each sample, a shape descriptor, a fine-scale margin, and a texture histogram are given.
55. Estimation of obesity levels based on eating habits and physical condition: This dataset includes data for estimating obesity levels in individuals from Mexico, Peru, and Colombia, based on their eating habits and physical condition.
56. Cardiotocography: The dataset consists of measurements of fetal heart rate (FHR) and uterine contraction (UC) features on cardiotocograms classified by expert obstetricians.
57. Molecular Biology (Splice-junction Gene Sequences): Primate splice-junction gene sequences (DNA) with associated imperfect domain theory.
58. Anuran Calls (MFCCs): Acoustic features extracted from syllables of anuran (frog) calls, including the family, genus, and species labels (multilabel).
59. Thyroid Disease: 10 separate databases from the Garvan Institute.
60. Mushroom: From the Audubon Society Field Guide; mushrooms described in terms of physical characteristics; classification: poisonous or edible.
61. EEG Steady-State Visual Evoked Potential Signals: This database consists of recordings from 30 subjects performing a Brain-Computer Interface task based on Steady-State Visual Evoked Potentials (BCI-SSVEP).
62. Codon usage: DNA codon usage frequencies of a large sample of diverse biological organisms from different taxa.
63. EEG Eye State: The data set consists of 14 EEG values and a value indicating the eye state.
64. KEGG Metabolic Relation Network (Directed): KEGG metabolic pathways modeled as a directed relation network. A variety of graphical features is presented.
65. Secondary Mushroom Dataset: Dataset of simulated mushrooms for binary classification into edible and poisonous.
66. KEGG Metabolic Reaction Network (Undirected): KEGG metabolic pathways modeled as an undirected reaction network. A variety of graphical features is presented.
67. Diabetes 130-US hospitals for years 1999-2008: This data has been prepared to analyze factors related to readmission as well as other outcomes pertaining to patients with diabetes.
68. Covertype: Forest CoverType dataset.