Browse Datasets

Breast Cancer Wisconsin (Diagnostic)

Diagnostic Wisconsin Breast Cancer Database.

Breast Cancer Wisconsin (Original)

Original Wisconsin Breast Cancer Database

Breast Cancer

This breast cancer domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. This is one of three domains provided by the Oncology Institute that has repeatedly appeared in the machine learning literature. (See also lymphography and primary-tumor.)

Credit Approval

This data concerns credit card applications; good mix of attributes

Lung Cancer

Lung cancer data; no attribute definitions

Cervical Cancer (Risk Factors)

This dataset focuses on the prediction of indicators/diagnosis of cervical cancer. The features cover demographic information, habits, and historic medical records.

Clickstream Data for Online Shopping

The dataset contains information on clickstream from online store offering clothing for pregnant women.

Differentiated Thyroid Cancer Recurrence

This data set contains 13 clinicopathologic features aiming to predict recurrence of well differentiated thyroid cancer. The data set was collected in duration of 15 years and each patient was followed for at least 10 years.

Glioma Grading Clinical and Mutation Features

Gliomas are the most common primary tumors of the brain. They can be graded as LGG (Lower-Grade Glioma) or GBM (Glioblastoma Multiforme) depending on the histological/imaging criteria. Clinical and molecular/mutation factors are also very crucial for the grading process. Molecular tests are expensive to help accurately diagnose glioma patients. In this dataset, the most frequently mutated 20 genes and 3 clinical features are considered from TCGA-LGG and TCGA-GBM brain glioma projects. The prediction task is to determine whether a patient is LGG or GBM with a given clinical and molecular/mutation features. The main objective is to find the optimal subset of mutation genes and clinical features for the glioma grading process to improve performance and reduce costs.

TCGA Kidney Cancers

The TCGA Kidney Cancers Dataset is a bulk RNA-seq dataset that contains transcriptome profiles of patients diagnosed with three different subtypes of kidney cancers. This dataset can be used to make predictions about the specific subtype of kidney cancers given the normalized transcriptome profile data, as well as providing a hands-on experience on large and sparse genomic information.

0 to 10 of 24

By using the UCI Machine Learning Repository, you acknowledge and accept the cookies and privacy practices used by the UCI Machine Learning Repository.

Read Policy