TCGA Kidney Cancers
Linked on 9/25/2023
The TCGA Kidney Cancers Dataset is a bulk RNA-seq dataset containing transcriptome profiles of patients diagnosed with one of three subtypes of kidney cancer. It can be used to predict the kidney cancer subtype from a normalized transcriptome profile, and it offers hands-on experience with large, sparse genomic data.
Dataset Characteristics
Tabular, Multivariate
Subject Area
Health and Medicine
Associated Tasks
Classification, Clustering
Feature Type
Real
# Instances
1024
# Features
60660
Dataset Information
For what purpose was the dataset created?
To better understand the relationship between the human genome and cancers.
Who funded the creation of the dataset?
The NIH.
What do the instances in this dataset represent?
- Bulk transcriptome profiles
- Kidney cancer patients
- Worldwide population
Are there recommended data splits?
Cross validation or a fixed train-test split could be used.
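Because the three subtypes may not be equally represented, a fixed split is often stratified by class label. The following is a minimal sketch using only the Python standard library; the helper name stratified_split, the 20% test fraction, and the synthetic label counts are assumptions made for illustration and do not come from the dataset.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) with per-class proportions preserved.

    Hypothetical helper for illustration; not part of ucimlrepo.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Hold out roughly test_fraction of each class, at least one sample.
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Synthetic labels for illustration only; these class counts are made up.
labels = ["TCGA-KICH"] * 10 + ["TCGA-KIRC"] * 50 + ["TCGA-KIRP"] * 30
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
print(len(train_idx), len(test_idx))
```

With real data, the returned index lists could be used to select rows of the feature and target tables. scikit-learn's train_test_split with its stratify argument, or StratifiedKFold for cross validation, would serve the same purpose.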
Does the dataset contain data that might be considered sensitive in any way?
This dataset contains the variables age, race, and ethnicity.
Was there any data preprocessing performed?
Fragments Per Kilobase of transcript per Million mapped fragments (FPKM) normalization.
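As a rough illustration of what FPKM normalization computes, the sketch below applies the standard textbook formula: fragments mapped to a gene, scaled by gene length in kilobases and by library size in millions of mapped fragments. The function name fpkm and the example numbers are hypothetical, and the exact pipeline variant used to produce this dataset is not stated here.

```python
def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    Textbook definition; assumed, not confirmed, to match the dataset's pipeline.
    """
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)


# Hypothetical numbers: 500 fragments on a 2,000 bp gene,
# in a library of 20 million mapped fragments.
example = fpkm(500, 2_000, 20_000_000)
print(example)
```

Dividing by gene length keeps long genes from appearing more highly expressed simply because they collect more fragments, and dividing by library size makes samples sequenced at different depths comparable.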
Has Missing Values?
No
Introductory Paper
By J. Weinstein, E. Collisson, G. Mills, K. Shaw, B. Ozenberger, K. Ellrott, I. Shmulevich, C. Sander, Joshua M. Stuart. 2013
Published in Nature Genetics
Variable Information
Bulk RNA-Seq expression values normalized using the FPKM (fragments per kilobase of transcript per million mapped fragments) method
Class Labels
- TCGA-KICH
- TCGA-KIRC
- TCGA-KIRP
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
tcga_kidney_cancers = fetch_ucirepo(id=892)

# data (as pandas dataframes)
X = tcga_kidney_cancers.data.features
y = tcga_kidney_cancers.data.targets

# metadata
print(tcga_kidney_cancers.metadata)

# variable information
print(tcga_kidney_cancers.variables)
Weinstein, J., Collisson, E., Mills, G., Shaw, K., Ozenberger, B., Ellrott, K., Shmulevich, I., Sander, C., & Stuart, J. M. (2013). TCGA Kidney Cancers [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5702T.
Citations/Acknowledgements
If you use this dataset, please follow the acknowledgment policy on the original dataset website.
Creators
J. Weinstein
E. Collisson
G. Mills
K. Shaw
B. Ozenberger
K. Ellrott
I. Shmulevich
C. Sander
Joshua M. Stuart