gene expression cancer RNA-Seq
Donated on 6/8/2016
This collection of data is part of the RNA-Seq (HiSeq) PANCAN data set. It is a random extraction of gene expression levels from patients with different types of tumor: BRCA, KIRC, COAD, LUAD and PRAD.
Dataset Characteristics
Multivariate
Subject Area
Biology
Associated Tasks
Classification, Clustering
Feature Type
Real
# Instances
801
# Features
20531
Dataset Information
Additional Information
Samples (instances) are stored row-wise. The variables (attributes) of each sample are RNA-Seq gene expression levels measured on the Illumina HiSeq platform.
Has Missing Values?
No
Variable Information
A dummy name (gene_XX) is given to each attribute. Check the original submission (https://www.synapse.org/#!Synapse:syn4301332) or the platform specs for the complete list of probe names. The attributes are ordered consistently with the original submission.
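The row-wise layout described above can be sketched with a small parser. This is only an illustration under stated assumptions: it assumes a CSV file with a header row of gene_XX names and a sample identifier in the first column, which the page itself does not confirm. The function name `read_expression_matrix` and the demo values are hypothetical.

```python
import csv
import io


def read_expression_matrix(handle):
    """Parse a row-wise expression table: one sample per row, one gene per column.

    Assumed layout (not confirmed by the dataset page): a header row whose
    first cell is a sample-id column, followed by gene_XX attribute names.
    """
    reader = csv.reader(handle)
    header = next(reader)
    gene_names = header[1:]  # skip the sample-id column
    sample_ids, matrix = [], []
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        values = [float(v) for v in row[1:]]
        if len(values) != len(gene_names):
            raise ValueError(f"row {row[0]!r} has {len(values)} values, "
                             f"expected {len(gene_names)}")
        sample_ids.append(row[0])
        matrix.append(values)
    return sample_ids, gene_names, matrix


# Tiny made-up table standing in for the real 801 x 20531 matrix.
demo = io.StringIO(
    ",gene_0,gene_1,gene_2\n"
    "sample_0,0.0,2.5,3.1\n"
    "sample_1,1.2,0.0,4.8\n"
)
sample_ids, gene_names, matrix = read_expression_matrix(demo)
print(len(matrix), "samples x", len(gene_names), "genes")
```

On the real data, the same check should report 801 samples and 20531 genes if the assumed layout holds.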
Dataset Files
File | Size |
---|---|
TCGA-PANCAN-HiSeq-801x20531.tar.gz | 69.5 MB |
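The archive listed above is a gzip-compressed tarball, so it can be inspected and unpacked with Python's standard `tarfile` module. The sketch below is a minimal example, not an official loader: the helper names `list_members` and `extract_archive` and the output directory `data` are hypothetical, and the page does not state which files the archive contains.

```python
import tarfile
from pathlib import Path


def list_members(archive_path):
    """Return the names of all entries in a gzip-compressed tar archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return archive.getnames()


def extract_archive(archive_path, dest_dir):
    """Extract the archive into dest_dir, refusing entries that would escape it."""
    dest = Path(dest_dir).resolve()
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive.getmembers():
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        archive.extractall(dest)
    return dest


if __name__ == "__main__":
    # File name taken from the table above; it must be downloaded first.
    archive_name = "TCGA-PANCAN-HiSeq-801x20531.tar.gz"
    if Path(archive_name).exists():
        print(list_members(archive_name))
        extract_archive(archive_name, "data")  # "data" is a hypothetical target directory
```

Listing the members before extracting is a cheap way to see the file layout without writing anything to disk.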
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
gene_expression_cancer_rna_seq = fetch_ucirepo(id=401)

# data (as pandas dataframes)
X = gene_expression_cancer_rna_seq.data.features
y = gene_expression_cancer_rna_seq.data.targets

# metadata
print(gene_expression_cancer_rna_seq.metadata)

# variable information
print(gene_expression_cancer_rna_seq.variables)
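Once the targets are loaded, a quick sanity check is to count the samples per tumor type. The following sketch uses only the standard library and assumes the labels have already been flattened into a plain sequence of strings; how to obtain that sequence from `y` depends on its column layout, which is not shown here. The helper `summarize_labels` and the demo labels are hypothetical.

```python
from collections import Counter

# The five tumor types named in the dataset description.
EXPECTED_CLASSES = ("BRCA", "KIRC", "COAD", "LUAD", "PRAD")


def summarize_labels(labels):
    """Return {class: (count, fraction)} for each expected tumor type."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labels given")
    unexpected = sorted(set(counts) - set(EXPECTED_CLASSES))
    if unexpected:
        raise ValueError(f"unexpected class labels: {unexpected}")
    return {name: (counts[name], counts[name] / total) for name in EXPECTED_CLASSES}


# Made-up labels for illustration only; the real dataset has 801 of them.
demo_labels = ["BRCA", "BRCA", "KIRC", "PRAD"]
for name, (count, fraction) in summarize_labels(demo_labels).items():
    print(f"{name}: {count} samples ({fraction:.1%})")
```

Rejecting unexpected labels early helps catch a mismatched or mis-parsed label file before any classification or clustering run.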
Fiorini, S. (2016). gene expression cancer RNA-Seq [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5R88H.
Creators
Samuele Fiorini
DOI
10.24432/C5R88H
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.