Center for Machine Learning and Intelligent Systems
About  Citation Policy  Donate a Data Set  Contact

Repository Web            Google
View ALL Data Sets

gene expression cancer RNA-Seq Data Set
Download: Data Folder, Data Set Description

Abstract: This collection of data is part of the RNA-Seq (HiSeq) PANCAN data set, it is a random extraction of gene expressions of patients having different types of tumor: BRCA, KIRC, COAD, LUAD and PRAD.

Data Set Characteristics:  


Number of Instances:




Attribute Characteristics:


Number of Attributes:


Date Donated


Associated Tasks:

Classification, Clustering

Missing Values?


Number of Web Hits:



Samuele Fiorini, samuele.fiorini '@', University of Genoa, redistributed under Creative Commons license ( from!Synapse:syn4301332.

Data Set Information:

Samples (instances) are stored row-wise. Variables (attributes) of each sample are RNA-Seq gene expression levels measured by illumina HiSeq platform.

Attribute Information:

A dummy name (gene_XX) is given to each attribute. Check the original submission ([Web Link]#!Synapse:syn4301332), or the platform specs for the complete list of probes name. The attributes are ordered consitently with the original submission.

Relevant Papers:

Weinstein, John N., et al. 'The cancer genome atlas pan-cancer analysis project.' Nature genetics 45.10 (2013): 1113-1120.

Citation Request:

The original data set (hosted at [Web Link]#!Synapse:syn4301332) is maintained by the cancer genome atlas pan-cancer analysis project.

Supported By:

 In Collaboration With:

About  ||  Citation Policy  ||  Donation Policy  ||  Contact  ||  CML