Center for Machine Learning and Intelligent Systems

HCC Survival Data Set

Abstract: The Hepatocellular Carcinoma (HCC) dataset was collected at a University Hospital in Portugal. It contains real clinical data from 165 patients diagnosed with HCC.

Data Set Characteristics: Multivariate
Number of Instances: 165
Area: Life
Attribute Characteristics: Integer, Real
Number of Attributes: 49
Date Donated: 2017-11-29
Associated Tasks: Classification
Missing Values? Yes



Source:

Donors:
Miriam Seoane Santos, Department of Informatics Engineering, Faculty of Sciences and Technology, University of Coimbra (miriams '@' student.dei.uc.pt)
Pedro Henriques Abreu, Department of Informatics Engineering, Faculty of Sciences and Technology, University of Coimbra (pha '@' dei.uc.pt)
Armando Carvalho, Internal Medicine Service, Hospital and University Centre of Coimbra (aspcarvalho '@' gmail.com)
Adélia Simão, Internal Medicine Service, Hospital and University Centre of Coimbra (adeliasimao '@' gmail.com)


Data Set Information:

The HCC dataset was obtained at a University Hospital in Portugal and contains demographic, risk-factor, laboratory, and overall survival features of 165 real patients diagnosed with HCC. Its 49 features were selected according to the EASL-EORTC (European Association for the Study of the Liver - European Organisation for Research and Treatment of Cancer) Clinical Practice Guidelines, which are the current state of the art in the management of HCC.

This is a heterogeneous dataset, with 23 quantitative and 26 qualitative variables. Overall, missing data account for 10.22% of the whole dataset, and only eight patients (4.85%) have complete information in all fields. The target variable is survival at 1 year, encoded as a binary variable: 0 (dies) and 1 (lives). The classes are moderately imbalanced: 63 cases are labeled “dies” and 102 are labeled “lives”.
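
The missing-data and class-balance figures above are simple ratios. The following is a minimal sketch of one way to recompute them from a loaded table. It assumes that each patient is a row of feature values with missing entries already converted to `None`, however they appear in the raw file, and that the 0/1 labels are held in a separate list. The function name `summarize` is hypothetical, and the toy rows are illustrative only. They are not taken from the HCC data.

```python
from collections import Counter
from collections.abc import Sequence
from typing import Optional

# One patient = one row of feature values; None marks a missing value (assumption).
Row = Sequence[Optional[float]]


def summarize(rows: Sequence[Row], labels: Sequence[int]) -> dict:
    """Return the missing-cell rate, complete-case statistics, and class balance."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    n_cells = sum(len(r) for r in rows)
    n_missing = sum(v is None for r in rows for v in r)
    # A complete case has no missing value in any field.
    n_complete = sum(all(v is not None for v in r) for r in rows)
    counts = Counter(labels)  # maps each label (0 = dies, 1 = lives) to its count
    return {
        "missing_rate": n_missing / n_cells,
        "complete_cases": n_complete,
        "complete_rate": n_complete / len(rows),
        "class_counts": dict(counts),
        # Majority-class size divided by minority-class size.
        "imbalance_ratio": max(counts.values()) / min(counts.values()),
    }


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the HCC dataset.
    toy_rows = [[1.0, None, 3.0], [4.0, 5.0, 6.0], [None, None, 9.0], [7.0, 8.0, 0.0]]
    toy_labels = [1, 0, 1, 1]
    print(summarize(toy_rows, toy_labels))
```

Applied to the counts reported above, the same arithmetic gives 8 / 165 ≈ 4.85% complete cases and a majority-to-minority ratio of 102 / 63 ≈ 1.62.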

A detailed description of the HCC dataset (each feature's type/scale, range, mean/mode, and percentage of missing data) is provided in Santos et al., A new cluster-based oversampling method for improving survival prediction of hepatocellular carcinoma patients, Journal of Biomedical Informatics, 58, 49-59, 2015.


Attribute Information:

Gender: nominal
Symptoms: nominal
Alcohol: nominal
Hepatitis B Surface Antigen: nominal
Hepatitis B e Antigen: nominal
Hepatitis B Core Antibody: nominal
Hepatitis C Virus Antibody: nominal
Cirrhosis: nominal
Endemic Countries: nominal
Smoking: nominal
Diabetes: nominal
Obesity: nominal
Hemochromatosis: nominal
Arterial Hypertension: nominal
Chronic Renal Insufficiency: nominal
Human Immunodeficiency Virus: nominal
Nonalcoholic Steatohepatitis: nominal
Esophageal Varices: nominal
Splenomegaly: nominal
Portal Hypertension: nominal
Portal Vein Thrombosis: nominal
Liver Metastasis: nominal
Radiological Hallmark: nominal
Age at diagnosis: integer
Grams of Alcohol per day: continuous
Packs of cigarettes per year: continuous
Performance Status: ordinal
Encephalopathy degree: ordinal
Ascites degree: ordinal
International Normalised Ratio: continuous
Alpha-Fetoprotein (ng/mL): continuous
Haemoglobin (g/dL): continuous
Mean Corpuscular Volume (fL): continuous
Leukocytes (G/L): continuous
Platelets (G/L): continuous
Albumin (mg/dL): continuous
Total Bilirubin (mg/dL): continuous
Alanine transaminase (U/L): continuous
Aspartate transaminase (U/L): continuous
Gamma glutamyl transferase (U/L): continuous
Alkaline phosphatase (U/L): continuous
Total Proteins (g/dL): continuous
Creatinine (mg/dL): continuous
Number of Nodules: integer
Major dimension of nodule (cm): continuous
Direct Bilirubin (mg/dL): continuous
Iron (mcg/dL): continuous
Oxygen Saturation (%): continuous
Ferritin (ng/mL): continuous
Class: nominal (1 if the patient survives at 1 year, 0 if the patient dies)
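
The attribute types listed above add up to the totals given in the Data Set Information section. The 23 nominal and 3 ordinal attributes make up the 26 qualitative variables, and the 2 integer and 21 continuous attributes make up the 23 quantitative ones. The Python sketch below encodes that grouping and uses it for a simple type-aware baseline imputation: the mode for qualitative attributes and the mean for quantitative ones. This generic baseline is for illustration only and is not the preprocessing used by Santos et al. The function name `impute_column` is hypothetical. The attribute names are labels copied from this page, not column headers of the data file. Missing values are assumed to be represented as `None`.

```python
from statistics import mean, mode
from typing import Optional

# Attribute groups transcribed from the list above (spelling lightly normalized;
# the Class target is excluded). These are descriptive labels, not file headers.
NOMINAL = [
    "Gender", "Symptoms", "Alcohol", "Hepatitis B Surface Antigen",
    "Hepatitis B e Antigen", "Hepatitis B Core Antibody",
    "Hepatitis C Virus Antibody", "Cirrhosis", "Endemic Countries",
    "Smoking", "Diabetes", "Obesity", "Hemochromatosis",
    "Arterial Hypertension", "Chronic Renal Insufficiency",
    "Human Immunodeficiency Virus", "Nonalcoholic Steatohepatitis",
    "Esophageal Varices", "Splenomegaly", "Portal Hypertension",
    "Portal Vein Thrombosis", "Liver Metastasis", "Radiological Hallmark",
]
ORDINAL = ["Performance Status", "Encephalopathy degree", "Ascites degree"]
INTEGER = ["Age at diagnosis", "Number of Nodules"]
CONTINUOUS = [
    "Grams of Alcohol per day", "Packs of cigarettes per year",
    "International Normalised Ratio", "Alpha-Fetoprotein", "Haemoglobin",
    "Mean Corpuscular Volume", "Leukocytes", "Platelets", "Albumin",
    "Total Bilirubin", "Alanine transaminase", "Aspartate transaminase",
    "Gamma glutamyl transferase", "Alkaline phosphatase", "Total Proteins",
    "Creatinine", "Major dimension of nodule", "Direct Bilirubin", "Iron",
    "Oxygen Saturation", "Ferritin",
]

QUALITATIVE = set(NOMINAL + ORDINAL)       # 26 attributes
QUANTITATIVE = set(INTEGER + CONTINUOUS)   # 23 attributes


def impute_column(values: list[Optional[float]], name: str) -> list[float]:
    """Fill None entries: mode for qualitative, mean for quantitative attributes.

    Integer attributes receive a rounded mean so filled values stay whole numbers.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError(f"column {name!r} has no observed values")
    if name in QUALITATIVE:
        fill = mode(observed)
    elif name in INTEGER:
        fill = round(mean(observed))
    elif name in CONTINUOUS:
        fill = mean(observed)
    else:
        raise KeyError(f"unknown attribute {name!r}")
    return [fill if v is None else v for v in values]
```

For example, `impute_column([1, None, 1, 0], "Gender")` fills the gap with the mode and returns `[1, 1, 1, 0]`. In a real experiment, the fill values would normally be computed on the training split only and then reused on the test split, so that no information leaks from evaluation data.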


Relevant Papers:

Miriam Seoane Santos, Pedro Henriques Abreu, Pedro J Garcia-Laencina, Adelia Simao, Armando Carvalho, A new cluster-based oversampling method for improving survival prediction of hepatocellular carcinoma patients, Journal of Biomedical Informatics, 58, 49-59, 2015.



Citation Request:

Miriam Seoane Santos, Pedro Henriques Abreu, Pedro J Garcia-Laencina, Adelia Simao, Armando Carvalho, A new cluster-based oversampling method for improving survival prediction of hepatocellular carcinoma patients, Journal of Biomedical Informatics, 58, 49-59, 2015.

