HCC Survival

Donated on 11/28/2017

The Hepatocellular Carcinoma (HCC) dataset was collected at a university hospital in Portugal. It contains real clinical data from 165 patients diagnosed with HCC.

Dataset Characteristics

Multivariate

Subject Area

Health and Medicine

Associated Tasks

Classification

Feature Type

Integer, Real

# Instances

165

# Features

49

Dataset Information

Additional Information

The HCC dataset was obtained at a university hospital in Portugal and contains demographic, risk-factor, laboratory, and overall survival features of 165 real patients diagnosed with HCC. The dataset contains 49 features selected according to the EASL-EORTC (European Association for the Study of the Liver - European Organisation for Research and Treatment of Cancer) Clinical Practice Guidelines, which are the current state of the art in the management of HCC. This is a heterogeneous dataset, with 23 quantitative variables and 26 qualitative variables. Overall, missing data represents 10.22% of the whole dataset, and only eight patients (4.85%) have complete information in all fields. The target variable is survival at 1 year, encoded as a binary variable: 0 (dies) and 1 (lives). A certain degree of class imbalance is also present (63 cases labeled as “dies” and 102 as “lives”). A detailed description of the HCC dataset (feature type/scale, range, mean/mode, and missing-data percentages) is provided in Santos et al., “A new cluster-based oversampling method for improving survival prediction of hepatocellular carcinoma patients,” Journal of Biomedical Informatics, 58, 49-59, 2015.
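The proportions quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the counts stated in the description (165 patients, 63 "dies", 102 "lives", 8 complete cases). The variable names and the `pct` helper are illustrative and not part of the dataset.

```python
# Counts taken from the dataset description; names are illustrative only.
n_patients = 165
n_dies = 63       # class 0
n_lives = 102     # class 1
n_complete = 8    # patients with no missing fields


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# The two classes should account for every patient.
assert n_dies + n_lives == n_patients

share_dies = pct(n_dies, n_patients)          # share of class 0
share_lives = pct(n_lives, n_patients)        # share of class 1
share_complete = pct(n_complete, n_patients)  # complete-case share

# Majority-to-minority ratio, one common way to summarize class imbalance.
imbalance_ratio = round(n_lives / n_dies, 2)

print(f"dies: {share_dies}%  lives: {share_lives}%")
print(f"complete cases: {share_complete}%")
print(f"imbalance ratio (lives/dies): {imbalance_ratio}")
```

Under these counts, a classifier that always predicts "lives" would already reach the majority-class share in accuracy, so any model evaluated on this dataset is probably better compared against that baseline than against 50%.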

Has Missing Values?

Yes


Additional Variable Information

Gender: nominal
Symptoms: nominal
Alcohol: nominal
Hepatitis B Surface Antigen: nominal
Hepatitis B e Antigen: nominal
Hepatitis B Core Antibody: nominal
Hepatitis C Virus Antibody: nominal
Cirrhosis: nominal
Endemic Countries: nominal
Smoking: nominal
Diabetes: nominal
Obesity: nominal
Hemochromatosis: nominal
Arterial Hypertension: nominal
Chronic Renal Insufficiency: nominal
Human Immunodeficiency Virus: nominal
Nonalcoholic Steatohepatitis: nominal
Esophageal Varices: nominal
Splenomegaly: nominal
Portal Hypertension: nominal
Portal Vein Thrombosis: nominal
Liver Metastasis: nominal
Radiological Hallmark: nominal
Age at diagnosis: integer
Grams of Alcohol per day: continuous
Packs of cigarets per year: continuous
Performance Status: ordinal
Encephalopathy degree: ordinal
Ascites degree: ordinal
International Normalised Ratio: continuous
Alpha-Fetoprotein (ng/mL): continuous
Haemoglobin (g/dL): continuous
Mean Corpuscular Volume (fl): continuous
Leukocytes (G/L): continuous
Platelets (G/L): continuous
Albumin (mg/dL): continuous
Total Bilirubin (mg/dL): continuous
Alanine transaminase (U/L): continuous
Aspartate transaminase (U/L): continuous
Gamma glutamyl transferase (U/L): continuous
Alkaline phosphatase (U/L): continuous
Total Proteins (g/dL): continuous
Creatinine (mg/dL): continuous
Number of Nodules: integer
Major dimension of nodule (cm): continuous
Direct Bilirubin (mg/dL): continuous
Iron (mcg/dL): continuous
Oxygen Saturation (%): continuous
Ferritin (ng/mL): continuous
Class: nominal (1 if patient survives, 0 if patient died)
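The variable list above can be grouped by measurement scale to confirm that it matches the 23 quantitative / 26 qualitative split given in the dataset description. The sketch below is an illustration only: the list names (`NOMINAL`, `ORDINAL`, `INTEGER`, `CONTINUOUS`) are hypothetical, the feature labels are shortened from the list above (units dropped), and treating integer plus continuous as "quantitative" and nominal plus ordinal as "qualitative" is an assumption about how the description counts them.

```python
# Feature labels grouped by the scale listed above (units omitted).
# The grouping names are illustrative and not defined by the dataset.
NOMINAL = [
    "Gender", "Symptoms", "Alcohol", "Hepatitis B Surface Antigen",
    "Hepatitis B e Antigen", "Hepatitis B Core Antibody",
    "Hepatitis C Virus Antibody", "Cirrhosis", "Endemic Countries",
    "Smoking", "Diabetes", "Obesity", "Hemochromatosis",
    "Arterial Hypertension", "Chronic Renal Insufficiency",
    "Human Immunodeficiency Virus", "Nonalcoholic Steatohepatitis",
    "Esophageal Varices", "Splenomegaly", "Portal Hypertension",
    "Portal Vein Thrombosis", "Liver Metastasis", "Radiological Hallmark",
]
ORDINAL = ["Performance Status", "Encephalopathy degree", "Ascites degree"]
INTEGER = ["Age at diagnosis", "Number of Nodules"]
CONTINUOUS = [
    "Grams of Alcohol per day", "Packs of cigarets per year",
    "International Normalised Ratio", "Alpha-Fetoprotein", "Haemoglobin",
    "Mean Corpuscular Volume", "Leukocytes", "Platelets", "Albumin",
    "Total Bilirubin", "Alanine transaminase", "Aspartate transaminase",
    "Gamma glutamyl transferase", "Alkaline phosphatase", "Total Proteins",
    "Creatinine", "Major dimension of nodule", "Direct Bilirubin", "Iron",
    "Oxygen Saturation", "Ferritin",
]

# Assumption: qualitative = nominal + ordinal, quantitative = integer + continuous.
qualitative = NOMINAL + ORDINAL
quantitative = INTEGER + CONTINUOUS
features = qualitative + quantitative  # the Class target is excluded

print(f"qualitative:  {len(qualitative)}")
print(f"quantitative: {len(quantitative)}")
print(f"total features (excluding Class): {len(features)}")
```

Keeping the groups separate like this is one way to decide on preprocessing per scale, for example mode imputation for qualitative fields and mean or median imputation for quantitative ones. The choice of imputation strategy is not specified by this page.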



Miriam Santos

Pedro Abreu

Pedro Garcia-Laencina

Adelia Simao

Armando Carvalho

