Liver Disorders

Donated on 5/14/1990

BUPA Medical Research Ltd. database donated by Richard S. Forsyth

Dataset Characteristics

Multivariate

Subject Area

Health and Medicine

Associated Tasks

Regression

Feature Type

Categorical, Integer, Real

# Instances

345

# Features

5

Dataset Information

Additional Information

The first 5 variables are all blood tests which are thought to be sensitive to liver disorders that might arise from excessive alcohol consumption. Each line in the dataset constitutes the record of a single male individual.

Important note: The 7th field (selector) has been widely misinterpreted in the past as a dependent variable representing presence or absence of a liver disorder. This is incorrect [1]. The 7th field was created by BUPA researchers as a train/test selector. It is not suitable as a dependent variable for classification. The dataset does not contain any variable representing presence or absence of a liver disorder.

Researchers who wish to use this dataset as a classification benchmark should follow the method used in experiments by the donor (Forsyth & Rada, 1986, Machine learning: applications in expert systems and information retrieval) and others (e.g. Turney, 1995, Cost-sensitive classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm), who used the 6th field (drinks), after dichotomising, as a dependent variable for classification. Because of widespread misinterpretation in the past, researchers should take care to state their method clearly.
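
As a rough illustration of the recommended setup, the sketch below builds a binary target from the 6th field (drinks) and leaves the 7th field (selector) out of both the features and the target. It assumes that bupa.data is comma-separated with no header row and that columns follow the order in the variables table. The sample rows are made up for illustration, and the cut-off of 3.0 is a placeholder rather than the threshold used in the cited papers, so any real experiment should state the threshold it uses.

```python
import csv
import io

# Column order as listed in the variables table.
COLUMNS = ["mcv", "alkphos", "sgpt", "sgot", "gammagt", "drinks", "selector"]

# Made-up rows in the ASSUMED bupa.data layout (comma-separated, no header).
# These are not real records from the dataset.
SAMPLE = """\
90,70,30,25,40,0.5,1
88,65,22,20,18,4.0,2
92,80,45,33,60,6.0,1
"""


def load_records(text):
    """Parse comma-separated text into a list of dicts keyed by column name."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        raw = dict(zip(COLUMNS, row))
        record = {name: float(raw[name]) for name in COLUMNS[:6]}
        record["selector"] = int(raw["selector"])
        records.append(record)
    return records


def dichotomise(records, threshold=3.0):
    """Return 1 where drinks exceeds the threshold, else 0.

    The default threshold is a hypothetical placeholder.
    """
    return [1 if r["drinks"] > threshold else 0 for r in records]


records = load_records(SAMPLE)
# Features are the five blood tests only; selector is deliberately excluded.
features = [[r[name] for name in COLUMNS[:5]] for r in records]
labels = dichotomise(records)
```

Reporting the chosen threshold alongside any results keeps the method unambiguous, which is the point of the note above.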

Has Missing Values?

No

Variables Table

Variable Name	Role	Type	Description	Missing Values
mcv	Feature	Continuous	mean corpuscular volume	no
alkphos	Feature	Continuous	alkaline phosphatase	no
sgpt	Feature	Continuous	alanine aminotransferase	no
sgot	Feature	Continuous	aspartate aminotransferase	no
gammagt	Feature	Continuous	gamma-glutamyl transpeptidase	no
drinks	Target	Continuous	number of half-pint equivalents of alcoholic beverages drunk per day	no
selector	Other	Categorical	field created by the BUPA researchers to split the data into train/test sets	no
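
Since the selector column only marks a data split, one way to use it is to group records by its value and keep it out of the model inputs. The sketch below does that with made-up tuples in the column order of the table above. The selector values 1 and 2 are an assumption, and this page does not say which value denotes the training set, so the code returns both groups without naming either.

```python
from collections import defaultdict

# Hypothetical records: (mcv, alkphos, sgpt, sgot, gammagt, drinks, selector).
# These are illustrative values, not rows from bupa.data.
rows = [
    (90.0, 70.0, 30.0, 25.0, 40.0, 0.5, 1),
    (88.0, 65.0, 22.0, 20.0, 18.0, 4.0, 2),
    (92.0, 80.0, 45.0, 33.0, 60.0, 6.0, 1),
]


def split_by_selector(rows, selector_index=6):
    """Group rows by selector value and drop the selector from each row."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[selector_index]].append(row[:selector_index])
    return dict(groups)


partitions = split_by_selector(rows)
sizes = {value: len(group) for value, group in partitions.items()}
```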

Additional Variable Information

1. mcv: mean corpuscular volume
2. alkphos: alkaline phosphatase
3. sgpt: alanine aminotransferase
4. sgot: aspartate aminotransferase
5. gammagt: gamma-glutamyl transpeptidase
6. drinks: number of half-pint equivalents of alcoholic beverages drunk per day
7. selector: field created by the BUPA researchers to split the data into train/test sets

Dataset Files

File	Size
bupa.data	7.1 KB
costs/bupa-liver.README	2.1 KB
bupa.names	1.2 KB
noteDuplicates.txt	252 Bytes
costs/Index	235 Bytes

Papers Citing this Dataset

Sparse Representation-Based Extreme Learning Machine for Motor Imagery EEG Classification

By Qingshan She, Kang Chen, Yuliang Ma, Thinh Nguyen, Yingchun Zhang. 2018

Published in Computational intelligence and neuroscience.

Learning to Assess from Pair-Wise Comparisons

By Jorge Díez, Juan Coz, Oscar Luaces, Félix Goyache, Jaime Alonso, A. Peña, Antonio Bahamonde. 2002

Published in IBERAMIA.

DOI

10.24432/C54G67

License

This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.
