Liver Disorders
Donated on 5/14/1990
BUPA Medical Research Ltd. database donated by Richard S. Forsyth
Dataset Characteristics
Multivariate
Subject Area
Health and Medicine
Associated Tasks
Regression
Feature Type
Categorical, Integer, Real
# Instances
345
# Features
5
Dataset Information
Additional Information
The first 5 variables are all blood tests which are thought to be sensitive to liver disorders that might arise from excessive alcohol consumption. Each line in the dataset constitutes the record of a single male individual.

Important note: The 7th field (selector) has been widely misinterpreted in the past as a dependent variable representing presence or absence of a liver disorder. This is incorrect [1]. The 7th field was created by BUPA researchers as a train/test selector. It is not suitable as a dependent variable for classification. The dataset does not contain any variable representing presence or absence of a liver disorder.

Researchers who wish to use this dataset as a classification benchmark should follow the method used in experiments by the donor (Forsyth & Rada, 1986, Machine learning: applications in expert systems and information retrieval) and others (e.g. Turney, 1995, Cost-sensitive classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm), who used the 6th field (drinks), after dichotomising, as a dependent variable for classification. Because of widespread misinterpretation in the past, researchers should take care to state their method clearly.
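The dichotomisation of the drinks field mentioned above could be sketched roughly as follows. This is only an illustration: the helper name `dichotomise_drinks`, the cut-off of 3.0 half-pint equivalents, and the sample values are all assumptions made for this example, not the threshold or data used by Forsyth & Rada or Turney. Whatever cut-off is chosen should be reported explicitly.

```python
from typing import Iterable, List


def dichotomise_drinks(drinks: Iterable[float], threshold: float = 3.0) -> List[int]:
    """Map each drinks value to 1 if it exceeds the threshold, else 0.

    The default threshold is a hypothetical choice for illustration only.
    """
    return [1 if d > threshold else 0 for d in drinks]


# Made-up drinks values (half-pint equivalents per day), not taken from bupa.data.
sample_drinks = [0.0, 0.5, 3.0, 4.0, 8.0]
labels = dichotomise_drinks(sample_drinks, threshold=3.0)
print(labels)
```

A value exactly equal to the threshold falls into class 0 in this sketch; whether the boundary is inclusive is another detail worth stating when reporting results.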
Has Missing Values?
No
Variables Table
Variable Name | Role | Type | Description | Units | Missing Values |
---|---|---|---|---|---|
mcv | Feature | Continuous | mean corpuscular volume | | no |
alkphos | Feature | Continuous | alkaline phosphatase | | no |
sgpt | Feature | Continuous | alanine aminotransferase | | no |
sgot | Feature | Continuous | aspartate aminotransferase | | no |
gammagt | Feature | Continuous | gamma-glutamyl transpeptidase | | no |
drinks | Target | Continuous | number of half-pint equivalents of alcoholic beverages drunk per day | | no |
selector | Other | Categorical | field created by the BUPA researchers to split the data into train/test sets | | no |
Additional Variable Information
1. mcv: mean corpuscular volume
2. alkphos: alkaline phosphatase
3. sgpt: alanine aminotransferase
4. sgot: aspartate aminotransferase
5. gammagt: gamma-glutamyl transpeptidase
6. drinks: number of half-pint equivalents of alcoholic beverages drunk per day
7. selector: field created by the BUPA researchers to split the data into train/test sets
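As a rough sketch of how the seven fields listed above might be read from a raw data file, the example below parses comma-separated rows into named records and groups them by the selector field. It assumes that each row holds the seven fields in the order listed, separated by commas. The sample rows are made up for illustration, and the names `COLUMNS`, `parse_rows`, and `group_by_selector` are hypothetical. The selector is kept as a string label because it is a categorical train/test marker, not a measurement or a class label.

```python
import csv
import io

# Field order as given in the variable list; the comma-separated layout is an assumption.
COLUMNS = ["mcv", "alkphos", "sgpt", "sgot", "gammagt", "drinks", "selector"]


def parse_rows(text):
    """Parse comma-separated rows into dictionaries keyed by column name."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        record = dict(zip(COLUMNS, fields))
        # The five blood tests and drinks are numeric; selector stays a label.
        for name in COLUMNS[:6]:
            record[name] = float(record[name])
        rows.append(record)
    return rows


def group_by_selector(rows):
    """Group records by their selector label (the train/test marker)."""
    groups = {}
    for row in rows:
        groups.setdefault(row["selector"], []).append(row)
    return groups


# Made-up rows for illustration only; these are not records from bupa.data.
sample = "90,70,30,25,40,2.0,1\n88,65,20,22,15,0.5,2\n91,80,55,40,120,6.0,1\n"
rows = parse_rows(sample)
groups = group_by_selector(rows)
print({label: len(members) for label, members in groups.items()})
```

Grouping by selector only reproduces the original train/test partition; the target for modelling remains the drinks field.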
Dataset Files
File | Size |
---|---|
bupa.data | 7.1 KB |
costs/bupa-liver.README | 2.1 KB |
bupa.names | 1.2 KB |
noteDuplicates.txt | 252 Bytes |
costs/Index | 235 Bytes |
Papers Citing this Dataset
By Qingshan She, Kang Chen, Yuliang Ma, Thinh Nguyen, Yingchun Zhang. 2018
Published in Computational intelligence and neuroscience.
By Jorge Díez, Juan Coz, Oscar Luaces, Félix Goyache, Jaime Alonso, A. Peña, Antonio Bahamonde. 2002
Published in IBERAMIA.
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
liver_disorders = fetch_ucirepo(id=60)

# data (as pandas dataframes)
X = liver_disorders.data.features
y = liver_disorders.data.targets

# metadata
print(liver_disorders.metadata)

# variable information
print(liver_disorders.variables)
Liver Disorders [Dataset]. (2016). UCI Machine Learning Repository. https://doi.org/10.24432/C54G67.
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.