Hepatitis
Donated on 10/31/1988
From G. Gong (CMU); mostly Boolean or numeric-valued attribute types; includes cost data (donated by Peter Turney)
Dataset Characteristics
Multivariate
Subject Area
Health and Medicine
Associated Tasks
Classification
Feature Type
Categorical, Integer, Real
# Instances
155
# Features
19
Dataset Information
Additional Information
Please ask Gail Gong for further information on this database.
Has Missing Values?
Yes
Variables Table
Variable Name | Role | Type | Description | Units | Missing Values |
---|---|---|---|---|---|
Class | Target | Categorical | | | no |
Age | Feature | Integer | | | no |
Sex | Feature | Categorical | | | no |
Steroid | Feature | Categorical | | | yes |
Antivirals | Feature | Categorical | | | no |
Fatigue | Feature | Categorical | | | yes |
Malaise | Feature | Categorical | | | yes |
Anorexia | Feature | Categorical | | | yes |
Liver Big | Feature | Categorical | | | yes |
Liver Firm | Feature | Categorical | | | yes |
Additional Variable Information
1. Class: DIE, LIVE
2. AGE: 10, 20, 30, 40, 50, 60, 70, 80
3. SEX: male, female
4. STEROID: no, yes
5. ANTIVIRALS: no, yes
6. FATIGUE: no, yes
7. MALAISE: no, yes
8. ANOREXIA: no, yes
9. LIVER BIG: no, yes
10. LIVER FIRM: no, yes
11. SPLEEN PALPABLE: no, yes
12. SPIDERS: no, yes
13. ASCITES: no, yes
14. VARICES: no, yes
15. BILIRUBIN: 0.39, 0.80, 1.20, 2.00, 3.00, 4.00 (see the note below)
16. ALK PHOSPHATE: 33, 80, 120, 160, 200, 250
17. SGOT: 13, 100, 200, 300, 400, 500
18. ALBUMIN: 2.1, 3.0, 3.8, 4.5, 5.0, 6.0
19. PROTIME: 10, 20, 30, 40, 50, 60, 70, 80, 90
20. HISTOLOGY: no, yes

The BILIRUBIN attribute appears to be continuously valued. I checked this with the donor, Bojan Cestnik, who replied:

About the hepatitis database and the BILIRUBIN problem, I would like to say the following: BILIRUBIN is a continuous attribute (= the number of its "values" in the ASDOHEPA.DAT file is negative!!!); "values" are quoted because, when speaking about a continuous attribute, there is no such thing as all possible values. However, they represent so-called "boundary" values; according to these "boundary" values the attribute can be discretized. At the same time, because the attribute is continuous, one can perform other tests, since the continuous information is preserved. I hope that these lines have at least roughly answered your question.
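Cestnik's point is that the listed BILIRUBIN numbers act as cut points rather than an exhaustive set of values. A minimal sketch of discretizing raw readings against those boundaries, assuming half-open intervals and NumPy's `digitize`; the interval convention, the `-1` code for missing values, and the `discretize` helper are our own illustrative choices, not part of the dataset documentation:

```python
import numpy as np

# Boundary ("cut") values listed for BILIRUBIN in the attribute description.
BILIRUBIN_BOUNDS = [0.39, 0.80, 1.20, 2.00, 3.00, 4.00]


def discretize(values, bounds):
    """Return the interval index of each value; -1 marks a missing (NaN) value.

    Index 0 means value < bounds[0], index i means bounds[i-1] <= value < bounds[i],
    and index len(bounds) means value >= bounds[-1].
    """
    values = np.asarray(values, dtype=float)
    # right=False (the default) yields the half-open intervals described above.
    idx = np.digitize(values, bounds)
    # Keep missing measurements distinguishable from every real interval.
    return np.where(np.isnan(values), -1, idx)


if __name__ == "__main__":
    # Illustrative readings, not taken from the dataset.
    print(discretize([0.3, 1.0, 2.5, 8.0, float("nan")], BILIRUBIN_BOUNDS))
```

Because the raw values are kept, a different set of cut points (or none at all) can be applied later, which is the flexibility the note describes.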
Dataset Files
File | Size |
---|---|
hepatitis.data | 7.4 KB |
hepatitis.names | 3 KB |
costs/hepatitis.README | 2.1 KB |
costs/hepatitis.expense | 415 Bytes |
costs/hepatitis.delay | 405 Bytes |
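If you work from the raw `hepatitis.data` file instead of the `ucimlrepo` package, a sketch like the following can load it with pandas. It assumes the file has no header row, lists the 20 attributes in the order given in the attribute list, and marks missing entries with `?`; the `COLUMNS` names and the `load_hepatitis` helper are hypothetical.

```python
import io

import pandas as pd

# Hypothetical column names following the attribute list order (1 to 20).
COLUMNS = [
    "Class", "Age", "Sex", "Steroid", "Antivirals", "Fatigue", "Malaise",
    "Anorexia", "Liver Big", "Liver Firm", "Spleen Palpable", "Spiders",
    "Ascites", "Varices", "Bilirubin", "Alk Phosphate", "Sgot", "Albumin",
    "Protime", "Histology",
]


def load_hepatitis(source):
    """Read hepatitis.data (a path or file-like object) into a DataFrame.

    Assumes no header row and "?" as the missing-value marker.
    """
    return pd.read_csv(source, header=None, names=COLUMNS, na_values="?")


if __name__ == "__main__":
    # Two made-up rows in the assumed comma-separated layout, for illustration only.
    sample = (
        "2,30,2,1,2,2,2,2,1,2,2,2,2,2,1.00,85,18,4.0,?,1\n"
        "1,50,1,?,2,1,1,2,2,2,2,1,2,2,0.90,?,60,3.5,50,2\n"
    )
    df = load_hepatitis(io.StringIO(sample))
    print(df.isna().sum())
```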
Papers Citing this Dataset
By Rosaida Rosly, Mokhairi Makhtar, Mohd Awang, Mohd Awang, Mohd Rahman. 2018
Published in International Journal of Engineering & Technology.
By Mateusz Lango, Jerzy Stefanowski. 2017
Published in Journal of Intelligent Information Systems.
By Waldemar Koczkodaj, Alicja Wolny-Dominiak. 2017
Published in The R Journal.
By Henrik Linusson, Ulf Johansson, Henrik Boström, Tuve Löfström. 2016
Published in PAKDD.
By Huaping Guo, Weimei Zhi, Hongbing Liu, Mingliang Xu. 2016
Published in Computational Intelligence and Neuroscience.
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
hepatitis = fetch_ucirepo(id=46)

# data (as pandas dataframes)
X = hepatitis.data.features
y = hepatitis.data.targets

# metadata
print(hepatitis.metadata)

# variable information
print(hepatitis.variables)
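With `X` and `y` loaded by the snippet above, a quick look at missing values per feature and at the class distribution is a sensible first check, since several features have missing values. The helpers below (`missing_summary`, `class_counts`) are hypothetical names; they assume `X` is a pandas DataFrame and `y` is either a one-column DataFrame or a Series.

```python
import pandas as pd


def missing_summary(X: pd.DataFrame) -> pd.DataFrame:
    """Count and fraction of missing values per feature, most-missing first."""
    counts = X.isna().sum()
    summary = pd.DataFrame({"missing": counts, "fraction": counts / len(X)})
    return summary.sort_values("missing", ascending=False)


def class_counts(y) -> pd.Series:
    """Number of instances per class label; accepts a Series or a 1-column DataFrame."""
    target = y.iloc[:, 0] if isinstance(y, pd.DataFrame) else y
    return target.value_counts()


# Usage after running the fetch_ucirepo snippet:
#   print(missing_summary(X))
#   print(class_counts(y))
```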
Hepatitis [Dataset]. (1983). UCI Machine Learning Repository. https://doi.org/10.24432/C5Q59J.
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the dataset for any purpose, provided that appropriate credit is given.