Liver Disorders Data Set
Below are papers that cite this data set, with context shown.
Papers were automatically harvested and associated with this data set, in collaboration with Rexa.info.
Glenn Fung and M. Murat Dundar and Jinbo Bi and Bharat Rao. A fast iterative algorithm for fisher discriminant using heterogeneous kernels. ICML. 2004.
used in the literature for benchmarking from the UCI Machine Learning Repository (Murphy & Aha, 1992): Ionosphere, Cleveland Heart, Pima Indians, BUPA Liver and Boston Housing. Additionally, a sixth dataset, the colon CAD dataset, relates to colorectal cancer diagnosis using virtual colonoscopy derived from computer tomographic images. We will refer to this dataset as the colon CAD dataset. The
Zhi-Hua Zhou and Yuan Jiang. NeC4.5: Neural Ensemble Based C4.5. IEEE Trans. Knowl. Data Eng, 16. 2004.
ensemble. Moreover, Table III shows that the generalization ability of NeC4.5 with µ = 0% is still better than that of C4.5. In detail, pairwise two-tailed t-tests indicate that there are seven data sets (cleveland, diabetes, ionosphere, liver, sonar, waveform21, and waveform40) where NeC4.5 with µ = 0% is significantly more accurate than C4.5, while there is no significant difference on the
Yuan Jiang and Zhi-Hua Zhou. Editing Training Data for kNN Classifiers with Neural Network Ensemble. ISNN (1). 2004.
of five hidden units. Therefore here the approach is denoted as NNEE(5,5). Table 6 shows that the NNEE approach achieves the best editing effect. In detail, it obtains the best performance on seven data sets, i.e. annealing, credit, liver, pima, soybean, wine and zoo. RemoveOnly obtains the best performance on three data sets, i.e. glass, hayes-roth and wine. It is surprising that Depuration obtains
Xavier Llorà and David E. Goldberg and Ivan Traus and Ester Bernadó i Mansilla. Accuracy, Parsimony, and Generality in Evolutionary Learning Systems via Multiobjective Selection. IWLCS. 2002.
were obtained from the UCI repository (Merz & Murphy, 1998). We chose seven data sets: Bupa Liver Disorders (bpa), Wisconsin Breast Cancer (bre), Glass (gls), Ionosphere (ion), Iris (irs), Primary Tumor (prt), and Sonar (son). These data sets contain categorical and numeric
Jochen Garcke and Michael Griebel. Classification with sparse grids using simplicial basis functions. Intell. Data Anal, 6. 2002.
% 0.8 20.6 3 train 91.4 % 194.1 86.6 % 9.6 88.0 % 10.8 test 70.8 % 69.9 % 70.5 % 4 train 92.6 % 1217.6 94.2 % 68.3 93.1 % 75.4 test 68.8 % 71.4 % 70.5 % Table 4: Results for the BUPA liver disorders data set basis functions. Note that a testing correctness of 90.6 % and 91.1 % was achieved with neural networks in  and , respectively, for this data set. 3.2 Small data sets 3.2.1 BUPA Liver The
Michail Vlachos and Carlotta Domeniconi and Dimitrios Gunopulos and George Kollios and Nick Koudas. Non-linear dimensionality reduction techniques for classification and visualization. KDD. 2002.
The average error rates for the smaller data sets (i.e., Iris, Sonar, Glass, Liver and Lung) were based on leave-one-out cross-validation, and the error rates for Image and Vowel were based on ten two-fold cross-validation, as summarized in Table
Jochen Garcke and Michael Griebel and Michael Thess. Data Mining with Sparse Grids. Computing, 67. 2001.
λ = 0.01 20 3.2 6-dimensional problems 3.2.1 BUPA Liver The BUPA Liver Disorders data set from Irvine Machine Learning Database Repository consists of 345 data points with 6 features plus a selector field used to split the data into 2 sets with 145 instances and 200 instances
Jochen Garcke and Michael Griebel. Data mining with sparse grids using simplicial basis functions. KDD. 2001.
combination technique with linear basis functions. Left: level 4, λ = 0.0035. Right: level 8, λ = 0.0037 90.6 % and 91.1 % was achieved in  and , respectively, for this data set. 3.2 6-dimensional problems 3.2.1 BUPA Liver The BUPA Liver Disorders data set from Irvine Machine Learning Database Repository consists of 345 data points with 6 features and a selector field
Petri Kontkanen and Jussi Lahtinen and Petri Myllymäki and Henry Tirri. Unsupervised Bayesian visualization of high-dimensional data. KDD. 2000.
experimental results confirm this hypothesis: in cases where the leave-one-out cross-validated classification accuracy of the NB classifier is poor in the absolute sense (as with the Liver Disorders data set), or in the relative sense with respect to the default classification accuracy (as with the Postoperative Patient data set), the class labeled colored images are somewhat blurred. Nevertheless,
Carlotta Domeniconi and Jing Peng and Dimitrios Gunopulos. An Adaptive Metric Machine for Pattern Classification. NIPS. 2000.
consists of q = 16 numerical attributes and J = 26 classes; 8. Liver data. This data set consists of 345 instances, represented by q = 6 numerical attributes, and J = 2 classes; and 9. Lung data. This example has 32 instances having q = 56 numerical features and J = 3 classes. Results:
Guido Lindner and Rudi Studer. AST: Support for Algorithm Selection with a CBR Approach. PKDD. 1999.
with the results of C5.0 on these datasets which are glass2 and liver from the UCI repository. Besides the results of applying the different algorithms on these applications the user gets a degree of similarity of the selected cases. This
Iñaki Inza and Pedro Larrañaga and Basilio Sierra and Ramon Etxeberria and Jose Antonio Lozano and José Manuel Peña. Representing the behaviour of supervised classification learning algorithms by Bayesian networks. Pattern Recognition Letters, 20. 1999.
treatment is done for unknown values, exploiting each algorithm its own characteristics. PEBLS and HOODG algorithms are not able to handle unknown values: thus, they are only used in the four datasets without unknown values (diabetes, heart, liver and lymphography). For each database and algorithm, a classification model is induced using the specified training set: when run with fixed default
Kristin P. Bennett and Erin J. Bredensteiner. A Parametric Optimization Method for Machine Learning. INFORMS Journal on Computing, 9. 1997.
is also available via anonymous ftp from the UCI Repository Of Machine Learning Databases [MA92]. BUPA liver disorders The BUPA dataset contains 345 single male patients with 6 numeric attributes. Five of these attributes are blood tests which are thought to be relevant to liver disorders. The sixth attribute corresponds to the
Jennifer A. Blue and Kristin P. Bennett. Hybrid Extreme Point Tabu Search. Department of Mathematical Sciences Rensselaer Polytechnic Institute. 1996.
available via anonymous ftp from the Machine Learning Repository at the University of California at Irvine. The datasets are: the BUPA Liver Disease dataset (Liver); the PIMA Indians Diabetes dataset (Diabetes), the Wisconsin Breast Cancer Database (Cancer), and the Cleveland Heart Disease Database (Heart).
Peter D. Turney. Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm. CoRR, csAI/9503102. 1995.
reasonable to assume that other areas have similar relative test costs. For our purposes, the relative costs are important, not the absolute costs. A.1 BUPA Liver Disorders The BUPA Liver Disorders dataset was created by BUPA Medical Research Ltd. and it was donated to the Irvine collection by Richard Forsyth. 19 Table 15 shows the test costs for the BUPA Liver Disorders dataset. The tests in group A
Gabor Melli. A Lazy Model-Based Approach to On-Line Classification. University of British Columbia. 1989.
DBPredictor achieved a higher error rate on five datasets: liver disease, hepatitis, heart-c, credit-g, and echocardiogram. Based on this evidence pruning appears to significantly lower DBPredictor's vulnerability to overspecialization. CHAPTER 7. EMPIRICAL
Adil M. Bagirov and John Yearwood. A new nonsmooth optimization algorithm for clustering. Centre for Informatics and Applied Optimization, School of Information Technology and Mathematical Sciences, University of Ballarat.
that these values of c allow significant reduction in the number of instances and CPU time. From the results presented in Table 3 we can conclude that appropriate values of c for the liver disorder data set are c ∈ [0, 2]. Differences in the number of clusters when c ∈ [0, 2] arise because of small clusters which contain less than 5 % of all instances. Thus using results of numerical experiments on these
H. Altay Güvenir and Aynur Akkuş. WEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS. Department of Computer Engineering and Information Science Bilkent University.
Data Set:              cleveland  glass  horse  hungarian  iris  liver  sonar  wine
No. of Instances       303        214    368    294        150   345    208    178
No. of Features        13         9      22     13         4     6      60     13
No. of Classes         2          6      2      2          3     2      2      3
No. of Missing values  6          0
C. Titus Brown and Harry W. Bullen and Sean P. Kelly and Robert K. Xiao and Steven G. Satterfield and John G. Hagedorn and Judith E. Devaney. Visualization and Data Mining in an 3D Immersive Environment: Summer Project 2003.
either upgrading the system to handle a larger number of polygons or creating a series of graphs using different randomly selected data points. 27 Figure 4.5: A long shot of the full tree cover data set display. 28 4.6 BUPA Liver Disorder This data set was analysed by Robert Xiao. Overview This data set consisted of 6 attributes: the results of 5 different blood tests which were thought to be
David R. Musicant. DATA MINING VIA MATHEMATICAL PROGRAMMING AND MACHINE LEARNING. Doctor of Philosophy (Computer Sciences) UNIVERSITY.
from the University of California at Irvine (UCI) repository: The liver disorders dataset contains 345 points, each consisting of six features. Class 1 contains 145 points, and class -1 contains 200 points. The letter-recognition dataset is used for recognizing letters of the alphabet.
Aynur Akkuş and H. Altay Güvenir. Weighting Features in k Nearest Neighbor Classification on Feature Projections. Department of Computer Engineering and Information Science Bilkent University.
significantly. This should be because all the features are equally relevant. On the cleveland, liver, iris and glass (except k = 1) datasets, the weights learned by the individual accuracies always performed significantly better than the others. The weight learning method based on the homogeneity performed better than the other on the
Greg Ridgeway. The State of Boosting. Department of Statistics University of Washington.
to examine the performance of boosting Cox's proportional hazards model, I turn to a clinical trial for testing the drug DPCA for the treatment of primary biliary cirrhosis of the liver (PBC). This dataset has been the subject of several modern data analyses (Fleming and Harrington 1991). I tested this method by comparing the out-of-sample predictive performance of the linear Cox model to the Cox
Adil M. Bagirov and Alex Rubinov and A. N. Soukhojak and John Yearwood. Unsupervised and supervised data classification via nonsmooth and global optimization. School of Information Technology and Mathematical Sciences, The University of Ballarat.
Australian credit dataset, Diabetes dataset, Liver disorder dataset and Vehicle dataset. The description of these datasets can be found in Appendix. We studied these datasets, using different subsets of features and