Center for Machine Learning and Intelligent Systems

Lymphography Data Set

Below are papers that cite this data set, with context shown. Papers were automatically harvested and associated with this data set, in collaboration with Rexa.info.



Marcus Hutter and Marco Zaffalon. Distribution of Mutual Information from Complete and Incomplete Data. CoRR, cs.LG/0403025. 2004.

on a number of different domains. For example, Shuttle-small reports data on diagnosing failures of the space shuttle; Lymphography and Hypothyroid are medical data sets; Spam is a body of e-mails that can be spam or non-spam; etc. The data sets presenting non-categorical features have been pre-discretized by MLC++ [KJL+94], default options, i.e. by the common


Michael G. Madden. Evaluation of the Performance of the Markov Blanket Bayesian Classifier Algorithm. CoRR, cs.LG/0211003. 2002.

with the fewest instances, this procedure was repeated 10 times. For the SPECT and Lymphography datasets, the procedure was repeated 50 times to reduce variability. Prediction accuracy results and standard deviations are reported in Table 2. Following usual conventions, for each dataset the algorithm


Marco Zaffalon and Marcus Hutter. Robust Feature Selection by Mutual Information Distributions. CoRR, cs.AI/0206006. 2002.

on a number of different domains. For example, Shuttle-small reports data on diagnosing failures of the space shuttle; Lymphography and Hypothyroid are medical data sets; Spam is a body of e-mails that can be spam or non-spam; etc. The data sets presenting non-nominal features have been pre-discretized by MLC++ [KJL+94], default options. This step may remove some


Thomas G. Dietterich. An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization. Machine Learning, 40. 2000.

were recoded to use discrete values where appropriate. All attributes were treated as continuous in the kingrook-vs-king (krk) data set. In lymphography the lymph-nodes-dimin, lymph-nodes-enlar, and no-of-nodes-in attributes were treated as continuous. In segment, all features were rounded to four significant digits to avoid


Mark A. Hall and Lloyd A. Smith. Feature Selection for Machine Learning: Comparing a Correlation-Based Filter Approach to the Wrapper. FLAIRS Conference. 1999.

were chosen because of the prevalence of nominal features and their predominance in the literature. Three of the datasets (australian, lymphography and horsecolic) contain a few continuous features; the rest contain only nominal features. Fifty runs were done for each machine learning algorithm on each dataset with


Mark A. Hall. Correlation-based Feature Selection for Machine Learning. Doctor of Philosophy thesis, Department of Computer Science, The University of Waikato, Hamilton, New Zealand. 1999.

symbols to ensure confidentiality of the data. There are six continuous features and nine nominal. The nominal features range from 2 to 14 values. Lymphography (ly): This is a small medical dataset containing 148 instances. The task is to distinguish healthy patients from those with metastases or malignant lymphoma. All 18 features are nominal. This is one of three medical domains (the


Ykä Huhtala and Juha Kärkkäinen and Pasi Porkka and Hannu Toivonen. Efficient Discovery of Functional and Approximate Dependencies Using Partitions. ICDE. 1998.

decreases slightly (Wisconsin breast cancer), or drops significantly (Hepatitis). The drop is even stronger with the Lymphography data set (shown only in the table). Approximate dependencies could not be discovered in the Adult data set with TANE/MEM due to the lack of main memory. To find out how the number of rows affects the


Prototype Selection for Composite Nearest Neighbor Classifiers. Department of Computer Science University of Massachusetts. 1997.

for the Composite Fitness--Feature Selection algorithm. 4.9 Relationships between component accuracy and diversity for the Glass Recognition, LED-24 Digit, Lymphography and Soybean data sets for the four boosting algorithms. "c" represents the Coarse Reclassification algorithm; "d", Deliberate Misclassification; "f", Composite Fitness; and "s", Composite Fitness--Feature Selection.


Pedro Domingos. Context-Sensitive Feature Selection for Lazy Learners. Artif. Intell. Rev., 11. 1997.

used in the empirical study, in particular M. Zwitter and M. Soklic of the University Medical Centre, Ljubljana, for supplying the lymphography, breast cancer and primary tumor datasets, and Robert Detrano, of the V.A. Medical Center, Long Beach and Cleveland Clinic Foundation, for supplying the heart disease dataset. Please see the documentation in the UCI Repository for detailed


Geoffrey I. Webb. OPUS: An Efficient Admissible Algorithm for Unordered Search. J. Artif. Intell. Res. (JAIR), 3. 1995.

of the search space below a poor choice of node can do much to minimize the damage done by that poor choice, even when there is no backtracking as is the case for depth-first search. For five data sets (House Votes 84, Lymphography, Mushroom, Primary Tumor and Soybean Large), disabling optimistic pruning has little effect under best-first search. Disabling optimistic pruning always has large effect


M. A. Galway and Michael G. Madden. Evaluation of the Performance of the Markov Blanket Bayesian Classifier Algorithm. Technical report NUIG-IT-011002, Department of Information Technology, National University of Ireland, Galway.

with the fewest instances, this procedure was repeated 10 times. For the SPECT and Lymphography datasets, the procedure was repeated 50 times to reduce variability. Prediction accuracy results and standard deviations are reported in Table 2. Following usual conventions, for each dataset the algorithm


Geoffrey I. Webb. Learning Decision Lists by Prepending Inferred Rules. School of Computing and Mathematics, Deakin University.

supported by the Australian Research Council. I am grateful to Mike Cameron-Jones for discussions that helped refine the ideas presented herein. The Breast Cancer, Lymphography and Primary Tumor data sets were compiled by M. Zwitter and M. Soklic at University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. The Audiology data set was compiled by Professor Jergen at Baylor College of

