Center for Machine Learning and Intelligent Systems

Audiology (Standardized) Data Set

Abstract: Standardized version of the original audiology database


Source:

Original Version:

(a) Original Owner: Professor Jergen at Baylor College of Medicine
(b) Donor: Bruce Porter (porter '@' fall.cs.utexas.EDU)

Standardized Version:

(a) Donor: Ross Quinlan

Data Set Information:

This database is a standardized version of the original audiology database (see audiology.* in this directory). The non-standard set of attributes has been converted to a standard set of attributes according to the following rules.

* Each property that appears anywhere in the original .data or .test file has been represented as a separate attribute in this file.

* A property such as age_gt_60 is represented as a boolean attribute with values f and t.

* In most cases, a property of the form x(y) is represented as a discrete attribute x() whose possible values are the various y's; air() is an example. There are two exceptions:
** when only one value of y appears anywhere, e.g. static(normal). In this case, x_y appears as a boolean attribute (here, static_normal).
** when a single case can have two or more values for x, e.g. history(..). In this case, each possible value of history is treated as a separate boolean attribute (history_buzzing, history_dizziness, and so on).

* Since boolean attributes only appear as positive conditions, each boolean attribute is assumed to be false unless noted as true. The value of a multi-valued discrete attribute is taken as unknown ("?") unless a value is specified.

* The original case identifications, p1 to p200 in the .data file and t1 to t26 in the .test file, have been added as a unique identifier attribute.
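The conversion rules above can be sketched in Python. This is a minimal illustration, not the procedure actually used to build the file: it assumes each case's original properties are available as strings in the notation used here (age_gt_60, air(mild), history(noise)), and the names PROPERTY, BOOLEAN_PROPERTIES, standardize and to_row are hypothetical. Which x(y) properties become booleans is hard-coded for the two examples named above (static and history).

```python
import re

# Matches properties written as x(y), e.g. air(mild) or speech(very_good).
PROPERTY = re.compile(r"^(\w+)\((\w+)\)$")

# Assumption for illustration: x(y) properties that become boolean x_y
# attributes, either because only one y ever appears (static) or because
# a case may carry several values of x (history).
BOOLEAN_PROPERTIES = {"static", "history"}


def standardize(properties):
    """Split one case's original properties into booleans and discrete values.

    Returns (true_booleans, discrete): true_booleans is the set of boolean
    attribute names noted as true; discrete maps x() names to their y value.
    """
    true_booleans = set()
    discrete = {}
    for prop in properties:
        match = PROPERTY.match(prop)
        if match is None:
            # A bare property such as age_gt_60 is a boolean noted as true.
            true_booleans.add(prop)
            continue
        x, y = match.groups()
        if x in BOOLEAN_PROPERTIES:
            true_booleans.add(f"{x}_{y}")
        else:
            discrete[f"{x}()"] = y
    return true_booleans, discrete


def to_row(true_booleans, discrete, attribute_names, case_id):
    """Build a full row: booleans default to 'f', discrete values to '?'."""
    row = {}
    for name in attribute_names:
        if name.endswith("()"):
            row[name] = discrete.get(name, "?")
        else:
            row[name] = "t" if name in true_booleans else "f"
    row["identifier"] = case_id
    return row


# Made-up case, not taken from the data.
booleans, values = standardize(
    ["age_gt_60", "air(mild)", "static(normal)", "history(noise)", "history(roaring)"]
)
row = to_row(booleans, values,
             ["age_gt_60", "air()", "bone()", "static_normal",
              "history_noise", "history_buzzing"],
             "demo")
print(row)
```

Here bone() comes out as "?" and history_buzzing as "f", because neither was specified for the made-up case.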

[Note: in the original .data file, p165 has a repeated specification of o_ar_c(normal); p166 has a repeated specification of speech(normal) and the conflicting values air(moderate) and air(mild). No other problems with the original data were noted.]
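A check for the kinds of problems reported in the note can be sketched as follows, again assuming properties are strings in the x(y) notation. audit_case and MULTI_VALUED are hypothetical names, and the input is a made-up list that reproduces the problems described for p166, not the actual case data.

```python
import re
from collections import defaultdict

PROPERTY = re.compile(r"^(\w+)\((\w+)\)$")

# Assumption: history is the only property allowed several values per case.
MULTI_VALUED = {"history"}


def audit_case(properties):
    """Return (repeated, conflicts) for one case's original property list."""
    seen = set()
    repeated = []
    values = defaultdict(set)
    for prop in properties:
        if prop in seen:
            repeated.append(prop)  # the same property specified again
        seen.add(prop)
        match = PROPERTY.match(prop)
        if match:
            x, y = match.groups()
            values[x].add(y)
    # A single-valued property with more than one distinct y is a conflict.
    conflicts = {x: sorted(ys) for x, ys in values.items()
                 if len(ys) > 1 and x not in MULTI_VALUED}
    return repeated, conflicts


# Made-up property list mirroring the problems noted for p166.
print(audit_case(["speech(normal)", "air(moderate)", "speech(normal)", "air(mild)"]))
# (['speech(normal)'], {'air': ['mild', 'moderate']})
```

Resolving a conflict such as air(moderate) versus air(mild) still needs a human decision; the sketch only flags it.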

Attribute Information:

age_gt_60: f, t.
air(): mild, moderate, severe, normal, profound.
airBoneGap: f, t.
ar_c(): normal, elevated, absent.
ar_u(): normal, absent, elevated.
bone(): mild, moderate, normal, unmeasured.
boneAbnormal: f, t.
bser(): normal, degraded.
history_buzzing: f, t.
history_dizziness: f, t.
history_fluctuating: f, t.
history_fullness: f, t.
history_heredity: f, t.
history_nausea: f, t.
history_noise: f, t.
history_recruitment: f, t.
history_ringing: f, t.
history_roaring: f, t.
history_vomiting: f, t.
late_wave_poor: f, t.
m_at_2k: f, t.
m_cond_lt_1k: f, t.
m_gt_1k: f, t.
m_m_gt_2k: f, t.
m_m_sn: f, t.
m_m_sn_gt_1k: f, t.
m_m_sn_gt_2k: f, t.
m_m_sn_gt_500: f, t.
m_p_sn_gt_2k: f, t.
m_s_gt_500: f, t.
m_s_sn: f, t.
m_s_sn_gt_1k: f, t.
m_s_sn_gt_2k: f, t.
m_s_sn_gt_3k: f, t.
m_s_sn_gt_4k: f, t.
m_sn_2_3k: f, t.
m_sn_gt_1k: f, t.
m_sn_gt_2k: f, t.
m_sn_gt_3k: f, t.
m_sn_gt_4k: f, t.
m_sn_gt_500: f, t.
m_sn_gt_6k: f, t.
m_sn_lt_1k: f, t.
m_sn_lt_2k: f, t.
m_sn_lt_3k: f, t.
middle_wave_poor: f, t.
mod_gt_4k: f, t.
mod_mixed: f, t.
mod_s_mixed: f, t.
mod_s_sn_gt_500: f, t.
mod_sn: f, t.
mod_sn_gt_1k: f, t.
mod_sn_gt_2k: f, t.
mod_sn_gt_3k: f, t.
mod_sn_gt_4k: f, t.
mod_sn_gt_500: f, t.
notch_4k: f, t.
notch_at_4k: f, t.
o_ar_c(): normal, elevated, absent.
o_ar_u(): normal, absent, elevated.
s_sn_gt_1k: f, t.
s_sn_gt_2k: f, t.
s_sn_gt_4k: f, t.
speech(): normal, good, very_good, very_poor, poor, unmeasured.
static_normal: f, t.
tymp(): a, as, b, ad, c.
viith_nerve_signs: f, t.
wave_V_delayed: f, t.
waveform_ItoV_prolonged: f, t.
identifier (unique for each instance)
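As a sketch of how the attribute list might be used when loading the standardized data, the code below validates a row and turns it into a numeric vector. It assumes a row has already been parsed into a dict keyed by the attribute names listed above (the column order of the data file is not described here). Mapping "?" to an all-zero one-hot group is one common choice, not something the dataset prescribes, and all function names are hypothetical.

```python
# Discrete attributes and their allowed values, copied from the list above;
# every other attribute (except the identifier) is a boolean with values f, t.
DISCRETE = {
    "air()": {"mild", "moderate", "severe", "normal", "profound"},
    "ar_c()": {"normal", "elevated", "absent"},
    "ar_u()": {"normal", "absent", "elevated"},
    "bone()": {"mild", "moderate", "normal", "unmeasured"},
    "bser()": {"normal", "degraded"},
    "o_ar_c()": {"normal", "elevated", "absent"},
    "o_ar_u()": {"normal", "absent", "elevated"},
    "speech()": {"normal", "good", "very_good", "very_poor", "poor", "unmeasured"},
    "tymp()": {"a", "as", "b", "ad", "c"},
}
UNKNOWN = "?"


def is_valid(name, value):
    """Discrete attributes accept a listed value or '?'; booleans only f or t."""
    if name in DISCRETE:
        return value == UNKNOWN or value in DISCRETE[name]
    return value in {"f", "t"}


def invalid_fields(row):
    """List (name, value) pairs that violate the schema, skipping the identifier."""
    return [(name, value) for name, value in row.items()
            if name != "identifier" and not is_valid(name, value)]


def encode(row, boolean_names):
    """Turn a row into a 0/1 vector for a learner.

    Booleans map to 1 for 't' and 0 otherwise; each discrete attribute is
    one-hot encoded over its sorted values, with '?' (or a missing entry)
    encoded as all zeros.
    """
    vector = [1 if row.get(name, "f") == "t" else 0 for name in boolean_names]
    for name in sorted(DISCRETE):
        value = row.get(name, UNKNOWN)
        vector.extend(1 if value == v else 0 for v in sorted(DISCRETE[name]))
    return vector


row = {"identifier": "demo", "age_gt_60": "t", "air()": "mild", "tymp()": "?"}
print(invalid_fields(row))                           # []
print(encode(row, ["age_gt_60", "airBoneGap"])[:7])  # [1, 0, 1, 0, 0, 0, 0]
```

With all 60 boolean attributes passed as boolean_names, the vector would have 60 + 34 = 94 entries, since the nine discrete attributes have 34 listed values between them.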


Relevant Papers:

Bareiss, E. Ray, & Porter, Bruce (1987). Protos: An Exemplar-Based Learning Apprentice. In the Proceedings of the 4th International Workshop on Machine Learning, 12-23, Irvine, CA: Morgan Kaufmann.

Papers That Cite This Data Set [1]:

Marcus Hutter and Marco Zaffalon. Distribution of Mutual Information from Complete and Incomplete Data. CoRR, cs.LG/0403025. 2004.

Vassilis Athitsos and Stan Sclaroff. Boosting Nearest Neighbor Classifiers for Multiclass Recognition. Boston University Computer Science Tech. Report No. 2004-006. 2004.

Richard Nock and Marc Sebban and David Bernard. A Simple Locally Adaptive Nearest Neighbor Rule with Application to Pollution Forecasting. International Journal of Pattern Recognition and Artificial Intelligence, Vol. 2003.

Alexander K. Seewald. How to Make Stacking Better and Faster While Also Taking Care of an Unknown Weakness. ICML. 2002.

Wai Lam and Kin Keung and Charles X. Ling. PR 1527. Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong. 2001.

Alexander K. Seewald and Johann Petrak and Gerhard Widmer. Hybrid Decision Tree Learners with Alternative Leaf Classifiers: An Empirical Study. FLAIRS Conference. 2001.

Jihoon Yang and Rajesh Parekh and Vasant Honavar. DistAl: An inter-pattern distance-based constructive learning algorithm. Intell. Data Anal., 3. 1999.

Mark A. Hall. Correlation-based Feature Selection for Machine Learning. Doctor of Philosophy thesis, Department of Computer Science, The University of Waikato, Hamilton, New Zealand. 1999.

Pedro Domingos. Unifying Instance-Based and Rule-Based Induction. Machine Learning, 24. 1996.

Thomas G. Dietterich and Ghulum Bakiri. Solving Multiclass Learning Problems via Error-Correcting Output Codes. CoRR, cs.AI/9501101. 1995.

Geoffrey I. Webb. OPUS: An Efficient Admissible Algorithm for Unordered Search. J. Artif. Intell. Res. (JAIR), 3. 1995.

Alexander K. Seewald. Meta-Learning for Stacked Classification. Austrian Research Institute for Artificial Intelligence.

Bernhard Pfahringer and Ian H. Witten and Philip Chan. Improving Bagging Performance by Increasing Decision Tree Diversity. Austrian Research Institute for AI.

D. Randall Wilson and Roel Martinez. Improved Center Point Selection for Probabilistic Neural Networks. Proceedings of the International Conference on Artificial Neural Networks and Genetic Algorithms.

Alexander K. Seewald. Towards Understanding Stacking: Studies of a General Ensemble Learning Scheme. Dissertation submitted for the academic degree of Doctor of Technical Sciences.

Geoffrey I. Webb. Learning Decision Lists by Prepending Inferred Rules. School of Computing and Mathematics, Deakin University.

Mohammed Waleed Kadous and Claude Sammut. Temporal Classification: Extending the Classification Paradigm to Multivariate Time Series. School of Computer Science and Engineering, The University of New South Wales.

Mohammed Waleed Kadous. Expanding the Scope of Concept Learning Using Metafeatures. School of Computer Science and Engineering, University of New South Wales.

Jerome H. Friedman and Ron Kohavi and Youngkeol Yun. Lazy Decision Trees. To appear in AAAI-96. Statistics Department and Stanford Linear Accelerator Center, Stanford University.

Citation Request:

WARNING: This database should be credited to the original owner whenever used for any publication whatsoever.

[1] Papers were automatically harvested and associated with this data set, in collaboration with
