


Glass Identification Data Set

Below are papers that cite this data set, with context shown. Papers were automatically harvested and associated with this data set, in collaboration with Rexa.info.

Return to Glass Identification data set page.


Ping Zhong and Masao Fukushima. A Regularized Nonsmooth Newton Method for Multi-class Support Vector Machines. 2005.

the starting point of the next (k + 1)th iteration. The parameters ν1 and ν2 in (3) are both set to 0.01. In Algorithm 3.1, we replaced the standard Armijo rule in (S.3) by ... Table 1: Six benchmark datasets from UCI.

name     iris  wine  glass  vowel  vehicle  segment
#pts     150   178   214    528    846      2310
#feats   4     13    9      10     18       19
#cls     3     3     6      11     4        7

#pts: the number of training data; #feats: the number of


Vassilis Athitsos and Stan Sclaroff. Boosting Nearest Neighbor Classifiers for Multiclass Recognition. Boston University Computer Science Tech. Report No. 2004-006. 2004.

used in the experiments, largely copied from (Allwein et al., 2000).

Dataset       Train  Test  Attributes  Classes
glass         214    -     9           6
isolet        6238   1559  617         26
letter        16000  4000  16          26
pendigits     7494   3498  16          10
satimage      4435   2000  36          6
segmentation  2310   -     19          7
vowel         528    462   10          11
yeast


Yuan Jiang and Zhi-Hua Zhou. Editing Training Data for kNN Classifiers with Neural Network Ensemble. ISNN (1). 2004.

i.e. annealing, credit, liver, pima, soybean, wine and zoo. RemoveOnly obtains the best performance on three data sets, i.e. glass, hayes-roth and wine. It is surprising that Depuration obtains the best performance on only one data set, i.e. iris, as RelabelOnly does. These observations indicate that NNEE is a


S. Augustine Su and Jennifer G. Dy. Automated hierarchical mixtures of probabilistic principal component analyzers. ICML. 2004.

5 wine -3-8 .478 .627 8 2 .623 .722 4 Interestingly, the mixture of PPCA approach is not always better than PCA+EM. Mixtures of PPCA performed better than PCA+EM in terms of NMI and FM on most small datasets (toy, oil, and glass), which are well modeled by mixtures of Gaussians. PPCA with fewer clusters has a comparable performance with EM + PCA on the large data sets (optical digits, satellite image,


Xiaoli Z. Fern and Carla Brodley. Solving cluster ensemble problems by bipartite graph partitioning. ICML. 2004.

a fair comparison. 7. Experimental Results The goal of the experiments is to evaluate the three graph formulations - IBGF, CBGF and HBGF - given different cluster ensembles.

Table 1. Summary of the data sets
data set   eos   glass  hrct  isolet6  modis
#inst.     2398  214    1545  1440     4975
#class     8     6      8     6        10
org. dim.  20    9      183   617      112
rp dim.    5     5      10    10       6
pca dim.   ---   ---    30    60       6

7.1. Data Sets and Parameter Settings


Francesco Masulli and Giorgio Valentini. An experimental analysis of the dependence among codeword bit errors in ECOC learning machines. 2003.

machine, varying the number of hidden units between 5 and 50, yielding 11 × 20 = 220 evaluations of I_E, I_SE, ©R and ©S both for ECOC monolithic and ECOC PND learning machines. For the UCI data sets glass, letter and optdigits we used only 2 different structures, using, respectively, 5 and 9, 120 and 140, 60 and 70 hidden units, yielding 2 × 20 = 40 evaluations of the mutual information


Giorgio Valentini and Francesco Masulli. NEURObjects: an object-oriented library for neural network development. Neurocomputing, 48. 2002.

validation [35]. The folds can be prepared using the program dofold in a simple way: dofold glass.data -nf 10 -na 9 -name glass This command builds the folds for a ten-fold cross-validation test. The data set is glass from the UCI Machine Learning repository [29]. Ten folds from the data file glass.data (named glass.i.train and glass.i.test, with i varying from 1 to 10) are extracted (the option -na specifies
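As a rough illustration of the fold-preparation step described above, here is a minimal Python sketch that splits a glass.data-style file into ten train/test fold pairs. It is not the NEURObjects dofold tool itself; the file name, the output naming scheme, and the assumption of one comma-separated sample per line are illustrative only.

```python
# Minimal sketch of 10-fold preparation, loosely analogous to
# "dofold glass.data -nf 10 -na 9 -name glass".
# Assumption: one sample per line (9 attributes plus a class label).
import random

def make_folds(path="glass.data", n_folds=10, name="glass", seed=0):
    with open(path) as f:
        rows = [line.strip() for line in f if line.strip()]
    random.Random(seed).shuffle(rows)
    for i in range(n_folds):
        test = rows[i::n_folds]                          # every n_folds-th row held out
        train = [r for j, r in enumerate(rows) if j % n_folds != i]
        with open(f"{name}.{i + 1}.train", "w") as out:  # e.g. glass.1.train
            out.write("\n".join(train) + "\n")
        with open(f"{name}.{i + 1}.test", "w") as out:   # e.g. glass.1.test
            out.write("\n".join(test) + "\n")

if __name__ == "__main__":
    make_folds()
```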


Krzysztof Krawiec. Genetic Programming-based Construction of Features for Machine Learning and Knowledge Discovery Tasks. Institute of Computing Science, Poznan University of Technology. 2002.

in favor of feature construction is usually statistically relevant. Note also that positive results have been obtained for both real-world problems (Crx, Diabetes and Glass) as well as artificial datasets, which were intentionally designed to test the usefulness of feature construction methods [34]. Although the increases in accuracy of classification are not always impressive, the feature


Michail Vlachos and Carlotta Domeniconi and Dimitrios Gunopulos and George Kollios and Nick Koudas. Non-linear dimensionality reduction techniques for classification and visualization. KDD. 2002.

The average error rates for the smaller data sets (i.e., Iris, Sonar, Glass, Liver, and Lung) were based on leave-one-out cross-validation, and the error rates for Image and Vowel were based on ten two-fold cross-validations, as summarized in Table
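For readers who want to reproduce just the leave-one-out protocol mentioned above (not the paper's dimensionality-reduction methods), a minimal sketch on the Glass data might look like the following; the plain 1-nearest-neighbour classifier and the UCI glass.data column layout (ID, nine attributes, class label) are assumptions for illustration.

```python
# Sketch of leave-one-out evaluation on the Glass data, with a plain
# 1-NN classifier standing in for the paper's own methods.
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.neighbors import KNeighborsClassifier

data = np.loadtxt("glass.data", delimiter=",")   # assumption: UCI layout (ID, 9 attrs, class)
X, y = data[:, 1:-1], data[:, -1]

errors = 0
for train_idx, test_idx in LeaveOneOut().split(X):
    clf = KNeighborsClassifier(n_neighbors=1).fit(X[train_idx], y[train_idx])
    errors += int(clf.predict(X[test_idx])[0] != y[test_idx][0])

print(f"leave-one-out error rate: {errors / len(y):.3f}")
```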


Francesco Masulli and Giorgio Valentini. Università di Genova, DISI - Dipartimento di Informatica e Scienze dell'Informazione. 2001.

machine, varying the number of hidden units between 5 and 50, yielding 11 × 20 = 220 evaluations of I_E, I_SE, #R and #S both for ECOC monolithic and ECOC PND learning machines. On the UCI data sets glass, letter and optdigits we have used only 2 different structures, using, respectively, 5 and 9, 120 and 140, 60 and 70 hidden units, yielding 2 × 20 = 40 evaluations of the mutual


Carlotta Domeniconi and Jing Peng and Dimitrios Gunopulos. An Adaptive Metric Machine for Pattern Classification. NIPS. 2000.

N = 208 data of J = 2 classes (``mines'' and ``rocks''); 3. Vowel data. This example has q = 10 measurements and 11 classes. There are a total of N = 528 samples in this example; 4. Glass data. This data set consists of q = 9 chemical attributes measured for each of N = 214 data of J = 6 classes; 5. Image data. This data set consists of 40 texture images that are manually classified into 15 classes. The


Mark A. Hall. Correlation-based Feature Selection for Discrete and Numeric Class Machine Learning. ICML. 2000.

In the case of CFS, a discretized copy of each training split was made for it to operate on. The same folds were used for each feature selector-learning scheme combination.

Table 1. Discrete class data sets.
   Data Set     Instances  Num.  Nom.  Classes
1  glass2       163        9     0     2
2  anneal       898        6     32    5
3  breast-c     286        0     9     2
4  credit-g     1000       7     13    2
5  diabetes     768        8     0     2
6  horse colic  368        7     15    2
7  heart-c      303        6     7     2
8
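The discretized-copy step described above can be imitated in a few lines; the sketch below builds a discretized copy of a training split for a feature selector to work on while the learner keeps the original values. Equal-frequency binning via scikit-learn stands in for the MDL-based discretization CFS actually uses, and the file name, split size, and column layout are illustrative assumptions.

```python
# Sketch: make a discretized copy of a training split for the feature
# selector, leaving the original split untouched for the learning scheme.
# Equal-frequency binning is only a stand-in for MDL discretization.
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

data = np.loadtxt("glass.data", delimiter=",")        # assumption: UCI layout
X_train, y_train = data[:150, 1:-1], data[:150, -1]   # assumption: first 150 rows as one split

disc = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile")
X_train_disc = disc.fit_transform(X_train)            # copy used only by the selector
print(X_train_disc[:3])
```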


Petri Kontkanen and Petri Myllymäki and Tomi Silander and Henry Tirri and Peter Grünwald. On predictive distributions and Bayesian networks. Department of Computer Science, Stanford University. 2000.

used show very similar behavior. As an illustrative example, in Figure 4 the average log-scores obtained are plotted in the Hepatitis and Glass dataset cases. Again, the EVU and EVJ approaches are quite robust in the sense that they predict quite well even with small training sets. This shows that the data sets used here are quite redundant, and


Thierry Denoeux. A neural network classifier based on Dempster-Shafer theory. IEEE Transactions on Systems, Man, and Cybernetics, Part A, 30. 2000.

analysis techniques [13]. As shown in Table I, our approach with at least three prototypes per class dominates the other techniques for this classification task. 2) Forensic Glass Data: This data set contains the description of 214 fragments of glass [17] originally collected for a study in the context of criminal investigation. Each fragment has a measured reflectivity index and chemical


Francesco Masulli and Giorgio Valentini. Effectiveness of Error Correcting Output Codes in Multiclass Learning Problems. Multiple Classifier Systems. 2000.

effective for PND classifiers rather than monolithic MLP classifiers. Hypothesis 2 In PLD error recovering induced by ECOC is counter-balanced by the higher error rate of the dichotomizers. Table 1. Data sets general features. The glass, letter and optdigits data sets are from the UCI repository [16]. Data set | Number of attributes | Number of classes | Number of training samples | Number of testing


Nir Friedman and Iftach Nachman. Gaussian Process Networks. UAI. 2000.

contains 4177 samples with 9 attributes. 300 samples were used as a test set. Glass identification data set - data describing the material concentrations in glasses, with a class attribute denoting the type of the glass. The data set contains 214 samples with 10 attributes. 64 samples were used as a


Kai Ming Ting and Ian H. Witten. Issues in Stacked Generalization. J. Artif. Intell. Res. (JAIR), 10. 1999.

Note that stacking performs very poorly on Glass and Ionosphere, two small real-world datasets. This is not surprising, because cross-validation inevitably produces poor estimates for small datasets. 4.2 Discussion Like bagging, stacking is ideal for parallel computation. The construction of


Christopher J. Merz. Using Correspondence Analysis to Combine Classifiers. Machine Learning, 36. 1999.

The resulting weighting schemes reversed this effect by counting a vote for one of the confused classes as a vote for the other, and vice versa. PV performs well on the glass, lymph and wave data sets where the errors of the learned models are measured (using the statistic) to be fairly uncorrelated. Here, SCANN performs similarly to PV, but S-BP and S-Bayes (except for wave) appear to be


Eibe Frank and Ian H. Witten. Generating Accurate Rule Sets Without Global Optimization. ICML. 1998.

listed in Table 2. They give the percentage of correct classifications, averaged over ten ten-fold cross-validation runs, and standard deviations. Following Holte (Holte, 1993), the G2 variant of the glass dataset has classes 1 and 3 combined and classes 4 to 7 deleted, and the horse-colic dataset has attributes 3, 25, 26, 27, 28 deleted with attribute 24 being used as the class. We also deleted all
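The G2 preprocessing described above (classes 1 and 3 merged, classes 4 to 7 removed) is easy to reproduce; a possible sketch starting from the UCI glass.data file is shown below. The use of pandas, the file name, and the column names are assumptions made for illustration, not part of the cited paper.

```python
# Sketch of the G2 glass variant (Holte, 1993): merge classes 1 and 3,
# drop classes 4-7.  Assumption: glass.data has an ID column, nine
# attributes, and the class label in the last column.
import pandas as pd

cols = ["Id", "RI", "Na", "Mg", "Al", "Si", "K", "Ca", "Ba", "Fe", "Type"]
glass = pd.read_csv("glass.data", header=None, names=cols)

g2 = glass[glass["Type"].isin([1, 2, 3])].copy()  # keep classes 1-3 only
g2["Type"] = g2["Type"].replace({3: 1})           # fold class 3 into class 1
print(g2["Type"].value_counts())                  # the two remaining classes
```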


Ethem Alpaydin. Voting over Multiple Condensed Nearest Neighbors. Artif. Intell. Rev, 11. 1997.

do not contribute much. Whether an additional subset pays off the additional complexity and memory is a trade-off that needs to be resolved depending on the particular application at hand. In three datasets, VOWEL, THYROID, and GLASS, we do not seem to gain anything by voting. The VOWEL database defines a quite difficult problem and is very noisy; the optimal k is 7. The result with 7-NN is the


Jan C. Bioch and D. Meer and Rob Potharst. Bivariate Decision Trees. PKDD. 1997.

with the standard error. From these tables we can conclude

Table 1: Summary of the Datasets
name             cases  attr  classes
glass            214    9     6
diabetes (pima)  768    8     2
breast cancer    699    9     2
heart            270    13    2
wave             300    21    3

method  glass     diabetes  cancer    heart     wave
BIT1    65.3±1.1  74.3±0.7  95.4±0.3  78.5±0.3  76.1±1.3
        6.2±2.1   5.2±2.5   2.8±0.2   4.1±0.5   5.0±1.6
BIT2


D. Greig and Hava T. Siegelmann and Michael Zibulevsky. A New Class of Sigmoid Activation Functions That Don't Saturate. 1997.

(3 hidden nodes) the ø values (0.5, 1.5, 2.5) were used, for the glass data set (6 hidden nodes), the values (0.5, 1.0, 1.5, 2.0, 2.5, 3.0) were used, and for the bodyfat data set (7 hidden nodes) the values (0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5) were used. The results for these


Christopher J. Merz. Combining Classifiers Using Correspondence Analysis. NIPS. 1997.

with error rates around 80 percent. This empirically demonstrates PV's known sensitivity to learned models with highly correlated errors. On the other hand, PV performs well on the glass and wave data sets where the errors of the learned models are measured to be fairly uncorrelated. Here, SCANN performs similarly to PV, but S-BP and S-Bayes appear to be overfitting by making erroneous predictions


Prototype Selection for Composite Nearest Neighbor Classifiers. Department of Computer Science University of Massachusetts. 1997.

there is no sense of positive and negative examples. For example, in the Glass Recognition data set, there are six classes and therefore six prototypes will be selected. There, the classes correspond to the source and manufacturing process of glass fragments for crime scene analysis: building


Georg Thimm and E. Fiesler. Optimal Setting of Weights, Learning Rate, and Gain. IDIAP Research Report. 1997.

Multilayer perceptrons behave similarly, as shown in figure 4, as confirmed by experiments performed with the Solar, Wine, Glass and Servo data sets. The most important difference with high order perceptrons is that the networks do not converge, or converge only very slowly, for weight variances close to zero. Such variances should therefore not be used


Richard Maclin and David W. Opitz. An Empirical Evaluation of Bagging and Boosting. AAAI/IAAI. 1997.

that can be drawn from the results is that both the Simple Ensemble and Bagging approaches almost always produce better performance than just training a single classifier. For some of these data sets (e.g., glass, kr-vs-kp, letter, segmentation, soybean, and vehicle) the gains in performance are quite significant. One aspect many of these data sets share is that they involve predictions for


Aynur Akkus and H. Altay Güvenir. K Nearest Neighbor Classification on Feature Projections. ICML. 1996.

Many Irrelevant Features, Proceedings of the Ninth National Conference on Artificial Intelligence, 547-552. Dasarathy, B. V., (1990). Nearest Neighbor (NN)

Table 2: Comparison on some real-world datasets.
Data Set:         bcancerw  cleveland  glass  hungarian  ionosphere  iris  musk  wine
No. of Instances  273       303        214    294        351         150   476   178
No. of Features   9         13         9      13         34          4     166   13
No. of Classes    2         2          6      2          2           3     2     3
No. of


Ron Kohavi and Mehran Sahami. Error-Based and Entropy-Based Discretization of Continuous Features. KDD. 1996.

to discretize using Ent-MDL were Sick-euthyroid and Hypothyroid, which each took about 31 seconds per fold on an SGI Challenge. The longest running time for ErrorMin was encountered with the Glass dataset, which took 153 seconds per fold to discretize; this was much longer than for any of the other datasets examined. The ErrorMin method could not be run on the Letter domain with 300MB of main


Jitender S. Deogun and Vijay V. Raghavan and Hayri Sever. Exploiting Upper Approximation in the Rough Set Methodology. KDD. 1995.

does not match to known concepts we use 5NNR classification scheme with Euclidean distance function to determine the closest known concept. The difference between two values of an attribute are

data set          Attr.  Size (Training / Test)
1. Glass          9      66 / 148
2. Breast cancer  9      211 / 488
3. Parity 5+10    15     226 / 524
4. Iris           4      45 / 105
5. Monk 1         6      124 / 432
6. Monk 2         6      169 / 432
7. Monk 3         6      122 / 432
8. Vote           16     132 / 303
9.


Thomas G. Dietterich and Ghulum Bakiri. Solving Multiclass Learning Problems via Error-Correcting Output Codes. CoRR, csAI/9501101. 1995.

Table 4 summarizes the data sets employed in the study. The glass, vowel, soybean, audiologyS, ISOLET, letter, and NETtalk data sets are available from the Irvine Repository of machine learning databases (Murphy & Aha, 1994).


Suresh K. Choubey and Jitender S. Deogun and Vijay V. Raghavan and Hayri Sever. A comparison of feature selection algorithms in the context of rough classifiers.

different and zero otherwise, and the difference between two quantitative values is normalized into the interval [0,1]. We first consider results from Table 2. Except for the Glass, Monks, and Hepatitis data sets, the performances obtained in Predictive Experiments approach those in the case of Upperbound Experiments. This suggests that for the Glass, Monks, and Hepatitis data ... data set | Size | No. of Attributes


Stefan Aeberhard and Danny Coomans and De Vel. THE PERFORMANCE OF STATISTICAL PATTERN RECOGNITION METHODS IN HIGH DIMENSIONAL SETTINGS. James Cook University.

resonant frequency easily obtainable using NMR, constituting the 19 variables measured. With 19 dimensions and 13 training samples per class, this problem is ill-posed. Glass Types Data This data set is from [15]. It summarises a chemical analysis done on two types (classes) of glass: glass which was float-processed and glass which was not. The data is ten dimensional and well-posed, with 87


Chih-Wei Hsu and Cheng-Ru Lin. A Comparison of Methods for Multi-class Support Vector Machines. Department of Computer Science and Information Engineering National Taiwan University.

section we present experimental results on several problems from the Statlog collection [20] and the UCI Repository of machine learning databases [1]. From UCI Repository we choose the following datasets: iris, wine, glass and vowel. Those problems had already been tested in [27]. From Statlog collection we choose all multi-class datasets: vehicle, segment, dna, satimage, letter, and shuttle. Note


C. Titus Brown and Harry W. Bullen and Sean P. Kelly and Robert K. Xiao and Steven G. Satterfield and John G. Hagedorn and Judith E. Devaney. Visualization and Data Mining in a 3D Immersive Environment: Summer Project 2003.

whole museum from above ... lack of support for numeric targets by InfoGain. 4.14 glass: The glass data set was analysed by Sean Kelly. This is a purely numeric dataset containing roughly 200 glass samples with information on amounts of various chemical elements in the samples and what purpose the glass


Effectiveness of Error Correcting Output Coding methods in ensemble and monolithic learning machines. Dipartimento di Informatica, Università di Pisa.

and composed of normally distributed clusters of data. The set p6 contains 6 classes with no overlapping regions, while the regions of the 9 classes of p9 hardly overlap. The glass, letter and optdigits data sets are from the UCI repository [42]. In the experimentation we have used exhaustive [17] and BCH ECOC generation algorithms [8]. ECOC exhaustive algorithms select among all possible 2^K dichotomies


Zhi-Hua Zhou and Xu-Ying Liu. Training Cost-Sensitive Neural Networks with Methods Addressing the Class Imbalance Problem.

effect on soybean, which has the biggest number of classes and suffers from serious class imbalance. It is noteworthy that the sampling methods and SMOTE cause negative effect on several data sets suffering from class imbalance, that is, glass, soybean and annealing. TABLE IX: Experimental results on multi-class UCI data sets with Type (A)


Aynur Akkus and H. Altay Güvenir. Weighting Features in k Nearest Neighbor Classification on Feature Projections. Department of Computer Engineering and Information Science Bilkent University.

weight to features by this method on real-world datasets taken from the UCI repository (Murphy, 1995). The results of these experiments will be presented in Section 4.

Table 1: Comparison on some real-world datasets.
Data Set:         bcancerw  cleveland  glass  hungarian  ionosphere  iris  liver  wine
No. of Instances  273       303        214    294        351         150   345    178
No. of Features   9         13         9      13         34          4     6      13
No. of Classes    2         2          6      2          2           3     2      3
No. of


Francesco Masulli and Giorgio Valentini. Quantitative Evaluation of Dependence among Outputs in ECOC Classifiers Using Mutual Information Based Measures. Università di Genova, DISI - Dipartimento di Informatica e Scienze dell'Informazione; INFM - Istituto Nazionale per la Fisica della Materia.

the dependence among codeword bits errors [12, 8]. In our experimentation we evaluate this dependence in ECOC monolithic and ECOC PND learning machines. 3.2 The data In our experiments we have used data sets from the UCI repository of Irvine (glass, letter, optdigits) [11] and a synthetic data set (d5) made up of five three-dimensional classes, each composed of two normally distributed disjoint clusters


Rong-En Fan and P.-H. Chen and C.-J. Lin. Working Set Selection Using the Second Order Information for Training SVM. Department of Computer Science and Information Engineering National Taiwan University.

was originally used in (Bailey et al., 1993). The problem mg is a Mackey-Glass time series. The data sets cpusmall and splice are from the Delve archive (http://www.cs.toronto.edu/~delve). Problem fourclass is from (Ho and Kleinberg, 1996) and we further transform it to a two-class set. The problem


Yin Zhang and W. Nick Street. Bagging with Adaptive Costs. Management Sciences Department University of Iowa Iowa City.

and the out-of-bag margin estimation will result in better generalization as it does in stacking. 3. Computational Experiments Bacing was implemented using MATLAB and tested on 14 UCI repository data sets [2]: Autompg, Bupa, Glass, Haberman, Housing, Cleveland-heart-disease, Hepatitis, Ion, Pima, Sonar, Vehicle, WDBC, Wine and WPBC. Some of the data sets do not originally depict two-class problems


Ping Zhong and Masao Fukushima. Second Order Cone Programming Formulations for Robust Multi-class Classification.

problem as follows:

max_{α,σ,τ}  e^T α − (σ + τ)
s.t.  Ē^T α = 0,  α ≤ (1 − ν)e,                     (38)
      σ − τ = ν,
      || [ −(1/√(2(K+1))) Ã^T α ;  τ ] || ≤ σ.

Table 1: Description of Iris, Wine and Glass datasets.
name   dimension (N)  #classes (K)  #examples (L)
Iris   4              3             150
Wine   13             3             178
Glass  9              6             214

Table 2: Results for Iris, Wine and Glass datasets with noise (ρ = 0.3, κ = 2, ν = 0.05). R a Robust (I)


Karthik Ramakrishnan. UNIVERSITY OF MINNESOTA.

classifier is shown as a straight line across the x-axis for comparison purposes. ... 37. 11: Bagging, Boosting, and Distance-Weighted test set error rates for the glass data set as the number of classifiers in the ensemble increases. The test set error rate for a single decision tree classifier is shown as a straight line across the x-axis for comparison purposes.


Pramod Viswanath and M. Narasimha Murty and Shalabh Bhatnagar. A pattern synthesis technique to reduce the curse of dimensionality effect.

We performed experiments with five different datasets, viz., OCR, WINE, THYROID, GLASS and PENDIGITS, respectively. Except the OCR dataset, all others are from the UCI Repository [16]. OCR dataset is also used in [17, 18]. The properties of the


Erin J. Bredensteiner and Kristin P. Bennett. Multicategory Classification by Support Vector Machines. Department of Mathematics University of Evansville.

protocol (ftp) from the UCI Repository of Machine Learning Databases and Domain Theories [16] at ftp://ftp.ics.uci.edu/pub/machine-learning-databases. Glass Identification Database The Glass dataset [11] is used to identify the origin of a sample of glass through chemical analysis. This dataset is comprised of six classes of 214 points with 9 features. The distribution of points by class is as


Pramod Viswanath and M. Narasimha Murty and Shalabh Bhatnagar. Partition Based Pattern Synthesis Technique with Efficient Algorithms for Nearest Neighbor Classification. Department of Computer Science and Automation, Indian Institute of Science.

We performed experiments with six different datasets, viz., OCR, WINE, VOWEL, THYROID, GLASS and PENDIGITS, respectively. Except the OCR dataset, all others are from the UCI Repository [19]. OCR dataset is also used in [20,18]. The properties of the


Federico Divina and Elena Marchiori. Handling Continuous Attributes in an Evolutionary Inductive Learner. Department of Computer Science Vrije Universiteit.

and ECL-LSDc (together with ECL-GSD) becomes significantly better than ECL-LSDf and ECL-LUD on the German dataset. The other datasets (Echocardiogram, Glass 2, Heart, and Hepatitis) are small, and the results of the experiments are not normally distributed, so the t-test cannot be applied. Dataset ECL-LSDc


James J. Liu and James Tin and Yau Kwok. An Extended Genetic Rule Induction Algorithm. Department of Computer Science Wuhan University.

With ESIA and other GA-based rule induction algorithms, this can easily be done by incorporating various "interestingness" measures [8] into ... Following [6, 14], the glass2 variant of the glass dataset has classes 1 and 3 combined and classes 4 to 7 deleted, and the horse-colic dataset has attributes 3, 25, 26, 27, 28 deleted, with attribute 24 being used as the class label. We also deleted all


Francesco Masulli and Giorgio Valentini. Comparing Decomposition Methods for Classification. Istituto Nazionale per la Fisica della Materia DISI - Dipartimento di Informatica e Scienze dell'Informazione.

Figure 1: Comparison of performances of different decomposition methods (OPC bold driver, CC grad. desc., CC bold driver, ECOC BCH grad. desc., ECOC BCH bold driver) on glass (a) and optdigits (b) data sets of the UCI.

Table 4: Standard PWC (left) and CC (right) decomposition matrices.

PWC:
+1  -1   0   0
+1   0  -1   0
+1   0   0  -1
 0  +1  -1   0
 0  +1   0  -1
 0   0  +1  -1

CC:
+1  +1  -1  -1
+1  -1  +1  -1


Alexander K. Seewald. Dissertation Towards Understanding Stacking Studies of a General Ensemble Learning Scheme ausgefuhrt zum Zwecke der Erlangung des akademischen Grades eines Doktors der technischen Naturwissenschaften.

Axes for Dataset #10: Training set (8-9 = CV, 7 = 75%, 6 = 62%, ..., 1 = 25%) vs. Hold-out accuracy. Figure 6.2: Learning curves for datasets balance-scale to glass. Learning curve for Dataset #11 (same axes).


H. Altay Güvenir and Aynur Akkus. WEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS. Department of Computer Engineering and Information Science Bilkent University.

row of each k value presents the accuracy of the WkNNFP algorithm with equal feature weights, while the second row shows the accuracy obtained by WkNNFP using

Table 1: Comparison on some real-world datasets.
Data Set:         cleveland  glass  horse  hungarian  iris  liver  sonar  wine
No. of Instances  303        214    368    294        150   345    208    178
No. of Features   13         9      22     13         4     6      60     13
No. of Classes    2          6      2      2          3     2      2      3
No. of Missing


Ron Kohavi and Brian Frasca. Useful Feature Subsets and Rough Set Reducts. the Third International Workshop on Rough Sets and Soft Computing.

taken from Holte's paper, C4.5 has a 3.5% higher accuracy. The average accuracy for Holte-II is 82.7%, and 86.2% for C4.5. If we ignore the two glass datasets on which Holte-II does poorly, the difference shrinks to 1.3%. Thus even on data with continuous features that have not been discretized, Holte-II comes reasonably close to C4.5. Moreover, the


H. Altay Guvenir. A Classification Learning Algorithm Robust to Irrelevant Features. Bilkent University, Department of Computer Engineering and Information Science.

Classification accuracy vs. number of irrelevant features added (0 to 20) for the VFI5, 1NN, 3NN and 5NN classifiers on the Glass data set (accuracy axis 0.0 to 1.0) and the Iris data set (accuracy axis 0.5 to 1.0).


Return to Glass Identification data set page.
