Iris Data Set
Below are papers that cite this data set, with context shown.
Papers were automatically harvested and associated with this data set, in collaboration with Rexa.info.
Igor Fischer and Jan Poland. Amplifying the Block Matrix Structure for Spectral Clustering. Telecommunications Lab. 2005.
are common benchmark sets with real-world data (Murphy & Aha, 1994): the iris, the wine and the breast cancer data set. Both our methods perform very well on iris and breast cancer. However, the wine data set is too sparse for the context-dependent method: only 178 points in 13 dimensions, giving the conductivity too
Sotiris B. Kotsiantis and Panayiotis E. Pintelas. Logitboost of Simple Bayesian Classifier. Informatica. 2005.
were hand selected so as to come from real-world problems and to vary in characteristics. Thus, we have used data sets from the domains of: pattern recognition (iris, zoo), image recognition (ionosphere, sonar), medical diagnosis (breast-cancer, breast-w, colic, diabetes, heart-c, heart-h, heart-statlog, hepatitis,
Stanley Robson de Medeiros Oliveira. Data Transformation for Privacy-Preserving Data Mining. PhD thesis, University of Alberta. 2005.
(d_o = 18, d_r = 12). 7.13 Average of F-measure (10 trials) for the Iris dataset (d_o = 5, d_r = 3). 7.14 An example of partitioning for the Pumsb dataset. 7.15 Average of F-measure (10 trials) for the Pumsb dataset over vertically
Ping Zhong and Masao Fukushima. A Regularized Nonsmooth Newton Method for Multiclass Support Vector Machines. 2005.
the starting point of the next (k + 1)th iteration. The parameters º 1 and º 2 in (3) are both set to 0.01. In Algorithm 3.1, we replaced the standard Armijo rule in (S.3) by
Table 1: Six benchmark datasets from UCI
name  iris  wine  glass  vowel  vehicle  segment
#pts  150   178   214    528    846      2310
#ats  4     13    9      10     18       19
#cls  3     3     6      11     4        7
#pts: the number of training data; #ats: the number of
Anthony K H Tung and Xin Xu and Beng Chin Ooi. CURLER: Finding and Visualizing Nonlinear Correlated Clusters. SIGMOD Conference. 2005.
of three helix clusters with different cluster existence spaces, the iris plant dataset and the image segmentation dataset from the UCI Repository of Machine Learning Databases and Domain Theories [6], and the Iyer time series gene expression data with 10 well-known linear clusters
Jeroen Eggermont and Joost N. Kok and Walter A. Kosters. Genetic Programming for data classification: partitioning the search space. SAC. 2004.
is disappointing as only our clustering gp algorithm with 3 clusters per numerical valued attribute manages to really outperform our simple gp but still performs much worse than C4.5. The Iris Data Set If we look at the results of our gp algorithms on the Iris data set in Table 8 we see that by far the best performance is achieved by our clustering gp algorithm with 3 clusters per numerical valued
Remco R. Bouckaert and Eibe Frank. Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms. PAKDD. 2004.
perform differently in 19 out of 27 cases. For some rows, the test consistently indicates no difference between any two of the three schemes, in particular for the iris and Hungarian heart disease datasets. However, most rows contain at least one cell where the outcomes of the test are not consistent. The row labeled "consistent" at the bottom of the table lists the number of datasets for which all
Mikhail Bilenko and Sugato Basu and Raymond J. Mooney. Integrating constraints and metric learning in semi-supervised clustering. ICML. 2004.
Experiments were conducted on three datasets from the UCI repository: Iris, Wine, and Ionosphere (Blake & Merz, 1998); the Protein dataset used by Xing et al. (2003) and Bar-Hillel et al. (2003), and randomly sampled subsets from the Digits
Qingping Tao. Making Efficient Learning Algorithms with Exponentially Many Features. PhD dissertation, University of Nebraska. 2004.
(T_0 = n^2 and T_s = 10n^2). M: Metropolis, G: Gibbs, MG: Metropolized Gibbs, PT: Parallel Tempering, BF: Brute Force.
Data Sets  iris       car        breast cancer  voting     auto        annealing
n          4          6          9              16         25          38
M          5.3 ± 2.1  1.7 ± 0.8  31.5 ± 5.0     5.0 ± 2.1  12.8 ± 7.5  1.0 ± 0.7
G          6.7 ± 3.8  1.9 ± 0.8  30.9 ± 5.5     5.0 ± 2.4  15.6 ± 7.8  0.6 ± 0.5
MG         6.0 ± 1.7
Yuan Jiang and Zhi-Hua Zhou. Editing Training Data for kNN Classifiers with Neural Network Ensemble. ISNN (1). 2004.
i.e. glass, hayes-roth and wine. It is surprising that Depuration obtains the best performance on only one data set, i.e. iris, as RelabelOnly does. These observations indicate that NNEE is a better editing approach than Depuration. Moreover, since the effect of Depuration is only comparable to that of
Sugato Basu. Semi-Supervised Clustering with Limited Background Knowledge. AAAI. 2004.
like stopword removal, tf-idf weighting, and removal of very high-frequency and very low-frequency words (Dhillon & Modha, 2001). From the UCI collection we selected Iris, which is a well-known dataset having 150 points in 4 dimensions. We used the active pairwise constrained version of KMeans on Iris, and SPKMeans on Classic3-subset. Learning curves with cross validation For all algorithms on
Judith E. Devaney and Steven G. Satterfield and John G. Hagedorn and John T. Kelso and Adele P. Peskin and William George and Terence J. Griffin and Howard K. Hung and Ronald D. Kriz. Science at the Speed of Thought. Ambient Intelligence for Scientific Discovery. 2004.
EXAMPLES Figure 1 shows part of our visualization of the Iris data set [2]. (The full visualization contains multiple rooms with an alternate visualization of the same data set in each room, enabling a scientist to visit each of the rooms.) On the near side of the left
Jennifer G. Dy and Carla Brodley. Feature Selection for Unsupervised Learning. Journal of Machine Learning Research, 5. 2004.
Figure 9: Feature selection versus without feature selection on the four-class data. 6.5 Experiments on Real Data We examine the FSSEM variants on the iris, wine, and ionosphere data sets from the UCI learning repository (Blake and Merz, 1998), and on a high resolution computed tomography (HRCT) lung image data which we collected from IUPUI medical center (Dy et
Ross J. Micheals and Patrick Grother and P. Jonathon Phillips. The NIST HumanID Evaluation Framework. AVBPA. 2003.
Jonathon's signature therefore contains five sigmembers: one for the iris scan, three for each facial image, and one for the gait video. For the first sigmember, the iris scan, there is a single dataset with a single file that contains the iris data. Three sigmembers, for the facial imagery, each have a single dataset, each with a single file that each contain a facial image. The fifth sigmember,
Sugato Basu. Also Appears as Technical Report, UTAI. PhD Proposal. 2003.
like stopword removal, tf-idf weighting, and removal of very high-frequency and very low-frequency words (Dhillon & Modha, 2001). From the UCI collection we selected Iris, which is a well-known dataset having 150 points in 4 dimensions. We used the active pairwise constrained version of KMeans on Iris, and SPKMeans on Classic3-subset. Learning curves with cross validation For all algorithms on
Dick de Ridder and Olga Kouropteva and Oleg Okun and Matti Pietikäinen and Robert P W Duin. Supervised Locally Linear Embedding. ICANN. 2003.
retained in the remaining M dimensions [3]. This local intrinsic dimensionality estimate is denoted by M_L. The feature extraction process is illustrated in Figure 1: the C = 3 classes in the iris data set [1] are mapped onto single points by 1-SLLE. α-SLLE retains some of the class structure, but reduces within-class dispersion compared to LLE. Clearly, SLLE is suitable as a feature extraction step
Aristidis Likas and Nikos A. Vlassis and Jakob J. Verbeek. The global k-means clustering algorithm. Pattern Recognition, 36. 2003.
it is also possible to employ the above presented k-d tree approach with the global k-means algorithm. 4 Experimental results We have tested the proposed clustering algorithms on several well-known data sets, namely the iris data set [8], the synthetic data set [9] and the image segmentation data set [8]. In all data sets we conducted experiments for the clustering problems obtained by considering only
Zhi-Hua Zhou and Yuan Jiang and Shifu Chen. Extracting symbolic rules from trained neural network ensembles. AI Commun, 16. 2003.
80 2 19 13 6
iris plant                          iris     150    3  4   0   4
statlog australian credit approval  credita  690    2  15  9   6
statlog german credit               creditg  1,000  2  20  13  7
Table 2 Fidelity of rules extracted via REFNE
data set  balance  voting  hepatitis  iris    credita  creditg  average
fidelity  87.88%   89.26%  84.50%     96.25%  84.13%   74.10%   86.02%
Table 3 Comparison of generalization error
data set  REFNE  ensemble  single NN  C4.5
Jeremy Kubica and Andrew Moore. Probabilistic Noise Identification and Data Cleaning. ICDM. 2003.
We also compared the algorithms by their ability to identify artificial corruptions. Three different test sets were used: a noise free version of the rock data described above, the UCI Iris data set, and the UCI Wine data set [3]. Noise was generated by choosing to corrupt each record with some probability p. For each record chosen, corruption and noise vectors were sampled from their
Julie Greensmith. New Frontiers For An Artificial Immune System. Digital Media Systems Laboratory HP Laboratories Bristol. 2003.
using the g++ compiler version 2.96 for Red Hat Linux 7.3 2.96-113, and was run on one out of 4 of Intel Pentium 4 CPU 1.80GHz HP `ePC's'. On completion of the compilation process, the iris dataset (provided with the source code) was used to perform preliminary testing on the system. Once it was clear on how to use the various parameter settings, and that classification could be performed,
Manoranjan Dash and Huan Liu and Peter Scheuermann and Kian-Lee Tan. Fast hierarchical clustering and its validation. Data Knowl. Eng, 44. 2003.
consists of 10,992 objects in 16 dimensions. There are 10 classes corresponding to digits 0...9. The 16 dimensions are drawn by resampling from handwritten digits. Iris dataset has 150 points in 4 dimensions in 3 clusters. Dimensions are sepal length, sepal width, petal length, and petal width. Clusters are Iris Setosa, Iris Versicolour, and Iris Virginica. Each of the 3
Bob Ricks and Dan Ventura. Training a Quantum Neural Network. NIPS. 2003.
an epoch refers to finding and fixing the weight of a single node. We also tried the randomized search algorithm for a few real-world machine learning problems: lenses, Hayes-Roth and the iris datasets [19]. The lenses data set is a data set that tries to predict whether people will need soft contact lenses, hard contact lenses or no contacts. The iris dataset details features of three different
Eibe Frank and Mark Hall. Visualizing Class Probability Estimators. PKDD. 2003.
[Decision tree figure: tests on petalwidth (<= 1.7 / > 1.7 and <= 1.5 / > 1.5) and a split at 4.9; leaves Iris-virginica (46.0/1.0), Iris-versicolor (48.0/1.0), Iris-virginica (3.0), Iris-versicolor (3.0/1.0).] Fig. 5. The decision tree for the two-class iris dataset. Fig. 6. Visualizing the decision tree for the two-class iris data using (a) petallength and petalwidth, (b) petallength and sepallength, and (c) sepallength and sepalwidth (with the
Jun Wang and Bin Yu and Les Gasser. Concept Tree Based Clustering Visualization with Shaded Similarity Matrices. ICDM. 2002.
we will briefly show how shaded similarity matrices are constructed and how one looks through an example. The data used in the example is part of the Iris data from the UCI repository[9]. The Iris data set contains 150 instances, evenly distributed in 3 classes. We fetch 5 instances from each class, and thus obtain 15 instances (Table 1). The similarity matrix was computed based on Euclidean distance
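The similarity-matrix step described in this snippet can be sketched as follows. This is a minimal illustration, not the authors' procedure: the six rows are a hypothetical sample (two per class) rather than the 15 instances of the paper's Table 1, the mapping from distance to similarity (1 - d / d_max) is an assumption, and the character ramp in `shade` is only a stand-in for the paper's grey-level shading.

```python
import math

# Hypothetical sample: (sepal length, sepal width, petal length, petal width).
# Two rows per class; NOT the 15 instances selected in the cited paper.
SAMPLE = [
    ("setosa",     (5.1, 3.5, 1.4, 0.2)),
    ("setosa",     (4.9, 3.0, 1.4, 0.2)),
    ("versicolor", (7.0, 3.2, 4.7, 1.4)),
    ("versicolor", (6.4, 3.2, 4.5, 1.5)),
    ("virginica",  (6.3, 3.3, 6.0, 2.5)),
    ("virginica",  (5.8, 2.7, 5.1, 1.9)),
]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_matrix(rows):
    """Pairwise similarities in [0, 1]; assumed mapping: 1 - d / d_max."""
    n = len(rows)
    dist = [[euclidean(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    d_max = max(max(r) for r in dist)
    return [[1.0 - d / d_max for d in r] for r in dist]


def shade(sim, ramp=" .:*#"):
    """Render the matrix as text, darker characters meaning higher similarity."""
    top = len(ramp) - 1
    return "\n".join(
        "".join(ramp[min(int(s * len(ramp)), top)] for s in row) for row in sim
    )


SIM = similarity_matrix([features for _, features in SAMPLE])
```

Calling `print(shade(SIM))` gives a rough text rendering in which instances of the same class form darker blocks along the diagonal.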
Michail Vlachos and Carlotta Domeniconi and Dimitrios Gunopulos and George Kollios and Nick Koudas. Nonlinear dimensionality reduction techniques for classification and visualization. KDD. 2002.
used in our experiments
Dataset  # data  # dims  # classes  experiment
Iris     100     4       2          leave-1-out cv
Sonar    208     60      2          leave-1-out cv
Glass    214     9       6          leave-1-out cv
Liver    345     6       2          leave-1-out cv
Lung     32      56      3          leave-1-out cv
Image    640     16
Geoffrey Holmes and Bernhard Pfahringer and Richard Kirkby and Eibe Frank and Mark A. Hall. Multiclass Alternating Decision Trees. ECML. 2002.
90.49 89.72
labor           84.67  87.5 +
promoters       86.8   87.3
sick-euthyroid  97.71  97.85 +
sonar           76.65  74.12
vote            96.5   96.18
+, statistically significant difference
Table 3. Wrapping two-class ADTree results
dataset        1vs1   1vsRest  Random   Exhaustive
iris           95.13  95.33    95.33    95.33
balance-scale  83.94  85.06 +  85.06 +  85.06 +
hypothyroid    99.61  99.63    99.64    99.64
anneal         99.01  98.96    99.05    99.19 +
zoo            90.38  93.45 +  95.05 +
Inderjit S. Dhillon and Dharmendra S. Modha and W. Scott Spangler. Class visualization of high-dimensional data with applications. Department of Computer Sciences, University of Texas. 2002.
sketch the outline of the paper. Section 2 introduces class-preserving projections and class-eigenvector plots, and contains several illustrations of the Iris plant and ISOLET speech recognition data sets [27]. Class-similarity graphs and class tours are discussed in Sections 3 and 4. We illustrate the value of the above visualization tools in Section 5, where we present a detailed study of the
Manoranjan Dash and Kiseok Choi and Peter Scheuermann and Huan Liu. Feature Selection for Clustering - A Filter Solution. ICDM. 2002.
are almost correct as well as the selected features are all important and it missed out only one important feature. 5.2 Benchmark and Real Datasets Iris dataset, popularly used for testing clustering and classification algorithms, is taken from UCI ML repository [5]. It contains 3 classes of 50 instances each, where each class refers to a type
Ayhan Demiriz and Kristin P. Bennett and Mark J. Embrechts. A Genetic Algorithm Approach for Semi-Supervised Clustering. E-Business Department, Verizon Inc. 2002.
506 points), House Votes (16 variables, 435 points), Breast Cancer Diagnostic (30 variables, 569 points), Pima Diabetes (8 variables, 769 points), and Iris (4 variables, 150 points). The datasets have categorical dependent variables except Housing. The continuous dependent variable for this dataset was categorized at the level of 21.5. Iris is a three class problem. The other datasets are
Wai Lam and Kin Keung and Charles X. Ling. PR 1527. Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong. 2001.
from different real-world application in various domains, such as the city-cycle fuel consumption (Am), Wisconsin breast cancer (Bc) and the famous iris plant database (Ir). Table 1 shows the data sets and their corresponding code used in this paper. For each data set, we randomly partitioned the data into ten even portions. Ten trials derived from 10-fold cross-validation were conducted
Jinyan Li and Guozhu Dong and Kotagiri Ramamohanarao and Limsoon Wong. DeEPs: A New Instance-based Discovery and Classification System. Proceedings of the Fourth European Conference on Principles and Practice of Knowledge Discovery in Databases. 2001.
we highlight some interesting points. 1. DeEPs versus kNN. Both DeEPs and kNN perform equally accurately on soybean-small (100%) and on iris (96%). DeEPs wins on 26 data sets; kNN wins on 11. It can be seen that the accuracy of DeEPs is generally better than that of kNN. The speed of DeEPs is about 1.5 times slower than that of kNN. The main reason is that DeEPs
David Hershberger and Hillol Kargupta. Distributed Multivariate Regression Using Wavelet-Based Collective Data Mining. J. Parallel Distrib. Comput, 61. 2001.
Application of this method to Linear Discriminant Analysis, which is related to parametric multivariate regression, produced classification results on the Iris data set that are comparable to those obtained with centralized data analysis. Key Words: data mining, distributed data mining, collective data mining, knowledge discovery, wavelets, regression 1.
David Horn and A. Gottlieb. The Method of Quantum Clustering. NIPS. 2001.
minima appear, as seen in Fig. 3. Nonetheless, they lie high and contain only a few data points. The major minima are the same as in Fig. 2. 3.2 iris Data Our second example consists of the iris data set [10], which is a standard benchmark obtainable from the UCI repository [11]. Here we use the first two principal components to define the two dimensions in which we apply our method. Fig. 4, which
Asa Ben-Hur and David Horn and Hava T. Siegelmann and Vladimir Vapnik. A Support Vector Method for Clustering. NIPS. 2000.
the core regions by an SV method with a global optimal solution. We have found examples where a local maximum is hard to identify by Roberts' method. 3.2 The iris data We ran SVC on the iris data set [9], which is a standard benchmark in the pattern recognition literature. It can be obtained from the UCI repository [10]. The data set contains 150 instances, each containing four measurements of
Neil Davey and Rod Adams and Mary J. George. The Architecture and Performance of a Stochastic Competitive Evolutionary Neural Tree Network. Appl. Intell, 12. 2000.
5 and 6 are illustrated in Figures 2 and 3. The IRIS data set is included to provide a benchmark performance. Set 1 2D single source Gaussian cluster, zero mean and unit variance. Simple cluster, base line test. Set 2 20D single source Gaussian cluster, zero
Edgar Acuna and Alex Rojas. Ensembles of classifiers based on Kernel density estimators. Department of Mathematics University of Puerto Rico. 2000.
has been developed to carry out all our tasks. The results are shown in the table 7.
Table 6. Comparison of Bagging using classical and adaptive kernel classifiers
         Classical Kernel        Adaptive Kernel
Dataset  Single  Bagged  Improv  Single  Bagged  Improv
Iris     4.00    3.33    16.75   4.67    4.00    14.34
Glass    44.97   40.52   9.90    35.20   33.25   5.54
HeartC   22.09   20.05   9.23    23.60   19.80   16.10
BreastW  4.34    4.10    5.53    4.88    4.53
Manoranjan Dash and Huan Liu. Feature Selection for Clustering. PAKDD. 2000.
in Figure 3. The X-axis of the plots is for number of most important features and Y-axis is for tr(P_W^{-1} P_B) value for the corresponding subset of most important features. For Iris data set trace value was the maximum for the two most important features. For D3C, D4C and D6C data trace value increases with addition of important features in a fast rate but slows down to almost a halt
Carlotta Domeniconi and Jing Peng and Dimitrios Gunopulos. An Adaptive Metric Machine for Pattern Classification. NIPS. 2000.
used were taken from the UCI Machine Learning Database Repository [10], except for the unreleased image data set. They are: 1. Iris data. This data set consists of q = 4 measurements made on each of N = 100 iris plants of J = 2 species; 2. Sonar data. This data set consists of q = 60 frequency measurements
David M J Tax and Robert P W Duin. Support vector domain description. Pattern Recognition Letters, 20. 1999.
almost Gaussian distributed and class 2 is scattered around it. The SVDD cannot distinguish one class 2 object from class 1. Finally, the performance of the outlier methods are applied on the iris dataset. Here, all methods work reasonably well, which indicates that the data distributions of the classes are well clustered. Only the Parzen density estimation slightly overtrains. From these results we
Ismail Taha and Joydeep Ghosh. Symbolic Interpretation of Artificial Neural Networks. IEEE Trans. Knowl. Data Eng, 11. 1999.
and universal approach. A rule evaluation technique that orders extracted rules based on three performance measures is then proposed. The three techniques are applied to the iris and breast cancer data sets. The extracted rules are evaluated qualitatively and quantitatively, and compared with those obtained by other approaches. Index Terms: rule extraction, hybrid systems, knowledge refinement, neural
Foster J. Provost and Tom Fawcett and Ron Kohavi. The Case against Accuracy Estimation for Comparing Induction Algorithms. ICML. 1998.
we often do not know whether the existing distribution is the natural distribution, or whether it has been stratified. The iris data set has exactly 50 instances of each class. The splice junction data set (DNA) has 50% donor sites, 25% acceptor sites and 25% nonboundary sites, even though the natural class distribution is very
Stephen D. Bay. Combining Nearest Neighbor Classifiers Through Multiple Feature Subsets. ICML. 1998.
comparison, we used the Wilcoxon signed rank test and found that MFS1 and MFS2 were significantly better than all others with a confidence level greater than 99%. MFS only performed poorly on two datasets: Iris and TicTacToe. For Iris, both MFS1 and MFS2 gave the lowest accuracy out of all the classifiers. This can possibly be explained by the small number of features in the Iris dataset. With
Wojciech Kwedlo and Marek Kretowski. Discovery of Decision Rules from Databases: An Evolutionary Approach. PKDD. 1998.
Dataset     Features         Examples  Classes
australian  15 (9 nominal)   690       2
diabetes    8                768       2
german      20 (13 nominal)  1000      2
glass       9                214       7
hepatitis   19 (13 nominal)  155       2
iris        4                150       3
Table 1. Description of the datasets used in the experiments.
Dataset     Majority  C4.5        EDRL        fi
australian  55.5      85.3 ± 0.2  86.1 ± 0.4  0.05
diabetes    65.1      74.6 ± 0.3  77.9 ± 0.3  0.2
german      70.0      71.6 ± 0.3  70.1 ±
Igor Kononenko and Edvard Simec and Marko Robnik-Sikonja. Overcoming the Myopia of Inductive Learning Algorithms with RELIEFF. Appl. Intell, 7. 1997.
We compared the performance of the algorithms also on the following non-medical real world data sets (SOYB, IRIS and VOTE are obtained from the Irvine database [21], SAT is obtained from the StatLog database [18]): SOYB: The famous soybean data set used by Michalski & Chilausky [17]. IRIS: The
Prototype Selection for Composite Nearest Neighbor Classifiers. Department of Computer Science University of Massachusetts. 1997.
Fitness-Feature Selection. 4.10 Relationships between component accuracy and diversity for the Monks-2, Breast Cancer Ljubljana, Diabetes and Iris Plants data sets for the four boosting algorithms. "c" represents the Coarse Reclassification algorithm; "d", Deliberate Misclassification; "f", Composite Fitness; and "s", Composite Fitness-Feature Selection.
Ke Wang and Han Chong Goh. Minimum Splits Based Discretization for Continuous Features. IJCAI (2). 1997.
but never explored multiway split of a continuous feature, making the simple structure disappear. Consider the following two decision trees built in one of the 10fold cross validation on Iris dataset. The first tree is produced by the multiway split proposed in this paper, and the second by C4.5. Though both trees have the same size and same error rate on test data, the first tree classifies
Ethem Alpaydin. Voting over Multiple Condensed Nearest Neighbors. Artif. Intell. Rev, 11. 1997.
accuracy goes higher but the variance also decreases. This indicates better generalization and is the clear advantage of voting. Complete results are given in Table 4. Results for the IRIS and WINE datasets are similar and are omitted. When one increases the number of voting subsets, after a certain number, new subsets do not contribute much. Whether an additional subset pays off the additional
Tapio Elomaa and Juho Rousu. Finding Optimal MultiSplits for Numerical Attributes in Decision Tree Learning. ESPRIT Working Group in Neural and Computational Learning. 1996.
used.
Data set                          Examples  Attributes   Classes
                                            Num.  Total
Iris plant classification         150       4     4      3
Glass type identification         214       9     9      6
Australian credit card assessment 690       6     14     2
Wisconsin breast cancer data      699       9     9      2
Ron Kohavi. Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid. KDD. 1996.
easy to understand when the log probabilities were presented as evidence that adds up in favor of different classes. Figure 1 shows a visualization of the Naive-Bayes classifier for Fisher's iris data set, where the task is to determine the type of iris based on four attributes. Each bar represents evidence for a given class and attribute value. Users can immediately see that all values for
Daniel C. St and Ralph W. Wilkerson and Cihan H. Dagli. Rule Set Quality Measures for Inductive Learning Algorithms. Proceedings of the Artificial Neural Networks In Engineering Conference 1996 (ANNIE). 1996.
distribution of the 148 instances among the four classes "normal" with 2 instances, "metastases" with 81 instances, "malign" with 61 instances, and "fibrosis" with 4 instances. The Iris data set, developed by R. A. Fisher (1936), lists the measurements of four characteristics of Iris flowers: petal length, petal width, sepal length, and sepal width. The set includes the measurements of 50
Ron Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. IJCAI. 1995.
then an overrepresented class in one subset will be a underrepresented in the other. To demonstrate the issue, we simulated a 2/3, 1/3 split of Fisher's famous iris dataset and used a majority inducer that builds a classifier predicting the prevalent class in the training set. The iris dataset describes iris plants using four continuous features, and the task is to
Ron Kohavi. The Power of Decision Tables. ECML. 1995.
other class is the more prevalent in the training set and the majority inducer predicts the wrong label for the test instance. We have observed a similar phenomenon even with tenfold CV. The iris dataset has 150 instances, 50 of each class. Predicting any class would yield 33.3% accuracy, but tenfold CV using a majority induction algorithm yields 21.5% accuracy (averaged over 100 runs of tenfold
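The effect described in this snippet depends only on the class labels, so it can be reproduced without the measurements. The sketch below is a hedged illustration, not Kohavi's experimental code: `LABELS` is a stand-in for the 150 iris labels (50 per class), `cv_accuracy` is a hypothetical helper, and the folds are plain unstratified random folds. Because every class has exactly 50 instances, the class that is most frequent in the training folds is always one that is least frequent in the held-out fold, so the majority predictor can never beat 1/3 and scores zero under leave-one-out.

```python
import random
from collections import Counter

# Stand-in for the iris class labels: 3 classes, 50 instances each.
LABELS = [c for c in range(3) for _ in range(50)]


def majority_class(train_labels):
    """Majority inducer: always predict the most frequent training label."""
    return Counter(train_labels).most_common(1)[0][0]


def cv_accuracy(labels, k, seed=0):
    """Accuracy of the majority inducer under unstratified k-fold CV."""
    idx = list(range(len(labels)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k folds of near-equal size
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [labels[i] for i in idx if i not in held_out]
        guess = majority_class(train)
        correct += sum(labels[i] == guess for i in fold)
    return correct / len(labels)


LOO_ACC = cv_accuracy(LABELS, k=len(LABELS))  # leave-one-out
TENFOLD_ACC = cv_accuracy(LABELS, k=10)       # ten-fold
```

The exact ten-fold figure varies with the random seed; the sketch only guarantees that it does not exceed 1/3, in line with the below-chance accuracy the snippet reports.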
George H. John and Ron Kohavi and Karl Pfleger. Irrelevant Features and the Subset Selection Problem. ICML. 1994.
performance was on parity5+5 and CorrAL using stepwise backward elimination, which reduced the error to 0% from 50% and 18.8% respectively. Experiments were also run on the Iris, Thyroid, and Monk1* datasets. The results on these datasets were similar to those reported in this paper. We observed high variance in the 25-fold cross-validation estimates of the error. Since our algorithms depend on
Zoubin Ghahramani and Michael I. Jordan. Learning from incomplete data. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES. 1994.
stochastic estimator. 4.4 Classification. Figure 3: Classification of the iris data set (panel title: Classification with missing inputs; x-axis: % missing features; y-axis: % correct classification; curves: EM and MI). 100 data points were used for training and 50 for testing. Each data point consisted of 4 real-valued attributes and one of three class labels. The figure shows classification performance ± 1
Gabor Melli. A Lazy Model-Based Approach to On-Line Classification. University of British Columbia. 1989.
7.2 Example of one algorithm (A_1) being more accurate than another (A_2). 7.3 Accuracy performance on the iris dataset for several parameter combinations of the DI_n()-based algorithm. 7.4 Parameter settings for the DI_n()-based algorithm that achieve the lowest
YongSeog Kim and W. Nick Street and Filippo Menczer. Optimal Ensemble Construction via Meta-Evolutionary Ensembles. Business Information Systems, Utah State University.
with detailed information from most of input features to learn multiple patterns. Therefore, classifiers with information from few projected variables will not perform well. Note that, among 15 data sets, there are four multi-class data sets (iris, hypo, segment, and soybean) while the remaining 11 data sets are bi-class data sets. Out of four multi-class data sets, MEE shows consistently worse
Maria Salamo and Elisabet Golobardes. Analysing Rough Sets weighting methods for Case-Based Reasoning Systems. Enginyeria i Arquitectura La Salle.
are obtained from the UCI repository [MM98]. They are: breast cancer, glass, ionosphere, iris, led, sonar, vehicle and vowel. Private datasets are from our own repository. They deal with diagnosis of breast cancer and synthetic datasets. Datasets related to diagnosis are biopsy and mammogram. Biopsy is the result of digitally processed
Lawrence O. Hall and Nitesh V. Chawla and Kevin W. Bowyer. Combining Decision Trees Learned in Parallel. Department of Computer Science and Engineering, ENB 118 University of South Florida.
0.6 < PetalWidth <= 1.5 and PetalLength > 4.9 then Iris-Virginica. R5: If 1.5 < PetalWidth <= 1.7 and PetalLength > 4.9 then Iris-Versicolor. Figure 1: The C4.5 tree produced on the full Iris dataset and the corresponding rules. adjust just one condition. For example, R1 no longer conflicts its test is adjusted to be petalwidthcm :5. A more complex problem is a condition in one rule overlaps
Anthony Robins and Marcus Frean. Learning and generalisation in a stable network. Computer Science, The University of Otago.
network. The effectiveness of pseudorehearsal at reducing catastrophic forgetting has been proven using a range of populations, including: randomly constructed autoassociative and heteroassociative data sets [Robins, 1995]; the Iris data set [Robins, 1996]; a classification task using the Mushroom data set [French, 1997]; and an alphanumeric character set using a Hopfield type network [Robins and
Geoffrey Holmes and Leonard E. Trigg. A Diagnostic Tool for Tree Based Supervised Classification Learning Algorithms. Department of Computer Science University of Waikato Hamilton New Zealand.
difference by the range of the tested attribute, giving the formula: cost = |v_1 - v_2| / (max a_1 - min a_1). Figure 2 illustrates the problem for case 4 with an example taken from the familiar iris dataset. The minimum cost edit sequence to transform the tree on the left involves deleting the non-root Petal width nodes and their rightmost leaf nodes (giving a cost of 4). We are left with two trees
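The cost described here, a difference between two split values divided by the range of the tested attribute, can be sketched as below. This is an assumption-laden reading of the extracted formula rather than the authors' definition: the function name `threshold_edit_cost` is hypothetical, and taking the absolute value of the difference is assumed so that the cost is symmetric and non-negative.

```python
def threshold_edit_cost(v1, v2, attr_min, attr_max):
    """Cost of changing a split value from v1 to v2 on one attribute.

    Assumed form: |v1 - v2| divided by the attribute's observed range,
    so a change spanning the whole range costs 1.0.
    """
    span = attr_max - attr_min
    if span <= 0:
        raise ValueError("attribute range must be positive")
    return abs(v1 - v2) / span


# Illustrative values: moving a petal-width test from 1.7 to 1.5 when the
# attribute ranges over [0.1, 2.5] (the petal-width range in the iris data).
EXAMPLE_COST = threshold_edit_cost(1.7, 1.5, attr_min=0.1, attr_max=2.5)
```

Normalizing by the range keeps costs comparable across attributes measured on different scales.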
Shlomo Dubnov and Ran El-Yaniv and Yoram Gdalyahu and Elad Schneidman and Naftali Tishby and Golan Yona. Clustering By Friends: A New Nonparametric Pairwise Distance Based Clustering Algorithm. Ben Gurion University.
procedure of the crossvalidation index (see Section 3) and we only report the resulting crossvalidation indices obtained during the computations. In section 5.1 we consider the classical Iris data sets. Then, in section 5.2 we consider the Isolet data set. An application to musical data is considered in section 5.3. 5.1. The Iris Data This data set, due to Fisher (Fisher, 1936), is a classic
Michael R. Berthold and Klaus-Peter Huber. From Radial to Rectangular Basis Functions: A new Approach for Rule Learning from Large Datasets. Institut für Rechnerentwurf und Fehlertoleranz (Prof. D. Schmid) Universität Karlsruhe.
extracted from a Neural Network trained on the data, rather than from the data itself. In this scenario the Neural Network already took care of the noisy patterns. B. The IRIS data This very famous dataset from Fisher ([5]) contains three classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other two; the latter are not linearly
Norbert Jankowski. Survey of Neural Transfer Functions. Department of Computer Methods, Nicholas Copernicus University.
sphere defined by this metric. The influence of input renormalization (using Minkowski distance functions) on the shapes of decision borders is illustrated in Fig. 30 for the classical Iris flowers dataset (only the last two input features, x_3 and x_4, are shown; for description of the data cf. [89]). Dramatic changes in the shapes of decision borders for different Minkowski metrics are observed.
Karthik Ramakrishnan. UNIVERSITY OF MINNESOTA.
classifier is shown as a straight line across the x-axis for comparison purposes. 15 Bagging, Boosting, and Distance-Weighted test set error rates for the iris data set as the number of classifiers in the ensemble increases. The test set error rate for a single decision tree classifier is shown as a straight line across the x-axis for comparison purposes.
Włodzisław Duch and Rafał Adamczak and Geerd H. F. Diercksen. Neural Networks from Similarity Based Perspective. Department of Computer Methods, Nicholas Copernicus University.
them on a unit sphere defined by this metric. 6 Pedagogical illustration The influence of non-Euclidean distance functions on the decision borders is illustrated here on the classical Iris flowers dataset, containing 50 cases in each of the 3 classes. The flowers are described by 4 measurements (petal and sepal width and length). Two classes, Iris virginica and Iris versicolor, overlap, and therefore
Fernando Fernández and Pedro Isasi. Designing Nearest Neighbour Classifiers by the Evolution of a Population of Prototypes. Universidad Carlos III de Madrid.
first version is due to the high number of centroids to eliminate. An example of the classifier found is given in figure 1(a), showing the centroids located in the mean of the distributions. 3.2 Iris Data Set Iris Data Set from UCI Machine Learning Repository 1 [3] is used in the second experiment. This dataset consists of 150 samples of three classes, where each class has 50 examples. The dimension of
Asa Ben-Hur and David Horn and Hava T. Siegelmann and Vladimir Vapnik. A Support Vector Method for Hierarchical Clustering. Faculty of IE and Management, Technion.
cost of a decrease in efficiency, which makes our algorithm useful even for very large datasets. To compare the performance of our algorithm with other hierarchical algorithms we ran it on the Iris data set [15], which is a standard benchmark in the pattern recognition literature. It can be obtained from the UCI repository [16]. The data set contains 150 instances each containing four measurements of
Lawrence O. Hall and Nitesh V. Chawla and Kevin W. Bowyer. Decision Tree Learning on Very Large Data Sets. Department of Computer Science and Engineering, ENB 118 University of South Florida.
0.6 < PetalWidth <= 1.5 and PetalLength } 4.9 } Iris Virginica R5: If 1.5 < PetalWidth <= 1.7 and PetalLength } 4.9 } Iris Versicolor <= 1.7 Figure 1. The C4.5 tree produced on the full Iris dataset and the corresponding rules. The final rules will be ordered by their accuracy taken from the original tree in all cases except for conflict resolution rules for which the accuracy is calculated on
G. Rätsch and B. Schölkopf and Alex Smola and K.-R. Müller and T. Onoda and Sebastian Mika. Arc: Ensemble Learning in the Presence of Outliers. GMD FIRST.
[17] explains the good generalization performance of AdaBoost in the low noise regime. However, AdaBoost performs worse on noisy tasks [10, 11], such as the iris and the breast cancer benchmark data sets [1]. On the latter tasks, a large margin on all training points cannot be achieved without adverse effects on the generalization error. This experimental observation was supported by the study of
Włodzisław Duch and Rudy Setiono and Jacek M. Zurada. Computational intelligence methods for rule-based data understanding. Proceedings of the IEEE. 2003.
larger input uncertainties do not change in subsequent minimizations. VIII. EXTRACTION OF RULES: ILLUSTRATIVE EXAMPLE The process of rule extraction is illustrated here using the well-known Iris dataset, provided by Fisher in 1936. The data have been obtained from the UCI machine learning repository [118]. The Iris data have 150 vectors evenly
H. Altay Güvenir and Aynur Akkuş. WEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS. Department of Computer Engineering and Information Science, Bilkent University.
row of each k value presents the accuracy of the WkNNFP algorithm with equal feature weights, while the second row shows the accuracy obtained by WkNNFP using
Table 1: Comparison on some real-world datasets.
Data Set:         cleveland  glass  horse  hungarian  iris  liver  sonar  wine
No. of Instances  303        214    368    294        150   345    208    178
No. of Features   13         9      22     13         4     6      60     13
No. of Classes    2          6      2      2          3     2      2      3
No. of Missing
Huan Liu. A Family of Efficient Rule Generators. Department of Information Systems and Computer Science National University of Singapore.
and to compare with the results reported in [22] since they have done some comparison with other methods such as ID3 [14] and the one by Han et al [7]. Then, we show the results for another two data sets: GolfPlaying [13] and Iris [4]. The authors of [13, 22] did not provide testing data. Only the Iris data is divided evenly into two sets (75 patterns each) for training and testing. Datasets CAR
Rudy Setiono and Huan Liu. Fragmentation Problem and Automated Feature Construction. School of Computing National University of Singapore.
[21] which has 9 binary features x_1, x_2, ..., x_9. The 512 instances are labeled as follows: (a) Class 1: x_1 x_2 x_3 + x_1 x_2 + x_7 x_8 x_9 + x_7 x_9, (b) Class 2: Otherwise. • Iris dataset [6] which has 150 instances described by 4 continuous attributes: sepal length (A_1), sepal width (A_2), petal length (A_3), and petal width (A_4). Each pattern belongs to one of the 3 possible
François Poulet. Cooperation between automatic algorithms, interactive algorithms and visualization tools for Visual Data Mining. ESIEA Recherche.
by the user on the screen and the right part shows the transformed line (the best separating plane computed with the convex hulls). Fig. 6. An example of the automatic best separating plane on iris data set 2.3 Clustering The interactive algorithm described in the previous section can also be used for unsupervised classification. The computation of the convex hulls and the nearest points can be
Takao Mohri and Hidehiko Tanaka. An Optimal Weighting Criterion of Case Indexing for Both Numeric and Symbolic Attributes. Information Engineering Course, Faculty of Engineering The University of Tokyo.
(vote, soybean, crx, hypo) were in the distribution floppy disk of Quinlan's C4.5 book (Quinlan 1993). The remaining four data sets (iris, hepatitis, led, led-noise) were obtained from the Irvine Machine Learning Database (Murphy & Aha 1994). Including our 3 methods, VDM, PCF, CCF, IB4, and C4.5 are compared. Quinlan's C4.5 is a
Huan Li and Wenbin Chen and I-Fan Shen. Supervised Local Tangent Space Alignment for Classification.
containing multiple classes. The results obtained with the unsupervised and supervised LTSA are expected to be different as is shown in Fig.1. The iris data set [Blake and Merz, 1998] includes 150 4D data belonging to 3 different classes. Here first 100 data points are selected as training samples and mapped from the 4D input space to a 2D feature space
Adam H. Cannon and Lenore J. Cowen and Carey E. Priebe. Approximate Distance Classification. Department of Mathematical Sciences The Johns Hopkins University.
data before implementing the ADC classification algorithm. Here, only the raw data has been analyzed using the same procedure described above. 5 Conclusions Results on the Wisconsin breast cancer data set and the Fisher iris data set compare very well with previous work on these data. The Pima Indian diabetes results are also nearly competitive with previous work. In all three cases it should be
Aïda Valls and Vicenç Torra. Explaining the consensus of opinions with the vocabulary of the experts. Dept. d'Enginyeria Informàtica i Matemàtiques, Universitat Rovira i Virgili.
as L i+1 end if return d(P i ,P c ) calculated with the definition (1). end. 4.1 Experimental results We have made different tests on different domains. Particularly, we have considered a well-known data set: Iris [10], which has 150 flowers described by means of 4 numerical attributes: petal and sepal length, and petal and sepal width; and a second set of data built by 5 colleagues who have described
Włodzisław Duch and Rafał Adamczak and Krzysztof Grąbczewski. Extraction of crisp logical rules using constrained backpropagation networks. Department of Computer Methods, Nicholas Copernicus University.
a few cases. The final solution may be presented as a set of rules or as a network of nodes performing logical functions. III. Three examples A. Iris data In the first example the classical Iris dataset was used (all datasets were taken from the UCI machine learning repository [9]). The data has 150 vectors evenly distributed in three classes, called iris-setosa, iris-versicolor and iris-virginica.
Eric P. Kasten and Philip K. McKinley. MESO: Perceptual Memory to Support Online Learning in Adaptive Software. Proceedings of the Third International Conference on Development and Learning (ICDL).
sizes and feature counts.
Data Set        Size     Features  Classes
Iris            150      4         3
ATT Faces       360      10,304    40
Mult. Feature   2,000    649       10
Mushroom        8,124    22        2
Japanese Vowel  9,859    12        9
Letter          20,000   16        26
Cover Type      581,012  54        7
set. As such, no
Karol Grudziński and Włodzisław Duch. SBLPM: A Simple Algorithm for Selection of Reference Instances in Similarity Based Methods. Department of Computer Methods, Nicholas Copernicus University.
the UCI repository [9] and contains 3 classes (Iris Setosa, Virginica and Versicolor flowers), 4 attributes (measurements of leaf and petal widths and length), 50 cases per class. The entire Iris dataset has been shown here (Fig. 1) in two dimensions, x_3 and x_4, which are much more informative than the other two (cf. [10]). In Fig. 2 the reference set obtained by taking the value of # from the
Chih-Wei Hsu and Chih-Jen Lin. A Comparison of Methods for Multiclass Support Vector Machines. Department of Computer Science and Information Engineering, National Taiwan University.
section we present experimental results on several problems from the Statlog collection [20] and the UCI Repository of machine learning databases [1]. From UCI Repository we choose the following datasets: iris, wine, glass, and vowel. Those problems had already been tested in [27]. From Statlog collection we choose all multiclass datasets: vehicle, segment, dna, satimage, letter, and shuttle. Note
Włodzisław Duch and Rafał Adamczak and Krzysztof Grąbczewski and Grzegorz Żal. A hybrid method for extraction of logical rules from data. Department of Computer Methods, Nicholas Copernicus University.
for benchmark applications were taken from the UCI machine learning repository [14]. Application of the constructive MLP2LN approach to the classical Iris dataset was already presented in detail [15], therefore only new aspects related to the hybrid method are discussed here. The Iris data has 150 vectors evenly distributed in three classes: iris-setosa,
Włodzisław Duch and Rafał Adamczak and Geerd H. F. Diercksen. Classification, Association and Pattern Completion using Neural Similarity Based Methods. Department of Computer Methods, Nicholas Copernicus University.
them on a unit sphere defined by this metric. 6 PEDAGOGICAL ILLUSTRATION The influence of non-Euclidean distance functions on the decision borders is illustrated here on the classical Iris flowers dataset, containing 50 cases in each of the 3 classes. The flowers are described by 4 measurements (petal and sepal width and length). Two classes, Iris virginica and Iris versicolor, overlap, and therefore
Alexander K. Seewald. Towards Understanding Stacking: Studies of a General Ensemble Learning Scheme. Dissertation submitted for the academic degree of Doctor of Technical Sciences.
ionosphere. Compressed glyph visualization for dataset iris. Compressed glyph visualization for dataset labor. Compressed glyph visualization for dataset lymph. Compressed glyph visualization for dataset primary-tumor. Compressed glyph visualization for
Stefan Aeberhard and Danny Coomans and De Vel. THE PERFORMANCE OF STATISTICAL PATTERN RECOGNITION METHODS IN HIGH DIMENSIONAL SETTINGS. James Cook University.
means coincide. FDP performed very well for the exponential data. The results of the real data support the observations made from the simulations. FDP does not perform very well on well-defined data sets (wine data, Iris data), especially when compared to FF. It however compares somewhat better in the other cases, most noticeably in the case of the tertiary institutions data, where it equals the
Michael P. Cummings and Daniel S. Myers and Marci Mangelson. Applying Permutation Tests to Tree-Based Statistical Models: Extending the R Package rpart. Center for Bioinformatics and Computational Biology, Institute for Advanced Computer Studies, University of Maryland.
In this section we show several examples of the application of permutation tests to tree-based statistical models. We begin by permutation testing a classification tree built on the famous Iris dataset [Figure: classification tree for the Iris data; root node with setosa: 50, versicolor: 50, virginica: 50; first split on petal length < 2.45 cm]
Ping Zhong and Masao Fukushima. Second Order Cone Programming Formulations for Robust Multiclass Classification.
problem as follows: max_{α,σ,τ} e^T α (σ + τ) s.t. ¯E^T α = 0, α ≤ (1 − ν)e, σ − τ = ν, ‖[ (1/√(2(K+1))) Ã^T α ; τ ]‖ ≤ σ. (38)
Table 1: Description of Iris, Wine and Glass datasets.
name   dimension (N)  #classes (K)  #examples (L)
Iris   4              3             150
Wine   13             3             178
Glass  9              6             214
Table 2: Results for Iris, Wine and Glass datasets with noise (ρ = 0.3, κ = 2, ν = 0.05). R a Robust (I)
Włodzisław Duch and Rafał Adamczak and Norbert Jankowski. Initialization of adaptive parameters in density networks. Department of Computer Methods, Nicholas Copernicus University.
network parameters, but it is interesting to note that these results are frequently already of rather high quality. Except for galaxies all other data was obtained from the UCI repository [13]. Iris dataset contains 150 cases in 3 classes. After initialization with Gaussian functions including rotations only 4 classification errors are made (97.3% accuracy), which is a better result than many
Aynur Akkuş and H. Altay Güvenir. Weighting Features in k Nearest Neighbor Classification on Feature Projections. Department of Computer Engineering and Information Science, Bilkent University.
significantly. This should be because all the features are equally relevant. On the cleveland, liver, iris and glass (except k = 1) datasets, the weights learned by the individual accuracies always performed significantly better than the others. The weight learning method based on the homogeneity performed better than the other on the
Jun Wang and Bei Yu and Les Gasser. Classification Visualization with Shaded Similarity Matrix. Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign.
similarity matrix is constructed and how it looks through an example. The data used in the example is part of the Iris data from the UCI repository [25]. There are 150 instances in the original Iris data set, which are evenly distributed in 3 classes: setosa, virginica, and versicolor. For each class, we fetch its first 5 instances from the data file, thus obtaining 15 instances (see Table 1). Table 2
Andrew Watkins and Jon Timmis and Lois C. Boggess. Artificial Immune Recognition System (AIRS): An Immune-Inspired Supervised Learning Algorithm. Computing Laboratory, University of Kent.
where classification accuracy of 98% was achieved using a k-value of 3. This seemed to bode well, and further experiments were undertaken using the Fisher Iris data set, Pima diabetes data, Ionosphere data and the Sonar data set, all obtained from the repository at the University of California at Irvine [4]. Table II shows the performance of AIRS on these data sets
Gaurav Marwah and Lois C. Boggess. Artificial Immune Systems for Classification : Some Issues. Department of Computer Science Mississippi State University.
satisfying some stimulation threshold, but the stimulation threshold for out of class ARBs was somewhat relaxed as compared to in class ARBs. Table 4 shows the accuracy rates obtained for the iris data set using the approaches just described. Five way cross validation was performed to achieve these results. Table 4: Accuracy Rates For Iris Data Set Using Different Approaches For ARB Pool Organization
Igor Kononenko and Edvard Simec. Induction of decision trees using RELIEFF. University of Ljubljana, Faculty of electrical engineering & computer science.
for patients suffering from hepatitis. The data was provided by Gail Gong from Carnegie-Mellon University. We also compared the performance of the algorithms on the following non-medical real-world data sets (SOYB, IRIS and VOTE are obtained from the Irvine database (Murphy & Aha, 1991)): SOYB: The famous soybean data set used by Michalski & Chilausky (1980). IRIS: The well known Fisher's problem of
Daichi Mochihashi and Genichiro Kikui and Kenji Kita. Learning Nonstructural Distance Metric by Minimum Cluster Distortions. ATR Spoken Language Translation research laboratories.
[Plot panels: (c) "iris" dataset, (d) "soybean" dataset; axes: Dimension, Precision.] Figure 4: K-means clustering of UCI Machine Learning dataset results. The horizontal axis shows
Włodzisław Duch and Karol Grudziński. Prototype based rules - a new way to understand the data. Department of Computer Methods, Nicholas Copernicus University.
were extracted recently [2]. For comparison we have analyzed some of these datasets here. Iris flowers data, taken from the UCI repository [14], has been used in many previous studies. It contains 3 classes (Iris Setosa, Virginica and Versicolor flowers), 4 attributes (sepal and
H. Altay Güvenir. A Classification Learning Algorithm Robust to Irrelevant Features. Bilkent University, Department of Computer Engineering and Information Science.
[Plot: Iris data set; classification accuracy (0.5 to 1.0) versus number of irrelevant features added (0 to 20); curves: VFI5, 1NN, 3NN, 5NN.] [Plot: New-thyroid data set; same axes and curves.]
Enes Makalic and Lloyd Allison and David L. Dowe. MML INFERENCE OF SINGLE-LAYER NEURAL NETWORKS. School of Computer Science and Software Engineering, Monash University.
0.20, overfitting was observed; MDL inferred four hidden neurons as optimal rather than three (see Fig. 4). [Figure 5. MML inference of the Iris dataset: message length (nits) versus number of hidden neurons.] Finally, we have tested both MML and MDL-based criteria on a real problem: the Iris dataset from the UCI machine learning repository. This is
Ron Kohavi and Brian Frasca. Useful Feature Subsets and Rough Set Reducts. the Third International Workshop on Rough Sets and Soft Computing.
bears no resemblance to Holte's 1R algorithm. 1993), stopping after a predetermined number of non-improving node expansions. Figure 2 shows the search through the feature subsets in the IRIS dataset. The number in brackets denotes the order the nodes are visited. The bootstrap estimate is given with one standard deviation of the accuracy after the ± sign. The estimated real accuracy (on
G. Rätsch and B. Schölkopf and Alex Smola and Sebastian Mika and T. Onoda and K.-R. Müller. Robust Ensemble Learning for Data Mining. GMD FIRST, Kekuléstr.
generalization performance of AdaBoost in the low noise regime. However, AdaBoost performs worse than other learning machines on noisy tasks [6, 7], such as the iris and the breast cancer benchmark data sets [5]. The present paper addresses the overfitting problem of AdaBoost in two ways. Primarily, it makes an algorithmic contribution to the problem of constructing regularized boosting algorithms.
