Ionosphere Data Set
Below are papers that cite this data set, with context shown.
Papers were automatically harvested and associated with this data set, in collaboration with Rexa.info.
Return to Ionosphere data set page.
Jeroen Eggermont and Joost N. Kok and Walter A. Kosters. Genetic Programming for data classification: partitioning the search space. SAC. 2004.
algorithm has a much smaller standard deviation. When we look at the number of clusters or maximum number of partitions we see that a maximum of 2 clusters or partitions is clearly the best for this data set. The Ionosphere Data Set: If we look at the results on the Ionosphere data set in Table 7, we see that using the gain ratio instead of the gain criterion with our refined GP algorithms greatly
Jennifer G. Dy and Carla Brodley. Feature Selection for Unsupervised Learning. Journal of Machine Learning Research, 5. 2004.
EM-k-STD (e) Figure 9: Feature selection versus without feature selection on the four-class data. 6.5 Experiments on Real Data We examine the FSSEM variants on the iris, wine, and ionosphere data sets from the UCI learning repository (Blake and Merz, 1998), and on a high resolution computed tomography (HRCT) lung image data set which we collected from the IUPUI medical center (Dy et
Mikhail Bilenko and Sugato Basu and Raymond J. Mooney. Integrating constraints and metric learning in semi-supervised clustering. ICML. 2004.
from the UCI repository: Iris, Wine, and Ionosphere (Blake & Merz, 1998); the Protein dataset used by Xing et al. (2003) and Bar-Hillel et al. (2003), and randomly sampled subsets from the Digits and Letters handwritten character recognition datasets, also from the UCI repository. For Digits
Zhi-Hua Zhou and Yuan Jiang. NeC4.5: Neural Ensemble Based C4.5. IEEE Trans. Knowl. Data Eng, 16. 2004.
ensemble. Moreover, Table III shows that the generalization ability of NeC4.5 with µ = 0% is still better than that of C4.5. In detail, pairwise two-tailed t-tests indicate that there are seven data sets (cleveland, diabetes, ionosphere, liver, sonar, waveform21, and waveform40) where NeC4.5 with µ = 0% is significantly more accurate than C4.5, while there is no significant difference on the
Hyunsoo Kim and Se Hyun Park. Data Reduction in Support Vector Machines by a Kernelized Ionic Interaction Model. SDM. 2004.
At the 50% level, the number of selected data points is approximately equal to the number of support vectors found after training with the full set of data points, except for the Ionosphere and Mushroom data sets. For the Mushroom data set, the percentage of points predicted as support vectors is much less than 50%, even though the desired selection percentage in the IoI algorithm was 50%. By selecting
Glenn Fung and M. Murat Dundar and Jinbo Bi and Bharat Rao. A fast iterative algorithm for fisher discriminant using heterogeneous kernels. ICML. 2004.
p-values obtained show that there is no significant difference between A-KFD and the standard KFD where the kernel model is chosen using a cross-validation tuning procedure. Only on two of the datasets, ionosphere and housing, is there a small statistically significant difference between the two methods, with the performance of A-KFD being the better of the two for the ionosphere dataset and the
Predrag Radivojac and Zoran Obradovic and A. Keith Dunker and Slobodan Vucetic. Feature Selection Filters Based on the Permutation Test. ECML. 2004.
basic characteristics. NF and CF indicate the number of numerical and categorical features, respectively. Dataset (Size, Size of class 1, NF, CF): IONOSPHERE (351, 225, 34, 0); VOTES (435, 267, 0, 48); GLASS (214, 163, 9, 0); HEART (303, 139, 6, 7); LABOR (57, 37, 8, 21); HOUSING (506, 250, 13, 0); CREDIT (690, 307, 6, 41); PIMA (768, 268, 9, 0); ZOO (78, 41, 1, 15)
Dmitriy Fradkin and David Madigan. Experiments with random projections for machine learning. KDD. 2003.
bounds on the quality of randomized dimensionality reduction. Engebretsen, Indyk and O'Donnell [9] present a deterministic algorithm for constructing mappings of the type described in the JL lemma (by use of the method of conditional probabilities) and use it to ... Table 1: Description of Datasets — Name (# Instances, # Attributes): Ionosphere (351, 34); Colon (62, 2000); Leukemia (72, 3571); Spam (4601, 57); Ads (3279, 1554)
Michael L. Raymer and Travis E. Doom and Leslie A. Kuhn and William F. Punch. Knowledge discovery in medical and biological datasets using a hybrid Bayes classifier/evolutionary algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 33. 2003.
is thus better for evaluating feature selection capability than classifier accuracy. Ionosphere -- The 34 continuous features in this data set are derived from the signals read by a phased array of 16 high-frequency antennas in Goose Bay, Labrador [39]. These radar signals are designed to recognize structure in the ionosphere. Each reading
Marina Skurichina and Ludmila Kuncheva and Robert P W Duin. Bagging and Boosting for the Nearest Mean Classifier: Effects of Sample Size on Diversity and Accuracy. Multiple Classifier Systems. 2002.
are taken from the UCI Repository [22]. They are the 8-dimensional pima-diabetes data set, the 34-dimensional ionosphere data set and the 60-dimensional sonar data set. Training sets are chosen randomly and the remaining data are used for testing. All experiments are repeated 50 times on
Robert Burbidge and Matthew Trotter and Bernard F. Buxton and Sean B. Holden. STAR - Sparsity through Automated Rejection. IWANN (1). 2001.
has 699 examples in nine dimensions and is 'noise-free'; one feature has 16 missing values, which are replaced with the feature mean. The ionosphere data set has 351 examples in 33 dimensions and is slightly noisy. The heart data set has 270 examples in 13 dimensions. The Pima Indians diabetes data set has 768 examples in eight dimensions. These last two
Juan J. Rodríguez and Carlos J. Alonso and Henrik Boström. Boosting Interval Based Literals. 2000.
in [HR99] is an error of 9.96, for the specified partition. Our results for the specified partition, setting 5, 100 iterations, is an error of 11.54; our best result is 10.58. 4.7 Ionosphere This data set, also from the UCI ML Repository, contains information collected by a radar system [SWHB89]. The targets were free electrons in the ionosphere. "Good" radar returns are those showing evidence of
Colin Campbell and Nello Cristianini and Alex J. Smola. Query Learning with Large Margin Classifiers. ICML. 2000.
with a training set of 200 (Figure 2) there are an average 56 support vectors against an average 60 queries made. Real World Data. In Figure 5 we plot the corresponding curves for the ionosphere data set from the UCI Repository (Blake, Keogh & Merz, 1998). The ionosphere data set had a sparsity ratio of 0.29 so the advantages of selective sampling are clear. A plot of the averaged distance to the
Marina Skurichina and Robert P W Duin. Boosting in Linear Discriminant Analysis. Multiple Classifier Systems. 2000.
are taken from the UCI Repository [14]. The first is the 34-dimensional ionosphere data set (Data II) with 225 and 126 objects belonging to the first and the second data class, respectively. The second is the 8-dimensional diabetes data set (Data III) consisting of 500 and 268 objects from
Lorne Mason and Peter L. Bartlett and Jonathan Baxter. Improved Generalization Through Explicit Optimization of Margins. Machine Learning, 38. 2000.
classifier produced by DOOM can be as good as or better than that of the classifier produced by AdaBoost, despite having a dramatically worse minimum training margin. Conversely, Figure 3 (Ionosphere data set) shows that an improved minimum margin can result in improved generalization performance. These results clearly demonstrate that the minimum margin is not the important quantity. Second, the margin
P. S. Bradley and K. P. Bennett and A. Demiriz. Constrained K-Means Clustering. Microsoft Research, Dept. of Mathematical Sciences, and Dept. of Decision Sciences and Eng. Sys. 2000.
tailored to network optimization [2]. These codes usually run 1 or 2 orders of magnitude faster than general linear programming (LP) codes. 4 Numerical Evaluation We report results using two real datasets: the Johns Hopkins Ionosphere dataset and the Wisconsin Diagnostic Breast Cancer dataset (WDBC) [7]. The Ionosphere dataset contains 351 data points in R^33 and values along each dimension
Jennifer G. Dy and Carla Brodley. Feature Subset Selection and Order Identification for Unsupervised Learning. ICML. 2000.
To illustrate FSSEM on real data, we present results for two data sets: ionosphere (Blake & Merz, 1998) and a data set of high resolution computed tomography images of the lungs (HRCT-lung) (Dy et al., 1999). See Dy (1999) for experiments on additional data sets.
Stephen D. Bay. Nearest neighbor classification from multiple feature subsets. Intell. Data Anal, 3. 1999.
much as expected by chance, and 1 if the classifiers always agree. Diversity increases with smaller Kappa values. Figure 1 shows the Kappa-Error diagram for NN ensembles generated for the Ionosphere dataset by Bagging, randomly selecting 50 prototypes, and randomly selecting 6 features. Bagging results in a cloud of points centered roughly about (0.825, 0.15). Using a smaller number of prototypes (50)
Stavros J. Perantonis and Vassilis Virvilis. Input Feature Extraction for Multilayered Perceptrons Using Supervised Principal Component Analysis. Neural Processing Letters, 10. 1999.
Real World Examples: We give results concerning four supervised learning examples from the University of California-Irvine machine learning repository [13], namely 1. the Ionosphere data set [14]. Here the task is to distinguish between two sets of radar returns from the ionosphere. This set comprises 351 patterns with 33 features for each pattern. 2. the "BUPA Liver Disorders" set. The
David M J Tax and Robert P W Duin. Support vector domain description. Pattern Recognition Letters, 20. 1999.
classifiers on the remaining class. Also visible is that the Parzen method overtrains heavily and performs poorly when class 1 is the outlier class. The SVDD performs best overall. In the ionosphere dataset, the Parzen density estimation again overtrains and the instability method cannot be used because only two classes are available. From the results we see that class 1 is almost Gaussian distributed
Art B. Owen. Tubular neighbors for regression and classification. Stanford University. 1999.
part of this. The tubular neighbor model performs better than polymars on this data. It is only slightly better than a global additive model with linear and quadratic terms. 7.2 Ionosphere data This data set comes from the Irvine repository. The goal is to separate ``good'' from ``bad'' radar returns based on 34 predictors. The first 200 observations constitute the training data and have 101 ``good''
Chun-Nan Hsu and Hilmar Schuschel and Ya-Ting Yang. The ANNIGMA-Wrapper Approach to Neural Nets Feature Selection for Knowledge Discovery and Data Mining. Institute of Information Science. 1999.
concentrated on distinguishing the presence of heart disease (values 1, 2, 3, 4 in the classification) from absence (value 0). Missing values in both databases are all replaced by zero. Ionosphere: This dataset has 351 records of 34 features. The goal is to classify two types of radar returns from the ionosphere. ``Good'' radar returns are those showing evidence of some type of structure in the ionosphere.
Lorne Mason and Jonathan Baxter and Peter L. Bartlett and Marcus Frean. Boosting Algorithms as Gradient Descent. NIPS. 1999.
These results show that DOOM II generally outperforms AdaBoost and that the improvement is more pronounced in the presence of label noise. Figure 1: Summary of test error advantage (%) (with standard error bars) of DOOM II over AdaBoost at 0%, 5% and 15% label noise on the data sets sonar, cleve, ionosphere, vote1, credit, breast-cancer, pima-indians, hypo1 and splice.
Kai Ming Ting and Ian H. Witten. Issues in Stacked Generalization. J. Artif. Intell. Res. (JAIR), 10. 1999.
Note that stacking performs very poorly on Glass and Ionosphere, two small real-world datasets. This is not surprising, because cross-validation inevitably produces poor estimates for small datasets. 4.2 Discussion Like bagging, stacking is ideal for parallel computation. The construction of
Robert E. Schapire and Yoav Freund and Peter Bartlett and Wee Sun Lee. Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods. The Annals of Statistics, to appear. AT&T Labs. 1998.
the generalization error more often than not. In one experiment reported by Breiman, the generalization error increases even though the margins of all of the instances are increased (for this dataset, called "ionosphere", the number of instances is 351, much too small for our bounds to apply). While none of these experiments contradict the theory, they highlight the incompleteness of the theory
Lorne Mason and Peter L. Bartlett and Jonathan Baxter. Direct Optimization of Margins Improves Generalization in Combined Classifiers. NIPS. 1998.
classifier produced by DOOM can be as good as or better than that of the classifier produced by AdaBoost, despite having a dramatically worse minimum training margin. Conversely, Figure 3 (Ionosphere data set) shows that improved generalization performance can be associated with an improved minimum margin. The margin distributions also show that there is a balance to be found between training error and
Richard Maclin. Boosting Classifiers Regionally. AAAI/IAAI. 1998.
as the number of hidden units in the component network. The results of these experiments are similar to those obtained using the nearest neighbor methods, and produce significant gains for two other data sets (labor and ionosphere). Together, these experiments indicate that the overall RegionBoost approach can produce significant gains for many (though not all) data sets. One question which might be
Kristin P. Bennett and Erin J. Bredensteiner. A Parametric Optimization Method for Machine Learning. INFORMS Journal on Computing, 9. 1997.
were used in the computational experiments. Johns Hopkins University Ionosphere Database The Ionosphere dataset is used to distinguish between good and bad radar returns. A good return is one indicating evidence of some type of structure in the ionosphere. A bad return simply passes through the ionosphere.
Aynur Akkus and H. Altay Güvenir. K Nearest Neighbor Classification on Feature Projections. ICML. 1996.
Data Set: bcancerw, cleveland, glass, hungarian, ionosphere, iris, musk, wine; No. of Instances: 273, 303, 214, 294, 351, 150, 476, 178; No. of Features: 9, 13, 9, 13, 34, 4, 166, 13; No. of Classes: 2, 2, 6, 2, 2, 3, 2, 3; No. of Missing
Federico Divina and Elena Marchiori. Knowledge-Based Evolutionary Search for Inductive Concept Learning. Vrije Universiteit Amsterdam.
found by ECL with the use of the EWUS operator are less simple than those found with the use of the other selection operators. In some cases, like the ionosphere, crx and australian datasets, the difference is evident, while in others, e.g., the glass2, accidents and congestions datasets, the simplicity obtained by the three methods is comparable. An explanation for this is
Colin Campbell and Nello Cristianini. Simple Learning Algorithms for Training Support Vector Machines. Dept. of Engineering Mathematics.
Service [17]. As examples of the improvements in generalisation ability which can be achieved with a soft margin, we will also describe experiments with the ionosphere and Pima Indians diabetes datasets from the UCI Repository [4]. Though we have successfully used other kernels with KA, we will only describe experiments using Gaussian kernels in this section. We will predominantly use the KA
K. A. J Doherty and Rolf Adams and Neil Davey. Unsupervised Learning with Normalised Data and Non-Euclidean Norms. University of Hertfordshire.
from the UCI Machine Learning Repository [5]. The data sets considered were the Ionosphere, Image Segmentation (training data), Wisconsin Diagnostic Breast Cancer (WDBC) and Wine data sets. These data sets were selected to show our approach on data with a
Michael Lindenbaum and Shaul Markovitch and Dmitry Rusakov. Selective Sampling Using Random Field Modelling.
Among them there were three natural datasets: Pima Indians Diabetes dataset, Ionosphere dataset and Image Segmentation dataset, one synthetic dataset: Letters dataset and three artificial problems: Two-Spirals problem, Two-Gaussians problem
Christos Emmanouilidis and Anthony Hunter. A Comparison of Crossover Operators in Neural Network Feature Selection with Multiobjective Evolutionary Algorithms. Centre for Adaptive Systems, School of Computing, Engineering and Technology University of Sunderland.
it reduces the effect of the noise in fitness evaluation. 4 EXPERIMENTAL INVESTIGATION We compare the performance of the SSOCF operator against that of standard n-point crossover on a benchmarking data set of considerable dimensionality, the ionosphere dataset [13]. It consists of 351 patterns, with 34 attributes and one output with two classes. Ten random permutations of this data set are employed.
Chiranjib Bhattacharyya and Pannagadatta K. Shivaswamy and Alexander J. Smola. A Second Order Cone Programming Formulation for Classifying Missing Data. Department of Computer Science and Automation, Indian Institute of Science.
of the UCI database. From left to right: Pima, Ionosphere and Heart datasets. Top: small fraction of data with missing variables (50%); bottom: large number of observations with missing variables (90%). The experimental results are summarized by the graphs (1). The robust
Perry Moerland. Mixtures of latent variable models for density estimation and classification. IDIAP Research Report, Dalle Molle Institute for Perceptual Artificial Intelligence.
different models by specifying the total number of underlined scores for each model class. This number of wins shows that MFAs are the best density estimators. They are outperformed on only three data sets (cancer, ionosphere and vowel) out of 18. With respect to the spherical GMMs, the score of only 1 win illustrates that they are too constrained to model the data. From the results, one can also
Markus Breitenbach and Rodney Nielsen and Gregory Z. Grudic. Probabilistic Random Forests: Predicting Data Point Specific Misclassification Probabilities. Department of Computer Science University of Colorado.
2, and asterisks demonstrate the bottom line efficacy of the algorithm, which adjusts the first-order predictions according to performance on the out-of-bag data within the same prediction interval. Dataset (PRF, MPMCL, MPMCG, SVML, SVMG, RF): Ionosphere (80.8 ± .7%, 85.4%, 93.0%, 87.8%, 91.5%, 92.9%); Sonar (81.0 ± .9%, 75.1%, 89.8%, 75.9%, 86.7%, 84.1%); Breast Cancer (95.9 ± .3%, 97.2%, 97.3%, 92.6%, 98.5%, 97.1%); Pima
Federico Divina and Elena Marchiori. Handling Continuous Attributes in an Evolutionary Inductive Learner. Department of Computer Science Vrije Universiteit.
better than ECL-GSD on the Breast dataset, and better than ECL-LUD on the Ionosphere dataset, together with ECL-LSDf and ECL-GSD. If we increase the confidence level to 5% then we get that ECL-LUD and ECL-LSDc are significantly better than
Glenn Fung and Sathyakama Sandilya and R. Bharat Rao. Rule extraction from Linear Support Vector Machines. Computer-Aided Diagnosis & Therapy, Siemens Medical Solutions, Inc.
from the UCI Machine Learning Repository [13]: Wisconsin Diagnosis Breast Cancer (WDBC), Ionosphere and Cleveland heart. The fourth dataset is a dataset related to the nontraditional authorship attribution problem related to the federalist papers [7] and the fifth dataset is a dataset used for training in a computer aided detection
Karthik Ramakrishnan. University of Minnesota.
is shown as a straight line across the x-axis for comparison purposes. Figure 14: Bagging, Boosting, and Distance-Weighted test set error rates for the ionosphere data set as the number of classifiers in the ensemble increases. The test set error rate for a single decision tree classifier is shown as a straight line across the x-axis for comparison purposes.
Michalis K. Titsias and Aristidis Likas. Shared Kernel Models for Class Conditional Density Estimation.
and two from the UCI repository [13] (Pima Indians and Ionosphere data sets). To assess the performance of the models for each problem we have selected the five-fold cross-validation method. For each problem the original set was divided into five independent parts
Alexander K. Seewald. Dissertation: Towards Understanding Stacking: Studies of a General Ensemble Learning Scheme. Submitted in partial fulfillment of the requirements for the degree of Doctor of Technical Sciences.
audiology to hepatitis. Compressed glyph visualization for dataset ionosphere. Compressed glyph visualization for dataset iris. Compressed glyph visualization for dataset labor. Compressed glyph visualization for dataset lymph. Compressed glyph visualization for
Włodzisław Duch and Karol Grudziński and Geerd H. F. Diercksen. Minimal distance neural methods. Department of Computer Methods, Nicholas Copernicus University.
optimized radius but the results were not significantly better. The failure of the minimum distance method with global parameters for this case is surprising and requires further study. Non-medical datasets included the ionosphere (350 cases, 34 attributes, 2 classes), satimage (4435 cases, 36 attributes, 6 classes), sonar (208 cases, 60 attributes, 2 classes) and vowel (528 training and 462 test
Andrew Watkins and Jon Timmis and Lois C. Boggess. Artificial Immune Recognition System (AIRS): An Immune-Inspired Supervised Learning Algorithm. Computing Laboratory, University of Kent.
where classification accuracy of 98% was achieved using a k-value of 3. This seemed to bode well, and further experiments were undertaken using the Fisher Iris data set, Pima diabetes data, Ionosphere data and the Sonar data set, all obtained from the repository at the University of California at Irvine [4]. Table II shows the performance of AIRS on these data sets
Aynur Akkus and H. Altay Güvenir. Weighting Features in k Nearest Neighbor Classification on Feature Projections. Department of Computer Engineering and Information Science, Bilkent University.
Data Set: bcancerw, cleveland, glass, hungarian, ionosphere, iris, liver, wine; No. of Instances: 273, 303, 214, 294, 351, 150, 345, 178; No. of Features: 9, 13, 9, 13, 34, 4, 6, 13; No. of Classes: 2, 2, 6, 2, 2, 3, 2, 3; No. of Missing
Krzysztof Grąbczewski and Włodzisław Duch. The Separability of Split Value Criterion. Department of Computer Methods, Nicolaus Copernicus University.
96.0 Sigillito [7]; C4.5 94.9 Hamilton [8]; FSM 92.8 Rafał Adamczak (our group) [9]; SSV Tree 92.0 this paper; DB-CART 91.3 Shang, Breiman [10]; CART 88.9 Shang, Breiman [10]. Table 2: Test ionosphere dataset results. 5.2 Ionosphere The ionosphere dataset has 200 vectors in the training set and 150 in the test set. Each data vector is described by 34 continuous attributes and belongs to one of two
Christos Emmanouilidis and A. Hunter and Dr J. MacIntyre. A Multiobjective Evolutionary Setting for Feature Selection and a Commonality-Based Crossover Operator. Centre for Adaptive Systems, School of Computing, Engineering and Technology University of Sunderland.
is that it reduces the effect of the noise in fitness evaluation. 6 Experimental Results We demonstrate how our multiobjective evolutionary algorithm feature selection works on two benchmarking data sets of considerable dimensionality. Ionosphere This data set [21] consists of 351 patterns, with 34 attributes and one output with two classes, good or bad, with good implying evidence of some type of
Chiranjib Bhattacharyya. Robust Classification of noisy data using Second Order Cone Programming approach. Dept. Computer Science and Automation, Indian Institute of Science.
downloaded from the UCI machine learning dataset website [9]. Ionosphere, sonar and Wisconsin breast cancer were the three different datasets. The ionosphere dataset contains 34-dimensional observations, which are obtained from radar signals, while
Ayhan Demiriz and Kristin P. Bennett. Chapter 1: Optimization Approaches to Semi-Supervised Learning. Department of Decision Sciences and Engineering Systems & Department of Mathematical Sciences, Rensselaer Polytechnic Institute.
as in the previous section. Due to the long computational times for S3VM-IQP and transductive SVM-Light, we limit our experiments to only the Heart, Housing, Ionosphere and Sonar datasets. Linear kernel functions are used for all methods used in this section. The results given in Table 1.3 show that using unlabeled data in the case of datasets Heart and Ionosphere affects
Isabelle Alvarez and Stephan Bernard. Ranking Cases with Decision Trees: a Geometric Method that Preserves Intelligibility.
score gives interesting results when misclassified examples are near the decision boundary. This is particularly true for the bupa (liver-disorder) and ionosphere databases. Table 2 shows that these datasets do not verify the hypothesis of proximity of errors on a majority of samples, and actually the global geometric score gives bad results for these datasets. Concerning the improvement of the
Christos Dimitrakakis and Samy Bengio. Online Policy Adaptation for Ensemble Classifiers. IDIAP.
Figure 1: Cumulative margin distribution for RL on the ionosphere dataset, with an increasing number of experts (1, 2, 4, 8, 16 and 32 experts). Results table (MLP, Boost, MOE, RL): 7.28%, 1.21%, 4.84%, 2.43%; 32.8%, 16.2%, 31.6%, 29.1%; 15.0%, 16.2%, 13.7%, 15.0%; 5.96%, 5.96%, 5.96%, 3.08%; 4.10%, 2.52%, 4.55%, 3.73%; 2.63%, 1.42%, 2.13%, 2.3%
Rajesh Parekh and Jihoon Yang and Vasant Honavar. Constructive Neural-Network Learning Algorithms for Pattern Classification.
Outputs represents the number of output classes, and Attributes describes the type of input attributes of the patterns. The real-world datasets ionosphere, pima, segmentation, and vehicle are available at the UCI Machine Learning Repository [34], while the 3-circles dataset was artificially generated. The 3-circles dataset comprises 1800
Alain Rakotomamonjy. Leave-One-Out errors in Bipartite Ranking SVM. PSI CNRS FRE2645, INSA de Rouen.
test set and the approximated bound have been evaluated. Presented results are the average results for 20 different trials of the random split. Figure (4) presents the results that we achieved for the sonar and ionosphere datasets. In one case, we can see that the LOPO approximated bound gives interesting results since the true test AUC plot has a similar behaviour to the
Włodzisław Duch and Karol Grudziński. Meta-learning: searching in the model space. Department of Computer Methods, Nicholas Copernicus University.
to find classification models that achieve 100% accuracy on the test set. For hepatobiliary disorders a model with the highest accuracy for real medical data has been found automatically. For some data sets, such as the ionosphere, there seems to be no correlation between the results on the training and on the test set. Although the use of a validation set (or the use of the cross-validation