Connectionist Bench (Sonar, Mines vs. Rocks) Data Set
Below are papers that cite this data set, with context shown.
Papers were automatically harvested and associated with this data set, in collaboration with Rexa.info.
Zhi-Hua Zhou and Yuan Jiang. NeC4.5: Neural Ensemble Based C4.5. IEEE Trans. Knowl. Data Eng, 16. 2004.
ensemble. Moreover, Table III shows that the generalization ability of NeC4.5 with µ = 0% is still better than that of C4.5. In detail, pairwise two-tailed t-tests indicate that there are seven data sets (cleveland, diabetes, ionosphere, liver, sonar, waveform21, and waveform40) where NeC4.5 with µ = 0% is significantly more accurate than C4.5, while there is no significant difference on the
Jianbin Tan and David L. Dowe. MML Inference of Oblique Decision Trees. Australian Conference on Artificial Intelligence. 2004.
and medical data, such as Bupa, Breast Cancer, Wisconsin, Lung Cancer, and Cleveland. The nine UCI Repository  data-sets used are these five, Balance, Credit, Sonar and Wine. For each of the nine data sets, 100 independent tests were done by randomly sampling 90% of the data as training data and testing on the remaining 10%. 4 Discussion We compare the MML oblique tree scheme to C4.5 and C5. The
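The evaluation protocol described in this excerpt (100 independent trials, each training on a random 90% sample and testing on the remaining 10%) can be sketched roughly as follows. This is an illustration only, not the authors' code: the helper name `repeated_holdout_splits`, the seed, and the rounding of the training-set size are assumptions, since the excerpt does not specify them.

```python
import random

def repeated_holdout_splits(n_records, n_trials=100, train_frac=0.9, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated random holdout.

    Hypothetical helper: the rounding rule and the seed are assumptions.
    """
    rng = random.Random(seed)
    n_train = int(round(train_frac * n_records))
    for _ in range(n_trials):
        idx = list(range(n_records))
        rng.shuffle(idx)
        # The first n_train shuffled indices train; the rest test.
        yield sorted(idx[:n_train]), sorted(idx[n_train:])

# Sonar has 208 records, so each trial here uses 187 for training and 21 for testing.
splits = list(repeated_holdout_splits(208))
```

A classifier would be fitted on each training index set and scored on the matching test set, with the 100 scores then averaged.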
Jeremy Kubica and Andrew Moore. Probabilistic Noise Identification and Data Cleaning. ICDM. 2003.
from the assumed models. 5.1.1 Leaf and Rock Data The leaf and rock data, summarized in Table 1, consist of attributes extracted from a series of pictures of leaves and rocks respectively. The leaf data set contains 71 records from pictures of living and dead leaves. As expected, the living leaves were green in color while the dead leaves were brownish or yellow. The rock data set contains 56 records
Dennis DeCoste. Anytime Query-Tuned Kernel Machines via Cholesky Factorization. SDM. 2003.
signs against minWz n 's typically less aggressive but steady improvements. Other hybrids are likely even better and worthy of future research. 5 Examples We checked our approach on two UCI datasets, Sonar and Haberman, and the MNIST digit-recognition dataset. We confirmed that L_k(x) ≤ f(x) ≤ H_k(x) always held. Table 3 summarizes some of our results. Rows labelled 1-2 summarize
Michail Vlachos and Carlotta Domeniconi and Dimitrios Gunopulos and George Kollios and Nick Koudas. Non-linear dimensionality reduction techniques for classification and visualization. KDD. 2002.
The average error rates for the smaller data sets (i.e., Iris, Sonar, Glass, Liver, and Lung) were based on leave-one-out cross-validation, and the error rates for Image and Vowel were based on ten two-fold cross-validations, as summarized in Table
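Leave-one-out cross-validation, as mentioned in this excerpt, holds out each record in turn and predicts it from all the others. The sketch below assumes a 1-nearest-neighbour classifier with squared Euclidean distance purely for illustration; the excerpt does not say which classifier was evaluated, and the function name and toy data are hypothetical.

```python
def loo_error_rate(points, labels):
    """Leave-one-out error of a 1-nearest-neighbour classifier.

    The 1-NN rule and squared Euclidean distance are assumptions.
    """
    errors = 0
    for i, (x, y) in enumerate(zip(points, labels)):
        best_dist, best_label = None, None
        for j, (z, t) in enumerate(zip(points, labels)):
            if j == i:
                continue  # the held-out record may not vote for itself
            d = sum((a - b) ** 2 for a, b in zip(x, z))
            if best_dist is None or d < best_dist:
                best_dist, best_label = d, t
        errors += best_label != y
    return errors / len(points)

# Toy data (not taken from the Sonar set): the last point lies closest to a "rock".
toy_points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0), (0.5, 0.45)]
toy_labels = ["rock", "rock", "mine", "mine", "mine"]
rate = loo_error_rate(toy_points, toy_labels)  # one of five records is misclassified
```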
Xavier Llorà and David E. Goldberg and Ivan Traus and Ester Bernadó i Mansilla. Accuracy, Parsimony, and Generality in Evolutionary Learning Systems via Multiobjective Selection. IWLCS. 2002.
Bupa Liver Disorders (bpa), Wisconsin Breast Cancer (bre), Glass (gls), Ionosphere (ion), Iris (irs), Primary Tumor (prt), and Sonar (son). These data sets contain categorical and numeric attributes, as well as binary and n-ary classification tasks. We also run several evolutionary and non-evolutionary classifier schemes on the previous data sets. The
Fei Sha and Lawrence K. Saul and Daniel D. Lee. Multiplicative Updates for Nonnegative Quadratic Programming in Support Vector Machines. NIPS. 2002.
Kernel         Polynomial        Radial
Data           k=4     k=6       σ=0.3   σ=1.0   σ=3.0
sonar          9.6%    9.6%      7.6%    6.7%    10.6%
Breast cancer  5.1%    3.6%      4.4%    4.4%    4.4%

Table 1: Misclassification error rates on the sonar and breast cancer data sets after 512 iterations of the multiplicative updates.

3.1 Multiplicative updates The loss function in eq. (6) is a special case of eq. (1) with A_ij = y_i y_j K(x_i, x_j) and b_i = -1. Thus, the
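To make the setup in this excerpt concrete, the sketch below builds A_ij = y_i y_j K(x_i, x_j) with b_i = -1 and minimises 0.5 v^T A v + b^T v subject to v >= 0 using a multiplicative update. The update rule is written from memory of the general form usually given for nonnegative quadratic programming and should be checked against the cited paper. The function names, the all-ones starting point, and the toy data are assumptions.

```python
import math

def svm_dual_matrix(X, y, kernel):
    """Build A_ij = y_i * y_j * K(x_i, x_j), as stated in the excerpt."""
    n = len(y)
    return [[y[i] * y[j] * kernel(X[i], X[j]) for j in range(n)] for i in range(n)]

def multiplicative_updates(A, b, n_iters=512):
    """Approximately minimise 0.5 * v^T A v + b^T v subject to v >= 0.

    Assumed update: v_i <- v_i * (-b_i + sqrt(b_i^2 + 4 a_i c_i)) / (2 a_i),
    where a_i = (A+ v)_i and c_i = (A- v)_i use the positive and negative
    parts of A. Requires a_i > 0, which holds here when A has a positive diagonal.
    """
    n = len(b)
    v = [1.0] * n  # strictly positive start (assumption)
    for _ in range(n_iters):
        new_v = []
        for i in range(n):
            a_i = sum(max(A[i][j], 0.0) * v[j] for j in range(n))
            c_i = sum(max(-A[i][j], 0.0) * v[j] for j in range(n))
            factor = (-b[i] + math.sqrt(b[i] ** 2 + 4.0 * a_i * c_i)) / (2.0 * a_i)
            new_v.append(v[i] * factor)
        v = new_v
    return v

def linear_kernel(u, w):
    return sum(p * q for p, q in zip(u, w))

# Two toy points with opposite labels (not Sonar data).
X = [(1.0, 0.0), (-1.0, 0.0)]
y = [1, -1]
A = svm_dual_matrix(X, y, linear_kernel)
alpha = multiplicative_updates(A, [-1.0, -1.0])
```

Because each update multiplies v_i by a positive factor, a positive starting point stays nonnegative without an explicit projection step.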
Marina Skurichina and Ludmila Kuncheva and Robert P W Duin. Bagging and Boosting for the Nearest Mean Classifier: Effects of Sample Size on Diversity and Accuracy. Multiple Classifier Systems. 2002.
the 34-dimensional ionosphere data set and the 60-dimensional sonar data set. Training sets are chosen randomly and the remaining data are used for testing. All experiments are repeated 50 times on independent training sets. So all the
Dennis DeCoste. Anytime Interval-Valued Outputs for Kernel Machines: Fast Support Vector Machine Classification via Distance Geometry. ICML. 2002.
the S i in our embedding point sequence being defined in terms of multiple z i vectors (and/or non-uniform # i 's) seems worthy of future research. 7. Experiments We checked our approach on two UCI datasets (Blake & Merz, 1998): Sonar (to test relatively high d (high kernel costs)) and Haberman (to contrast with related experiments (Downs et al., 2001)). We also report MISR cloud classification
Ayhan Demiriz and Kristin P. Bennett and Mark J. Embrechts. A Genetic Algorithm Approach for Semi-Supervised Clustering. E-Business Department, Verizon Inc. 2002.
the transduction result (using both labeled and unlabeled data) cases. Note that on the center cluster transduction does work appropriately. The inductive 1 The (k, Ż, ®) values applied for each dataset were bright (15, 0.01,0.99), sonar (7,0.1,1), heart (7,0.25,0.75), ionosphere (7, 0.01,0.99), house (7,0.1,0.9), housing (11,0.01, 0.99), diagnostic (11,0.4,0.6), pima (11,0.01,0.99) and
Włodzisław Duch and Karol Grudzinski. Ensembles of Similarity-based Models. Intelligent Information Systems. 2001.
(except for the data described here we have tried sonar and hepatitis datasets from UCI) the improvements have been insignificant. This shows that an ensemble of models of similar types may sometimes fail to improve the results. One reason for this may come from
Juan J. Rodríguez and Carlos J. Alonso and Henrik Boström. Boosting Interval Based Literals. 2000.
in a supervised classification setting from other authors. The results are shown in table 10. All the differences considered are significant, with only one exception. 4.6 Sonar This data set was introduced in [GS88] and it is available at the UCI ML Repository [Bay99]. The task is to discriminate between sonar signals bounced off a metal cylinder and those bounced off a roughly
Chris Drummond and Robert C. Holte. Exploiting the Cost (In)sensitivity of Decision Tree Splitting Criteria. ICML. 2000.
sonar a classifier with a normalized expected cost that is totally independent of misclassification costs and priors. Figure 15 shows cost curves for the different splitting criteria on the sonar data set. For a given splitting criterion the classifier corresponding to the intersection would be a horizontal line through the highest point on the cost curve. In all cases this cost-insensitive
Carlotta Domeniconi and Jing Peng and Dimitrios Gunopulos. An Adaptive Metric Machine for Pattern Classification. NIPS. 2000.
consists of q = 4 measurements made on each of N = 100 iris plants of J = 2 species; 2. Sonar data. This data set consists of q = 60 frequency measurements made on each of N = 208 data of J = 2 classes ("mines" and "rocks"); 3. Vowel data. This example has q = 10 measurements and 11 classes. There are total
Lorne Mason and Peter L. Bartlett and Jonathan Baxter. Improved Generalization Through Explicit Optimization of Margins. Machine Learning, 38. 2000.
cases a maximization of minimum margin at the expense of all other margins generally gave worse generalization performance than AdaBoost. As can be seen in Figure 3 (Credit Application and Sonar data sets), the generalization performance of the combined classifier produced by DOOM can be as good or better than that of the classifier produced by AdaBoost, despite having dramatically worse minimum
Kristin P. Bennett and Ayhan Demiriz and John Shawe-Taylor. A Column Generation Algorithm For Boosting. ICML. 2000.
where the base learner solves (6) exactly, then to examine LPBoost in a more realistic environment. 5.1 Boosting Decision Tree Stumps We used decision tree stumps as a base learner on six UCI datasets: Cancer (9,699), Heart (13,297), Sonar (60,208), Ionosphere (34,351), Diagnostic (30,569), and Musk (166,476). The number of features and number of points in each dataset are shown in parentheses
Chris Drummond and Robert C. Holte. Explicitly representing expected cost: an alternative to ROC representation. KDD. 2000.
classifier for a particular PCF(+) value is not necessarily the one produced by a training set with the same PCF(+) characteristics is illustrated in figure 17, which shows ROC curves for the sonar data set from the UCI collection. The points represented by circles, and connected by solid lines, were generated using C4.5 (release 7 using information gain) modified to account for costs (by altering
Ayhan Demiriz and Kristin P. Bennett and Mark J. Embrechts. Semi-Supervised Clustering Using Genetic Algorithms. Dept. 1999.
The transductive MSE+GINI method based on all available data showed no consistent improvements over the induction (Footnote 2: The (k, #, #) values applied for each dataset were bright (15, 0.01, 0.99), sonar (7, 0.1, 1), heart (7, 0.25, 0.75), ionosphere (7, 0.01, 0.99), house (7, 0.1, 0.9), housing (11, 0.01, 0.99), prognosis (11, 0.4, 0.6), and pima (11, 0.01, 0.99).)
Kagan Tumer and Joydeep Ghosh. Robust Combining of Disparate Classifiers through Order Statistics. CoRR, csLG/9905013. 1999.
trim and spread, and derive the amount of error reduction associated with each. In Section 5 we present the performance of order statistic combiners on a real world sonar problem, and several data sets from the Proben1/UCI benchmarks [4, 25]. Section 6 discusses the implications of using linear combinations of order statistics as a strategy for pooling the outputs of individual classifiers. 2
Chun-Nan Hsu and Hilmar Schuschel and Ya-Ting Yang. The ANNIGMA-Wrapper Approach to Neural Nets Feature Selection for Knowledge Discovery and Data Mining. Institute of Information Science. 1999.
instances of 59 features. The domain of the features are one of {A,G,T,C} (DNA nucleotides). We set their value to {1,2,3,4} respectively. 53 instances are positive and 53 are negative. Sonar This dataset contains 208 patterns. 111 patterns were obtained by bouncing sonar signals off a metal at various angles and under various conditions. 97 patterns were obtained from rocks under similar conditions.
Art B. Owen. Tubular neighbors for regression and classification. Stanford University. 1999.
data translates into a mean absolute error of about 0.58, and suggests that the recorded results really are better. 7.4 sonar data This data is from the Irvine repository. The predictors in this data set are 60 values in a sonar spectrum reflected by an object. The response is a 1 if the object is a metal cylinder, and 0 if it is a cylindrical rock. The objective was to distinguish underwater mines
Stavros J. Perantonis and Vassilis Virvilis. Input Feature Extraction for Multilayered Perceptrons Using Supervised Principal Component Analysis. Neural Processing Letters, 10. 1999.
. It comprises 768 patterns taken from patients who may show signs of diabetes. Each sample is described by 8 attributes. 4. The "Sonar Targets" dataset. The task is to distinguish between sonar returns from a metal cylinder and sonar returns from a cylindrically shaped rock. The set comprises 208 patterns with 60 features for each pattern. For
Jing Peng and Bir Bhanu. Feature Relevance Estimation for Image Databases. Multimedia Information Systems. 1999.
significant performance improvement across the tasks. In general, PFRL seems to outperform MARS on all the tasks. Superior performance by PFRL is particularly pronounced on the SegData and Sonar data sets. Moreover, the results show that PFRL can handle large problems with high dimensionality well by its superb performance on the 60 dimensional Sonar data set. Table 1: Average retrieval precision
Lorne Mason and Jonathan Baxter and Peter L. Bartlett and Marcus Frean. Boosting Algorithms as Gradient Descent. NIPS. 1999.
These results show that DOOM II generally outperforms AdaBoost and that the improvement is more pronounced in the presence of label noise. [Figure 1 plot: error advantage (%) per data set (sonar, cleve, ionosphere, vote1, credit, breast-cancer, pima-indians, hypo1, splice) at 0%, 5%, and 15% noise.] Figure 1: Summary of test error advantage (with standard error bars) of DOOM II over AdaBoost
Hiroshi Shimodaira and Jun Okui and Mitsuru Nakai. Modified Minimum Classification Error Learning and Its Application to Neural Networks. SSPR/SPR. 1998.
generalization performance of the original MCE learning, optimization of the network architecture and learning parameters is not very important.

Table 1. Performance comparison in two-class problems

Data set           Cancer   House   Sonar
# classes          2        2       2
# training data    420      265     141
# test data        279      170     67
# attributes       9        15      60
Method
# hidden units     12       12      12
Bayes/ML           95.0     98.8    100.0
NN/EBP training    91.9     96.3    95.0
Richard Maclin. Boosting Classifiers Regionally. AAAI/IAAI. 1998.
always see reductions in error rate. One difference between the two methods for weighting the confidence of predictions is that the Continuous method produces significant gains for two data sets, sonar and vehicle, for which the Discrete method does not perform well. In a second set of experiments we tested the idea of using RegionBoost where the estimated accuracy for a new point is
Lorne Mason and Peter L. Bartlett and Jonathan Baxter. Direct Optimization of Margins Improves Generalization in Combined Classifiers. NIPS. 1998.
the light curve is DOOM with ℓ selected by cross-validation. The test errors for both algorithms are marked on the vertical axis at margin 0. can be seen in Figure 3 (Credit Application and Sonar data sets), the generalization performance of the combined classifier produced by DOOM can be as good as or better than that of the classifier produced by AdaBoost, despite having dramatically worse minimum
Thomas G. Dietterich. Machine-Learning Research. AI Magazine, 18. 1997.
fast Fourier transform). The resulting ensemble classifier was able to match the performance of human experts in identifying volcanoes. Tumer and Ghosh (1996) applied a similar technique to a sonar dataset with 25 input features. However, they found that deleting even a few of the input features hurt the performance of the individual classifiers so much that the voted ensemble did not perform very
Richard Maclin and David W. Opitz. An Empirical Evaluation of Bagging and Boosting. AAAI/IAAI. 1997.
outperforms using a single classifier, but significantly outperforms Bagging (e.g., kr-vs-kp, letter, segmentation, and vehicle). Ada-Boosting's results are even more extreme. For certain data sets (kr-vs-kp, letter, sonar), Ada-Boosting produces a significant gain over any other method (including Arcing). On other data sets Ada-Boosting produces results that are even worse than using a
Perry Moerland and E. Fiesler and I. Ubarretxena-Belandia. Discrete All-Positive Multilayer Perceptrons for Optical Implementation. IDIAP Research Report, Martigny, Valais, Switzerland. 1997.
implementation. eXclusive OR (XOR) The training set consists of the boolean exclusive OR function. It is the classical example of a simple problem that is not linearly separable. Sonar This data set was originally used by R. Gorman and T. Sejnowski in their study of the classification of sonar signals using a neural network. The task is to discriminate between sonar signals bounced off a metal
Erin J. Bredensteiner and Kristin P. Bennett. Feature Minimization within Decision Trees. National Science Foundation. 1996.
are generated from a large set of star and galaxy images collected by Odewahn  at the University of Minnesota. Sonar Mines vs. Rocks The Sonar data set  contains sixty real-valued attributes between 0.0 and 1.0 used to define 208 mines and rocks. Attributes are obtained by bouncing sonar signals off a metal cylinder (or rock) at various angles
Alexander K. Seewald. Towards Understanding Stacking: Studies of a General Ensemble Learning Scheme. Dissertation submitted for the academic degree of Doctor of Technical Sciences.
segment. Compressed glyph visualization for dataset sonar. Compressed glyph visualization for dataset soybean. Compressed glyph visualization for dataset vehicle. Compressed glyph visualization for dataset vote. Compressed glyph visualization for dataset
Perry Moerland and E. Fiesler and I. Ubarretxena-Belandia. Multilayer Perceptrons for Optical Implementation. Optical Engineering, vol.
implementation. eXclusive OR (XOR) The training set consists of the boolean exclusive OR function. It is the classical example of a simple problem that is not linearly separable. Sonar This data set was originally used by R. Gorman and T. Sejnowski in their study of the classification of sonar signals using a neural network. The task is to discriminate between sonar signals bounced off a metal
Yin Zhang and W. Nick Street. Bagging with Adaptive Costs. Management Sciences Department University of Iowa Iowa City.
: Autompg, Bupa, Glass, Haberman, Housing, Cleveland-heart-disease, Hepatitis, Ion, Pima, Sonar, Vehicle, WDBC, Wine and WPBC. Some of the data sets do not originally depict two-class problems so we did some transformation on the dependent variables to get binary class labels. Specifically in our experiments, Autompg data is labeled by whether
Chiranjib Bhattacharyya. Robust Classification of noisy data using Second Order Cone Programming approach. Dept. Computer Science and Automation, Indian Institute of Science.
downloaded from UCI machine learning dataset website. Ionosphere, sonar and Wisconsin breast cancer were the three different datasets. The ionosphere dataset contains 34 dimensional observations, which are obtained from radar signals, while
Andrew Watkins and Jon Timmis and Lois C. Boggess. Artificial Immune Recognition System (AIRS): An Immune-Inspired Supervised Learning Algorithm. Computing Laboratory, University of Kent.
Pima diabetes data, Ionosphere data and the Sonar data set, all obtained from the repository at the University of California at Irvine . Table II shows the performance of AIRS on these data sets when compared with other popular classifiers  and ,
Perry Moerland and E. Fiesler and I. Ubarretxena-Belandia. Incorporating LCLV Non-Linearities in Optical Multilayer Neural Networks. Preprint of an article published in Applied Optics.
the exclusive or (XOR) problem, the 3-bit parity problem (Par), and the 4-bit addition problem (Add), where the modulo 2 sum of 2 numbers of 2 bits has to be calculated. Furthermore, two real-world data sets have been used, namely the sonar benchmark  and the wine data set : Sonar This data set was originally used by R. Gorman and T. Sejnowski in their study of the classification of sonar
Maria Salamo and Elisabet Golobardes. Analysing Rough Sets weighting methods for Case-Based Reasoning Systems. Enginyeria i Arquitectura La Salle.
are obtained from the UCI repository [MM98]. They are: breast cancer, glass, ionosphere, iris, led, sonar, vehicle and vowel. Private datasets are from our own repository. They deal with diagnosis of breast cancer and synthetic datasets. Datasets related to diagnosis are biopsy and mammogram. Biopsy is the result of digitally processed
Jakub Zavrel. An Empirical Re-Examination of Weighted Voting for k-NN. Computational Linguistics.
[Figure 1 plot: % correct as a function of k for the letter dataset; curves "letter.majority", "letter.inverse", "letter.dudani", "letter.shepard".] Figure 1: Accuracy results from experiments on the sonar, isolet, PP-attach, and letter datasets as a function of k. this count. Although Shepard's function performed well on the PP-attachment dataset, it is much less robust on the UCI datasets, and only slightly outperforms majority voting.
Rudy Setiono and Huan Liu. Neural-Network Feature Selector. Department of Information Systems and Computer Science National University of Singapore.
set consists of the remaining 384 samples. Applying the ADAP algorithm trained on 576 samples, Smith et al.  achieved an accuracy rate of 76 % on the remaining 192 samples. 4. sonar Targets Dataset. The sonar returns classification dataset  consists of 208 sonar returns, each of which is represented by 60 real numbers between 0.0 and 1.0. The task is to distinguish between returns from a
Włodzisław Duch and Jerzy J. Korczak. Optimization and global minimization methods suitable for neural networks. Department of Computer Methods, Nicholas Copernicus University.
by NOVEL and SIMANN. Genetic algorithms achieved the worst results, below 60% in all cases, being unable to find good solutions. NOVEL has also been tried on Sonar, Vowel, 10-parity and NetTalk datasets from the UCI repository, using different number of hidden units, achieving very good results on the test sets, and falling behind TN-MS only in one case. From these few comparisons scattered
Christos Emmanouilidis and A. Hunter and Dr J. MacIntyre. A Multiobjective Evolutionary Setting for Feature Selection and a Commonality-Based Crossover Operator. Centre for Adaptive Systems, School of Computing, Engineering and Technology University of Sunderland.
the individual features distribution during evolution (Figure 5, lower row). Once feature selection is completed, final MLP models are built, based on the training and validation data. The sonar data set and, to a lesser extent the ionosphere, is so sparse that employing a large number of hidden units seems to lead to overfit. Both MLP and PNN models are tested on the independent evaluation data. 0
Elena Smirnova and Ida G. Sprinkhuizen-Kuyper and I. Nalbantis. Unanimous Voting using Support Vector Machines. IKAT, Universiteit Maastricht; ERIM, Universiteit Rotterdam.
(polynomial kernel) for which the gain is 0.72 and the Sonar dataset (polynomial kernel) for which the gain is 0.68. 1 Note VS(I + , I - ) = VS(I + f , I - f ) VS(I + n , I - n ) and VS(I + , I - ) = VS(I + f , I - f )NVS. 6 Conclusion This paper proposes a new
Alain Rakotomamonjy. Leave-One-Out errors in Bipartite Ranking SVM. PSI CNRS FRE2645 INSA de Rouen Avenue de l'universite.
test set and the approximated bound have been evaluated. Presented results are the average results for 20 different trials of the random split. Figure (4) presents the results that we achieved for datasets sonar and ionosphere. In one case, we can see that the LOPO approximated bound gives interesting result since the true test AUC plot has a similar behaviour to the
Hiroshi Shimodaira and Jun Okui and Mitsuru Nakai. IMPROVING THE GENERALIZATION PERFORMANCE OF THE MCE/GPD LEARNING. School of Information Science Japan Advanced Institute of Science and Technology Tatsunokuchi, Ishikawa.
parameter updating rule of (6) were set to the one obtained by the EBP learning. A. Results for Two-Class Problems Preliminary experiments were, at first, performed for two-class problems on the UCI datasets "cancer", "house" and "sonar". Each dataset was divided into two groups, one was used for training and the other was used for testing. The experimental results are summarized in Table 1. It can be
Charles Campbell and Nello Cristianini. Simple Learning Algorithms for Training Support Vector Machines. Dept. of Engineering Mathematics.
include mirror symmetry , n-parity  and the two-spirals problem . The real world datasets include a sonar classification problem , the Wisconsin breast cancer dataset  and a database of handwritten digits collected by the US Postal Service . As examples of the improvements
Ayhan Demiriz and Kristin P. Bennett. Chapter 1: Optimization Approaches to Semi-Supervised Learning. Department of Decision Sciences and Engineering Systems & Department of Mathematical Sciences, Rensselaer Polytechnic Institute.
as in the previous section. Due to the long computational times for S³VM-IQP and transductive SVM-Light, we limit our experiments to only the Heart, Housing, Ionosphere, and Sonar datasets. Linear kernel functions are used for all methods used in this section. The results given in Table 1.3 show that using unlabeled data in the case of datasets Heart and Ionosphere affects
Ronaldo C. Prati and Peter A. Flach. ROCCER: A ROC convex hull rule learning algorithm. Institute of Mathematics and Computer Science at University of São Paulo.
          # attributes   # examples   class distribution
German    20             1000         bad (30%), good (70%)
Pima      8              768          1 (34.8%), 0 (65.2%)
Sonar     61             208          mine (46.6%), rock (53.4%)

Table 1. Datasets summary. These datasets were chosen because classifiers using standard machine learning algorithms for those datasets are reported in the literature as having AUC values lower than 80%. As
Perry Moerland. Mixtures of latent variable models for density estimation and classification. IDIAP Research Report, Dalle Molle Institute for Perceptual Artificial Intelligence.
classifiers with Mfas are in Table 9. The set-up was as with the Bayesian Mpcas and again the Ml scores have been copied from Table 6. For the high-dimensional NIST, optical, sonar and soybean data sets, a lower value of ℓ was chosen both for Mpcas and Mfas. This was done partly to save computation time and partly to avoid problems with the cheap and cheerful approximation already outlined in
Stefan Aeberhard and O. de Vel and Danny Coomans. New Fast Algorithms for Variable Selection based on Classifier Performance. James Cook University.
and 59, 71 and 48 objects per class. The classes correspond to three different cultivars and the 13 variables measure 13 different constituents of the three types of resulting wines. The second data set, the sonar data , is 60 dimensional with two classes and 111 and 97 objects per class. The two different classes correspond to sonar signals bounced off a metal cylinder and reflected off a
Kristin P. Bennett and Erin J. Bredensteiner. Geometry in Learning. Department of Mathematical Sciences Rensselaer Polytechnic Institute.
signals off a metal cylinder. The Sonar signal is transmitted at various angles with rises in frequency. A similar procedure is performed to obtain the rock attributes. The publicly available Sonar dataset represents 208 mines and rocks . Sixty real-valued attributes between 0.0 and 1.0 are collected for each mine or rock. The value of the attribute represents the amount of energy within a
Carlotta Domeniconi and Bojun Yan. On Error Correlation and Accuracy of Nearest Neighbor Ensemble Classifiers. Information and Software Engineering Department George Mason University.
we consider equal priors, and thus $\delta_{1,2} = \frac{1}{C} \sum_{i=1}^{C} \delta^{i}_{1,2}$ gives the total error correlation between classifiers 1 and 2. Sample results for liver and sonar are given in Tables 9-10. For each data set we summarize the average error correlation values computed between five classifiers. We also report the corresponding error rates of the ensembles. In each case simple voting is used. Weight-C is
Chris Drummond and Robert C. Holte. C4.5, Class Imbalance, and Cost Sensitivity: Why Under-Sampling beats Over-Sampling. Institute for Information Technology, National Research Council Canada.
The bold dashed curve in Figure 2 shows the performance of C4.5 using under-sampling on the Sonar data set. Sonar has 208 instances (111 mines and 97 rocks) with 60 real-valued attributes. Under-sampling produces a cost curve that is reasonably cost sensitive; it is quite smooth and largely within the
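As a rough illustration of under-sampling with the class counts quoted in this excerpt (111 mines, 97 rocks), the sketch below randomly discards majority-class records until both classes are the same size. Balancing to equal sizes is an assumption made for simplicity; the excerpt does not state which target class ratios the authors used, and the function name and seed are hypothetical.

```python
import random

def undersample(labels, seed=0):
    """Return sorted indices of a class-balanced subsample.

    Records are dropped at random from every class larger than the
    smallest one. Balancing to equal size is an assumption.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, n_min))
    return sorted(kept)

# Class counts taken from the excerpt: 111 mines and 97 rocks.
labels = ["mine"] * 111 + ["rock"] * 97
kept = undersample(labels)
kept_labels = [labels[i] for i in kept]
```

Here 14 randomly chosen mine records are discarded, leaving 97 records in each class.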