ISOLET Data Set
Below are papers that cite this data set, with context shown.
Papers were automatically harvested and associated with this data set, in collaboration with Rexa.info.
Jaakko Peltonen and Samuel Kaski. Discriminative Components of Data. IEEE. 2004.
whose properties are summarized in Table I. The Landsat, Isolet and Multiple Features (MFeat) data sets are from UCI Machine Learning Repository, LVQ PAK refers to the Finnish acoustic phoneme data distributed with the LVQ-PAK, and TIMIT refers to phoneme data from the DARPA TIMIT acoustic
Vassilis Athitsos and Stan Sclaroff. Boosting Nearest Neighbor Classifiers for Multiclass Recognition. Boston University Computer Science Tech. Report No. 2004-006. 2004.
and the best result attained among the 6 variations of "naive" k-nn classification. For our method we also provide the standard deviation across multiple trials, except for the isolet dataset where we only ran one trial of our algorithm.
Dataset    Boost-NN    Allwein  Naive k-nn
glass      24.4 ± 1.7  25.2     26.8
isolet     6.5         5.3      7.6
letter     3.5 ± 0.2   7.1      4.5
pendigits  3.9 ± 0.6   2.9      2.2
satimage   9.6 ±
David Littau and Daniel Boley. Using Low-Memory Representations to Cluster Very Large Data Sets. SDM. 2003.
to 0 centroids are a PDDP clustering of a low-memory representation using the centroid in C closest to a data item to approximate that data item. For the data examined, the results are better if the
Table 3: Datasets
dataset     isolet  k1     reuters  forest
m           7997    2340   9494     581012
n           617     21839  19968    54
categories  26      20     66       7
#           dense   0.68%  0.20%    dense
k_s         5       5      5        5
k_c         150     50     100      500
k_z         5       5      5        1
k_f         150     50     100      500
Inderjit S. Dhillon and Dharmendra S. Modha and W. Scott Spangler. Class visualization of high-dimensional data with applications. Department of Computer Sciences, University of Texas. 2002.
(a d × n matrix where n is the number of data points), which has a computational complexity of O(d^2 n). Typically n is much larger than k as can be seen from two sample data sets we will use later in this paper. In the ISOLET speech recognition data set, k = 26, n = 7797 and d = 617, while in the PENDIGITS example k = 10, n = 10992 and d = 16. The non-linear SOM and MDS
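To make the quoted sizes concrete, the following sketch evaluates the leading term d^2 n for the two data sets named in the excerpt. The function name `quadratic_cost` is hypothetical, and constant factors hidden by the O-notation are ignored, so the numbers are only an order-of-magnitude guide.

```python
# Sizes quoted in the excerpt above (k classes, n points, d dimensions).
datasets = {
    "ISOLET": {"k": 26, "n": 7797, "d": 617},
    "PENDIGITS": {"k": 10, "n": 10992, "d": 16},
}


def quadratic_cost(d, n):
    """Leading term d^2 * n of an O(d^2 n) computation (constants ignored)."""
    return d * d * n


# Hypothetical helper output: one operation-count estimate per data set.
costs = {name: quadratic_cost(s["d"], s["n"]) for name, s in datasets.items()}

for name, cost in costs.items():
    size = datasets[name]
    print(f"{name}: n/k = {size['n'] / size['k']:.0f}, d^2 * n = {cost:,}")
```

Under these assumptions the ISOLET term is roughly 3 × 10^9 and the PENDIGITS term roughly 2.8 × 10^6, so the dimensionality d dominates the difference between the two.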
Erin L. Allwein and Robert E. Schapire and Yoram Singer. Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers. ICML. 2000.
according to the number of classes): Dermatology, Satimage, Glass, Segmentation, E-coli, Pendigits, Yeast, Vowel, Soybean-large, Thyroid, Audiology, Isolet, Letter-recognition. The properties of the datasets are summarized in Table 1. In the SVM experiments, we
Hamming Decoding
Problem      One-vs-all  Complete  All-Pairs  Dense  Sparse
dermatology  5.0         4.2       3.1        3.9    3.6
satimage
Khaled A. Alsabti and Sanjay Ranka and Vineet Singh. CLOUDS: A Decision Tree Classifier for Large Datasets. KDD. 1998.
are taken from the STATLOG project, which has been a widely used benchmark in classification. 3 The "Abalone," "Waveform," and Isolet datasets can be found in . The "Synth1" and "Synth2" datasets have been used in [15, 17] for evaluating SLIQ and SPRINT; they have been referred to as the "Function2" dataset. The main parameter of our
Hiroshi Shimodaira and Jun Okui and Mitsuru Nakai. Modified Minimum Classification Error Learning and Its Application to Neural Networks. SSPR/SPR. 1998.
#. This shows the proposed approach is more effective than the McDermott's approach  discussed in Section 3. B. Results of Multi-Class Problems In order to evaluate the performance on different datasets, speech database isolet (isolated alphabet letters) of the UCI repository, and "vowels" (Japanese five vowels) made from the ATR continuous speech database "Set-B" were collected. In the "isolet"
Thomas G. Dietterich and Ghulum Bakiri. Solving Multiclass Learning Problems via Error-Correcting Output Codes. CoRR, csAI/9501101. 1995.
employed in the study. The glass, vowel, soybean, audiologyS, ISOLET letter, and NETtalk data sets are available from the Irvine Repository of machine learning databases (Murphy & Aha, 1994). 1 The POS (part of speech) data set was provided by C. Cardie (personal communication); an earlier
Shlomo Dubnov and Ran El-Yaniv and Yoram Gdalyahu and Elad Schneidman and Naftali Tishby and Golan Yona. Clustering By Friends: A New Nonparametric Pairwise Distance Based Clustering Algorithm. Ben Gurion University.
procedure of the cross-validation index (see Section 3) and we only report the resulting cross-validation indices obtained during the computations. In section 5.1 we consider the classical Iris data sets. Then, in section 5.2 we consider the Isolet data set. An application to musical data is considered in section 5.3. 5.1. The Iris Data This data set, due to Fisher (Fisher, 1936), is a classic
Jakub Zavrel. An Empirical Re-Examination of Weighted Voting for k-NN. Computational Linguistics.
[Figure 1, letter panel: % correct versus k; curves: majority, inverse, dudani, shepard]
Figure 1: Accuracy results from experiments on the sonar, isolet, PP-attach, and letter datasets as a function of k.
this count. Although Shepard's function performed well on the PP-attachment dataset, it is much less robust on the UCI datasets, and only slightly outperforms majority voting.
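The four voting schemes named in the figure legend can be sketched as follows. This is a minimal illustration using commonly cited forms of the weights (uniform, 1/d, Dudani's linear weight, and an exponential decay for Shepard); the exact constants and parameterization used in the paper are not given in the excerpt, so `eps`, the unit decay rate, and the function names are assumptions.

```python
import math
from collections import defaultdict


def vote_weights(dists, scheme):
    """Weights for the k nearest neighbours; dists must be sorted ascending."""
    d1, dk = dists[0], dists[-1]
    if scheme == "majority":
        return [1.0] * len(dists)
    if scheme == "inverse":
        eps = 1e-9  # assumed small constant to avoid division by zero
        return [1.0 / (d + eps) for d in dists]
    if scheme == "dudani":
        if dk == d1:  # all neighbours equally far: fall back to equal votes
            return [1.0] * len(dists)
        return [(dk - d) / (dk - d1) for d in dists]
    if scheme == "shepard":
        return [math.exp(-d) for d in dists]  # assumed decay rate of 1
    raise ValueError(f"unknown scheme: {scheme}")


def knn_vote(neighbours, scheme):
    """neighbours: (distance, label) pairs for the k nearest points."""
    neighbours = sorted(neighbours)
    weights = vote_weights([d for d, _ in neighbours], scheme)
    totals = defaultdict(float)
    for (_, label), w in zip(neighbours, weights):
        totals[label] += w
    return max(totals, key=totals.get)


example = [(0.1, "a"), (0.9, "b"), (1.0, "b")]
for scheme in ("majority", "inverse", "dudani", "shepard"):
    print(scheme, knn_vote(example, scheme))
```

In this toy example the single very close neighbour is outvoted under majority voting but wins under each of the distance-weighted schemes, which is the kind of difference the experiments compare.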
Hiroshi Shimodaira and Jun Okui and Mitsuru Nakai. IMPROVING THE GENERALIZATION PERFORMANCE OF THE MCE/GPD LEARNING. School of Information Science Japan Advanced Institute of Science and Technology Tatsunokuchi, Ishikawa.
and network architecture
Dataset       #classes  #attributes  #hidden nodes
isolet (UCI)  26        617          32
vowels (ATR)  5         12           12
[Figure 2 plot: correct classification rate [%] versus g; curve: mMCE]
Figure 2: Classification performance for the