Statlog (Vehicle Silhouettes) Data Set

Below are papers that cite this data set, with context shown. Papers were automatically harvested and associated with this data set, in collaboration with Rexa.info.

Return to Statlog (Vehicle Silhouettes) data set page.


Ken Tang and Ponnuthurai N. Suganthan and Xi Yao and A. Kai Qin. Linear dimensionality reduction using relevance weighted LDA. School of Electrical and Electronic Engineering, Nanyang Technological University. 2005.

to compare LDA, aPAC, WLDR, EWLDR. The six data sets are the landsat, optdigits, vehicle, DNA, thyroid disease and vowel data sets. Landsat. The Landsat data set is generated from landsat multi-spectral scanner image data. It has 36 dimensions, 4435
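For context, a minimal sketch of plain LDA dimensionality reduction, the baseline the paper's relevance-weighted variants (WLDR, EWLDR) build on; those variants are not implemented here, and sklearn's digits set stands in for optdigits:

```python
# Minimal sketch of plain LDA dimensionality reduction (the baseline; the
# relevance-weighted variants WLDR/EWLDR are not implemented here).
from sklearn.datasets import load_digits
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_digits(return_X_y=True)               # stand-in for optdigits
lda = LinearDiscriminantAnalysis(n_components=5)  # at most n_classes - 1 components
X_low = lda.fit_transform(X, y)
print(X.shape, "->", X_low.shape)                 # (1797, 64) -> (1797, 5)
```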


Ping Zhong and Masao Fukushima. A Regularized Nonsmooth Newton Method for Multi-class Support Vector Machines. 2005.

the starting point of the next (k + 1)th iteration. The parameters 1 and 2 in (3) are both set to 0.01. In Algorithm 3.1, we replaced the standard Armijo rule in (S.3) by

Table 1: Six benchmark datasets from UCI
name      #pts   #atr   #cls
iris       150      4      3
wine       178     13      3
glass      214      9      6
vowel      528     10     11
vehicle    846     18      4
segment   2310     19      7
#pts: the number of training data; #atr: the number of
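For context, a minimal sketch of the standard Armijo backtracking rule mentioned above (the generic line search the authors replace, not their modified variant; the toy objective is illustrative):

```python
# Minimal sketch of a standard Armijo backtracking line search.
import numpy as np

def armijo_step(f, grad_f, x, d, beta=0.5, sigma=1e-4, max_halvings=50):
    """Return t = beta**m satisfying f(x + t*d) <= f(x) + sigma*t*grad_f(x).dot(d)."""
    fx = f(x)
    slope = grad_f(x).dot(d)  # negative for a descent direction d
    t = 1.0
    for _ in range(max_halvings):
        if f(x + t * d) <= fx + sigma * t * slope:
            return t
        t *= beta
    return t

f = lambda x: x.dot(x)       # toy objective ||x||^2
grad = lambda x: 2.0 * x
x0 = np.array([3.0, -4.0])
d = -grad(x0)                # steepest descent direction
t = armijo_step(f, grad, x0, d)
print(t, f(x0 + t * d))      # accepted step and the decreased objective
```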


Remco R. Bouckaert and Eibe Frank. Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms. PAKDD. 2004.

very sensitive to the particular partitioning of the anneal data. Looking at the column for naive Bayes vs. C4.5, this test could be used to justify the claim that the two perform the same for all datasets except the vehicle dataset just by choosing appropriate random number seeds. However, it could just as well be used to support the claim that the two algorithms perform differently in 19 out of 27
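For context, a minimal sketch of the kind of significance test whose replicability the paper evaluates, a paired t-test over 10-fold cross-validation accuracies (the accuracy values below are made-up placeholders, not results from the paper):

```python
# Paired t-test over per-fold CV accuracies; values are placeholders.
from scipy import stats

nb_acc  = [0.61, 0.63, 0.58, 0.60, 0.62, 0.59, 0.64, 0.60, 0.61, 0.62]  # naive Bayes
c45_acc = [0.70, 0.68, 0.72, 0.69, 0.71, 0.67, 0.73, 0.70, 0.69, 0.71]  # C4.5
t, p = stats.ttest_rel(nb_acc, c45_acc)
print(f"t = {t:.3f}, p = {p:.4f}")  # a small p suggests a real difference on this partitioning
```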


Dmitry Pavlov and Alexandrin Popescul and David M. Pennock and Lyle H. Ungar. Mixtures of Conditional Maximum Entropy Models. ICML. 2003.

from university computer science departments. We used all classes but others and different numbers (up to 1000) of the most frequent words. The Letter recognition, Yeast, MS Web, Vehicle and Vowel data sets were downloaded from the UC Irvine machine learning repository (Blake & Merz, 1998). In the MS Web data set, we predicted whether a user visited the "free downloads" web page, given the rest of his


Gisele L. Pappa and Alex Alves Freitas and Celso A A Kaestner. Attribute Selection with a Multi-objective Genetic Algorithm. SBIA. 2002.

One disadvantage of the use of the GA is that it is computationally expensive. In the two largest data sets used in our experiments, Vehicle (with the largest number of examples) and Arrhythmia (with the largest number of attributes, viz. 269), a single run of the GA took about 25 minutes and 5 hours and


James Bailey and Thomas Manoukian and Kotagiri Ramamohanarao. Fast Algorithms for Mining Emerging Patterns. PKDD. 2002.

using thresholds. We see that mining with a threshold value of 4 is substantially faster than mining the complete set of JEPs using a ratio tree. Classification accuracy is degraded for three of the datasets (Vehicle, Waveform and Letter-recognition) though. Analysis of the vehicle and chess datasets aids in explaining this outcome (supporting figures have been excluded due to lack of space). It is


Robi Polikar and L. Upda and S. S. Upda and Vasant Honavar. Learn++: an incremental learning algorithm for supervised neural networks. IEEE Transactions on Systems, Man, and Cybernetics, Part C, 31. 2001.

combinations were tried. An MLP with 50 hidden layer nodes and a 100 times smaller error goal of 0.001 was able to match (and slightly exceed) Learn++ performance, by classifying 95% of the TEST dataset. B. Vehicle Silhouette Database Also obtained from the UCI repository, the vehicle silhouette database consisted of 18 features from which the type of a vehicle is determined. The database consisted
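For context, a minimal sketch of the comparison network described, a single-hidden-layer MLP with 50 nodes; sklearn's `tol` only loosely stands in for a MATLAB-style error goal of 0.001, and the training data here are placeholders:

```python
# Single-hidden-layer MLP with 50 nodes; `tol` approximates an "error goal".
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(hidden_layer_sizes=(50,), tol=1e-3, max_iter=2000)
# mlp.fit(X_train, y_train)          # X_train/y_train: 18-feature vehicle data (placeholders)
# print(mlp.score(X_test, y_test))   # fraction of the test set classified correctly
```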


Thomas G. Dietterich. An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization. Machine Learning, 40. 2000.

the effect of classification noise, we added random class noise to nine domains (audiology, hypo, king-rook-vs-king-pawn (krkp), satimage, sick, splice, segment, vehicle and waveform). These data sets were chosen because at least one pair of the ensemble methods gave statistically significantly different performance on these domains. We did not perform noise experiments with letter-recognition
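For context, a minimal sketch of adding random class noise as described: flip a given fraction of labels to a different, randomly chosen class (the flipping scheme is illustrative, not necessarily the paper's exact procedure):

```python
# Flip a fraction `rate` of labels to a different class chosen at random.
import numpy as np

def add_class_noise(y, rate, n_classes, rng):
    y = y.copy()
    flip = rng.random(len(y)) < rate
    offsets = rng.integers(1, n_classes, size=flip.sum())  # guarantees a *different* label
    y[flip] = (y[flip] + offsets) % n_classes
    return y

rng = np.random.default_rng(0)
y = np.zeros(10, dtype=int)
print(add_class_noise(y, 0.3, 3, rng))
```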


Thierry Denoeux. A neural network classifier based on Dempster-Shafer theory. IEEE Transactions on Systems, Man, and Cybernetics, Part A, 30. 2000.

by Ripley [20], 29 instances were discarded, and the remaining 185 were re-grouped in four classes: window float glass (70), window nonfloat glass (76), vehicle window glass (17) and other (22). The data set was split randomly in a training set of size 89 and a test set of size 96. Our method (with normalized outputs) was compared to three neural network classifiers: learning vector quantization (LVQ)


Richard Maclin. Boosting Classifiers Regionally. AAAI/IAAI. 1998.

always see reductions in error rate. One difference between the two methods for weighting the confidence of predictions is that the Continuous method produces significant gains for two data sets, sonar and vehicle, for which the Discrete method does not perform well. In a second set of experiments we tested the idea of using RegionBoost where the estimated accuracy for a new point is


Robert E. Schapire and Yoav Freund and Peter Bartlett and Wee Sun Lee. Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods. The Annals of Statistics, to appear. AT&T Labs. 1998.

Figure 4: Error curves and margin distribution graphs for three voting methods (bagging, boosting and ECOC) using C4.5 as the base learning algorithm. Results are given for the letter, satimage and vehicle datasets. (See caption under Figure 1 for an explanation of these curves.) [Figure residue: error (%) panels for decision stumps with Boosting, Bagging and ECOC on the letter dataset; axis tick values omitted.]
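For context, a minimal sketch of the voting margin plotted in these graphs: for each example, the normalized vote weight for the correct label minus the largest vote weight for any incorrect label (the vote matrix below is a placeholder):

```python
# Voting margin per example: weight on the correct class minus the max
# weight on any other class, after normalizing votes to sum to 1.
import numpy as np

def voting_margins(votes, y):
    """votes: (n_examples, n_classes) nonnegative vote weights; y: true labels."""
    v = votes / votes.sum(axis=1, keepdims=True)
    correct = v[np.arange(len(y)), y]
    v_masked = v.copy()
    v_masked[np.arange(len(y)), y] = -np.inf   # exclude the correct class
    return correct - v_masked.max(axis=1)      # in [-1, 1]; > 0 means correctly classified

votes = np.array([[5, 3, 2], [1, 7, 2]], dtype=float)
print(voting_margins(votes, np.array([0, 2])))  # [ 0.2 -0.5]
```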


Ron Kohavi and Mehran Sahami. Error-Based and Entropy-Based Discretization of Continuous Features. KDD. 1996.

…                   6  13   155  20.83
12  horse-colic     7  15   368  36.91
13  hypothyroid     7  18  3163   4.77
14  ionosphere     34   0   351  35.87
15  iris            4   0   150  76.67
16  sick-euthyroid  7  18  3163   9.26
17  vehicle        18   0   846  77.41
Table 1: Datasets, the number of continuous features, nominal features, dataset size, and baseline error (majority inducer on the 10 folds). [Residue of a second table with methods MDL, ErrorMin-T2, ErrorMin-MDL and C4.5-Disc; only row indices 1-15 survive.]
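For context, a minimal sketch of the baseline-error column, assuming "majority inducer" means always predicting the most frequent class:

```python
# Error of a majority-class predictor: 1 minus the majority class frequency.
from collections import Counter

def baseline_error(labels):
    majority_count = max(Counter(labels).values())
    return 1.0 - majority_count / len(labels)

print(baseline_error(["a", "a", "b", "c"]))  # 0.5
```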


Ron Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. IJCAI. 1995.

rand, with 20 Boolean features and a Boolean random label. On one dataset, vehicle, the generalization accuracy of the Naive-Bayes algorithm deteriorated by more than 4% as more instances were given. A similar phenomenon was observed on the shuttle dataset. Such a
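For context, a minimal sketch of the k-fold cross-validation accuracy estimation the paper studies, using Naive Bayes as in the snippet (sklearn's iris set is a stand-in):

```python
# 10-fold cross-validated accuracy estimate for a Naive-Bayes classifier.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
scores = cross_val_score(GaussianNB(), X, y, cv=10)
print(scores.mean(), scores.std())  # mean accuracy and its spread across folds
```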


Jeffrey P. Bradford and Clayton Kunz and Ron Kohavi and Clifford Brunk and Carla Brodley. Pruning Decision Trees with Misclassification Costs. Appears in ECML-98 as a research note. School of Electrical Engineering.

classification based on census bureau data), breast cancer diagnosis, chess, crx (credit), german (credit), pima diabetes, road (dirt), satellite images, shuttle, and vehicle. In choosing the datasets, we decided on the following desiderata: 1. Datasets should be two-class to make the evaluation easier. This desideratum was hard to satisfy and we resorted to converting several multi-class


Gisele L. Pappa and Alex Alves Freitas and Celso A A Kaestner. A Multiobjective Genetic Algorithm for Attribute Selection. Pontificia Universidade Catolica do Parana and Computing Laboratory, University of Kent at Canterbury.

none of the MOFSS-found solutions were dominated by the baseline solution (the set of all attributes). In general, MOFSS found more solutions than the GA, except in the Vehicle data set. In 2 data sets (Arrhythmia and Crx) the majority of solutions found by MOFSS dominate the baseline solution. However, in the other 4 data sets the majority of solutions found by MOFSS are neutral.


Chih-Wei Hsu and Cheng-Ru Lin. A Comparison of Methods for Multi-class Support Vector Machines. Department of Computer Science and Information Engineering, National Taiwan University.

iris, wine, glass, and vowel. Those problems had already been tested in [27]. From the Statlog collection we choose all multi-class datasets: vehicle, segment, dna, satimage, letter, and shuttle. Note that, except for the dna problem, we scale all training data to be in [-1, 1]. Then test data are adjusted using the same linear transformation.
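For context, a minimal sketch of the scaling described: fit a linear map taking each training feature to [-1, 1], then apply the same transformation to the test data (so test values may fall outside the range):

```python
# Fit a per-feature linear map to [-1, 1] on the training set and reuse it.
import numpy as np

def fit_scaler(X_train):
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)      # guard against constant features
    return lambda X: 2.0 * (X - lo) / span - 1.0

X_train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
X_test  = np.array([[2.5, 40.0]])
scale = fit_scaler(X_train)
print(scale(X_train))  # training features now in [-1, 1]
print(scale(X_test))   # test values may land outside [-1, 1]
```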


Yin Zhang and W. Nick Street. Bagging with Adaptive Costs. Management Sciences Department, University of Iowa, Iowa City.

[2]: Autompg, Bupa, Glass, Haberman, Housing, Cleveland-heart-disease, Hepatitis, Ion, Pima, Sonar, Vehicle, WDBC, Wine and WPBC. Some of the data sets do not originally depict two-class problems so we did some transformation on the dependent variables to get binary class labels. Specifically in our experiments, Autompg data is labeled by whether
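For context, a minimal sketch of turning a non-binary target into two classes, as described; the actual thresholds and groupings used in the paper are not specified here, so both the data and the median split below are placeholders:

```python
# Binarize a continuous target; the median threshold is illustrative only.
import numpy as np

mpg = np.array([15.0, 22.5, 31.0, 18.2])        # placeholder continuous target
y_binary = (mpg >= np.median(mpg)).astype(int)  # e.g. split at the median
print(y_binary)                                 # [0 1 1 0]
```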


H. Altay Guvenir. A Classification Learning Algorithm Robust to Irrelevant Features. Bilkent University, Department of Computer Engineering and Information Science.

[Figure residue: plots of classification accuracy (0.0-1.0) against the number of irrelevant features added (0-20) for VFI5, 1NN, 3NN and 5NN on the Vehicle and Wine data sets; axis tick values omitted.]
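For context, a minimal sketch of the experiment behind these curves: append uniformly random ("irrelevant") features and track nearest-neighbor accuracy as their number grows. VFI5 itself is not in standard libraries, so only the kNN comparison classifiers appear here, with sklearn's wine set as a stand-in:

```python
# Accuracy of 3NN as irrelevant random features are added to the data.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = load_wine(return_X_y=True)
for n_irrelevant in (0, 5, 10, 20):
    X_aug = np.hstack([X, rng.uniform(size=(X.shape[0], n_irrelevant))])
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=3), X_aug, y, cv=5).mean()
    print(n_irrelevant, round(acc, 3))
```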


Alexander K. Seewald. Towards Understanding Stacking: Studies of a General Ensemble Learning Scheme. Dissertation, carried out to obtain the academic degree of Doctor of Technical Sciences.

[Figure residue: compressed glyph visualizations for the soybean, vehicle, vote, vowel and zoo datasets. Figure 8.7: Glyph visualization for datasets]


Adil M. Bagirov and Alex Rubinov and A. N. Soukhojak and John Yearwood. Unsupervised and supervised data classification via nonsmooth and global optimization. School of Information Technology and Mathematical Sciences, The University of Ballarat.

Diabetes dataset, Liver disorder dataset and Vehicle dataset. The description of these datasets can be found in the Appendix. We studied these datasets, using different subsets of features and different numbers of


Ron Kohavi and George H. John. Automatic Parameter Selection by Minimizing Estimated Error. Computer Science Dept. Stanford University.

by replacing a node's test with the test at one of its children, so perhaps m=1 gives more latitude in the pruning phase. Information-gain (turning the g parameter on) was a big winner on several datasets: vehicle, segment, hypothyroid, heart, and cleve. Turning on the s parameter helped in tic-tac-toe and monk1. Table 5: Experimental results: Accuracies for C4.5, C4.5-AP, and C4.5* from running on


Rajesh Parekh and Jihoon Yang and Vasant Honavar. Constructive Neural-Network Learning Algorithms for Pattern Classification.

Outputs represents the number of output classes, and Attributes describes the type of input attributes of the patterns. The real-world datasets ionosphere, pima, segmentation, and vehicle are available at the UCI Machine Learning Repository [34], while the 3-circles dataset was artificially generated. The 3-circles dataset comprises 1800


Vikas Sindhwani and P. Bhattacharya and Subrata Rakshit. Information Theoretic Feature Crediting in Multiclass Support Vector Machines.

[Figure residue: plot of feature credits against feature index for the Vehicle and Satellite datasets; axis tick values omitted.] The Vehicle dataset is a multiclass pattern recognition problem of classifying a given silhouette as one of four types of vehicle. There are 18 features. The Satellite dataset is a 6-class and


Maria Salamo and Elisabet Golobardes. Analysing Rough Sets weighting methods for Case-Based Reasoning Systems. Enginyeria i Arquitectura La Salle.

are obtained from the UCI repository [MM98]. They are: breast cancer, glass, ionosphere, iris, led, sonar, vehicle and vowel. Private datasets are from our own repository. They deal with diagnosis of breast cancer and synthetic datasets. Datasets related to diagnosis are biopsy and mammogram. Biopsy is the result of digitally processed


Ronaldo C. Prati and Peter A. Flach. ROCCER: an Algorithm for Rule Learning Based on ROC Analysis. Institute of Mathematics and Computer Science, University of São Paulo.

…                    351  64.10
10  Kr-vs-Kp     37  3196  52.22
11  Letter-a     17 20000  96.06
12  New-thyroid   6   215  83.72
13  Nursery       9 12960  97.45
14  Pima          9   768  65.10
15  Satimage     37  6435  90.27
16  Vehicle      19   846  76.48
Table 1: UCI data sets used in our experiments.
been modified to incorporate the induction of unordered rule sets and Laplace error correction as the evaluation function [Clark and Boswell, 1991]. Ripper [Cohen, 1995]


