Center for Machine Learning and Intelligent Systems

Statlog (Landsat Satellite) Data Set

Below are papers that cite this data set, with context shown. Papers were automatically harvested and associated with this data set, in collaboration with

Return to Statlog (Landsat Satellite) data set page.

Ken Tang and Ponnuthurai N. Suganthan and Xi Yao and A. Kai Qin. Linear dimensionality reduction using relevance weighted LDA. School of Electrical and Electronic Engineering Nanyang Technological University. 2005.

to compare LDA, aPAC, WLDR, EWLDR. The six data sets are landsat, optdigits, vehicle, DNA, thyroid disease and vowel data sets. Landsat. The Landsat data set is generated from Landsat multi-spectral scanner image data. It has 36 dimensions, 4435

Jaakko Peltonen and Arto Klami and Samuel Kaski. Improved Learning of Riemannian Metrics for Exploratory Analysis. Neural Networks. 2004.

used for empirical comparisons
Data set                        Dimensions  Classes  Samples
Landsat Satellite Data (a)      36          6        6435
Letter Recognition Data (a)     16          26       20000
Phoneme Data (b)                20          13       3656
TIMIT Data from (TIMIT, 1998)   12          41       14994
(a) from the UCI Machine

Fabian Hoti and Lasse Holmström. A semiparametric density estimation approach to pattern classification. Pattern Recognition, 37. 2004.

classifiers tried in [17]. The classification error of both KDA and QDA was 3.7%. Using a so-called convex local subspace classifier, a smaller error rate of 2.1% was reported in [18]. 3.2.2 Public data sets 1: satellite image and handwritten digits. Next we consider two public data sets obtained from the UCI Machine Learning Repository [19]. The first example is a satellite image data set with 4435

Giorgio Valentini. Random Aggregated and Bagged Ensembles of SVMs: An Empirical Bias-Variance Analysis. Multiple Classifier Systems. 2004.

software library [13] and the SVMlight applications [9]. 4.2 Results. In particular we analyzed the relationships of the components of the error with the kernels and kernel parameters, using data sets from UCI [14] (Waveform, Grey Landsat, Letter-Two, Letter-Two with added noise, Spam, Musk) and the P2 synthetic data set. We achieved a characterization of the bias--variance decomposition of

Xiaoli Z. Fern and Carla Brodley. Cluster Ensembles for High Dimensional Clustering: An Empirical Study. Journal of Machine Learning Research. 2004.

(6 letters only) UCI ML archive mfeat Handwritten digits represented by Fourier coefficients (Blake and Merz, 1998) satimage StatLog Satellite image data set (training set) segmentation Image segmentation data In contrast, HBGF allows the similarity of instances and the similarity of clusters to be considered simultaneously in producing the final

Jaakko Peltonen and Samuel Kaski. Discriminative Components of Data. IEEE. 2004.

Dataset   Dimensionality  Number of classes  Number of samples
Landsat   36              6                  4435
LVQ PAK   20              13                 3656
Isolet    30              26                 3742
MFeat     76              10                 1500
TIMIT     12              41                 14994
to indirect measures; we will make sure, however, that the comparisons

S. Augustine Su and Jennifer G. Dy. Automated hierarchical mixtures of probabilistic principal component analyzers. ICML. 2004.

(toy, oil, and glass), which are well modeled by mixtures of Gaussians. PPCA with fewer clusters has a comparable performance with EM + PCA on the large data sets (optical digits, satellite image, segment), except for the letter data, where PPCA performed better. Finally, mixtures of PPCA are worse than PCA+EM on the chart and wine data. Upon closer

Giorgio Valentini and Thomas G. Dietterich. Low Bias Bagged Support Vector Machines. ICML. 2003.

Waveform
Linear  0.0811  0.0821  0.0955  2-3-0  5-0-0  5-0-0
Polyn.  0.0625  0.0677  0.0698  2-3-0  2-3-0  3-2-0
Gauss.  0.0574  0.0653  0.0666  4-1-0  4-1-0  2-3-0
Data set Grey Landsat
Linear  0.0508  0.0510  0.0601  0-5-0  3-2-0  3-2-0
Polyn.  0.0432  0.0493  0.0535  1-4-0  2-3-0  1-4-0
Gauss.  0.0475  0.0486  0.0483  1-3-1  1-3-1  0-5-0
Data set Letter-Two
Linear  0.0832  0.0864  0.1011

Zoubin Ghahramani and Hyun-Chul Kim. Bayesian Classifier Combination. Gatsby Computational Neuroscience Unit University College London. 2003.

and using different component classifiers. We used Satellite and DNA data sets from the Statlog project ([8]) and the UCI digit data set ([1]). Our goal was not to obtain the best classifier performance---for this we would have paid very careful attention to the component

Giorgio Valentini. Ensemble methods based on bias--variance analysis. Theses Series DISI-TH-2003. Dipartimento di Informatica e Scienze dell'Informazione. 2003.

analysis with single SVMs
4.3 Procedure to perform bias--variance analysis on single SVMs
4.4 Grey Landsat data set. Error (a) and its decomposition in bias (b), net variance (c), unbiased variance (d), and biased variance (e) in SVM RBF, varying both C and σ.

Peter Sykacek and Stephen J. Roberts. Adaptive Classification by Variational Kalman Filtering. NIPS. 2002.

ionosphere data, balance scale weight and distance data and the wine recognition database, all taken from the StatLog database which is available at the UCI repository ([4]). The satellite image data set is used as is provided with 4435 samples in the training and 2000 samples in the test set. Vehicle data are merged such that we have 500 samples in the training and 252 in the test set. The other

Igor V. Tetko. Associative Neural Network. Neural Processing Letters, 16. 2002.

of such data without a need to retrain the neural network ensemble. Applications of ASNN for prediction of lipophilicity of chemical compounds and classification of UCI letter and satellite data set are presented. The developed algorithm is available on-line at Key words. associative memory, bias correction, classification, function approximation,

Jaakko Peltonen and Arto Klami and Samuel Kaski. Learning More Accurate Metrics for Self-Organizing Maps. ICANN. 2002.

Data set                        Dimensions  Classes  Samples
Landsat Satellite Data *        36          6        6435
Letter Recognition Data *       16          26       20000
Phoneme Data from LVQ PAK [8]   20          14       3656
TIMIT Data from [10]            12          41       14994
Bankruptcy Data used in

Stephen D. Bay. Multivariate Discretization for Set Mining. Knowl. Inf. Syst, 3. 2001.

of variables and number of records) and more detailed (i.e. standard census variables such as industry-code or occupation are recorded at a more detailed level in this database). • SatImage. This data set was generated from Landsat Multi-Spectral Scanner image data (i.e. it is a satellite image). It contains multi-spectral values for 3×3 pixel neighborhood and the soil type (e.g. red soil, cotton
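The excerpt above only says that each record holds multi-spectral values for a 3×3 pixel neighbourhood. As a rough illustration of how such a record could be unpacked, the sketch below assumes four spectral bands per pixel (which would account for the 36 dimensions quoted elsewhere on this page) and assumes the values are stored pixel by pixel, left to right and top to bottom. Both points are assumptions to check against the data set's own documentation, and `split_neighbourhood` is a hypothetical helper name.

```python
def split_neighbourhood(row, bands=4, side=3):
    """Group a flat record into a side x side grid of per-pixel band tuples.

    Assumes (not confirmed by the excerpt) that the record lists all bands of
    the top-left pixel first, then the next pixel to the right, and so on.
    """
    expected = bands * side * side
    if len(row) != expected:
        raise ValueError(f"expected {expected} values, got {len(row)}")
    # One tuple of band values per pixel, in storage order.
    pixels = [tuple(row[i:i + bands]) for i in range(0, expected, bands)]
    # Fold the flat pixel list into rows of the neighbourhood.
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


# Placeholder record: the values are just the attribute numbers 1..36.
row = list(range(1, 37))
grid = split_neighbourhood(row)
centre = grid[1][1]  # band values of the central pixel under this layout
```

Under this assumed layout the central pixel corresponds to attributes 17 to 20 of the record.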

Kagan Tumer and Joydeep Ghosh. Robust Combining of Disparate Classifiers through Order Statistics. CoRR, cs.LG/9905013. 1999.

were trained half as long as they would have been, had they been stand-alone classifiers. 5 The number of hidden units was determined experimentally. • Satellite: a 36-dimensional, 6-class data set with 6435 examples of feature vectors extracted from satellite imagery; an MLP with 20 hidden units. These three sets were chosen as they have relatively large number of features, somewhat large

Kagan Tumer and Nikunj C. Oza. Decimated Input Ensembles for Improved Generalization. NASA Ames Research Center. 1999.

improvement in the classification accuracy through ensembles. For the Gene data, the average combiner was significantly more accurate than the single MLP, while for the Satellite Image and Splice data sets, the combiner was only marginally more accurate.
TABLE I: Average Accuracy of Original Network and Combiners
         Single          Average         Corr.
Gene     83.417 ± .796   86.418 ± .342   .7910
Splice   84.722

Xavier Giannakopoulos and Juha Karhunen and Erkki Oja. An Experimental Comparison of Neural Algorithms for Independent Component Analysis and Blind Separation. Int. J. Neural Syst, 9. 1999.

methods for real-world data. For comparing the performance and properties of the studied ICA or BSS algorithms in more practical circumstances, we made experiments with three different real-world data sets. These data sets, namely crab data, satellite data, and MEG artefact data, will be briefly discussed in context with the respective results in the next section. Because we are now dealing with

Cesar Guerra-Salcedo and L. Darrell Whitley. Genetic Approach to Feature Selection for Ensemble Creation. GECCO. 1999.

the attributes values are 0 or 1. In the Segment dataset the attributes values are floats. In the LandSat dataset the attribute values are integers.
Dataset   Features  Classes  Train Size  Test Size
LandSat   36        6        4435        2000
DNA       180       39       2000        1186
Segment   19        7        210

Robert E. Schapire and Yoav Freund and Peter Bartlett and Wee Sun Lee. Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods. The Annals of Statistics, to appear. AT&T Labs. 1998.

Each stimulus was converted into 16 primitive numerical attributes (statistical moments and edge counts) which were then scaled to fit into a range of integer values from 0 through 15." The satimage dataset is the statlog version of a satellite image dataset. According to the documentation, "This database consists of the multi-spectral values of pixels in 3 × 3 neighborhoods in a satellite image, and

Je Scott and Mahesan Niranjan and Richard W. Prager. Realisable Classifiers: Improving Operating Performance on Variable Cost Problems. Cambridge University Department of Engineering.

the convex hull over the Test data ROC curves. The ROC curves for System 1 and System 2 using the Unseen data are plotted for comparison with the MRROC. 3.3 LandSat data. A LandSat image segmentation dataset, originally used in the Statlog project, was obtained from the UCI repository [13, 12]. The data consisted of multi-spectral values of pixels in 3 × 3 neighbourhoods in a satellite image. A

Vikas Sindhwani and P. Bhattacharya and Subrata Rakshit. Information Theoretic Feature Crediting in Multiclass Support Vector Machines.

is a 6-class and 36-feature dataset containing Landsat satellite data. We tested SVM-Infoprop for non-linear SVMs with these datasets. Support Vector Machines with polynomial kernels of degree 2 were trained with 564 examples and

Jaakko Peltonen and Arto Klami and Samuel Kaski. Learning Metrics for Information Visualization. Neural Networks Research Centre Helsinki University of Technology.

estimates of auxiliary data, at the winner units of test samples. This measure has a slightly unintuitive corollary: since it requires a density estimator, even though traditional SOMs are not
Data set                      n    C    N
Landsat Satellite Data [1]    36   6    6435
Letter Recognition Data [1]   16   26   20000
LVQ PAK (Phoneme) [6]         20   13   3656
TIMIT Data [11]               12   41   14994
Table 1: The data sets and their dimensionality (n),

Cesar Guerra-Salcedo and Darrell Whitley. Feature Selection Mechanisms for Ensemble Creation: A Genetic Search Perspective. Department of Computer Science Colorado State University.

the RSM method. To classify an unseen case x, each classifier in the ensemble votes on the class for x. The class with the most votes is the class predicted by the ensemble (majority-vote scheme).
Dataset   Features  Classes  Train Size  Test Size
LandSat   36        6        4435        2000
DNA       180       39       2000        1186
Segment   19        7        210         2100
Cloud     204       10       1000        633
Table 1: Dataset employed for the experiments. In the DNA dataset the
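The majority-vote scheme mentioned in this excerpt can be sketched in a few lines. This is only an illustration, not the authors' implementation: the ensemble members are stand-in functions, `majority_vote` is a hypothetical helper name, and the tie-breaking rule (the label that was voted for first wins) is an assumption, since the excerpt does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label that received the most votes.

    Ties are broken in favour of the label encountered first, which is an
    assumption of this sketch rather than something stated in the excerpt.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    return Counter(votes).most_common(1)[0][0]


# Hypothetical ensemble: each member is a plain function from a case to a label.
ensemble = [lambda x: 1, lambda x: 3, lambda x: 1]

x = [0] * 36  # placeholder 36-feature case
prediction = majority_vote([classifier(x) for classifier in ensemble])
```

Here two of the three stand-in classifiers vote for class 1, so that is the label the ensemble returns.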

Grigorios Tsoumakas and Ioannis P. Vlahavas. Fuzzy Meta-Learning: Preliminary Results. Greek Secretariat for Research and Technology.

that have large number of classes. These were the Satellite and `Segment' data sets again from the Machine Learning Repository. The setup of the experiments was the same as in the case of testing the meta-fuzzy scheme. Table 2 presents the results. Table 2: Results with the

Xavier Giannakopoulos and Juha Karhunen and Erkki Oja. A COMPARISON OF NEURAL ICA ALGORITHMS USING REAL-WORLD DATA. IDSIA.

roughly optimal [12]. On-line estimation of kurtosis was added also to this algorithm. 4. COMPARISON METHODS. We have made experiments with crab data, satellite data, and MEG artefact data. These data sets will be briefly discussed in context with experimental results. Because we are now dealing with real-world data, the assumptions made on the ICA model (1) may not hold, or hold only approximately.

Adil M. Bagirov and Julien Ugon. An algorithm for computation of piecewise linear function separating two sets. CIAO, School of Information Technology and Mathematical Sciences, The University of Ballarat.

to be known. In further research some methods to find automatically this number will
Table 2: Results of numerical experiments with Shuttle control, Letter recognition and Landsat satellite image datasets
              Training        Test
|I|   |J_i|   a_2c    a_mc    a_2c    a_mc    fct eval   DG eval
Shuttle control dataset
1     1       97.61   97.22   97.53   97.00   925        615
2     1       99.44   97.56   99.41   97.42   2148       1676
3     1       99.61   97.57   99.59   97.50   1474       968

Giorgio Valentini. An experimental bias--variance analysis of SVM ensembles based on resampling techniques.

we randomly split all the available data in a training and a test set of about equal size, except for the Grey Landsat data set for which we maintained the original size for both the training and test set. To measure the bias--variance decomposition of error, for each data set we used 100 sets (parameter n = 100 in Sect.

Cesar Guerra-Salcedo and Stephen Chen and Darrell Whitley and Sarah Smith. Fast and Accurate Feature Selection Using Hybrid Genetic Strategies. Department of Computer Science Colorado State University.

were employed and one artificially generated classification problem. The real-world classification problems are: satellite classification dataset LandSat, a DNA classification dataset and a Cloud classification dataset. On the other hand, the artificially generated classification problem relies on a LED identification problem. LED cases are

