Pen-Based Recognition of Handwritten Digits Data Set
Below are papers that cite this data set, with context shown.
Papers were automatically harvested and associated with this data set, in collaboration with Rexa.info.
Ken Tang and Ponnuthurai N. Suganthan and Xi Yao and A. Kai Qin. Linear dimensionality reduction using relevance weighted LDA. School of Electrical and Electronic Engineering Nanyang Technological University. 2005.
is generated from Landsat multispectral scanner image data. It has 36 dimensions, 4435 training samples and 2000 testing samples belonging to 6 classes. Optdigits. This is a 60-dimensional data set on optical recognition of 10 handwritten digits. It has separate training and testing sets with 3823 and 1797 samples, respectively. Vehicle. This data set involves classification of a given
Mikhail Bilenko and Sugato Basu and Raymond J. Mooney. Integrating constraints and metric learning in semi-supervised clustering. ICML. 2004.
used by Xing et al. (2003) and Bar-Hillel et al. (2003), and randomly sampled subsets from the Digits and Letters handwritten character recognition datasets, also from the UCI repository. For Digits and Letters, we chose two sets of three classes: {I, J, L} from Letters and {3, 8, 9} from Digits, sampling 10% of the data points from the original
Fabian Hoti and Lasse Holmström. A semiparametric density estimation approach to pattern classification. Pattern Recognition, 37. 2004.
classifiers tried in [17]. The classification error of both KDA and QDA was 3.7%. Using a so-called convex local subspace classifier, a smaller error rate of 2.1% was reported in [18]. 3.2.2 Public data sets 1: satellite image and handwritten digits. Next we consider two public data sets obtained from the UCI Machine Learning Repository [19]. The first example is a satellite image data set with 4435
Thomas Serafini and Gaetano Zanghirati and Luca Zanni. Gradient Projection Methods for. Dipartimento di Matematica. 2003.
and show how the GVPMs can be a valuable alternative to both SPGM(# (1,2) ) and ALSPGMs. To evaluate the above methods on some QP problems of the form (2) we train Gaussian SVMs on two real-world data sets: the MNIST database of handwritten digits [24] and the UCI Adult data set [27]. These experiments are carried out on a Compaq XP1000 workstation at 667 MHz with 1 GB of RAM, with standard C codes.
Manoranjan Dash and Huan Liu and Peter Scheuermann and Kian-Lee Tan. Fast hierarchical clustering and its validation. Data Knowl. Eng., 44. 2003.
consists of 10,992 objects in 16 dimensions. There are 10 classes corresponding to digits 0...9. The 16 dimensions are drawn by resampling from handwritten digits. Iris dataset has 150 points in 4 dimensions in 3 clusters. Dimensions are sepal length, sepal width, petal length, and petal width. Clusters are Iris Setosa, Iris Versicolour, and Iris Virginica. Each of the 3
Dennis DeCoste. Anytime Query-Tuned Kernel Machines via Cholesky Factorization. SDM. 2003.
Lanckriet, L. E. Ghaoui, C. Bhattacharyya, and M. I. Jordan. Minmax probability machine. Advances in Neural Information Processing Systems (NIPS) 14, 2002. [10] Y. LeCun. MNIST handwritten digits dataset. Available at http://www.research.att.com/~yann/ocr/mnist/, 2000. [11] S. Mika, G. Rätsch, and K.-R. Müller. A mathematical programming approach to the kernel Fisher algorithm. In Advances in
Greg Hamerly and Charles Elkan. Learning the k in k-means. NIPS. 2003.
them slow for more than 8 to 12 dimensions. All our code is written in Matlab; X-means is written in C. 3.1 Discovering true clusters in labeled data. We tested these algorithms on two real-world datasets for handwritten digit recognition: the NIST dataset [12] and the Pendigits dataset [2]. The goal is to cluster the data without knowledge of the labels and measure how well the clustering captures
Marina Meila and Michael I. Jordan. Learning with Mixtures of Trees. Journal of Machine Learning Research, 1. 2000.
on. 5.2 Density estimation experiments 5.2.1 digits and digit pairs images. Our first density estimation experiment involved a subset of binary vector representations of handwritten digits. The datasets consist of normalized and quantized 8×8 binary images of handwritten digits made available by the US Postal Service Office for Advanced Technology. One dataset, which we refer to as the "digits"
Ethem Alpaydin. Combined 5 x 2 cv F Test for Comparing Supervised Classification Learning Algorithms. Neural Computation, 11. 1999.
2 no VOWEL 2 no ODR 8 yes DIGIT 7 yes PEN 10 yes changing the numerator where we compare a single layer perceptron (LP) with a multilayer perceptron with one hidden layer (MLP). ODR, DIGIT are two datasets on optical handwritten digit recognition and PEN is on pen-based handwritten digit recognition. These three datasets are available from the author. The other datasets are from the UCI repository
Georg Thimm and Emile Fiesler. IDIAP Technical report High Order and Multilayer Perceptron Initialization. IEEE Transactions. 1994.
in the same row or the same column in the image. This configuration should allow the extraction of sufficient features to learn the digits. Training sessions on the digits data set described in section 4.1 gave an acceptable recognition of untrained digits, despite the small training [Figure 2 residue: axes labeled Convergence Time and log(Initial Weight Variance)]
Perry Moerland. Mixtures of latent variable models for density estimation and classification. Research Report, IDIAP, Dalle Molle Institute for Perceptual Artificial Intelligence.
of the ten digit classes with a 10-component 10-factor Mfa. This is comparable to the best scores we obtained with large Mlps. 7.3 Experiments: MNIST Data. We also performed experiments on the MNIST data set, a large collection of handwritten digits which can be obtained from (LeCun 2000). It comes as 28×28 grey level images in a training set of 60,000 examples and a test set of 10,000 examples. Each of
Luca Zanni. An Improved Gradient Projection-based Decomposition Technique for Support Vector Machines. Dipartimento di Matematica, Università di Modena e Reggio Emilia.
sized 49749. The Gaussian SVM parameters are: C = 5 and σ = √10. MNIST data set. The MNIST database of handwritten digits [18] contains 784-dimensional nonbinary sparse vectors; the size of the database is 60000 and the sparsity level of the inputs is ≈ 81%. A Gaussian SVM for
Adil M. Bagirov and John Yearwood. A new nonsmooth optimization algorithm for clustering. Centre for Informatics and Applied Optimization, School of Information Technology and Mathematical Sciences, University of Ballarat.
of all features were 1. In numerical experiments we take = 10 2 and the initial number of clusters q_0 = 2 with c = 1.5 for pen-based recognition of handwritten digits and the image segmentation data sets whereas for all others we used the entire data set. First we applied Algorithm 3.1 to calculate clusters. Then the k-means algorithm was applied with the same number of clusters as calculated by
Ahmed Hussain Khan. Multiplier-Free Feedforward Networks. 174.
forward-pass capability. It differs from the conventional model in restricting its synapses to the set {−1, 0, 1} while allowing unrestricted offsets. Simulation results on the 'onset of diabetes' data set and a handwritten numeral recognition database indicate that the new network, despite having strong constraints on its synapses, has a generalization performance similar to that of its conventional
Adil M. Bagirov and Alex Rubinov and A. N. Soukhojak and John Yearwood. Unsupervised and supervised data classification via nonsmooth and global optimization. School of Information Technology and Mathematical Sciences, The University of Ballarat.
was donated by Richard S. Forsyth, BUPA Medical Research Ltd. It contains 2 classes, 345 observations and 6 attributes. Pen-based recognition of handwritten digits (Pendig). This dataset was introduced by E. Alpaydin and Fevzi Alimoglu. It contains 10 classes, 10992 observations, 16 attributes. All input attributes are integers 1...100. Satellite image (SatIm, image segmentation)
Georg Thimm and Emile Fiesler. High Order and Multilayer Perceptron Initialization.
in the same row or the same column in the image. This configuration should allow the extraction of sufficient features to learn the digits. Training sessions on the digits data set described in section IV-A gave an acceptable recognition of untrained digits, despite the small training set used. The three different initial random weight distributions used are: uniform on the interval [−a, a] (with
Adil M. Bagirov and Julien Ugon. An algorithm for computation of piecewise linear function separating two sets. CIAO, School of Information Technology and Mathematical Sciences, The University of Ballarat.
            Training         Test
I   J_i   a_2c    a_mc    a_2c    a_mc    fct eval.   DG. eval.
Pen-based recognition of handwritten dataset
1   1     97.27   96.74   96.43   92.37   1597        1146
2   1     99.31   99.44   98.33   96.14   2607        1852
3   1     99.79   99.92   98.89   96.20   3040        2220
2   2     99.80   99.95   98.99   96.03   3083        2306
3   2     99.83   99.87   99.12   95.88   1806        1268
3   3
Charles Campbell and Nello Cristianini. Simple Learning Algorithms for Training Support Vector Machines. Dept. of Engineering Mathematics.
include a sonar classification problem [14], the Wisconsin breast cancer dataset [35] and a database of handwritten digits collected by the US Postal Service [17]. As examples of the improvements with generalisation ability which can be achieved with a soft margin we will also
