Center for Machine Learning and Intelligent Systems

Pen-Based Recognition of Handwritten Digits Data Set

Below are papers that cite this data set, with the citing context shown. Papers were automatically harvested and associated with this data set.

Return to Pen-Based Recognition of Handwritten Digits data set page.

Ken Tang and Ponnuthurai N. Suganthan and Xi Yao and A. Kai Qin. Linear dimensionality reduction using relevance weighted LDA. School of Electrical and Electronic Engineering, Nanyang Technological University. 2005.

is generated from landsat multi-spectral scanner image data. It has 36 dimensions, 4435 training samples and 2000 testing samples belonging to 6 classes. Optdigits. This is a 60-dimensional data set on optical recognition of 10 handwritten digits. It has separate training and testing sets with 3823 and 1797 samples, respectively. Vehicle. This data set involves classification of a given

Mikhail Bilenko and Sugato Basu and Raymond J. Mooney. Integrating constraints and metric learning in semi-supervised clustering. ICML. 2004.

used by Xing et al. (2003) and Bar-Hillel et al. (2003), and randomly sampled subsets from the Digits and Letters handwritten character recognition datasets, also from the UCI repository. For Digits and Letters, we chose two sets of three classes: {I, J, L} from Letters and {3, 8, 9} from Digits, sampling 10% of the data points from the original
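
The subset-sampling protocol the excerpt describes (keep a few named classes, then sample a fraction of their points) can be sketched as follows. This is an illustrative helper, not code from the paper; the function name and signature are this sketch's own.

```python
import random

def sample_class_subset(points, labels, keep_classes, fraction, seed=0):
    """Keep only points whose label is in keep_classes, then draw a
    random sample containing `fraction` of those points.

    Hypothetical helper mirroring the Digits {3, 8, 9} / Letters {I, J, L}
    sampling described in the excerpt (10% of the retained points)."""
    rng = random.Random(seed)
    subset = [(p, y) for p, y in zip(points, labels) if y in keep_classes]
    k = max(1, round(fraction * len(subset)))
    return rng.sample(subset, k)
```

With 100 points spread evenly over labels 0-9, keeping classes {3, 8, 9} retains 30 points, and a fraction of 0.1 samples 3 of them.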

Fabian Hoti and Lasse Holmström. A semiparametric density estimation approach to pattern classification. Pattern Recognition, 37. 2004.

classifiers tried in [17]. The classification error of both KDA and QDA was 3.7%. Using a so-called convex local subspace classifier, a smaller error rate of 2.1% was reported in [18]. 3.2.2 Public data sets 1: satellite image and handwritten digits Next we consider two public data sets obtained from the UCI Machine Learning Repository [19]. The first example is a satellite image data set with 4435

Greg Hamerly and Charles Elkan. Learning the k in k-means. NIPS. 2003.

them slow for more than 8 to 12 dimensions. All our code is written in Matlab; X-means is written in C. 3.1 Discovering true clusters in labeled data We tested these algorithms on two real-world datasets for handwritten digit recognition: the NIST dataset [12] and the Pendigits dataset [2]. The goal is to cluster the data without knowledge of the labels and measure how well the clustering captures
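
One simple way to score how well an unsupervised clustering recovers known class labels is cluster purity: the fraction of points whose cluster's majority label matches their own. This is an assumption for illustration; the paper may use a different agreement measure.

```python
from collections import Counter

def cluster_purity(cluster_ids, true_labels):
    """Fraction of points whose cluster's majority label matches
    their own label. 1.0 means every cluster is label-pure."""
    by_cluster = {}
    for c, y in zip(cluster_ids, true_labels):
        by_cluster.setdefault(c, []).append(y)
    correct = sum(Counter(ys).most_common(1)[0][1]
                  for ys in by_cluster.values())
    return correct / len(true_labels)
```

For example, clusters [0, 0, 0, 1, 1, 1] against labels [3, 3, 8, 8, 8, 8] score (2 + 3) / 6.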

Thomas Serafini and Gaetano Zanghirati and Luca Zanni. Gradient Projection Methods for. Dipartimento di Matematica. 2003.

and show how the GVPMs can be a valuable alternative to both SPGM(# (1,2) ) and AL-SPGMs. To evaluate the above methods on some QP problems of the form (2) we train Gaussian SVMs on two real-world data sets: the MNIST database of handwritten digits [24] and the UCI Adult data set [27]. These experiments are carried out on a Compaq XP1000 workstation at 667MHz with 1GB of RAM, with standard C codes.

Manoranjan Dash and Huan Liu and Peter Scheuermann and Kian-Lee Tan. Fast hierarchical clustering and its validation. Data Knowl. Eng, 44. 2003.

consists of 10,992 objects in 16 dimensions. There are 10 classes corresponding to digits 0...9. The 16 dimensions are drawn by re-sampling from handwritten digits. Iris dataset has 150 points in 4 dimensions in 3 clusters. Dimensions are sepal length, sepal width, petal length, and petal width. Clusters are Iris Setosa, Iris Versicolour, and Iris Virginica. Each of the 3

Dennis DeCoste. Anytime Query-Tuned Kernel Machines via Cholesky Factorization. SDM. 2003.

Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan. Minimax probability machine. Advances in Neural Information Processing Systems (NIPS) 14, 2002. [10] Y. LeCun. MNIST handwritten digits dataset. Available at ~yann/ocr/mnist/, 2000. [11] S. Mika, G. Rätsch, and K.-R. Müller. A mathematical programming approach to the kernel Fisher algorithm. In Advances in

Marina Meila and Michael I. Jordan. Learning with Mixtures of Trees. Journal of Machine Learning Research, 1. 2000.

on. 5.2 Density estimation experiments 5.2.1 digits and digit pairs images Our first density estimation experiment involved a subset of binary vector representations of handwritten digits. The datasets consist of normalized and quantized 8×8 binary images of handwritten digits made available by the US Postal Service Office for Advanced Technology. One dataset---which we refer to as the "digits"

Ethem Alpaydin. Combined 5 x 2 cv F Test for Comparing Supervised Classification Learning Algorithms. Neural Computation, 11. 1999.

[table excerpt: the VOWEL, ODR, DIGIT, and PEN datasets with their class counts] changing the numerator where we compare a single layer perceptron (LP) with a multilayer perceptron with one hidden layer (MLP). ODR and DIGIT are two datasets on optical handwritten digit recognition and PEN is on pen-based handwritten digit recognition. These three datasets are available from the author. The other datasets are from the UCI repository
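
The combined 5×2cv F statistic from Alpaydin (1999) pools the error differences p_ij from 5 replications of 2-fold cross-validation: F = (Σ_i Σ_j p_ij²) / (2 Σ_i s_i²), compared against an F(10, 5) distribution. A minimal sketch:

```python
def alpaydin_5x2cv_f(diffs):
    """Combined 5x2cv F statistic.

    diffs: five (p_i1, p_i2) pairs of error-rate differences, one pair
    per replication of 2-fold cross-validation. Returns
    F = sum of squared differences / (2 * sum of per-replication
    variances s_i^2), to be compared with the F(10, 5) distribution."""
    num = sum(p1 * p1 + p2 * p2 for p1, p2 in diffs)
    den = 0.0
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2.0
        den += (p1 - mean) ** 2 + (p2 - mean) ** 2
    return num / (2.0 * den)
```

With identical pairs (0.1, 0.2) in all five replications, the numerator is 5 × 0.05 = 0.25 and each s_i² is 0.005, giving F = 0.25 / 0.05 = 5.0.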

Georg Thimm and Emile Fiesler. IDIAP Technical report High Order and Multilayer Perceptron Initialization. IEEE Transactions. 1994.

in the same row or the same column in the image. This configuration should allow the extraction of sufficient features to learn the digits. Training sessions on the digits data set described in section 4.1 gave an acceptable recognition of untrained digits, despite the small training set used. [Figure 2: convergence time versus log(initial weight variance)]

Georg Thimm and Emile Fiesler. High Order and Multilayer Perceptron Initialization.

in the same row or the same column in the image. This configuration should allow the extraction of sufficient features to learn the digits. Training sessions on the digits data set described in section IV-A gave an acceptable recognition of untrained digits, despite the small training set used. The three different initial random weight distributions used are: uniform on the interval [-a, a] (with
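
The first of the initial weight distributions mentioned, uniform on [-a, a], can be sketched as below. The function name and seeding are this sketch's own choices, not the paper's.

```python
import random

def init_uniform(rows, cols, a, seed=0):
    """Draw a rows x cols weight matrix with entries sampled
    uniformly from the interval [-a, a]."""
    rng = random.Random(seed)
    return [[rng.uniform(-a, a) for _ in range(cols)]
            for _ in range(rows)]
```

Every entry of the returned matrix lies within [-a, a], and the variance of the initial weights (a²/3 for this distribution) is the quantity swept on the x-axis of the paper's convergence plot.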

Adil M. Bagirov and Julien Ugon. An algorithm for computation of piecewise linear function separating two sets. CIAO, School of Information Technology and Mathematical Sciences, The University of Ballarat.

Results for the pen-based recognition of handwritten digits dataset:

|I|  |J_i|   Training         Test            fct. eval.  DG eval.
             a_2c    a_mc     a_2c    a_mc
 1     1     97.27   96.74    96.43   92.37    1597        1146
 2     1     99.31   99.44    98.33   96.14    2607        1852
 3     1     99.79   99.92    98.89   96.20    3040        2220
 2     2     99.80   99.95    98.99   96.03    3083        2306
 3     2     99.83   99.87    99.12   95.88    1806        1268
 3     3

Charles Campbell and Nello Cristianini. Simple Learning Algorithms for Training Support Vector Machines. Dept. of Engineering Mathematics.

include a sonar classification problem [14], the Wisconsin breast cancer dataset [35] and a database of handwritten digits collected by the US Postal Service [17]. As examples of the improvements in generalisation ability that can be achieved with a soft margin we will also

Perry Moerland. Mixtures of latent variable models for density estimation and classification. IDIAP Research Report, Dalle Molle Institute for Perceptual Artificial Intelligence.

of the ten digit classes with a 10-component 10-factor MFA. This is comparable to the best scores we obtained with large MLPs. 7.3 Experiments: MNIST Data We also performed experiments on the MNIST data set, a large collection of handwritten digits which can be obtained from (LeCun 2000). It comes as 28×28 grey level images in a training set of 60,000 examples and a test set of 10,000 examples. Each of

Luca Zanni. An Improved Gradient Projection-based Decomposition Technique for Support Vector Machines. Dipartimento di Matematica, Università di Modena e Reggio Emilia.

sized 49749. The Gaussian SVM parameters are: C = 5 and σ = √10. MNIST data set: The MNIST database of handwritten digits [18] contains 784-dimensional nonbinary sparse vectors; the size of the database is 60000 and the sparsity level of the inputs is ≈ 81%. A Gaussian SVM for
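
The Gaussian kernel at the heart of these SVM experiments can be written, in one common parameterization, as K(x, z) = exp(-||x - z||² / (2σ²)). A minimal sketch (the paper's exact parameterization may differ):

```python
import math

def rbf_kernel(x, z, sigma):
    """Gaussian (RBF) kernel K(x, z) = exp(-||x - z||^2 / (2 * sigma^2)).

    x, z: equal-length sequences of floats; sigma: kernel width,
    e.g. sqrt(10) in the excerpt's setup."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma * sigma))
```

The kernel is 1 when x = z and decays toward 0 as the points move apart, with σ controlling how quickly.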

Adil M. Bagirov and John Yearwood. A new nonsmooth optimization algorithm for clustering. Centre for Informatics and Applied Optimization, School of Information Technology and Mathematical Sciences, University of Ballarat.

of all features were 1. In numerical experiments we take = 10^-2 and the initial number of clusters q_0 = 2 with c = 1.5 for pen-based recognition of handwritten digits and the image segmentation data sets whereas for all others we used the entire data set. First we applied Algorithm 3.1 to calculate clusters. Then the k-means algorithm was applied with the same number of clusters as calculated by

Ahmed Hussain Khan. Multiplier-Free Feedforward Networks.

forward-pass capability. It differs from the conventional model in restricting its synapses to the set {-1, 0, 1} while allowing unrestricted offsets. Simulation results on the `onset of diabetes' data set and a handwritten numeral recognition database indicate that the new network, despite having strong constraints on its synapses, has a generalization performance similar to that of its conventional
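
Restricting synapses to {-1, 0, 1} amounts to ternary quantization of the weights. A minimal sketch of such a mapping; the threshold and the mapping itself are assumptions for illustration, not the paper's training scheme.

```python
def quantize_ternary(weights, threshold=0.5):
    """Map real-valued synapses to {-1, 0, 1} by thresholding:
    values above +threshold become 1, below -threshold become -1,
    and everything in between becomes 0."""
    out = []
    for w in weights:
        if w > threshold:
            out.append(1)
        elif w < -threshold:
            out.append(-1)
        else:
            out.append(0)
    return out
```

A forward pass through such a network needs no multipliers: each synapse either passes, negates, or drops its input, which is the hardware motivation the excerpt alludes to.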

Adil M. Bagirov and Alex Rubinov and A. N. Soukhojak and John Yearwood. Unsupervised and supervised data classification via nonsmooth and global optimization. School of Information Technology and Mathematical Sciences, The University of Ballarat.

was donated by Richard S. Forsyth, BUPA Medical Research Ltd. It contains 2 classes, 345 observations and 6 attributes. Pen-based recognition of handwritten digits (Pendig) This dataset was introduced by E. Alpaydin and Fevzi Alimoglu. It contains 10 classes, 10992 observations, 16 attributes. All input attributes are integers 1...100. Satellite image (SatIm, image segmentation)

