Letter Recognition

Donated on 12/31/1990

Database of character image features; try to identify the letter

Dataset Characteristics

Multivariate

Subject Area

Computer Science

Associated Tasks

Classification

Feature Type

Integer

# Instances

20000

# Features

16

Dataset Information

Additional Information

The objective is to identify each of a large number of black-and-white rectangular pixel displays as one of the 26 capital letters in the English alphabet. The character images were based on 20 different fonts, and each letter within these 20 fonts was randomly distorted to produce a file of 20,000 unique stimuli. Each stimulus was converted into 16 primitive numerical attributes (statistical moments and edge counts), which were then scaled to fit into a range of integer values from 0 through 15. We typically train on the first 16,000 items and then use the resulting model to predict the letter category for the remaining 4,000. See the article cited above for more details.
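The train/test convention described here (first 16,000 items for training, remaining 4,000 for prediction) can be sketched as a simple positional split. This is a minimal illustration, not the creators' code: the function name `split_train_test` and the constant `N_TRAIN` are hypothetical, and the placeholder rows stand in for records read from the data file in their original order.

```python
from typing import Sequence, Tuple

# Split point taken from the dataset description: first 16,000 items train.
N_TRAIN = 16000


def split_train_test(rows: Sequence, n_train: int = N_TRAIN) -> Tuple[list, list]:
    """Return (train, test): the first n_train rows and the remainder.

    The split is positional, so rows must be in file order (no shuffling).
    """
    if n_train < 0 or n_train > len(rows):
        raise ValueError(f"n_train={n_train} is outside 0..{len(rows)}")
    return list(rows[:n_train]), list(rows[n_train:])


if __name__ == "__main__":
    # Placeholder rows (plain integers), standing in for the 20,000 records.
    placeholder_rows = list(range(20000))
    train, test = split_train_test(placeholder_rows)
    print(len(train), len(test))  # 16000 4000
```

Because the split depends on position, any shuffling or sorting applied while loading the file would change which items end up in each part.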

Has Missing Values?

No

Variables Table

Variable Name	Role	Type	Description	Missing Values
lettr	Target	Categorical	capital letter	no
x-box	Feature	Integer	horizontal position of box	no
y-box	Feature	Integer	vertical position of box	no
width	Feature	Integer	width of box	no
high	Feature	Integer	height of box	no
onpix	Feature	Integer	total # on pixels	no
x-bar	Feature	Integer	mean x of on pixels in box	no
y-bar	Feature	Integer	mean y of on pixels in box	no
x2bar	Feature	Integer	mean x variance	no
y2bar	Feature	Integer	mean y variance	no

Additional Variable Information

1. lettr: capital letter (26 values from A to Z)
2. x-box: horizontal position of box (integer)
3. y-box: vertical position of box (integer)
4. width: width of box (integer)
5. high: height of box (integer)
6. onpix: total # on pixels (integer)
7. x-bar: mean x of on pixels in box (integer)
8. y-bar: mean y of on pixels in box (integer)
9. x2bar: mean x variance (integer)
10. y2bar: mean y variance (integer)
11. xybar: mean x y correlation (integer)
12. x2ybr: mean of x * x * y (integer)
13. xy2br: mean of x * y * y (integer)
14. x-ege: mean edge count left to right (integer)
15. xegvy: correlation of x-ege with y (integer)
16. y-ege: mean edge count bottom to top (integer)
17. yegvx: correlation of y-ege with x (integer)
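The 17-variable layout above can be turned into a small record parser. The sketch below assumes each record in letter-recognition.data is one comma-separated line with the letter first and the 16 integer attributes after it, in the order listed; that layout is an assumption to check against letter-recognition.names. The helper name `parse_record` is hypothetical, and the sample line uses made-up values rather than real data.

```python
# Column names in the order given in the variable list above.
COLUMNS = [
    "lettr", "x-box", "y-box", "width", "high", "onpix", "x-bar", "y-bar",
    "x2bar", "y2bar", "xybar", "x2ybr", "xy2br", "x-ege", "xegvy", "y-ege",
    "yegvx",
]


def parse_record(line: str) -> dict:
    """Parse one comma-separated record into a dict keyed by column name.

    Assumes the letter comes first, followed by 16 integers in 0..15.
    """
    fields = line.strip().split(",")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")

    letter = fields[0]
    if len(letter) != 1 or not ("A" <= letter <= "Z"):
        raise ValueError(f"target must be a capital letter A-Z, got {letter!r}")

    record = {"lettr": letter}
    for name, raw in zip(COLUMNS[1:], fields[1:]):
        value = int(raw)
        # The description states attributes were scaled to integers 0 through 15.
        if not 0 <= value <= 15:
            raise ValueError(f"{name}={value} is outside the 0..15 range")
        record[name] = value
    return record


if __name__ == "__main__":
    # Made-up values for illustration only; not a row from the dataset.
    sample = "A,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,0"
    parsed = parse_record(sample)
    print(parsed["lettr"], parsed["x-box"], parsed["yegvx"])  # A 1 0
```

Validating the letter and the 0 through 15 range while parsing catches a wrong delimiter or column order early, before the records reach a classifier.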

Dataset Files

File	Size
letter-recognition.data	695.9 KB
letter-recognition.data.Z	187.4 KB
letter-recognition.names	2.7 KB
Index	194 Bytes

Papers Citing this Dataset

CURE: Curvature Regularization For Missing Data Recovery

By Bin Dong, Haocheng Ju, Yiping Lu, Zuoqiang Shi. 2019

Published in ArXiv.

Self-Paced Probabilistic Principal Component Analysis for Data with Outliers

By Bowen Zhao, Xi Xiao, Wanpeng Zhang, Bin Zhang, Shutao Xia. 2019

Published in ArXiv.

A Neural Network Based On-device Learning Anomaly Detector for Edge Devices

By Mineto Tsukada, Masaaki Kondo, Hiroki Matsutani. 2019

Published in ArXiv.

A Study of Clustering Techniques and Hierarchical Matrix Formats for Kernel Ridge Regression

By Elizaveta Rebrova, Gustavo Chavez, Yang Liu, Pieter Ghysels, Xiaoye Li. 2018

Published in IPDPS workshops 2018.

A Variance Maximization Criterion for Active Learning

By Yazhou Yang, Marco Loog. 2017

Published in Pattern Recognition 78C (2018) pp. 358-370.

25 citations

Keywords

object recognition

Creators

David Slate

DOI

10.24432/C5ZP40

License

This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.
