Letter Recognition
Donated on 12/31/1990
Database of character image features; try to identify the letter
Dataset Characteristics
Multivariate
Subject Area
Computer Science
Associated Tasks
Classification
Feature Type
Integer
# Instances
20000
# Features
16
Dataset Information
Additional Information
The objective is to identify each of a large number of black-and-white rectangular pixel displays as one of the 26 capital letters in the English alphabet. The character images were based on 20 different fonts, and each letter within these 20 fonts was randomly distorted to produce a file of 20,000 unique stimuli. Each stimulus was converted into 16 primitive numerical attributes (statistical moments and edge counts), which were then scaled to fit into a range of integer values from 0 through 15. We typically train on the first 16,000 items and then use the resulting model to predict the letter category for the remaining 4,000. See the article cited in letter-recognition.names for more details.
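The train/test convention described above (first 16,000 items for training, remaining 4,000 for evaluation) can be sketched as follows. This is a minimal illustration, not the creators' code: the helper names `split_in_file_order` and `accuracy` are hypothetical, and the sketch assumes the records are already loaded in their original file order.

```python
from typing import Sequence, Tuple

TRAIN_SIZE = 16_000  # from the description: the first 16,000 items are used for training


def split_in_file_order(rows: Sequence, train_size: int = TRAIN_SIZE) -> Tuple[Sequence, Sequence]:
    """Split rows into (train, test) without shuffling, preserving file order."""
    if not 0 <= train_size <= len(rows):
        raise ValueError("train_size must be between 0 and len(rows)")
    return rows[:train_size], rows[train_size:]


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of positions where the predicted letter equals the true letter."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


if __name__ == "__main__":
    rows = list(range(20_000))  # stand-in for the 20,000 records
    train, test = split_in_file_order(rows)
    print(len(train), len(test))
```

Because the split is positional rather than random, results are only comparable with other work that uses the same ordering of the file.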
Has Missing Values?
No
Variables Table
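Variable Name | Role | Type | Description | Units | Missing Values |
---|---|---|---|---|---|
lettr | Target | Categorical | capital letter | | no |
x-box | Feature | Integer | horizontal position of box | | no |
y-box | Feature | Integer | vertical position of box | | no |
width | Feature | Integer | width of box | | no |
high | Feature | Integer | height of box | | no |
onpix | Feature | Integer | total # on pixels | | no |
x-bar | Feature | Integer | mean x of on pixels in box | | no |
y-bar | Feature | Integer | mean y of on pixels in box | | no |
x2bar | Feature | Integer | mean x variance | | no |
y2bar | Feature | Integer | mean y variance | | no |
xybar | Feature | Integer | mean x y correlation | | no |
x2ybr | Feature | Integer | mean of x * x * y | | no |
xy2br | Feature | Integer | mean of x * y * y | | no |
x-ege | Feature | Integer | mean edge count left to right | | no |
xegvy | Feature | Integer | correlation of x-ege with y | | no |
y-ege | Feature | Integer | mean edge count bottom to top | | no |
yegvx | Feature | Integer | correlation of y-ege with x | | no |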
Additional Variable Information
1. lettr: capital letter (26 values from A to Z)
2. x-box: horizontal position of box (integer)
3. y-box: vertical position of box (integer)
4. width: width of box (integer)
5. high: height of box (integer)
6. onpix: total # on pixels (integer)
7. x-bar: mean x of on pixels in box (integer)
8. y-bar: mean y of on pixels in box (integer)
9. x2bar: mean x variance (integer)
10. y2bar: mean y variance (integer)
11. xybar: mean x y correlation (integer)
12. x2ybr: mean of x * x * y (integer)
13. xy2br: mean of x * y * y (integer)
14. x-ege: mean edge count left to right (integer)
15. xegvy: correlation of x-ege with y (integer)
16. y-ege: mean edge count bottom to top (integer)
17. yegvx: correlation of y-ege with x (integer)
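As an illustration of the variable layout listed above, the following sketch parses one record into a letter plus 16 named integer features and checks the documented 0 to 15 range. It assumes each record is a single comma-separated line with the target letter first, in the order of the list; that layout, the function name `parse_record`, and the sample row are assumptions made for this example rather than details stated on this page.

```python
from typing import Dict, Tuple

# Feature names in the order given in the variable list (items 2 to 17).
FEATURE_NAMES = [
    "x-box", "y-box", "width", "high", "onpix", "x-bar", "y-bar", "x2bar",
    "y2bar", "xybar", "x2ybr", "xy2br", "x-ege", "xegvy", "y-ege", "yegvx",
]


def parse_record(line: str) -> Tuple[str, Dict[str, int]]:
    """Parse one record (assumed layout: letter, then 16 comma-separated integers)."""
    fields = line.strip().split(",")
    if len(fields) != 1 + len(FEATURE_NAMES):
        raise ValueError(f"expected 17 fields, got {len(fields)}")

    letter = fields[0]
    if len(letter) != 1 or not "A" <= letter <= "Z":
        raise ValueError(f"target must be a capital letter A-Z, got {letter!r}")

    values = [int(v) for v in fields[1:]]  # int() raises ValueError on non-integers
    if any(not 0 <= v <= 15 for v in values):
        raise ValueError("feature values must lie in the range 0 through 15")

    return letter, dict(zip(FEATURE_NAMES, values))


if __name__ == "__main__":
    # Illustrative row in the assumed layout.
    letter, features = parse_record("T,2,8,3,5,1,8,13,0,6,6,10,8,0,8,0,8")
    print(letter, features["x-box"], features["yegvx"])
```

Validating the range on load is a cheap way to catch a misaligned column or a wrong delimiter before any model is trained.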
Dataset Files
File | Size |
---|---|
letter-recognition.data | 695.9 KB |
letter-recognition.data.Z | 187.4 KB |
letter-recognition.names | 2.7 KB |
Index | 194 Bytes |
Papers Citing this Dataset
By Bin Dong, Haocheng Ju, Yiping Lu, Zuoqiang Shi. 2019
Published in ArXiv.
By Bowen Zhao, Xi Xiao, Wanpeng Zhang, Bin Zhang, Shutao Xia. 2019
Published in ArXiv.
By Mineto Tsukada, Masaaki Kondo, Hiroki Matsutani. 2019
Published in ArXiv.
By Elizaveta Rebrova, Gustavo Chavez, Yang Liu, Pieter Ghysels, Xiaoye Li. 2018
Published in IPDPS workshops 2018.
By Yazhou Yang, Marco Loog. 2017
Published in Pattern Recognition 78C (2018) pp. 358-370.
    pip install ucimlrepo

    from ucimlrepo import fetch_ucirepo

    # fetch dataset
    letter_recognition = fetch_ucirepo(id=59)

    # data (as pandas dataframes)
    X = letter_recognition.data.features
    y = letter_recognition.data.targets

    # metadata
    print(letter_recognition.metadata)

    # variable information
    print(letter_recognition.variables)

Slate, D. (1991). Letter Recognition [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5ZP40.
Creators
David Slate
DOI
10.24432/C5ZP40
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.