Artificial Characters
Donated on 6/30/1992
Dataset artificially generated using a first-order theory that describes the structure of ten capital letters of the English alphabet.
Dataset Characteristics
Multivariate
Subject Area
Computer Science
Associated Tasks
Classification
Feature Type
Categorical, Integer, Real
# Instances
6000
# Features
7
Dataset Information
Additional Information
This database has been artificially generated by using a first-order theory which describes the structure of ten capital letters of the English alphabet, together with a random-choice theorem prover which accounts for heterogeneity in the instances. The capital letters represented are the following: A, C, D, E, F, G, H, L, P, R. Each instance is structured and is described by a set of segments (lines) which resemble the way an automatic program would segment an image. Each instance is stored in a separate file with the following format:
CLASS OBJNUM TYPE XX1 YY1 XX2 YY2 SIZE DIAG
where CLASS is an integer indicating the class as described below, OBJNUM is an integer identifier of a segment (starting from 0) within the instance, and the remaining columns hold attribute values. For further details, contact the author.
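The record layout above can be read with a few lines of Python. This is a minimal sketch, not the authors' loader: it assumes that columns are separated by whitespace and that each line of an instance file holds one segment, neither of which is confirmed here. The `Segment` class, the `parse_instance` function, and the sample rows are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One row of an instance file (field names follow the column list)."""
    cls: int       # CLASS: integer class label
    objnum: int    # OBJNUM: segment identifier, starting from 0
    type: str      # TYPE: always the string "line"
    x1: int        # XX1
    y1: int        # YY1
    x2: int        # XX2
    y2: int        # YY2
    size: float    # SIZE: segment length
    diag: float    # DIAG: bounding-rectangle diagonal


def parse_instance(text: str) -> List[Segment]:
    """Parse the contents of one instance file.

    Assumes whitespace-separated columns, one segment per line.
    """
    segments = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 9:
            raise ValueError(f"line {lineno}: expected 9 columns, got {len(fields)}")
        c, o, t, x1, y1, x2, y2, size, diag = fields
        segments.append(
            Segment(int(c), int(o), t, int(x1), int(y1), int(x2), int(y2),
                    float(size), float(diag))
        )
    return segments


# Made-up rows for illustration only; these are not taken from the dataset.
SAMPLE = """\
1 0 line 0 0 0 10 10.0 14.14
1 1 line 0 10 10 10 10.0 14.14
"""

segments = parse_instance(SAMPLE)
print(len(segments), segments[0].type)
```

If the real files use a different delimiter or carry a header row, only the `line.split()` step and the blank-line check would need to change.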
Has Missing Values?
No
Variable Information
TYPE: the first attribute describes the type of segment and is always set to the string "line". Its C language type is char.
XX1, YY1, XX2, YY2: these attributes contain the initial and final coordinates of a segment in a Cartesian plane. Their C language type is int.
SIZE: this is the length of a segment, computed as the geometric distance between its two endpoints A(XX1,YY1) and B(XX2,YY2). Its C language type is float.
DIAG: this is the length of the diagonal of the smallest rectangle which includes the picture of the character. The value of this attribute is the same for every object (segment) of an instance. Its C language type is float.
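The two derived attributes can be recomputed from the coordinates, which is a handy sanity check on a parsed instance. The sketch below assumes that the "smallest rectangle" is axis-aligned and spans all segment endpoints of the instance; the source does not state this explicitly. The function names `segment_size` and `bounding_diagonal` and the example coordinates are hypothetical.

```python
import math
from typing import Iterable, Tuple

Coords = Tuple[int, int, int, int]  # (XX1, YY1, XX2, YY2)


def segment_size(x1: int, y1: int, x2: int, y2: int) -> float:
    """SIZE: Euclidean distance between the two endpoints of a segment."""
    return math.hypot(x2 - x1, y2 - y1)


def bounding_diagonal(segments: Iterable[Coords]) -> float:
    """DIAG: diagonal of the bounding rectangle of all endpoints.

    Assumes an axis-aligned rectangle (an assumption, not confirmed
    by the dataset description).
    """
    segs = list(segments)
    if not segs:
        raise ValueError("an instance needs at least one segment")
    xs = [x for x1, _, x2, _ in segs for x in (x1, x2)]
    ys = [y for _, y1, _, y2 in segs for y in (y1, y2)]
    return math.hypot(max(xs) - min(xs), max(ys) - min(ys))


# Made-up two-segment shape: a vertical stroke and a horizontal stroke.
shape = [(0, 0, 0, 4), (0, 0, 3, 0)]
sizes = [segment_size(*s) for s in shape]
diag = bounding_diagonal(shape)
print(sizes, diag)
```

Because DIAG depends on the whole instance rather than on a single segment, it comes out identical for every row of that instance, which matches the description above.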
Dataset Files
File | Size |
---|---|
character.tar.Z | 779.4 KB |
convert.cc | 11.7 KB |
domain_theory | 4.5 KB |
character.names | 3.9 KB |
Index | 200 Bytes |
Papers Citing this Dataset
By Andrew Gelman, Aleks Jakulin, Maria Pittau, Yu-Sung Su. 2009
Published in Annals of Applied Statistics 2008, Vol. 2, No. 4, 1360-1383.
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
artificial_characters = fetch_ucirepo(id=6)

# data (as pandas dataframes)
X = artificial_characters.data.features
y = artificial_characters.data.targets

# metadata
print(artificial_characters.metadata)

# variable information
print(artificial_characters.variables)
Guvenir, H., Acar, B., & Muderrisoglu, H. (1992). Artificial Characters [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5303Z.
Creators
H. Guvenir
Burak Acar
Haldun Muderrisoglu
DOI
10.24432/C5303Z
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.