Artificial Characters Data Set
Abstract: Dataset artificially generated using a first-order theory that describes the structure of ten capital letters of the English alphabet.
Data Set Characteristics: Multivariate
Number of Instances: 6000
Area: Computer
Attribute Characteristics: Categorical, Integer, Real
Number of Attributes: 7
Date Donated: 1992-07-01
Associated Tasks: Classification
Missing Values? No
Number of Web Hits: 294844
Source:
Original Owners of Database:
1. H. Altay Guvenir, PhD.,
Bilkent University,
Department of Computer Engineering and Information Science,
06533 Ankara, Turkey
Phone: +90 (312) 266 4133
Email: guvenir '@' cs.bilkent.edu.tr
2. Burak Acar, M.S.,
Bilkent University,
EE Eng. Dept.
06533 Ankara, Turkey
Email: buraka '@' ee.bilkent.edu.tr
3. Haldun Muderrisoglu, M.D., Ph.D.,
Baskent University,
School of Medicine
Ankara, Turkey
Donor:
H. Altay Guvenir
Bilkent University,
Department of Computer Engineering and Information Science,
06533 Ankara, Turkey
Phone: +90 (312) 266 4133
Email: guvenir '@' cs.bilkent.edu.tr
Data Set Information:
This database has been artificially generated using a first-order theory that describes the structure of ten capital letters of the English alphabet, together with a random-choice theorem prover that accounts for heterogeneity in the instances. The capital letters represented are the following: A, C, D, E, F, G, H, L, P, R. Each instance is structured and is described by a set of segments (lines), resembling the way an automatic program would segment an image. Each instance is stored in a separate file whose format is the following:
CLASS OBJNUM TYPE XX1 YY1 XX2 YY2 SIZE DIAG
where CLASS is an integer number indicating the class as described below, OBJNUM is an integer identifier of a segment (starting from 0) in the instance and the remaining columns represent attribute values. For further details, contact the author.
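The record layout above can be read with a few lines of Python. This is a minimal sketch, not the authors' tooling: it assumes the nine columns are separated by whitespace and that each segment occupies one line, neither of which is stated in this description. The names `Segment`, `parse_segment`, and `parse_instance` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One segment record; field order follows the documented column order."""
    cls: int      # CLASS: integer class label
    objnum: int   # OBJNUM: segment identifier, starting from 0
    kind: str     # TYPE: documented as always "line"
    xx1: int      # XX1, YY1: initial coordinates
    yy1: int
    xx2: int      # XX2, YY2: final coordinates
    yy2: int
    size: float   # SIZE: segment length
    diag: float   # DIAG: diagonal of the character's bounding rectangle


def parse_segment(line: str) -> Segment:
    """Parse one record. Assumption: columns are whitespace-separated."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}: {line!r}")
    c, o, t, x1, y1, x2, y2, size, diag = fields
    return Segment(int(c), int(o), t, int(x1), int(y1), int(x2), int(y2),
                   float(size), float(diag))


def parse_instance(text: str) -> List[Segment]:
    """Parse the contents of one instance file, skipping blank lines."""
    return [parse_segment(ln) for ln in text.splitlines() if ln.strip()]
```

Reading a file then amounts to `parse_instance(open(path).read())`, where `path` points at one instance file from the data folder.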
Attribute Information:
TYPE: the first attribute describes the type of segment and is always set to the string "line". Its C language type is char.
XX1,YY1,XX2,YY2: these attributes contain the initial and final coordinates of a segment in a Cartesian plane. Their C language type is int.
SIZE: this is the length of a segment, computed as the Euclidean distance between its two endpoints A(XX1,YY1) and B(XX2,YY2). Its C language type is float.
DIAG: this is the length of the diagonal of the smallest rectangle which includes the picture of the character. The value of this attribute is the same for every object (segment) of an instance. Its C language type is float.
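The two derived attributes can be recomputed from the coordinates, which is useful as a sanity check when loading the data. The sketch below makes two assumptions that this description does not confirm: SIZE is the Euclidean distance between the endpoints, and the "smallest rectangle" for DIAG is axis-aligned and spanned by the segment endpoints. The function names are hypothetical.

```python
import math
from typing import Iterable, Tuple

Coords = Tuple[int, int, int, int]  # (XX1, YY1, XX2, YY2)


def segment_size(x1: int, y1: int, x2: int, y2: int) -> float:
    """Length of a segment, taken as the Euclidean distance between endpoints."""
    return math.hypot(x2 - x1, y2 - y1)


def bounding_box_diagonal(segments: Iterable[Coords]) -> float:
    """Diagonal of the axis-aligned box enclosing all endpoints (assumption)."""
    xs, ys = [], []
    for x1, y1, x2, y2 in segments:
        xs.extend((x1, x2))
        ys.extend((y1, y2))
    return math.hypot(max(xs) - min(xs), max(ys) - min(ys))
```

Because the stored values are floats, compare recomputed and stored values with a tolerance (for example `math.isclose`) rather than exact equality.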
Relevant Papers:
M. Botta, A. Giordana, L. Saitta: "Learning Fuzzy Concept Definitions", IEEE-Fuzzy Conference, 1993.
M. Botta, A. Giordana: "Learning Quantitative Feature in a Symbolic Environment", LNAI 542, 1991, pp. 296-305.
Citation Request:
Please refer to the Machine Learning Repository's citation policy.