Center for Machine Learning and Intelligent Systems
About  Citation Policy  Donate a Data Set  Contact


Repository Web            Google
View ALL Data Sets

Artificial Characters Data Set
Download: Data Folder, Data Set Description

Abstract: Dataset artificially generated by using first order theory which describes structure of ten capital letters of English alphabet

Data Set Characteristics:  

Multivariate

Number of Instances:

6000

Area:

Computer

Attribute Characteristics:

Categorical, Integer, Real

Number of Attributes:

7

Date Donated

1992-07-01

Associated Tasks:

Classification

Missing Values?

No

Number of Web Hits:

70240


Source:

Original Owners of Database:

1. H. Altay Guvenir, PhD.,
Bilkent University,
Department of Computer Engineering and Information Science,
06533 Ankara, Turkey
Phone: +90 (312) 266 4133
Email: guvenir '@' cs.bilkent.edu.tr

2. Burak Acar, M.S.,
Bilkent University,
EE Eng. Dept.
06533 Ankara, Turkey
Email: buraka '@' ee.bilkent.edu.tr

3. Haldun Muderrisoglu, M.D., Ph.D.,
Baskent University,
School of Medicine
Ankara, Turkey

Donor:

H. Altay Guvenir
Bilkent University,
Department of Computer Engineering and Information Science,
06533 Ankara, Turkey
Phone: +90 (312) 266 4133
Email: guvenir '@' cs.bilkent.edu.tr


Data Set Information:

This database has been artificially generated by using a first order theory which describes the structure of ten capital letters of the English alphabet and a random choice theorem prover which accounts for etherogeneity in the instances. The capital letters represented are the following: A, C, D, E, F, G, H, L, P, R. Each instance is structured and is described by a set of segments (lines) which resemble the way an automatic program would segment an image. Each instance is stored in a separate file whose format is the following:

CLASS OBJNUM TYPE XX1 YY1 XX2 YY2 SIZE DIAG

where CLASS is an integer number indicating the class as described below, OBJNUM is an integer identifier of a segment (starting from 0) in the instance and the remaining columns represent attribute values. For further details, contact the author.


Attribute Information:

TYPE: the first attribute describes the type of segment and is always set to the string "line". Its C language type is char.

XX1,YY1,XX2,YY2: these attributes contain the initial and final coordinates of a segment in a cartesian plane. Their C language type is int.

SIZE: this is the length of a segment computed by using the geometric distance between two points A(X1,Y1) and B(X2,Y2). Its C language type is float.

DIAG: this is the length of the diagonal of the smallest rectangle which includes the picture of the character. The value of this attribute is the same in each object. Its C language type is float.


Relevant Papers:

M. Botta, A. Giordana, L. Saitta: "Learning Fuzzy Concept Definitions", IEEE-Fuzzy Conference, 1993.
[Web Link]

M. Botta, A. Giordana: "Learning Quantitative Feature in a Symbolic Environment", LNAI 542, 1991, pp. 296-305.
[Web Link]



Citation Request:

Please refer to the Machine Learning Repository's citation policy


Supported By:

 In Collaboration With:

About  ||  Citation Policy  ||  Donation Policy  ||  Contact  ||  CML