Online Handwritten Assamese Characters Dataset

Donated on 3/31/2011

This is a dataset of 8235 online handwritten Assamese characters. “Online” means that the data is captured while the text is being written on a digitizing tablet with an electronic pen.

Dataset Characteristics

Multivariate, Sequential

Subject Area

Computer Science

Associated Tasks

Classification

Feature Type

Integer

# Instances

8235

# Features

-

Dataset Information

Additional Information

The dataset was created by collecting online handwritten Assamese character samples from 45 writers. Each writer contributed 52 basic characters, 10 numerals and 121 Assamese conjunct consonants, so each writer accounts for 183 entries (= 52 characters + 10 numerals + 121 conjunct consonants). The total number of samples in the dataset is 8235 (= 45 × 183).

The handwriting samples were collected on an iball 8060U external digitizing tablet connected to a laptop, using the tablet's cordless digital stylus pen. The data acquisition program has a GUI that shows a box on the screen along with other controls, and the writers were instructed to write only inside this acquisition box. The program records the handwriting as a stream of (X, Y) coordinate points from the pen position sensor, along with the pen-up/pen-down switching. No pressure level was recorded.

The distribution consists of 45 folders (one for each writer) and a “Data_Table.pdf” file, which lists the character id (ID), character name (Label) and actual shape of the character (Char). Each folder contains 183 text files corresponding to the 183 characters written by a single writer. Each file is named after the pair (M, N): the text file “M.N.txt” represents the character with ID “M” written by the writer with ID “N”. For instance, the file “132.10.txt” represents the character with ID “132” written by the writer with ID “10”.
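The “M.N.txt” naming rule and the 45 × 183 count can be illustrated with a short Python sketch. The helper names (SampleId, parse_sample_name, expected_names) are hypothetical and not part of the dataset. The sketch also assumes that writer IDs run from 1 to 45 and that IDs are written without zero padding; the description above confirms neither.

```python
import re
from typing import List, NamedTuple

N_WRITERS = 45       # one folder per writer
N_CHARACTERS = 183   # 52 basic characters + 10 numerals + 121 conjuncts


class SampleId(NamedTuple):
    char_id: int     # "M" in "M.N.txt"
    writer_id: int   # "N" in "M.N.txt"


_NAME_RE = re.compile(r"^(\d+)\.(\d+)\.txt$")


def parse_sample_name(filename: str) -> SampleId:
    """Split a file name such as '132.10.txt' into (character ID, writer ID)."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not an 'M.N.txt' file name: {filename!r}")
    return SampleId(int(match.group(1)), int(match.group(2)))


def expected_names(writer_id: int) -> List[str]:
    """File names one writer folder should hold (assumes unpadded IDs)."""
    return [f"{char_id}.{writer_id}.txt" for char_id in range(1, N_CHARACTERS + 1)]


print(parse_sample_name("132.10.txt"))   # character 132, writer 10
print(N_WRITERS * N_CHARACTERS)          # total number of samples
```

Comparing expected_names(n) against the actual contents of a writer folder is one way to check that an extracted copy of the archive is complete.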

Has Missing Values?

No

Variable Information

1. Character Name: The first line of each sample is “CHARACTER_NAME: Character”. The “Character” is the Name of any one of the 183 characters listed below. Here “ID [i]” represents the name of the character with the ID “i”.

ID [1] = “A”  ID [2] = “AA”  ID [3] = “E”  ID [4] = “EE”  ID [5] = “U”
ID [6] = “UU”  ID [7] = “REE”  ID [8] = “AE”  ID [9] = “OI”  ID [10] = “O”
ID [11] = “OU”  ID [12] = “KA”  ID [13] = “KHA”  ID [14] = “GA”  ID [15] = “GHA”
ID [16] = “NG”  ID [17] = “CA”  ID [18] = “CCA”  ID [19] = “JA”  ID [20] = “JHA”
ID [21] = “NIYA”  ID [22] = “MTA”  ID [23] = “MTHA”  ID [24] = “MDA”  ID [25] = “MDHA”
ID [26] = “MNA”  ID [27] = “TA”  ID [28] = “THA”  ID [29] = “DA”  ID [30] = “DHA”
ID [31] = “NA”  ID [32] = “PA”  ID [33] = “PHA”  ID [34] = “BA”  ID [35] = “BHA”
ID [36] = “MA”  ID [37] = “AJA”  ID [38] = “RA”  ID [39] = “LA”  ID [40] = “WA”
ID [41] = “TXA”  ID [42] = “MXA”  ID [43] = “DXA”  ID [44] = “HA”  ID [45] = “KHYA”
ID [46] = “AYA”  ID [47] = “DRA”  ID [48] = “DHRA”  ID [49] = “KTA”  ID [50] = “ANSR”
ID [51] = “BXG”  ID [52] = “CBN”  ID [53] = “KK”  ID [54] = “KT”  ID [55] = “KTT”
ID [56] = “KS”  ID [57] = “KL”  ID [58] = “KM”  ID [59] = “GL”  ID [60] = “CC”
ID [61] = “CCC”  ID [62] = “JJ”  ID [63] = “JB”  ID [64] = “BJ”  ID [65] = “GN”
ID [66] = “TN”  ID [67] = “JJB”  ID [68] = “LG”  ID [69] = “TT”  ID [70] = “GDH”
ID [71] = “GM”  ID [72] = “GHN”  ID [73] = “MDD”  ID [74] = “NT”  ID [75] = “NN”
ID [76] = “NMM”  ID [77] = “TTT”  ID [78] = “TTB”  ID [79] = “TM”  ID [80] = “TR”
ID [81] = “NTT”  ID [82] = “RRG”  ID [83] = “NDD”  ID [84] = “NTH”  ID [85] = “NDH”
ID [86] = “NNN”  ID [87] = “NB”  ID [88] = “NS”  ID [89] = “NM”  ID [90] = “DB”
ID [91] = “QJ”  ID [92] = “PTT”  ID [93] = “PL”  ID [94] = “DV”  ID [95] = “BL”
ID [96] = “BD”  ID [97] = “TB”  ID [98] = “MM”  ID [99] = “MV”  ID [100] = “MP”
ID [101] = “MN”  ID [102] = “NTR”  ID [103] = “MB”  ID [104] = “LK”  ID [105] = “MND”
ID [106] = “FK”  ID [107] = “LD”  ID [108] = “LL”  ID [109] = “LP”  ID [110] = “LT”
ID [111] = “SN”  ID [112] = “SC”  ID [113] = “SM”  ID [114] = “SB”  ID [115] = “FN”
ID [116] = “FT”  ID [117] = “SK”  ID [118] = “SSTH”  ID [119] = “SSM”  ID [120] = “SSN”
ID [121] = “SSB”  ID [122] = “ST”  ID [123] = “SP”  ID [124] = “SPH”  ID [125] = “STH”
ID [126] = “SKH”  ID [127] = “NGG”  ID [128] = “NGC”  ID [129] = “FP”  ID [130] = “NGN”
ID [131] = “XM”  ID [132] = “NGJ”  ID [133] = “MNTH”  ID [134] = “NGK”  ID [135] = “KR”
ID [136] = “TRU”  ID [137] = “BHR”  ID [138] = “THB”  ID [139] = “DG”  ID [140] = “DGH”
ID [141] = “DD”  ID [142] = “DDH”  ID [143] = “HR”  ID [144] = “GGU”  ID [145] = “GGN”
ID [146] = “NKH”  ID [147] = “NGH”  ID [148] = “NGKH”  ID [149] = “TTH”  ID [150] = “PN”
ID [151] = “HN”  ID [152] = “XN”  ID [153] = “MF”  ID [154] = “BB”  ID [155] = “LB”
ID [156] = “LM”  ID [157] = “BHM”  ID [158] = “ML”  ID [159] = “SL”  ID [160] = “PS”
ID [161] = “KHR”  ID [162] = “GR”  ID [163] = “GHR”  ID [164] = “JR”  ID [165] = “TRR”
ID [166] = “DRR”  ID [167] = “DHRR”  ID [168] = “PRR”  ID [169] = “BRR”  ID [170] = “MRR”
ID [171] = “TSR”  ID [172] = “DSR”  ID [173] = “HRR”  ID [174] = “SUNYA”  ID [175] = “EK”
ID [176] = “DUI”  ID [177] = “TINI”  ID [178] = “CARI”  ID [179] = “PAC”  ID [180] = “CAY”
ID [181] = “XAT”  ID [182] = “ATH”  ID [183] = “NAA”

2. The total number of strokes in the sample: The total number of strokes used to write a character is represented by the line “STROKE_COUNT: Number”, where “Number” is an integer value.

3. Sequence of Strokes: Each stroke begins with the “PEN_DOWN” information and there is a “PEN_UP” information followed by the “PEN_DOWN” information between two consecutive strokes. The end of a sample is represented by the “PEN_UP” information followed by the “END_CHARACTER: Character” information. Each stroke consists of a sequence of X and Y coordinates values which are given in the first and the second columns respectively. Corresponding to each pair of values of X and Y coordinates, there are “STYLUS_STATE” and “STROKE” information given in the third and the fourth columns respectively. “STYLUS_STATE” is either 1 or 0. Corresponding to each recorded (X, Y) point, “STYLUS_STATE” is 1 and corresponding to the “PEN_UP” information “STYLUS_STATE” is 0. “STYLUS_STATE” is kept blank corresponding to each “PEN_DOWN” information. The “STROKE” information represents the serial number of a constituent stroke of a sample. The value of X grows left-to-right and that of Y grows downwards. Coordinates are integer numbers ranging from 0 to 4392 for X and 0 to 4868 for Y respectively.
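As a rough illustration of the sample layout described above, the following Python sketch reads one sample into a list of strokes and flips the Y axis for tools that expect Y to grow upwards. It is not the creators' loader. The names Sample, parse_sample, flip_y and DEMO are hypothetical, and the exact on-disk layout is an assumption: whitespace-separated columns, the “PEN_DOWN” and “PEN_UP” markers on lines of their own, and no column-header row. The DEMO text is made up to match the description and is not taken from the dataset.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[int, int]
Y_MAX = 4868  # largest Y coordinate reported in the description


@dataclass
class Sample:
    name: str = ""
    stroke_count: int = 0
    strokes: List[List[Point]] = field(default_factory=list)


def parse_sample(text: str) -> Sample:
    """Parse one sample; the layout handled here is assumed, not confirmed."""
    sample = Sample()
    current: Optional[List[Point]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("CHARACTER_NAME:"):
            sample.name = line.split(":", 1)[1].strip()
        elif line.startswith("STROKE_COUNT:"):
            sample.stroke_count = int(line.split(":", 1)[1])
        elif line.startswith("END_CHARACTER"):
            break
        elif "PEN_DOWN" in line:
            current = []                      # a new stroke begins
        elif "PEN_UP" in line:
            if current is not None:
                sample.strokes.append(current)
            current = None                    # the stroke is finished
        else:
            if current is None:
                raise ValueError(f"point outside a stroke: {line!r}")
            x, y = line.split()[:2]           # columns 1 and 2 are X and Y
            current.append((int(x), int(y)))
    return sample


def flip_y(stroke: List[Point]) -> List[Point]:
    """Mirror a stroke vertically so that Y grows upwards."""
    return [(x, Y_MAX - y) for x, y in stroke]


# Made-up sample text in the assumed layout (not real data).
DEMO = """CHARACTER_NAME: KA
STROKE_COUNT: 2
PEN_DOWN
100 200 1 1
110 210 1 1
PEN_UP 0
PEN_DOWN
300 400 1 2
PEN_UP 0
END_CHARACTER: KA
"""

demo = parse_sample(DEMO)
print(demo.name, demo.stroke_count, len(demo.strokes))
```

Comparing len(sample.strokes) with sample.stroke_count is a cheap consistency check. If real files use a different delimiter or carry a header row, the branch that reads the point rows would need adjusting.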

Dataset Files

File                                                  Size
Online Handwritten Assamese Characters Dataset.rar    7.7 MB

Creators

Udayan Baruah

Shyamanta Hazarika
