Online Handwritten Assamese Characters Dataset
Donated on 3/31/2011
This is a dataset of 8235 online handwritten Assamese characters. The "online" process involves capturing data as text is written on a digitizing tablet with an electronic pen.
Dataset Characteristics
Multivariate, Sequential
Subject Area
Computer Science
Associated Tasks
Classification
Feature Type
Integer
# Instances
8235
# Features
-
Dataset Information
Additional Information
A dataset of online handwritten Assamese characters was created by collecting samples from 45 writers. Each writer contributed 52 basic characters, 10 numerals and 121 Assamese conjunct consonants, so each writer has 183 entries (= 52 characters + 10 numerals + 121 conjunct consonants). The dataset therefore contains 8235 samples in total (= 45 × 183).

The handwriting samples were collected on an iBall 8060U external digitizing tablet connected to a laptop, using the tablet's cordless digital stylus pen. The data acquisition program has a GUI that shows a box on the screen along with other controls, and the writers were instructed to write only inside this acquisition box. The program records the handwriting as a stream of (X, Y) coordinate points from the pen position sensor, together with the pen-up/pen-down switching. No pressure levels were recorded.

The distribution consists of 45 folders (one per writer) and a "Data_Table.pdf" file, which lists the character ID (ID), character name (Label) and actual shape (Char) of each character. Each folder contains 183 text files, one for each character written by that writer. Each file is named after the pair (M, N): the file "M.N.txt" holds the character with ID "M" written by the writer with ID "N". For instance, the file "132.10.txt" holds the character with ID "132" written by the writer with ID "10".
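The naming scheme and the sample counts above can be checked with a short script. The sketch below is a hypothetical helper, not part of the distribution: it assumes the archive has been extracted to a root directory with one sub-folder per writer, and that every `.txt` file inside those folders follows the "M.N.txt" pattern. The names `parse_sample_name` and `index_dataset` are illustrative.

```python
import re
from pathlib import Path

# Counts taken from the dataset description
NUM_WRITERS = 45
CHARS_PER_WRITER = 52 + 10 + 121  # basic characters + numerals + conjunct consonants

# "M.N.txt": M = character ID, N = writer ID
_NAME_RE = re.compile(r"^(\d+)\.(\d+)\.txt$")


def parse_sample_name(filename: str) -> tuple[int, int]:
    """Return (character_id, writer_id) for a file named "M.N.txt"."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return int(match.group(1)), int(match.group(2))


def index_dataset(root: str) -> list[tuple[int, int, Path]]:
    """List (character_id, writer_id, path) for every sample file under root.

    Assumes the hypothetical layout root/<writer folder>/M.N.txt; the
    Data_Table.pdf file is skipped because only *.txt files are matched.
    """
    entries = []
    for path in sorted(Path(root).glob("*/*.txt")):
        char_id, writer_id = parse_sample_name(path.name)
        entries.append((char_id, writer_id, path))
    return entries


if __name__ == "__main__":
    print(CHARS_PER_WRITER)                 # 183
    print(NUM_WRITERS * CHARS_PER_WRITER)   # 8235
    print(parse_sample_name("132.10.txt"))  # (132, 10)
```

On a complete extraction, `len(index_dataset(root))` would be expected to equal 8235 if every folder holds its 183 files.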
Has Missing Values?
No
Variable Information
1. Character name: The first line of each sample is "CHARACTER_NAME: Character", where "Character" is the name of one of the 183 characters listed below. Here ID [i] is the name of the character with ID i.

ID [1] = "A", ID [2] = "AA", ID [3] = "E", ID [4] = "EE", ID [5] = "U", ID [6] = "UU", ID [7] = "REE", ID [8] = "AE", ID [9] = "OI", ID [10] = "O",
ID [11] = "OU", ID [12] = "KA", ID [13] = "KHA", ID [14] = "GA", ID [15] = "GHA", ID [16] = "NG", ID [17] = "CA", ID [18] = "CCA", ID [19] = "JA", ID [20] = "JHA",
ID [21] = "NIYA", ID [22] = "MTA", ID [23] = "MTHA", ID [24] = "MDA", ID [25] = "MDHA", ID [26] = "MNA", ID [27] = "TA", ID [28] = "THA", ID [29] = "DA", ID [30] = "DHA",
ID [31] = "NA", ID [32] = "PA", ID [33] = "PHA", ID [34] = "BA", ID [35] = "BHA", ID [36] = "MA", ID [37] = "AJA", ID [38] = "RA", ID [39] = "LA", ID [40] = "WA",
ID [41] = "TXA", ID [42] = "MXA", ID [43] = "DXA", ID [44] = "HA", ID [45] = "KHYA", ID [46] = "AYA", ID [47] = "DRA", ID [48] = "DHRA", ID [49] = "KTA", ID [50] = "ANSR",
ID [51] = "BXG", ID [52] = "CBN", ID [53] = "KK", ID [54] = "KT", ID [55] = "KTT", ID [56] = "KS", ID [57] = "KL", ID [58] = "KM", ID [59] = "GL", ID [60] = "CC",
ID [61] = "CCC", ID [62] = "JJ", ID [63] = "JB", ID [64] = "BJ", ID [65] = "GN", ID [66] = "TN", ID [67] = "JJB", ID [68] = "LG", ID [69] = "TT", ID [70] = "GDH",
ID [71] = "GM", ID [72] = "GHN", ID [73] = "MDD", ID [74] = "NT", ID [75] = "NN", ID [76] = "NMM", ID [77] = "TTT", ID [78] = "TTB", ID [79] = "TM", ID [80] = "TR",
ID [81] = "NTT", ID [82] = "RRG", ID [83] = "NDD", ID [84] = "NTH", ID [85] = "NDH", ID [86] = "NNN", ID [87] = "NB", ID [88] = "NS", ID [89] = "NM", ID [90] = "DB",
ID [91] = "QJ", ID [92] = "PTT", ID [93] = "PL", ID [94] = "DV", ID [95] = "BL", ID [96] = "BD", ID [97] = "TB", ID [98] = "MM", ID [99] = "MV", ID [100] = "MP",
ID [101] = "MN", ID [102] = "NTR", ID [103] = "MB", ID [104] = "LK", ID [105] = "MND", ID [106] = "FK", ID [107] = "LD", ID [108] = "LL", ID [109] = "LP", ID [110] = "LT",
ID [111] = "SN", ID [112] = "SC", ID [113] = "SM", ID [114] = "SB", ID [115] = "FN", ID [116] = "FT", ID [117] = "SK", ID [118] = "SSTH", ID [119] = "SSM", ID [120] = "SSN",
ID [121] = "SSB", ID [122] = "ST", ID [123] = "SP", ID [124] = "SPH", ID [125] = "STH", ID [126] = "SKH", ID [127] = "NGG", ID [128] = "NGC", ID [129] = "FP", ID [130] = "NGN",
ID [131] = "XM", ID [132] = "NGJ", ID [133] = "MNTH", ID [134] = "NGK", ID [135] = "KR", ID [136] = "TRU", ID [137] = "BHR", ID [138] = "THB", ID [139] = "DG", ID [140] = "DGH",
ID [141] = "DD", ID [142] = "DDH", ID [143] = "HR", ID [144] = "GGU", ID [145] = "GGN", ID [146] = "NKH", ID [147] = "NGH", ID [148] = "NGKH", ID [149] = "TTH", ID [150] = "PN",
ID [151] = "HN", ID [152] = "XN", ID [153] = "MF", ID [154] = "BB", ID [155] = "LB", ID [156] = "LM", ID [157] = "BHM", ID [158] = "ML", ID [159] = "SL", ID [160] = "PS",
ID [161] = "KHR", ID [162] = "GR", ID [163] = "GHR", ID [164] = "JR", ID [165] = "TRR", ID [166] = "DRR", ID [167] = "DHRR", ID [168] = "PRR", ID [169] = "BRR", ID [170] = "MRR",
ID [171] = "TSR", ID [172] = "DSR", ID [173] = "HRR", ID [174] = "SUNYA", ID [175] = "EK", ID [176] = "DUI", ID [177] = "TINI", ID [178] = "CARI", ID [179] = "PAC", ID [180] = "CAY",
ID [181] = "XAT", ID [182] = "ATH", ID [183] = "NAA".

2. Total number of strokes in the sample: The number of strokes used to write the character is given by the line "STROKE_COUNT: Number", where "Number" is an integer.

3. Sequence of strokes: Each stroke begins with a "PEN_DOWN" entry, and between two consecutive strokes there is a "PEN_UP" entry followed by a "PEN_DOWN" entry. The end of a sample is marked by a "PEN_UP" entry followed by "END_CHARACTER: Character". Each stroke consists of a sequence of X and Y coordinate values, given in the first and second columns respectively. Each (X, Y) pair is accompanied by "STYLUS_STATE" and "STROKE" values in the third and fourth columns respectively. "STYLUS_STATE" is either 1 or 0: it is 1 for each recorded (X, Y) point, 0 for each "PEN_UP" entry, and left blank for each "PEN_DOWN" entry. "STROKE" is the serial number of the stroke within the sample. X grows from left to right and Y grows downwards. Coordinates are integers ranging from 0 to 4392 for X and from 0 to 4868 for Y.
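The sample format described above can be read into a list of strokes with a small parser. This is a minimal sketch under assumptions the description does not settle: it assumes one entry per line, columns separated by whitespace or commas, and "PEN_DOWN"/"PEN_UP" appearing as markers on their own lines (any extra columns on those lines are ignored). The names `Sample`, `parse_sample` and `normalize` are hypothetical, and the example text in the `__main__` block is made up for illustration, not taken from the dataset.

```python
import re
from dataclasses import dataclass, field

# Coordinate ranges as stated in the dataset description
X_MAX, Y_MAX = 4392, 4868


@dataclass
class Sample:
    name: str = ""          # from "CHARACTER_NAME: ..."
    stroke_count: int = 0   # from "STROKE_COUNT: ..."
    strokes: list[list[tuple[int, int]]] = field(default_factory=list)


def parse_sample(text: str) -> Sample:
    """Parse the contents of one "M.N.txt" file (assumed layout, see lead-in)."""
    sample = Sample()
    current = None  # points of the stroke being read; None between strokes
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("CHARACTER_NAME:"):
            sample.name = line.split(":", 1)[1].strip()
        elif line.startswith("STROKE_COUNT:"):
            sample.stroke_count = int(line.split(":", 1)[1])
        elif line.startswith("END_CHARACTER:"):
            break  # end of the sample
        elif "PEN_DOWN" in line:
            current = []  # a new stroke starts
            sample.strokes.append(current)
        elif "PEN_UP" in line:
            current = None  # the current stroke ends
        elif current is not None:
            # Columns: X, Y, STYLUS_STATE, STROKE; only X and Y are kept
            tokens = re.split(r"[\s,]+", line)
            if len(tokens) >= 2 and tokens[0].isdigit() and tokens[1].isdigit():
                current.append((int(tokens[0]), int(tokens[1])))
    return sample


def normalize(strokes):
    """Scale points into [0, 1] by the documented maxima; Y still grows downwards."""
    return [[(x / X_MAX, y / Y_MAX) for x, y in stroke] for stroke in strokes]


if __name__ == "__main__":
    # Hypothetical two-stroke sample, written to match the description
    example = (
        "CHARACTER_NAME: KA\nSTROKE_COUNT: 2\n"
        "PEN_DOWN\n100 200 1 1\n110 210 1 1\nPEN_UP 0\n"
        "PEN_DOWN\n300 400 1 2\nPEN_UP 0\nEND_CHARACTER: KA\n"
    )
    s = parse_sample(example)
    print(s.name, s.stroke_count, s.strokes)  # KA 2 [[(100, 200), (110, 210)], [(300, 400)]]
```

Comparing `len(s.strokes)` with `s.stroke_count` is a cheap consistency check per file. The parser splits strokes on the PEN_DOWN/PEN_UP markers; grouping points by the fourth ("STROKE") column would be an equally valid choice if the real files turn out to lay out the markers differently.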
Dataset Files
File | Size |
---|---|
Online Handwritten Assamese Characters Dataset.rar | 7.7 MB |
```shell
pip install ucimlrepo
```

```python
from ucimlrepo import fetch_ucirepo

# fetch dataset
online_handwritten_assamese_characters_dataset = fetch_ucirepo(id=208)

# data (as pandas dataframes)
X = online_handwritten_assamese_characters_dataset.data.features
y = online_handwritten_assamese_characters_dataset.data.targets

# metadata
print(online_handwritten_assamese_characters_dataset.metadata)

# variable information
print(online_handwritten_assamese_characters_dataset.variables)
```
Baruah, U. & Hazarika, S. (2015). Online Handwritten Assamese Characters Dataset [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C50C8Q.
Creators
Udayan Baruah
Shyamanta Hazarika
DOI
10.24432/C50C8Q
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the dataset for any purpose, provided that appropriate credit is given.