Bongabdo
Donated on 9/29/2023
In this work, I have developed an offline Handwritten Text Recognition (HTR) model architecture based on neural networks that can be trained to recognise whole pages of handwritten Bangla (Bengali) text without image segmentation. Because Bengali is a resource-constrained Indic language, there is a lack of properly annotated datasets consisting of scanned images of Bangla handwritten scripts. I therefore introduce a new dataset, 'Bongabdo', which consists of full-page handwritten scripts collected from a wide variety of contributors across age groups, occupations and genders. Further, recently proposed state-of-the-art image-to-sequence architectures with different hyperparameter settings have been applied to these images and evaluated in terms of Character Error Rate (CER), Word Error Rate (WER) and Sequence Error Rate (SER) to produce a comparative study.
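The three metrics named above are commonly defined via edit distance. The following is a minimal sketch of those common definitions, not the evaluation code used in the introductory paper. The function names are hypothetical, references are assumed to be non-empty, and "characters" are counted as Unicode code points, which may differ from how Bangla conjuncts and vowel signs are counted elsewhere.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character Error Rate: character-level edits / reference length."""
    return edit_distance(ref, hyp) / len(ref)


def wer(ref, hyp):
    """Word Error Rate: word-level edits / number of reference words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def ser(refs, hyps):
    """Sequence Error Rate: fraction of sequences that are not an exact match."""
    wrong = sum(r != h for r, h in zip(refs, hyps))
    return wrong / len(refs)


if __name__ == "__main__":
    # Illustrative strings only; they are not taken from the dataset.
    reference = "আমি ভাত খাই"
    hypothesis = "আমি ভাত খাব"
    print("CER:", cer(reference, hypothesis))
    print("WER:", wer(reference, hypothesis))
    print("SER:", ser([reference], [hypothesis]))
```

For full-page recognition, the reference and hypothesis would typically be the whole page transcription, so a single wrong character already counts as a sequence error under SER.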
Dataset Characteristics
Sequential, Text, Image
Subject Area
Computer Science
Associated Tasks
Other
Feature Type
-
# Instances
111
# Features
-
Dataset Information
Has Missing Values?
No
Introductory Paper
By Ayanabha Ghosh. 2023
Published in IEEE Silchar Subsection Conference, Silchar, Assam, India
Dataset Files
File | Size |
---|---|
Bongabdo1429.zip | 147.3 MB |
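This page does not describe the internal layout of the archive, so it is worth inspecting it before writing a loader. The sketch below uses only the Python standard library to count archive members by file extension. The helper name `summarize_archive` is hypothetical, and the local path in the usage comment assumes the file was downloaded manually under its listed name.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(path_or_file):
    """Count archive members by lowercase file extension, skipping directories."""
    with zipfile.ZipFile(path_or_file) as zf:
        return Counter(
            PurePosixPath(info.filename).suffix.lower() or "<none>"
            for info in zf.infolist()
            if not info.is_dir()
        )


# Usage, assuming the archive sits in the working directory:
# print(summarize_archive("Bongabdo1429.zip"))
```

The resulting counts show which image and annotation formats are present without extracting the full 147.3 MB archive.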
pip install ucimlrepo

from ucimlrepo import fetch_ucirepo

# fetch dataset
bongabdo = fetch_ucirepo(id=894)

# data (as pandas dataframes)
X = bongabdo.data.features
y = bongabdo.data.targets

# metadata
print(bongabdo.metadata)

# variable information
print(bongabdo.variables)
Ghosh, A. (2023). Bongabdo [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5XK7S.
Creators
Ayanabha Ghosh
ghoshayanabha@gmail.com
Indian Institute of Technology Jodhpur
DOI
10.24432/C5XK7S
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.