Bongabdo

Donated on 9/29/2023

In this work, I have developed an Offline Handwritten Text Recognition (HTR) model architecture based on neural networks that can be trained to recognise whole pages of handwritten Bangla (Bengali) text without image segmentation. Because Bengali is a resource-constrained Indic language, properly annotated datasets of scanned Bangla handwritten scripts are lacking. I therefore introduce a new dataset, 'Bongabdo', which consists of full-page handwritten scripts collected from a wide variety of contributors across age groups, occupations and genders. Further, a recently proposed state-of-the-art Image-to-Sequence architecture has been applied to these images under different hyperparameter settings, and the resulting models have been evaluated in terms of Character Error Rate (CER), Word Error Rate (WER) and Sequence Error Rate (SER) to produce a comparative study.
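The three metrics named above are commonly computed from edit distances between reference and predicted transcriptions. The following is a minimal sketch under that common convention, not necessarily the exact definitions used in the introductory paper. The helper names `edit_distance` and `error_rates` are hypothetical, and the sketch assumes characters are Unicode code points, which for Bangla may differ from counting grapheme clusters such as conjuncts.

```python
from typing import Dict, List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rates(refs: List[str], hyps: List[str]) -> Dict[str, float]:
    """Corpus-level CER, WER and SER over paired transcriptions.

    Assumption: CER and WER are total edits divided by total reference
    length, and SER is the fraction of sequences that are not an exact match.
    """
    pairs = list(zip(refs, hyps))
    char_edits = sum(edit_distance(r, h) for r, h in pairs)
    char_total = sum(len(r) for r in refs)
    word_edits = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    word_total = sum(len(r.split()) for r in refs)
    seq_errors = sum(r != h for r, h in pairs)
    return {
        "CER": char_edits / char_total,
        "WER": word_edits / word_total,
        "SER": seq_errors / len(refs),
    }


if __name__ == "__main__":
    # Toy transcriptions, not taken from the dataset.
    print(error_rates(["abc def", "xyz"], ["abd def", "xyz"]))
```

Because a single wrong character makes the whole word and the whole sequence count as errors, SER is typically at least as high as WER, which in turn is typically at least as high as CER, on full-page transcriptions.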

Dataset Characteristics

Sequential, Text, Image

Subject Area

Computer Science

Associated Tasks

Other

Feature Type

# Instances

111

# Features

Dataset Information

Has Missing Values?

Introductory Paper

Towards Full-page Offline Bangla Handwritten Text Recognition using Image-to-Sequence Architecture

By Ayanabha Ghosh. 2023

Published in IEEE Silchar Subsection Conference, Silchar, Assam, India

Dataset Files

File	Size
Bongabdo1429.zip	147.3 MB

1 citation

2286 views

Keywords

Handwriting analysis

Creators

Ayanabha Ghosh

ghoshayanabha@gmail.com

Indian Institute of Technology Jodhpur

DOI

10.24432/C5XK7S

License

This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.