Bongabdo

Donated on 9/29/2023

In this work, I have developed an offline Handwritten Text Recognition (HTR) model architecture, based on neural networks, that can be trained to recognise whole pages of handwritten Bangla (Bengali) text without image segmentation. Because Bengali is a resource-constrained Indic language, properly annotated datasets of scanned Bangla handwritten scripts are lacking. I therefore introduce a new dataset, 'Bongabdo', which consists of full-page handwritten scripts collected from a wide variety of contributors across age groups, occupations, and genders. Further, a recently proposed state-of-the-art Image-to-Sequence architecture has been applied to these images under different hyperparameter settings, and the resulting models have been evaluated in terms of Character Error Rate (CER), Word Error Rate (WER), and Sequence Error Rate (SER) to produce a comparative study.
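The three metrics named above can be illustrated with a minimal sketch. It assumes the conventional definitions, which this page does not spell out: CER and WER as Levenshtein edit distance divided by the reference length (in characters and in whitespace-separated words respectively), and SER as the fraction of sequences that are not reproduced exactly. The function names `edit_distance` and `error_rates` are hypothetical, and "character" here means a Unicode code point, which for Bangla is not the same as a visible grapheme cluster. The paper's own evaluation may differ on any of these points.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rates(references, hypotheses):
    """Corpus-level CER, WER and SER under the assumed definitions.

    Assumes non-empty references; characters are Unicode code points.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have equal length")
    char_edits = char_total = 0
    word_edits = word_total = 0
    seq_errors = 0
    for ref, hyp in zip(references, hypotheses):
        char_edits += edit_distance(ref, hyp)
        char_total += len(ref)
        ref_words, hyp_words = ref.split(), hyp.split()
        word_edits += edit_distance(ref_words, hyp_words)
        word_total += len(ref_words)
        seq_errors += ref != hyp  # assumed: any mismatch counts as one error
    return {
        "CER": char_edits / char_total,
        "WER": word_edits / word_total,
        "SER": seq_errors / len(references),
    }
```

Calling `error_rates(ground_truth_pages, predicted_pages)` with two equal-length lists of strings returns the three rates as fractions. Summing edits over the whole corpus before dividing weights long pages more heavily than averaging per-page rates would; which of the two the paper uses is not stated here.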

Dataset Characteristics

Sequential, Text, Image

Subject Area

Computer Science

Associated Tasks

Other

Feature Type

-

# Instances

111

# Features

-

Dataset Information

Has Missing Values?

No

Introductory Paper

Towards Full-page Offline Bangla Handwritten Text Recognition using Image-to-Sequence Architecture

By Ayanabha Ghosh. 2023

Published in IEEE Silchar Subsection Conference, Silchar, Assam, India

Dataset Files

File                Size
Bongabdo1429.zip    147.3 MB
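The internal layout of the archive is not described on this page, so a reasonable first step after downloading is simply to inspect it. The sketch below counts archive members by file extension using Python's standard `zipfile` module. The helper name `summarize_archive` is hypothetical, and the local path `Bongabdo1429.zip` assumes the file was saved under its listed name in the working directory.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(archive):
    """Count archive members by file extension (directories are skipped).

    `archive` may be a filesystem path or a binary file-like object.
    """
    counts = Counter()
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            suffix = PurePosixPath(info.filename).suffix.lower()
            counts[suffix or "(no extension)"] += 1
    return dict(counts)


if __name__ == "__main__":
    import os

    # Assumed local path; adjust to wherever the download was saved.
    if os.path.exists("Bongabdo1429.zip"):
        print(summarize_archive("Bongabdo1429.zip"))
```

The extension counts give a quick indication of how page images and their transcriptions are stored before any loading code is written.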

Creators

Ayanabha Ghosh

ghoshayanabha@gmail.com

Indian Institute of Technology Jodhpur
