Enhancing the Efficiency of Deep Learning Models for Handwritten Text Recognition by Utilizing Meta-learning Optimization Techniques

Authors

  • Rajendra Paudyal, Advanced College of Engineering and Management
  • Dhiraj Pyakurel, Advanced College of Engineering and Management

DOI:

https://doi.org/10.3126/jacem.v9i1.71399

Keywords:

RNN, CNN, BLSTM, BiGRU, TCN, Meta learner

Abstract

Recognizing handwritten text plays a crucial role in converting scanned documents, whether printed or handwritten, into editable and searchable formats. In this study, CRNN, TCN, and Transformer models are applied to Handwritten Text Recognition (HTR), where the input consists of sequences of image patches representing English text. The CRNN model comprises three layers, including a CNN that extracts feature maps from handwritten text images and an RNN layer, implemented with Bidirectional Long Short-Term Memory (BLSTM) and Bidirectional Gated Recurrent Unit (BiGRU) variants, that addresses the vanishing/exploding gradient problem of simple RNNs. The SGD, RMSprop, Adam, and Adamax optimizers, together with hyperparameter fine-tuning, are used to improve model accuracy. Model performance is evaluated using metrics such as accuracy, precision, recall, and F1 score. Meta-learner optimization is then applied to further improve the performance of the deep learning models. The English-language IAM dataset is used for training, validation, and testing. The BLSTM model achieves an accuracy of 90.04%, precision of 91.62%, recall of 88.98%, and an F1 score of 0.9025. TCN achieves similar metrics. The Transformer model achieves an accuracy of 85.86%, precision of 88.94%, recall of 83.86%, and an F1 score of 0.8626. BiGRU achieves an accuracy of 90.32%, precision of 91.56%, recall of 89.53%, and an F1 score of 0.9050. After the base models are evaluated, a meta model is constructed for the best-performing model; it improves Handwritten Text Recognition to an accuracy of 92.30%, precision of 94.80%, recall of 93.18%, and an F1 score of 0.9425.
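
As a quick illustration of how the reported metrics relate, the sketch below computes F1 as the harmonic mean of precision and recall and compares it with the F1 values quoted in the abstract. The function name `f1_score` and the `reported` dictionary are illustrative choices, not taken from the paper. The abstract does not say how the scores were averaged (for example per class or per character), so small gaps between computed and reported values are to be expected.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # conventional value when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the abstract.
# The dictionary layout itself is an assumption made for this sketch.
reported = {
    "BLSTM": (0.9162, 0.8898, 0.9025),
    "Transformer": (0.8894, 0.8386, 0.8626),
    "BiGRU": (0.9156, 0.8953, 0.9050),
    "Meta model": (0.9480, 0.9318, 0.9425),
}

for name, (p, r, f1_reported) in reported.items():
    print(f"{name}: computed {f1_score(p, r):.4f}, reported {f1_reported:.4f}")
```

Under this simple definition the computed values stay within a few thousandths of the reported ones, which is consistent with the reported F1 scores having been averaged in some way the abstract does not describe.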

Published

2024-11-14

How to Cite

Paudyal, R., & Pyakurel, D. (2024). Enhancing the Efficiency of Deep Learning Models for Handwritten Text Recognition by Utilizing Meta-learning Optimization Techniques. Journal of Advanced College of Engineering and Management, 9(1), 1–13. https://doi.org/10.3126/jacem.v9i1.71399

Section

Articles