Video Captioning in Nepali Using Encoder Decoder

Authors

  • Kabita Parajuli, Department of Electronics and Computer Engineering, Pulchowk Campus, Tribhuvan University, Lalitpur, Nepal
  • Shashidhar R Joshi, Department of Electronics and Computer Engineering, Pulchowk Campus, Tribhuvan University, Lalitpur, Nepal

DOI:

https://doi.org/10.3126/jacem.v9i1.71424

Keywords:

MSVD, Encoder, Decoder, LSTM, GRU

Abstract

Video captioning is challenging because it requires translating visual understanding into accurate natural-language descriptions. The task is harder still for Nepali, for which little prior academic work exists. This study develops an encoder-decoder framework for Nepali video captioning. Convolutional Neural Networks (CNNs) extract features from video frames, and Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) sequence-to-sequence models generate textual descriptions from those features. A Nepali video captioning dataset is also created by translating the Microsoft Research Video Description Corpus (MSVD) with Google Translate and manually post-editing the output. The model's effectiveness for Nepali video captioning is demonstrated using the BLEU, METEOR, and ROUGE metrics.
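As an illustration of the evaluation step, the sketch below shows one way a BLEU score can be computed for a generated caption against one or more reference captions. It is a minimal, unsmoothed sentence-level version written for this page, not the authors' evaluation code; the function names `ngrams` and `sentence_bleu`, the whitespace tokenization, and the example captions are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights.

    Hypothetical helper for illustration: `candidate` is a list of tokens,
    `references` is a list of token lists.
    """
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing: any zero precision gives a zero score
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)


# Illustrative captions only (assumed, not taken from the dataset).
reference = "a man is playing a guitar".split()
generated = "a man is playing guitar".split()
score = sentence_bleu(generated, [reference], max_n=2)
```

The same function applies unchanged to Nepali captions once they are split into tokens, since it only compares token sequences. In practice, an established implementation with smoothing is usually preferable for short captions, where higher-order n-gram matches are often zero.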

Published

2024-11-14

How to Cite

Parajuli, K., & Joshi, S. R. (2024). Video Captioning in Nepali Using Encoder Decoder. Journal of Advanced College of Engineering and Management, 9(1), 41–51. https://doi.org/10.3126/jacem.v9i1.71424

Section

Articles