Improving Hindi POS Tagger Accuracy Through Domain Adaptation

Authors

  • Anupama Pandey, Mahatma Gandhi Antarrashtriya Hindi Vishwavidyalay, Wardha (Maharashtra), India

DOI:

https://doi.org/10.3126/nl.v35i01.46564

Keywords:

ILCI Hindi Tagger, Cricket domain, Domain adaptation, POS annotation, domain tagger (DT)

Abstract

This paper presents a comparative evaluation of multi-domain Hindi POS taggers. Two taggers are trained in this experiment in order to measure how tagger accuracy changes after adaptation to the Cricket domain. The multi-domain tagger, trained as part of the ILCI project, currently covers four major domains (Health, Tourism, Entertainment and Agriculture); adapting it to Cricket as a new domain was proposed in Pandey (2017), where the difference in tagger accuracy was calculated at approx. 6%. The accuracy of the four-domain tagger (without Cricket) is 85%, and that of the five-domain tagger (with Cricket) is approx. 93%, which is 1% lower than the pre-existing Hindi tagger. The paper deals mainly with the evaluation of the Hindi tagger with and without Cricket as one of its domains. The author also examines how the POS tagging issues in the two outputs differ and offers a linguistic analysis of the errors found.

Published

2022-07-11

How to Cite

Pandey, A. (2022). Improving Hindi POS Tagger Accuracy Through Domain Adaptation. Nepalese Linguistics, 35(01), 79–85. https://doi.org/10.3126/nl.v35i01.46564

Section

Articles