Improving Hindi POS Tagger Accuracy Through Domain Adaptation

Anupama Pandey

doi:10.3126/nl.v35i01.46564

Authors

Anupama Pandey, Mahatma Gandhi Antarrashtriya Hindi Vishwavidyalay, Wardha (Maharashtra), India

DOI:

https://doi.org/10.3126/nl.v35i01.46564

Keywords:

ILCI Hindi Tagger, Cricket domain, Domain adaptation, POS annotation, domain tagger (DT)

Abstract

This paper presents a comparative evaluation of multi-domain Hindi taggers. Two taggers are trained in this experiment with the objective of measuring the accuracy of the tagger after adapting it to the Cricket domain. The multi-domain tagger, trained as part of the ILCI project, currently covers four major domains (Health, Tourism, Entertainment and Agriculture); adapting Cricket as a new domain was recently proposed in Pandey (2017), which reported a difference of approx. 6% in tagger accuracy. Statistically, the accuracy of the four-domain tagger (without Cricket) is 85% and that of the five-domain tagger (with Cricket) is approx. 93%, which is 1% lower than the pre-existing Hindi tagger. This paper deals mainly with the evaluation of the Hindi tagger with and without Cricket as one of its domains. The author also attempts to identify the differences in POS tagging issues in the output and provides a linguistic analysis of the errors found.
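The kind of comparison described in the abstract can be illustrated with a minimal sketch of token-level accuracy, assuming accuracy is measured as the share of tokens whose predicted tag matches a gold-standard tag. The function name `tagging_accuracy`, the placeholder tokens, and the tag labels are hypothetical and are not taken from the paper or the ILCI tagset; the toy numbers do not reproduce the reported results.

```python
from typing import List, Tuple

# A tagged sentence is a list of (token, POS tag) pairs.
Tagged = List[Tuple[str, str]]


def tagging_accuracy(gold: Tagged, predicted: Tagged) -> float:
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(1 for (_, g), (_, p) in zip(gold, predicted) if g == p)
    return correct / len(gold)


# Toy data invented for illustration only (placeholder tokens and tags).
gold = [("t1", "NOUN"), ("t2", "VERB"), ("t3", "NOUN"), ("t4", "ADJ")]
four_domain_output = [("t1", "NOUN"), ("t2", "VERB"), ("t3", "VERB"), ("t4", "NOUN")]
five_domain_output = [("t1", "NOUN"), ("t2", "VERB"), ("t3", "NOUN"), ("t4", "NOUN")]

acc_four = tagging_accuracy(gold, four_domain_output)
acc_five = tagging_accuracy(gold, five_domain_output)
print(f"four-domain: {acc_four:.2f}, five-domain: {acc_five:.2f}, "
      f"difference: {acc_five - acc_four:.2f}")
```

Running both taggers over the same gold-annotated test sentences and comparing the two scores gives the kind of with/without-Cricket difference the evaluation reports; a per-tag breakdown of the mismatches would be the natural starting point for the error analysis.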
