Preprocessing of Nepali News Corpus for Downstream Tasks
DOI:
https://doi.org/10.3126/nl.v35i01.46553
Keywords:
Text processing, conjuncts, language models, glyphs, Nepali corpus
Abstract
Text collected from online resources introduces many errors, which lead to incorrect learning outcomes in automatic language learning tasks. In this paper, we discuss a Nepali text preprocessing pipeline for generating a clean corpus. The pipeline is tested using a language model to observe the impact of each step on the learning task. The relevance of this work lies in systematizing the procedure for developing a standard Nepali corpus.
Published
2022-07-11
How to Cite
Awale, S., Prasai, S., Rijal, B., & Basnet, S. B. (2022). Preprocessing of Nepali News Corpus for Downstream Tasks. Nepalese Linguistics, 35(01), 1–6. https://doi.org/10.3126/nl.v35i01.46553
Section
Articles