Preprocessing of Nepali News Corpus for Downstream Tasks

Sushil Awale; Suraj Prasai; Birodh Rijal; Santa B. Basnet

doi:10.3126/nl.v35i01.46553

Authors

Sushil Awale, Integrated ICT Private LTD, Kupondole, Lalitpur, Nepal
Suraj Prasai, Integrated ICT Private LTD, Kupondole, Lalitpur, Nepal
Birodh Rijal
Santa B. Basnet

DOI:

https://doi.org/10.3126/nl.v35i01.46553

Keywords:

Text processing, conjuncts, language models, glyphs, Nepali corpus

Abstract

Text collected from online resources introduces many errors, which lead to incorrect learning outcomes in automatic language learning tasks. In this paper, we discuss a Nepali text preprocessing pipeline for generating a clean corpus. The pipeline is tested with a language model to observe the impact of each step on the learning task. The relevance of this work lies in systematizing the procedure for developing a standard Nepali corpus.

