Automated News Classification using N-gram Model and Key Features of Nepali Language
DOI:
https://doi.org/10.3126/scitech.v13i1.23504
Keywords:
Document Similarity, Nepali Text Classification, Morphological Analysis, Vector Space Model, Bag-of-words Model, N-gram, Bi-gram, Nepali News Classification
Abstract
With the growing trend of publishing news online, automatic text processing is becoming increasingly important. Automatic text classification has been a focus of researchers working in many languages for decades, and there is a large body of research on the features of the English language and their use in automated text processing. This research applies key features of the Nepali language to the automatic classification of Nepali news. Studying the impact of Nepali-specific features, which differ greatly from those of English, is more challenging because of the higher level of complexity that must be resolved. Experiments using a vector space model, an n-gram model, and processing based on key features specific to Nepali show promising results compared to the bag-of-words model for automated Nepali news classification.
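
The contrast between bag-of-words and n-gram features in a vector space model can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the nearest-document rule, and the toy English placeholder documents are all assumptions made for illustration, and the Nepali-specific processing (such as morphological analysis) is omitted entirely.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(text, use_bigrams=True):
    """Term-frequency vector: unigrams only (bag-of-words) or unigrams plus bigrams."""
    tokens = text.split()  # assumption: whitespace tokenization, no stemming
    feats = list(tokens)
    if use_bigrams:
        feats += ngrams(tokens, 2)
    return Counter(feats)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, labelled_docs, use_bigrams=True):
    """Assign the label of the most similar labelled document (hypothetical rule)."""
    query = features(text, use_bigrams)
    label, _ = max(
        labelled_docs,
        key=lambda doc: cosine(query, features(doc[1], use_bigrams)),
    )
    return label


# Toy placeholder corpus, not data from the study.
labelled = [
    ("sports", "team won the football match"),
    ("politics", "party won the election vote"),
]
print(classify("football team won", labelled))
```

Setting `use_bigrams=False` reduces the representation to plain bag-of-words, which makes it easy to compare the two feature sets on the same documents.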