Importance of Data Preprocessing and Parameters Tuning for Supervised Machine Learning Models on Tweets Sentiment Analysis

Saurab Adhikari

doi:10.3126/batuk.v10i1.62303

Authors

Saurab Adhikari, Nesfield International College

Keywords:

machine learning, natural language processing, sentiment analysis, text analysis

Abstract

This paper compares five supervised machine learning models for tweet sentiment analysis, reporting the accuracy and classification report of each model and showing how accuracy improves when the data is preprocessed and the parameters are tuned. The five models are Naive Bayes, Support Vector Machine (SVM), Random Forest, Long Short-Term Memory (LSTM), and XGBoost. A total of 25,000 tweets were processed and analyzed, and the models classified each tweet as positive, negative, or neutral. This research helps to identify which models should be used and which yield higher accuracy under various approaches to data preprocessing and parameter tuning. The paper also aims to show that the standard models can still perform well and remain viable for sentiment analysis, while SVM and Random Forest classifiers may be viewed as standard learning strategies.

Author Biography

Saurab Adhikari, Nesfield International College

Faculty

Published

2024-01-29

How to Cite

Adhikari, S. (2024). Importance of Data Preprocessing and Parameters Tuning for Supervised Machine Learning Models on Tweets Sentiment Analysis. The Batuk, 10(1), 133–151. https://doi.org/10.3126/batuk.v10i1.62303

Issue

Vol. 10 No. 1 (2024)

Section

Part II: Humanities and Social Sciences

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

This license enables reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator.
