Comparison of machine learning algorithms in statistically imputed water potability dataset

Diwash Poudel; Dhadkan Shrestha; Sulove Bhattarai; Abhishek Ghimire

doi:10.3126/jiee.v5i1.42265

Authors

Diwash Poudel IOE, Thapathali Campus
Dhadkan Shrestha IOE, Thapathali Campus
Sulove Bhattarai IOE, Thapathali Campus
Abhishek Ghimire IOE, Thapathali Campus

Keywords:

ANN, K Nearest Neighbor, LR, missing values, RF

Abstract

Lack of safe drinking water is a growing concern. Because missing data is common in available datasets, this study aims to find the algorithm that performs best on a statistically imputed dataset and gives the best prediction of whether water is potable. Potability is predicted from a water potability dataset with nine features using four algorithms. Three of the features, pH, chloramine, and trihalomethane, have missing values in the dataset; each missing value is filled in with the median of its feature. The performance of four machine learning algorithms, LR, K-NN, RF, and ANN, is compared under these conditions. In our experiments, RF with 700 decision trees at a maximum depth of 30 is the best-performing algorithm for the statistically imputed water potability dataset. The study answers the question of which algorithm performs best, but further study is needed to optimize that algorithm in order to provide the best prediction.
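
The median imputation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name impute_median, the column names, and the toy values are all hypothetical, and missing entries are assumed to be represented as None.

```python
from statistics import median


def impute_median(rows, columns):
    """Return (imputed_rows, medians) without modifying the input rows.

    Every None in one of the given columns is replaced by the median of
    the observed (non-missing) values in that same column.
    """
    medians = {}
    for col in columns:
        observed = [row[col] for row in rows if row[col] is not None]
        medians[col] = median(observed)

    imputed_rows = [
        {
            key: (medians[key] if key in medians and value is None else value)
            for key, value in row.items()
        }
        for row in rows
    ]
    return imputed_rows, medians


# Toy records invented for illustration; they are not taken from the paper's dataset.
samples = [
    {"ph": 7.0, "chloramine": 6.5, "trihalomethane": 60.0, "potable": 1},
    {"ph": None, "chloramine": 7.5, "trihalomethane": None, "potable": 0},
    {"ph": 8.0, "chloramine": None, "trihalomethane": 80.0, "potable": 0},
    {"ph": 6.0, "chloramine": 8.5, "trihalomethane": 70.0, "potable": 1},
]

imputed, medians = impute_median(samples, ["ph", "chloramine", "trihalomethane"])
print(medians)
```

If scikit-learn is available, the RF configuration reported above would plausibly correspond to RandomForestClassifier(n_estimators=700, max_depth=30) fitted on the imputed features; any other hyperparameters are not stated in the abstract and would be assumptions.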

Published

2023-02-04 — Updated on 2023-03-10

How to Cite

Poudel, D., Shrestha, D., Bhattarai, S., & Ghimire, A. (2023). Comparison of machine learning algorithms in statistically imputed water potability dataset. Journal of Innovations in Engineering Education, 5(1), 38–46. https://doi.org/10.3126/jiee.v5i1.42265

Issue

Vol. 5 No. 1 (2022)

Section

Articles

License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Upon acceptance of an article, the copyright for the published work remains with JIEE, Thapathali Campus, and the authors.
