Water quality prediction using Machine Learning Models

The quality of water is a vital determinant of environmental sustainability, economic development, and general welfare. India has substantial water quality issues, with different areas facing varying levels of pollution. Industrial effluents introduce toxic chemicals and heavy metals into water bodi...

Bibliographic Details
Main Authors: Sharma Astha, Sharma Richa, Rana Rishi, Kalia Anshul
Format: Article
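Language: English
Published: EDP Sciences 2024-01-01
Series: E3S Web of Conferences
Subjects: wq prediction; machine learning; classification models; random forest (RF); gradient boosting machines; water quality indicators
Online Access: https://www.e3s-conferences.org/articles/e3sconf/pdf/2024/126/e3sconf_iccmes2024_01025.pdf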
collection DOAJ
description Water quality is a vital determinant of environmental sustainability, economic development, and general welfare. India faces substantial water quality problems, with pollution levels varying from region to region. Industrial effluents introduce toxic chemicals and heavy metals into water bodies, while agricultural runoff carries pesticides, fertilizers, and sediments, causing eutrophication and water pollution. The Ganges, Yamuna, and Godavari rivers have elevated pollution levels. According to the Central Pollution Control Board, biochemical oxygen demand, a measure of organic pollution, often exceeds the acceptable thresholds in many sections of these rivers. Conventional techniques for monitoring water quality are often arduous, time-consuming, and incapable of delivering real-time evaluations. The objective of this study is to create a classification model that accurately forecasts water quality from a range of indicators. The aim is to use machine learning techniques, including Decision Tree, K-Nearest Neighbor (KNN), and Random Forest, to develop prediction models that can assess water quality and identify possible pollution incidents before they become major issues. This research used a comprehensive dataset of water quality metrics, including pH, turbidity, dissolved oxygen, temperature, phosphates, and nitrates, to assess the accuracy of each algorithm in forecasting water potability. The Random Forest method attained the highest accuracy, 70.4%, handling intricate feature interactions and mitigating overfitting through ensemble learning. The KNN method achieved an accuracy of 59%; it was hampered by its sensitivity to the choice of k and of distance measure, as well as by processing inefficiencies. The Decision Tree approach, despite its speed and interpretability, had the lowest accuracy, 58%, mostly owing to overfitting, which impeded its ability to generalize. The study highlights the better performance of the Random Forest model in predicting water quality, attributed to its ability to capture complex non-linear relationships, handle noisy data, and reduce overfitting by aggregating multiple decision trees.
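The sensitivity of KNN to the choice of k and of distance measure, noted in the description, can be illustrated with a minimal sketch. This is not the authors' code or data: the function names, the three features used, and every sample value below are hypothetical and chosen by hand, and one training point is deliberately mislabeled to stand in for noisy data.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute per-feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train, query, k=3, distance=euclidean):
    """Majority vote over the k training samples nearest to `query`.

    `train` is a list of (features, label) pairs.
    """
    neighbours = sorted(train, key=lambda row: distance(row[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical samples: (pH, turbidity, dissolved oxygen) -> 1 potable, 0 not.
# Values are illustrative only and are not taken from the study's dataset.
TRAIN = [
    ((7.0, 1.0, 8.0), 1),
    ((7.2, 2.0, 7.5), 1),
    ((6.8, 1.5, 8.2), 1),
    ((7.0, 1.1, 7.9), 0),  # deliberately mislabeled point, simulating noise
    ((5.5, 9.0, 3.0), 0),
    ((9.1, 8.0, 4.0), 0),
    ((6.0, 7.5, 3.5), 0),
]

QUERY = (7.1, 1.2, 7.9)

# Compare predictions for the same query under different settings.
for k in (1, 3):
    for name, metric in (("euclidean", euclidean), ("manhattan", manhattan)):
        label = knn_predict(TRAIN, QUERY, k=k, distance=metric)
        print(f"k={k}, distance={name}: predicted label {label}")
```

With k=1 the single nearest neighbour is the mislabeled point, so the vote follows the noise; with k=3 the two clean neighbours outvote it. The features are left unscaled here for brevity, so whichever feature has the widest numeric range dominates both distance measures; a practical pipeline would normally standardize the features first.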
format Article
id doaj-art-d92bc40e3ab34855b7879eb9a3f3e094
institution Kabale University
issn 2267-1242
spelling doaj-art-d92bc40e3ab34855b7879eb9a3f3e094
2024-12-06T10:16:10Z
eng
EDP Sciences
E3S Web of Conferences
2267-1242
2024-01-01
596
01025
10.1051/e3sconf/202459601025
e3sconf_iccmes2024_01025
Water quality prediction using Machine Learning Models
Sharma Astha, Department of Civil Engineering, Jaypee University of Information Technology, Waknaghat
Sharma Richa, Department of Civil Engineering, Jaypee University of Information Technology, Waknaghat
Rana Rishi, Department of Civil Engineering, Jaypee University of Information Technology, Waknaghat
Kalia Anshul, Department of Computer Science and Engineering, Himachal Pradesh University
title Water quality prediction using Machine Learning Models
topic wq prediction
machine learning
classification models
random forest (RF)
gradient boosting machines
water quality indicators