Optimized Ensemble Methods for Classifying Imbalanced Water Quality Index Data

River water pollution has increased due to human activities. Initially, numerical and analytical methods were used to classify river water quality, but machine learning now enables faster and more accurate water quality index (WQI) classification. This study aimed to develop an effective ensemble mo...

Bibliographic Details
Main Authors: Zaharaddeen Karami Lawal, Ali Aldrees, Hayati Yassin, Salisu Dan'azumi, Sujay Raghavendra Naganna, Sani I. Abba, Saad Sh. Sammen
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: Artificial intelligence, water quality modelling, pollution, machine learning
Online Access: https://ieeexplore.ieee.org/document/10757416/
_version_ 1846128554945806336
author Zaharaddeen Karami Lawal
Ali Aldrees
Hayati Yassin
Salisu Dan'azumi
Sujay Raghavendra Naganna
Sani I. Abba
Saad Sh. Sammen
author_facet Zaharaddeen Karami Lawal
Ali Aldrees
Hayati Yassin
Salisu Dan'azumi
Sujay Raghavendra Naganna
Sani I. Abba
Saad Sh. Sammen
author_sort Zaharaddeen Karami Lawal
collection DOAJ
description River water pollution has increased due to human activities. Initially, numerical and analytical methods were used to classify river water quality, but machine learning now enables faster and more accurate water quality index (WQI) classification. This study aimed to develop an effective ensemble model for classifying river water as drinkable or polluted using advanced machine learning. The objective was to apply a classification method to predict WQI using data from the Kinta River in Malaysia and to improve on the 70-95% accuracy range of existing models. The dataset comprises 301 records collected from eight monitoring stations along the Kinta River and covers 31 pollution indicators, including hydrological, chemical, physical, and microbiological parameters. Six algorithms were used: decision tree, logistic regression, random forest, support vector machine (SVM), AdaBoost, and XGBoost. Three experiments were conducted, with and without hyperparameter tuning. The dataset was normalized and oversampled to address the class imbalance. In all experiments, XGBoost was the best individual model, while SVM was the worst. The ensemble models outperformed the individual models, with the GridSearchCV ensemble achieving 97.3% accuracy, exceeding the models in the existing literature by 2.3%. The study had limitations, such as the absence of advanced optimization or dimensionality reduction. In conclusion, it demonstrated that an ensemble model with optimized hyperparameters could classify river water quality more effectively than individual models, contributing to the advancement of the Sustainable Development Goals (SDGs) related to water access.
format Article
id doaj-art-9b78b6a93b894857b6de0165ab664ada
institution Kabale University
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-9b78b6a93b894857b6de0165ab664ada
2024-12-11T00:04:54Z
eng
IEEE
IEEE Access
2169-3536
2024-01-01
12
178536
178551
10.1109/ACCESS.2024.3502361
10757416
Optimized Ensemble Methods for Classifying Imbalanced Water Quality Index Data
Zaharaddeen Karami Lawal (0) https://orcid.org/0000-0003-3011-5581
Ali Aldrees (1) https://orcid.org/0000-0001-6575-6181
Hayati Yassin (2) https://orcid.org/0000-0002-6271-2367
Salisu Dan'azumi (3)
Sujay Raghavendra Naganna (4) https://orcid.org/0000-0002-0482-1936
Sani I. Abba (5) https://orcid.org/0000-0001-9356-2798
Saad Sh. Sammen (6) https://orcid.org/0000-0002-1708-0612
Faculty of Integrated Technologies, Universiti Brunei Darussalam, Bandar Seri Begawan, Brunei
Department of Civil Engineering, College of Engineering in Al-Kharaj, Prince Sattam bin AbdulAziz University, Al-Kharaj, Saudi Arabia
Faculty of Integrated Technologies, Universiti Brunei Darussalam, Bandar Seri Begawan, Brunei
Department of Civil Engineering, College of Engineering in Al-Kharaj, Prince Sattam bin AbdulAziz University, Al-Kharaj, Saudi Arabia
Department of Civil Engineering, Manipal Academy of Higher Education, Manipal Institute of Technology Bengaluru, Manipal, Karnataka, India
Department of Civil Engineering, Prince Mohammad Bin Fahd University, Al Khobar, Saudi Arabia
Department of Civil Engineering, College of Engineering, University of Diyala, Baqubah, Iraq
https://ieeexplore.ieee.org/document/10757416/
Artificial intelligence
water quality modelling
pollution
machine learning
spellingShingle Zaharaddeen Karami Lawal
Ali Aldrees
Hayati Yassin
Salisu Dan'azumi
Sujay Raghavendra Naganna
Sani I. Abba
Saad Sh. Sammen
Optimized Ensemble Methods for Classifying Imbalanced Water Quality Index Data
IEEE Access
Artificial intelligence
water quality modelling
pollution
machine learning
title Optimized Ensemble Methods for Classifying Imbalanced Water Quality Index Data
title_full Optimized Ensemble Methods for Classifying Imbalanced Water Quality Index Data
title_fullStr Optimized Ensemble Methods for Classifying Imbalanced Water Quality Index Data
title_full_unstemmed Optimized Ensemble Methods for Classifying Imbalanced Water Quality Index Data
title_short Optimized Ensemble Methods for Classifying Imbalanced Water Quality Index Data
title_sort optimized ensemble methods for classifying imbalanced water quality index data
topic Artificial intelligence
water quality modelling
pollution
machine learning
url https://ieeexplore.ieee.org/document/10757416/
work_keys_str_mv AT zaharaddeenkaramilawal optimizedensemblemethodsforclassifyingimbalancedwaterqualityindexdata
AT alialdrees optimizedensemblemethodsforclassifyingimbalancedwaterqualityindexdata
AT hayatiyassin optimizedensemblemethodsforclassifyingimbalancedwaterqualityindexdata
AT salisudanazumi optimizedensemblemethodsforclassifyingimbalancedwaterqualityindexdata
AT sujayraghavendranaganna optimizedensemblemethodsforclassifyingimbalancedwaterqualityindexdata
AT saniiabba optimizedensemblemethodsforclassifyingimbalancedwaterqualityindexdata
AT saadshsammen optimizedensemblemethodsforclassifyingimbalancedwaterqualityindexdata
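
The description field above mentions three processing steps: normalizing the data, oversampling to address class imbalance, and combining individual classifiers into an ensemble. The record does not say which variants the authors used, so the following is only a minimal sketch under stated assumptions: min-max scaling for normalization, random duplication of minority rows for oversampling, and hard majority voting for the ensemble. The function names and the toy data are hypothetical, and the grid-search tuning step is not reproduced here.

```python
import random
from collections import Counter


def min_max_normalize(rows):
    """Scale each feature column to [0, 1]; constant columns map to 0.0.

    Assumption: min-max scaling. The record only says "normalized".
    """
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in rows
    ]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    has as many rows as the largest one.

    Assumption: plain random oversampling. The record only says "oversampled".
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def majority_vote(predictions):
    """Combine per-model label lists by hard voting.

    Assumption: hard voting. Ties go to the label that appears first
    among the votes for that sample.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Toy data, made up for illustration: two features per record,
# label 1 = polluted, label 0 = drinkable.
X = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0], [8.0, 40.0]]
y = [0, 0, 0, 1]

X_scaled = min_max_normalize(X)
X_bal, y_bal = random_oversample(X_scaled, y)
print(Counter(y_bal))  # both classes now have three rows

# Predictions from three hypothetical models on three test records.
ensemble = majority_vote([[0, 1, 1], [0, 0, 1], [1, 1, 1]])
print(ensemble)  # [0, 1, 1]
```

In practice the oversampling step would be applied to the training split only, so that duplicated rows do not leak into the data used to report accuracy.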