Optimized Ensemble Methods for Classifying Imbalanced Water Quality Index Data
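| Main Authors: | Zaharaddeen Karami Lawal, Ali Aldrees, Hayati Yassin, Salisu Dan'azumi, Sujay Raghavendra Naganna, Sani I. Abba, Saad Sh. Sammen |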
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
| Subjects: | Artificial intelligence; water quality modelling; pollution; machine learning |
| Online Access: | https://ieeexplore.ieee.org/document/10757416/ |

| author | Zaharaddeen Karami Lawal; Ali Aldrees; Hayati Yassin; Salisu Dan'azumi; Sujay Raghavendra Naganna; Sani I. Abba; Saad Sh. Sammen |
|---|---|
| collection | DOAJ |
| description | River water pollution has increased due to human activities. River water quality was initially classified with numerical and analytical methods, but machine learning now enables faster and more accurate water quality index (WQI) classification. This study aimed to develop an effective ensemble model that uses advanced machine learning to classify river water as drinkable or polluted. The objective was to predict the WQI class from Kinta River (Malaysia) data and to improve on the 70-95% accuracy range reported for existing models. The dataset comprises 301 records collected from eight monitoring stations along the Kinta River and covers 31 pollution indicators, including hydrological, chemical, physical, and microbiological parameters. Six algorithms were used: decision tree, logistic regression, random forest, support vector machine (SVM), AdaBoost, and XGBoost. Three experiments were conducted, with and without hyperparameter tuning. The dataset was normalized and oversampled to address class imbalance. In all experiments, XGBoost was the best individual model and SVM the worst. The ensemble models outperformed the individual models, with the GridSearchCV-tuned ensemble achieving 97.3% accuracy, 2.3 percentage points above the best model reported in the existing literature. The study has limitations, such as the absence of advanced optimization and dimensionality reduction. In conclusion, it demonstrates that an ensemble model with optimized hyperparameters can classify river water quality more effectively than individual models, contributing to the sustainable development goals (SDGs) related to water access. |
| format | Article |
| id | doaj-art-9b78b6a93b894857b6de0165ab664ada |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2024-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-9b78b6a93b894857b6de0165ab664ada; 2024-12-11T00:04:54Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2024-01-01; vol. 12, pp. 178536-178551; DOI 10.1109/ACCESS.2024.3502361; IEEE document 10757416; Optimized Ensemble Methods for Classifying Imbalanced Water Quality Index Data; Zaharaddeen Karami Lawal (https://orcid.org/0000-0003-3011-5581), Faculty of Integrated Technologies, Universiti Brunei Darussalam, Bandar Seri Begawan, Brunei; Ali Aldrees (https://orcid.org/0000-0001-6575-6181), Department of Civil Engineering, College of Engineering in Al-Kharaj, Prince Sattam bin AbdulAziz University, Al-Kharaj, Saudi Arabia; Hayati Yassin (https://orcid.org/0000-0002-6271-2367), Faculty of Integrated Technologies, Universiti Brunei Darussalam, Bandar Seri Begawan, Brunei; Salisu Dan'azumi, Department of Civil Engineering, College of Engineering in Al-Kharaj, Prince Sattam bin AbdulAziz University, Al-Kharaj, Saudi Arabia; Sujay Raghavendra Naganna (https://orcid.org/0000-0002-0482-1936), Department of Civil Engineering, Manipal Academy of Higher Education, Manipal Institute of Technology Bengaluru, Manipal, Karnataka, India; Sani I. Abba (https://orcid.org/0000-0001-9356-2798), Department of Civil Engineering, Prince Mohammad Bin Fahd University, Al Khobar, Saudi Arabia; Saad Sh. Sammen (https://orcid.org/0000-0002-1708-0612), Department of Civil Engineering, College of Engineering, University of Diyala, Baqubah, Iraq; https://ieeexplore.ieee.org/document/10757416/; Artificial intelligence, water quality modelling, pollution, machine learning |
| title | Optimized Ensemble Methods for Classifying Imbalanced Water Quality Index Data |
| topic | Artificial intelligence; water quality modelling; pollution; machine learning |
| url | https://ieeexplore.ieee.org/document/10757416/ |
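
The abstract names the preprocessing and ensembling steps (normalization, oversampling of the imbalanced classes, several base classifiers, and an ensemble) but not their exact variants. The dependency-free Python sketch below illustrates those ideas under stated assumptions: min-max scaling stands in for the unspecified normalization, plain random duplication stands in for the unspecified oversampling method (SMOTE is a common alternative), and hard majority voting stands in for the unspecified ensembling rule. Hyperparameter tuning is omitted; the study names GridSearchCV, scikit-learn's exhaustive cross-validated grid search. All function names and the toy data are hypothetical; they are neither the Kinta River dataset nor the authors' code.

```python
import random
from collections import Counter


def min_max_fit(rows):
    """Per-feature (min, max) computed from training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def min_max_transform(rows, bounds):
    """Scale each feature with the fitted bounds; constant features map to 0.0."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(row, bounds)]
        for row in rows
    ]


def random_oversample(rows, labels, seed=0):
    """Randomly duplicate rows of each smaller class until every class
    has as many rows as the largest class (plain random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def majority_vote(per_model_preds):
    """Hard voting: each sample gets the label predicted by most base models."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*per_model_preds)]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data (NOT the Kinta River dataset): 2 features, 4:1 class imbalance.
X_train = [[7.1, 3.0], [6.8, 5.2], [7.4, 1.1], [6.5, 8.9], [7.0, 2.4]]
y_train = ["drinkable", "drinkable", "drinkable", "polluted", "drinkable"]

bounds = min_max_fit(X_train)
X_scaled = min_max_transform(X_train, bounds)
X_bal, y_bal = random_oversample(X_scaled, y_train)
print(Counter(y_bal)["polluted"])  # 4, now equal to the "drinkable" count

# Labels predicted for four test samples by three hypothetical tuned base models.
preds = [
    ["drinkable", "polluted", "drinkable", "polluted"],
    ["drinkable", "drinkable", "drinkable", "polluted"],
    ["polluted", "polluted", "drinkable", "polluted"],
]
ensemble = majority_vote(preds)
print(ensemble)  # ['drinkable', 'polluted', 'drinkable', 'polluted']
print(accuracy(["drinkable", "polluted", "drinkable", "drinkable"], ensemble))  # 0.75
```

In practice the scaling bounds would be fitted on the training split only, and only training rows would be oversampled, so that evaluation data stays untouched; test rows would then be scaled with the same `bounds`. With an odd number of voters and two classes, the hard majority vote can never tie.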