Establishing Optimal Machine Learning Models for Monitoring Water Quality in Vietnam’s Upper Ma River

Bibliographic Details
Main Authors: Ngo Thanh Son, Nguyen Duc Loc
Format: Article
Language: English
Published: Environmental Research Institute, Chulalongkorn University 2024-11-01
Series: Applied Environmental Research
Online Access: https://ph01.tci-thaijo.org/index.php/aer/article/view/257524
collection DOAJ
description This study aims to establish the optimal regression model for predicting total suspended solids (TSS) and turbidity from in situ data and spectral regions of Sentinel-2 images. Several machine learning models were evaluated, including Multilayer Perceptron Regression (MLPR), Random Forest Regression (RFR), AdaBoost Regression (ABR), Multiple Linear Regression (MLR), and K-Nearest Neighbors Regression (KNNR). The models were applied to band combinations drawn from different spectral regions: visible (VIS), near-infrared (NIR), shortwave-infrared (SWIR), VIS+NIR (VNIR), and VIS+NIR+SWIR (VNIR+SWIR). The MLR model, while not the best performer during training (R² = 0.89 for TSS and R² = 0.66 for turbidity), did not exhibit overfitting, with corresponding testing R² values of 0.80 and 0.42, respectively. Variable selection for the MLR models identified the optimal spectral bands as B3, B5, B6, B8, B11, and B12 for TSS, and B4, B8, B11, and B12 for turbidity. The final no-intercept MLR models achieved R² = 0.88 for TSS and R² = 0.62 for turbidity. Performance was better for TSS, which had lower MAE, MSE, and RMSE than turbidity. The study underscores the efficacy of MLR models with selected spectral bands for accurate and generalizable predictions of TSS and turbidity.
id doaj-art-d7aaf0f27af54c36a3e0786b77d8b061
institution Kabale University
issn 2287-075X
author_affiliation Ngo Thanh Son: Faculty of Natural Resources and Environment, Vietnam National University of Agriculture, Hanoi, Vietnam
Nguyen Duc Loc: Faculty of Natural Resources and Environment, Vietnam National University of Agriculture, Hanoi, Vietnam
volume 46
issue 4
topic Water quality monitoring
Machine learning model
Sentinel-2 imagery
Turbidity
Total suspended solids
Upper Ma river
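
The description above mentions a final no-intercept multiple linear regression on selected Sentinel-2 bands, assessed with R², MAE, MSE, and RMSE on training and testing data. The following is a minimal sketch of that kind of workflow, not the authors' implementation. The data are synthetic stand-ins, the coefficients in `true_coef` are hypothetical, the 70/30 split is an assumption, and R² is computed with the usual centred definition, which may differ from the convention the study used for no-intercept models. Only the band names are taken from the record.

```python
import numpy as np

# Band names taken from the description's TSS variable selection.
TSS_BANDS = ["B3", "B5", "B6", "B8", "B11", "B12"]


def fit_no_intercept(X, y):
    """Ordinary least squares through the origin: minimise ||X b - y||^2."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def regression_metrics(y_true, y_pred):
    """Return R^2 (centred definition), MAE, MSE and RMSE."""
    residual = y_true - y_pred
    mse = float(np.mean(residual ** 2))
    ss_res = float(np.sum(residual ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": float(np.mean(np.abs(residual))),
        "mse": mse,
        "rmse": mse ** 0.5,
    }


# Synthetic stand-in data (NOT the study's measurements).
rng = np.random.default_rng(0)
n_samples = 200
X = rng.uniform(0.01, 0.30, size=(n_samples, len(TSS_BANDS)))  # fake reflectance
true_coef = np.array([120.0, 80.0, -40.0, 60.0, -30.0, 20.0])  # hypothetical
y = X @ true_coef + rng.normal(0.0, 2.0, size=n_samples)       # fake TSS values

# Hold out 30 % of the samples (assumed split) to check for overfitting.
n_train = int(0.7 * n_samples)
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

coef = fit_no_intercept(X_train, y_train)
train_metrics = regression_metrics(y_train, X_train @ coef)
test_metrics = regression_metrics(y_test, X_test @ coef)

for band, b in zip(TSS_BANDS, coef):
    print(f"{band}: {b:+.2f}")
print("train:", train_metrics)
print("test: ", test_metrics)
```

A large gap between the training and testing R² in such a comparison would suggest overfitting, which is the check the description reports for the MLR model.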