Application of the Lasso regularisation technique in mitigating overfitting in air quality prediction models
Abstract As a significant global concern, air pollution triggers enormous challenges in public health and ecological sustainability, necessitating the development of precise algorithms to forecast and mitigate its impacts, which has led to the development of many machine learning (ML)-based models f...
Main Authors: | Abbas Pak, Abdullah Kaviani Rad, Mohammad Javad Nematollahi, Mohammadreza Mahmoudi |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2025-01-01 |
Series: | Scientific Reports |
Subjects: | Air pollution; Air quality prediction; Overfitting; Lasso regularisation; Machine learning |
Online Access: | https://doi.org/10.1038/s41598-024-84342-y |
_version_ | 1841559601224351744 |
---|---|
author | Abbas Pak; Abdullah Kaviani Rad; Mohammad Javad Nematollahi; Mohammadreza Mahmoudi |
collection | DOAJ |
description | Abstract: Air pollution is a significant global concern that poses major challenges to public health and ecological sustainability. Forecasting and mitigating its impacts requires precise algorithms, which has led to the development of many machine learning (ML)-based models for predicting air quality. Overfitting, however, is a prevalent problem in ML algorithms that reduces their efficacy and generalizability. Using an extensive dataset from 16 sensors in Tehran, Iran, covering 2013 to 2023, this study applies the Least Absolute Shrinkage and Selection Operator (Lasso) regularisation technique to improve the forecasting precision of models of ambient air pollutant concentrations, including particulate matter (PM2.5 and PM10), CO, NO2, SO2, and O3, while reducing overfitting. The outputs were compared using the R-squared (R2), mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), and normalised mean square error (NMSE) indices. Although the preliminary findings show that Lasso markedly improves model reliability by reducing overfitting and identifying key attributes, the model's performance for gaseous pollutants remained unsatisfactory compared with PM (R2: PM2.5 = 0.80, PM10 = 0.75, CO = 0.45, NO2 = 0.55, SO2 = 0.65, O3 = 0.35). The minimal amount of missing data presumably explains the strong performance of the PM models, while the high dynamism of gases and their chemical interactions, together with the inherent characteristics of the model, were the primary factors behind the poor performance for gases. Nevertheless, because Lasso successfully mitigated overfitting and selected the more important features, it is strongly recommended for air quality forecasting models. |
format | Article |
id | doaj-art-6cdbcae6e4dc4527b0d03bbd244e0182 |
institution | Kabale University |
issn | 2045-2322 |
language | English |
publishDate | 2025-01-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
affiliation | Abbas Pak: Department of Computer Sciences, Shahrekord University; Abdullah Kaviani Rad: Department of Environmental Engineering and Natural Resources, College of Agriculture, Shiraz University; Mohammad Javad Nematollahi: Department of Geology, Faculty of Sciences, Urmia University; Mohammadreza Mahmoudi: Department of Statistics, Faculty of Science, Fasa University |
title | Application of the Lasso regularisation technique in mitigating overfitting in air quality prediction models |
topic | Air pollution; Air quality prediction; Overfitting; Lasso regularisation; Machine learning |
url | https://doi.org/10.1038/s41598-024-84342-y |
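The abstract names Lasso regularisation and five evaluation indices (R2, MAE, MSE, RMSE, NMSE) without showing how they work. The following is a minimal sketch of the general technique, not the authors' implementation: it fits a Lasso model by coordinate descent on synthetic data in which only two of five features matter, then scores held-out predictions. The function names (`soft_threshold`, `fit_lasso`, `metrics`), the penalty value `alpha=0.2`, the synthetic data, and the choice to normalise NMSE by the variance of the observations are all assumptions made for illustration; the record does not state the paper's NMSE definition or its hyperparameters.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; this is what the L1 penalty does."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso(X, y, alpha, n_iter=50):
    """Coordinate descent for (1/2n)*sum((y - b0 - Xw)^2) + alpha*sum(|w|)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b0 = sum(y) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = b0 + sum(w[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
        # The intercept is not penalised: set it to the mean residual.
        b0 = sum(y[i] - sum(w[k] * X[i][k] for k in range(p)) for i in range(n)) / n
    return b0, w


def predict(X, b0, w):
    return [b0 + sum(wk * xk for wk, xk in zip(w, row)) for row in X]


def metrics(y_true, y_pred):
    """Return the five indices listed in the abstract."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean_y) ** 2 for t in y_true)
    mse = sse / n
    return {
        "R2": 1.0 - sse / sst,
        "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        # Assumption: NMSE normalised by the variance of the observations.
        "NMSE": mse / (sst / n),
    }


# Synthetic data (hypothetical): only features 0 and 1 influence the target.
random.seed(0)
X = [[random.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]
y = [3.0 * row[0] - 2.0 * row[1] + random.gauss(0.0, 0.3) for row in X]

X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

intercept, weights = fit_lasso(X_train, y_train, alpha=0.2)
scores = metrics(y_test, predict(X_test, intercept, weights))

print("weights:", [round(v, 3) for v in weights])
print("held-out scores:", {k: round(v, 3) for k, v in scores.items()})
```

The soft-thresholding step is what sets the coefficients of uninformative features to exactly zero, which is the feature-selection behaviour the abstract credits to Lasso. Scoring on held-out rows rather than training rows is how a reduction in overfitting would be checked in practice.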