Application of the Lasso regularisation technique in mitigating overfitting in air quality prediction models

Bibliographic Details
Main Authors: Abbas Pak, Abdullah Kaviani Rad, Mohammad Javad Nematollahi, Mohammadreza Mahmoudi
Format: Article
Language: English
Published: Nature Portfolio, 2025-01-01
Series: Scientific Reports
Subjects: Air pollution; Air quality prediction; Overfitting; Lasso regularisation; Machine learning
Online Access: https://doi.org/10.1038/s41598-024-84342-y
collection DOAJ
description Abstract As a significant global concern, air pollution poses enormous challenges to public health and ecological sustainability, necessitating precise algorithms to forecast and mitigate its impacts; this need has led to the development of many machine learning (ML)-based models for predicting air quality. However, overfitting is a prevalent issue with ML algorithms that reduces their efficacy and generalizability. Using an extensive dataset from 16 sensors in Tehran, Iran, covering 2013 to 2023, the present investigation applies the Least Absolute Shrinkage and Selection Operator (Lasso) regularisation technique to improve the forecasting precision of models for ambient air pollutant concentrations, including particulate matter (PM2.5 and PM10), CO, NO2, SO2, and O3, while reducing overfitting. The outputs were compared using the R-squared (R2), mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), and normalised mean square error (NMSE) indices. Although the preliminary findings reveal that Lasso dramatically enhances model reliability by reducing overfitting and identifying key attributes, the model’s performance in predicting gaseous pollutants remained unsatisfactory compared with PM (R2 = 0.80 for PM2.5, 0.75 for PM10, 0.45 for CO, 0.55 for NO2, 0.65 for SO2, and 0.35 for O3). The low proportion of missing data presumably explains the strong performance of the PM models, whereas the high dynamism of the gases and their chemical interactions, together with the inherent characteristics of the model, were the primary factors behind its poor performance for gaseous pollutants. Overall, the successful application of Lasso regularisation in mitigating overfitting and selecting the more important features makes it highly recommended for use in air quality forecasting models.
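The abstract names Lasso regularisation and five error indices but does not say how they are computed. As a minimal sketch, assuming a plain linear model fitted by cyclic coordinate descent (this record does not describe the authors' solver, features, or preprocessing), the Python below shows how the L1 penalty drives irrelevant coefficients to exactly zero and how R2, MAE, MSE, RMSE, and NMSE can be computed. The function names, the toy data, and the NMSE normalisation (MSE divided by the product of the observed and predicted means) are illustrative assumptions, not the authors' code.

```python
import math


def soft_threshold(z, gamma):
    # Proximal step of the L1 penalty: shrink z toward zero by gamma and
    # return exactly 0.0 when |z| <= gamma (this is what drops features).
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_fit(X, y, alpha, n_sweeps=200):
    """Cyclic coordinate descent for the Lasso objective
    (1 / (2n)) * sum_i (y_i - b0 - x_i . w)^2 + alpha * sum_j |w_j|.
    The intercept b0 is not penalised. Illustrative sketch only."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b0 = 0.0
    for _ in range(n_sweeps):
        # Unpenalised intercept: exact minimiser given the current weights.
        b0 = sum(y[i] - sum(X[i][k] * w[k] for k in range(p)) for i in range(n)) / n
        for j in range(p):
            rho = 0.0  # sum_i x_ij * (residual with feature j left out)
            z = 0.0    # sum_i x_ij^2
            for i in range(n):
                partial = y[i] - b0 - sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * partial
                z += X[i][j] ** 2
            # Closed-form coordinate update: soft-threshold, then rescale.
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
    return b0, w


def predict(X, b0, w):
    return [b0 + sum(xk * wk for xk, wk in zip(row, w)) for row in X]


def error_indices(obs, pred):
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    mae = sum(abs(o - q) for o, q in zip(obs, pred)) / n
    mse = sum((o - q) ** 2 for o, q in zip(obs, pred)) / n
    ss_res = mse * n
    ss_tot = sum((o - mean_o) ** 2 for o in obs)  # must be non-zero
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": mae,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        # Assumed NMSE definition (common in air-quality model evaluation);
        # the record does not state which normalisation the authors used.
        "NMSE": mse / (mean_o * mean_p),
    }


# Hypothetical toy data (not from the study): column 0 drives y,
# columns 1 and 2 are irrelevant; every column is mean-centred.
X_demo = [
    [-4, 1, 1], [-3, -1, 1], [-2, 1, -1], [-1, -1, -1],
    [1, 1, 1], [2, -1, 1], [3, 1, -1], [4, -1, -1],
]
y_demo = [3 * row[0] + 10 for row in X_demo]

if __name__ == "__main__":
    b0, w = lasso_fit(X_demo, y_demo, alpha=0.5)
    print("intercept:", round(b0, 4), "weights:", [round(v, 4) for v in w])
    print(error_indices(y_demo, predict(X_demo, b0, w)))
```

On the toy data, the weights of the two irrelevant columns come out exactly zero and the remaining weight is shrunk slightly below its true value of 3, which illustrates the feature-selection and shrinkage behaviour the abstract attributes to Lasso. Centring the columns keeps the intercept decoupled from the weights so the sweeps converge quickly; a real pipeline would typically standardise features before fitting.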
id doaj-art-6cdbcae6e4dc4527b0d03bbd244e0182
institution Kabale University
issn 2045-2322
source Scientific Reports, vol. 15, no. 1, pp. 1-17 (Nature Portfolio, 2025-01-01), doi:10.1038/s41598-024-84342-y
affiliation Abbas Pak: Department of Computer Sciences, Shahrekord University
Abdullah Kaviani Rad: Department of Environmental Engineering and Natural Resources, College of Agriculture, Shiraz University
Mohammad Javad Nematollahi: Department of Geology, Faculty of Sciences, Urmia University
Mohammadreza Mahmoudi: Department of Statistics, Faculty of Science, Fasa University
topic Air pollution
Air quality prediction
Overfitting
Lasso regularisation
Machine learning