Explainable machine learning for modeling of net ecosystem exchange in boreal forests

There is a growing interest in applying machine learning methods to predict net ecosystem exchange (NEE) based on site information and climatic variables. We apply four machine learning models (cubist, random forest, averaged neural networks, and linear regression) to predict the NEE of bor...

Bibliographic Details
Main Authors: E. Ezhova, T. Laanti, A. Lintunen, P. Kolari, T. Nieminen, I. Mammarella, K. Heljanko, M. Kulmala
Format: Article
Language: English
Published: Copernicus Publications 2025-01-01
Series: Biogeosciences
Online Access: https://bg.copernicus.org/articles/22/257/2025/bg-22-257-2025.pdf
_version_ 1841543623433256960
author E. Ezhova
T. Laanti
A. Lintunen
P. Kolari
T. Nieminen
I. Mammarella
K. Heljanko
M. Kulmala
author_facet E. Ezhova
T. Laanti
A. Lintunen
P. Kolari
T. Nieminen
I. Mammarella
K. Heljanko
M. Kulmala
author_sort E. Ezhova
collection DOAJ
description There is a growing interest in applying machine learning methods to predict net ecosystem exchange (NEE) based on site information and climatic variables. We apply four machine learning models (cubist, random forest, averaged neural networks, and linear regression) to predict the NEE of boreal forest ecosystems based on climatic and site variables. We use data sets from two stations in the Finnish boreal forest (southern site Hyytiälä and northern site Värriö) and model NEE during the peak growing season and the whole year. For Hyytiälä, all nonlinear models demonstrated similar results with R² = 0.88 for the peak growing season and R² = 0.90 for the whole year. For Värriö, nonlinear models gave R² = 0.73–0.76 for the peak growing season, whereas random forest and cubist with R² = 0.74 were somewhat better than averaged neural networks with R² = 0.70 for the whole year. Using explainable artificial intelligence methods, we show that the most important input variables during the peak season are photosynthetically active radiation, diffuse radiation, and vapor pressure deficit (or air temperature), whereas, on the whole-year scale, vapor pressure deficit (or air temperature) is replaced by soil temperature. When the data sets from both stations were mixed, soil water content, the only variable clearly different between Hyytiälä and Värriö data sets, emerged as one of the most important variables, but its importance diminished when input variables labeling sites were added.
In addition, we analyze the dependencies of NEE on input variables against the existing theoretical understanding of NEE drivers. We show that even though the statistical scores of some models can be very good, the results should be treated with caution, especially when applied to upscaling. In the model setup with several interdependent variables ubiquitous in atmospheric measurements, some models display strong opposite dependencies on these variables. This behavior might have adverse consequences if models are applied to the data sets in future climate conditions. Our results highlight the importance of explainable artificial intelligence methods for interpreting outcomes from machine learning models, particularly when a set containing interdependent variables is used as a model input.
format Article
id doaj-art-84e1ed89d6aa464ca09a13ff3bf2a7da
institution Kabale University
issn 1726-4170
1726-4189
language English
publishDate 2025-01-01
publisher Copernicus Publications
record_format Article
series Biogeosciences
spelling doaj-art-84e1ed89d6aa464ca09a13ff3bf2a7da
2025-01-13T09:11:15Z
eng
Copernicus Publications
Biogeosciences
1726-4170
1726-4189
2025-01-01
Volume 22, pages 257–288
10.5194/bg-22-257-2025
Explainable machine learning for modeling of net ecosystem exchange in boreal forests
E. Ezhova (0), T. Laanti (1), A. Lintunen (2), P. Kolari (3), T. Nieminen (4), I. Mammarella (5), K. Heljanko (6, 7), M. Kulmala (8)
0 INAR Physics, University of Helsinki, Helsinki, Finland
1 Department of Computer Science, University of Helsinki, Helsinki, Finland
2 INAR Physics, University of Helsinki, Helsinki, Finland
3 INAR Physics, University of Helsinki, Helsinki, Finland
4 INAR Physics, University of Helsinki, Helsinki, Finland
5 INAR Physics, University of Helsinki, Helsinki, Finland
6 Department of Computer Science, University of Helsinki, Helsinki, Finland
7 Helsinki Institute for Information Technology (HIIT), Helsinki, Finland
8 INAR Physics, University of Helsinki, Helsinki, Finland
https://bg.copernicus.org/articles/22/257/2025/bg-22-257-2025.pdf
spellingShingle E. Ezhova
T. Laanti
A. Lintunen
P. Kolari
T. Nieminen
I. Mammarella
K. Heljanko
M. Kulmala
Explainable machine learning for modeling of net ecosystem exchange in boreal forests
Biogeosciences
title Explainable machine learning for modeling of net ecosystem exchange in boreal forests
title_full Explainable machine learning for modeling of net ecosystem exchange in boreal forests
title_fullStr Explainable machine learning for modeling of net ecosystem exchange in boreal forests
title_full_unstemmed Explainable machine learning for modeling of net ecosystem exchange in boreal forests
title_short Explainable machine learning for modeling of net ecosystem exchange in boreal forests
title_sort explainable machine learning for modeling of net ecosystem exchange in boreal forests
url https://bg.copernicus.org/articles/22/257/2025/bg-22-257-2025.pdf
work_keys_str_mv AT eezhova explainablemachinelearningformodelingofnetecosystemexchangeinborealforests
AT tlaanti explainablemachinelearningformodelingofnetecosystemexchangeinborealforests
AT alintunen explainablemachinelearningformodelingofnetecosystemexchangeinborealforests
AT pkolari explainablemachinelearningformodelingofnetecosystemexchangeinborealforests
AT tnieminen explainablemachinelearningformodelingofnetecosystemexchangeinborealforests
AT imammarella explainablemachinelearningformodelingofnetecosystemexchangeinborealforests
AT kheljanko explainablemachinelearningformodelingofnetecosystemexchangeinborealforests
AT mkulmala explainablemachinelearningformodelingofnetecosystemexchangeinborealforests