Explainable machine learning for modeling of net ecosystem exchange in boreal forests

Bibliographic Details
Main Authors: E. Ezhova, T. Laanti, A. Lintunen, P. Kolari, T. Nieminen, I. Mammarella, K. Heljanko, M. Kulmala
Format: Article
Language: English
Published: Copernicus Publications 2025-01-01
Series: Biogeosciences
Online Access: https://bg.copernicus.org/articles/22/257/2025/bg-22-257-2025.pdf
Description
Summary: There is a growing interest in applying machine learning methods to predict net ecosystem exchange (NEE) based on site information and climatic variables. We apply four machine learning models (cubist, random forest, averaged neural networks, and linear regression) to predict the NEE of boreal forest ecosystems based on climatic and site variables. We use data sets from two stations in the Finnish boreal forest (southern site Hyytiälä and northern site Värriö) and model NEE during the peak growing season and the whole year. For Hyytiälä, all nonlinear models demonstrated similar results with R² = 0.88 for the peak growing season and R² = 0.90 for the whole year. For Värriö, nonlinear models gave R² = 0.73–0.76 for the peak growing season, whereas random forest and cubist with R² = 0.74 were somewhat better than averaged neural networks with R² = 0.70 for the whole year. Using explainable artificial intelligence methods, we show that the most important input variables during the peak season are photosynthetically active radiation, diffuse radiation, and vapor pressure deficit (or air temperature), whereas, on the whole-year scale, vapor pressure deficit (or air temperature) is replaced by soil temperature. When the data sets from both stations were mixed, soil water content, the only variable clearly different between Hyytiälä and Värriö data sets, emerged as one of the most important variables, but its importance diminished when input variables labeling sites were added.
In addition, we analyze the dependencies of NEE on input variables against the existing theoretical understanding of NEE drivers. We show that even though the statistical scores of some models can be very good, the results should be treated with caution, especially when applied to upscaling. In the model setup with several interdependent variables ubiquitous in atmospheric measurements, some models display strong opposite dependencies on these variables. This behavior might have adverse consequences if models are applied to the data sets in future climate conditions. Our results highlight the importance of explainable artificial intelligence methods for interpreting outcomes from machine learning models, particularly when a set containing interdependent variables is used as a model input.
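The caution about interdependent inputs can be illustrated with a small synthetic sketch. It is not the authors' data, models, or method: `x1` and `x2` are hypothetical stand-ins for two interdependent measurements, and the coefficients are invented. Two models that agree exactly while the inputs stay tied together can diverge once that relationship shifts, which is one way opposite dependencies might cause trouble under changed conditions.

```python
# Illustrative sketch only: synthetic numbers, hypothetical variables,
# not the study's data sets or fitted models.

def model_a(x1, x2):
    """Depends on x1 only."""
    return 3.0 * x1

def model_b(x1, x2):
    """Opposite dependencies: positive on x1, negative on x2."""
    return 5.0 * x1 - 1.0 * x2

x1_values = (1.0, 2.0, 3.0, 4.0)

# "Training" conditions: the inputs are interdependent (x2 = 2 * x1).
train = [(x1, 2.0 * x1) for x1 in x1_values]

# Shifted conditions (assumed): the relationship gains an offset of 3.
shifted = [(x1, 2.0 * x1 + 3.0) for x1 in x1_values]

def max_disagreement(points):
    """Largest absolute difference between the two models' predictions."""
    return max(abs(model_a(x1, x2) - model_b(x1, x2)) for x1, x2 in points)

train_gap = max_disagreement(train)      # models are indistinguishable here
shifted_gap = max_disagreement(shifted)  # models now disagree

print("disagreement under training conditions:", train_gap)
print("disagreement under shifted conditions:", shifted_gap)
```

Under the training relationship the two models cannot be told apart by any goodness-of-fit score, so inspecting how each model depends on its inputs is the only way to see that they would extrapolate differently.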
ISSN: 1726-4170, 1726-4189