Introduction to deep learning methods for multi‐species predictions

Abstract Predicting species distributions and entire communities is crucial for ecologists, to enhance our understanding of the drivers behind species distributions and community assembly and to provide quantitative data for conservation efforts. Popular species distribution models use statistical a...

Bibliographic Details
Main Authors: Yuqing Hu, Sara Si‐Moussi, Wilfried Thuiller
Format: Article
Language: English
Published: Wiley 2025-01-01
Series: Methods in Ecology and Evolution
Subjects:
Online Access: https://doi.org/10.1111/2041-210X.14466
collection DOAJ
description Abstract: Predicting species distributions and entire communities is crucial for ecologists, both to enhance our understanding of the drivers behind species distributions and community assembly and to provide quantitative data for conservation efforts. Popular species distribution models use statistical and machine learning methods but face limitations with multi‐species predictions at the community level, where they scale poorly and are sensitive to data imbalance. This paper explores the potential of deep learning methods to overcome these challenges and provide more accurate multi‐species predictions. Specifically, we introduced four distinct deep learning models that use site × species community data but differ in their internal structure or in the structure of the input environmental data: (1) a multi‐layer perceptron (MLP) for tabular data (e.g. in‐situ/raster climate or soil data), (2) a convolutional neural network (CNN) and (3) a vision transformer (ViT), both tailored for image data (e.g. aerial ortho‐photographs, satellite imagery), and (4) a multimodal model that integrates both tabular and image data. We also show how adapted loss functions can address imbalance issues. We applied these deep learning models to a plant community dataset comprising 130,582 vegetation surveys encompassing 2522 species located in the French Alps. The tabular environmental data consisted of climate, terrain and soil information, while the images were derived from aerial photographs. All models achieved a true skill statistic of approximately 70% on hold‐out data, demonstrating high predictive capacity for community data, with the multimodal model performing best. Additionally, we showcased how interpretability tools can illuminate community structure as seen by deep learning models. Deep learning models offer a broad array of features for predicting entire species communities. They handle imbalance issues and accommodate various data types, from tabular datasets to images, while also being equipped with insightful interpretation tools. This versatility extends to both tabular datasets and images, with no clear superiority of one over the other. The last hidden layers can provide valuable features for modelling other species, and the trained models can be used to support transfer learning to related tasks. The field of ecology now possesses an additional, potent tool in its arsenal that can foster basic and fundamental research.
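
The abstract mentions two quantities that are easy to illustrate: the true skill statistic (TSS) used for evaluation, and a loss adapted to class imbalance. The sketch below is a minimal, standard-library Python illustration and is not taken from the paper. TSS is computed with its usual definition (sensitivity + specificity - 1) for a single species. The abstract does not say which adapted loss the authors used, so the prevalence-weighted binary cross-entropy shown here is an assumption chosen as one common option, and the function names are hypothetical.

```python
import math
from typing import Sequence


def true_skill_statistic(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """TSS = sensitivity + specificity - 1 for one species (0/1 labels)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        # Undefined when a species is never present or never absent.
        return float("nan")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


def prevalence_weight(y_true: Sequence[int]) -> float:
    """Hypothetical helper: ratio of absences to presences for one species."""
    presences = sum(y_true)
    absences = len(y_true) - presences
    return absences / presences if presences > 0 else 1.0


def weighted_bce(y_true: Sequence[int], prob: Sequence[float],
                 pos_weight: float, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy with presences up-weighted by pos_weight.

    Assumption: this is one common imbalance-aware loss, not necessarily
    the loss used in the paper.
    """
    total = 0.0
    for t, p in zip(y_true, prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= pos_weight * t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)


if __name__ == "__main__":
    observed = [1, 1, 0, 0, 0, 0]
    predicted = [1, 0, 0, 0, 0, 1]
    print(true_skill_statistic(observed, predicted))
    print(weighted_bce(observed, [0.9, 0.4, 0.1, 0.2, 0.1, 0.6],
                       pos_weight=prevalence_weight(observed)))
```

For a community dataset, both functions would be applied per species (one column of the site × species matrix at a time) and the results averaged across species. Rare species receive a larger `pos_weight`, so missed presences cost more than false alarms.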
institution Kabale University
issn 2041-210X
spelling Yuqing Hu, Sara Si‐Moussi, Wilfried Thuiller (all: Université Grenoble Alpes, Université Savoie Mont Blanc, CNRS, LECA, Grenoble, France). Introduction to deep learning methods for multi‐species predictions. Methods in Ecology and Evolution 16(1): 228-246. Wiley, 2025-01-01. https://doi.org/10.1111/2041-210X.14466
topic co‐occurrence
deep neural networks
explainable AI
species community
species distribution models