Comparative Evaluation of Ensemble Machine Learning Models for Methane Production from Anaerobic Digestion
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-03-01 |
| Series: | Fermentation |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2311-5637/11/3/130 |
| Summary: | This study provides a comparative evaluation of several ensemble model constructions for predicting specific methane yield (SMY) from anaerobic digestion. To the authors' knowledge, the prediction accuracy of such ensembles and their use in anaerobic digestion modeling, relative to individual machine learning methods, have not been fully established. Three input datasets, compiled from anaerobic digestion samples of agricultural and forestry lignocellulosic residues in previous studies, were used. Six individual machine learning methods and five ensemble constructions were evaluated per dataset, and their prediction accuracy was assessed using 10-fold cross-validation repeated 100 times. Ensemble models outperformed individual methods in one of the three datasets in terms of prediction accuracy. They also produced notably lower coefficients of variation in root-mean-square error (RMSE) than the most accurate individual methods (0.031 to 0.393 for dataset A, 0.026 to 0.272 for dataset B, and 0.021 to 0.217 for dataset AB), making them much less sensitive to randomness in the training and test data split. The optimal ensemble constructions generally benefited from including a larger number of individual methods and from the diversity of their prediction principles. Because prediction accuracy reported from a final model fit or a single split-sample approach is highly sensitive to randomness, repeated cross-validation is proposed as a standard for future studies. |
| ISSN: | 2311-5637 |
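
The evaluation scheme named in the summary (10-fold cross-validation in 100 repetitions, with the coefficient of variation of RMSE as a stability measure) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the data are synthetic, the two "individual methods" are a mean predictor and a one-variable least-squares line, the "ensemble" is a plain average of the two, and RMSE is pooled over all held-out predictions within each repetition. The article's actual methods, ensemble constructions, and aggregation may differ.

```python
import math
import random
import statistics


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_mean(xs, ys):
    """Hypothetical individual method 1: always predict the training mean."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Hypothetical individual method 2: one-variable least-squares line."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def fit_ensemble(xs, ys):
    """Hypothetical ensemble: unweighted average of the two methods above."""
    members = [fit_mean(xs, ys), fit_line(xs, ys)]
    return lambda x: statistics.fmean([m(x) for m in members])


def repeated_kfold_rmse(fit, xs, ys, k=10, repetitions=100, seed=0):
    """Return one pooled out-of-fold RMSE per repetition of k-fold CV."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repetitions):
        idx = list(range(len(xs)))
        rng.shuffle(idx)                        # new random split each repetition
        folds = [idx[i::k] for i in range(k)]   # k disjoint folds covering all samples
        y_true, y_pred = [], []
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            y_true += [ys[i] for i in fold]
            y_pred += [model(xs[i]) for i in fold]
        scores.append(rmse(y_true, y_pred))
    return scores


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.fmean(values)


# Synthetic data, for illustration only (not the study's datasets).
data_rng = random.Random(42)
xs = [data_rng.uniform(0, 10) for _ in range(60)]
ys = [200 + 15 * x + data_rng.gauss(0, 20) for x in xs]

results = {}
for name, fit in [("mean", fit_mean), ("line", fit_line), ("ensemble", fit_ensemble)]:
    scores = repeated_kfold_rmse(fit, xs, ys)
    results[name] = scores
    print(name, round(statistics.fmean(scores), 2), round(coefficient_of_variation(scores), 4))
```

Reporting both the mean RMSE and its coefficient of variation across repetitions shows how much a model's apparent accuracy depends on the particular train/test split, which is the concern the summary raises about single split-sample reporting.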