A Shooting Distance Adaptive Crop Yield Estimation Method Based on Multi-Modal Fusion
| Main Authors: | , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-04-01 |
| Series: | Agronomy |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2073-4395/15/5/1036 |
| Summary: | To address the low estimation accuracy of deep learning-based crop yield image recognition methods under untrained shooting distances, this study proposes a shooting distance adaptive crop yield estimation method by fusing RGB and depth image information through multi-modal data fusion. Taking strawberry fruit fresh weight as an example, RGB and depth image data of 348 strawberries were collected at nine heights ranging from 70 to 115 cm. First, based on RGB images and shooting height information, a single-modal crop yield estimation model was developed by training a convolutional neural network (CNN) after cropping strawberry fruit images using the relative area conversion method. Second, the height information was expanded into a data matrix matching the RGB image dimensions, and multi-modal fusion models were investigated through input-layer and output-layer fusion strategies. Finally, two additional approaches were explored: direct fusion of RGB and depth images, and extraction of average shooting height from depth images for estimation. The models were tested at two untrained heights (80 cm and 100 cm). Results showed that when using only RGB images and height information, the relative area conversion method achieved the highest accuracy, with R² values of 0.9212 and 0.9304, normalized root mean square error (NRMSE) of 0.0866 and 0.0814, and mean absolute percentage error (MAPE) of 0.0696 and 0.0660 at the two untrained heights. By further incorporating depth data, the highest accuracy was achieved through input-layer fusion of RGB images with extracted average height from depth images, improving R² to 0.9475 and 0.9384, reducing NRMSE to 0.0707 and 0.0766, and lowering MAPE to 0.0591 and 0.0610. Validation using a developed shooting distance adaptive crop yield estimation platform at two random heights yielded MAPE values of 0.0813 and 0.0593. This model enables adaptive crop yield estimation across varying shooting distances, significantly enhancing accuracy under untrained conditions. |
| ISSN: | 2073-4395 |
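
The summary above describes an input-layer fusion strategy in which the scalar shooting height is expanded into a data matrix matching the RGB image dimensions before being fed to the CNN. The sketch below illustrates how such a step might look; the function name, the 4-channel stacking, the 224×224 image size, and the normalisation to the 70-115 cm training range are illustrative assumptions, not the authors' published implementation.

```python
import numpy as np

def fuse_height_with_rgb(rgb_image: np.ndarray, shooting_height_cm: float) -> np.ndarray:
    """Input-layer fusion sketch (assumed details): broadcast a scalar shooting
    height into a matrix with the same spatial size as the RGB image and stack
    it as a fourth channel, mirroring the 'height expanded into a data matrix
    matching the RGB image dimensions' step described in the abstract."""
    h, w, _ = rgb_image.shape
    # Normalise the height to the 70-115 cm acquisition range so the extra
    # channel is on a scale comparable to 0-1 normalised RGB values (assumption).
    height_norm = (shooting_height_cm - 70.0) / (115.0 - 70.0)
    height_channel = np.full((h, w, 1), height_norm, dtype=rgb_image.dtype)
    # Concatenate along the channel axis: result has shape (h, w, 4).
    return np.concatenate([rgb_image, height_channel], axis=-1)

# Example: a dummy 224x224 RGB image captured at an untrained height of 80 cm.
rgb = np.random.rand(224, 224, 3).astype(np.float32)
fused = fuse_height_with_rgb(rgb, shooting_height_cm=80.0)
print(fused.shape)  # (224, 224, 4) -- suitable for a 4-channel CNN input layer
```

In this reading, the four-channel tensor would feed the first convolutional layer of the yield-estimation CNN, whereas the output-layer fusion variant mentioned in the summary would process the height information in a separate branch and merge the resulting features before the final regression.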