A multimodal framework for pepper diseases and pests detection

Bibliographic Details
Main Authors:	Jun Liu, Xuewei Wang
Format:	Article
Language:	English
Published:	Nature Portfolio 2024-11-01
Series:	Scientific Reports
Subjects:	Object detection; Pepper diseases and pests image; Natural Language; Multimodal; Visual features
Online Access:	https://doi.org/10.1038/s41598-024-80675-w

Abstract:	Pepper diseases and pests typically exhibit small target proportions, diverse shapes and sizes, complex imaging backgrounds, and similarities with the background. Existing detection methods perform poorly in identifying targets of different sizes and shapes within the same scene, and they lack adequate noise suppression capabilities. To address the practical needs of detecting pepper diseases and pests in complex scenarios, we have constructed the first multimodal pepper diseases and pests object detection dataset (PDD). This dataset includes a wide variety of diseases and pests images, along with detailed natural language descriptions of their attributes. Locating the described targets in complex scenes with similar disease symptoms and leaf occlusion presents a significant challenge. To tackle this issue, we propose the PepperNet model for object detection in pepper diseases and pests images using natural language descriptions. This model decomposes complex multimodal features of language and images into explicit attribute features and employs fine-grained multimodal attribute contrast learning strategies. This approach effectively distinguishes subtle local differences between similar objects, achieving fine-grained mapping from language to vision in complex scenarios. Our detection results show a mAP@0.5 of 91.93% and a detection speed of 121.8 frames per second. Visualizations indicate that the model maintains high robustness under varying noise levels and occlusion conditions, demonstrating superior performance and stability across diverse complex scenarios.
Affiliation:	Shandong Provincial University Laboratory for Protected Horticulture, Weifang University of Science and Technology (both authors)
ISSN:	2045-2322
Citation:	Scientific Reports, vol. 14, no. 1, pp. 1-20 (2024-11-01)
Collection:	DOAJ
Institution:	Kabale University
Record ID:	doaj-art-a49c63fabdae45159e694c8e923db62d

Similar Items