Semi-Open Set Object Detection Algorithm Leveraged by Multi-Modal Large Language Models

Bibliographic Details
Main Authors: Kewei Wu, Yiran Wang, Xiaogang He, Jinyu Yan, Yang Guo, Zhuqing Jiang, Xing Zhang, Wei Wang, Yongping Xiong, Aidong Men, Li Xiao
Format: Article
Language: English
Published: MDPI AG 2024-11-01
Series: Big Data and Cognitive Computing
Online Access: https://www.mdpi.com/2504-2289/8/12/175
Description
Summary: Closed-set object detection models, represented by YOLO, are widely deployed in industry. However, such models offer limited ability to tune for easily confused objects in complex detection scenarios. Open-set object detection models such as GroundingDINO expand the detection range to some extent, but their detection accuracy still lags behind that of closed-set models and cannot meet the high-precision requirements of practical applications. Existing detection technologies also lack interpretability: they do not clearly show users the basis and reasoning behind a detection result, which undermines users' trust in the results and their willingness to apply them. To address these deficiencies, we propose a new object detection algorithm based on multi-modal large language models that significantly improves the performance of closed-set object detection models on more difficult boundary tasks while preserving detection accuracy, thereby achieving semi-open set object detection. In validation on seven common traffic and safety production scenarios, the algorithm shows significant improvements in accuracy and interpretability.
ISSN: 2504-2289