Retail commodity detection method based on location learnable visual center mechanism

Bibliographic Details
Main Authors: Xiaohua LYU, Mingchen WEI, Libo LIU
Format: Article
Language: Chinese (zho)
Published: China InfoCom Media Group 2023-12-01
Series: 物联网学报 (Chinese Journal on Internet of Things)
Online Access:http://www.wlwxb.com.cn/zh/article/doi/10.11959/j.issn.2096-3750.2023.00366/
Description
Summary: Retail products with deformed or overlapping packaging are detected with low accuracy because salient and diverse feature information is difficult to capture effectively. To address this problem, a location learnable visual center (LLVC) mechanism was designed to improve YOLOX-s and achieve higher detection accuracy. First, global context information was captured through a lightweight multi-layer perceptron to help the model better understand the spatial information in product features. Second, the designed LLVC enhanced the local feature representation and used the spatial information to assign learnable weights to local features, increasing the attention paid to discriminative local features. Finally, the intersection over union (IoU) loss function was replaced with complete intersection over union (CIoU), and power parameters were introduced on this basis to reduce the missed detection rate. Experimental results show that the proposed method achieves an accuracy of 91.3% on the retail product checkout (RPC) dataset, which is 2.2% higher than YOLOX-s and better than current mainstream lightweight object detection algorithms. The detection speed is 97 frames per second (FPS) and the model size is 9.48 MB, so the method can detect retail products accurately and in real time in scenarios where computing resources are limited.
ISSN:2096-3750
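
Illustrative note: the summary states that the IoU loss was replaced with CIoU and that power parameters were introduced, but this record does not give the exact formulation. The sketch below is one possible reading, assuming the power parameter is applied to each CIoU term in the manner of the Alpha-IoU family of losses. The function name `power_ciou_loss`, the default `alpha=3.0`, and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration, not details taken from the article.

```python
import math


def power_ciou_loss(box_p, box_g, alpha=3.0, eps=1e-9):
    """Sketch of a CIoU loss with a power parameter (assumed Alpha-IoU style).

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (positive width and height).
    With alpha = 1.0 this reduces to the ordinary CIoU loss.
    """
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g

    # Plain IoU from intersection and union areas.
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    area_p = (px2 - px1) * (py2 - py1)
    area_g = (gx2 - gx1) * (gy2 - gy1)
    iou = inter / (area_p + area_g - inter + eps)

    # Squared distance between box centers, normalized by the squared
    # diagonal of the smallest box enclosing both.
    rho2 = ((px1 + px2) - (gx1 + gx2)) ** 2 / 4 + ((py1 + py2) - (gy1 + gy2)) ** 2 / 4
    enc_w = max(px2, gx2) - min(px1, gx1)
    enc_h = max(py2, gy2) - min(py1, gy1)
    c2 = enc_w ** 2 + enc_h ** 2 + eps

    # Aspect-ratio consistency term v and its trade-off weight beta.
    v = (4 / math.pi ** 2) * (
        math.atan((gx2 - gx1) / (gy2 - gy1)) - math.atan((px2 - px1) / (py2 - py1))
    ) ** 2
    beta = v / (1 - iou + v + eps)

    # Assumed placement of the power parameter: each term is raised to alpha.
    return 1 - iou ** alpha + (rho2 / c2) ** alpha + (beta * v) ** alpha
```

Under this reading, a perfectly matching box gives a loss close to zero, while two disjoint boxes give a loss above 1 because the center-distance penalty is added to the IoU term. Whether the article applies the power to every term or only to some of them cannot be determined from the summary alone.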