YOLO-DAFS: A Composite-Enhanced Underwater Object Detection Algorithm

Bibliographic Details
Main Authors: Shengfu Luo, Chao Dong, Guixin Dong, Rongmin Chen, Bing Zheng, Ming Xiang, Peng Zhang, Zhanwei Li
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Journal of Marine Science and Engineering
Online Access: https://www.mdpi.com/2077-1312/13/5/947
Description
Summary: In computer vision applications, the primary task of object detection is to answer the question "What object is present, and where is it located?" Underwater environments, however, introduce challenges such as poor lighting, high scene complexity, and the diverse shapes of marine organisms, which lead deep learning-based detectors to miss objects or produce false positives. To improve detection accuracy and robustness, this paper proposes an enhanced YOLOv11-based algorithm for underwater object detection that strengthens the capture of both local detail and global contextual information in complex underwater environments through several architectural changes. In the backbone, a DualBottleneck module replaces the standard bottleneck structure in C3k, improving feature extraction and channel aggregation. The detection head adopts DyHead-GDC, which integrates ghost depthwise separable convolution into DyHead for greater efficiency. The ADown module replaces the conventional feature-extraction and downsampling convolutions, reducing parameters and FLOPs by 14%. The C2PSF module, which combines focal modulation with C2, strengthens local feature extraction and global context processing. Finally, an SCSA module inserted before the detection head exploits multi-semantic information to improve detection performance in complex underwater scenes. Experimental results confirm the effectiveness of these improvements: the model achieves 84.2% mAP50 on UTDAC2020, 84.4% on DUO, and 86.7% on RUOD, surpassing the baseline by 2.5%, 1.6%, and 1.2%, respectively, while remaining lightweight at 6.5 M parameters and a computational cost of 7.1 GFLOPs.
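The abstract credits DyHead-GDC's efficiency to ghost depthwise separable convolution but does not spell out its internal wiring. As a rough illustration of the underlying idea, in the spirit of GhostNet (half the output channels come from a primary convolution, the other half are "ghost" features derived from them by a cheap depthwise convolution), here is a minimal PyTorch sketch; the class name, channel split, and activation choice are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class GhostDWConv(nn.Module):
    """Sketch of a ghost depthwise separable convolution (assumed design).

    A pointwise "primary" convolution produces half of the output
    channels; a depthwise convolution then derives the remaining
    "ghost" channels from those primary features at low cost.
    """

    def __init__(self, in_ch: int, out_ch: int, dw_kernel: int = 3):
        super().__init__()
        primary_ch = out_ch // 2  # assumes an even output channel count
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, primary_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(primary_ch),
            nn.SiLU(inplace=True),
        )
        # Depthwise conv as the cheap operation that creates ghost features.
        self.ghost = nn.Sequential(
            nn.Conv2d(primary_ch, primary_ch, kernel_size=dw_kernel,
                      padding=dw_kernel // 2, groups=primary_ch, bias=False),
            nn.BatchNorm2d(primary_ch),
            nn.SiLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.primary(x)
        return torch.cat([y, self.ghost(y)], dim=1)

# Example: a 256-channel feature map keeps its shape but is produced
# with far fewer multiply-adds than a dense 3x3 convolution.
out = GhostDWConv(256, 256)(torch.randn(1, 256, 40, 40))
assert out.shape == (1, 256, 40, 40)
```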
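The ADown module mentioned in the abstract originates in YOLOv9, where the feature map is lightly smoothed, split in half along the channel dimension, and downsampled by two complementary branches (a strided 3x3 convolution, and a max pool followed by a 1x1 convolution) whose outputs are concatenated. A minimal sketch along those lines follows, with Conv-BatchNorm-SiLU blocks standing in for the framework's convolution wrapper; the paper may configure it differently.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_act(c_in: int, c_out: int, k: int, s: int = 1, p: int = 0) -> nn.Sequential:
    """Conv -> BatchNorm -> SiLU, a common YOLO-style building block."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, stride=s, padding=p, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(inplace=True),
    )

class ADown(nn.Module):
    """Two-branch downsampling in the style of YOLOv9's ADown.

    Expects even channel counts: the input is split in half, each half
    is mapped to c2 // 2 channels, and the halves are re-concatenated.
    """

    def __init__(self, c1: int, c2: int):
        super().__init__()
        c = c2 // 2
        self.cv1 = conv_bn_act(c1 // 2, c, 3, s=2, p=1)  # strided-conv branch
        self.cv2 = conv_bn_act(c1 // 2, c, 1)            # pool + 1x1 branch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.avg_pool2d(x, 2, stride=1)   # light smoothing before the split
        x1, x2 = x.chunk(2, dim=1)
        x1 = self.cv1(x1)
        x2 = self.cv2(F.max_pool2d(x2, 3, stride=2, padding=1))
        return torch.cat([x1, x2], dim=1)

# Example: halve the spatial resolution while doubling the channels.
out = ADown(128, 256)(torch.randn(1, 128, 80, 80))
assert out.shape == (1, 256, 40, 40)
```

Replacing plain strided convolutions with a split design like this is what makes the 14% parameter and FLOP reduction quoted in the abstract plausible: each branch operates on only half the channels.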
ISSN:2077-1312