• A safety wear detection model for workers in cement aggregate production workshops based on ODEM–YOLO

• Abstract: To address the difficulties of multiscale small-target recognition, frequent missed and false detections, and limited real-time efficiency in detecting workers' safety wear from surveillance video of cement aggregate production workshops, this paper proposes ODEM–YOLO (omni-dimensional efficient attention and multiscale enhancement YOLO), a lightweight object detection model based on an improved YOLOv8. First, an omni-dimensional dynamic convolution (ODConv) module is introduced on top of YOLOv8 to strengthen shallow feature extraction and effectively capture the key features of small targets. Second, an improved efficient multiscale attention mechanism (iEMA) is combined with the Neck network to improve the feature representation of multiscale targets. In addition, a C2f multiscale edge information enhancement (C2f_MSEIE) module is proposed to explicitly enhance target edge information and improve the recognition accuracy of safety-equipment boundary features. For performance evaluation, a dataset containing 9877 multiscale small-target samples was constructed from surveillance footage of actual cement aggregate workshops. The experimental results show that, while remaining lightweight (6.9 MB), ODEM–YOLO achieves an overall detection accuracy (mAP@0.5) of 0.896, a small-target (mask) detection accuracy (AP@0.5mask) of 0.746, and a single-image inference time of 8.2 ms, outperforming mainstream models such as YOLOv5n and YOLOv10n. Moreover, when deployed on an NVIDIA Jetson Nano B01 embedded device, the model reaches 25 frames per second in real-time detection, fully meeting the real-time safety-monitoring requirements of industrial sites.

       

      Abstract: Safety compliance, particularly the correct usage of personal protective equipment (PPE), is critical in high-risk industrial settings such as cement aggregate production workshops, where traditional manual supervision is often insufficient owing to harsh conditions and operational dynamics. While artificial intelligence-driven video surveillance offers a promising solution, existing object detection models frequently struggle with accurately identifying small and multiscale targets, leading to high error rates and limited practical effectiveness. To address these limitations, this paper introduces ODEM–YOLO, a novel, lightweight yet highly accurate object detection model based on an enhanced YOLOv8 architecture, specifically engineered for robust safety wear detection. Methodologically, ODEM–YOLO incorporates several key innovations. First, the omni-dimensional dynamic convolution (ODConv) module is integrated into the early backbone stages. Unlike standard convolutions with fixed kernels, ODConv employs a multidimensional attention mechanism to dynamically learn kernel weights across spatial, input channel, output channel, and kernel number dimensions, enabling adaptive focus on salient features of small targets in complex scenes and enhancing shallow-level feature map discrimination. Second, the Neck network is optimized with an improved efficient multiscale attention (iEMA) mechanism, centered around an inverted residual mobile block core. This module strategically uses 1 × 1 pointwise convolutions for channel manipulation and 3 × 3 depth-wise separable convolutions for efficient spatial feature learning, allowing effective capture and fusion of multiscale contextual information with significantly reduced computational complexity to improve the representation of diverse PPE sizes. 
Third, a novel C2f multi-scale edge information enhancement (C2f_MSEIE) module replaces original C2f blocks, explicitly enhancing target edge information for clearer boundary definition. It comprises a local convolution branch for preserving fine-grained details and a multiscale edge modeling branch that utilizes AdaptiveAvgPool2d with multiple bin sizes and an innovative Edge Enhancer submodule to extract and reinforce high-frequency edge features, providing a more robust understanding of object contours for precise localization. The efficacy of ODEM–YOLO was rigorously validated on a custom dataset of 9877 images from actual cement aggregate workshops, featuring diverse small and multiscale targets under realistic and challenging conditions. The experimental results demonstrate ODEM–YOLO’s superior performance, achieving an overall mean average precision (mAP@0.5) of 0.896 and an AP@0.5mask (for the challenging small “mask” objects) of 0.746. Despite these significant accuracy gains, the model maintains a compact size of only 6.9 MB and achieves a rapid single-image processing time of 8.2 ms (utilizing 9.5 GFLOPs), outperforming other mainstream lightweight models such as YOLOv5n and YOLOv10n. Ablation studies systematically confirmed the individual and synergistic contributions of the ODConv, iEMA, and C2f_MSEIE modules to the overall performance improvement. Furthermore, practical deployment on an NVIDIA Jetson Nano B01 embedded device demonstrated ODEM–YOLO’s capability of real-time detection at 25 frames per second, fully satisfying the demanding requirements of industrial on-site safety monitoring. In conclusion, ODEM–YOLO presents a highly effective and efficient solution for real-time safety wear detection in challenging industrial environments. 
Its architectural innovations specifically target the difficulties of small and multiscale object detection, leading to substantial improvements in accuracy and reliability while preserving a lightweight structure crucial for edge deployment. ODEM–YOLO is a valuable and practical tool for enhancing occupational safety and potentially reducing accident rates.
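The efficiency claim behind the iEMA block's pairing of 1 × 1 pointwise and 3 × 3 depthwise convolutions can be illustrated with a simple parameter count. The function names and the 64-channel example below are illustrative assumptions, not taken from the paper:

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depthwise convolution followed by a
    1 x 1 pointwise convolution (the factorization iEMA relies on)."""
    return c_in * k * k + c_in * c_out

# Hypothetical example: a 64-channel 3 x 3 layer in a Neck block.
std = standard_conv_params(64, 64, 3)        # 64 * 64 * 9  = 36864
sep = depthwise_separable_params(64, 64, 3)  # 64 * 9 + 64 * 64 = 4672
print(std, sep, round(std / sep, 1))         # roughly a 7.9x reduction
```

The same factorization also cuts multiply-accumulate operations per output position by the same ratio, which is where the "significantly reduced computational complexity" in the abstract comes from.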
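The Edge Enhancer idea described above, isolating the high-frequency residual that local averaging removes and adding it back to sharpen object boundaries, can be sketched in NumPy. This unsharp-mask-style sketch is an assumption about the submodule's behavior, not the paper's implementation; the `box_blur` helper stands in for the AdaptiveAvgPool2d-based smoothing:

```python
import numpy as np

def box_blur(x: np.ndarray, k: int = 3) -> np.ndarray:
    """Local average with a k x k box filter (edge padding), a stand-in
    for the pooling-based smoothing used in the multiscale edge branch."""
    p = k // 2
    padded = np.pad(x, p, mode="edge")
    out = np.zeros_like(x, dtype=float)
    h, w = x.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def edge_enhance(x: np.ndarray, strength: float = 1.0) -> np.ndarray:
    """Hypothetical Edge Enhancer: the high-frequency residual
    x - blur(x) is added back, amplifying contrast at boundaries."""
    high_freq = x - box_blur(x)
    return x + strength * high_freq

# A vertical step edge: contrast across the edge grows after enhancement.
step = np.zeros((5, 5))
step[:, 3:] = 1.0
enhanced = edge_enhance(step)
```

In a real detection head the residual would be reweighted by learned convolutions rather than added with a fixed `strength`, but the contrast-amplifying effect on mask and helmet boundaries is the same in spirit.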

       
