Abstract:
Accurate detection of small objects in complex road environments is essential for ensuring the safety, reliability, and robustness of autonomous driving systems. Under adverse conditions such as low illumination and fog, the performance of conventional vision-based perception systems degrades significantly: images captured in such environments often exhibit reduced contrast, blurred textures, occluded details, and indistinct object boundaries, caused by insufficient lighting, light scattering by fog droplets, and atmospheric attenuation. These degradations increase the likelihood of missed and false detections, posing substantial risks in urban traffic scenarios where vulnerable road users, including pedestrians and cyclists, frequently appear. To address these challenges, this study proposes a visual-feature-guided small-object detection framework with systematic enhancements in three areas: training data construction, network architecture design, and adaptive sample allocation.

Firstly, to overcome the scarcity of low-light foggy training data, a depth-aware physical model of atmospheric scattering is developed on top of the clear-weather KITTI dataset. The model simulates light scattering and attenuation under low-light foggy conditions by incorporating scene depth, fog density, and illumination intensity. A low-illumination rendering strategy is introduced, and the realism of the generated images is evaluated with the AGGD metric, enabling the creation of diverse and realistic nighttime foggy images. This data augmentation substantially improves the model's generalization capability under extreme weather conditions.

Secondly, in network design, a Multi-Layer Channel Fusion Module (MLCFM) is introduced within the YOLOv11 framework.
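The fog-rendering step described above can be sketched with the standard atmospheric scattering formulation, I(x) = J(x)t(x) + A(1 - t(x)) with transmission t(x) = exp(-beta * d(x)). This is a minimal illustration of the general technique, not the paper's calibrated model; the `beta` and `airlight` values are assumptions chosen for demonstration.

```python
import numpy as np

def render_foggy_lowlight(image, depth, beta=0.08, airlight=0.35):
    """Render a clear image as low-light fog via the standard
    atmospheric scattering model:
        I(x) = J(x) * t(x) + A * (1 - t(x)),  t(x) = exp(-beta * d(x))
    image:    clear-weather RGB in [0, 1], shape (H, W, 3)
    depth:    per-pixel scene depth in meters, shape (H, W)
    beta:     fog density coefficient (illustrative value)
    airlight: atmospheric light A; a low value mimics night-time fog
    """
    t = np.exp(-beta * depth)[..., None]  # transmission map, (H, W, 1)
    return image * t + airlight * (1.0 - t)

# Distant pixels fade toward the dim airlight; nearby ones keep contrast.
img = np.full((2, 2, 3), 0.8)
d = np.array([[5.0, 50.0], [5.0, 50.0]])
out = render_foggy_lowlight(img, d)
```

Because transmission decays exponentially with depth, far-away small objects lose most of their signal, which is exactly the regime the augmented training data targets.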
By splitting, reorganizing, and adaptively weighting feature channels across different levels, MLCFM preserves low-level texture details while enhancing high-level semantic discrimination, both of which are essential for small-object detection. In addition, a semantics-importance-driven dynamic multi-scale fusion structure adjusts fusion weights according to the semantic contribution of features at each scale. This mechanism strengthens the detection of small objects, such as pedestrians and cyclists, while maintaining global contextual information for larger objects, such as vehicles, thereby improving sensitivity to small objects without compromising overall scene understanding.

Finally, to address the difficulty of distinguishing targets from complex backgrounds and the imbalance between positive and negative samples in foggy scenes, an Adaptive Training Sample Selection (ATSS) strategy is introduced. ATSS dynamically determines positive and negative sample assignments from the spatial distribution and statistical characteristics of candidate bounding boxes, improving the model's attention to hard samples and reducing training instability under challenging conditions.

Extensive experiments, including joint testing and ablation studies on a self-constructed low-light foggy dataset and the original KITTI dataset, demonstrate the effectiveness of the proposed approach. Detection accuracy for the Car, Cyclist, and Pedestrian categories improves by 2.2, 11.8, and 7.8 percentage points, respectively, with an overall mean average precision (mAP@0.5) gain of 7.3 percentage points. Visualization results further show that the enhanced network produces clearer and more precise bounding boxes, substantially reducing missed and false detections.

In summary, this study presents a systematic small-object detection framework that introduces innovations in training data generation, feature-aware network design, and adaptive sample allocation.
The proposed method effectively improves small-object detection performance under low-light foggy conditions, providing critical support for the safety and reliability of autonomous driving perception systems in complex environments.
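To make the sample-assignment idea concrete, the core ATSS rule can be sketched for a single feature level: take the k anchors whose centers lie closest to the ground-truth center, then set an adaptive IoU threshold equal to the mean plus standard deviation of their IoUs. This is a simplified single-level illustration of the general ATSS scheme, not the paper's full multi-level implementation; the function and variable names are ours.

```python
import numpy as np

def iou(boxes, gt):
    """IoU between each box in (N, 4) and one GT box, format (x1, y1, x2, y2)."""
    x1 = np.maximum(boxes[:, 0], gt[0]); y1 = np.maximum(boxes[:, 1], gt[1])
    x2 = np.minimum(boxes[:, 2], gt[2]); y2 = np.minimum(boxes[:, 3], gt[3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    a = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    b = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (a + b - inter)

def atss_assign(gt_box, anchors, k=9):
    """Simplified single-level ATSS: select the k anchors whose centers are
    nearest the GT center, then keep as positives those whose IoU exceeds
    the adaptive threshold mean(IoU) + std(IoU) over the candidates."""
    gt_c = np.array([(gt_box[0] + gt_box[2]) / 2, (gt_box[1] + gt_box[3]) / 2])
    a_c = np.stack([(anchors[:, 0] + anchors[:, 2]) / 2,
                    (anchors[:, 1] + anchors[:, 3]) / 2], axis=1)
    dist = np.linalg.norm(a_c - gt_c, axis=1)
    cand = np.argsort(dist)[:k]               # nearest-k candidate anchors
    ious = iou(anchors[cand], gt_box)
    thr = ious.mean() + ious.std()            # adaptive per-object threshold
    return cand[ious >= thr]                  # indices of positive anchors

# Only the well-overlapping anchor survives the adaptive threshold.
gt = (0.0, 0.0, 10.0, 10.0)
anchors = np.array([[0, 0, 10, 10], [20, 20, 30, 30], [5, 5, 15, 15]], dtype=float)
positives = atss_assign(gt, anchors, k=3)
```

Because the threshold adapts per object, small objects whose candidate IoUs are uniformly low can still receive positive samples, which is consistent with the improved Pedestrian and Cyclist results reported above.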