  • Robust target detection method against weather interference based on multisource sensor fusion

    • Abstract: Under complex weather conditions, the target detection performance of a single sensor is easily degraded and cannot meet the robustness requirements of applications such as autonomous driving and intelligent transportation. To address this problem, this paper proposes SeparateFusion, a target detection method based on the fusion of four-dimensional (4D) millimeter-wave radar and LiDAR data. The method exploits the complementary perception capabilities of the two sensor types and fuses multi-source information efficiently through a neural network model. First, a three-dimensional early-fusion module, the GSE encoder, is designed: the two point clouds are mapped onto the same pillar view, the geometric and semantic information of the LiDAR and radar point clouds is enhanced separately, and pillar features are then extracted, achieving low-level fusion of the multimodal data. Second, a two-dimensional feature extraction and enhancement module, BMM, is proposed: in the bird's-eye-view (BEV) representation, a MambaVisionMixer structure is introduced to strengthen spatial feature modeling, and a gating mechanism adaptively filters redundant information to improve the effectiveness of the feature representation. Experiments on the public multimodal View-of-Delft (VoD) dataset show that the method outperforms a variety of existing detectors in both accuracy and stability under normal weather as well as adverse conditions such as rain and fog, effectively mitigating the impact of weather interference on detection performance. The results validate the effectiveness of the SeparateFusion network, which combines the GSE encoder and the BMM module, for multi-source sensor fusion and interference-resistant target detection, offering a feasible solution for all-weather intelligent perception.

       

      Abstract: Robust object detection under adverse weather conditions remains a pressing challenge for autonomous driving and intelligent transportation, because single-sensor systems are prone to performance degradation in rain, fog, or snow. To address this issue, we propose SeparateFusion, a novel multisensor fusion framework that integrates four-dimensional (4D) millimeter-wave radar and LiDAR data via a deep neural network. By leveraging the resilience of radar to weather interference and the high spatial resolution of LiDAR, SeparateFusion delivers accurate, stable perception across diverse environments. The core idea is to treat geometry and semantics as complementary signals that should be modeled along separate but interacting paths to preserve the strengths of each modality, while noise and misalignment are mitigated early in the pipeline. The architecture comprises two key modules: the geometry–semantic enhancement (GSE) encoder for early three-dimensional (3D) fusion, and the bird's-eye-view (BEV) feature enhancement module (BMM) for two-dimensional feature refinement. In practice, BMM includes a lightweight multiscale gating unit that operates alongside a Mamba-based mixer to refine BEV features.

      In the first stage, LiDAR and radar point clouds are independently projected into a shared pillar grid, ensuring spatial alignment. The GSE encoder enhances the geometric and semantic information of each modality separately. Geometric features capture structural layouts from point coordinates, while semantic features encode attributes such as intensity, Doppler velocity, and reflectivity. The encoder applies neighborhood-aware updates that preserve spatial continuity in the geometric stream while allowing semantic cues to guide cross-modal correspondence. Restricting cross-modal interaction primarily to the semantic subspace mitigates discretization and registration errors that might otherwise propagate through deeper layers, while the geometric stream preserves the neighborhood structure needed for stable aggregation. Following this enhancement, pillar-level features are extracted, enabling early-stage multimodal fusion that aligns the modalities while preserving their individual advantages.

      In the second stage, the fused features are transformed into a BEV representation. The BMM module processes this representation with the MambaVisionMixer structure to capture both local and long-range dependencies in the spatial domain. In addition, a gating mechanism suppresses redundant or noisy signals, allowing the network to focus on discriminative information for detection. This two-stage design balances fine-grained geometry–semantic modeling in 3D space against high-level spatial reasoning in BEV space, contributing to strong robustness against weather-related degradation.

      Extensive experiments on the View-of-Delft (VoD) dataset show that our method consistently outperforms both state-of-the-art single-sensor detectors and existing multisensor fusion approaches. It achieves a mean average precision of 71.47% across the entire test area and 85.74% within the driving corridor, demonstrating notable gains in both global and lane-focused detection scenarios. Category-wise analysis further indicates consistent improvements for vehicles and vulnerable road users, with clearer benefits at longer ranges, where LiDAR sparsification and reflectivity decay are more severe. We follow the standard VoD protocol for training and evaluation and provide implementation details to facilitate reproducibility on the same splits and metrics. Additional evaluations on fog and snow simulation datasets confirm that SeparateFusion maintains clear advantages over previous methods in low-visibility conditions, indicating strong generalization capability. Ablation studies further validate the contributions of the GSE encoder and the BMM module: removing either component results in a significant drop in detection accuracy, highlighting the complementary nature of early 3D geometry–semantic enhancement and later-stage BEV feature gating.

      In summary, SeparateFusion introduces a structured two-stage approach to fusing radar and LiDAR data, combining early 3D geometry–semantic enhancement with later-stage BEV refinement under adaptive gating. The method achieves significant improvements over strong single-sensor and fusion-based object detectors under challenging weather, laying a promising foundation for next-generation all-weather intelligent perception in safety-critical applications.
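      To make the first-stage projection concrete, here is a minimal sketch of mapping both point clouds onto one shared pillar grid while keeping geometric and semantic attributes in separate streams, as the GSE encoder requires. It assumes a PointPillars-style discretization; the function name, grid parameters, and feature layout are illustrative, not the paper's actual implementation.

      import torch

      def pillarize(points, pc_range, pillar_size):
          # Map points of shape (N, 3 + C) to flat pillar indices on a shared
          # BEV grid. Columns 0..2 (x, y, z) form the geometric stream; the
          # remaining columns are modality-specific semantic attributes.
          ix = ((points[:, 0] - pc_range[0]) / pillar_size).long()
          iy = ((points[:, 1] - pc_range[1]) / pillar_size).long()
          nx = int((pc_range[3] - pc_range[0]) / pillar_size)
          pillar_id = iy * nx + ix  # identical grid for both modalities
          return pillar_id, points[:, :3], points[:, 3:]

      # Assumed VoD-like detection range: (xmin, ymin, zmin, xmax, ymax, zmax).
      pc_range = (0.0, -25.6, -3.0, 51.2, 25.6, 2.0)
      lidar = torch.rand(10000, 4)  # x, y, z, intensity
      radar = torch.rand(512, 6)    # x, y, z, Doppler velocity, RCS, timestamp
      lidar_ids, lidar_geo, lidar_sem = pillarize(lidar, pc_range, 0.16)
      radar_ids, radar_geo, radar_sem = pillarize(radar, pc_range, 0.16)

      Because both modalities index the same grid, per-pillar features from LiDAR and radar are spatially aligned before any learned encoding, which is what allows the GSE encoder to fuse them at the bottom of the network.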
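      The second-stage gating can be sketched similarly. The block below refines a BEV feature map with a spatial mixer and a sigmoid gate that adaptively down-weights redundant responses; a depthwise convolution stands in for the MambaVisionMixer so the sketch stays self-contained, and all layer choices and shapes are assumptions rather than the published architecture.

      import torch
      import torch.nn as nn

      class GatedBEVBlock(nn.Module):
          def __init__(self, channels):
              super().__init__()
              # Stand-in for the MambaVisionMixer: any spatial mixer fits here.
              self.mixer = nn.Conv2d(channels, channels, 7, padding=3, groups=channels)
              # Gate: per-position, per-channel weights in (0, 1).
              self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
              self.norm = nn.BatchNorm2d(channels)

          def forward(self, bev):
              mixed = self.mixer(bev)            # spatial context aggregation
              g = self.gate(bev)                 # adaptive redundancy filtering
              return self.norm(bev + g * mixed)  # gated residual update

      bev = torch.rand(2, 64, 320, 320)  # (batch, channels, H, W) BEV features
      out = GatedBEVBlock(64)(bev)       # same shape, gated refinement applied

      The gate lets the network pass strong mixer responses where they carry discriminative evidence and suppress them where the mixed context is noisy, which mirrors the adaptive filtering role the abstract attributes to BMM.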

       
