  • UAV Tiny Object Detection Network Based on Frequency-Domain Perception and Lossless Feature Transmission

    • Abstract: To address the vanishing of tiny-object features and background occlusion in UAV aerial images, this paper proposes the Refined Spatial-aware Distribution Network (RSD-Net). The architecture targets two structural defects of existing detectors: the absence of frequency awareness and the information loss caused by downsampling. Specifically: (1) a Stage-Adaptive Feature Extraction (SA-C3k2) module is designed, which uses explicit edge sharpening and frequency-domain filtering to adaptively enhance shallow-layer high-frequency textures and suppress deep-layer background noise; (2) a Rep-parameterized Spatial-preserving Distribution Neck (RSD-Neck) is constructed, which combines SPD-Conv lossless downsampling with global context modeling to prevent semantic dilution during cross-scale feature fusion; (3) a Dual-Prior Perception Head (DP-Head) is introduced, which fuses explicit visual priors with implicit geometric distribution priors to achieve robust localization quality estimation. Experiments show that RSD-Net improves mAP50 by 4.99% and 5.08%, and mAP50:95 by 3.42% and 7.2%, on VisDrone2019-DET and NWPU VHR-10 respectively. In generalization tests on TinyPerson it also achieves consistent gains, verifying the model's cross-domain generalization robustness.
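The explicit edge sharpening described for the shallow layers of SA-C3k2, and the "visual texture prior derived from gradient magnitude" used by DP-Head, both rest on standard gradient-filter machinery. Below is a minimal NumPy sketch of a Scharr gradient magnitude; the 3x3 kernels are the standard Scharr filters, but the function names and the toy step-edge image are illustrative, not taken from the paper's implementation.

```python
import numpy as np

# Standard 3x3 Scharr derivative kernels (better rotational symmetry
# than Sobel); they respond strongly to high-frequency edges.
SCHARR_X = np.array([[ 3, 0,  -3],
                     [10, 0, -10],
                     [ 3, 0,  -3]], dtype=np.float32)
SCHARR_Y = SCHARR_X.T

def correlate2d(img, kernel):
    """Valid-mode 2D correlation with a 3x3 kernel (pure NumPy)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.float32)
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * kernel)
    return out

def gradient_magnitude(img):
    """Scharr gradient magnitude: the 'visual texture' edge response."""
    gx = correlate2d(img, SCHARR_X)
    gy = correlate2d(img, SCHARR_Y)
    return np.sqrt(gx ** 2 + gy ** 2)

# Toy image with a vertical step edge: the flat region yields zero
# response, while columns touching the edge yield a strong response.
img = np.zeros((5, 5), dtype=np.float32)
img[:, 3:] = 1.0
mag = gradient_magnitude(img)
print(mag)  # zero in the flat column, 16.0 along the edge columns
```

A detector would typically add such a magnitude map (or a learned variant of it) back onto shallow feature maps to amplify tiny-object contours before downsampling.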

       

      Abstract: Unmanned Aerial Vehicle (UAV)-based aerial photography holds immense potential, yet detecting tiny objects remains a significant challenge due to extreme scale variations, complex background interference, and the tendency of feature information to vanish during network transmission. Existing detectors often suffer from structural limitations, specifically frequency-agnostic feature extraction and irreversible information loss from downsampling. To address these issues, this paper proposes the Refined Spatial-aware Distribution Network (RSD-Net), a novel end-to-end architecture designed to establish a full-link spatial awareness mechanism for robust tiny object detection. First, to resolve the mismatch between feature extraction and the physical attributes of targets, a Stage-Adaptive Feature Extraction (SA-C3k2) module is designed. Unlike traditional static convolutions, SA-C3k2 incorporates a frequency-domain adaptation mechanism: it applies the Scharr operator in shallow layers to explicitly sharpen tiny-object edges and enhance high-frequency texture signals, while employing a learnable Gaussian kernel in deep layers to suppress background noise. This design adaptively balances feature retention against noise suppression. Second, to prevent semantic dilution during cross-scale feature fusion, a Rep-parameterized Spatial-preserving Distribution Neck (RSD-Neck) is constructed. Because traditional strided convolutions sample below the Nyquist rate and irreversibly discard fine spatial detail, this module instead integrates Space-to-Depth Convolution (SPD-Conv) to achieve lossless downsampling and fine-grained feature alignment. It further employs a Rep-parameterized Local Adjacent Fusion (Rep-LAF) block to model global context, establishing a high-fidelity pathway for feature transmission. Third, a Dual-Prior Perception Head (DP-Head) is introduced to enhance localization quality estimation. By fusing explicit visual texture priors (derived from gradient magnitude) with implicit geometric distribution priors (derived from regression statistics), DP-Head establishes a "visual-statistical" dual-verification mechanism that significantly improves localization robustness in ambiguous scenarios. Extensive experiments on the VisDrone2019-DET and NWPU VHR-10 datasets demonstrate the effectiveness of the proposed method: compared with baseline models, RSD-Net improves mAP50 by 4.99% and 5.08%, and mAP50:95 by 3.42% and 7.2%, respectively, while maintaining a lightweight parameter count (5.04M). Furthermore, generalization tests on the TinyPerson dataset verify the model's cross-domain robustness, demonstrating its ability to handle pixel-level tiny objects in diverse aerial environments.
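The "lossless downsampling" claimed for SPD-Conv comes from replacing strided subsampling with a pure space-to-depth rearrangement: spatial resolution halves, but every pixel value survives as an extra channel, and only afterwards does a non-strided convolution mix them. A minimal NumPy sketch of that rearrangement (the function name and block size are illustrative; the paper's module additionally applies a learned convolution to the result):

```python
import numpy as np

def space_to_depth(x, block=2):
    """Rearrange an (H, W, C) map into (H/block, W/block, C*block^2).

    Unlike strided convolution or pooling, this is a pure pixel
    rearrangement: every input value is preserved in the output,
    which is the 'lossless downsampling' property SPD-Conv builds on.
    """
    h, w, c = x.shape
    assert h % block == 0 and w % block == 0
    x = x.reshape(h // block, block, w // block, block, c)
    x = x.transpose(0, 2, 1, 3, 4)  # gather each block's pixels together
    return x.reshape(h // block, w // block, c * block * block)

# Toy 4x4 single-channel map: spatial size halves, channels quadruple.
feat = np.arange(16, dtype=np.float32).reshape(4, 4, 1)
out = space_to_depth(feat, block=2)
print(out.shape)  # (2, 2, 4)
# Nothing is discarded: the multiset of values is unchanged.
print(np.array_equal(np.sort(out.ravel()), np.sort(feat.ravel())))  # True
```

In PyTorch the same rearrangement is available as `nn.PixelUnshuffle(2)`; following it with a stride-1 convolution gives the downsampling step that preserves fine spatial detail for tiny objects.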

       
