• Review of Deep-Learning-Based Simultaneous Localization and Mapping for Mobile Robots

• Abstract: In recent years, deep learning techniques have made significant progress in simultaneous localization and mapping (SLAM) for mobile robots, offering new approaches to the challenges that traditional visual SLAM faces in dynamic environments. This paper first summarizes the limitations of traditional visual SLAM in its preprocessing, visual odometry, and loop-closure detection modules. It then focuses on applications of deep learning in visual SLAM, highlighting deep-learning-based preprocessing, visual odometry, and loop-closure detection modules and how they improve the robustness and accuracy of visual SLAM. Finally, it discusses the challenges facing deep-learning-based SLAM and outlines future research directions, including lightweight network design, long-term scene modeling, and self-supervised learning, to promote the practical deployment of deep-learning SLAM.

       

Abstract: Driven by the new round of the global industrial revolution, the deep integration of new information technology with manufacturing has broadened the applications of mobile robots. Simultaneous localization and mapping (SLAM) is one of the core technologies for the autonomous navigation of mobile robots, and its accuracy directly determines how well mobile robots perform in their target scenarios. Most mobile robot applications involve dynamic scenes, yet because of its static-world assumption, traditional visual SLAM cannot localize and map accurately enough in dynamic environments to meet practical demands. The core of deep learning is the autonomous learning of features and patterns from data using multilayer neural networks. By mimicking the hierarchical information-processing mechanism of the human brain, it applies multilayer nonlinear transformations to extract progressively higher-level abstract features, effectively modeling the latent distributions of complex data modalities such as temporal signals, spatial structures, and semantic relationships. Recently, deep learning techniques have made significant progress in SLAM for mobile robots, providing new ideas for addressing the challenges that traditional visual SLAM faces in dynamic environments. This review first summarizes the limitations of traditional visual SLAM in its preprocessing, visual odometry, and loop-closure detection modules, such as sensitivity to illumination changes and texture-deficient scenes. We then focus on applications of deep learning in visual SLAM, highlighting deep-learning-based preprocessing, visual odometry, and loop-closure detection modules and how they improve the robustness and accuracy of visual SLAM; these include the latest large models, embodied intelligence, and multimodal fusion approaches. We also identify areas for future optimization following an in-depth analysis.
Prospects for subsequent research directions are outlined by comparing the latest research methodologies. Neural radiance fields (NeRF) and 3D Gaussian splatting (GS) are deep-learning-based computer vision techniques that reconstruct continuous 3D scene models from multiview 2D images through implicit neural representations. Mobile robot navigation depends on high-precision semantic maps; to ensure that mobile robots can construct such maps in dynamic environments, the latest NeRF- and GS-based methods are introduced in the preprocessing module. This review also covers several multisensor-input and end-to-end SLAM methods. At the end of each module, the surveyed methods are analyzed, summarizing their individual strengths and weaknesses as well as the directions in which they can be improved. Finally, we discuss the challenges faced by deep-learning-based SLAM and anticipate future research directions, including lightweight network design, long-term scene modeling, and self-supervised learning, to promote deep-learning SLAM in practical applications. In short, as deep learning matures, large-model AI technology will also develop rapidly, and robots' understanding of, and interaction with, their environments will become more diversified; it is reasonable to expect that such technology will further improve the localization and mapping performance of mobile robots in dynamic environments.
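The "multilayer nonlinear transformation" mechanism described above can be sketched in a few lines of NumPy. This is an illustrative toy only, not code from the paper: the layer sizes and random weights are placeholder assumptions, and a real SLAM front-end would use trained convolutional or transformer networks.

```python
import numpy as np

# Each layer applies an affine map followed by a nonlinearity; stacking
# layers extracts progressively more abstract features, as the abstract
# describes. Weights are random placeholders, not a trained model.
rng = np.random.default_rng(0)

def layer(x, in_dim, out_dim):
    W = rng.standard_normal((in_dim, out_dim)) * 0.1
    b = np.zeros(out_dim)
    return np.maximum(0.0, x @ W + b)  # ReLU nonlinearity

x = rng.standard_normal((4, 64))   # e.g. 4 image patches, 64 raw features each
h1 = layer(x, 64, 32)              # low-level features
h2 = layer(h1, 32, 16)             # mid-level features
h3 = layer(h2, 16, 8)              # high-level abstract features
print(h3.shape)                    # (4, 8)
```

Without the nonlinearity, the three layers would collapse into a single linear map; the ReLU between layers is what lets depth build up a hierarchy of features.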

       
