  • High precision trajectory tracking control of unmanned vehicles based on deep reinforcement learning

    • Abstract: To address the problems of weak dynamic adaptability and insufficient accuracy in unmanned vehicle trajectory tracking, this paper formulates the trajectory tracking problem as a Markov decision process, designs the state space, action space, and reward function for reinforcement learning, and proposes a high-precision trajectory tracking control method for unmanned vehicles based on deep reinforcement learning. First, to strengthen the system's response to error change rates, lateral position error differential compensation and heading angle error differential compensation are introduced into the state space design. Second, to overcome the difficulty traditional reward mechanisms have in balancing precise reward and punishment with dynamic adaptation, a dual-mechanism reward function coordination strategy is proposed: a regionalized reward and punishment mechanism based on a smooth step function, and an adaptive weight reward mechanism based on a Gaussian kernel function. Finally, the effectiveness of the proposed method is verified through simulation. The results show that the improved algorithm corrects initial deviations faster and converges more quickly in straight-line trajectory tracking, and fits feature points such as wave crests and troughs more closely in sinusoidal trajectory tracking; its tracking accuracy and dynamic adaptability are significantly superior to those of the original deep deterministic policy gradient (DDPG) and twin delayed deep deterministic policy gradient (TD3) algorithms.
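The differential-compensation terms in the state space design can be sketched as follows. This is a minimal illustration only: the state layout, variable names, and backward-difference approximation are assumptions for exposition, not the paper's exact design.

```python
import numpy as np

def build_state(e_y, e_psi, e_y_prev, e_psi_prev, dt):
    """Assemble a tracking state with differential compensation (illustrative).

    e_y   : lateral position error (m)   -- assumed state component
    e_psi : heading angle error (rad)    -- assumed state component
    The derivative terms are approximated by backward differences, giving
    the agent a sense of how fast each error is changing so it can adjust
    its control action in advance.
    """
    de_y = (e_y - e_y_prev) / dt        # lateral error differential compensation
    de_psi = (e_psi - e_psi_prev) / dt  # heading error differential compensation
    return np.array([e_y, e_psi, de_y, de_psi], dtype=np.float32)
```

Augmenting the state with error rates is a common way to let a reinforcement learning agent react to error trends rather than only to instantaneous error magnitudes.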

       

      Abstract: As a core technology in the field of autonomous driving, unmanned vehicle trajectory tracking provides crucial support for precise and safe driving and plays an indispensable role in practical scenarios such as logistics transportation and intelligent transportation. In complex dynamic environments, traditional trajectory tracking methods often fail to meet high-precision, high-reliability application requirements because of their weak dynamic adaptability and insufficient accuracy. To address these issues, this paper formulates the unmanned vehicle trajectory tracking problem as a Markov decision process (MDP), designs the state space, action space, and reward function for reinforcement learning, and proposes a high-precision trajectory tracking control method for unmanned vehicles based on deep reinforcement learning. First, to enhance the system's responsiveness to error change rates, lateral position error differential compensation and heading angle error differential compensation are introduced into the state space design, enabling the agent to perceive error trends more acutely during tracking and make control adjustments in advance. Second, to address the difficulty traditional reward mechanisms have in balancing precise reward and punishment with dynamic adaptation, a dual-mechanism reward function coordination strategy is proposed. The first mechanism is a regionalized reward and punishment scheme based on a smooth step function: different reward regions are divided according to the positional relationship between the vehicle and the desired trajectory, and differentiated rewards and punishments are applied in each region, yielding precise reward and punishment for the tracking state. The second mechanism is an adaptive weight reward scheme based on a Gaussian kernel function: the kernel weights factors such as the errors, allowing the reward function to adjust its weights dynamically according to the actual tracking situation and adapt better to different trajectory tracking scenarios. Finally, the effectiveness of the proposed method is verified through simulation. The results show that in straight-line trajectory tracking the improved algorithm corrects initial deviations faster and converges more quickly, returning the vehicle to the desired trajectory in a shorter time; in sinusoidal trajectory tracking it fits feature points such as wave crests and troughs more closely. Its tracking accuracy and dynamic adaptability are significantly superior to those of the original Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithms.
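The dual-mechanism reward described above can be sketched as follows. All thresholds, kernel widths, and the logistic form of the smooth step are illustrative assumptions; the paper's actual parameterization is not given in the abstract.

```python
import numpy as np

def smooth_step(x, k=10.0):
    """Smooth (logistic) step rising from 0 to 1 around x = 0."""
    return 1.0 / (1.0 + np.exp(-k * x))

def dual_mechanism_reward(e_y, e_psi, inner=0.1, outer=0.5,
                          sigma_y=0.3, sigma_psi=0.2):
    """Illustrative dual-mechanism reward (all parameters are assumptions).

    Region term: smooth steps at |e_y| = inner and |e_y| = outer divide the
    space around the desired trajectory into an inner reward band, a neutral
    transition band, and an outer penalty band, avoiding the discontinuities
    of hard thresholds.
    Adaptive term: Gaussian kernels weight the lateral and heading errors,
    keeping the reward sensitive near the trajectory while saturating far
    from it.
    """
    a = np.abs(e_y)
    # Regionalized reward/punishment via smooth step functions
    region = (1.0 - smooth_step(a - inner)) - smooth_step(a - outer)
    # Gaussian-kernel adaptive weighting of the error terms
    adaptive = np.exp(-e_y**2 / (2 * sigma_y**2)) \
             + np.exp(-e_psi**2 / (2 * sigma_psi**2))
    return region + adaptive
```

With this shape, a vehicle on the trajectory (zero errors) receives the maximum reward, while one far outside the outer band receives a net penalty, which is the qualitative behavior the abstract describes.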

       
