CN112183288A - Multi-agent reinforcement learning method based on model - Google Patents
Info
- Publication number
- CN112183288A
- Authority
- CN
- China
- Prior art keywords
- agent
- model
- reinforcement learning
- environment
- opponent
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G07—CHECKING-DEVICES
- G07C—TIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
- G07C5/00—Registering or indicating the working of vehicles
- G07C5/08—Registering or indicating performance data other than driving, working, idle, or waiting time, with or without registering driving, working, idle or waiting time
- G07C5/0808—Diagnosing performance data
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Multimedia (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a model-based multi-agent reinforcement learning method, belonging to the field of multi-agent reinforcement learning. The method comprises modeling the multi-agent environment and the agents' policies, generating virtual trajectories for the multiple agents, and updating the agents' policies using the virtual trajectories. In the invention, each agent makes decisions in a distributed manner; the multi-agent environment and the opponent agents' policies are modeled separately, and the resulting models are used to generate virtual trajectories. This effectively improves the sample efficiency of multi-agent reinforcement learning, reduces the number of interactions required of the agents, lowers the risk of equipment damage, and improves the feasibility of deploying a distributed multi-agent reinforcement learning method in multi-agent tasks.
Description
Technical Field
The invention relates to the field of multi-agent reinforcement learning methods, in particular to a model-based multi-agent reinforcement learning method.
Background
Reinforcement learning is a subfield of machine learning whose goal is to select decision-making actions based on received environmental information so as to maximize the expected return. Deep reinforcement learning uses neural networks to approximate the value function and the policy function, and has achieved performance exceeding the average human level on many tasks. In a multi-agent scenario, every agent is learning and improving at the same time, which makes the environment non-stationary, and the relationships between agents may be competitive, cooperative, or somewhere in between. How, and what, information should be shared among agents also becomes a difficulty. Because of these problems introduced by the multi-agent setting, single-agent methods cannot be applied to multi-agent scenarios directly. As with single-agent algorithms, multi-agent reinforcement learning algorithms fall into two categories: model-free and model-based. Among them, model-free multi-agent reinforcement learning algorithms face a more serious sample-efficiency problem.
A model-based multi-agent reinforcement learning method aims to improve the sample efficiency of multi-agent reinforcement learning algorithms, i.e., to reduce the number of interactions of the agents with the environment and the number of interactions between the agents. In general, reinforcement learning currently suffers from low sample efficiency when applied to specific real-world problems. In multi-agent reinforcement learning applications, the joint action space and joint state space of the agents reduce sample efficiency even further. When multi-agent reinforcement learning is used to train multi-vehicle autonomous driving, the vehicles usually need massive amounts of training to act reasonably in different scenarios; during this training the vehicles continuously interact with the environment and with each other, so the probability of vehicle damage is high. Using a model-based approach helps to reduce training costs.
(I) Analysis of recent patent technologies related to multi-agent reinforcement learning and model-based reinforcement learning:
1. The Chinese invention patent application No. 201811032979.5, "Path planning method based on multi-agent reinforcement learning", proposes a multi-agent path planning method for the aircraft domain that improves the survival rate and task completion rate of the aircraft by establishing a global state-partition model of the aerial flight environment; it mainly uses the environment model for planning and lacks consideration of the interaction among agents;
2. The Chinese invention patent application No. 201911121271.1, "Learning method of cooperative agents based on multi-agent reinforcement learning", proposes a method in which agents share target parameters and models the global environment, with the agents sharing the global model to improve the efficiency of the multi-agent algorithm; similarly, the method lacks consideration of the interaction among agents.
(II) Analysis of recent research on model-based multi-agent reinforcement learning methods:
In "Multi-agent reinforcement learning with approximate model learning for competitive games", published in the journal PLoS One in 2019, modeling of the global environment is used as an auxiliary task to deepen the agents' understanding of the multi-agent environment. However, this work does not improve the sample efficiency of the algorithm.
The paper "Multi-Agent Reinforcement Learning with Multi-Step Generative Models", published at the Conference on Robot Learning (CoRL) in 2019, uses a variational autoencoder to model the multi-agent environment and the opponent agents' policies, directly predicts a segment of trajectory, and then selects the optimal trajectory with model predictive control. The method effectively improves sample efficiency, but the lack of a policy function increases the decision cost, and the centralized training and decision making make the algorithm hard to deploy in practical applications.
Therefore, those skilled in the art are working to develop a model-based multi-agent reinforcement learning method, Multi-Agent Branched-Rollout Policy Optimization, that can achieve higher sample efficiency in multi-agent environments.
Disclosure of Invention
In view of the above-mentioned drawbacks of the prior art, the technical problem to be solved by the present invention is how to reduce the number of interactions between agents and the environment and between the agents themselves, while at the same time enabling distributed execution.
In order to achieve the above object, the present invention provides a model-based multi-agent reinforcement learning method, characterized in that, in a multi-agent environment, the multi-agent environment and the agents' policies are modeled, virtual trajectories of the multiple agents are generated, and the policies of the multiple agents are updated using the virtual trajectories.
Further, the agents make decisions in a distributed manner.
Further, for the current agent i, the set of opponent agents is denoted { -i }. The action of the current agent i depends on the joint policy π^{-i} of the opponent agents and the current state s_t. Let the joint action of the opponent agents at time t be a_t^{-i} = π^{-i}(s_t); the current agent's action is then expressed as a_t^i = π^i(s_t, a_t^{-i}), where π^i is the policy of the current agent.
Further, each of the multiple agents holds an independent multi-agent environment model M̂ and a set of opponent policy models {π̂^j}, j ∈ { -i }.
Further, a method of dynamically selecting an opponent model is used when generating the virtual trajectory.
Further, for the current agent i, the model of each opponent's policy is represented as π̂^j, where j ∈ { -i }; the method of dynamically selecting the opponent model comprises two steps:
Step a: for each opponent policy model π̂^j, select a portion of the most recent real interaction data and compute the generalization error of the policy model, denoted ε_j;
Step b: given the virtual trajectory length K, for opponent agent j, use the model of the opponent agent's policy to generate the opponent agent's action in the first n_j steps (n_j being determined from the generalization error ε_j); in the remaining K − n_j steps, request from the opponent agent the action it takes under its real policy.
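For concreteness, the following is a minimal Python sketch of the dynamic opponent-model selection described in steps a and b. The interfaces (model.predict, request_real_action) and, in particular, the rule mapping the generalization error ε_j to the number of model-generated steps n_j are assumptions for illustration; the text does not reproduce the exact formula for n_j.

```python
import numpy as np

def select_rollout_split(opponent_models, recent_data, K):
    """For each opponent j, estimate the generalization error eps_j of its
    policy model on recent real interaction data (Step a), then decide how
    many of the K virtual steps may use the learned model. The exp(-eps)
    rule below is a hypothetical mapping, not the patent's formula."""
    n = {}
    for j, model in opponent_models.items():
        obs, real_actions = recent_data[j]            # recent (observation, action) pairs
        pred_actions = model.predict(obs)             # actions predicted by the learned model
        eps_j = float(np.mean((np.asarray(pred_actions) - np.asarray(real_actions)) ** 2))
        n[j] = int(np.clip(K * np.exp(-eps_j), 0, K))  # larger error -> fewer model-generated steps
    return n

def opponent_action(j, step, n, opponent_models, request_real_action, state):
    """Step b: in the first n[j] steps of the rollout use the learned policy
    model of opponent j; afterwards query the real opponent agent."""
    if step < n[j]:
        return opponent_models[j].predict(state)
    return request_real_action(j, state)
```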
Further, the generation of the virtual trajectory comprises the following steps:
Step 1: initialize t = 0; the virtual trajectory has length K;
Step 2: select a state s_t from a real trajectory;
Step 3: obtain the opponent agents' joint action a^{-i} using the dynamically selected opponent models or, where required, the opponents' real policies;
Step 4: obtain the action of the current agent using the current agent's policy function, a^i = π^i(s_t, a^{-i});
Step 5: use the model of the multi-agent environment to predict the next state s_{t+1} and the reward r_t at the current time;
Step 6, mixing(s)t,ai,a-i,st+1,rt) Put into an experience playback poolPerforming the following steps;
Step 7: let t = t + 1 and repeat the above steps until t is larger than K.
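A minimal sketch of the virtual-trajectory generation loop above (Steps 1 to 7), assuming the environment model, the current agent's policy, and the opponent-action routine from the previous sketch are given as callables. Whether each step of the rollout continues from the model-predicted state, as in branched-rollout methods, is an assumption and is marked as such in the comments.

```python
import random

def generate_virtual_trajectory(env_model, policy_i, opponent_action_fn,
                                real_states, model_buffer, K):
    """Generate one virtual trajectory of length K (Steps 1-7 above).
    env_model(s, a_i, a_opp) -> (next_state, reward) and the other callables
    are placeholders for the learned components described in the text."""
    s_t = random.choice(real_states)                  # Step 2: state from a real trajectory
    for t in range(K):                                # Steps 1 and 7: t = 0, ..., until t > K
        a_opp = opponent_action_fn(s_t, t)            # Step 3: opponents' joint action a^{-i}
        a_i = policy_i(s_t, a_opp)                    # Step 4: a^i = pi^i(s_t, a^{-i})
        s_next, r_t = env_model(s_t, a_i, a_opp)      # Step 5: predicted next state and reward
        model_buffer.append((s_t, a_i, a_opp, s_next, r_t))  # Step 6: experience replay pool
        s_t = s_next                                  # assumed: continue from the predicted state
    return model_buffer
```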
Further, the virtual trajectory is generated after the multi-agent environment and the opponent agents' policies have been modeled to a certain accuracy.
Further, when modeling the multi-agent environment and the opponent agents' policies, a Gaussian distribution is used to represent the output of each model; a plurality of models are built for the multi-agent environment, and the multi-agent environment model is used by means of ensemble learning. Let the number of environment models be B; the set of environment models is then {M̂_b}, where b ∈ {1, …, B}, and the opponent policy models are {π̂^j}, where j ∈ { -i }.
Further, gradient descent is used for the updates when modeling the multi-agent environment and the opponent agents' policies.
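As an illustration of the two preceding paragraphs, the sketch below implements one member of a Gaussian environment-model ensemble and its gradient-descent update by maximum likelihood, written here in PyTorch. The network sizes and the choice of predicting the next state and the reward jointly are assumptions for illustration, not details fixed by the patent.

```python
import torch
import torch.nn as nn

class GaussianDynamicsModel(nn.Module):
    """One member of the B-model ensemble: given the joint state and joint
    action, outputs the mean and log-std of a Gaussian over (next state, reward)."""
    def __init__(self, state_dim, joint_action_dim, hidden=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim + joint_action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.mean = nn.Linear(hidden, state_dim + 1)        # next state and reward
        self.log_std = nn.Linear(hidden, state_dim + 1)

    def forward(self, s, a_joint):
        h = self.body(torch.cat([s, a_joint], dim=-1))
        return self.mean(h), self.log_std(h).clamp(-10.0, 2.0)

def update_ensemble(models, optimizers, s, a_joint, target):
    """Gradient-descent step for every ensemble member: minimize the Gaussian
    negative log-likelihood of the observed (next state, reward) targets."""
    for model, opt in zip(models, optimizers):
        mean, log_std = model(s, a_joint)
        nll = (0.5 * ((target - mean) ** 2) * torch.exp(-2.0 * log_std) + log_std).mean()
        opt.zero_grad()
        nll.backward()
        opt.step()
```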
Further, when updating the policy of the current agent, the Soft Actor-Critic (SAC) algorithm is used.
The update formula for the critic part (the Q function) is:
the updating formula of the actor part strategy function is as follows:
where ε_t is noise sampled from a Gaussian distribution and f is the reparameterization function.
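The formulas themselves appear as images in the original publication and are not reproduced in this text. Assuming the standard Soft Actor-Critic objectives, adapted to the notation above in which the current agent's policy and Q function also condition on the opponents' joint action a_t^{-i}, they would take roughly the following form; this is an assumed reconstruction, not a verbatim copy of the patent's formulas.

```latex
% Critic (Q-function) loss over real and virtual replay data D = D_env \cup D_model:
J_Q = \mathbb{E}_{(s_t, a_t^{i}, a_t^{-i}, s_{t+1}, r_t) \sim D}
      \left[ \left( Q(s_t, a_t^{i}, a_t^{-i}) - \left( r_t + \gamma\, V(s_{t+1}) \right) \right)^{2} \right]

% Actor (policy) loss with the reparameterization a_t^{i} = f(\epsilon_t; s_t, a_t^{-i}):
J_\pi = \mathbb{E}_{s_t \sim D,\; \epsilon_t \sim \mathcal{N}}
      \left[ \alpha \log \pi^{i}\!\left( f(\epsilon_t; s_t, a_t^{-i}) \mid s_t, a_t^{-i} \right)
             - Q\!\left( s_t, f(\epsilon_t; s_t, a_t^{-i}), a_t^{-i} \right) \right]
```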
As training proceeds, the multi-agent environment model and the opponent agents' policy models used in the invention come ever closer to the real environment and the real policies, so the generated virtual trajectories become more and more realistic. Using the generated virtual trajectories, the agent approximates more closely the states it would reach when interacting with real agents under real conditions, while also being able to explore states and interactions that are hard to reach along real trajectories. The agent can therefore train effectively on the virtual trajectories, which reduces the chance of experiencing dangerous states and interactions in reality, lowers the risk of damage, and reduces the training cost. In summary, with the multi-agent environment model and the opponent agents' policy models the agent can be trained more comprehensively and on richer data.
The invention has the following technical effects:
1. In the invention, the decision of each agent can be made independently; optionally, performance can be improved by allowing the agents to communicate.
2. The agent of the present invention is not limited to a particular action space, covering both discrete and continuous action spaces, and can therefore be combined with any reinforcement learning algorithm, such as DQN, A3C, PPO, etc.
3. The agent of the present invention is not limited to a specific state space and can therefore be combined with any modeling method.
The conception, the specific structure and the technical effects of the present invention will be further described with reference to the accompanying drawings to fully understand the objects, the features and the effects of the present invention.
Drawings
FIG. 1 is a diagram of a training framework for the method of the present invention;
FIG. 2 is a flow chart of the method of the present invention.
Detailed Description
A preferred embodiment of the present invention will be described below with reference to the accompanying drawings for clarity and understanding of the technical contents thereof. The present invention may be embodied in many different forms of embodiments and the scope of the invention is not limited to the embodiments set forth herein.
The embodiment of the invention provides a model-based multi-agent reinforcement learning method. The embodiments of the present invention apply the method in an environment where vehicles are automatically driven, where there are several vehicles, each with a different destination. The method comprises the following specific steps:
1. The observation space (i.e., the input space of the method) in the vehicle autonomous-driving scenario is defined; it includes the position of the vehicle in the high-definition semantic map, the positions of other vehicles, pedestrians, and other individuals in the high-definition semantic map, the planned driving trajectory, the distances and directions of surrounding obstacles sensed by the vehicle's sensors, and so on. The action space of the vehicle is defined as acceleration, steering, braking, etc. The external reward obtainable by the vehicle is defined to be determined by factors such as speed, route, collision, and comfort;
2. For each vehicle, randomly initialize a policy function π, a Q-function network, a multi-agent environment model M̂, a set of policy models of the other vehicles {π̂^j}, a real trajectory database D_env, and a virtual trajectory database D_model;
3. For each epoch:
(1) Each vehicle updates its multi-agent environment model M̂, where the state s consists of the vehicles' observations; during training each vehicle sends its own observations to the other vehicles.
4. For each time t:
(2) Each vehicle makes its decision independently, using its models of the other vehicles' policies; the resulting real interaction data are added to the real trajectory database D_env;
(3) Each vehicle computes the errors {ε_j} of its models of the other vehicles' policies and computes the length {n_j} for which each model should be used when generating the virtual trajectory;
(4) Each vehicle uses the method of dynamically selecting the opponent model and generates a virtual trajectory with its own multi-agent environment model, adding the virtual trajectory to the virtual trajectory database D_model. During dynamic selection of the opponent model, when vehicle i needs the real policy of vehicle j in state s_t, if the state was generated in the real environment, the observation of vehicle j is first computed from the state s_t; otherwise the observation of vehicle j is output directly by vehicle i's multi-agent environment model. After vehicle i obtains vehicle j's observation o_j, it sends o_j to vehicle j; vehicle j then makes the decision a_j under its real policy and sends it back to vehicle i.
5. Each vehicle updates its policy function and Q-value function using data from the real trajectory database and the virtual trajectory database. The loss function of the Q-value function is:
the penalty function for the policy function is:
where ε_t is noise sampled from a Gaussian distribution and f is the reparameterization function.
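To tie the embodiment's steps 2 to 5 together, here is a compact Python sketch of one training epoch. The vehicle and environment objects and their methods (update_env_model, decide, model_errors, rollout, update_sac, env.step) are hypothetical wrappers around the components sketched earlier in this description; they are not interfaces defined by the patent.

```python
def train_epoch(vehicles, env, K, T):
    """One epoch of the multi-vehicle embodiment (steps 3-5 above)."""
    for v in vehicles:                                   # (1) update each multi-agent env model
        v.update_env_model()
    for t in range(T):                                   # 4. for each time step t
        joint_action = [v.decide() for v in vehicles]    # (2) decentralized decisions
        transition = env.step(joint_action)              #     real interaction with the road env
        for v in vehicles:
            v.real_buffer.append(transition)             #     store in D_env
            eps, n = v.model_errors()                    # (3) opponent-model errors and lengths n_j
            v.rollout(K, n)                              # (4) virtual trajectories into D_model
    for v in vehicles:                                   # 5. SAC update from D_env and D_model
        v.update_sac()
```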
In the multi-vehicle autonomous-driving scenario, the method improves the sample efficiency of the multi-agent reinforcement learning algorithm and reduces the number of real actions the vehicles must take during training. With only a model-free multi-agent reinforcement learning algorithm, each vehicle would need a large amount of training in the real environment, with a correspondingly high risk of damage. Vehicles using the present method can interact virtually during training, which reduces actions taken in the real environment and lowers the risk, while the state and action spaces can be explored more comprehensively, so a better policy can be learned under safer conditions.
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.
Claims (10)
1. A model-based multi-agent reinforcement learning method, characterized in that, in a multi-agent environment, the multi-agent environment and the policies are modeled, virtual trajectories of the multiple agents are generated, and the policies of the multiple agents are updated using the virtual trajectories.
2. The model-based multi-agent reinforcement learning method of claim 1, wherein the agents make decisions in a distributed manner.
3. The model-based multi-agent reinforcement learning method of claim 2, characterized in that, for the current agent i, the set of opponent agents is denoted { -i }; the action of the current agent i depends on the joint policy π^{-i} of the opponent agents and the current state s_t; letting the joint action of the opponent agents at time t be a_t^{-i} = π^{-i}(s_t), the action of the current agent is expressed as a_t^i = π^i(s_t, a_t^{-i}), where π^i is the policy of the current agent.
4. The model-based multi-agent reinforcement learning method of claim 3, wherein each of the multiple agents holds an independent multi-agent environment model M̂ and a set of opponent policy models {π̂^j}, j ∈ { -i }.
5. The model-based multi-agent reinforcement learning method of claim 4, wherein a method of dynamically selecting an opponent model is used in generating the virtual trajectory.
6. The model-based multi-agent reinforcement learning method of claim 5, characterized in that, for the current agent i, the model of each opponent's policy is represented as π̂^j, where j ∈ { -i }; the method of dynamically selecting an opponent model comprises two steps:
Step a: for each opponent policy model π̂^j, select a portion of the most recent real interaction data and compute the generalization error of the policy model, denoted ε_j;
Step b: given the virtual trajectory length K, for opponent agent j, use the model of the opponent agent's policy to generate the opponent agent's action in the first n_j steps (n_j being determined from the generalization error ε_j); in the remaining K − n_j steps, request from the opponent agent the action it takes under its real policy.
7. The model-based multi-agent reinforcement learning method of claim 6, wherein the generation of the virtual trajectory comprises the steps of:
Step 1: initialize t = 0; the virtual trajectory has length K;
Step 2: select a state s_t from a real trajectory;
Step 3: obtain the opponent agents' joint action a^{-i} using the dynamically selected opponent models or, where required, the opponents' real policies;
Step 4: obtain the action of the current agent using the current agent's policy function, a^i = π^i(s_t, a^{-i});
Step 5: use the model of the multi-agent environment to predict the next state s_{t+1} and the reward r_t at the current time;
Step 6: put (s_t, a^i, a^{-i}, s_{t+1}, r_t) into the experience replay pool D_model;
Step 7: let t = t + 1 and repeat the above steps until t is larger than K.
8. The model-based multi-agent reinforcement learning method of claim 7, wherein said virtual trajectory is generated after said multi-agent environment and the opponent agent policies have been modeled to a certain accuracy.
9. The model-based multi-agent reinforcement learning method of claim 8, wherein Gaussian distributions are used to represent the model outputs when modeling the multi-agent environment and the opponent agent policies; a plurality of models are built for the multi-agent environment, and the multi-agent environment model is used by means of ensemble learning; letting the number of environment models be B, the set of environment models is {M̂_b}, where b ∈ {1, …, B}, and the opponent policy models are {π̂^j}, where j ∈ { -i }.
10. The model-based multi-agent reinforcement learning method of claim 9, wherein gradient descent is used for the updates when modeling the multi-agent environment and the opponent agent policies.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011002376.8A CN112183288B (en) | 2020-09-22 | 2020-09-22 | Multi-agent reinforcement learning method based on model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011002376.8A CN112183288B (en) | 2020-09-22 | 2020-09-22 | Multi-agent reinforcement learning method based on model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112183288A true CN112183288A (en) | 2021-01-05 |
CN112183288B CN112183288B (en) | 2022-10-21 |
Family
ID=73955716
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011002376.8A Active CN112183288B (en) | 2020-09-22 | 2020-09-22 | Multi-agent reinforcement learning method based on model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112183288B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113239629A (en) * | 2021-06-03 | 2021-08-10 | 上海交通大学 | Method for reinforcement learning exploration and utilization of trajectory space determinant point process |
CN113599832A (en) * | 2021-07-20 | 2021-11-05 | 北京大学 | Adversary modeling method, apparatus, device and storage medium based on environment model |
CN114114911A (en) * | 2021-11-12 | 2022-03-01 | 上海交通大学 | Automatic hyper-parameter adjusting method based on model reinforcement learning |
CN115293334A (en) * | 2022-08-11 | 2022-11-04 | 电子科技大学 | Model-based unmanned equipment control method for high sample rate deep reinforcement learning |
CN116079747A (en) * | 2023-03-29 | 2023-05-09 | 上海数字大脑科技研究院有限公司 | Robot cross-body control method, system, computer equipment and storage medium |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200090074A1 (en) * | 2018-09-14 | 2020-03-19 | Honda Motor Co., Ltd. | System and method for multi-agent reinforcement learning in a multi-agent environment |
CN110764507A (en) * | 2019-11-07 | 2020-02-07 | 舒子宸 | Artificial intelligence automatic driving system for reinforcement learning and information fusion |
CN110852448A (en) * | 2019-11-15 | 2020-02-28 | 中山大学 | Cooperative intelligent agent learning method based on multi-intelligent agent reinforcement learning |
CN111324358A (en) * | 2020-02-14 | 2020-06-23 | 南栖仙策(南京)科技有限公司 | Training method for automatic operation and maintenance strategy of information system |
CN111330279A (en) * | 2020-02-24 | 2020-06-26 | 网易(杭州)网络有限公司 | Strategy decision model training method and device for game AI |
CN111639809A (en) * | 2020-05-29 | 2020-09-08 | 华中科技大学 | Multi-agent evacuation simulation method and system based on leaders and panic emotions |
Non-Patent Citations (3)
Title |
---|
MICHAEL JANNER ET AL: "When to Trust Your Model: Model-Based Policy Optimization", 《ARXIV:1906.08253V2》 * |
CENTER ON FRONTIERS OF COMPUTING STUDIES, PEKING UNIVERSITY: "IJTCS 2020 Multi-Agent Reinforcement Learning Forum Highlights", 《HTTP://CFCS.PKU.EDU.CN/NEWS/239156.HTM》 *
WU FENG ET AL: "Research on Planning Problems of Multi-Agent Systems Based on Decision Theory", China Excellent Doctoral and Master's Dissertations Full-text Database (Doctoral), Information Science and Technology Series *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113239629A (en) * | 2021-06-03 | 2021-08-10 | 上海交通大学 | Method for reinforcement learning exploration and utilization of trajectory space determinant point process |
CN113239629B (en) * | 2021-06-03 | 2023-06-16 | 上海交通大学 | Method for reinforcement learning exploration and utilization of trajectory space determinant point process |
CN113599832A (en) * | 2021-07-20 | 2021-11-05 | 北京大学 | Adversary modeling method, apparatus, device and storage medium based on environment model |
CN113599832B (en) * | 2021-07-20 | 2023-05-16 | 北京大学 | Opponent modeling method, device, equipment and storage medium based on environment model |
CN114114911A (en) * | 2021-11-12 | 2022-03-01 | 上海交通大学 | Automatic hyper-parameter adjusting method based on model reinforcement learning |
CN114114911B (en) * | 2021-11-12 | 2024-04-30 | 上海交通大学 | Automatic super-parameter adjusting method based on model reinforcement learning |
CN115293334A (en) * | 2022-08-11 | 2022-11-04 | 电子科技大学 | Model-based unmanned equipment control method for high sample rate deep reinforcement learning |
CN116079747A (en) * | 2023-03-29 | 2023-05-09 | 上海数字大脑科技研究院有限公司 | Robot cross-body control method, system, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112183288B (en) | 2022-10-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112183288B (en) | Multi-agent reinforcement learning method based on model | |
Sun et al. | A fast integrated planning and control framework for autonomous driving via imitation learning | |
CN113110592B (en) | Unmanned aerial vehicle obstacle avoidance and path planning method | |
CN110989576B (en) | Target following and dynamic obstacle avoidance control method for differential slip steering vehicle | |
Chen et al. | Stabilization approaches for reinforcement learning-based end-to-end autonomous driving | |
CN109726804B (en) | Intelligent vehicle driving behavior personification decision-making method based on driving prediction field and BP neural network | |
Naveed et al. | Trajectory planning for autonomous vehicles using hierarchical reinforcement learning | |
CN111580544B (en) | Unmanned aerial vehicle target tracking control method based on reinforcement learning PPO algorithm | |
Grigorescu et al. | Neurotrajectory: A neuroevolutionary approach to local state trajectory learning for autonomous vehicles | |
Bouton et al. | Reinforcement learning with iterative reasoning for merging in dense traffic | |
Wang et al. | Efficient reinforcement learning for autonomous driving with parameterized skills and priors | |
CN114013443A (en) | Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning | |
CN116679719A (en) | Unmanned vehicle self-adaptive path planning method based on dynamic window method and near-end strategy | |
CN113511222A (en) | Scene self-adaptive vehicle interactive behavior decision and prediction method and device | |
Hou et al. | Hybrid residual multiexpert reinforcement learning for spatial scheduling of high-density parking lots | |
Elallid et al. | Deep Reinforcement Learning for Autonomous Vehicle Intersection Navigation | |
Regier et al. | Improving navigation with the social force model by learning a neural network controller in pedestrian crowds | |
CN117553798A (en) | Safe navigation method, equipment and medium for mobile robot in complex crowd scene | |
CN116027788A (en) | Intelligent driving behavior decision method and equipment integrating complex network theory and part of observable Markov decision process | |
Coad et al. | Safe trajectory planning using reinforcement learning for self driving | |
CN114386620B (en) | Offline multi-agent reinforcement learning method based on action constraint | |
Li et al. | DDPG-Based Path Planning Approach for Autonomous Driving | |
CN113353102B (en) | Unprotected left-turn driving control method based on deep reinforcement learning | |
Lin et al. | Connectivity guaranteed multi-robot navigation via deep reinforcement learning | |
CN116227622A (en) | Multi-agent landmark coverage method and system based on deep reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |