
CN112183288A - Multi-agent reinforcement learning method based on model - Google Patents

Multi-agent reinforcement learning method based on model Download PDF

Info

Publication number
CN112183288A
CN112183288A
Authority
CN
China
Prior art keywords
agent
model
reinforcement learning
environment
opponent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011002376.8A
Other languages
Chinese (zh)
Other versions
CN112183288B (en)
Inventor
张伟楠
王锡淮
沈键
周铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202011002376.8A priority Critical patent/CN112183288B/en
Publication of CN112183288A publication Critical patent/CN112183288A/en
Application granted granted Critical
Publication of CN112183288B publication Critical patent/CN112183288B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07C TIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
    • G07C5/00 Registering or indicating the working of vehicles
    • G07C5/08 Registering or indicating performance data other than driving, working, idle, or waiting time, with or without registering driving, working, idle or waiting time
    • G07C5/0808 Diagnosing performance data

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a model-based multi-agent reinforcement learning method, which belongs to the field of multi-agent reinforcement learning and comprises the steps of modeling the multi-agent environment and the policies, generating virtual trajectories of the multiple agents, and updating the policies of the multiple agents using the virtual trajectories. In the invention, each agent makes decisions in a distributed manner, the multi-agent environment and the opponent agents' policies are modeled separately, and the obtained models are used to generate virtual trajectories. This effectively improves the sampling efficiency of multi-agent reinforcement learning, reduces the number of interactions of the agents, lowers the risk of equipment damage, and improves the feasibility of deploying distributed multi-agent reinforcement learning methods in multi-agent tasks.

Description

Multi-agent reinforcement learning method based on model
Technical Field
The invention relates to the field of multi-agent reinforcement learning methods, in particular to a model-based multi-agent reinforcement learning method.
Background
Reinforcement learning is a sub-field of machine learning whose goal is to choose decision-making actions based on received environmental information so as to maximize the expected return. Deep reinforcement learning uses neural networks to approximate the value function and the policy function, and has achieved performance exceeding the average human level on many tasks. In a multi-agent scenario, every agent is learning and improving simultaneously, which makes the environment non-stationary, and the relationship between agents may be competitive, cooperative, or somewhere in between. How and what information should be shared among agents also becomes a difficulty. Because of these problems introduced by the multi-agent setting, single-agent methods cannot be applied directly to multi-agent scenarios. As with single-agent algorithms, multi-agent reinforcement learning algorithms are divided into two categories: model-free and model-based. Among them, model-free multi-agent reinforcement learning algorithms face a more serious sample efficiency problem.
A model-based multi-agent reinforcement learning method aims to improve the sample efficiency of multi-agent reinforcement learning algorithms, that is, to reduce the number of interactions of the agents with the environment and the number of interactions between the agents. In general, reinforcement learning is often sample-inefficient when applied to concrete real-world tasks. In multi-agent reinforcement learning applications, the joint action space and joint state space of the agents reduce sample efficiency even further. When multi-agent reinforcement learning is used to train multi-vehicle autonomous driving, the vehicles usually need massive amounts of training to take reasonable actions in different scenarios, and during this training the vehicles constantly interact with the environment and with each other, so the possibility of vehicle damage is high. Using a model-based approach helps to reduce the training cost.
(I) Analysis of recent patents related to multi-agent reinforcement learning and model-based reinforcement learning:
1. The Chinese invention patent application No. 201811032979.5, "Path planning method based on multi-agent reinforcement learning", proposes a multi-agent path planning method for the aircraft domain. By establishing a global state division model of the aerial flight environment, it improves the survival rate and task completion rate of the aircraft. It mainly uses the environment model for planning, with little consideration of the interaction among agents;
2. The Chinese invention patent application No. 201911121271.1, "Learning method of cooperative agents based on multi-agent reinforcement learning", proposes a method for sharing target parameters among agents and modeling the global environment; the agents share the global model to improve the efficiency of the multi-agent algorithm. Similarly, this method lacks consideration of the interaction among agents.
(II) Analysis of recent research on model-based multi-agent reinforcement learning methods:
The paper "Multi-agent reinforcement learning with approximate model learning for competitive games", published in the journal PLoS One in 2019, uses modeling of the global environment as an auxiliary task to deepen the agents' understanding of the multi-agent environment. However, this work does not improve the sample efficiency of the algorithm.
The paper "Multi-agent reinforcement learning with multi-step generative models", published at the Conference on Robot Learning (CoRL) in 2019, uses a variational autoencoder to model the multi-agent environment and the policies of the opponent agents, directly predicts a segment of trajectory, and then selects the optimal trajectory with model predictive control. The method effectively improves sample efficiency, but the lack of an explicit policy function increases the decision-making cost, and the centralized training and decision-making make the algorithm difficult to deploy in practical applications.
Therefore, those skilled in the art are working on developing a model-based multi-agent reinforcement learning method, Multi-Agent Branched-Rollout Policy Optimization, that can achieve higher sample efficiency in any environment.
Disclosure of Invention
In view of the above drawbacks of the prior art, the technical problem to be solved by the present invention is how to reduce the number of interactions between the agents and the environment and between the agents themselves, while enabling distributed execution.
In order to achieve the above object, the present invention provides a model-based multi-agent reinforcement learning method, characterized in that, in a multi-agent environment, the multi-agent environment and the policies are modeled, virtual trajectories of the multiple agents are generated, and the policies of the multiple agents are updated using the virtual trajectories.
Further, the multiple agents make distributed decisions.
Further, for the current agent i, denote the set of opponent agents as {-i}. The action of the current agent i depends on the joint policy π_{-i} of the opponent agents and the current state s_t. Let the joint action of the opponent agents at time t be a_t^{-i}; the action of the current agent is then represented as a_t^i ~ π_i(·|s_t, a_t^{-i}), where π_i is the policy of the current agent.
Further, each of the multiple agents holds an independent multi-agent environment model p̂^i and a set of opponent policy models {π̂_j^i}, j ∈ {-i}.
Further, a method of dynamically selecting an opponent model is used when generating the virtual trajectory.
Further, for the current agent i, the model of each opponent's policy is represented as π̂_j^i, where j ∈ {-i}. The method of dynamically selecting the opponent model comprises two steps:
Step a: for each opponent policy model π̂_j^i, select a portion of the most recent real interaction data and compute the generalization error of the policy model, denoted ε_j;
Step b: given the virtual trajectory length K, for opponent agent j, use the model of the opponent agent's policy to generate the opponent agent's actions during the first n_j steps, and request from the opponent agent the actions taken under its real policy during the remaining K - n_j steps.
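A minimal Python sketch of these two steps follows. It assumes each opponent policy model exposes a predict(state) method and that recent real (state, action) pairs are stored per opponent; the particular rule used to turn ε_j into n_j is an illustrative assumption, since the description only states that n_j is derived from ε_j.

import numpy as np

def generalization_error(opponent_model, recent_pairs):
    """Mean prediction error of one opponent policy model on recent real
    (state, action) pairs observed for that opponent."""
    errors = [np.linalg.norm(opponent_model.predict(s) - a) for s, a in recent_pairs]
    return float(np.mean(errors)) if errors else float("inf")

def branch_lengths(opponent_models, recent_data, K, n_recent=1000):
    """Compute eps_j for every opponent j and turn it into a rollout length n_j <= K.

    recent_data[j] holds recent real (state, action_j) pairs for opponent j.
    The inverse scaling below is an illustrative assumption: the worse a policy
    model generalizes, the fewer rollout steps it is trusted for.
    """
    eps = {j: generalization_error(m, recent_data[j][-n_recent:])
           for j, m in opponent_models.items()}
    max_eps = max(eps.values()) + 1e-8
    n_j = {j: int(K * (1.0 - e / max_eps)) for j, e in eps.items()}
    return eps, n_j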
Further, the generation of the virtual trajectory comprises the following steps:
Step 1: initialize t = 0; the length of the virtual trajectory is K;
Step 2: select a state s_t from a real trajectory;
Step 3: obtain the joint action a^{-i} of the opponents in state s;
Step 4: obtain the action of the current agent using its policy function: a^i = π_i(s, a^{-i});
Step 5: use the model of the multi-agent environment to predict the state ŝ_{t+1} at the next time step and the reward r_t at the current time step;
Step 6: put (s_t, a^i, a^{-i}, s_{t+1}, r_t) into the experience replay pool D_model;
Step 7: let t = t + 1 and repeat from Step 3 until t > K.
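The rollout of Steps 1 through 7 can be sketched in Python as follows. The interfaces env_model.predict, policy_i.act, model.act and the callback real_opponent_action are assumed names used for illustration; they are not interfaces defined by the patent.

import random

def generate_virtual_trajectory(env_model, policy_i, opponent_models, n_j,
                                real_opponent_action, real_states, K, replay_pool):
    """Branched rollout of length K starting from a state sampled from real data.

    For opponent j, the learned policy model is used for the first n_j[j] steps
    and the real policy is queried afterwards (dynamic opponent-model selection).
    """
    s = random.choice(real_states)            # Step 2: branch point from a real trajectory
    for t in range(K):                        # Steps 3-7
        a_opp = {}
        for j, model in opponent_models.items():
            if t < n_j[j]:
                a_opp[j] = model.act(s)       # action from the opponent policy model
            else:
                a_opp[j] = real_opponent_action(j, s)  # action under the real policy
        a_i = policy_i.act(s, a_opp)          # Step 4: current agent's action
        s_next, r = env_model.predict(s, a_i, a_opp)    # Step 5: model prediction
        replay_pool.append((s, a_i, a_opp, s_next, r))  # Step 6: store in D_model
        s = s_next
    return replay_pool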
Further, the virtual trajectory is generated after the multi-agent environment and the opponent agents' policies have been modeled to a certain accuracy.
Further, when modeling the multi-agent environment and the opponent agents' policies, a Gaussian distribution is used to represent the output of each model; for the multi-agent environment, multiple models are built and the multi-agent environment model is used with an ensemble learning method. Let the number of environment models be B; then the set of environment models is {p̂_b^i}, where b ∈ {1, …, B}, and the opponent policy model is π̂_j^i, where j ∈ {-i}.
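A sketch of such a Gaussian ensemble in PyTorch is shown below; the network architecture, the Gaussian negative log-likelihood loss, and the choice of sampling a random ensemble member at prediction time are illustrative assumptions consistent with common practice, not details fixed by the patent.

import torch
import torch.nn as nn

class GaussianDynamicsModel(nn.Module):
    """One ensemble member: predicts mean and log-variance of (next state, reward)."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean = nn.Linear(hidden, state_dim + 1)      # next state + reward
        self.log_var = nn.Linear(hidden, state_dim + 1)

    def forward(self, state, joint_action):
        h = self.net(torch.cat([state, joint_action], dim=-1))
        return self.mean(h), self.log_var(h)

    def loss(self, state, joint_action, target):
        """Gaussian negative log-likelihood, minimized by gradient descent."""
        mean, log_var = self(state, joint_action)
        inv_var = torch.exp(-log_var)
        return (((mean - target) ** 2) * inv_var + log_var).mean()

class EnsembleDynamics:
    """Set {p_b}, b = 1..B; a random member is used for each prediction."""
    def __init__(self, B, state_dim, action_dim):
        self.members = [GaussianDynamicsModel(state_dim, action_dim) for _ in range(B)]

    def predict(self, state, joint_action):
        model = self.members[torch.randint(len(self.members), (1,)).item()]
        mean, log_var = model(state, joint_action)
        sample = mean + torch.randn_like(mean) * torch.exp(0.5 * log_var)
        return sample[..., :-1], sample[..., -1]          # next state, reward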
Further, gradient descent is used for the updates when modeling the multi-agent environment and the opponent agents' policies.
Further, when updating the policy of the current agent, a Soft Actor-Critic algorithm is used: the critic part updates the Q function by minimizing the soft Bellman error, and the actor part updates the policy function by minimizing the reparameterized Soft Actor-Critic objective, where e_t is noise sampled from a Gaussian distribution and f is the reparameterization function.
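The following sketch shows one possible form of these two updates in PyTorch, with the Q function and the policy conditioned on the opponents' joint action as in the decision rule above. The exact objectives, the availability of the opponents' next joint action in the batch, and the policy.sample interface are assumptions for illustration rather than the patent's precise formulas.

import torch
import torch.nn.functional as F

def sac_update(batch, policy, q_net, q_target, q_opt, pi_opt, alpha=0.2, gamma=0.99):
    """One Soft Actor-Critic step for agent i; actions are conditioned on the
    opponents' joint action a_opp, consistent with a^i ~ pi_i(.|s, a^{-i})."""
    s, a_i, a_opp, s_next, r, a_opp_next = batch

    # Critic: regress Q(s, a_i, a_opp) onto the entropy-regularized Bellman target.
    with torch.no_grad():
        a_i_next, logp_next = policy.sample(s_next, a_opp_next)
        target = r + gamma * (q_target(s_next, a_i_next, a_opp_next) - alpha * logp_next)
    q_loss = F.mse_loss(q_net(s, a_i, a_opp), target)
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()

    # Actor: reparameterized sample a = f(e; s, a_opp); minimize alpha*log pi - Q.
    a_new, logp = policy.sample(s, a_opp)          # uses e ~ N(0, I) internally
    pi_loss = (alpha * logp - q_net(s, a_new, a_opp)).mean()
    pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()
    return q_loss.item(), pi_loss.item()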
As training proceeds, the multi-agent environment model and the opponent agent policy models used in the invention become ever closer to the real environment and real policies, so the generated virtual trajectories become increasingly realistic. Using the generated virtual trajectories, the agent approximates the states reachable through interaction with real agents under real conditions, while also being able to explore states and interaction situations that are difficult to reach with real trajectories. The agent can therefore train effectively on the virtual trajectories, which reduces the chance of experiencing dangerous states and interactions in the real situation, lowers the risk of damage, and reduces the training cost. In summary, the multi-agent environment model and the opponent agent policy models allow the agent to be trained more comprehensively and with richer data.
The invention has the following technical effects:
1. In the invention, each agent can make its decisions independently; optionally, the effect can be further improved by allowing the agents to communicate.
2. The agents of the present invention are not limited to a particular action space (both discrete and continuous action spaces are supported), and the method can therefore be combined with any reinforcement learning algorithm, such as DQN, A3C, or PPO.
3. The agents of the present invention are not limited to a specific state space and can therefore be combined with different modeling methods.
The conception, specific structure and technical effects of the present invention will be further described below with reference to the accompanying drawings, so that the objects, features and effects of the present invention can be fully understood.
Drawings
FIG. 1 is a diagram of a training framework for the method of the present invention;
FIG. 2 is a flow chart of the method of the present invention.
Detailed Description
A preferred embodiment of the present invention will be described below with reference to the accompanying drawings for clarity and understanding of the technical contents thereof. The present invention may be embodied in many different forms of embodiments and the scope of the invention is not limited to the embodiments set forth herein.
The embodiment of the invention provides a model-based multi-agent reinforcement learning method and applies it in an autonomous-driving environment in which there are several vehicles, each with a different destination. The specific steps are as follows:
1. Define the observation space (i.e., the input space of the method) in the vehicle autonomous-driving scenario, which includes the position of the vehicle in a high-definition spatial semantic map, the positions of other vehicles, pedestrians and other individuals in that map, the planned driving trajectory, the distance and direction of surrounding obstacles sensed by the vehicle's sensors, and so on. Define the action space of the vehicle as acceleration, steering, braking, etc. Define the external reward obtainable by the vehicle to be determined by factors such as speed, route, collision and comfort;
2. For each vehicle, randomly initialize a policy function π, a Q-function network, a multi-agent environment model p̂, a set of policy models of the other vehicles {π̂_j}, a real trajectory database D_env and a virtual trajectory database D_model;
3. For each epoch:
(1) Each vehicle updates its multi-agent environment model p̂, where the state s consists of the observations of the vehicles; during training, each vehicle sends its own observation to the other vehicles.
4. For each time t:
(1) Each vehicle updates its models {π̂_j} of the other vehicles' policies;
(2) Each vehicle makes its decision independently, using its models of the other vehicles' policies, and the resulting real interaction data are added to the real trajectory database D_env;
(3) Each vehicle computes the model errors {ε_j} of its models of the other vehicles' policies and computes the length {n_j} for which each model should be used when generating virtual trajectories;
(4) Each vehicle uses the method of dynamically selecting the opponent model and generates virtual trajectories with its own multi-agent environment model, adding them to the virtual trajectory database D_model. During dynamic selection of the opponent model, when vehicle i needs the action of vehicle j under its real policy in state s_t: if the state was generated in the real environment, the observation of vehicle j is first computed from s_t; otherwise, the multi-agent environment model of vehicle i outputs the observation of vehicle j directly. After vehicle i obtains the observation o_j of vehicle j, it transmits o_j to vehicle j, and vehicle j then makes the decision a_j under its real policy and transmits it back to vehicle i (a sketch of this exchange is given at the end of this section).
5. Each vehicle updates its policy function and Q-value function using the data in the real trajectory database and the virtual trajectory database. The loss function of the Q-value function and the loss function of the policy function take the Soft Actor-Critic forms described above, where e_t is noise sampled from a Gaussian distribution and f is the reparameterization function.
In the multi-vehicle autonomous-driving scenario, the method can improve the sample efficiency of the multi-agent reinforcement learning algorithm and reduce the number of real actions taken by the vehicles during training. With only a model-free multi-agent reinforcement learning algorithm, each vehicle has to be trained extensively in the real environment, which carries a high risk of damage. Vehicles using the present method can interact virtually during training, reducing the number of actions in the real environment and thus the risk, while exploring the state and action spaces more comprehensively and learning a better policy under safer conditions.
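As an illustration of the observation exchange in step 4(4) above, the sketch below shows how vehicle i might obtain vehicle j's real-policy action. The object interfaces (observe, predict_observation, policy.act) are assumed names for illustration, not part of the patent.

def request_real_action(vehicle_i, vehicle_j, s_t, state_is_real):
    """Vehicle i asks vehicle j for the action it would take under its real policy.

    If s_t was produced in the real environment, vehicle j's observation is
    computed from the real state; otherwise vehicle i's multi-agent environment
    model outputs the observation directly. Vehicle i then sends o_j to vehicle j
    and receives back the decision a_j made under the real policy.
    """
    if state_is_real:
        o_j = vehicle_j.observe(s_t)
    else:
        o_j = vehicle_i.env_model.predict_observation(s_t, agent_id=vehicle_j.agent_id)
    a_j = vehicle_j.policy.act(o_j)
    return a_j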
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims (10)

1. A model-based multi-agent reinforcement learning method, characterized in that, in a multi-agent environment, the multi-agent environment and the policies are modeled, virtual trajectories of the multiple agents are generated, and the policies of the multiple agents are updated using the virtual trajectories.
2. The model-based multi-agent reinforcement learning method of claim 1, wherein the multiple agents make distributed decisions.
3. The model-based multi-agent reinforcement learning method of claim 2, characterized in that, for a current agent i, the set of opponent agents is denoted {-i}; the action of the current agent i depends on the joint policy π_{-i} of the opponent agents and the current state s_t; the joint action of the opponent agents at time t is denoted a_t^{-i}, and the action of the current agent is represented as a_t^i ~ π_i(·|s_t, a_t^{-i}), wherein π_i is the policy of the current agent.
4. The model-based multi-agent reinforcement learning method of claim 3, wherein each of said multiple agents holds an independent multi-agent environment model p̂^i and a set of opponent policy models {π̂_j^i}, j ∈ {-i}.
5. The model-based multi-agent reinforcement learning method of claim 4, wherein a method of dynamically selecting an opponent model is used in generating the virtual trajectory.
6. The model-based multi-agent reinforcement learning method of claim 5, characterized in that, for the current agent i, the model of each opponent's policy is represented as π̂_j^i, wherein j ∈ {-i}, and the method of dynamically selecting the opponent model comprises two steps:
step a: for each opponent policy model π̂_j^i, selecting a portion of the most recent real interaction data and computing the generalization error of the policy model, denoted ε_j;
step b: given the virtual trajectory length K, for opponent agent j, using the model of the opponent agent's policy to generate the opponent agent's actions during the first n_j steps, and requesting from the opponent agent the actions taken under its real policy during the remaining K - n_j steps.
7. The model-based multi-agent reinforcement learning method of claim 6, wherein the generation of the virtual trajectory comprises the following steps:
step 1: initializing t = 0, the length of the virtual trajectory being K;
step 2: selecting a state s_t from a real trajectory;
step 3: obtaining the joint action a^{-i} of the opponents in state s;
step 4: obtaining the action of the current agent using its policy function: a^i = π_i(s, a^{-i});
step 5: using the model of the multi-agent environment to predict the state ŝ_{t+1} at the next time step and the reward r_t at the current time step;
step 6: putting (s_t, a^i, a^{-i}, s_{t+1}, r_t) into the experience replay pool D_model;
step 7: letting t = t + 1 and repeating from step 3 until t > K.
8. The model-based multi-agent reinforcement learning method of claim 7, wherein the virtual trajectory is generated after said multi-agent environment and the opponent agent policies have been modeled to a certain accuracy.
9. The model-based multi-agent reinforcement learning method of claim 8, wherein Gaussian distributions are used to represent the model outputs in modeling the multi-agent environment and the opponent agent policies, a plurality of models are built for the multi-agent environment, and the multi-agent environment model is used with an ensemble learning method; let the number of environment models be B, then the set of environment models is {p̂_b^i}, wherein b ∈ {1, …, B}, and the opponent policy model is π̂_j^i, wherein j ∈ {-i}.
10. The model-based multi-agent reinforcement learning method of claim 9, wherein gradient descent is used for the updates in modeling the multi-agent environment and the opponent agent policies.
CN202011002376.8A 2020-09-22 2020-09-22 Multi-agent reinforcement learning method based on model Active CN112183288B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011002376.8A CN112183288B (en) 2020-09-22 2020-09-22 Multi-agent reinforcement learning method based on model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011002376.8A CN112183288B (en) 2020-09-22 2020-09-22 Multi-agent reinforcement learning method based on model

Publications (2)

Publication Number Publication Date
CN112183288A true CN112183288A (en) 2021-01-05
CN112183288B CN112183288B (en) 2022-10-21

Family

ID=73955716

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011002376.8A Active CN112183288B (en) 2020-09-22 2020-09-22 Multi-agent reinforcement learning method based on model

Country Status (1)

Country Link
CN (1) CN112183288B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239629A (en) * 2021-06-03 2021-08-10 上海交通大学 Method for reinforcement learning exploration and utilization of trajectory space determinant point process
CN113599832A (en) * 2021-07-20 2021-11-05 北京大学 Adversary modeling method, apparatus, device and storage medium based on environment model
CN114114911A (en) * 2021-11-12 2022-03-01 上海交通大学 Automatic hyper-parameter adjusting method based on model reinforcement learning
CN115293334A (en) * 2022-08-11 2022-11-04 电子科技大学 Model-based unmanned equipment control method for high sample rate deep reinforcement learning
CN116079747A (en) * 2023-03-29 2023-05-09 上海数字大脑科技研究院有限公司 Robot cross-body control method, system, computer equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110764507A (en) * 2019-11-07 2020-02-07 舒子宸 Artificial intelligence automatic driving system for reinforcement learning and information fusion
CN110852448A (en) * 2019-11-15 2020-02-28 中山大学 Cooperative intelligent agent learning method based on multi-intelligent agent reinforcement learning
US20200090074A1 (en) * 2018-09-14 2020-03-19 Honda Motor Co., Ltd. System and method for multi-agent reinforcement learning in a multi-agent environment
CN111324358A (en) * 2020-02-14 2020-06-23 南栖仙策(南京)科技有限公司 Training method for automatic operation and maintenance strategy of information system
CN111330279A (en) * 2020-02-24 2020-06-26 网易(杭州)网络有限公司 Strategy decision model training method and device for game AI
CN111639809A (en) * 2020-05-29 2020-09-08 华中科技大学 Multi-agent evacuation simulation method and system based on leaders and panic emotions

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200090074A1 (en) * 2018-09-14 2020-03-19 Honda Motor Co., Ltd. System and method for multi-agent reinforcement learning in a multi-agent environment
CN110764507A (en) * 2019-11-07 2020-02-07 舒子宸 Artificial intelligence automatic driving system for reinforcement learning and information fusion
CN110852448A (en) * 2019-11-15 2020-02-28 中山大学 Cooperative intelligent agent learning method based on multi-intelligent agent reinforcement learning
CN111324358A (en) * 2020-02-14 2020-06-23 南栖仙策(南京)科技有限公司 Training method for automatic operation and maintenance strategy of information system
CN111330279A (en) * 2020-02-24 2020-06-26 网易(杭州)网络有限公司 Strategy decision model training method and device for game AI
CN111639809A (en) * 2020-05-29 2020-09-08 华中科技大学 Multi-agent evacuation simulation method and system based on leaders and panic emotions

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MICHAEL JANNER ET AL: "When to Trust Your Model: Model-Based Policy Optimization", arXiv:1906.08253v2 *
Center on Frontiers of Computing Studies, Peking University: "Highlights of the IJTCS 2020 Multi-Agent Reinforcement Learning Forum", HTTP://CFCS.PKU.EDU.CN/NEWS/239156.HTM *
WU FENG ET AL: "Research on planning problems for multi-agent systems based on decision theory", China Doctoral Dissertations Full-text Database, Information Science and Technology *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239629A (en) * 2021-06-03 2021-08-10 上海交通大学 Method for reinforcement learning exploration and utilization of trajectory space determinant point process
CN113239629B (en) * 2021-06-03 2023-06-16 上海交通大学 Method for reinforcement learning exploration and utilization of trajectory space determinant point process
CN113599832A (en) * 2021-07-20 2021-11-05 北京大学 Adversary modeling method, apparatus, device and storage medium based on environment model
CN113599832B (en) * 2021-07-20 2023-05-16 北京大学 Opponent modeling method, device, equipment and storage medium based on environment model
CN114114911A (en) * 2021-11-12 2022-03-01 上海交通大学 Automatic hyper-parameter adjusting method based on model reinforcement learning
CN114114911B (en) * 2021-11-12 2024-04-30 上海交通大学 Automatic super-parameter adjusting method based on model reinforcement learning
CN115293334A (en) * 2022-08-11 2022-11-04 电子科技大学 Model-based unmanned equipment control method for high sample rate deep reinforcement learning
CN116079747A (en) * 2023-03-29 2023-05-09 上海数字大脑科技研究院有限公司 Robot cross-body control method, system, computer equipment and storage medium

Also Published As

Publication number Publication date
CN112183288B (en) 2022-10-21

Similar Documents

Publication Publication Date Title
CN112183288B (en) Multi-agent reinforcement learning method based on model
Sun et al. A fast integrated planning and control framework for autonomous driving via imitation learning
CN113110592B (en) Unmanned aerial vehicle obstacle avoidance and path planning method
CN110989576B (en) Target following and dynamic obstacle avoidance control method for differential slip steering vehicle
Chen et al. Stabilization approaches for reinforcement learning-based end-to-end autonomous driving
CN109726804B (en) Intelligent vehicle driving behavior personification decision-making method based on driving prediction field and BP neural network
Naveed et al. Trajectory planning for autonomous vehicles using hierarchical reinforcement learning
CN111580544B (en) Unmanned aerial vehicle target tracking control method based on reinforcement learning PPO algorithm
Grigorescu et al. Neurotrajectory: A neuroevolutionary approach to local state trajectory learning for autonomous vehicles
Bouton et al. Reinforcement learning with iterative reasoning for merging in dense traffic
Wang et al. Efficient reinforcement learning for autonomous driving with parameterized skills and priors
CN114013443A (en) Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning
CN116679719A (en) Unmanned vehicle self-adaptive path planning method based on dynamic window method and near-end strategy
CN113511222A (en) Scene self-adaptive vehicle interactive behavior decision and prediction method and device
Hou et al. Hybrid residual multiexpert reinforcement learning for spatial scheduling of high-density parking lots
Elallid et al. Deep Reinforcement Learning for Autonomous Vehicle Intersection Navigation
Regier et al. Improving navigation with the social force model by learning a neural network controller in pedestrian crowds
CN117553798A (en) Safe navigation method, equipment and medium for mobile robot in complex crowd scene
CN116027788A (en) Intelligent driving behavior decision method and equipment integrating complex network theory and part of observable Markov decision process
Coad et al. Safe trajectory planning using reinforcement learning for self driving
CN114386620B (en) Offline multi-agent reinforcement learning method based on action constraint
Li et al. DDPG-Based Path Planning Approach for Autonomous Driving
CN113353102B (en) Unprotected left-turn driving control method based on deep reinforcement learning
Lin et al. Connectivity guaranteed multi-robot navigation via deep reinforcement learning
CN116227622A (en) Multi-agent landmark coverage method and system based on deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant