CN116307464A - AGV task allocation method based on multi-agent deep reinforcement learning - Google Patents
- Publication number
- CN116307464A (application number CN202211683067.0A)
- Authority
- CN
- China
- Prior art keywords
- agv
- reinforcement learning
- deep reinforcement
- task allocation
- agent deep
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06312—Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/08—Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
- G06Q10/083—Shipping
- G06Q10/0835—Relationships between shipper or supplier and carriers
- G06Q10/08355—Routing methods
Abstract
The invention relates to an AGV task allocation method based on multi-agent deep reinforcement learning. The method first constructs an AGV cargo-handling environment and a kinematic model of the AGVs; it then establishes a partially observable Markov decision model and designs an improved information potential field reward function; training is carried out with the multi-agent deep reinforcement learning method MADDPG, and the trained policy network is finally deployed on each AGV for distributed, cooperative cargo handling. On the basis of the existing multi-agent deep reinforcement learning model, the method further improves the coordination capability among independent agents, so that the AGVs can carry out handling work in a parallel, distributed and coordinated manner and complete the predefined overall objective more efficiently.
Description
Technical Field
The invention belongs to the technical field of multi-agent cooperation, and particularly relates to an AGV task allocation method based on multi-agent deep reinforcement learning.
Background
With the advance of Industry 4.0, automated guided vehicles (Automated Guided Vehicles, AGVs), as intelligent devices integrating a variety of advanced technologies, have been widely used for flexible-workshop material handling owing to their high degree of autonomy and flexibility. An efficient task allocation strategy for a fleet of AGVs can reduce transportation cost and improve delivery efficiency. However, how to distribute multiple transport tasks among multiple AGVs so that the path cost of the AGV system is lowest and its operating efficiency highest remains a key challenge.
Traditional research applies classical optimization algorithms, such as genetic algorithms, particle swarm optimization and ant colony algorithms, to AGV task allocation. However, these centralized task allocation methods take the benefit of the whole system as the optimization target: a control center gathers global information and makes unified decisions, which places high demands on the computing power and real-time capability of the control center. In addition, small information changes or perturbations may affect the overall plan, so adaptability and scalability are poor. In contrast to centralized decision-making, distributed or centerless decision methods can distribute the computational load reasonably and make full use of the autonomous decision-making capability of each agent; they not only reduce the complexity of system modelling but also improve the robustness and scalability of the system.
The continued development of multi-agent deep reinforcement learning (MADRL) provides a new solution for distributed task allocation in AGV fleets. An agent interacts with the environment through trial and error, learns and optimizes by maximizing the cumulative reward, and finally reaches an optimal strategy. Autonomous decision-making based on multi-agent deep reinforcement learning can complete more complex tasks through interaction and decision-making in higher-dimensional, dynamic scenarios. However, applying multi-agent deep reinforcement learning directly to practical problems faces challenges such as environmental non-stationarity and partial observability. Furthermore, the reward mechanism of a multi-agent system is more complex than that of a single-agent system, and reward sparsity often makes model training difficult to converge. Therefore, how to design an effective reward mechanism that improves model performance and accelerates convergence is a key issue.
Disclosure of Invention
Technical problem to be solved
To overcome the shortcomings of the prior art, the invention provides an AGV task allocation method based on multi-agent deep reinforcement learning. The method uses an improved information potential field reward function to address the reward-sparsity problem, providing the AGVs with continuous rewards and implicitly guiding them toward different cargo targets.
Technical proposal
An AGV task allocation method based on multi-agent deep reinforcement learning, characterized in that: the method combines a multi-agent deep reinforcement learning model with an information potential field reward mechanism; in the multi-agent deep reinforcement learning model, each AGV learns and optimizes by maximizing the cumulative reward and finally reaches an optimal task allocation strategy; the information potential field reward mechanism uses virtual information gradient diffusion of the target-position data to guide the AGVs to move toward the target positions along specific gradient directions.
The invention further adopts the technical scheme that: the method comprises the following steps:
step 1: constructing an AGV cargo carrying environment and constructing a kinematic model of the AGV;
step 2: establishing a partially observable Markov decision model, and determining an action space, a state space and a reward function;
step 3: calculating an information potential field rewarding function based on the current state, and providing continuous rewards for AGV decision-making;
step 4: training based on a multi-agent deep reinforcement learning method MADDPG;
step 5: and deploying the trained strategy network to each AGV, and acquiring an action instruction by each AGV according to the local observation information of each AGV to carry out distributed collaborative cargo handling.
The invention further adopts the technical scheme that: a plurality of AGVs are arranged in the kinematic model of the AGVs, and are modeled as discs with the radius of R; the distance between any two AGVs in the model is greater than 2R to avoid collisions between AGVs.
The invention further adopts the technical scheme that: the potential value of a target position is set to a fixed positive information potential value, the potential values of the positions of the other AGVs are set to a fixed negative information potential value, the boundary information potential value is set to 0, and the field is propagated by iterating a diffusion formula; after the iteration, each position has a corresponding information potential value, and the AGV obtains a reward value r_IPF according to the information potential value of the position it occupies at time step t.
The invention further adopts the technical scheme that: the reward r obtained by AGV i at time step t is represented as the sum of the three parts r_IPF, r_g and r_c, where r_g encourages only one AGV to approach each task point, r_c aims to minimize collisions, and r_IPF gives the AGV an implicit thrust that guides it to move toward the target positions in a dispersed manner.
The invention further adopts the technical scheme that: the training with the multi-agent deep reinforcement learning method MADDPG in step 4 comprises parameter updates of an Actor network and parameter updates of a Critic network.
The invention further adopts the technical scheme that: in step 4, when the target network parameters are updated, a soft update strategy is adopted: θ′_i ← (1 − τ)θ′_i + τθ_i, where τ is the soft-replacement coefficient, which determines the update amplitude of the target network parameters, and θ′_i and θ_i denote the target network parameters and the estimated network parameters, respectively.
A computer system, comprising: one or more processors, a computer-readable storage medium storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods described above.
A computer readable storage medium, characterized by storing computer-executable instructions which, when executed, implement the method described above.
Advantageous effects
The AGV task allocation method MADDPG-IPF based on multi-agent deep reinforcement learning first constructs an AGV cargo-handling environment and a kinematic model of the AGVs; it then establishes a partially observable Markov decision model and designs an improved information potential field reward function; training is carried out with the multi-agent deep reinforcement learning method MADDPG, and the trained policy network is finally deployed on each AGV for distributed, cooperative cargo handling. On the basis of the existing multi-agent deep reinforcement learning model, the method further improves the coordination capability among independent agents, so that the AGVs can carry out handling work in a parallel, distributed and coordinated manner and complete the predefined overall objective more efficiently.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, like reference numerals being used to refer to like parts throughout the several views.
FIG. 1 is a block diagram of an AGV task allocation method based on multi-agent deep reinforcement learning in an example of the present invention.
FIG. 2 is a diagram of a multi-agent based deep reinforcement learning and information potential field rewards network model in accordance with an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, technical features of the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention provides an AGV task allocation method based on multi-agent deep reinforcement learning, whose principle is as follows: the traditional centralized AGV task allocation method places high demands on the real-time and decision-making capability of the control center and lacks adaptability and scalability. Autonomous decision-making based on multi-agent deep reinforcement learning can complete complex tasks through interaction and decision-making in higher-dimensional, dynamic scenarios. We propose a solution based on multi-agent deep reinforcement learning (MADDPG-IPF), in which multiple AGVs achieve self-organizing task allocation through trial and error. In addition, we design an information potential field reward mechanism that provides the AGVs with a continuous reward at every time step to address the reward-sparsity problem. The invention improves the cooperation capability among independent AGVs and provides a solution for flexible, self-organizing task allocation in AGV fleets.
The scheme of the invention comprises the following parts:
Multi-agent deep reinforcement learning model: each AGV learns and optimizes by maximizing the cumulative reward and finally reaches an optimal task allocation strategy;
Information potential field reward mechanism: virtual information gradient diffusion of the target-position data guides the AGVs to move toward the target positions along specific gradient directions;
MADDPG-IPF: combining multi-agent reinforcement learning with the information potential field technique further improves the coordination capability among independent agents, so that the AGVs can carry out handling work in a parallel, distributed and coordinated manner.
The specific steps of the invention are as follows:
step one: constructing AGV (automatic guided vehicle) cargo carrying environment and constructing AGV kinematic model
The invention assumes a total of N_v AGVs and models each of them as a disk of radius R. All AGVs are homogeneous, having the same parameters and functions. At each time step t, a tuple s_i^t = (p_i^t, v_i^t, r_i) represents the state of the i-th AGV (1 ≤ i ≤ N_v), where p_i^t is its position, v_i^t its velocity and r_i its sensing radius. The i-th AGV obtains an observation o_i^t within its sensing radius r_i and computes an action a_i^t according to the policy π_θ, where θ denotes the policy parameters; the computed action a_i^t determines the velocity v_i^{t+1} adopted at the next step, guiding the AGVs to reach different task target points while avoiding collisions with other AGVs.
Let L = {l_i, i = 1, …, N_v} denote the trajectories of all AGVs; they satisfy the following physical constraints:
v_i^t = π_θ(o_i^t),  ‖v_i^t‖ ≤ v_max,  p_i^t = p_i^{t−1} + v_i^t · Δt,  ‖p_i^t − p_j^t‖ > 2R (for all i ≠ j),
where the first expression states that the velocity of an AGV is given by the action the policy π_θ selects from the current observation; the second states that the travel speed of an AGV cannot exceed its maximum speed; the third states that the current position of an AGV is determined by its previous position and its current velocity; and the last states that, for any two AGVs, the distance between them must be greater than 2R to avoid collisions.
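A minimal Python sketch of these constraints follows; the parameter values v_max, R and dt and the use of NumPy are illustrative assumptions rather than part of the patented method:

```python
import numpy as np

def advance_and_check(positions, velocities, v_max=1.0, R=0.2, dt=0.1):
    """Illustrative enforcement of the trajectory constraints.

    positions, velocities: arrays of shape (N_v, 2).
    Returns updated positions/velocities and a boolean matrix of separation violations.
    """
    speeds = np.linalg.norm(velocities, axis=1, keepdims=True)
    scale = np.minimum(1.0, v_max / np.maximum(speeds, 1e-8))
    velocities = velocities * scale                    # enforce ||v_i^t|| <= v_max
    positions = positions + velocities * dt            # p_i^t = p_i^{t-1} + v_i^t * dt

    # Separation constraint: ||p_i - p_j|| > 2R for every pair i != j.
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)
    too_close = dists <= 2 * R
    return positions, velocities, too_close
```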
Step two: establishing a partially observable Markov decision model, and determining an action space, a state space and a reward function
In a real scenario, an AGV agent observes the environment and selects actions based on the observations it acquires; since in most cases only local observations are available, this process is typically modeled as a partially observable Markov decision process (POMDP). A POMDP can be represented by a six-tuple M = (N, S, A, P, R, O), where N is the number of agents, S is the state space of the system, A is the joint action space of all agents, P is the state-transition probability matrix, R is the reward function, and O is the distribution of observations obtained from the system state, i.e. o ~ O(s).
For the specific problem of cooperative task allocation in a group of AGVs, the state space S, the action space A and the reward function R are designed as follows:
state space S: the design state space is { v, p, D A ,D B (v, p) represents the speed and position of the agent itself, { D }, where A ,D B And the relative distance to the target point and other agents.
Action space A: to be closer to practice, the action space of the agent is set as a continuous space represented by a two-component vector {x, y} with values in (−1, 1), which denote the acceleration of the agent in the forward-backward and left-right directions at the current moment. The velocity the AGV will adopt at the next step can then be computed from this action together with the AGV's own mass and damping.
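A possible way to derive the next velocity from the continuous action, purely as an illustration; the assumption that the {x, y} command acts as a force and the mass, damping and time-step values are not specified by the patent:

```python
import numpy as np

def next_velocity(v, action, mass=1.0, damping=0.25, dt=0.1, v_max=1.0):
    """Illustrative velocity update from a continuous action in (-1, 1)^2."""
    v = np.asarray(v, dtype=float) * (1.0 - damping)        # damping decays the old velocity
    v = v + (np.asarray(action, dtype=float) / mass) * dt   # action treated as a force
    speed = np.linalg.norm(v)
    if speed > v_max:                                       # keep within the maximum speed
        v *= v_max / speed
    return v
```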
Reward function R: the goal of task allocation is that multiple AGVs reach different task target points in a self-organizing, decentralized manner in as short a time as possible while avoiding collisions. A generic reward function is designed to achieve this goal: an agent obtains a target reward when it reaches a target position and receives a collision penalty when it collides with other agents or with walls.
The target reward r_g and the collision penalty r_c are defined as functions of d_ij, where d_ij is the distance from task point j to AGV i.
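The precise definitions of r_g and r_c are given by the formulas referenced above; the following is only a minimal illustrative sketch, assuming a reach radius and fixed reward/penalty magnitudes that are not values taken from the patent:

```python
def target_reward(d_ij, reach_radius=0.1, bonus=1.0):
    """Illustrative r_g: a bonus once AGV i is within reach of task point j."""
    return bonus if d_ij < reach_radius else 0.0

def collision_penalty(collided_with_agent_or_wall, penalty=-1.0):
    """Illustrative r_c: a fixed penalty whenever a collision occurs."""
    return penalty if collided_with_agent_or_wall else 0.0
```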
Step three: calculating an information potential field reward function based on the current state, providing continuous rewards for AGV decisions
By dividing the environment into a bounded grid map, setting positive information potential values at the positions of the target points and negative information potential values at the positions of the other AGVs, the AGVs can be implicitly guided to move toward different targets in a dispersed manner. Specifically, the potential value of a target position is set to a fixed positive value, the potential values of the positions occupied by other AGVs are set to a fixed negative value, the potential value of the boundary is set to 0, and the field is then propagated by iterating a diffusion formula over the grid, where φ_k(u) is the information potential value of node u in the k-th round, N(u) is the set of neighbor nodes of node u, and d(u) is the number of neighbor nodes of node u.
After the iteration, every location has a corresponding information potential value, and an AGV receives the reward r_IPF according to the information potential value of the position it occupies at time step t. The information potential value near a target position is higher, and when several targets lie close together their potential fields superpose, so the AGV is attracted toward regions containing more targets. If, however, other AGVs are already in the vicinity of a target, their negative potential reduces the attraction exerted on the remaining AGVs. This design effectively prevents several agents from contending for the same target point, guides the agents to different task target points in a self-organizing, decentralized manner, and thereby achieves cooperative transport.
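A minimal sketch of the information potential field construction on a grid map, assuming the commonly used neighbourhood-averaging diffusion update φ_{k+1}(u) = (φ_k(u) + Σ_{w∈N(u)} φ_k(w)) / (d(u) + 1); the concrete iteration formula of the patent may differ, so this form and the source magnitudes are assumptions:

```python
import numpy as np

def build_ipf(grid_shape, targets, agv_cells, pos_val=10.0, neg_val=-10.0, iters=50):
    """Iteratively diffuse information potential values over a bounded grid map.

    targets, agv_cells: lists of (row, col) cells; the boundary potential is held at 0.
    """
    rows, cols = grid_shape
    phi = np.zeros(grid_shape)

    def impose_sources(field):
        for cell in targets:
            field[cell] = pos_val          # fixed positive potential at target positions
        for cell in agv_cells:
            field[cell] = neg_val          # fixed negative potential at other AGVs
        field[0, :] = field[-1, :] = field[:, 0] = field[:, -1] = 0.0  # boundary = 0
        return field

    phi = impose_sources(phi)
    for _ in range(iters):
        new_phi = np.zeros_like(phi)
        for r in range(rows):
            for c in range(cols):
                nbrs = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                        if 0 <= r + dr < rows and 0 <= c + dc < cols]
                # assumed neighbourhood-averaging diffusion update
                new_phi[r, c] = (phi[r, c] + sum(phi[n] for n in nbrs)) / (len(nbrs) + 1)
        phi = impose_sources(new_phi)
    return phi

# The reward r_IPF for an AGV occupying grid cell (r, c) at time step t is then phi[r, c].
```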
In summary, the invention represents the reward r that AGV i obtains at time step t as the sum of the three parts r_IPF, r_g and r_c: r_g encourages only one AGV to approach each task point, r_c aims to minimize collisions, and r_IPF gives the AGV an implicit thrust that guides it toward the target positions in a dispersed manner.
Step four: training based on multi-agent deep reinforcement learning method MADDPG
In the multi-agent training part, the method adopts MADDPG, an algorithm based on the Actor-Critic framework. In the training stage, the Actor network combines the policy-gradient method with a state-action value function: it computes deterministic actions from the current state and optimizes its neural-network parameters θ according to the scores that the Critic network assigns to those actions. The Critic network uses the observations of all agents and evaluates the actions produced by the Actor network by computing the TD error.
The parameter update of the Actor network of the MADDPG algorithm can be given by the deterministic policy gradient
∇_{θ_i} J(μ_i) = E_{x,a∼D}[ ∇_{θ_i} μ_i(a_i | o_i) · ∇_{a_i} Q_i^μ(x, a_1, …, a_n) |_{a_i = μ_i(o_i)} ],
where o_i denotes the local observation acquired by the i-th agent, x = [o_1, …, o_n] is the global observation vector, i.e. the state at this moment, which integrates the information acquired by all agents, and Q_i^μ(x, a_1, …, a_n) is the centralized state-action function of the i-th agent.
The parameter update of the Critic network of the MADDPG algorithm can be given by minimizing the loss
L(θ_i) = E_{x,a,r,x′}[ (Q_i^μ(x, a_1, …, a_n) − y)^2 ],  with  y = r_i + γ Q_i^{μ′}(x′, a′_1, …, a′_n) |_{a′_j = μ′_j(o_j)},
where Q_i^{μ′} denotes the target network and μ′ = [μ′_1, μ′_2, …, μ′_n] is the set of target policies whose parameters θ′_j are updated with a lag. In addition, when updating the target network parameters, a soft-update policy is generally adopted:
θ′_i ← (1 − τ)θ′_i + τθ_i,
where τ is the soft-replacement coefficient, which determines the update magnitude of the target network parameters, and θ′_i and θ_i denote the target network parameters and the estimated (online) network parameters, respectively.
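An illustrative PyTorch-style sketch of the soft target update and of the centralized Critic TD target described above; the network interfaces (critic_i(x, actions), mu(o)) and the replay-buffer batch layout are placeholders assumed for illustration, not the patent's implementation:

```python
import torch

def soft_update(target_net, online_net, tau=0.01):
    """theta'_i <- (1 - tau) * theta'_i + tau * theta_i."""
    for tp, p in zip(target_net.parameters(), online_net.parameters()):
        tp.data.copy_((1.0 - tau) * tp.data + tau * p.data)

def critic_loss(critic_i, target_critic_i, target_actors, batch, gamma=0.95):
    """TD loss for agent i's centralized critic: (Q_i(x, a_1..a_n) - y)^2."""
    x, actions, rewards_i, x_next, obs_next = batch          # tensors from the replay buffer
    with torch.no_grad():
        next_actions = torch.cat([mu(o) for mu, o in zip(target_actors, obs_next)], dim=-1)
        y = rewards_i + gamma * target_critic_i(x_next, next_actions)
    q = critic_i(x, actions)
    return torch.nn.functional.mse_loss(q, y)
```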
Step five: deploy the trained policy network to each AGV; each AGV then obtains action commands from its own local observations and carries out distributed, cooperative cargo handling.
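Deployment thus reduces to each AGV running its trained Actor on its own local observation; a minimal sketch, assuming a generic multi-agent environment interface:

```python
def run_decentralized(env, actors, max_steps=200):
    """Each AGV selects actions from its trained policy using only local observations."""
    observations = env.reset()
    for _ in range(max_steps):
        actions = [actor.act(obs) for actor, obs in zip(actors, observations)]
        observations, rewards, done, _ = env.step(actions)
        if done:
            break
```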
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made without departing from the spirit and scope of the invention.
Claims (9)
1. An AGV task allocation method based on multi-agent deep reinforcement learning, characterized in that: the method combines a multi-agent deep reinforcement learning model with an information potential field reward mechanism; in the multi-agent deep reinforcement learning model, each AGV learns and optimizes by maximizing the cumulative reward and finally reaches an optimal task allocation strategy; the information potential field reward mechanism uses virtual information gradient diffusion of the target-position data to guide the AGVs to move toward the target positions along specific gradient directions.
2. The AGV task allocation method based on multi-agent deep reinforcement learning according to claim 1, characterized in that the method comprises the following steps:
step 1: constructing an AGV cargo carrying environment and constructing a kinematic model of the AGV;
step 2: establishing a partially observable Markov decision model, and determining an action space, a state space and a reward function;
step 3: calculating an information potential field rewarding function based on the current state, and providing continuous rewards for AGV decision-making;
step 4: training based on a multi-agent deep reinforcement learning method MADDPG;
step 5: and deploying the trained strategy network to each AGV, and acquiring an action instruction by each AGV according to the local observation information of each AGV to carry out distributed collaborative cargo handling.
3. The AGV task allocation method based on multi-agent deep reinforcement learning according to claim 1, wherein the AGV task allocation method comprises the following steps: a plurality of AGVs are arranged in the kinematic model of the AGVs, and are modeled as discs with the radius of R;
the distance between any two AGVs in the model is greater than 2R to avoid collisions between AGVs.
4. The AGV task allocation method based on multi-agent deep reinforcement learning according to claim 2, characterized in that: the potential value of a target position is set to a fixed positive information potential value, the potential values of the positions of the other AGVs are set to a fixed negative information potential value, the boundary information potential value is set to 0, and the field is propagated by iterating a diffusion formula; after the iteration, each position has a corresponding information potential value, and the AGV obtains a reward value r_IPF according to the information potential value of the position it occupies at time step t.
5. The AGV task allocation method based on multi-agent deep reinforcement learning according to claim 4, characterized in that: the reward r obtained by AGV i at time step t is represented as the sum of the three parts r_IPF, r_g and r_c, where r_g encourages only one AGV to approach each task point, r_c aims to minimize collisions, and r_IPF gives the AGV an implicit thrust that guides it to move toward the target positions in a dispersed manner.
6. The AGV task allocation method based on multi-agent deep reinforcement learning according to claim 2, characterized in that: the training with the multi-agent deep reinforcement learning method MADDPG in step 4 comprises parameter updates of an Actor network and parameter updates of a Critic network.
7. The AGV task allocation method based on multi-agent deep reinforcement learning according to claim 6, characterized in that: in step 4, when the target network parameters are updated, a soft update strategy is adopted: θ′_i ← (1 − τ)θ′_i + τθ_i, where τ is the soft-replacement coefficient, which determines the update amplitude of the target network parameters, and θ′_i and θ_i denote the target network parameters and the estimated network parameters, respectively.
8. A computer system, comprising: one or more processors, a computer-readable storage medium storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of claim 1.
9. A computer readable storage medium, characterized by storing computer executable instructions that, when executed, are adapted to implement the method of claim 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211683067.0A CN116307464A (en) | 2022-12-27 | 2022-12-27 | AGV task allocation method based on multi-agent deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116307464A true CN116307464A (en) | 2023-06-23 |
Family
ID=86833040
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211683067.0A Pending CN116307464A (en) | 2022-12-27 | 2022-12-27 | AGV task allocation method based on multi-agent deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116307464A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116779150A (en) * | 2023-07-03 | 2023-09-19 | 浙江一山智慧医疗研究有限公司 | Personalized medical decision method, device and application based on multi-agent interaction |
CN116779150B (en) * | 2023-07-03 | 2023-12-22 | 浙江一山智慧医疗研究有限公司 | Personalized medical decision method, device and application based on multi-agent interaction |
CN117406706A (en) * | 2023-08-11 | 2024-01-16 | 汕头大学 | Multi-agent obstacle avoidance method and system combining causal model and deep reinforcement learning |
CN117406706B (en) * | 2023-08-11 | 2024-04-09 | 汕头大学 | Multi-agent obstacle avoidance method and system combining causal model and deep reinforcement learning |
CN117236821A (en) * | 2023-11-10 | 2023-12-15 | 淄博纽氏达特机器人系统技术有限公司 | Online three-dimensional boxing method based on hierarchical reinforcement learning |
CN117236821B (en) * | 2023-11-10 | 2024-02-06 | 淄博纽氏达特机器人系统技术有限公司 | Online three-dimensional boxing method based on hierarchical reinforcement learning |
CN117272842A (en) * | 2023-11-21 | 2023-12-22 | 中国电建集团西北勘测设计研究院有限公司 | Cooperative control system and method for multi-industrial park comprehensive energy system |
CN117272842B (en) * | 2023-11-21 | 2024-02-27 | 中国电建集团西北勘测设计研究院有限公司 | Cooperative control system and method for multi-industrial park comprehensive energy system |
CN118296702A (en) * | 2024-04-09 | 2024-07-05 | 合肥工业大学 | Automatic building space combination method based on multi-agent deep reinforcement learning |
CN118051035A (en) * | 2024-04-15 | 2024-05-17 | 山东大学 | Multi-AGV scheduling method based on local distance visual field reinforcement learning |
CN118365099A (en) * | 2024-06-19 | 2024-07-19 | 华南理工大学 | Multi-AGV scheduling method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |