
CN116307464A - AGV task allocation method based on multi-agent deep reinforcement learning - Google Patents

AGV task allocation method based on multi-agent deep reinforcement learning

Info

Publication number
CN116307464A
Authority
CN
China
Prior art keywords
agv
reinforcement learning
deep reinforcement
task allocation
agent deep
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211683067.0A
Other languages
Chinese (zh)
Inventor
郭斌
李梦媛
刘佳琪
刘思聪
於志文
邱晨
王亮
王柱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202211683067.0A priority Critical patent/CN116307464A/en
Publication of CN116307464A publication Critical patent/CN116307464A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/08Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083Shipping
    • G06Q10/0835Relationships between shipper or supplier and carriers
    • G06Q10/08355Routing methods

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Health & Medical Sciences (AREA)
  • Educational Administration (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to an AGV task allocation method based on multi-agent deep reinforcement learning, which comprises the steps of firstly constructing an AGV cargo-handling environment and a kinematic model of the AGV; secondly, establishing a partially observable Markov decision model and designing an improved information potential field reward function; then training based on the multi-agent deep reinforcement learning method MADDPG, and finally deploying the trained policy network to each AGV for distributed collaborative cargo handling. On the basis of existing multi-agent deep reinforcement learning models, the method further improves the coordination capability among independent agents, so that each AGV can carry out handling work in a parallel, distributed and coordinated manner, and the predefined overall objective can be completed more effectively.

Description

AGV task allocation method based on multi-agent deep reinforcement learning
Technical Field
The invention belongs to the technical field of multi-agent cooperation, and particularly relates to an AGV task allocation method based on multi-agent deep reinforcement learning.
Background
With the advance of Industry 4.0, automated guided vehicles (Automated Guided Vehicle, AGV), as intelligent devices integrating a variety of advanced technologies, have been widely used for flexible-shop material handling due to their high degree of autonomy and flexibility. An efficient task allocation strategy for a group of AGVs can reduce transportation cost and improve distribution efficiency. However, how to allocate multiple handling tasks to multiple AGVs so that the path cost of the AGV system is lowest and its operating efficiency is highest remains a key challenge.
Traditional research applies classical optimization algorithms, such as genetic algorithms, particle swarm algorithms and ant colony algorithms, to the field of AGV task allocation. However, these centralized task allocation methods take maximizing the benefit of the whole system as the optimization target, and the control center gathers global information to make a unified decision, which places high demands on the computing capacity and real-time capability of the control center. In addition, small information changes or perturbations may affect the overall plan, resulting in poor adaptability and scalability. In contrast to centralized decision-making, distributed or centerless decision methods can distribute the computational load reasonably and make full use of the autonomous decision-making capability of each agent, which not only reduces the complexity of system modeling but also improves the robustness and scalability of the system.
The continued development of multi-agent deep reinforcement learning (MADRL) technology provides a new solution for distributed task allocation among groups of AGVs. An agent interacts with the environment through a trial-and-error mechanism, learns and optimizes by maximizing the cumulative reward, and finally converges to an optimal strategy. Autonomous decision-making methods based on multi-agent deep reinforcement learning can complete more complex tasks through interaction and decision-making in higher-dimensional, dynamic scenes. However, when practical application problems are solved directly from the multi-agent deep reinforcement learning point of view, challenges such as environmental non-stationarity and partial observability arise. Furthermore, the reward mechanism of a multi-agent system is more complex than that of a single-agent system, and the reward sparsity problem often makes model training difficult to converge. Therefore, how to design an effective reward mechanism to improve model performance and accelerate model convergence is a key issue.
Disclosure of Invention
Technical problem to be solved
In order to avoid the defects of the prior art, the invention provides an AGV task allocation method based on multi-agent deep reinforcement learning, which uses an improved information potential field reward function to solve the problem of reward sparsity, provides continuous rewards for the AGVs, and implicitly guides the AGVs to move towards different cargo targets.
Technical proposal
An AGV task allocation method based on multi-agent deep reinforcement learning is characterized in that: the method combines a multi-agent deep reinforcement learning model with an information potential field reward mechanism; in the multi-agent deep reinforcement learning model, the AGVs learn and optimize by maximizing the cumulative reward and finally reach an optimal task allocation strategy; the information potential field reward mechanism uses virtual information gradient diffusion of the target position data to guide the AGVs to move towards the target positions along a specific gradient direction.
The invention further adopts the technical scheme that: the method comprises the following steps:
step 1: constructing an AGV cargo carrying environment and constructing a kinematic model of the AGV;
step 2: establishing a partially observable Markov decision model, and determining an action space, a state space and a reward function;
step 3: calculating an information potential field rewarding function based on the current state, and providing continuous rewards for AGV decision-making;
step 4: training based on a multi-agent deep reinforcement learning method MADDPG;
step 5: and deploying the trained strategy network to each AGV, and acquiring an action instruction by each AGV according to the local observation information of each AGV to carry out distributed collaborative cargo handling.
The invention further adopts the technical scheme that: a plurality of AGVs are arranged in the kinematic model of the AGVs, and are modeled as discs with the radius of R; the distance between any two AGVs in the model is greater than 2R to avoid collisions between AGVs.
The invention further adopts the technical scheme that: the target position information potential value is set to a fixed positive information potential value, the position information potential values of other AGVs are set to fixed negative information potential values, the boundary information potential value is set to 0, and iteration is performed with a formula; after the iteration, each position has a corresponding information potential value; the AGV obtains a reward value r_IPF according to the information potential value of its position at time step t.
The invention further adopts the technical scheme that: the reward r obtained by AGV i at time step t is represented as the sum of three parts r_IPF, r_g and r_c:
r_i^t = r_g + r_c + r_IPF
where r_g encourages only one AGV near each task point, r_c aims to minimize collisions, and r_IPF gives the AGV an implicit thrust that guides it to move towards the target positions in a dispersed manner.
The invention further adopts the technical scheme that: training with the multi-agent deep reinforcement learning method MADDPG in step 4 comprises parameter updates of the Actor network and parameter updates of the Critic network.
The invention further adopts the technical scheme that: in step 4, a soft update strategy is adopted when updating the target network parameters: θ′_i ← (1 − τ)θ′_i + τθ_i, where τ is the soft replacement coefficient, representing the update amplitude of the target network parameters; θ′_i and θ_i denote the target network parameters and the estimated network parameters, respectively.
A computer system, comprising: one or more processors, a computer-readable storage medium storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods described above.
A computer readable storage medium, characterized by storing computer executable instructions that when executed are configured to implement the method described above.
Advantageous effects
According to the AGV task allocation method MADDPG-IPF based on multi-agent deep reinforcement learning, an AGV cargo-handling environment is first built and a kinematic model of the AGV is constructed; secondly, a partially observable Markov decision model is established and an improved information potential field reward function is designed; training is then performed based on the multi-agent deep reinforcement learning method MADDPG, and the trained policy network is finally deployed to each AGV for distributed collaborative cargo handling. On the basis of existing multi-agent deep reinforcement learning models, the method further improves the coordination capability among independent agents, so that each AGV can carry out handling work in a parallel, distributed and coordinated manner, and the predefined overall objective can be completed more effectively.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, like reference numerals being used to refer to like parts throughout the several views.
FIG. 1 is a block diagram of an AGV task allocation method based on multi-agent deep reinforcement learning in an example of the present invention.
FIG. 2 is a diagram of the network model based on multi-agent deep reinforcement learning and information potential field rewards in accordance with an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
The invention provides an AGV task allocation method based on multi-agent deep reinforcement learning, which adopts the following principle: the traditional centralized AGV task allocation method places high demands on the real-time and decision-making capability of the control center and lacks adaptability and scalability. The autonomous decision-making method based on multi-agent deep reinforcement learning can complete complex tasks through interaction and decision-making in higher-dimensional, dynamic scenes. We propose a solution based on multi-agent deep reinforcement learning (MADDPG-IPF), in which multiple AGVs achieve self-organizing task allocation by trial and error. In addition, we design an information potential field reward mechanism that provides continuous rewards to the AGVs at each time step to address the problem of sparse rewards. The invention improves the cooperation capability among independent AGVs and provides a solution for flexible, self-organizing task allocation of a group of AGVs.
The scheme of the invention comprises the following steps:
multi-agent deep reinforcement learning model: the AGV learns and optimizes in a mode of maximizing the accumulated rewards, and finally, an optimal task allocation strategy is achieved;
information potential field rewarding mechanism: the AGV is guided to move to the target position along a specific gradient direction by utilizing virtual information gradient diffusion of the data of the target position;
MADDPG-IPF: the coordination capability among the independent agents is further improved by combining multi-agent reinforcement learning and information potential field technology, so that each AGV can carry out carrying work in parallel, in a distributed and coordinated manner.
The specific steps of the invention are as follows:
step one: constructing AGV (automatic guided vehicle) cargo carrying environment and constructing AGV kinematic model
The invention assumes a total of N_V AGVs, each modeled as a disc of radius R. All AGVs are homogeneous, having the same parameters and functions. At each time step, the tuple s_i^t = (p_i^t, v_i^t, r_i) represents the state of the i-th AGV (1 ≤ i ≤ N_V), where p_i^t is its position, v_i^t its speed and r_i its perception distance. The i-th AGV obtains the observation o_i^t within its perception distance r_i and computes the action a_i^t = π_θ(o_i^t) according to the policy π_θ, where θ denotes the policy parameters. The computed action a_i^t controls the speed v_i^{t+1} of the next step, so that the AGVs are guided to reach different task target points while avoiding collisions with other AGVs.
Define L = {l_i, i = 1, …, N} as the trajectories of all AGVs, satisfying the following physical constraints:
v_i^t = π_θ(o_i^t)
‖v_i^t‖ ≤ v_max
p_i^t = p_i^{t-1} + v_i^t · Δt
‖p_i^t − p_j^t‖ > 2R, ∀ i ≠ j
where the first formula states that the speed of the AGV is obtained from the action selected by the policy π_θ based on the current observation; the second formula states that the travel speed of the AGV cannot exceed its maximum speed; the third formula states that the current position of the AGV is determined by its previous position and its current speed over one time step Δt; the last formula states that for any two AGVs the distance between them is greater than 2R to avoid collisions between the AGVs.
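By way of illustration only (not part of the patent text), the following Python sketch shows how the kinematic constraints above might look in code; the class and parameter names, the time step dt, the damping model and the simple speed clipping are assumptions introduced for this sketch.

    import numpy as np

    class AGV:
        """Disc-shaped AGV with position p, velocity v and perception distance r (illustrative)."""
        def __init__(self, position, radius=0.05, v_max=1.0, perception=0.5):
            self.p = np.asarray(position, dtype=float)  # position p_i
            self.v = np.zeros(2)                        # velocity v_i
            self.R = radius                             # disc radius R
            self.v_max = v_max                          # maximum speed
            self.r = perception                         # perception distance r_i

        def step(self, action, dt=0.1, mass=1.0, damping=0.25):
            # action in (-1, 1)^2 is treated as an acceleration command (see action space A below)
            self.v = (1.0 - damping) * self.v + np.asarray(action) / mass * dt
            speed = np.linalg.norm(self.v)
            if speed > self.v_max:                      # enforce ||v_i^t|| <= v_max
                self.v *= self.v_max / speed
            self.p = self.p + self.v * dt               # p_i^t = p_i^{t-1} + v_i^t * dt

    def collision_free(agvs):
        # check the separation constraint ||p_i - p_j|| > 2R for every pair of AGVs
        for i in range(len(agvs)):
            for j in range(i + 1, len(agvs)):
                if np.linalg.norm(agvs[i].p - agvs[j].p) <= 2 * agvs[i].R:
                    return False
        return True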
Step two: establishing a locally observable Markov decision model, and determining an action space, a state space and a reward function
In a real scenario, the AGV agent observes the environmental conditions and selects actions based on the acquired observations; since in most cases only local observations can be obtained, this process is typically modeled as a partially observable Markov decision process (POMDP). In general, a POMDP can be represented by a six-tuple M = (N, S, A, P, R, O), where N is the number of agents, S denotes the state space of the system, A denotes the joint action space of all agents, P denotes the state transition probability matrix, R is the reward function, and O is the distribution of observations obtained from the system state, o ~ O(s).
For the specific problem of cooperative task allocation among a group of AGVs, the state space S, action space A and reward function R are designed as follows:
State space S: the state space is designed as {v, p, D_A, D_B}, where (v, p) represents the speed and position of the agent itself, and {D_A, D_B} the relative distances to the target points and to the other agents.
Action space A: to be closer to practice, the action space of the agent is set as a continuous action space, represented by a vector {x, y} with values in the interval (-1, 1), which denotes the acceleration of the agent in the forward-backward and left-right directions at the current moment. The speed at which the AGV moves next can then be calculated by combining the weight of the AGV itself and the damping.
Reward function R: the goal of task allocation is that multiple AGVs reach different task target points in a self-organizing and decentralized manner in as short a time as possible while avoiding collisions. A generic reward function is designed to achieve this goal: the agent obtains a goal reward when it reaches a target position, and receives a collision penalty when it collides with another agent or with a wall.
The goal reward r_g and the collision penalty r_c are defined as follows:
[Equation images in the original: piecewise definitions of r_g, which grants a positive reward when the distance d_ij to a task point is small enough, and r_c, which imposes a negative penalty when a collision occurs]
where d_ij is the distance from task point j to AGV i.
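As an illustration of the goal reward and collision penalty described above, the following Python sketch computes r_g and r_c for one AGV; the threshold goal_radius and the numerical reward values are assumptions, since the exact constants of the patent's equations are not reproduced here.

    import numpy as np

    def goal_reward(agv_pos, task_points, goal_radius=0.1, bonus=1.0):
        # r_g: positive reward when the AGV is within goal_radius of some task point j
        d = np.linalg.norm(np.asarray(task_points) - np.asarray(agv_pos), axis=1)  # distances d_ij
        return bonus if d.min() < goal_radius else 0.0

    def collision_penalty(agv_pos, other_positions, R=0.05, penalty=-1.0):
        # r_c: negative reward when the AGV comes within 2R of another AGV (a collision)
        for q in other_positions:
            if np.linalg.norm(np.asarray(agv_pos) - np.asarray(q)) <= 2 * R:
                return penalty
        return 0.0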
Step three: calculating an informative potential field rewarding function based on current state, providing continuous rewards for AGV decisions
By dividing the environment into a bounded grid map, setting positive information potential values at the positions of the target points and negative information potential values at the positions of other AGVs, the AGVs can be implicitly guided to move in a dispersed manner towards different targets. Specifically, the information potential value of a target position is set to a fixed positive value, the information potential values of the positions of other AGVs are set to a fixed negative value, the boundary information potential value is set to 0, and iteration is performed with the following formula:
[Equation image in the original: iterative update of the information potential value φ_{k+1}(u) from φ_k(u) and the potential values of its neighbour nodes]
where φ_k(u) is the information potential value of node u in the k-th round, N(u) is the set of neighbour nodes of node u, and d(u) is the number of neighbour nodes of node u.
After the iteration, each location has a corresponding information potential value. The AGV obtains the reward value r_IPF according to the information potential value of its position at time step t. The information potential value near a target position is higher, and when multiple targets are located close together their information potential fields are superimposed, so the AGV is attracted to move towards positions with more targets. However, if other AGVs are already in the vicinity of a target, a negative information potential field is created there and the attraction of that target to the AGV is reduced. This design avoids multiple agents contending for the same target point and guides the agents to different task target points in a self-organizing and decentralized manner, achieving the purpose of cooperative transportation.
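For illustration only, a minimal Python sketch of the information potential field iteration on a bounded grid; because the patent's iteration formula appears only as an image, the averaging-style update below, as well as the grid size and the fixed potential constants, are assumptions made for this sketch.

    import numpy as np

    def information_potential_field(grid_shape, targets, other_agvs,
                                    phi_target=1.0, phi_agv=-0.5, iters=50):
        # Iteratively diffuse potential values over a bounded grid (assumed averaging update).
        phi = np.zeros(grid_shape)
        for _ in range(iters):
            new_phi = np.zeros_like(phi)
            for u in np.ndindex(*grid_shape):
                nbrs = [(u[0] + dx, u[1] + dy)
                        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                        if 0 <= u[0] + dx < grid_shape[0] and 0 <= u[1] + dy < grid_shape[1]]
                # assumed update: combine phi_k(u) with the neighbour potentials phi_k(v), v in N(u)
                new_phi[u] = (phi[u] + sum(phi[v] for v in nbrs)) / (1 + len(nbrs))
            phi = new_phi
            # re-impose the zero boundary and the fixed sources after each round
            phi[0, :] = phi[-1, :] = phi[:, 0] = phi[:, -1] = 0.0
            for t in targets:
                phi[t] = phi_target      # fixed positive potential at target cells
            for a in other_agvs:
                phi[a] = phi_agv         # fixed negative potential at other AGVs' cells
        return phi

    # r_IPF at time step t is then read off at the AGV's current grid cell, e.g. r_ipf = phi[cell].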
In general, the invention represents the reward r obtained by AGV i at time step t as the sum of three parts r_IPF, r_g and r_c:
r_i^t = r_g + r_c + r_IPF
where r_g encourages only one AGV near each task point, r_c aims to minimize collisions, and r_IPF gives the AGV an implicit thrust that guides it to move towards the target positions in a dispersed manner.
Step four: training based on multi-agent deep reinforcement learning method MADDPG
In the multi-agent training part, the method adopts MADDPG, an algorithm based on the Actor-Critic framework. In the training stage, the Actor network combines a policy-gradient method with a state-action value function, computes the deterministic optimal action according to the current state, and optimizes the neural network parameters θ according to the score the Critic network gives that action. The Critic network uses the observation information of all agents to evaluate the actions produced by the Actor network by computing the TD-error.
The parameter update of the Actor network of the MADDPG algorithm can be given by:
∇_{θ_i} J(μ_i) = E_{x,a∼D}[ ∇_{θ_i} μ_i(a_i | o_i) ∇_{a_i} Q_i^μ(x, a_1, …, a_n) |_{a_i = μ_i(o_i)} ]
where o_i denotes the local observation acquired by the i-th agent, x = [o_1, …, o_n] denotes the global observation vector, i.e. the state at this moment, which integrates the information acquired by all agents, and Q_i^μ(x, a_1, …, a_n) denotes the centralized state-action function of the i-th agent.
The parameter update of the Critic network of the MADDPG algorithm can be given by:
L(θ_i) = E_{x,a,r,x′}[ (Q_i^μ(x, a_1, …, a_n) − y)^2 ],  y = r_i + γ Q_i^{μ′}(x′, a′_1, …, a′_n) |_{a′_j = μ′_j(o_j)}
where Q_i^{μ′} denotes the target network, and μ′ = [μ′_1, μ′_2, …, μ′_n] is the set of target policies whose parameters θ′_j are updated with a lag. In addition, a soft update strategy is generally adopted when updating the target network parameters:
θ′_i ← (1 − τ)θ′_i + τθ_i
where τ is the soft replacement coefficient, representing the update amplitude of the target network parameters; θ′_i and θ_i denote the target network parameters and the estimated network parameters, respectively.
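As a rough illustration of the soft target-network update θ′_i ← (1 − τ)θ′_i + τθ_i described above, the following PyTorch-style Python snippet is a sketch under the assumption that the Actor and Critic networks are ordinary torch modules; it is given for illustration only and is not the patent's implementation.

    import torch

    def soft_update(target_net: torch.nn.Module, online_net: torch.nn.Module, tau: float = 0.01):
        # theta'_i <- (1 - tau) * theta'_i + tau * theta_i, applied parameter by parameter
        with torch.no_grad():
            for p_target, p_online in zip(target_net.parameters(), online_net.parameters()):
                p_target.mul_(1.0 - tau).add_(p_online, alpha=tau)

    # Critic target for agent i (cf. y = r_i + gamma * Q_i^{mu'}(x', a'_1, ..., a'_n)), sketched:
    # with torch.no_grad():
    #     next_actions = [mu_t(o) for mu_t, o in zip(target_actors, next_obs)]
    #     y = reward_i + gamma * target_critic(next_state, *next_actions)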
Step five: and deploying the trained strategy network to each AGV, and acquiring an action instruction by each AGV according to the local observation information of each AGV to carry out distributed collaborative cargo handling.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made without departing from the spirit and scope of the invention.

Claims (9)

1. An AGV task allocation method based on multi-agent deep reinforcement learning, characterized in that: the method combines a multi-agent deep reinforcement learning model with an information potential field reward mechanism; in the multi-agent deep reinforcement learning model, the AGVs learn and optimize by maximizing the cumulative reward and finally reach an optimal task allocation strategy; the information potential field reward mechanism uses virtual information gradient diffusion of the target position data to guide the AGVs to move towards the target positions along a specific gradient direction.
2. The AGV task allocation method based on multi-agent deep reinforcement learning according to claim 1, wherein the AGV task allocation method comprises the following steps: the method comprises the following steps:
step 1: constructing an AGV cargo carrying environment and constructing a kinematic model of the AGV;
step 2: establishing a partially observable Markov decision model, and determining an action space, a state space and a reward function;
step 3: calculating an information potential field rewarding function based on the current state, and providing continuous rewards for AGV decision-making;
step 4: training based on a multi-agent deep reinforcement learning method MADDPG;
step 5: and deploying the trained strategy network to each AGV, and acquiring an action instruction by each AGV according to the local observation information of each AGV to carry out distributed collaborative cargo handling.
3. The AGV task allocation method based on multi-agent deep reinforcement learning according to claim 1, wherein the AGV task allocation method comprises the following steps: a plurality of AGVs are arranged in the kinematic model of the AGVs, and are modeled as discs with the radius of R;
the distance between any two AGVs in the model is greater than 2R to avoid collisions between AGVs.
4. The AGV task allocation method based on multi-agent deep reinforcement learning according to claim 2, characterized in that: the target position information potential value is set to a fixed positive information potential value, the position information potential values of other AGVs are set to fixed negative information potential values, the boundary information potential value is set to 0, and iteration is performed with a formula; after the iteration, each position has a corresponding information potential value; the AGV obtains a reward value r_IPF according to the information potential value of its position at time step t.
5. The AGV task allocation method based on multi-agent deep reinforcement learning according to claim 4, characterized in that: the reward r obtained by AGV i at time step t is represented as the sum of three parts r_IPF, r_g and r_c:
r_i^t = r_g + r_c + r_IPF
where r_g encourages only one AGV near each task point, r_c aims to minimize collisions, and r_IPF gives the AGV an implicit thrust that guides it to move towards the target positions in a dispersed manner.
6. The AGV task allocation method based on multi-agent deep reinforcement learning according to claim 2, characterized in that: training with the multi-agent deep reinforcement learning method MADDPG in step 4 comprises parameter updates of the Actor network and parameter updates of the Critic network.
7. The AGV task allocation method based on multi-agent deep reinforcement learning according to claim 6, characterized in that: in step 4, a soft update strategy is adopted when updating the target network parameters: θ′_i ← (1 − τ)θ′_i + τθ_i, where τ is the soft replacement coefficient, representing the update amplitude of the target network parameters; θ′_i and θ_i denote the target network parameters and the estimated network parameters, respectively.
8. A computer system, comprising: one or more processors, a computer-readable storage medium storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of claim 1.
9. A computer readable storage medium, characterized by storing computer executable instructions that, when executed, are adapted to implement the method of claim 1.
CN202211683067.0A 2022-12-27 2022-12-27 AGV task allocation method based on multi-agent deep reinforcement learning Pending CN116307464A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211683067.0A CN116307464A (en) 2022-12-27 2022-12-27 AGV task allocation method based on multi-agent deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211683067.0A CN116307464A (en) 2022-12-27 2022-12-27 AGV task allocation method based on multi-agent deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN116307464A true CN116307464A (en) 2023-06-23

Family

ID=86833040

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211683067.0A Pending CN116307464A (en) 2022-12-27 2022-12-27 AGV task allocation method based on multi-agent deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN116307464A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116779150A (en) * 2023-07-03 2023-09-19 浙江一山智慧医疗研究有限公司 Personalized medical decision method, device and application based on multi-agent interaction
CN116779150B (en) * 2023-07-03 2023-12-22 浙江一山智慧医疗研究有限公司 Personalized medical decision method, device and application based on multi-agent interaction
CN117406706A (en) * 2023-08-11 2024-01-16 汕头大学 Multi-agent obstacle avoidance method and system combining causal model and deep reinforcement learning
CN117406706B (en) * 2023-08-11 2024-04-09 汕头大学 Multi-agent obstacle avoidance method and system combining causal model and deep reinforcement learning
CN117236821A (en) * 2023-11-10 2023-12-15 淄博纽氏达特机器人系统技术有限公司 Online three-dimensional boxing method based on hierarchical reinforcement learning
CN117236821B (en) * 2023-11-10 2024-02-06 淄博纽氏达特机器人系统技术有限公司 Online three-dimensional boxing method based on hierarchical reinforcement learning
CN117272842A (en) * 2023-11-21 2023-12-22 中国电建集团西北勘测设计研究院有限公司 Cooperative control system and method for multi-industrial park comprehensive energy system
CN117272842B (en) * 2023-11-21 2024-02-27 中国电建集团西北勘测设计研究院有限公司 Cooperative control system and method for multi-industrial park comprehensive energy system
CN118296702A (en) * 2024-04-09 2024-07-05 合肥工业大学 Automatic building space combination method based on multi-agent deep reinforcement learning
CN118051035A (en) * 2024-04-15 2024-05-17 山东大学 Multi-AGV scheduling method based on local distance visual field reinforcement learning
CN118365099A (en) * 2024-06-19 2024-07-19 华南理工大学 Multi-AGV scheduling method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN116307464A (en) AGV task allocation method based on multi-agent deep reinforcement learning
CN110442129B (en) Control method and system for multi-agent formation
Khan et al. Learning safe unlabeled multi-robot planning with motion constraints
CN114815882B (en) Unmanned aerial vehicle autonomous formation intelligent control method based on reinforcement learning
CN113485323B (en) Flexible formation method for cascading multiple mobile robots
CN115330095A (en) Mine car dispatching model training method, device, chip, terminal, equipment and medium
Li et al. Decentralized multi-agv task allocation based on multi-agent reinforcement learning with information potential field rewards
CN116501069A (en) Water surface unmanned cluster route planning method based on multi-agent reinforcement learning
Zheng et al. A behavior decision method based on reinforcement learning for autonomous driving
CN116466662A (en) Multi-AGV intelligent scheduling method based on layered internal excitation
Fan et al. Spatiotemporal path tracking via deep reinforcement learning of robot for manufacturing internal logistics
Zhang et al. Multi-target encirclement with collision avoidance via deep reinforcement learning using relational graphs
CN116551703B (en) Motion planning method based on machine learning in complex environment
CN112034844A (en) Multi-intelligent-agent formation handling method, system and computer-readable storage medium
CN116227622A (en) Multi-agent landmark coverage method and system based on deep reinforcement learning
Yang Reinforcement learning for multi-robot system: A review
CN114489035B (en) Multi-robot collaborative search method based on accumulated trace reinforcement learning
Gong et al. Reinforcement learning for multi-agent formation navigation with scalability
CN114706384A (en) Multi-machine navigation method, system and medium for maintaining connectivity
Zhu et al. A cooperative task assignment method of multi-UAV based on self organizing map
CN115187056A (en) Multi-agent cooperative resource allocation method considering fairness principle
CN116449864A (en) Optimal path selection method for unmanned aerial vehicle cluster
Ma et al. Improved DRL-based energy-efficient UAV control for maximum lifecycle
CN117575220B (en) Heterogeneous multi-agent-oriented multi-task strategy game method
CN114997617B (en) Multi-unmanned platform multi-target combined detection task allocation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination