CN111198568A - Underwater robot obstacle avoidance control method based on Q learning - Google Patents
- Publication number
- CN111198568A (application CN201911338069.4A)
- Authority
- CN
- China
- Prior art keywords
- underwater robot
- penalty
- robot
- action
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/04—Control of altitude or depth
- G05D1/06—Rate of change of altitude or depth
- G05D1/0692—Rate of change of altitude or depth specially adapted for under-water vehicles
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S15/00—Systems using the reflection or reradiation of acoustic waves, e.g. sonar systems
- G01S15/88—Sonar systems specially adapted for specific applications
- G01S15/93—Sonar systems specially adapted for specific applications for anti-collision purposes
Landscapes
- Engineering & Computer Science (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Aviation & Aerospace Engineering (AREA)
- Automation & Control Theory (AREA)
- Acoustics & Sound (AREA)
- Computer Networks & Wireless Communication (AREA)
- Manipulator (AREA)
- Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
Abstract
The invention discloses a Q-learning-based obstacle avoidance control method for an underwater robot, belonging to the field of underwater robot control. The method mainly comprises the following steps: establishing the current environment from sonar devices arranged around the underwater robot; setting a safety alert distance and a target threshold range for the underwater robot, and determining the position of the underwater robot in real time using a positioning technology; creating an action space and a neural network, and initializing the action reward-penalty, the state space and the iteration value; setting a reward-penalty mechanism, selecting each action according to the reward-penalty function, and iterating the Q function until the convergence requirement is met so as to approach the target. A neural network approximation is adopted to improve efficiency, and the gradient descent method is used for iteration. The invention improves the response capability and learning capability of the underwater robot, achieves a high data utilization rate, and reduces errors.
Description
Technical Field
The invention belongs to the technical field of underwater robot control, in particular to optimal control for avoiding underwater obstacles in a timely manner, and specifically relates to a Q-learning-based obstacle avoidance control method for an underwater robot.
Background
The ocean covers about 71% of the earth's surface and is becoming a new exploration space for human beings. An underwater robot senses obstacles through specific sensors in order to avoid them. However, the marine environment is highly complex, with reefs, coral, ocean trenches and even sudden events such as rapidly gathering fish shoals, so it is very important that the underwater robot can avoid obstacles smoothly during exploration.
The patent application with publication number CN107121985A discloses a radar obstacle avoidance system for an underwater intelligent robot; the scheme takes a radar transceiver as the main carrier and combines it with the timer of a single-chip microcomputer to avoid obstacles. Although this method can accomplish obstacle avoidance for an underwater robot, radar relies on electromagnetic waves, which attenuate rapidly underwater; the received signals are therefore weak, obstacle avoidance is not timely, and the robot may collide.
Furthermore, the patent application with publication number CN108829134A discloses a real-time autonomous obstacle avoidance method for a deep-sea robot. It models irregular obstacles as geometric spheres, projects them onto horizontal and vertical planes, and uses a tangent method to analyze the heading-infeasible region affected by the obstacles, obtaining the infeasible heading set for an unmanned underwater vehicle; it analyzes the motion characteristics of the vehicle to obtain its heading window and linear velocity window; it searches for the optimal navigation angle by constructing an optimal navigation angle optimization function and builds a leading line-speed model according to the obstacle distribution and the yaw angle; finally, it outputs the navigation angle and linear speed to the vehicle's motion control module to guide real-time obstacle avoidance in a three-dimensional environment. However, this method involves complicated and time-consuming analysis and calculation, and it cannot cope with seabed emergencies such as fish shoals moving about. It is therefore necessary to design an obstacle avoidance control method for an underwater robot that is both timely and highly adaptable, so that it can avoid seabed emergencies promptly and adapt to a variety of complex seabed conditions.
Disclosure of Invention
The invention aims to provide an underwater robot obstacle avoidance control method that avoids obstacles in a timely manner, is highly adaptable, and is widely applicable.
In order to achieve the purpose, the invention adopts the technical scheme that:
an underwater robot obstacle avoidance control method based on Q learning comprises the following steps:
step 1, establishing the current environment of the robot through the signals of sonar receiving devices arranged on the underwater robot; the underwater robot adopts the dynamic model
M·(dv/dt) + C·v + D·v + G = τ (1)
wherein M represents the inertia matrix, C the Coriolis force matrix, D the damping matrix, G the gravity matrix, τ is the control input, and v is the control output;
the underwater robot has 6 degrees of freedom; suppose that in the n-th degree of freedom the distance between the robot and the obstacle is x_n; the underwater robot sets a safety alert distance d, and if x_n < d in the n-th degree of freedom, the underwater robot may collide and takes a corresponding evasive action in that degree of freedom;
step 2, determining the position of the underwater robot at each moment i by a positioning technology; comparing the distance D_i between the underwater robot and the target point at the current moment with the distance D_{i-1} at the previous moment: if D_i > D_{i-1}, the robot is moving away from the target point, and if D_i < D_{i-1}, the robot is approaching the target point; calculating the distance D between the underwater robot and the target point at the current moment and, considering underwater fluctuations, setting a target point threshold d0; if D < d0, the underwater robot has reached the target point; establishing an action space A according to the degrees of freedom of the underwater robot;
step 3, selecting, through Q learning of the underwater robot, the action with the minimum penalty, and setting a per-step reward-penalty mechanism with an initial penalty K; for step 1, the distance reward-penalty function R_1 between the underwater robot and the target point is given by
R_1 = K if D_i > D_{i-1}, and R_1 = -K if D_i < D_{i-1} (2)
that is, if D_i > D_{i-1} a penalty K is given, and if D_i < D_{i-1} a negative penalty -K is given; for step 2, the reward-penalty function R_2 for the underwater robot approaching an obstacle within the safety alert distance is given by formula (3), which indicates that when an obstacle enters the safety alert distance, the reward-penalty value increases as the distance between the underwater robot and the obstacle decreases, and when the obstacle is outside the safety alert distance, the reward-penalty value is K; the total reward-penalty of each step of the underwater robot is R = R_1 + R_2; meanwhile, the underwater robot avoids the obstacle according to the reward-penalty function: when the penalty of the current step is larger than that of the previous step, the underwater robot is approaching the obstacle and moves away from it; when the penalty of the current step is smaller than that of the previous step, the underwater robot is moving away from the obstacle and moves towards the target point;
step 4, assigning weights to the multidimensional input using a neural network, and copying the actual network weights into the target network weights after each training; the weight update follows
net_l = Σ_{m=1..M} ω_m·x_m, y_l = f(net_l) (4)
wherein x_m is the input signal, ω_m the weight, M the total number of neurons, net_l the input-output relation, f the activation function, and y_l the neuron output;
step 5, training the underwater robot to search for the optimal obstacle-avoidance path: initializing the action reward-penalty R; initializing the state matrix S; initializing the total number of training rounds M of the robot; setting an iteration value j to record the number of training rounds; setting a discount factor γ; according to the Q function
Q(s,a) = R(s,a) + γ·max_{a'} Q(s',a') (5)
the Q value equals the reward-penalty R(s,a) for taking action a in state s plus the highest discounted Q value of the next state s'; to seek the maximum Q value, gradient descent is performed so as to minimize the penalty of each step; the updated state of each step is input into the Q-learning network, which then returns the Q values of all possible actions in that state; an action is then selected: when the Q values of the candidate actions are all equal, a random action a is chosen, and when they differ, the action with the highest Q value is chosen; after action a is selected, the underwater robot executes it in state s, moves to the new state s', and receives the reward R; these steps are repeated for M rounds until the Q value meets the convergence requirement.
The technical scheme of the invention is further improved as follows: in step 2, the target point threshold range is a circular area with d0 as a radius and the target point as a center.
The technical scheme of the invention is further improved as follows: in step 1, the safety alert range is a circular moving area with d as a radius and the center of mass of the underwater robot as the center of a circle.
The technical scheme of the invention is further improved as follows: the convergence requirement of the Q value in the step 5 is that the difference between the Q value in the step and the Q value in the previous step is not more than 0.01, namely the Q value reaches the convergence.
Due to the adoption of the technical scheme, the invention has the following technical effects:
1. For seabed emergencies, the method equips the bow, stern, port and starboard of the robot with ranging sonar and forward-looking sonar, so the surrounding obstacle situation can be measured in time and avoided effectively.
2. For complex seabed terrain, the input is weighted by a neural network combined with Q learning, and the weights are updated from experience at every step, giving a higher data utilization efficiency. Learning directly from consecutive samples is inefficient because consecutive samples are highly correlated; combining the neural network with randomly drawn samples breaks this correlation and reduces the variance of the weight updates (a minimal sketch of such random sampling is given after this list).
3. The method also uses a separate target network to handle the TD error in the temporal-difference algorithm. The method therefore has a strong learning ability and adapts quickly to the environment, and when applied to obstacle avoidance control of an underwater robot it can better handle complex tasks.
4. In the method, the underwater robot uses Q learning to select the action with the minimum penalty and sets a per-step reward-penalty mechanism; by setting a reasonable per-step total reward-penalty function, the underwater robot avoids obstacles more accurately and reasonably.
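As referenced in point 2 above, a minimal sketch of storing experience and drawing random samples from it is given below; the buffer size, batch size and transition format are illustrative assumptions, not values taken from the patent:

```python
import random
from collections import deque

# Sketch of an experience store with random sampling, as mentioned in point 2.
# Buffer size and batch size are assumed; transitions are (s, a, r, s_next) tuples.
replay_buffer = deque(maxlen=10000)

def remember(s, a, r, s_next):
    """Store one transition in the experience buffer."""
    replay_buffer.append((s, a, r, s_next))

def sample_batch(batch_size=32):
    """Randomly drawn samples break the correlation between consecutive transitions."""
    return random.sample(list(replay_buffer), min(batch_size, len(replay_buffer)))
```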
Drawings
FIG. 1 is a flow chart of an underwater robot learning process;
FIG. 2 is a schematic diagram of an obstacle avoidance of the underwater robot on a simulated seabed;
in fig. 2: u is an underwater robot; g is a target point; x is an obstacle; 1, 2, 3, 4, and robot learning training.
Detailed Description
The invention is described in further detail below with reference to the following figures and specific embodiments:
As shown in FIG. 1 and FIG. 2, the invention discloses an underwater robot obstacle avoidance control method based on Q learning. The method is applied to an autonomous, untethered underwater robot U, which senses the surrounding environment through sonar receiving devices arranged around it and performs autonomous underwater obstacle avoidance with its own control system. The method is both timely and highly adaptable.
The obstacle avoidance method comprises the following steps:
Step 1, the current environment of the robot is established from the signals of the sonar receiving devices arranged on the underwater robot; the underwater robot adopts the dynamic model
M·(dv/dt) + C·v + D·v + G = τ (1)
wherein M represents the inertia matrix, C the Coriolis force matrix, D the damping matrix, G the gravity matrix, τ is the control input, and v is the control output.
The underwater robot has 6 degrees of freedom. Suppose that in the n-th degree of freedom the distance between the robot and the obstacle is x_n. The safety alert distance set by the underwater robot is d, where the safety alert range is a circular region that moves with the robot, with radius d and centered at the underwater robot's center of mass. If x_n < d in the n-th degree of freedom, the underwater robot may collide, and a corresponding evasive action is taken in that degree of freedom.
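As a concrete illustration of the safety-distance check in step 1, the following is a minimal Python sketch; the function names, the sample sonar readings and the numeric value of the safety alert distance d are illustrative assumptions, not taken from the patent:

```python
# Minimal sketch of the per-degree-of-freedom safety check of step 1.
# `sonar_distances` and SAFETY_DISTANCE_D are illustrative placeholders.
SAFETY_DISTANCE_D = 2.0  # safety alert distance d, in metres (assumed value)

def degrees_needing_evasion(sonar_distances):
    """Return the indices n of the degrees of freedom where x_n < d."""
    return [n for n, x_n in enumerate(sonar_distances) if x_n < SAFETY_DISTANCE_D]

# Example: obstacle distances measured in the 6 degrees of freedom
readings = [5.1, 1.4, 3.0, 0.9, 6.2, 2.5]
print(degrees_needing_evasion(readings))  # -> [1, 3]: evasive action needed there
```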
Step 2, the position of the underwater robot at each moment i is determined by a positioning technology, and the distance D_i between the underwater robot and the target point at the current moment is compared with the distance D_{i-1} at the previous moment. If D_i > D_{i-1}, the robot is moving away from the target point; if D_i < D_{i-1}, the robot is approaching the target point. The distance D between the underwater robot and the target point at the current moment is calculated and, considering underwater fluctuations, a target point threshold d0 is set, where the target threshold range is a circular area with radius d0 centered on the target point; if D < d0, the underwater robot has reached the target point. An action space A is established according to the degrees of freedom of the underwater robot.
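A minimal sketch of the step-2 target check follows; the positions, the Euclidean distance computation and the numeric value of the threshold d0 are assumptions used only for illustration:

```python
import math

# Sketch of the step-2 checks: approaching/moving away and target reached.
TARGET_THRESHOLD_D0 = 0.5  # target point threshold d0 (assumed value)

def distance(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def step2_status(pos_now, pos_prev, target):
    d_i = distance(pos_now, target)      # D_i, distance at the current moment
    d_prev = distance(pos_prev, target)  # D_{i-1}, distance at the previous moment
    return {
        "moving_away": d_i > d_prev,
        "approaching": d_i < d_prev,
        "reached": d_i < TARGET_THRESHOLD_D0,
    }

print(step2_status((1.0, 2.0, -3.0), (1.5, 2.0, -3.2), (0.0, 0.0, -3.0)))
```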
Step 3, the underwater robot selects, through Q learning, the action with the minimum penalty; a per-step reward-penalty mechanism is set with an initial penalty K. For step 1, the distance reward-penalty function R_1 between the underwater robot and the target point is given by
R_1 = K if D_i > D_{i-1}, and R_1 = -K if D_i < D_{i-1} (2)
that is, if D_i > D_{i-1} a penalty K is given, and if D_i < D_{i-1} a negative penalty -K is given. For step 2, the reward-penalty function R_2 for the underwater robot approaching an obstacle within the safety alert distance is given by formula (3), which indicates that when an obstacle enters the safety alert distance, the reward-penalty value increases as the distance between the underwater robot and the obstacle decreases; when the obstacle is outside the safety alert distance, the reward-penalty value is K. The total reward-penalty of each step of the underwater robot is R = R_1 + R_2. Action selection is then performed in combination with the reward-penalty mechanism of the underwater robot: by setting the reward-penalty mechanism and the resulting cost function, the target is approached by iteratively seeking the minimum penalty.
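The per-step reward-penalty of step 3 could be computed as in the sketch below. R_1 follows the piecewise rule of formula (2); the exact expression of formula (3) is not reproduced in the text, so the form used for R_2 here is only an assumption that matches the described behavior (equal to K outside the safety alert distance and growing as the obstacle gets closer):

```python
# Sketch of the per-step reward-penalty of step 3.
K = 1.0                  # initial penalty K (assumed value)
SAFETY_DISTANCE_D = 2.0  # safety alert distance d (assumed value)

def r1(d_i, d_prev):
    """Distance reward-penalty with respect to the target point, formula (2)."""
    return K if d_i > d_prev else -K

def r2(x_n):
    """Obstacle reward-penalty; assumed functional form standing in for formula (3)."""
    if x_n >= SAFETY_DISTANCE_D:
        return K
    return K * SAFETY_DISTANCE_D / max(x_n, 1e-6)  # grows as the obstacle gets closer

def total_reward_penalty(d_i, d_prev, x_n):
    """Total per-step reward-penalty R = R_1 + R_2."""
    return r1(d_i, d_prev) + r2(x_n)

print(total_reward_penalty(d_i=3.0, d_prev=3.5, x_n=0.8))  # -> 1.5
```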
Step 4, weights are assigned to the multidimensional input using a neural network, and after each training the actual network weights are copied into the target network weights. The weight update follows
net_l = Σ_{m=1..M} ω_m·x_m, y_l = f(net_l) (4)
where x_m is the input signal, ω_m the weight, M the total number of neurons, net_l the input-output relation, f the activation function, and y_l the neuron output.
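The neuron relation of formula (4) and the copy of the actual network weights into the target network might look as follows; the layer sizes and the choice of tanh as the activation function f are assumptions:

```python
import numpy as np

# Sketch of the step-4 neuron relation and the actual-to-target weight copy.
rng = np.random.default_rng(0)
actual_weights = rng.normal(size=(6, 8))  # omega_m for a 6-input, 8-neuron layer (assumed sizes)
target_weights = actual_weights.copy()    # target network starts as a copy

def forward(x, weights, f=np.tanh):
    """net_l = sum_m omega_m * x_m, then y_l = f(net_l)."""
    net = x @ weights
    return f(net)

def sync_target(actual, target):
    """After each training round, copy the actual network weights into the target network."""
    target[...] = actual

x = rng.normal(size=6)  # one multidimensional input vector
print(forward(x, actual_weights))
sync_target(actual_weights, target_weights)
```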
Step 5, the underwater robot is trained to search for the optimal obstacle-avoidance path: the action reward-penalty R is initialized; the state matrix S is initialized; the total number of training rounds M of the robot is initialized; an iteration value j is set; a discount factor γ is set. According to the Q function
Q(s,a) = R(s,a) + γ·max_{a'} Q(s',a') (5)
the Q value equals the reward-penalty R(s,a) for taking action a in state s plus the highest discounted Q value of the next state s'. To seek the maximum Q value, gradient descent is performed so as to minimize the penalty of each step. The updated state of each step is input into the Q-learning network, which then returns the Q values of all possible actions in that state. An action is then selected: when the Q values of the candidate actions are all equal, a random action a is chosen, and when they differ, the action with the highest Q value is chosen. After action a is selected, the underwater robot executes it in state s, moves to the new state s', and receives the reward R. These steps are repeated for M rounds until the Q value meets the convergence requirement. In a specific implementation, the convergence requirement is that the Q value of the current step differs from that of the previous step by no more than 0.01; the Q value is then considered converged. The neural network approximation improves efficiency, and the gradient descent method is used to iterate towards the optimal control strategy.
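A minimal tabular sketch of the step-5 training loop is given below. The environment interface, the learning rate and the tabular Q representation are placeholder assumptions; the patent itself approximates Q with a neural network updated by gradient descent, so the table here only illustrates the action-selection rule, the update of formula (5) and the 0.01 convergence check:

```python
import random

# Tabular sketch of the step-5 loop; `env`, its reset()/step() interface, the state and
# action encodings, and the learning rate ALPHA are illustrative assumptions.
GAMMA, ALPHA, MAX_ROUNDS, EPS = 0.9, 0.1, 500, 0.01

def train(env, states, actions):
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(MAX_ROUNDS):                 # at most M training rounds
        s, done, max_delta = env.reset(), False, 0.0
        while not done:
            qs = [Q[(s, a)] for a in actions]
            if len(set(qs)) == 1:               # all Q values equal: pick a random action
                a = random.choice(actions)
            else:                               # otherwise pick the action with the highest Q value
                a = actions[qs.index(max(qs))]
            s_next, r, done = env.step(a)       # execute a, receive reward-penalty R, new state s'
            target = r + GAMMA * max(Q[(s_next, b)] for b in actions)  # formula (5)
            delta = target - Q[(s, a)]
            Q[(s, a)] += ALPHA * delta          # gradient-descent-like update toward the target
            max_delta = max(max_delta, abs(ALPHA * delta))
            s = s_next
        if max_delta <= EPS:                    # convergence: Q changed by no more than 0.01
            break
    return Q
```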
The above-described embodiments are merely illustrative of the preferred embodiments of the present invention, and not restrictive, and various changes and modifications may be made to the technical solution of the present invention by those skilled in the art without departing from the spirit of the present invention, which is defined by the claims.
Claims (4)
1. An underwater robot obstacle avoidance control method based on Q learning is characterized in that: the method comprises the following steps:
step 1, establishing the current environment of the robot through the signals of sonar receiving devices arranged on the underwater robot; the underwater robot adopts the dynamic model
M·(dv/dt) + C·v + D·v + G = τ (1)
wherein M represents the inertia matrix, C the Coriolis force matrix, D the damping matrix, G the gravity matrix, τ is the control input, and v is the control output;
the underwater robot has 6 degrees of freedom; suppose that in the n-th degree of freedom the distance between the robot and the obstacle is x_n; the underwater robot sets a safety alert distance d, and if x_n < d in the n-th degree of freedom, the underwater robot may collide and takes a corresponding evasive action in that degree of freedom;
step 2, determining the position of the underwater robot at each moment i by a positioning technology; comparing the distance D_i between the underwater robot and the target point at the current moment with the distance D_{i-1} at the previous moment: if D_i > D_{i-1}, the robot is moving away from the target point, and if D_i < D_{i-1}, the robot is approaching the target point; calculating the distance D between the underwater robot and the target point at the current moment and, considering underwater fluctuations, setting a target point threshold d0; if D < d0, the underwater robot has reached the target point; establishing an action space A according to the degrees of freedom of the underwater robot;
step 3, selecting, through Q learning of the underwater robot, the action with the minimum penalty, and setting a per-step reward-penalty mechanism with an initial penalty K; for step 1, the distance reward-penalty function R_1 between the underwater robot and the target point is given by
R_1 = K if D_i > D_{i-1}, and R_1 = -K if D_i < D_{i-1} (2)
that is, if D_i > D_{i-1} a penalty K is given, and if D_i < D_{i-1} a negative penalty -K is given; for step 2, the reward-penalty function R_2 for the underwater robot approaching an obstacle within the safety alert distance is given by formula (3), which indicates that when an obstacle enters the safety alert distance, the reward-penalty value increases as the distance between the underwater robot and the obstacle decreases, and when the obstacle is outside the safety alert distance, the reward-penalty value is K; the total reward-penalty of each step of the underwater robot is R = R_1 + R_2; meanwhile, the underwater robot avoids the obstacle according to the reward-penalty function: when the penalty of the current step is larger than that of the previous step, the underwater robot is approaching the obstacle and moves away from it; when the penalty of the current step is smaller than that of the previous step, the underwater robot is moving away from the obstacle and moves towards the target point;
step 4, assigning weights to the multidimensional input using a neural network, and copying the actual network weights into the target network weights after each training; the weight update follows
net_l = Σ_{m=1..M} ω_m·x_m, y_l = f(net_l) (4)
wherein x_m is the input signal, ω_m the weight, M the total number of neurons, net_l the input-output relation, f the activation function, and y_l the neuron output;
step 5, training the underwater robot to search for the optimal obstacle-avoidance path: initializing the action reward-penalty R; initializing the state matrix S; initializing the total number of training rounds M of the robot; setting an iteration value j to record the number of training rounds; setting a discount factor γ; according to the Q function
Q(s,a) = R(s,a) + γ·max_{a'} Q(s',a') (5)
the Q value equals the reward-penalty R(s,a) for taking action a in state s plus the highest discounted Q value of the next state s'; to seek the maximum Q value, gradient descent is performed so as to minimize the penalty of each step; the updated state of each step is input into the Q-learning network, which then returns the Q values of all possible actions in that state; an action is then selected: when the Q values of the candidate actions are all equal, a random action a is chosen, and when they differ, the action with the highest Q value is chosen; after action a is selected, the underwater robot executes it in state s, moves to the new state s', and receives the reward R; these steps are repeated for M rounds until the Q value meets the convergence requirement.
2. The Q learning-based underwater robot obstacle avoidance control method according to claim 1, characterized in that: in step 2, the target point threshold range is a circular area with d0 as a radius and the target point as a center.
3. The Q learning-based underwater robot obstacle avoidance control method according to claim 1, characterized in that: in step 1, the safety alert range is a circular region that moves with the robot, with radius d and centered at the underwater robot's center of mass.
4. The Q learning-based underwater robot obstacle avoidance control method according to claim 1, characterized in that: the convergence requirement for the Q value in step 5 is that the Q value of the current step differs from that of the previous step by no more than 0.01, at which point the Q value is considered converged.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911338069.4A CN111198568A (en) | 2019-12-23 | 2019-12-23 | Underwater robot obstacle avoidance control method based on Q learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911338069.4A CN111198568A (en) | 2019-12-23 | 2019-12-23 | Underwater robot obstacle avoidance control method based on Q learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111198568A true CN111198568A (en) | 2020-05-26 |
Family
ID=70744597
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911338069.4A Pending CN111198568A (en) | 2019-12-23 | 2019-12-23 | Underwater robot obstacle avoidance control method based on Q learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111198568A (en) |
- 2019-12-23 CN CN201911338069.4A patent/CN111198568A/en active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102795323A (en) * | 2011-05-25 | 2012-11-28 | 中国科学院沈阳自动化研究所 | Unscented Kalman filter (UKF)-based underwater robot state and parameter joint estimation method |
CN109540151A (en) * | 2018-03-25 | 2019-03-29 | 哈尔滨工程大学 | A kind of AUV three-dimensional path planning method based on intensified learning |
CN109240091A (en) * | 2018-11-13 | 2019-01-18 | 燕山大学 | A kind of underwater robot control method based on intensified learning and its control method tracked |
CN109933086A (en) * | 2019-03-14 | 2019-06-25 | 天津大学 | Unmanned plane environment sensing and automatic obstacle avoiding method based on depth Q study |
CN110345948A (en) * | 2019-08-16 | 2019-10-18 | 重庆邮智机器人研究院有限公司 | Dynamic obstacle avoidance method based on neural network in conjunction with Q learning algorithm |
CN110333739A (en) * | 2019-08-21 | 2019-10-15 | 哈尔滨工程大学 | A kind of AUV conduct programming and method of controlling operation based on intensified learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109540151B (en) | AUV three-dimensional path planning method based on reinforcement learning | |
Sun et al. | Mapless motion planning system for an autonomous underwater vehicle using policy gradient-based deep reinforcement learning | |
JP6854549B2 (en) | AUV action planning and motion control methods based on reinforcement learning | |
CN109765929B (en) | UUV real-time obstacle avoidance planning method based on improved RNN | |
CN108803313B (en) | Path planning method based on ocean current prediction model | |
Cao et al. | Target search control of AUV in underwater environment with deep reinforcement learning | |
CN111273670B (en) | Unmanned ship collision prevention method for fast moving obstacle | |
CN109241552A (en) | A kind of underwater robot motion planning method based on multiple constraint target | |
Wang et al. | Cooperative collision avoidance for unmanned surface vehicles based on improved genetic algorithm | |
US20230286158A1 (en) | Autonomous sense and guide machine learning system | |
Yan et al. | Real-world learning control for autonomous exploration of a biomimetic robotic shark | |
Hadi et al. | Adaptive formation motion planning and control of autonomous underwater vehicles using deep reinforcement learning | |
Zhang et al. | Intelligent vector field histogram based collision avoidance method for auv | |
Wang et al. | Obstacle avoidance for environmentally-driven USVs based on deep reinforcement learning in large-scale uncertain environments | |
Wu et al. | Multi-vessels collision avoidance strategy for autonomous surface vehicles based on genetic algorithm in congested port environment | |
CN117311160A (en) | Automatic control system and control method based on artificial intelligence | |
CN109916400B (en) | Unmanned ship obstacle avoidance method based on combination of gradient descent algorithm and VO method | |
Tang et al. | Path planning of autonomous underwater vehicle in unknown environment based on improved deep reinforcement learning | |
CN111198568A (en) | Underwater robot obstacle avoidance control method based on Q learning | |
Jose et al. | Navigating the Ocean with DRL: Path following for marine vessels | |
CN114609925B (en) | Training method of underwater exploration strategy model and underwater exploration method of bionic machine fish | |
CN117406716A (en) | Unmanned ship collision avoidance control method, unmanned ship collision avoidance control device, terminal equipment and medium | |
Li et al. | LSDA-APF: A Local Obstacle Avoidance Algorithm for Unmanned Surface Vehicles Based on 5G Communication Environment. | |
US20220371709A1 (en) | Path planning system and method for sea-aerial cooperative underwater target tracking | |
Ferrandino et al. | A Comparison between Crisp and Fuzzy Logic in an Autonomous Driving System for Boats |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200526 |
RJ01 | Rejection of invention patent application after publication |