CN111413974B - Automobile automatic driving motion planning method and system based on learning sampling type - Google Patents
- Publication number
- CN111413974B (application CN202010236474.1A)
- Authority
- CN
- China
- Prior art keywords
- track
- forward simulation
- trajectory
- optimal
- vehicle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Images
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0212—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
- G05D1/0221—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Aviation & Aerospace Engineering (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Steering Control In Accordance With Driving Conditions (AREA)
Abstract
The invention relates to an automobile automatic driving motion planning method and system based on a learning sampling type, which comprise the following steps: establishing a vehicle kinematic model; initializing an Open table and a Closed table; calculating the evaluation value of each forward simulation track, and selecting the track with the highest evaluation value as a regular optimal track; performing Q value function estimation on the forward simulation track, and selecting the track with the maximum Q value as a reinforcement learning track; selecting an initial optimal track from the regular optimal track and the reinforcement learning track, and storing the initial optimal track into a Closed table; screening a non-collision forward simulation track by using a collision detection method, and storing the non-collision forward simulation track into an Open table; calculating the evaluation value of each forward simulation track, selecting the forward simulation track with the highest evaluation value as a candidate optimal track, and storing the candidate optimal track in a Closed table; ending the motion planning process when the candidate optimal track end point is within the end point range required by the motion planning; and connecting the initial optimal trajectory and the candidate optimal trajectory in the Closed table to form a final planning trajectory.
Description
Technical Field
The invention relates to the field of intelligent vehicles, in particular to an automobile automatic driving motion planning method and system based on a learning sampling mode.
Background
In recent years, artificial intelligence technology has gradually begun to be commercialized in the fields of intelligent transportation and vehicles, and intelligent connected vehicles are gradually coming into public view. Generally, the automatic driving software system of an intelligent vehicle can be divided into four modules: perception, positioning, decision and control. Motion planning is the most important part of the decision module and determines the decision quality of the intelligent vehicle. Since the control module generally only performs the task of motion/trajectory tracking, the outcome of motion planning is crucial to the final driving behavior of the vehicle.
Existing motion planning methods can be broadly divided into sampling-based methods, optimization-based methods, and end-to-end learning-based methods. End-to-end learning-based methods establish a mapping from sensor data to driving actions, but the black-box nature of such learning methods makes engineering practice and optimization difficult; optimization-based methods generally depend on lane lines or other prior road information, and their solving time is often hard to guarantee; sampling-based methods are widely applied to automatic driving motion planning owing to their fast solving speed and their ability to adapt to diverse environmental characteristics.
The sampling-based method generally selects a sampling trajectory or motion state through a cost function, which essentially amounts to selecting an optimal trajectory/motion state by artificially set rules; however, an artificially set cost function is difficult to adapt to a complex and changeable real environment.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide a method and a system for planning an automatic driving motion of an automobile based on a learning sampling mode, which can better consider uncertainty and randomness in the environment and can improve the safety and robustness of the automatic driving motion planning.
In order to achieve the purpose, the invention adopts the following technical scheme: an automobile automatic driving motion planning method based on a learning sampling mode comprises the following steps: s1: establishing a vehicle kinematic model according to vehicle parameters; s2: initializing a storage table of a heuristic motion planning method: an Open table and a Closed table; s3: generating a series of forward simulation tracks from a starting point based on a learning sampling method, calculating an evaluation value of each forward simulation track through a heuristic function, and selecting the track with the highest evaluation value as a regular optimal track; performing Q value function estimation on the forward simulation track by using a reinforcement learning method, and selecting the track with the maximum Q value as a reinforcement learning track; selecting an initial optimal track from the regular optimal track and the reinforcement learning track, storing the initial optimal track into a Closed table, and taking an end point of the initial optimal track as a starting point of subsequent planning; s4: generating a series of forward simulation tracks from a planning starting point based on a heuristic planning method; screening non-collision forward simulation tracks by using a collision detection method, and storing the non-collision forward simulation tracks into an Open table; calculating the evaluation value of each forward simulation track through a heuristic function, selecting the forward simulation track with the highest evaluation value as a candidate optimal track, storing the candidate optimal track and the evaluation value thereof into a Closed table, and taking the candidate optimal track end point as a subsequent planning start point; s5: repeating the step S4 until the candidate optimal trajectory end point in the step S4 is within the end point range required by the motion planning, and ending the motion planning process; s6: and connecting the initial optimal trajectory and the candidate optimal trajectory in the Closed table to form a final planning trajectory.
Further, in step S3, the forward simulation trajectory is generated by using the steering wheel angular acceleration and the accelerator/brake input of the vehicle.
Further, the forward simulation trajectory generation method comprises: determining a simulation step length Δt according to the usage scenario, solving the vehicle kinematic model to obtain the state derivatives, updating the vehicle position x, y and the vehicle heading θ, continuously iterating to update the vehicle trajectory, and finally obtaining the forward simulation trajectory.
Further, in step S3, before the forward simulation trajectory is stored in the Open table, collision detection is performed on the forward simulation trajectory, it is detected whether the generated forward simulation trajectory collides with the boundary of the obstacle in the sensing result, and if the trajectory collides, the trajectory is directly deleted, and the trajectory that does not collide is stored in the Open table.
Further, in the step S3, a reinforcement learning trajectory is selected based on a reinforcement learning method, specifically, a reinforcement learning method based on a Q learning algorithm.
Further, the reinforcement learning trajectory selection method based on the reinforcement learning method comprises the following steps: S31, initializing the reinforcement learning algorithm: determining the state space, the action space and the reward function R; S32, establishing a Q network, wherein the Q network stores the expected rewards obtained by taking different actions in different states, these rewards are called Q values, and the Q value parameters in the Q network are randomly initialized before offline training starts: Q(s_t, a_t), wherein s_t is the state at time t and a_t is the action at time t; S33, starting from the state s_t at the current time t, generating a trajectory with a different action a_t, wherein the action comprises the steering wheel angular acceleration γ and the longitudinal acceleration a of the vehicle at the next moment; from the steering wheel angle θ_t at the current time t, calculating the steering wheel angle θ_{t+Δt} = θ_t + γ × Δt expected at the next time t + Δt, inputting the desired steering wheel angle and the desired longitudinal acceleration into the vehicle model, generating a trajectory, and taking the Q value at that time, Q(s_t, a_t), as the Q value of the trajectory; the trajectory generated by the action having the maximum Q value in the current state is taken as the reinforcement learning trajectory.
Further, the selection method of the initial optimal trajectory comprises the following steps: a series of forward simulation tracks are generated by inputting different steering wheel turning angles and accelerator/brake opening degrees into a vehicle dynamics model, and the evaluation values F of the forward simulation tracks are as follows:
F=g+h
wherein g is the distance from the candidate track end point to the starting point, and h is the distance from the candidate track end point to the final track end point;
selecting the track with the minimum evaluation value F from a series of forward simulation tracks as a rule optimal track, and estimating the Q value of the rule optimal track by using a reinforcement learning Q network; and comparing the Q values of the regular optimal track and the reinforcement learning track, and selecting the track with a larger Q value as an initial optimal track.
Further, in step S4, starting from the current state s_t with a different action a_t, a trajectory is generated: from the steering wheel angle θ_t at the current time, the steering wheel angle θ_{t+Δt} = θ_t + γ × Δt expected at the next moment is calculated, and the desired steering wheel angle and the desired longitudinal acceleration are input into the vehicle model to generate a trajectory; the evaluation values of these forward simulated trajectories are:
F=g+h
wherein g is the distance from the candidate track end point to the starting point, and h is the distance from the candidate track end point to the final track end point.
An automotive autopilot motion planning system based on learning sampling, comprising: the system comprises a vehicle kinematics model establishing module, a storage table initializing module, an initial optimal track selecting module, a candidate optimal track selecting module, a motion planning ending module and a final planning track forming module; the vehicle kinematic model building module builds a vehicle kinematic model according to vehicle parameters; the storage table initialization module is used for initializing a storage table of a heuristic motion planning method: an Open table and a Closed table; the initial segment optimal track selection module generates a series of forward simulation tracks from a starting point based on a learning sampling method, calculates the evaluation value of each forward simulation track through a heuristic function, and selects the track with the highest evaluation value as a regular optimal track; performing Q value function estimation on the forward simulation track by using a reinforcement learning method, and selecting the track with the maximum Q value as a reinforcement learning track; selecting an initial optimal track from the regular optimal track and the reinforcement learning track, storing the initial optimal track into a Closed table, and taking an end point of the initial optimal track as a starting point of subsequent planning; the candidate optimal track selection module generates a series of forward simulation tracks from a planning starting point based on a heuristic planning method; screening non-collision forward simulation tracks by using a collision detection method, and storing the non-collision forward simulation tracks into an Open table; calculating the evaluation value of each forward simulation track through a heuristic function, selecting the forward simulation track with the highest evaluation value as a candidate optimal track, storing the candidate optimal track and the evaluation value thereof into a Closed table, and taking the candidate optimal track end point as a subsequent planning start point; the motion planning ending module ends the motion planning process until the candidate optimal trajectory end point is within the end point range required by the motion planning; and the final planning track forming module connects the initial optimal track and the candidate optimal track in the Closed table to form a final planning track.
Further, in the initial segment optimal trajectory selection module, the reinforcement learning trajectory selection based on the reinforcement learning method comprises the following steps: S1, initializing the reinforcement learning algorithm: determining the state space, the action space and the reward function R; S2, establishing a Q network, wherein the Q network stores the expected rewards obtained by taking different actions in different states, these rewards are called Q values, and the Q value parameters in the Q network are randomly initialized before offline training starts: Q(s_t, a_t), wherein s_t is the state at time t and a_t is the action at time t; S3, starting from the state s_t at the current time t, generating a trajectory with a different action a_t, wherein the action comprises the steering wheel angular acceleration γ and the longitudinal acceleration a of the vehicle at the next moment; from the steering wheel angle θ_t at the current time t, calculating the steering wheel angle θ_{t+Δt} = θ_t + γ × Δt expected at the next time t + Δt, inputting the desired steering wheel angle and the desired longitudinal acceleration into the vehicle model, generating a trajectory, and taking the Q value at that time, Q(s_t, a_t); the trajectory generated by the action having the maximum Q value in the current state is taken as the reinforcement learning trajectory.
Due to the adoption of the technical scheme, the invention has the following advantages: the method combines the vehicle kinematics model, and replaces the traditional rule-based candidate track selection mode by adding the reinforcement learning module in the sampling type motion planning method based on the vehicle kinematics model, so that the track selection is more reasonable, the uncertainty and the randomness in the environment can be better considered, and the safety and the robustness of the automatic driving motion planning can be improved.
Drawings
FIG. 1 is a schematic overall flow diagram of the process of the present invention;
FIG. 2 is a simplified vehicle dynamics model schematic of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and examples.
As shown in fig. 1, the present invention provides a learning sampling based vehicle automatic driving motion planning method, which combines reinforcement learning, receives automatic driving perception results, i.e. information such as the position and speed of an obstacle in the surrounding environment, and outputs a series of actions that an intelligent vehicle can perform under the constraint of the starting point and the ending point.
The method specifically comprises the following steps:
s1: establishing a vehicle kinematic model according to vehicle parameters;
the vehicle parameters include vehicle size, minimum turning radius, maximum acceleration/deceleration, and the like.
S2: initializing a storage table of a heuristic motion planning method: open and Closed tables.
S3: generating a series of forward simulation tracks from a starting point based on a learning sampling method, calculating the evaluation value of each forward simulation track through a heuristic function, and selecting the track with the highest evaluation value from the forward simulation tracks as a regular optimal track; and performing Q value function estimation on the forward simulation track by using a reinforcement learning method, and selecting the track with the maximum Q value as a reinforcement learning track. And selecting the initial optimal track from the regular optimal track and the reinforcement learning track, storing the initial optimal track into a Closed table, and taking the end point of the initial optimal track as the starting point of subsequent planning.
S4: based on a heuristic planning method, a series of forward simulation tracks are generated from a planning starting point: firstly, screening non-collision forward simulation tracks by using a collision detection method, and storing the non-collision forward simulation tracks into an Open table; and calculating the evaluation value of each forward simulation track through a heuristic function. And then selecting the forward simulation track with the highest evaluation value from the Open table as a candidate optimal track, storing the candidate optimal track and the evaluation value thereof into a Closed table, and taking the candidate optimal track end point as a subsequent planning start point.
S5: and repeating the step S4 until the candidate optimal track end point in the step S4 is within the end point range required by the motion planning, and ending the motion planning process.
S6: and connecting the initial optimal trajectory and the candidate optimal trajectory in the Closed table to form a final planning trajectory.
In step S1, the vehicle kinematics model may be the simplest two-degree-of-freedom bicycle model, and as shown in fig. 2, the kinematics model may be described as:
An xOy coordinate system is established; (x, y) are the coordinates of the vehicle in this coordinate system, L is the wheelbase (the distance between the front and rear axles), θ is the vehicle heading angle, δ is the steering wheel angle, γ is the steering wheel angular acceleration of the vehicle, a is the longitudinal acceleration of the vehicle, v is the vehicle speed, and ẋ, ẏ, θ̇, δ̇ and v̇ are the time derivatives of the corresponding variables.
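The governing equations themselves are missing from this text; a plausible reconstruction, assuming the standard kinematic bicycle model with the symbols defined above and treating γ as the rate of change of the steering angle δ (consistent with the update θ_{t+Δt} = θ_t + γ × Δt used later), is:

```latex
\begin{aligned}
\dot{x} &= v\cos\theta, &
\dot{y} &= v\sin\theta, &
\dot{\theta} &= \frac{v}{L}\tan\delta, &
\dot{\delta} &= \gamma, &
\dot{v} &= a .
\end{aligned}
```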
In step S1, other vehicle kinematics or dynamics models may also be established; the only requirement is that the model takes the vehicle accelerator/brake opening and the vehicle steering wheel angle as inputs and outputs the forward-simulated vehicle trajectory.
In step S2, the Open table and the Closed table have the same format and store the trajectories generated in step S3 together with the evaluation value corresponding to each trajectory; the evaluation value is a numerical value greater than 0.
In step S3, the forward simulation trajectory is generated by using the vehicle steering wheel angular acceleration and the accelerator/brake input, and the specific generation method is as follows:
determining a simulation step length Δt according to the usage scenario, and solving the vehicle kinematic model to obtain the state derivatives; this updates the vehicle position x, y and the vehicle heading θ, and the vehicle trajectory can be updated by continuous iteration, finally obtaining the forward simulation trajectory.
For example, as shown in the vehicle kinematics model of step S1, γ and a are the steering wheel angular acceleration and the longitudinal acceleration of the vehicle, respectively. The vehicle longitudinal acceleration may be derived directly from the throttle/brake input. In the model equations, x, y, θ, δ and v are known quantities, so the derivatives ẋ, ẏ, θ̇, δ̇ and v̇ can be obtained by solving the equations simultaneously; integrating them over the step Δt updates the vehicle position x, y and the vehicle heading θ, and continuous iteration updates the vehicle trajectory.
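For illustration only, a minimal forward-simulation sketch that Euler-integrates the bicycle model above with constant inputs over the step Δt might look as follows; the VehicleState container, the parameter names and the constant-input assumption are not taken from the patent:

```python
import math
from dataclasses import dataclass

@dataclass
class VehicleState:
    x: float      # position in the xOy frame [m]
    y: float
    theta: float  # vehicle heading [rad]
    delta: float  # steering wheel angle [rad]
    v: float      # longitudinal speed [m/s]

def forward_simulate(state, gamma, a, L, dt, steps):
    """Roll out one candidate trajectory by Euler-integrating the kinematic
    bicycle model with constant inputs (gamma, a) over `steps` intervals."""
    traj = [state]
    for _ in range(steps):
        s = traj[-1]
        traj.append(VehicleState(
            x=s.x + s.v * math.cos(s.theta) * dt,
            y=s.y + s.v * math.sin(s.theta) * dt,
            theta=s.theta + s.v / L * math.tan(s.delta) * dt,
            delta=s.delta + gamma * dt,   # gamma treated as the steering rate
            v=s.v + a * dt,
        ))
    return traj
```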
In step S3, before storing the forward simulation trajectory in the Open table, collision detection is performed on the forward simulation trajectory, and it is detected whether the generated forward simulation trajectory collides with the boundary of the obstacle in the sensing result. And if the tracks collide, directly deleting the tracks, and storing the non-colliding tracks into an Open table.
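The geometry of the collision check is not specified in the patent; the following deliberately simple point-versus-inflated-circle test is one possible sketch, with the obstacle representation and the safety margin as assumptions:

```python
def collides(traj, obstacles, safety_radius=1.5):
    """Return True if any trajectory point enters an obstacle's inflated
    boundary; here each obstacle is an (x, y, radius) tuple and each
    trajectory point exposes .x and .y (e.g. the VehicleState above)."""
    for s in traj:
        for ox, oy, r in obstacles:
            if (s.x - ox) ** 2 + (s.y - oy) ** 2 <= (r + safety_radius) ** 2:
                return True
    return False
```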
In step S3, selecting a reinforcement learning trajectory based on a reinforcement learning method, specifically, reinforcement learning based on a Q learning algorithm;
the reinforcement learning track selection method based on the reinforcement learning method comprises the following steps:
s31, initializing a reinforcement learning algorithm: determining a state space, an action space and a reward function R:
the state space includes information of 3 obstacles closest to the host vehicle in the environment and the current state of the trajectory. Wherein the obstacle information includes a position and a speed (x) of the obstaclen,yn,vxn,vyn),xn,ynCoordinates in the xOy coordinate system, v, of the nth obstacle, respectivelyxn,vynThe speed of the obstacle in the x and y directions in the xOy coordinate system, respectively.
The current trajectory state comprises the coordinates x_t, y_t of the current point, the desired velocity v_xt, v_yt, the time t of the trajectory state, the vehicle throttle/brake opening of the trajectory state, and the vehicle steering wheel angle. The action space comprises the steering wheel angular acceleration γ and the longitudinal acceleration a of the vehicle at the next moment;
in the off-line training process, the reward function R of the reinforcement learning method needs to be determined:
R=A-B
where A is the reward for successfully reaching the end point when the vehicle executes the finally computed trajectory, and B is the collision penalty obtained if a collision occurs while the vehicle executes the finally computed trajectory.
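As a hedged illustration of how the state described in S31 might be flattened into a feature vector for the Q network (the field names, their ordering and the absence of normalisation are assumptions introduced here):

```python
import numpy as np

def encode_state(obstacles, traj_state):
    """Concatenate the 3 obstacles nearest to the current trajectory point
    (x, y, vx, vy each) with the current trajectory state fields from S31."""
    nearest = sorted(
        obstacles,
        key=lambda o: (o[0] - traj_state["x"]) ** 2 + (o[1] - traj_state["y"]) ** 2,
    )[:3]
    obs_feat = [c for o in nearest for c in o[:4]]          # x_n, y_n, vx_n, vy_n
    ego_feat = [traj_state[k] for k in
                ("x", "y", "vx", "vy", "t", "throttle_brake", "steer")]
    return np.asarray(obs_feat + ego_feat, dtype=np.float32)
```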
S32, establishing a Q network, wherein the Q network stores the expected rewards obtained by taking different actions in different states, these rewards are called Q values, and the Q value parameters in the Q network are randomly initialized before offline training starts: Q(s_t, a_t), wherein s_t is the state at time t and a_t is the action at time t.
S33, starting from the state s_t at the current time t, a trajectory can be generated with a different action a_t. As can be seen from step S31, the action comprises the steering wheel angular acceleration γ and the longitudinal acceleration a of the vehicle at the next moment. From the steering wheel angle θ_t at the current time t, the steering wheel angle θ_{t+Δt} = θ_t + γ × Δt expected at the next time t + Δt can be calculated; the desired steering wheel angle and the desired longitudinal acceleration are input into the vehicle model, and the trajectory is generated. The Q value at this time, Q(s_t, a_t), is taken as the Q value of the generated trajectory. The trajectory generated by the action with the maximum Q value in the current state is taken as the reinforcement learning trajectory.
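A sketch of the action sampling and Q-based selection of S33 could look as follows; the action discretisation, the q_network signature and the rollout helper are assumed for the example:

```python
def select_rl_trajectory(state, candidate_actions, q_network, rollout, dt):
    """Roll out one short trajectory per sampled action (gamma, a), score each
    action with the learned Q network, and keep the trajectory whose action
    has the largest Q value, as in step S33."""
    best_q, best_traj = float("-inf"), None
    for gamma, a in candidate_actions:
        traj = rollout(state, gamma, a, dt)        # forward simulation
        q = q_network(state, (gamma, a))           # Q(s_t, a_t)
        if q > best_q:
            best_q, best_traj = q, traj
    return best_traj, best_q
```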
Meanwhile, a series of forward simulation tracks are generated by inputting different steering wheel turning angles and accelerator/brake opening degrees into the vehicle dynamics model, and the evaluation values F of the forward simulation tracks are as follows:
F=g+h
wherein g is the distance from the candidate track end point to the starting point, and h is the distance from the candidate track end point to the final track end point.
And selecting the track with the minimum evaluation value F from a series of forward simulation tracks as a regular optimal track. And estimating the Q value of the regular optimal track by using a reinforcement learning Q network. And comparing the Q values of the regular optimal track and the reinforcement learning track, and selecting the track with a larger Q value as an initial optimal track.
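A small sketch of the F = g + h evaluation and of the comparison between the regular optimal track and the reinforcement learning track; the point representation and the Q-value helper q_of are assumptions:

```python
import math

def evaluate_f(traj_end, start, goal):
    """F = g + h: g is the distance from the candidate trajectory end point
    back to the start, h is its distance to the final trajectory end point."""
    g = math.hypot(traj_end[0] - start[0], traj_end[1] - start[1])
    h = math.hypot(traj_end[0] - goal[0], traj_end[1] - goal[1])
    return g + h

def pick_initial_trajectory(rule_traj, rl_traj, q_of):
    """Keep whichever of the rule-optimal (minimum-F) trajectory and the
    reinforcement learning trajectory has the larger estimated Q value."""
    return rule_traj if q_of(rule_traj) >= q_of(rl_traj) else rl_traj
```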
S34, during offline training, training data is collected and different common reinforcement learning training methods can be used to calculate, for each state s_t and each action a_t taken in it, the expectation of the future reward obtained, Q(s_t, a_t); the Q values of the different states and actions are used as training data, and the parameters of the Q network are updated using the gradient descent method.
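S34 leaves the network and the exact update rule open; one possible realisation, sketched here with PyTorch, uses a small multilayer perceptron and a temporal-difference gradient step over a discrete set of candidate next actions; the architecture, tensor shapes and the action discretisation are all assumptions:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small MLP mapping a concatenated (state, action) vector to a scalar Q."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def q_update(q_net, optimizer, transition, next_action_candidates, discount=0.99):
    """One temporal-difference gradient step for a single (s, a, r, s')
    transition, in the spirit of the offline Q-learning training of S34."""
    s, a, r, s_next = transition
    with torch.no_grad():
        target = r + discount * max(q_net(s_next, a_c)
                                    for a_c in next_action_candidates)
    loss = nn.functional.mse_loss(q_net(s, a), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```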
In step S4, starting from the current state s_t with a different action a_t, a trajectory can be generated. As can be seen from step S31, the action comprises the steering wheel angular acceleration γ and the longitudinal acceleration a of the vehicle at the next moment. From the steering wheel angle θ_t at the current time, the steering wheel angle θ_{t+Δt} = θ_t + γ × Δt expected at the next moment can be calculated; the desired steering wheel angle and the desired longitudinal acceleration are input into the vehicle model, and the trajectory is generated. The evaluation values of these forward simulated trajectories are:
F=g+h
wherein g is the distance from the candidate track end point to the starting point, and h is the distance from the candidate track end point to the final track end point.
In step S5, if the number of times of repeating step S4 exceeds the predetermined number of times and no solution is found, the motion planning outputs a planning failure result.
The invention also provides an automobile automatic driving motion planning system based on the learning sampling type, which comprises a vehicle kinematics model establishing module, a storage table initializing module, an initial optimal track selecting module, a candidate optimal track selecting module, a motion planning finishing module and a final planning track forming module;
the vehicle kinematic model building module builds a vehicle kinematic model according to the vehicle parameters;
the storage table initialization module is used for initializing a storage table of the heuristic motion planning method: an Open table and a Closed table;
the initial optimal track selection module generates a series of forward simulation tracks from a starting point based on a learning sampling method, calculates the evaluation value of each forward simulation track through a heuristic function, and selects the track with the highest evaluation value as a regular optimal track; performing Q value function estimation on the forward simulation track by using a reinforcement learning method, and selecting the track with the maximum Q value as a reinforcement learning track; selecting an initial optimal track from the regular optimal track and the reinforcement learning track, storing the initial optimal track into a Closed table, and taking an end point of the initial optimal track as a starting point of subsequent planning;
the candidate optimal trajectory selection module generates a series of forward simulation trajectories from a planning starting point based on a heuristic planning method; screening non-collision forward simulation tracks by using a collision detection method, and storing the non-collision forward simulation tracks into an Open table; calculating the evaluation value of each forward simulation track through a heuristic function, selecting the forward simulation track with the highest evaluation value as a candidate optimal track, storing the candidate optimal track and the evaluation value thereof into a Closed table, and taking the candidate optimal track end point as a subsequent planning start point;
the motion planning ending module is used for ending the motion planning process until the candidate optimal track end point is in the end point range required by the motion planning;
and the final planning track forming module connects the initial optimal track and the candidate optimal track in the Closed table to form a final planning track.
In the above embodiment, in the initial optimal trajectory selection module, the reinforcement learning trajectory selection based on the reinforcement learning method includes the following steps:
s1, initializing a reinforcement learning algorithm: determining a state space, an action space and a reward function R;
s2, establishing a Q network, wherein the Q network stores the expected rewards obtained by taking different actions in different states, these rewards are called Q values, and the Q value parameters in the Q network are randomly initialized before offline training starts: Q(s_t, a_t), wherein s_t is the state at time t and a_t is the action at time t;
s3, starting from the state s_t at the current time t, generating a trajectory with a different action a_t, wherein the action comprises the steering wheel angular acceleration γ and the longitudinal acceleration a of the vehicle at the next moment; from the steering wheel angle θ_t at the current time t, calculating the steering wheel angle θ_{t+Δt} = θ_t + γ × Δt expected at the next time t + Δt, inputting the desired steering wheel angle and the desired longitudinal acceleration into the vehicle model, generating a trajectory, and taking the Q value at that time, Q(s_t, a_t); the trajectory generated by the action having the maximum Q value in the current state is taken as the reinforcement learning trajectory.
The above embodiments are only for illustrating the present invention, and the steps may be changed, and on the basis of the technical solution of the present invention, the modification and equivalent changes of the individual steps according to the principle of the present invention should not be excluded from the protection scope of the present invention.
Claims (7)
1. An automobile automatic driving motion planning method based on a learning sampling mode is characterized by comprising the following steps:
s1: establishing a vehicle kinematic model according to vehicle parameters;
s2: initializing a storage table of a heuristic motion planning method: an Open table and a Closed table;
s3: generating a series of forward simulation tracks from a starting point based on a learning sampling method, calculating an evaluation value of each forward simulation track through a heuristic function, and selecting the track with the highest evaluation value as a regular optimal track; performing Q value function estimation on the forward simulation track by using a reinforcement learning method, and selecting the track with the maximum Q value as a reinforcement learning track; selecting an initial optimal track from the regular optimal track and the reinforcement learning track, storing the initial optimal track into a Closed table, and taking an end point of the initial optimal track as a starting point of subsequent planning;
s4: generating a series of forward simulation tracks from a planning starting point based on a heuristic planning method; screening non-collision forward simulation tracks by using a collision detection method, and storing the non-collision forward simulation tracks into an Open table; calculating the evaluation value of each forward simulation track through a heuristic function, selecting the forward simulation track with the highest evaluation value as a candidate optimal track, storing the candidate optimal track and the evaluation value thereof into a Closed table, and taking the candidate optimal track end point as a subsequent planning start point;
s5: repeating the step S4 until the candidate optimal trajectory end point in the step S4 is within the end point range required by the motion planning, and ending the motion planning process;
s6: connecting the initial optimal trajectory and the candidate optimal trajectory in the Closed table to form a final planning trajectory;
in the step S3, a reinforcement learning trajectory is selected based on a reinforcement learning method, specifically, a reinforcement learning method based on a Q learning algorithm;
the reinforcement learning track selection method based on the reinforcement learning method comprises the following steps:
s31, initializing a reinforcement learning algorithm: determining a state space, an action space and a reward function R;
s32, establishing a Q network, wherein the Q network stores the expected rewards obtained by taking different actions in different states, these rewards are called Q values, and the Q value parameters in the Q network are randomly initialized before offline training starts: Q(s_t, a_t), wherein s_t is the state at time t and a_t is the action at time t;
s33, starting from the state s_t at the current time t, generating a trajectory with a different action a_t, wherein the action comprises the steering wheel angular acceleration γ and the longitudinal acceleration a of the vehicle at the next moment; from the steering wheel angle θ_t at the current time t, calculating the steering wheel angle θ_{t+Δt} = θ_t + γ × Δt expected at the next time t + Δt, inputting the desired steering wheel angle and the desired longitudinal acceleration into the vehicle model, generating a trajectory, and taking the Q value at that time, Q(s_t, a_t); the trajectory generated by the action with the maximum Q value in the current state is taken as the reinforcement learning trajectory; where Δt is the simulation step size.
2. The automotive autonomous driving motion planning method of claim 1, wherein: in step S3, the forward simulation trajectory is generated using the steering wheel angular acceleration and the accelerator/brake input of the vehicle.
3. The automotive autonomous driving motion planning method of claim 2, wherein: the forward simulation trajectory generation method comprises: determining a simulation step length Δt according to the usage scenario, solving the vehicle kinematic model to obtain the state derivatives, updating the vehicle position x, y and the vehicle heading θ, continuously iterating to update the vehicle trajectory, and finally obtaining the forward simulation trajectory.
4. The automotive autonomous driving motion planning method of claim 1, wherein: in step S3, before the forward simulation trajectory is stored in the Open table, collision detection is performed on the forward simulation trajectory, it is detected whether the generated forward simulation trajectory collides with the boundary of the obstacle in the sensing result, and if the trajectory collides, the trajectory is directly deleted, and the trajectory that does not collide is stored in the Open table.
5. The automotive autonomous driving motion planning method of claim 1, wherein: the selection method of the initial optimal track comprises the following steps:
a series of forward simulation tracks are generated by inputting different steering wheel turning angles and accelerator/brake opening degrees into a vehicle dynamics model, and the evaluation values F of the forward simulation tracks are as follows:
F=g+h
wherein g is the distance from the candidate track end point to the starting point, and h is the distance from the candidate track end point to the final track end point;
selecting the track with the minimum evaluation value F from a series of forward simulation tracks as a rule optimal track, and estimating the Q value of the rule optimal track by using a reinforcement learning Q network; and comparing the Q values of the regular optimal track and the reinforcement learning track, and selecting the track with a larger Q value as an initial optimal track.
6. The automotive autonomous driving motion planning method of claim 1, wherein: in step S4, starting from the current state s_t with a different action a_t, a trajectory is generated: from the steering wheel angle θ_t at the current time, the steering wheel angle θ_{t+Δt} = θ_t + γ × Δt expected at the next moment is calculated, and the desired steering wheel angle and the desired longitudinal acceleration are input into the vehicle model to generate a trajectory; the evaluation values of these forward simulated trajectories are:
F=g+h
wherein g is the distance from the candidate track end point to the starting point, and h is the distance from the candidate track end point to the final track end point.
7. An automobile automatic driving motion planning system based on learning sampling type is characterized by comprising: the system comprises a vehicle kinematics model establishing module, a storage table initializing module, an initial optimal track selecting module, a candidate optimal track selecting module, a motion planning ending module and a final planning track forming module;
the vehicle kinematic model building module builds a vehicle kinematic model according to vehicle parameters;
the storage table initialization module is used for initializing a storage table of a heuristic motion planning method: an Open table and a Closed table;
the initial segment optimal track selection module generates a series of forward simulation tracks from a starting point based on a learning sampling method, calculates the evaluation value of each forward simulation track through a heuristic function, and selects the track with the highest evaluation value as a regular optimal track; performing Q value function estimation on the forward simulation track by using a reinforcement learning method, and selecting the track with the maximum Q value as a reinforcement learning track; selecting an initial optimal track from the regular optimal track and the reinforcement learning track, storing the initial optimal track into a Closed table, and taking an end point of the initial optimal track as a starting point of subsequent planning;
the candidate optimal track selection module generates a series of forward simulation tracks from a planning starting point based on a heuristic planning method; screening non-collision forward simulation tracks by using a collision detection method, and storing the non-collision forward simulation tracks into an Open table; calculating the evaluation value of each forward simulation track through a heuristic function, selecting the forward simulation track with the highest evaluation value as a candidate optimal track, storing the candidate optimal track and the evaluation value thereof into a Closed table, and taking the candidate optimal track end point as a subsequent planning start point;
the motion planning ending module ends the motion planning process until the candidate optimal trajectory end point is within the end point range required by the motion planning;
the final planning track forming module connects the initial optimal track and the candidate optimal track in the Closed table to form a final planning track;
in the initial segment optimal trajectory selection module, the reinforcement learning trajectory selection based on the reinforcement learning method comprises the following steps:
s1, initializing a reinforcement learning algorithm: determining a state space, an action space and a reward function R;
s2, establishing a Q network, wherein the Q network stores the expected rewards obtained by taking different actions in different states, these rewards are called Q values, and the Q value parameters in the Q network are randomly initialized before offline training starts: Q(s_t, a_t), wherein s_t is the state at time t and a_t is the action at time t;
s3, starting from the state s_t at the current time t, generating a trajectory with a different action a_t, wherein the action comprises the steering wheel angular acceleration γ and the longitudinal acceleration a of the vehicle at the next moment; from the steering wheel angle θ_t at the current time t, calculating the steering wheel angle θ_{t+Δt} = θ_t + γ × Δt expected at the next time t + Δt, inputting the desired steering wheel angle and the desired longitudinal acceleration into the vehicle model, generating a trajectory, and taking the Q value at that time, Q(s_t, a_t); the trajectory generated by the action having the maximum Q value in the current state is taken as the reinforcement learning trajectory; where Δt is the simulation step size.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010236474.1A CN111413974B (en) | 2020-03-30 | 2020-03-30 | Automobile automatic driving motion planning method and system based on learning sampling type |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010236474.1A CN111413974B (en) | 2020-03-30 | 2020-03-30 | Automobile automatic driving motion planning method and system based on learning sampling type |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111413974A CN111413974A (en) | 2020-07-14 |
CN111413974B true CN111413974B (en) | 2021-03-30 |
Family
ID=71494691
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010236474.1A Active CN111413974B (en) | 2020-03-30 | 2020-03-30 | Automobile automatic driving motion planning method and system based on learning sampling type |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111413974B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113239986B (en) * | 2021-04-25 | 2023-04-18 | 浙江吉利控股集团有限公司 | Training method and device for vehicle track evaluation network model and storage medium |
CN114564016A (en) * | 2022-02-24 | 2022-05-31 | 江苏大学 | Navigation obstacle avoidance control method, system and model combining path planning and reinforcement learning |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
BR112018074062A2 (en) * | 2016-11-18 | 2019-03-06 | Eyedaptic, Inc. | improved assistive systems and augmented reality visual tools |
-
2020
- 2020-03-30 CN CN202010236474.1A patent/CN111413974B/en active Active
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105139072A (en) * | 2015-09-09 | 2015-12-09 | 东华大学 | Reinforcement learning algorithm applied to non-tracking intelligent trolley barrier-avoiding system |
CN106595671A (en) * | 2017-02-22 | 2017-04-26 | 南方科技大学 | Unmanned aerial vehicle path planning method and device based on reinforcement learning |
CN107194612A (en) * | 2017-06-20 | 2017-09-22 | 清华大学 | A kind of train operation dispatching method learnt based on deeply and system |
CN107423761A (en) * | 2017-07-24 | 2017-12-01 | 清华大学 | Feature based selects and the rail locomotive energy saving optimizing method of operating of machine learning |
CN108153585A (en) * | 2017-12-01 | 2018-06-12 | 北京大学 | A kind of method and apparatus of the operational efficiency based on locality expression function optimization MapReduce frames |
CN109936865A (en) * | 2018-06-30 | 2019-06-25 | 北京工业大学 | A kind of mobile sink paths planning method based on deeply learning algorithm |
CN109598934A (en) * | 2018-12-13 | 2019-04-09 | 清华大学 | A kind of rule-based method for sailing out of high speed with learning model pilotless automobile |
CN109521774A (en) * | 2018-12-27 | 2019-03-26 | 南京芊玥机器人科技有限公司 | A kind of spray robot track optimizing method based on intensified learning |
CN109726676A (en) * | 2018-12-28 | 2019-05-07 | 苏州大学 | The planing method of automated driving system |
KR102055141B1 (en) * | 2018-12-31 | 2019-12-12 | 한국기술교육대학교 산학협력단 | System for remote controlling of devices based on reinforcement learning |
CN110083160A (en) * | 2019-05-16 | 2019-08-02 | 哈尔滨工业大学(深圳) | A kind of method for planning track of robot based on deep learning |
CN110472738A (en) * | 2019-08-16 | 2019-11-19 | 北京理工大学 | A kind of unmanned boat Real Time Obstacle Avoiding algorithm based on deeply study |
CN110666793A (en) * | 2019-09-11 | 2020-01-10 | 大连理工大学 | Method for realizing robot square part assembly based on deep reinforcement learning |
Non-Patent Citations (2)
Title |
---|
Reinforcement Learning Versus Model Predictive Control: A Comparison on a Power System Problem; Damien Ernst; IEEE; 2009-12-31; pp. 517-529 *
Autonomous driving policy learning method based on deep reinforcement learning (基于深度强化学习的自动驾驶策略学习方法); Xia Wei, et al.; Journal of Integration Technology (《集成技术》); 2017-05-21 (No. 3); pp. 29-35 *
Also Published As
Publication number | Publication date |
---|---|
CN111413974A (en) | 2020-07-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhang et al. | Human-like autonomous vehicle speed control by deep reinforcement learning with double Q-learning | |
CN110136481B (en) | Parking strategy based on deep reinforcement learning | |
CN107063280B (en) | Intelligent vehicle path planning system and method based on control sampling | |
Rosolia et al. | Autonomous racing using learning model predictive control | |
CN112162555B (en) | Vehicle control method based on reinforcement learning control strategy in hybrid vehicle fleet | |
CN112356830A (en) | Intelligent parking method based on model reinforcement learning | |
CN110969848A (en) | Automatic driving overtaking decision method based on reinforcement learning under opposite double lanes | |
CN111098852A (en) | Parking path planning method based on reinforcement learning | |
Rubies-Royo et al. | A classification-based approach for approximate reachability | |
CN114312830B (en) | Intelligent vehicle coupling decision model and method considering dangerous driving conditions | |
CN114564016A (en) | Navigation obstacle avoidance control method, system and model combining path planning and reinforcement learning | |
CN111679660B (en) | Unmanned deep reinforcement learning method integrating human-like driving behaviors | |
CN112286218B (en) | Aircraft large-attack-angle rock-and-roll suppression method based on depth certainty strategy gradient | |
CN115257745A (en) | Automatic driving lane change decision control method based on rule fusion reinforcement learning | |
CN110861651B (en) | Method for estimating longitudinal and lateral motion states of front vehicle | |
CN111413974B (en) | Automobile automatic driving motion planning method and system based on learning sampling type | |
CN116679719A (en) | Unmanned vehicle self-adaptive path planning method based on dynamic window method and near-end strategy | |
CN113296523A (en) | Mobile robot obstacle avoidance path planning method | |
Firl et al. | Probabilistic Maneuver Prediction in Traffic Scenarios. | |
CN115542733A (en) | Self-adaptive dynamic window method based on deep reinforcement learning | |
CN117222915A (en) | System and method for tracking an expanded state of a moving object using a composite measurement model | |
CN115743178A (en) | Automatic driving method and system based on scene self-adaptive recognition | |
CN113391633A (en) | Urban environment-oriented mobile robot fusion path planning method | |
CN116486356A (en) | Narrow scene track generation method based on self-adaptive learning technology | |
Chen et al. | Automatic overtaking on two-way roads with vehicle interactions based on proximal policy optimization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |