CN113848880B

CN113848880B - Agricultural machinery path optimization method based on improved Q-learning

Info

Publication number: CN113848880B
Application number: CN202111006894.1A
Authority: CN
Inventors: 董笑辰; 陶斯友; 纪铁生
Original assignee: CRRC Dalian R&D Co Ltd
Current assignee: CRRC Dalian R&D Co Ltd
Priority date: 2021-08-30
Filing date: 2021-08-30
Publication date: 2023-12-22
Anticipated expiration: 2041-08-30
Also published as: CN113848880A

Abstract

The invention discloses an agricultural machinery path optimization method based on improved Q-learning, which comprises the following steps: s1: determining initial parameters of path planning; s2: translating the original field boundary to the inside of the field boundary by a distance L; s3: calculating the minimum span of the working area of the agricultural machine; s4: generating parallel paths of an agricultural machine working area; s5: calculating the length of the turning path; s6: the global path is optimized based on the modified Q-learning algorithm. According to the invention, the original field is calculated to rotate by the optimal rotation angle, and then the parallel path of the boundary of the agricultural machine working area parallel to the rotation by the optimal rotation angle is generated, and the parallel path coincides with one edge of the field, so that the calculation is greatly simplified, the global path is optimized based on an improved Q-learning algorithm, and the minimum total length of the agricultural machine work is determined. The planned global path of the agricultural machinery is shortest, and the aim of improving the working efficiency is fulfilled.

Description

Agricultural machinery path optimization method based on improved Q-learning

Technical Field

The invention relates to the technical field of agricultural machinery path optimization, in particular to an agricultural machinery path optimization method based on improved Q-learning.

Background

Depending on the operating characteristics of the agricultural machine, the path of travel of the agricultural machine through the work area is typically a straight path and requires coverage of the entire work area with minimal repetition. The turning path connecting two adjacent straight paths is generally determined by the relationship between the distance between the adjacent straight paths and the turning radius of the agricultural machinery, and the distances of different turning paths are often different. Because turning can seriously affect the working efficiency compared with straight running and the turning process of the agricultural machine can be approximately regarded as uniform speed, the working efficiency of the agricultural machine can be improved by reducing the number of turning times and optimizing the turning path.

For simple convex polygon fields, all straight paths can be connected through simple turning paths, and the main factor affecting the length of the agricultural machine path is the connection sequence of the straight paths. Thus, the global path can be optimized by adjusting its order given the straight path, which can further translate the problem of agricultural path optimization into a hybrid optimization problem. The hybrid optimization problem is a problem that NP is difficult, the traditional dynamic programming method, the backtracking method and the like have large calculation amount through trial-and-error methods, and a globally optimal solution is not found well.

For complex fields with complex shapes or barriers in the middle, if the linear paths in the whole area are planned according to the same direction, the repeated area and the missing area are easily increased, and the working efficiency is reduced. In addition, turning paths can become complex, presenting difficulties for both path planning and path tracking.

Disclosure of Invention

The invention provides an agricultural machinery path optimization method based on improved Q-learning, which aims to solve the technical problems that the traditional dynamic planning method, backtracking method and the like are large in calculation amount through trial-and-error methods, and a globally optimal solution is not found well.

In order to achieve the above object, the technical scheme of the present invention is as follows:

an agricultural machinery path optimization method based on improved Q-learning comprises the following steps:

s1: determining initial parameters of path planning, wherein the initial parameters comprise an original field boundary point set P, the width w of each row of scanning of the agricultural machinery and the minimum turning radius R of the agricultural machinery;

s2: translating the original field boundary point set P to the inside of the field boundary by a distance L to determine the boundary of an agricultural machine working area;

s3: establishing an x-y axis rectangular coordinate system, and calculating the minimum span of the agricultural machine working area to determine the optimal rotation angle of the boundary of the original field relative to the x axis;

s4: generating parallel paths of boundaries when the working area of the agricultural machine is parallel to the optimal rotation angle so as to determine straight paths of the agricultural machine;

s5: determining the type of the turning path and calculating the length of the turning path;

s6: the agricultural work area path is optimized based on a modified Q-learning algorithm to determine an optimal overall length of agricultural work.

Further, in the step S3, the method for calculating the minimum span of the working area of the agricultural machine is as follows:

when the boundary of the working area of the agricultural machine is a convex polygon, the number of times n of turning the agricultural machine is as follows:

where D is the distance from a boundary of the agricultural work area to the apex of the agricultural work area,

y＝y _min ,x∈[x _min ,x _max ] (2)

wherein y is _min Is the minimum value of the boundary of the working area of the agricultural machinery on the y axis; y is _max Is the maximum value of the boundary of the working area of the agricultural machinery on the y axis; x is x _min Is the minimum value of the boundary of the working area of the agricultural machinery on the x axis; x is x _max Is the maximum value of the boundary of the working area of the agricultural machinery on the x axis;

the rotation angle of each time of the original field is as follows:

in the formula, [ x ] ₁ ,y ₁ ]Is the starting point coordinates of the side parallel to the x axis in the boundary of the working area of the agricultural machinery; [ x ] ₂ ,y ₂ ]Is the end point coordinates, theta, of the side parallel to the x-axis in the boundary of the working area of the agricultural machine _t Is a rotation angle, and the rotation angle when the span D is minimum is the optimal rotation angle theta of the working area of the agricultural machine after multiple rotations ^* 。

Further, in the step S4, the method for determining the straight path of the agricultural machine includes:

s41: to be optimalThe rotation angle is theta ^* Is used as a scanning line to translate towards the interior of the agricultural machine working area, and each translation w; calculating the number of intersection points of the scanning line and the boundary of the agricultural machine working area after each translation;

s42: if the number of the intersection points is 2, the coordinates of the two intersection points are recorded in the boundary of the agricultural machinery working area, and the scanning is continued; if the number of the intersection points is 1 or 0, judging that the intersection points exceed the boundary range of the working area of the agricultural machinery, stopping scanning, and completing the generation process of the parallel paths.

Further, the turning path in S5 includes a semicircle type, a fish tail type and a pi type;

if the distance between adjacent straight paths is equal to twice the turning radius, namely w=2r, the turning path is semicircular, and the length of the turning path is pi×r;

if the distance between adjacent straight paths is smaller than twice the turning radius, namely w is smaller than 2R; the turning path is a fish tail type; the linear distance that needs to be travelled by the agricultural machine at this time is:

l _r ＝2R-w (4)

wherein, I _r Is the straight line distance required to travel when the agricultural machinery turns on the fishtail turning path; r is the turning radius; the length of the turning path is (2+pi) multiplied by R-w;

if the distance between adjacent straight paths is larger than twice the turning radius, namely w is larger than 2R; the turning path is pi-shaped; the linear distance that needs to be travelled by the agricultural machine at this time is:

l _f ＝w-2R (5)

wherein, I _f Is the straight line distance required to travel when the agricultural machinery turns on the pi-shaped turning path; the length of the turning path at this time is (pi-2) ×R+w.

Further, the method for optimizing the global path based on the Q-learning algorithm in the step S6 is as follows:

s61: defining and initializing a Q value table to start calculation;

s62: defining a state Flag table Flag to store a state quantity of whether each straight line path is connected;

s63: randomly selecting an initial path of the agricultural machine to determine a linear path when the agricultural machine starts working;

s64: judging whether the straight paths are connected;

s65: if the straight line paths are connected, directly judging the convergence condition of the Q value table;

if the straight line path is not connected, selecting a next action set based on the current state of the agricultural machinery, and calculating the rewarding value of the next action set to acquire the next action when the rewarding value is maximum; updating the Q value table and the state Flag table Flag; then judging the convergence condition of the Q value table;

s66: if the calculated Q value table is converged, the calculation is ended;

if the calculated Q value table is not converged, judging whether a convergence element exists in the current Q value table;

s67: if the current Q value table is judged to have no convergence element, repeating S65-S66;

if the current Q value table has the convergence element, the state is latched, the Q value table is updated, and the step S66 is repeated.

Further, the calculation method for determining whether the straight path is connected in S64 is as follows:

wherein: f(s) _n ) Is a state quantity of whether each path is connected.

Further, in the step S65, the Q function established by determining the convergence status of the Q table is:

Q(s _c ,a′ _c )＝Q(s _l ,a′ _l )+γ*max(r(s _c ,f(s _c ,δ))) (7)

wherein Q(s) _c ,a′ _c ) Is based on the current state s of the agricultural machinery _c Action a 'corresponding to maximum awards for current state of agricultural machinery' _c Q, s of (C) _l Is the last state of the current state of the agricultural machinery; a' _l The action corresponding to the maximum awarded by the last state of the current state of the agricultural machine; a discount factor of gamma;r(s _c ,s _n ) Is from the current state s of the agricultural machinery _c To the next state s of the current state of the agricultural machine _n Is a reward function of f(s) _c Delta) represents a current state s based on the agricultural machinery _c And at the current state s of the agricultural machine _c A state set of the time-selectable action set δ, namely:

wherein,is the first alternative, and m is the number of alternative actions.

Further, the bonus function for calculating the bonus value in the step S65 is set up as follows:

wherein D(s) _c ,s _n ) Is from the current state of the agricultural machinery s _c To the next state s of the current state of the agricultural machine _n Is a length of a turning path; l is a weighting coefficient.

Further, before the steps S1 to S6, the method further includes:

if an obstacle exists in the field to be optimized or the field to be optimized is not a convex polygon field, firstly dividing the complex field into a plurality of convex polygon subareas so as to carry out path planning on the plurality of convex polygon subareas by the steps S1-S6.

Further, after the steps S1 to S6, the method further includes: and carrying out path planning on the sub-areas of the plurality of convex polygons so as to obtain the optimal path of the whole field.

The beneficial effects are that: according to the improved Q-learning-based agricultural machinery path optimization method, the original field is calculated to rotate by the optimal rotation angle, then the parallel path of the boundary when the agricultural machinery working area is parallel to the optimal rotation angle is generated, and the parallel path coincides with one edge of the field, so that calculation is greatly simplified, a turning path is planned, the global path is optimized based on an improved Q-learning algorithm, and the minimum total length of agricultural machinery work is determined. The planned global path of the agricultural machinery is shortest, and the aim of improving the working efficiency is fulfilled.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it will be obvious that the drawings in the following description are some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort to a person skilled in the art.

FIG. 1 is a flow chart of an overall agricultural machinery path optimization method of the present invention;

FIG. 2 is a schematic diagram of a planned parallel path according to the present invention;

FIG. 3 is a schematic diagram of a complex field divided into convex polygons using a cell-composition method;

FIG. 4a is a schematic diagram of a fish tail type turning path and its parameters according to the present invention;

FIG. 4b is a schematic diagram of a semicircular turning path and parameters thereof according to the present invention;

FIG. 4c is a diagram of a pi-turn path and its parameters according to the present invention;

FIG. 5 is a flow chart of an agricultural machinery path optimization method based on improved Q-learning of the present invention.

Wherein: 1. a straight line path; 2. agricultural work area boundaries.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The embodiment provides an agricultural machinery path optimization method based on improved Q-learning, which comprises the following steps, as shown in the accompanying figure 1:

s1: determining initial parameters of path planning, wherein the initial parameters comprise an original field boundary point set P, the width w of each row of scanning of the agricultural machinery and the minimum turning radius of the agricultural machinery;

s2: translating the original field boundary point set P to the inside of the field boundary by a distance L to determine the boundary of an agricultural machine working area, and planning a linear path in the boundary of the agricultural machine working area;

in the step S3, the minimum span method for calculating the working area of the agricultural machine is as follows: the working mode of the straight path is simple, the coverage rate is high, and the straight path is used for covering a main working area of the agricultural machinery. The main point of the linear path planning is to find a better advancing direction so as to reduce the turning times. For convex polygon field without obstacle, the straight line paths in all directions are continuous, and the planning method is simple. Because the work content of the agricultural machinery is generally fixed and the whole field is covered, the distance between every two adjacent straight paths is equal, and the turning times are minimized, namely, the minimum span of the convex polygon in the direction perpendicular to the straight paths is found. Since the minimum span of a convex polygon always occurs in the boundary formed by one vertex and one side as shown in fig. 2. The optimal direction is parallel to a certain side of the convex polygon, the specific determination method is to calculate the span between the certain side and the point farthest from the side, and the direction of the side with the smallest span is selected as the optimal linear path direction of the convex polygon field.

Specifically, when the boundary of the working area of the agricultural machine is a convex polygon, the number of times n of turning the agricultural machine is:

wherein D is the distance from one boundary of the agricultural machine working area to the vertex of the agricultural machine working area, and the scanning line is a straight line parallel to the x axis in an x-y axis rectangular coordinate system;

y＝y _min ,x∈[x _min ,x _max ] (2)

wherein y is _min Is the minimum value of the boundary of the working area of the agricultural machinery on the y axis; y is _max Is the maximum value of the boundary of the working area of the agricultural machinery on the y axis; x is x _min Is the minimum value of the boundary of the working area of the agricultural machinery on the x axis; x is x _max Is the maximum value of the boundary of the working area of the agricultural machinery on the x axis; d=y _max -y _min Therefore, the turning times are reduced, namely D is reduced;

the rotation angle of each time of the original field is as follows:

in the formula, [ x ] ₁ ,y ₁ ]Is the starting point coordinates of the side parallel to the x axis in the boundary of the working area of the agricultural machinery; [ x ] ₂ ,y ₂ ]Is the end point coordinates, theta, of the side parallel to the x-axis in the boundary of the working area of the agricultural machine _t Is a rotation angle, and the rotation angle when the span D is minimum is the optimal rotation angle theta of the working area of the agricultural machine after multiple rotations ^* ；

S4: generating parallel paths of boundaries when the working area of the agricultural machine is parallel to the optimal rotation angle so as to determine straight paths of the agricultural machine; the method comprises the following steps:

s41: at an optimal rotation angle of theta ^* Is used as a scanning line to translate towards the inside of the agricultural machine working area, and each translation w; calculating the number of intersection points of the scanning line and the boundary of the agricultural machine working area after each translation;

specifically, the turning path in S5 includes a semicircle type, a fish tail type and a pi type, as shown in fig. 4;

specifically, since the traveling speed of the agricultural machine is low, the agricultural machine turning path can be regarded as an arc of a fixed radius, and the turning radius of the planned path is assumed to be R. Further, the pitch of two adjacent straight paths, i.e., the width w of each row scan of the agricultural machine, is equal, and therefore, the turning path is related to the pitch and turning radius of the adjacent straight paths.

if the distance between adjacent straight paths is less than twice the turning radius, i.e. w<2R; the turning path is a fish tail type; at this time, the agricultural machinery can not turn once to realize 180 degrees of turning. Therefore, the vehicle is required to turn a quarter circle first and then travel straight backward _r After the distance, turning the round by a quarter to finish turning, wherein:

l _r ＝2R-w (4)

if the distance between adjacent straight paths is greater than twice the turning radius, i.e. w>2R; the turning path is pi-shaped; at the moment, the pesticide turns a quarter circle first and then moves forward straight _f After the distance, turning the round by a quarter to finish turning, wherein:

l _f ＝w-2R (5)

S6: optimizing the path of the working area of the agricultural machine based on a Q-learning algorithm to determine the optimal total working length of the agricultural machine;

the Q-learning algorithm is a classical reinforcement learning algorithm, and the algorithm optimizes the decision of an agent through the interaction result of the agent and the environment. Specifically, r (S, a) is the immediate rewards of the agent (i.e., the agricultural machine) for the agent by executing the action a (a e a) in the state S (S e S) at a certain moment in time, i.e., the straight line path of the agricultural machine when executing the action a. The agent may choose the optimal sequence of actions by performing a series of actions, from an initial state to a target state, the environment will evaluate for each action, and Q-learning maximizes rewards.

The method for optimizing the global path based on the improved Q-learning algorithm comprises the following steps:

s61: defining and initializing a Q value table so as to start iterative calculation; the initial values of the Q value tables are all 0;

s62: defining a state Flag table Flag to store a state quantity of whether each path is connected; the initial value of the state Flag table Flag is 1;

s63: randomly selecting an initial linear path of the agricultural machine to determine the linear path when the agricultural machine starts working;

s64: judging whether the straight paths are connected;

preferably, the calculating method for determining whether the straight path is connected in S64 is as follows:

F(s _n ) Is a state quantity of whether each path is connected or not, for recording whether the straight path has been connected or not, if s _n Has been connected, it is set to 0, otherwise to 1, i.e.:

s66: if the calculated Q value table is converged, the calculation is ended;

specifically, the agricultural machine path optimization is to reduce the total length of the path, on the premise of not repeatedly walking any straight path, the global path length is related to the turning path length, and the turning path length is related to the turning path type, so that the global path planning can be optimized by adjusting the turning path type. Meanwhile, the type of the turning path is determined according to the relation between the distance and the turning radius of two adjacent (i.e. connected back and forth) straight paths, so that the current straight path of the agricultural machine (i.e. the intelligent agent) can be regarded as the current state s _c The selection of the next straight path is an action a _c The next straight path selected is the state s at the next moment _n Thus, the Q value function to be optimized is established as follows:

Q(s _c ,a′ _c )＝Q(s _l ,a′ _l )+γ*max(r(s _c ,f(s _c ,δ))) (7)

wherein Q(s) _c ,a′ _c ) Is based on the current state s of the agricultural machinery _c Action a 'corresponding to maximum awards for current state of agricultural machinery' _c Q, s of (C) _l Is the last state of the current state of the agricultural machinery; a' _l The action corresponding to the maximum awarded by the last state of the current state of the agricultural machine; discount factor of gamma (0<γ<1)；r(s _c ,s _n ) Is from the current state s of the agricultural machinery _c To the next state s of the current state of the agricultural machine _n Is a reward function of f(s) _c A) represents a current state s based on the agricultural machinery _c And at the current state s of the agricultural machine _c The next possible state set of the time-optional action set a is:

wherein,is the first selectable action, m is the number of selectable actions;

for the problem of agricultural machinery path optimization, the bonus function of calculating the bonus value in the step S65 is set up as follows:

wherein D(s) _c ,s _n ) Is from the current state (i.e. straight path) s of the agricultural machine _c To the next state s of the current state of the agricultural machine _n Is a length of a turning path; l is a weighting coefficient, and the length of the longest turning path or straight path can be selected to enable the front term and the rear term to be in the same order of magnitude;

if the current Q value table has the convergence element, the state is latched, and the Q value table is updated according to the transfer relation, and the step S66 is repeated.

Preferably, the method for optimizing the global path based on the improved Q-learning algorithm is as shown in fig. 5: assuming that a problem has i states and j inputs, in combination with the agricultural machinery path optimization problem, a Q value table with element initial values of 0 and a state Flag table with element initial values of 1 are defined first, where the state Flag table is used to store a state quantity of whether each path is connected. In this embodiment, the dimension of the Q value table is (k×4), where m= [ d/w ], where [ ] is a rounding calculation, and k is the number of straight paths; the dimension of the status Flag table Flag is (k 1); and secondly, determining the calculated initial state (namely the initial linear path of the agricultural machinery), and then finding the optimal path by the method for optimizing the global path based on the Q-learning algorithm.

The iteration method of a certain time is to determine a possible next state set according to the current state, calculate the rewarding value corresponding to each state in the set, and select the state with the largest rewarding value to update the Q value table and the state marking table Flag; and (5) the iterative process is circulated until the Q value table converges, namely, the optimal path is found. It is noted that, during the iteration process, when an element of the Q value table converges, it is locked, i.e. the choice of action in the state corresponding to the element is not optimized any more. Because there are a limited number of actions corresponding to one state, after one element is locked, the next optimal state can be determined, and the states can be simultaneously locked. The calculation method of the invention can avoid calculating the rewarding value of all possible next states each time, and can quickly determine the next state according to the transfer relation after determining the optimal state, namely according to the current state, thereby greatly reducing the calculation amount.

In theory, all straight paths can be in the next state, so that the iteration complexity of the Q-learning algorithm is greatly increased. Also, repeatedly walking through paths increases the path length, contrary to the goal of optimization, and if too far paths are taken as alternative paths, the computation becomes large. Further, for pi-turn path, if l _f Too large is detrimental to minimizing the total path length, so the invention will next set f (s _c Delta) is limited to a distance from the current linear path, in particular the current state of the connected agricultural machine (i.e. linear path) s _c Next state s to the current state of the agricultural machine _n The distance between the two is less than or equal to 4w.

Preferably, in this embodiment, before the steps S1 to S6, the method further includes: if an obstacle exists in the field to be optimized or the field to be optimized is not a convex polygon field, the path planning process is complicated, the path is also complicated, and the repetition rate and the leakage rate of the working path of the agricultural machine are improved. Then, firstly, the complex field is divided into a plurality of convex polygon subregions by adopting a mature cell-composition method, as shown in fig. 3, specifically, the field is divided into a plurality of small regions, such as regions numbered 1-7 in fig. 3, by taking the vertices on all boundaries (including outer boundaries and barriers, etc.) of the field as parallel lines. The regions thus divided are all convex polygons, and the present invention will verify whether the region composed of adjacent (co-bounded) sub-regions is a convex polygon, and if so, the composed region is the final divided sub-region (e.g., regions 1 and 2 and regions 5 and 6 in fig. 3). And carrying out path planning on a plurality of sub-areas of the convex polygon by the steps S1-S6.

Preferably, after the steps S1 to S6, the method further comprises: and carrying out path planning on the sub-areas of the plurality of convex polygons so as to obtain the optimal path of the whole field. Specifically, for complex field blocks, after the path planning in the subarea of each small convex polygon is completed, the paths of the small convex polygons are connected to complete the path planning of the whole field block. In order to minimize the total path length, the paths between the sub-regions are straight paths. Automated agricultural machinery typically works in large farmlands, and therefore the number of convex polygons is much lower than the number of straight paths in convex polygons. The present embodiment selects the shortest connection path between convex polygons by means of enumeration. Specifically, an initial field is firstly selected, and then any one convex polygon is selected as a target to be connected until all the convex polygons are connected. And taking the sequence of connecting all the fields as a combination, calculating the path length of all the combinations, and selecting the shortest combination as an optimal connection scheme, thereby completing the path planning of the whole field.

The invention has the advantages that:

1: according to the method, the complex farmland is decomposed into a plurality of simple subareas, the planning problem of the overall path is decomposed into the path planning in the subareas and the path planning problem between the subareas, and the complexity of farmland path planning is reduced. Meanwhile, an improved Q-learning-based agricultural machinery path optimization method is introduced to carry out path planning, so that the working efficiency of the agricultural machinery is improved.

2: the invention solves the problem of large calculation amount of the traditional algorithm based on the improved Q-learning algorithm, introduces a state marking table by combining the characteristics of the agricultural machine path, and designs proper path planning termination conditions.

3: the invention locks the state which has reached the optimum in the iterative process of the algorithm, and then rapidly completes the optimization of the adjacent states through the transfer relation. The calculation amount of the iterative process can be greatly reduced.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims

1. An agricultural machinery path optimization method based on improved Q-learning is characterized by comprising the following steps:

in the step S3, the minimum span method for calculating the working area of the agricultural machine is as follows:

y＝y _min ,x∈[x _min ,x _max ] (2)

wherein y is _min Is an agricultural machine working areaMinimum of domain boundaries on y-axis; y is _max Is the maximum value of the boundary of the working area of the agricultural machinery on the y axis; x is x _min Is the minimum value of the boundary of the working area of the agricultural machinery on the x axis; x is x _max Is the maximum value of the boundary of the working area of the agricultural machinery on the x axis;

the rotation angle of each time of the original field is as follows:

in the step S4, the method for determining the straight path of the agricultural machine includes:

s41: at an optimal rotation angle of theta ^* Is used as a scanning line to translate towards the interior of the agricultural machine working area, and each translation w; calculating the number of intersection points of the scanning line and the boundary of the agricultural machine working area after each translation;

s42: if the number of the intersection points is 2, the coordinates of the two intersection points are recorded in the boundary of the agricultural machinery working area, and the scanning is continued; if the number of the intersection points is 1 or 0, judging that the intersection points exceed the boundary range of the working area of the agricultural machinery, stopping scanning, and finishing the generation process of the parallel paths;

s6: optimizing the agricultural work area path based on an improved Q-learning algorithm to determine an optimal overall agricultural work length;

the method for optimizing the global path based on the Q-learning algorithm in the S6 is as follows:

s61: defining and initializing a Q value table to start calculation;

s64: judging whether the straight paths are connected;

s66: if the calculated Q value table is converged, the calculation is ended;

2. The improved Q-learning based agricultural machine path optimization method of claim 1, wherein the turning path in S5 comprises a semicircle type, a fish tail type and a pi type;

l _r ＝2R-w (4)

wherein, I _r Is required to travel when the agricultural machinery turns along the fishtail turning pathIs a straight line distance of (2); r is the turning radius; the length of the turning path is (2+pi) multiplied by R-w;

l _f ＝w-2R (5)

3. The method for optimizing an agricultural machine path based on improved Q-learning according to claim 1, wherein the calculating method for determining whether the straight path is connected in S64 is as follows:

wherein: f(s) _n ) Is a state quantity of whether each path is connected.

4. The method for optimizing an agricultural machine path based on improved Q-learning according to claim 1, wherein the Q function established by determining the convergence status of the Q value table in step S65 is:

Q(s _c ,a′ _c )＝Q(s _l ,a′ _l )+γ*max(r(s _c ,f(s _c ,δ))) (7)

wherein Q(s) _c ,a′ _c ) Is based on the current state s of the agricultural machinery _c Action a 'corresponding to maximum awards for current state of agricultural machinery' _c Q, s of (C) _l Is the last state of the current state of the agricultural machinery; a' _l The action corresponding to the maximum awarded by the last state of the current state of the agricultural machine; a discount factor of gamma; r(s) _c ,s _n ) Is from the current state s of the agricultural machinery _c To the next state s of the current state of the agricultural machine _n Is a reward function of f(s) _c Delta) represents a current state s based on the agricultural machinery _c And at the current state of the agricultural machinerys _c A state set of the time-selectable action set δ, namely:

wherein,is the first alternative, and m is the number of alternative actions.

5. The improved Q-learning based agricultural machine path optimization method of claim 1, wherein the bonus function of calculating the bonus value in step S65 is set up as:

6. The improved Q-learning based agricultural machinery path optimization method of claim 1, further comprising, prior to steps S1-S6:

7. The improved Q-learning based agricultural machinery path optimization method of claim 1, further comprising, after steps S1-S6: and carrying out path planning on the sub-areas of the plurality of convex polygons so as to obtain the optimal path of the whole field.