
CN112949493A - Lane line detection method and system combining semantic segmentation and attention mechanism - Google Patents

Lane line detection method and system combining semantic segmentation and attention mechanism

Info

Publication number
CN112949493A
Authority
CN
China
Prior art keywords: lane line, lane, attention mechanism, inputting, module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110235668.4A
Other languages
Chinese (zh)
Other versions
CN112949493B (en)
Inventor
易安明
王汉超
袁嘉言
徐绍凯
陈明木
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Ruiwei Intelligent Technology Co ltd
Original Assignee
Shenzhen Ruiwei Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Ruiwei Intelligent Technology Co ltd
Priority to CN202110235668.4A
Publication of CN112949493A
Application granted
Publication of CN112949493B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/588 Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30248 Vehicle exterior or interior
    • G06T 2207/30252 Vehicle exterior; Vicinity of vehicle
    • G06T 2207/30256 Lane; Road marking


Abstract

The invention discloses a lane line detection method and system combining semantic segmentation and attention mechanism, in the technical field of lane line detection, comprising the following steps: inputting a picture into a backbone network to extract the lane line features in the picture, inputting the features extracted by the backbone network into a full connection layer, and obtaining the local features of each lane line after processing by the full connection layer; inputting the local features of each lane line into an attention mechanism module, acquiring the global features of each lane line, and superposing the global features and the local features to obtain the final lane line parameters; and comparing the predicted lane lines with the real lane lines in the input picture to calculate the mean square error loss of the lane lines, calculating the cross entropy loss of the mask map, computing the gradients of the corresponding convolutional layers, updating the parameters of the convolutional layers and the full connection layer with a back propagation algorithm, and iterating until the model converges. The method accelerates the convergence of the model and improves its robustness without increasing the amount of computation.

Description

Lane line detection method and system combining semantic segmentation and attention mechanism
Technical Field
The invention relates to the technical field of lane line detection, in particular to a lane line detection method and system combining semantic segmentation and attention mechanism.
Background
Lane line detection is one of the basic tasks in the field of assisted driving. In an assisted driving system, accurately acquiring the lane line position is an important prerequisite for subsequent vehicle route planning and lane departure warning. The driving assistance system captures the driving area in front of the currently driving vehicle through a vehicle camera and, using digital image processing techniques, locates the specific position of the lane lines in the image according to their visual cues.
The general flow of a traditional lane line detection system comprises camera calibration on monocular images, image cleaning, feature extraction, lane line model fitting, temporal integration, and mapping from image space to physical space. The core detection step relies on manually extracted features such as color features, the structure tensor, and lane line contours, and the lane line position is obtained after screening the candidates. Algorithms based on local gradient changes perform poorly under complex road conditions and easily miss lane lines that are occluded, blurred, or affected by reflections, so traditional methods lack robustness in complex scenes; moreover, traditional algorithms generally take a long time and are difficult to run in real time on low-end embedded devices.
With the wide application of deep learning in computer vision, schemes that solve lane line detection with deep learning have gradually increased. A common current method is semantic segmentation, which treats each pixel in the image as a binary background/lane-line classification problem and fits a lane line equation to the lane line pixel set obtained by post-processing. Segmentation-based methods can generally handle lane line ambiguity effectively, but under occlusion they still miss detections; in addition, the post-processing consumes a certain amount of time and degrades the performance of the whole lane line detection pipeline. How to effectively solve the missed detection of lane lines in complex scenes such as blur and occlusion, while accelerating model convergence and improving model robustness, is therefore a difficult problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the present invention provides a lane line detection method and system combining semantic segmentation and attention mechanism, so as to solve the problems raised in the background and to effectively address the missed detection of lane lines in complex scenes such as blur and occlusion while accelerating model convergence and improving model robustness.
In order to achieve the above object, one aspect of the present invention provides a lane line detection method combining semantic segmentation and attention mechanism, comprising the following steps:
inputting the picture into a backbone network to extract lane line characteristics in the picture, and acquiring a lane line characteristic diagram;
inputting the lane line feature map into a full-connection layer to obtain local features of each lane line;
and acquiring the global characteristic of each lane line by using the attention mechanism through the local characteristic, and superposing the global characteristic and the local characteristic to obtain the final lane line parameter.
Preferably, the lane line feature map is assisted by segmentation to obtain a mask map of each lane line;
comparing the final lane line parameters and the mask image with the real lane lines in the input image, and respectively calculating the mean square error loss of the lane lines and the cross entropy loss of the mask image;
calculating the gradient of the corresponding convolution layer according to the mean square error loss of the lane line and the cross entropy loss of the mask map, updating the parameters of the convolution layer and the full-connection layer by using a back propagation algorithm, and continuously iterating until the model converges.
Preferably, the local features include cubic curve equation coefficients a, b, c, d of each lane line, a confidence p of each lane line, a start point ordinate s, and an end point ordinate e.
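As a minimal sketch of this parameterization (assuming a PyTorch implementation; the lane count, batch size, and tensor layout below are illustrative rather than specified by the invention), the full connection layer output can be viewed as one seven-parameter vector per lane line:

    import torch

    # Hypothetical shapes: 4 lane slots, 7 parameters each (a, b, c, d, p, s, e).
    max_lanes, params_per_lane = 4, 7
    fc_out = torch.randn(1, max_lanes * params_per_lane)   # dummy head output
    lanes = fc_out.view(1, max_lanes, params_per_lane)     # (batch, lanes, params)
    a, b, c, d, p, s, e = lanes.unbind(dim=-1)             # each of shape (1, max_lanes)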
Preferably, the specific calculation steps of the mean square error loss are as follows:
setting N anchor points at equal intervals on a longitudinal axis of the input picture, and acquiring a longitudinal coordinate of each anchor point;
inputting the vertical coordinate of the anchor point into an actual parameter equation and a network prediction parameter equation to obtain the horizontal coordinate of the anchor point on the actual lane line and the horizontal coordinate of the predicted lane line;
taking the difference between the predicted abscissa and the actual abscissa at each anchor point, accumulating these offsets over all anchor points of all lane lines, and averaging to obtain the mean square error loss of the lane line, the specific formula being:

$$L_{mse} = \frac{1}{M \cdot N}\sum_{j=1}^{M}\sum_{i=1}^{N}\left(\hat{x}_i^j - x_i^j\right)^2$$

wherein M is the number of target lane lines, N is the number of anchor points, $\hat{x}_i^j$ is the predicted abscissa of the i-th anchor point of the j-th lane line, and $x_i^j$ is the actual abscissa of the i-th anchor point of the j-th lane line.
Preferably, the parameter equation predicted by the network is

$$y = ax^3 + bx^2 + cx + d$$

wherein y represents the ordinate of the anchor point and x represents the abscissa of the anchor point.
Preferably, the calculation formula of the cross entropy loss of the mask map is as follows:

$$L_{seg} = -\frac{1}{w \cdot h}\sum_{i=1}^{w \cdot h}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$$

wherein w represents the width of the mask map, h represents the height of the mask map, $y_i$ represents the actual label of the i-th position ($y_i = 0$ denotes that the position is background, $y_i = 1$ denotes that the position is a lane line), and $p_i$ represents the predicted lane line probability.
Preferably, the attention mechanism comprises:
inputting each vector output by the full connection layer, namely the parameters of a specific lane line, into another full connection layer followed by a softmax layer to obtain the weights between that lane line and the other lane lines;
and multiplying the weights by the corresponding lane line vectors respectively and summing the products to obtain the global vector of that specific lane line.
By adopting the above technical scheme, the following beneficial technical effects are obtained: in general, the full connection layer can also extract global information from the input picture, but the lane lines in an actual scene have many mutual relationships, such as the relative position relationship between different lane lines and a common vanishing point. In order to exploit the global information formed by the constraints among different lane lines and to improve the model's ability to capture it, an attention mechanism module is introduced.
Preferably, the segmentation assistance comprises the following specific steps:
extracting a specified number of feature maps from each layer of the backbone network to obtain feature maps with different scales;
unifying the feature maps of different scales by utilizing a linear interpolation mode to obtain a feature map of a unified scale;
and stacking the characteristic graphs with the uniform scale, and obtaining a mask graph of the lane line in the input picture through a shallow convolutional neural network.
By adopting the technical scheme, the method has the following beneficial technical effects: the segmentation auxiliary module makes full use of the characteristics of the input picture, improves the number of the supervision signals, accelerates the convergence speed of the model, and improves the robustness of the model.
On the other hand, the invention provides a lane line detection system combining semantic segmentation and an attention mechanism, which comprises a backbone network module, a full connection layer module, an attention mechanism module and a segmentation auxiliary module; wherein,
inputting the picture into a backbone network module to extract lane line characteristics in the picture, and acquiring a lane line characteristic diagram;
inputting the lane line feature map into a full-connection layer module to obtain local features of each lane line;
inputting the local features into an attention mechanism module, acquiring global features of each lane line, and superposing the global features with the local features to obtain final lane line parameters;
inputting the lane line feature map into a segmentation auxiliary module to obtain a mask map of each lane line;
comparing the final lane line parameters and the mask image with the real lane lines in the input image, and respectively calculating the mean square error loss of the lane lines and the cross entropy loss of the mask image;
calculating the gradient of the corresponding convolution layer according to the mean square error loss of the lane line and the cross entropy loss of the mask map, updating the parameters of the convolution layer and the full-connection layer by using a back propagation algorithm, and continuously iterating until the model converges.
Preferably, when the trained model is used for lane line prediction, the segmentation assisting module does not participate in the prediction process.
By adopting the technical scheme, the method has the following beneficial technical effects: the better training effect is achieved without increasing the reasoning time.
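As an illustration of this behaviour, the following minimal sketch (with stand-in submodules; none of the names come from the invention) shows how the auxiliary branch can be skipped outside training so that inference time is unchanged:

    import torch
    import torch.nn as nn

    class LaneDetector(nn.Module):
        def __init__(self):
            super().__init__()
            self.backbone = nn.Conv2d(3, 8, 3, padding=1)   # stand-in trunk
            self.fc_head = nn.Linear(8, 7)                  # stand-in full-connection head
            self.seg_assist = nn.Conv2d(8, 1, 1)            # stand-in auxiliary branch

        def forward(self, img):
            feat = self.backbone(img)
            lanes = self.fc_head(feat.mean(dim=(2, 3)))     # pooled features -> lane params
            mask = self.seg_assist(feat) if self.training else None  # closed at deployment
            return lanes, mask

    model = LaneDetector().eval()
    lanes, mask = model(torch.randn(1, 3, 64, 64))          # mask is None at inference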
As can be seen from the above technical scheme, the invention discloses and provides a lane line detection method combining semantic segmentation and attention mechanism which, compared with the prior art, has the following beneficial technical effects:
(1) An end-to-end algorithm that directly predicts the lane curve equation is provided; no post-processing is needed, so it is faster.
(2) A semantic segmentation module is introduced into the training framework to add extra supervision signals that assist model training, and the segmentation module is closed at deployment, so the convergence speed and the robustness of the model are improved without increasing deployment inference time or extra computation.
(3) An attention mechanism is introduced into the training framework, improving the model's ability to extract the global features of the lane lines and effectively solving the missed detection of lane lines in complex scenes such as blurring and occlusion.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only embodiments of the present invention; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a schematic diagram of a segmentation assistance module of the present invention;
FIG. 3 is a schematic illustration of the attention mechanism of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The first aspect of the embodiment of the invention discloses a lane line detection system combining semantic segmentation and an attention mechanism, which comprises a backbone network module, a full connection layer module, an attention mechanism module and a segmentation auxiliary module; as shown in FIG. 1, the method comprises the following steps:
the method comprises the following steps: inputting the picture to a backbone network module to extract lane line characteristics in the picture and obtain a lane line characteristic diagram;
it should be noted that, according to actual needs, a suitable backbone network may be selected, such as Resnet, Darknet, hourgals, etc., and Resnet18 is used in this embodiment.
Step two: inputting the lane line feature map into the full-connection layer module to obtain the local features of each lane line;
further, the local features include cubic curve equation coefficients a, b, c, d of each lane line, a confidence p of each lane line, a start point ordinate s, and an end point ordinate e. Wherein p is a probability value of 0 or more and 1 or less.
Step three: inputting the lane line feature map into a segmentation auxiliary module to obtain a mask map of each lane line;
the polynomial equation coefficients of the lane lines are learned from features extracted from the trunk network by utilizing the full connection layer, and only the coefficients of the lane line equations are used as supervision signals when loss is calculated in the mode, namely, the learnable content of the network is less, the difficulty of model convergence is increased when the number of the supervision signals is less, overfitting of the model is easily caused, and the robustness of the model in different scenes is reduced. Therefore, the model can learn the polynomial equation coefficient of the lane line by using a learning mode of semantic segmentation, and meanwhile, the segmentation information of the lane line can be learned by using another branch.
As shown in fig. 2, the operation manner of the segmentation assistance module is as follows:
extracting a specified number of feature maps from each layer of the backbone network to obtain feature maps with different scales;
unifying the feature maps of different scales by utilizing a linear interpolation mode to obtain a feature map of a unified scale;
stacking feature maps with uniform scales, and obtaining a mask map of a lane line in an input picture through a shallow convolutional neural network; the mask map is a gray scale map in which the pixel value at the background position is 0 and the pixel value at the predicted lane line position is 1.
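A minimal sketch of this branch, assuming PyTorch and made-up channel counts (the invention does not fix them):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SegAssist(nn.Module):
        """Resize multi-scale features to one scale, stack them, predict a lane mask."""
        def __init__(self, in_channels=(64, 128, 256, 512)):
            super().__init__()
            self.head = nn.Sequential(                        # shallow conv net
                nn.Conv2d(sum(in_channels), 64, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, 1, 1),                          # one channel: lane vs background
            )

        def forward(self, feats):                             # list of per-stage feature maps
            size = feats[0].shape[-2:]                        # unify to the largest scale
            up = [F.interpolate(f, size=size, mode="bilinear", align_corners=False)
                  for f in feats]                             # linear interpolation
            return self.head(torch.cat(up, dim=1))            # mask logits

    feats = [torch.randn(1, c, 72 // 2 ** i, 128 // 2 ** i)
             for i, c in enumerate((64, 128, 256, 512))]
    mask_logits = SegAssist()(feats)                          # (1, 1, 72, 128)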
By adopting the technical scheme, the method has the following beneficial technical effects: the segmentation auxiliary module makes full use of the characteristics of the input picture, improves the number of the supervision signals, accelerates the convergence speed of the model, and improves the robustness of the model. In addition, when the trained model is used for lane line prediction, the segmentation auxiliary module can be closed, so that a better training effect is achieved without increasing inference time.
Step four: inputting the local features into an attention mechanism module to obtain the global features of each lane line;
in order to improve the capture capability of the model for global information, an attention mechanism module is introduced. Generally, the full link layer module can also extract global information of an input picture, but since there are many mutual relationships between lane lines in an actual scene, for example, the relative position relationship between different lane lines, a common vanishing point, and the like. Therefore, for each vector output by the fully-connected layer module, namely parameters of a specific lane line, including coefficients of a curve equation, a starting point and a final point of the lane line and confidence of the lane line, the vector consisting of the numbers is input into another fully-connected layer module, and then a softmax layer is connected to obtain a weight between the lane line and other lane lines; and finally, multiplying the weights by corresponding lane line vectors respectively and then adding the multiplied weights to obtain a global vector of the lane line, and adding the vector obtained by the original full-connection layer module to obtain final output. Taking lane line number 0 as an example, an example of the attention mechanism is shown in fig. 3.
Step five: comparing the parameters of the lane lines and the mask image with the real lane lines in the input image, and respectively calculating the mean square error loss of the lane lines and the cross entropy loss of the mask image;
specifically, the calculation steps of the mean square error loss of the lane line are as follows:
setting N anchor points at equal intervals on a longitudinal axis of an input picture, and acquiring a longitudinal coordinate of each anchor point;
inputting the ordinate of the anchor point into an actual parameter equation and a network prediction parameter equation to obtain the abscissa of the anchor point on an actual lane line and the abscissa of a predicted lane line;
taking the difference between the predicted abscissa and the actual abscissa at each anchor point, accumulating these offsets over all anchor points of all lane lines, and averaging to obtain the mean square error loss of the lane line, the specific formula being:

$$L_{mse} = \frac{1}{M \cdot N}\sum_{j=1}^{M}\sum_{i=1}^{N}\left(\hat{x}_i^j - x_i^j\right)^2$$

wherein M is the number of target lane lines, N is the number of anchor points, $\hat{x}_i^j$ is the predicted abscissa of the i-th anchor point of the j-th lane line, and $x_i^j$ is the actual abscissa of the i-th anchor point of the j-th lane line.
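A minimal sketch of this loss, assuming PyTorch; the coefficient order follows a, b, c, d, and the abscissa is evaluated from the anchor ordinate as the steps above describe (so the polynomial is applied as x = a·y³ + b·y² + c·y + d, an interpretation of the text rather than a quoted equation):

    import torch

    def lane_mse_loss(pred_coefs, gt_coefs, num_anchors=72, height=288):
        # pred_coefs, gt_coefs: (M, 4) cubic coefficients for M matched lane lines
        y = torch.linspace(0, height - 1, num_anchors)                 # anchor ordinates
        powers = torch.stack([y ** 3, y ** 2, y, torch.ones_like(y)])  # (4, N)
        x_pred = pred_coefs @ powers                    # predicted abscissas, shape (M, N)
        x_gt = gt_coefs @ powers                        # actual abscissas
        return ((x_pred - x_gt) ** 2).mean()            # average over M * N anchor points

    loss = lane_mse_loss(torch.randn(2, 4), torch.randn(2, 4))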
Further, the calculation formula of the cross entropy loss of the mask map is as follows:
$$L_{seg} = -\frac{1}{w \cdot h}\sum_{i=1}^{w \cdot h}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$$

wherein w represents the width of the mask map, h represents the height of the mask map, $y_i$ represents the actual label of the i-th position ($y_i = 0$ denotes that the position is background, $y_i = 1$ denotes that the position is a lane line), and $p_i$ represents the predicted lane line probability.
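This is the standard per-pixel binary cross entropy; a minimal sketch, assuming PyTorch and a single-channel mask:

    import torch
    import torch.nn.functional as F

    def mask_bce_loss(logits, target):
        # averages -[y*log(p) + (1 - y)*log(1 - p)] over all w * h positions
        return F.binary_cross_entropy_with_logits(logits, target)

    logits = torch.randn(2, 1, 72, 128)                    # predicted mask logits
    target = torch.randint(0, 2, (2, 1, 72, 128)).float()  # 0 = background, 1 = lane line
    loss = mask_bce_loss(logits, target)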
It should be noted that the parameter equation of the network prediction is
$$y = ax^3 + bx^2 + cx + d$$
Wherein y represents the ordinate of the anchor point, x represents the abscissa of the anchor point, and a, b, c, d are cubic curve equation coefficients obtained from the fully-connected layer and are known quantities.
Step six: calculating the gradient of the corresponding convolutional layer according to the mean square error loss of the lane line and the cross entropy loss of the mask graph, updating the parameters of the convolutional layer and the full-connection layer by using a back propagation algorithm, and continuously iterating until the model converges.
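A minimal sketch of one such iteration, with a toy stand-in for the full network (the real model would return both the lane parameters and the mask logits, and the two losses would be summed before back propagation):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 4))  # stand-in network
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)

    img = torch.randn(2, 3, 32, 32)               # dummy batch
    gt_coefs = torch.randn(2, 4)                  # dummy lane line targets

    pred_coefs = model(img)
    loss = ((pred_coefs - gt_coefs) ** 2).mean()  # plus the mask cross entropy in the full model
    opt.zero_grad()
    loss.backward()                               # gradients for the conv / FC layers
    opt.step()                                    # repeat until the model converges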
Preferably, the attention mechanism module is applied in the following manner:
inputting each vector output by the full connection layer module, namely the parameters of a specific lane line, into another full connection layer module followed by a softmax layer to obtain the weights between that lane line and the other lane lines;
and multiplying the weights by the corresponding lane line vectors respectively and summing the products to obtain the global vector of that specific lane line.
The second aspect of the embodiments of the present invention provides a lane line detection method combining semantic segmentation and attention mechanism, including the following steps:
inputting the picture into a backbone network to extract lane line characteristics in the picture, and acquiring a lane line characteristic diagram;
inputting the lane line feature map into the full-connection layer to obtain the local feature of each lane line;
and acquiring the global characteristic of each lane line by using the attention mechanism for the local characteristic, and superposing the global characteristic and the local characteristic to obtain the final lane line parameter.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A lane line detection method combining semantic segmentation and attention mechanism is characterized by comprising the following steps:
inputting the picture into a backbone network to extract lane line characteristics in the picture, and acquiring a lane line characteristic diagram;
inputting the lane line feature map into a full-connection layer to obtain local features of each lane line;
and acquiring the global characteristic of each lane line by using the attention mechanism through the local characteristic, and superposing the global characteristic and the local characteristic to obtain the final lane line parameter.
2. The method for lane line detection with semantic segmentation and attention mechanism according to claim 1, further comprising:
the lane line feature map is subjected to segmentation assistance to obtain a mask map of each lane line;
comparing the final lane line parameters and the mask image with the real lane lines in the input image, and respectively calculating the mean square error loss of the lane lines and the cross entropy loss of the mask image;
calculating the gradient of the corresponding convolution layer according to the mean square error loss of the lane line and the cross entropy loss of the mask map, updating the parameters of the convolution layer and the full-connection layer by using a back propagation algorithm, and continuously iterating until the model converges.
3. The method for lane line detection with semantic segmentation and attention mechanism according to claim 1, wherein the local features comprise cubic curve equation coefficients a, b, c and d of each lane line, and a confidence p, a start point ordinate s and an end point ordinate e of each lane line.
4. The method for detecting the lane line by combining the semantic segmentation and the attention mechanism according to claim 3, wherein the specific calculation steps of the mean square error loss of the lane line are as follows:
setting N anchor points at equal intervals on a longitudinal axis of the input picture, and acquiring a longitudinal coordinate of each anchor point;
inputting the vertical coordinate of the anchor point into an actual parameter equation and a network prediction parameter equation to obtain the horizontal coordinate of the anchor point on the actual lane line and the horizontal coordinate of the predicted lane line;
making a difference between the abscissa of the predicted lane line and the abscissa of the actual lane line, adding the differences between the abscissas of all the anchor points to obtain a total offset of the lane line, and dividing the total offset by the number of the anchor points to obtain a mean square error loss of the lane line, wherein the specific formula is as follows:
$$L_{mse} = \frac{1}{M \cdot N}\sum_{j=1}^{M}\sum_{i=1}^{N}\left(\hat{x}_i^j - x_i^j\right)^2$$

wherein M is the number of target lane lines, N is the number of anchor points, $\hat{x}_i^j$ is the predicted abscissa of the i-th anchor point of the j-th lane line, and $x_i^j$ is the actual abscissa of the i-th anchor point of the j-th lane line.
5. The method for detecting lane lines by combining semantic segmentation and attention mechanism as claimed in claim 4, wherein the network prediction parameter equation is
$$y = ax^3 + bx^2 + cx + d$$
Wherein y represents the ordinate of the anchor point and x represents the abscissa of the anchor point.
6. The method for detecting lane lines by combining semantic segmentation and attention mechanism according to claim 2, wherein the cross entropy loss of the mask map is calculated by the formula:
$$L_{seg} = -\frac{1}{w \cdot h}\sum_{i=1}^{w \cdot h}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$$

wherein w represents the width of the mask map, h represents the height of the mask map, $y_i$ represents the actual label of the i-th position ($y_i = 0$ denotes that the position is background, $y_i = 1$ denotes that the position is a lane line), and $p_i$ represents the predicted lane line probability.
7. The method of claim 1, wherein the attention mechanism comprises:
inputting the parameters of the lane line output by the full connection layer into another full connection layer, and then connecting a softmax layer to obtain the weight between the lane line and other lane lines;
and multiplying the weights by the corresponding lane line vectors respectively and summing the products to obtain the global vector of the lane line.
8. The method for detecting lane lines by combining semantic segmentation and attention mechanism according to claim 2, wherein the specific steps of segmentation assistance comprise:
extracting a specified number of feature maps from each layer of the backbone network to obtain feature maps with different scales;
unifying the feature maps of different scales by utilizing a linear interpolation mode to obtain a feature map of a unified scale;
and stacking the characteristic graphs with the uniform scale, and obtaining a mask graph of the lane line in the input picture through a shallow convolutional neural network.
9. A lane line detection system combining semantic segmentation and attention mechanism is characterized by comprising a backbone network module, a full connection layer module, an attention mechanism module and a segmentation auxiliary module; wherein,
inputting the picture into a backbone network module to extract lane line characteristics in the picture, and acquiring a lane line characteristic diagram;
inputting the lane line feature map into a full-connection layer module to obtain local features of each lane line;
inputting the local features into an attention mechanism module, acquiring global features of each lane line, and superposing the global features with the local features to obtain final lane line parameters;
inputting the lane line feature map into a segmentation auxiliary module to obtain a mask map of each lane line;
comparing the final lane line parameters and the mask image with the real lane lines in the input image, and respectively calculating the mean square error loss of the lane lines and the cross entropy loss of the mask image;
calculating the gradient of the corresponding convolution layer according to the mean square error loss of the lane line and the cross entropy loss of the mask map, updating the parameters of the convolution layer and the full-connection layer by using a back propagation algorithm, and continuously iterating until the model converges.
10. The system of claim 9, wherein the segmentation assistance module does not participate in the prediction process when using the trained model for lane line prediction.
CN202110235668.4A 2021-03-03 2021-03-03 Lane line detection method and system combining semantic segmentation and attention mechanism Active CN112949493B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110235668.4A CN112949493B (en) 2021-03-03 2021-03-03 Lane line detection method and system combining semantic segmentation and attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110235668.4A CN112949493B (en) 2021-03-03 2021-03-03 Lane line detection method and system combining semantic segmentation and attention mechanism

Publications (2)

Publication Number Publication Date
CN112949493A (en) 2021-06-11
CN112949493B (en) 2024-04-09

Family

ID=76247406

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110235668.4A Active CN112949493B (en) 2021-03-03 2021-03-03 Lane line detection method and system combining semantic segmentation and attention mechanism

Country Status (1)

Country Link
CN (1) CN112949493B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113705515A (en) * 2021-09-03 2021-11-26 北京百度网讯科技有限公司 Training of semantic segmentation model and generation method and equipment of high-precision map lane line
CN113739811A (en) * 2021-09-03 2021-12-03 阿波罗智能技术(北京)有限公司 Method and device for training key point detection model and generating high-precision map lane line
CN113807236A (en) * 2021-09-15 2021-12-17 北京百度网讯科技有限公司 Method, apparatus, device, storage medium and program product for lane line detection
CN114581887A (en) * 2022-03-07 2022-06-03 上海人工智能创新中心 Method, device and equipment for detecting lane line and computer readable storage medium
CN114581976A (en) * 2022-02-24 2022-06-03 中山大学 Face Quality Judgment Method Based on Local Contribution Variance
CN115063761A (en) * 2022-05-19 2022-09-16 广州文远知行科技有限公司 Lane line detection method, device, equipment and storage medium
CN115147811A (en) * 2022-07-01 2022-10-04 小米汽车科技有限公司 Lane line detection method and device and electronic equipment
CN115588177A (en) * 2022-11-23 2023-01-10 荣耀终端有限公司 Method of training lane line detection network, electronic device, program product, and medium
CN116503828A (en) * 2023-04-24 2023-07-28 大连理工大学 A LiDAR-based train track detection method

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180211117A1 (en) * 2016-12-20 2018-07-26 Jayant Ratti On-demand artificial intelligence and roadway stewardship system
EP3576008A1 (en) * 2018-05-30 2019-12-04 Aptiv Technologies Limited Image based lane marking classification
CN111178245A (en) * 2019-12-27 2020-05-19 深圳佑驾创新科技有限公司 Lane line detection method, lane line detection device, computer device, and storage medium
CN111242037A (en) * 2020-01-15 2020-06-05 华南理工大学 Lane line detection method based on structural information
CN111259796A (en) * 2020-01-16 2020-06-09 东华大学 A Lane Line Detection Method Based on Image Geometric Features
CN111460921A (en) * 2020-03-13 2020-07-28 华南理工大学 Lane line detection method based on multitask semantic segmentation
CN111553205A (en) * 2020-04-12 2020-08-18 西安电子科技大学 Method, system, medium and video surveillance system for vehicle re-identification without license plate information
CN111950467A (en) * 2020-08-14 2020-11-17 清华大学 Fusion network lane line detection method and terminal device based on attention mechanism
CN112307850A (en) * 2019-08-01 2021-02-02 浙江商汤科技开发有限公司 Neural network training method, lane line detection method, device and electronic equipment
CN112381101A (en) * 2021-01-13 2021-02-19 南京理工大学 Infrared road scene segmentation method based on category prototype regression

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180211117A1 (en) * 2016-12-20 2018-07-26 Jayant Ratti On-demand artificial intelligence and roadway stewardship system
EP3576008A1 (en) * 2018-05-30 2019-12-04 Aptiv Technologies Limited Image based lane marking classification
CN112307850A (en) * 2019-08-01 2021-02-02 浙江商汤科技开发有限公司 Neural network training method, lane line detection method, device and electronic equipment
CN111178245A (en) * 2019-12-27 2020-05-19 深圳佑驾创新科技有限公司 Lane line detection method, lane line detection device, computer device, and storage medium
CN111242037A (en) * 2020-01-15 2020-06-05 华南理工大学 Lane line detection method based on structural information
CN111259796A (en) * 2020-01-16 2020-06-09 东华大学 A Lane Line Detection Method Based on Image Geometric Features
CN111460921A (en) * 2020-03-13 2020-07-28 华南理工大学 Lane line detection method based on multitask semantic segmentation
CN111553205A (en) * 2020-04-12 2020-08-18 西安电子科技大学 Method, system, medium and video surveillance system for vehicle re-identification without license plate information
CN111950467A (en) * 2020-08-14 2020-11-17 清华大学 Fusion network lane line detection method and terminal device based on attention mechanism
CN112381101A (en) * 2021-01-13 2021-02-19 南京理工大学 Infrared road scene segmentation method based on category prototype regression

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113705515B (en) * 2021-09-03 2024-04-12 北京百度网讯科技有限公司 Training of semantic segmentation model and generation method and device of high-precision map lane line
CN113739811A (en) * 2021-09-03 2021-12-03 阿波罗智能技术(北京)有限公司 Method and device for training key point detection model and generating high-precision map lane line
CN113705515A (en) * 2021-09-03 2021-11-26 北京百度网讯科技有限公司 Training of semantic segmentation model and generation method and equipment of high-precision map lane line
CN113739811B (en) * 2021-09-03 2024-06-11 阿波罗智能技术(北京)有限公司 Method and equipment for training key point detection model and generating high-precision map lane line
CN113807236A (en) * 2021-09-15 2021-12-17 北京百度网讯科技有限公司 Method, apparatus, device, storage medium and program product for lane line detection
CN113807236B (en) * 2021-09-15 2024-05-17 北京百度网讯科技有限公司 Method, device, equipment, storage medium and program product for lane line detection
CN114581976A (en) * 2022-02-24 2022-06-03 中山大学 Face Quality Judgment Method Based on Local Contribution Variance
CN114581887A (en) * 2022-03-07 2022-06-03 上海人工智能创新中心 Method, device and equipment for detecting lane line and computer readable storage medium
CN114581887B (en) * 2022-03-07 2024-06-07 上海人工智能创新中心 Method, device, equipment and computer readable storage medium for detecting lane line
CN115063761A (en) * 2022-05-19 2022-09-16 广州文远知行科技有限公司 Lane line detection method, device, equipment and storage medium
CN115147811A (en) * 2022-07-01 2022-10-04 小米汽车科技有限公司 Lane line detection method and device and electronic equipment
CN115588177B (en) * 2022-11-23 2023-05-12 荣耀终端有限公司 Method for training lane line detection network, electronic device, program product and medium
CN115588177A (en) * 2022-11-23 2023-01-10 荣耀终端有限公司 Method of training lane line detection network, electronic device, program product, and medium
CN116503828A (en) * 2023-04-24 2023-07-28 大连理工大学 A LiDAR-based train track detection method

Also Published As

Publication number Publication date
CN112949493B (en) 2024-04-09

Similar Documents

Publication Publication Date Title
CN112949493A (en) Lane line detection method and system combining semantic segmentation and attention mechanism
CN109726627B (en) Neural network model training and universal ground wire detection method
CN107154024A (en) Dimension self-adaption method for tracking target based on depth characteristic core correlation filter
CN113313047B (en) A lane line detection method and system based on lane structure prior
CN113657560A (en) Weak supervision image semantic segmentation method and system based on node classification
CN112270691B (en) Monocular video structure and motion prediction method based on dynamic filter network
CN112819853A (en) Semantic prior-based visual odometer method
CN114140469A (en) Depth hierarchical image semantic segmentation method based on multilayer attention
CN114626445B (en) Dam termite video identification method based on optical flow network and Gaussian background modeling
CN113724379A (en) Three-dimensional reconstruction method, device, equipment and storage medium
CN109376641A (en) A moving vehicle detection method based on UAV aerial video
CN117523100A (en) Three-dimensional scene reconstruction method and device based on neural network and multi-view consistency
CN113989612A (en) Remote sensing image target detection method based on attention and generation countermeasure network
CN113496194B (en) Information processing device, information processing method, vehicle, information processing server, and recording medium
CN112215766B (en) Image defogging method combining image restoration and image enhancement and convolution network thereof
CN118570439A (en) An attention-guided hybrid dual-branch spatial decomposition neural network method for infrared dim small target detection
CN115272423B (en) Method and device for training optical flow estimation model and readable storage medium
CN116665097A (en) Self-adaptive target tracking method combining context awareness
CN113723409A (en) Gated imaging semantic segmentation method and device
CN112907464A (en) Underwater thermal disturbance image restoration method
CN113012072A (en) Image motion deblurring method based on attention network
Chen et al. GADO-Net: An improved AOD-Net single image dehazing algorithm
CN119027462B (en) An optical flow estimation method based on improved FlowNetS
CN117275069B (en) End-to-end head gesture estimation method based on learnable vector and attention mechanism
CN116503603B (en) Training method of inter-class shielding target detection network model based on weak supervision semantic segmentation and feature compensation

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant