CN111523548B - Image semantic segmentation and intelligent driving control method and device - Google Patents
Info
- Publication number
- CN111523548B (granted from application CN202010331448.7A)
- Authority
- CN
- China
- Prior art keywords
- feature map
- image
- edge
- original
- feature
- Prior art date
- Legal status: Active (the status is an assumption, not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Biodiversity & Conservation Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
The disclosure provides an image semantic segmentation method and device. The method comprises: performing first feature extraction on an image to be segmented to obtain an original feature map; generating an offset feature map corresponding to the original feature map, wherein the value of each feature point in the offset feature map represents the offset to be applied to the feature point at the corresponding position in the original feature map; generating an edge feature map and a main feature map corresponding to the image to be segmented based on the offset feature map and the original feature map, the edge feature map containing the object edge features and the main feature map containing the object body features of the image to be segmented; and generating a semantic segmentation image corresponding to the image to be segmented based on the edge feature map and the main feature map.
Description
Technical Field
The disclosure relates to the technical field of computers, in particular to an image semantic segmentation and intelligent driving control method and device.
Background
Image semantic segmentation is an important component in image processing and machine vision techniques. The semantic segmentation is to classify each pixel point in the image, determine the category of each point (such as belonging to the background, the person or the car, etc.), and thus divide the region. At present, semantic segmentation has been widely applied to scenes such as automatic driving and unmanned aerial vehicle landing point judgment.
In the related art, when image semantic segmentation is performed, an image to be segmented is generally segmented directly through a neural network, but due to the limited receptive field of the neural network, two parts belonging to the same object may be segmented into different categories, so that the segmentation result is affected.
Disclosure of Invention
The embodiment of the disclosure at least provides an image semantic segmentation and intelligent driving control method and device.
In a first aspect, an embodiment of the present disclosure provides an image semantic segmentation method, including:
extracting first features of the image to be segmented to obtain an original feature map;
generating an offset feature map corresponding to the original feature map based on the original feature map, wherein the value of each feature point in the offset feature map represents the value of the feature point to be offset at the position corresponding to the position of the feature point in the original feature map;
generating an edge feature map and a main feature map corresponding to the image to be segmented based on the offset feature map and the original feature map; the edge feature map comprises object edge features in the image to be segmented, and the main feature map comprises object main features in the image to be segmented;
And generating a semantic segmentation image corresponding to the image to be segmented based on the edge feature image and the main feature image.
Here, the features of regions belonging to the same object are similar. The value of each feature point in the offset feature map, which is determined from the original feature map, represents the offset to be applied to the feature point at the corresponding position in the original feature map. After the feature points in the original feature map are shifted according to the offset feature map, the features of regions belonging to the same object become more aggregated, so that the feature region of each object covers its feature points more completely, and the part of the original feature map belonging to edges (i.e., the edge feature map) can be separated from the part belonging to object bodies (i.e., the main feature map) according to the offset feature map and the original feature map. In addition, aggregating the features of the same object in this way is equivalent to enlarging the receptive field of the neural network in the process of generating the main feature map, so the semantic segmentation image generated based on the edge feature map and the main feature map has higher precision.
In a possible implementation manner, the generating, based on the original feature map, an offset feature map corresponding to the original feature map includes:
extracting features of the original feature map to generate a depth feature map corresponding to the original feature map;
and generating an offset feature map corresponding to the original feature map according to the original feature map and the depth feature map.
Here, since features between regions belonging to the same target object in the image to be segmented should be similar, a depth feature map generated by feature extraction of an original feature map includes high-level features in the original feature map, that is, includes high-level features belonging to the same target object, and an offset feature map generated according to the original feature map and the depth feature map includes features in both the original feature map and the high-level features belonging to the same target object in the original feature map, so that feature points in the original feature map are controlled to be offset based on the offset feature map, so that feature points belonging to the same target object can be gathered, and a main feature portion is extracted from the original feature map.
In a possible implementation manner, the generating, based on the original feature map, a depth feature map corresponding to the original feature map includes:
And performing downsampling processing on the original feature map, and performing upsampling processing on the feature map subjected to the downsampling processing to obtain a depth feature map corresponding to the original feature map.
In a possible implementation manner, the generating an offset feature map corresponding to the original feature map according to the original feature map and the depth feature map includes:
and cascading the original feature map and the depth feature map, and extracting features of the feature map after cascading to obtain the offset feature map.
In a possible implementation manner, the generating, based on the offset feature map and the original feature map, an edge feature map and a main feature map corresponding to the image to be segmented includes:
generating a main feature map corresponding to the image to be segmented based on the offset feature map and the original feature map;
and generating an edge feature map corresponding to the image to be segmented based on the original feature map and the main feature map.
In a possible implementation manner, the generating a main feature map corresponding to the image to be segmented based on the offset feature map and the original feature map includes:
shifting each feature point in the original feature map according to a value to be shifted corresponding to the feature point in the shifting feature map, so as to obtain an intermediate feature map corresponding to the original feature map;
and performing bilinear interpolation on the values of the feature points in the intermediate feature map according to the weights of the corresponding positions in the offset feature map, to obtain a main feature map corresponding to the original feature map.
In a possible implementation manner, the generating, based on the original feature map and the main feature map, an edge feature map corresponding to the image to be segmented includes:
and subtracting the values of the feature points at the positions corresponding to the original feature map and the main feature map, and generating an edge feature map corresponding to the image to be segmented according to the subtracted values.
In a possible embodiment, the method further comprises:
extracting second features of the image to be segmented to obtain a low-level feature map; the convolution times corresponding to the low-level feature images are smaller than the convolution times corresponding to the original feature images;
the step of subtracting the values of the feature points at the positions corresponding to the original feature map and the main feature map, and generating an edge feature map corresponding to the image to be segmented according to the subtracted values, includes:
subtracting the values of the feature points of the original feature map and the main feature map at the corresponding positions to obtain an initial edge feature map;
And cascading the low-level feature image and the initial edge feature image, and carrying out feature extraction on the feature images after cascading to obtain the edge feature image corresponding to the image to be segmented.
In the low-level feature map, the edge features of the target object in the original image are more obvious, and the low-level feature map and the initial edge feature map are cascaded, so that the edge features in the initial edge feature map can be supplemented, and the edge recognition accuracy of the edge feature map is improved.
In a possible implementation manner, the generating the semantic segmentation image corresponding to the image to be segmented based on the edge feature map and the main feature map includes:
adding the values of the feature points of the edge feature map and the main feature map at the corresponding positions to obtain a semantic feature map corresponding to the image to be segmented;
and carrying out convolution operation on the semantic feature image to obtain a semantic segmentation image corresponding to the image to be segmented.
In a possible implementation manner, the semantic segmentation image is obtained by processing the image to be segmented through a neural network;
the neural network is trained by the following method:
acquiring a sample image with first annotation information and second annotation information, wherein the first annotation information is an annotation added to a pixel region of a target object in the sample image, and the second annotation information is an annotation added to an edge of the target object in the sample image;
Inputting the sample image into the neural network to obtain an edge feature map, a main feature map and a semantic feature map corresponding to the sample image;
determining a predicted edge image corresponding to the sample image based on the edge feature map; determining a predicted main body image corresponding to the sample image based on the main feature map; and determining a predicted semantic segmentation image corresponding to the sample image based on the semantic feature map;
and determining a loss value in the training process based on the predicted edge image, the predicted main body image, the predicted semantic segmentation image, the first labeling information and the second labeling information of the sample image, and training the neural network based on the loss value.
In the training process, the edge characteristic part and the main characteristic part of the original characteristic diagram are separately supervised, so that compared with the method for performing supervision training by adding all loss values together, the method can perform targeted training, and the neural network trained by the method has higher segmentation precision.
In a possible implementation manner, the determining the loss value in the training process based on the predicted edge image, the predicted main image, the predicted semantic segmentation image, and the first labeling information and the second labeling information of the sample image includes:
determining a first loss value based on the predicted main body image and the first labeling information of the sample image; and
determining a second loss value based on the predicted edge image and the first labeling information and the second labeling information of the sample image; and
determining a third loss value based on the predicted semantic segmentation image and the first annotation information of the sample image;
and determining a loss value in the training process based on the first loss value, the second loss value and the third loss value.
In a possible implementation manner, the determining the second loss value based on the predicted edge image, the first labeling information of the sample image, and the second labeling information includes:
determining a first edge prediction loss in the training process based on the predicted edge image and the second labeling information of the sample image; determining a second edge prediction loss in the training process based on the predicted edge image and the first labeling information of the sample image;
and carrying out weighted summation on the first edge prediction loss and the second edge prediction loss to obtain the second loss value.
Here, the second loss value includes both a loss measuring the accuracy of edge prediction and a loss measuring the semantic prediction for the pixels predicted as edges, so that both edge prediction and edge semantic prediction can be optimized by this loss when training the neural network.
In one possible embodiment, the determining the loss value in the training process based on the first loss value, the second loss value, and the third loss value includes:
and carrying out weighted summation on the first loss value, the second loss value and the third loss value to obtain the loss value in the training process.
In a second aspect, an embodiment of the present disclosure further provides an intelligent driving control method, including:
acquiring an image acquired by a running device in the running process;
semantically segmenting the image by a method based on the image semantic segmentation as described in the first aspect or any of the possible implementation manners of the first aspect;
and controlling the driving device based on the semantic segmentation result.
In a third aspect, an embodiment of the present disclosure further provides an image semantic segmentation apparatus, including:
the feature extraction module is used for carrying out first feature extraction on the image to be segmented to obtain an original feature image;
The first generation module is used for generating an offset feature map corresponding to the original feature map based on the original feature map, wherein the value of each feature point in the offset feature map represents the value of the feature point, which is required to be offset, at the position corresponding to the position of the feature point in the original feature map;
the second generation module is used for generating an edge feature map and a main feature map corresponding to the image to be segmented based on the offset feature map and the original feature map; the edge feature map comprises object edge features in the image to be segmented, and the main feature map comprises object main features in the image to be segmented;
and the image segmentation module is used for generating a semantic segmentation image corresponding to the image to be segmented based on the edge feature image and the main feature image.
In a possible implementation manner, the first generating module is configured to, when generating, based on the original feature map, an offset feature map corresponding to the original feature map:
extracting features of the original feature map to generate a depth feature map corresponding to the original feature map;
and generating an offset feature map corresponding to the original feature map according to the original feature map and the depth feature map.
In a possible implementation manner, the first generating module is configured to, when performing feature extraction on the original feature map to generate a depth feature map corresponding to the original feature map:
and performing downsampling processing on the original feature map, and performing upsampling processing on the feature map subjected to the downsampling processing to obtain a depth feature map corresponding to the original feature map.
In a possible implementation manner, the first generation module is configured to, when generating an offset feature map corresponding to the original feature map according to the original feature map and the depth feature map:
and cascading the original feature map and the depth feature map, and extracting features of the feature map after cascading to obtain the offset feature map.
In a possible implementation manner, the second generating module is configured to, when generating, based on the offset feature map and the original feature map, an edge feature map and a main feature map corresponding to the image to be segmented:
generating a main feature map corresponding to the image to be segmented based on the offset feature map and the original feature map;
and generating an edge feature map corresponding to the image to be segmented based on the original feature map and the main feature map.
In a possible implementation manner, the second generating module is configured to, when generating the main feature map corresponding to the image to be segmented based on the offset feature map and the original feature map:
shifting each feature point in the original feature map according to a value to be shifted corresponding to the feature point in the shifting feature map, so as to obtain an intermediate feature map corresponding to the original feature map;
and performing bilinear interpolation on the values of the feature points in the intermediate feature map according to the weights of the corresponding positions in the offset feature map, to obtain a main feature map corresponding to the original feature map.
In a possible implementation manner, the second generating module is configured to, when generating an edge feature map corresponding to the image to be segmented based on the original feature map and the main feature map:
and subtracting the values of the feature points at the positions corresponding to the original feature map and the main feature map, and generating an edge feature map corresponding to the image to be segmented according to the subtracted values.
In a possible implementation manner, the feature extraction module is further configured to:
extracting second features of the image to be segmented to obtain a low-level feature map; the convolution times corresponding to the low-level feature images are smaller than the convolution times corresponding to the original feature images;
The image segmentation module is used for subtracting the values of the feature points at the positions corresponding to the original feature image and the main feature image and generating an edge feature image corresponding to the image to be segmented according to the subtracted values, wherein the image segmentation module is used for:
subtracting the values of the feature points of the original feature map and the main feature map at the corresponding positions to obtain an initial edge feature map;
and cascading the low-level feature image and the initial edge feature image, and carrying out feature extraction on the feature images after cascading to obtain the edge feature image corresponding to the image to be segmented.
In a possible implementation manner, the image segmentation module is configured to, when generating a semantic segmentation image corresponding to the image to be segmented based on the edge feature map and the main feature map:
adding the values of the feature points of the edge feature map and the main feature map at the corresponding positions to obtain a semantic feature map corresponding to the image to be segmented;
and carrying out convolution operation on the semantic feature image to obtain a semantic segmentation image corresponding to the image to be segmented.
In a possible implementation manner, the semantic segmentation image is obtained by processing the image to be segmented through a neural network;
The apparatus further includes a training module for training the neural network according to the following method:
acquiring a sample image with first annotation information and second annotation information, wherein the first annotation information is an annotation added to a pixel region of a target object in the sample image, and the second annotation information is an annotation added to an edge of the target object in the sample image;
inputting the sample image into the neural network to obtain an edge feature map, a main feature map and a semantic feature map corresponding to the sample image;
determining a predicted edge image corresponding to the sample image based on the edge feature map; determining a predicted main body image corresponding to the sample image based on the main feature map; and determining a predicted semantic segmentation image corresponding to the sample image based on the semantic feature map;
and determining a loss value in the training process based on the predicted edge image, the predicted main body image, the predicted semantic segmentation image, the first labeling information and the second labeling information of the sample image, and training the neural network based on the loss value.
In a possible implementation manner, the training module is configured to, when determining the loss value in the training process based on the predicted edge image, the predicted main image, the predicted semantic segmentation image, and the first labeling information and the second labeling information of the sample image:
determining a first loss value based on the predicted main body image and the first labeling information of the sample image; and
determining a second loss value based on the predicted edge image and the first labeling information and the second labeling information of the sample image; and
determining a third loss value based on the predicted semantic segmentation image and the first annotation information of the sample image;
and determining a loss value in the training process based on the first loss value, the second loss value and the third loss value.
In a possible implementation manner, the training module is configured to, when determining the second loss value based on the predicted edge image, the first labeling information of the sample image, and the second labeling information, determine:
determining a first edge prediction loss in the training process based on the predicted edge image and the second labeling information of the sample image; determining a second edge prediction loss in the training process based on the predicted edge image and the first labeling information of the sample image;
And carrying out weighted summation on the first edge prediction loss and the second edge prediction loss to obtain the second loss value.
In one possible implementation manner, the training module is configured to, when determining a loss value in the current training process based on the first loss value, the second loss value, and the third loss value:
and carrying out weighted summation on the first loss value, the second loss value and the third loss value to obtain the loss value in the training process.
In a fourth aspect, an embodiment of the present disclosure further provides an intelligent travel control apparatus, including:
the acquisition module is used for acquiring images acquired by the running device in the running process;
an image segmentation module for semantically segmenting the image by a semantic segmentation method based on the image according to the first aspect or any one of the possible implementation manners of the first aspect;
and the control module is used for controlling the running device based on the semantic segmentation result.
In a fifth aspect, embodiments of the present disclosure further provide a computer device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the computer device is running, the machine-readable instructions when executed by the processor performing the steps of the first aspect, or any of the possible implementations of the first aspect, or the steps of the second aspect.
In a sixth aspect, the disclosed embodiments further provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the first aspect, or any of the possible implementations of the first aspect, or performs the steps of the second aspect.
The foregoing objects, features and advantages of the disclosure will be more readily apparent from the following detailed description of the preferred embodiments taken in conjunction with the accompanying drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for the embodiments are briefly described below, which are incorporated in and constitute a part of the specification, these drawings showing embodiments consistent with the present disclosure and together with the description serve to illustrate the technical solutions of the present disclosure. It is to be understood that the following drawings illustrate only certain embodiments of the present disclosure and are therefore not to be considered limiting of its scope, for the person of ordinary skill in the art may admit to other equally relevant drawings without inventive effort.
FIG. 1 illustrates a flow chart of an image semantic segmentation method provided by an embodiment of the present disclosure;
FIG. 2 illustrates a flow chart for generating an offset feature map corresponding to an original feature map provided by an embodiment of the present disclosure;
FIG. 3 illustrates a flowchart of an edge feature map and body feature map generation method provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an image semantic segmentation method according to an embodiment of the present disclosure;
FIG. 5 is a flow chart of a neural network training method according to an embodiment of the present disclosure;
FIG. 6 illustrates a schematic diagram of a neural network training process provided by an embodiment of the present disclosure;
fig. 7 is a schematic flow chart of an intelligent driving control method according to an embodiment of the disclosure;
FIG. 8 illustrates an architecture diagram of an image semantic segmentation device provided by an embodiment of the present disclosure;
fig. 9 shows a schematic architecture diagram of an intelligent travel control apparatus according to an embodiment of the present disclosure;
FIG. 10 illustrates a schematic diagram of a computer device 1000 provided by an embodiment of the present disclosure;
fig. 11 shows a schematic structural diagram of a computer device 1100 provided by an embodiment of the disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, but not all embodiments. The components of the embodiments of the present disclosure, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure provided in the accompanying drawings is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be made by those skilled in the art based on the embodiments of this disclosure without making any inventive effort, are intended to be within the scope of this disclosure.
In the related art, when image semantic segmentation is performed, the image to be segmented is generally segmented directly through a neural network. However, due to the limited receptive field of the neural network, two parts belonging to the same object may be segmented into different categories; for example, the wheels and the body of a vehicle may be assigned to two different categories.
In addition, since the image to be segmented needs to be downsampled during the semantic segmentation of the image to be segmented, the feature extraction is performed on the image to be segmented, and in the downsampling process, the edge information of the object in the image to be segmented may be lost, so that the segmentation result of the edge of the object in the final semantic segmentation result is affected.
In view of the above, the present disclosure provides an image semantic segmentation method and an intelligent driving control method, which are described in detail through the following embodiments.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
For the sake of understanding the present embodiments, an image semantic segmentation method disclosed in the embodiments of the present disclosure is first described in detail. The execution subject of the image semantic segmentation method provided in the embodiments of the present disclosure is generally an electronic device with certain computing capability, for example a terminal device, a server, or another processing device. The terminal device may be a user equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the image semantic segmentation method may be implemented by a processor invoking computer-readable instructions stored in a memory.
Referring to fig. 1, a flowchart of an image semantic segmentation method according to an embodiment of the present disclosure is shown, where the method includes steps 101 to 104, where:
and step 101, carrying out first feature extraction on the image to be segmented to obtain an original feature map.
Step 102, generating an offset feature map corresponding to the original feature map based on the original feature map, wherein the value of each feature point in the offset feature map represents the value of the feature point to be offset at the position corresponding to the position of the feature point in the original feature map.
Step 103, generating an edge feature map and a main feature map corresponding to the image to be segmented based on the offset feature map and the original feature map; the edge feature map comprises object edge features in the image to be segmented, and the subject feature map comprises object subject features in the image to be segmented.
And 104, generating a semantic segmentation image corresponding to the image to be segmented based on the edge feature image and the main feature image.
The method comprises the steps of generating an offset feature map based on an original feature map of an image to be segmented, generating an edge feature map and a main feature map corresponding to the image to be segmented according to the offset feature map and the original feature map, and generating a semantic segmentation image corresponding to the image to be segmented according to the edge feature map and the main feature map.
The value of each feature point in the offset feature map determined from the original feature map represents the offset to be applied to the feature point at the corresponding position in the original feature map. After the feature points in the original feature map are shifted according to the offset feature map, the features of regions belonging to the same object become more aggregated, so that the feature region of each object covers its feature points more completely, and the part of the original feature map belonging to edges (i.e., the edge feature map) can be separated from the part belonging to object bodies (i.e., the main feature map) according to the offset feature map and the original feature map. In addition, aggregating the features of the same object in this way is equivalent to enlarging the receptive field of the neural network in the process of generating the main feature map, so the semantic segmentation image generated based on the edge feature map and the main feature map has higher precision.
The following is a detailed description of the steps 101 to 104, and it should be noted that, the methods in the steps 101 to 104 are all executed by a neural network, and the semantic segmentation image is obtained by processing an image to be segmented through the neural network.
Aiming at step 101,
The neural network comprises a convolutional sub-network, and the first feature extraction on the image to be segmented is performed by inputting the image to be segmented into this convolutional sub-network and applying a plurality of convolution operations. In one possible implementation, the convolutional sub-network may be a deep residual network (ResNet).
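The disclosure does not provide source code for this sub-network. For illustration only, a minimal PyTorch-style sketch of such a backbone is given below; the split into a shallow stage (yielding the low-level feature map used later for edge details) and deeper stages (yielding the original feature map), the choice of ResNet-50 and the use of torchvision are assumptions.

```python
import torch
import torchvision

class Backbone(torch.nn.Module):
    """Illustrative sketch only: the split point, the ResNet-50 depth and the use of
    torchvision are assumptions, not taken from the disclosure."""
    def __init__(self):
        super().__init__()
        resnet = torchvision.models.resnet50(weights=None)  # recent torchvision API
        # few convolutions, edge details still sharp -> "low-level feature map"
        self.low_level = torch.nn.Sequential(
            resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool, resnet.layer1)
        # deeper stages -> "original feature map" F
        self.high_level = torch.nn.Sequential(resnet.layer2, resnet.layer3, resnet.layer4)

    def forward(self, image):
        f_fine = self.low_level(image)   # low-level feature map F_fine (second feature extraction)
        f = self.high_level(f_fine)      # original feature map F (first feature extraction)
        return f, f_fine
```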
Aiming at step 102,
When generating an offset feature map corresponding to the original feature map based on the original feature map, reference may be made to a method as shown in fig. 2, which includes the following steps:
step 201, extracting features of the original feature map, and generating a depth feature map corresponding to the original feature map.
In specific implementation, when the original feature map is subjected to feature extraction to generate a depth feature map corresponding to the original feature map, the original feature map may be subjected to downsampling processing to extract high-level features corresponding to the original feature map, and then the downsampled feature map is subjected to upsampling processing to obtain the depth feature map corresponding to the original feature map.
After the original feature map is downsampled, the size of the downsampled feature map is smaller than that of the original feature map, so upsampling is then performed on the downsampled feature map. The features contained in the upsampled feature map are the same as those contained in the downsampled feature map, but the size of the upsampled feature map is consistent with that of the original feature map; this upsampled feature map is the depth feature map corresponding to the original feature map.
Step 202, generating an offset feature map corresponding to the original feature map according to the original feature map and the depth feature map.
In specific implementation, when an offset feature map corresponding to an original feature map is generated according to the original feature map and a depth feature map, the original feature map and the depth feature map can be cascaded first, and then feature extraction is performed on the feature map after the cascade, so that the offset feature map is obtained.
When the original feature map and the depth feature map are cascaded, for example, if the original feature map has a size of H×W×C and the depth feature map has the same size, also H×W×C, the size of the cascaded feature map is H×W×2C. Feature extraction is then performed on the cascaded feature map to obtain the offset feature map, whose size is again the same as that of the original feature map, H×W×C; here H and W denote the height and width, and C denotes the number of channels.
The feature extraction of the feature map after cascade may be a convolution operation of the feature map after cascade, where the size of the convolution kernel is preset, and in specific implementation, the convolution kernel may be adjusted according to the actual situation.
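A minimal sketch of steps 201 to 202, assuming PyTorch, is given below. The pooling factor and the reduction of the offset map to a two-channel (x, y) displacement are assumptions; the disclosure describes an H×W×C offset feature map that also carries the interpolation weights.

```python
import torch
import torch.nn.functional as F

class OffsetHead(torch.nn.Module):
    """Illustrative sketch of steps 201-202. The disclosure describes an H x W x C offset
    map that also carries interpolation weights; this sketch keeps only a 2-channel
    (x, y) displacement, which is an assumption."""
    def __init__(self, channels):
        super().__init__()
        # 3x3 convolution over the concatenated 2C-channel map
        self.flow_conv = torch.nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)

    def forward(self, feat):                        # feat: original feature map F, (N, C, H, W)
        f_low = F.avg_pool2d(feat, kernel_size=2)   # downsampling -> F_low (high-level context)
        f_alpha = F.interpolate(f_low, size=feat.shape[-2:], mode='bilinear',
                                align_corners=False)           # upsampling -> depth feature map
        cascaded = torch.cat([feat, f_alpha], dim=1)            # cascade: C + C = 2C channels
        return self.flow_conv(cascaded)                         # offset feature map (per-pixel offsets)
```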
Aiming at step 103,
In an example of the present disclosure, when generating an edge feature map and a main feature map corresponding to an image to be segmented based on an offset feature map and an original feature map, reference may be made to a method shown in fig. 3, which includes the following steps:
step 301, generating a main feature map corresponding to the image to be segmented based on the offset feature map and the original feature map.
Specifically, when the main feature map corresponding to the image to be segmented is generated based on the offset feature map and the original feature map, each feature point in the original feature map can be shifted according to the offset value corresponding to that feature point in the offset feature map, so as to obtain an intermediate feature map corresponding to the original feature map; then bilinear interpolation is performed on the values of the feature points in the intermediate feature map according to the weights of the corresponding positions in the offset feature map, so as to obtain the main feature map corresponding to the original feature map.
Because the size of the offset feature map is the same as that of the original feature map, each position in the original feature map is in one-to-one correspondence with the position in the offset feature map, and the value of each position in the offset feature map comprises a value which needs to be offset and corresponds to the feature point at the position in the original feature map, and each feature point in the original feature map can be offset according to the value which needs to be offset and corresponds to the position of the feature point.
For example, if the coordinates of position A in the original feature map are (x1, y1), the position corresponding to position A in the offset feature map is position B, and the offset value stored at position B is (x2, y2), then the value at position A is shifted to the position with coordinates (x1 + x2, y1 + y2).
After each feature point is shifted, the feature points corresponding to the same object region are gathered together, so that the feature points belonging to the main body part in the original feature map can be gathered more by shifting the feature map.
The value at each position in the offset feature map includes, in addition to the offset to be applied, a weight, which is the weight of each neighbouring point when performing bilinear interpolation. Specifically, when the value of each feature point in the intermediate feature map is interpolated according to the weights of the corresponding positions in the offset feature map, the following formula may be used:

F_body(p_x) = Σ_{p ∈ N(p_l)} ω_p · F(p)

where F_body denotes the main feature map, F_body(p_x) denotes the value of feature point p_x in the main feature map, ω_p denotes the weight of the p-th feature point in the offset feature map, N(p_l) denotes the feature points adjacent to feature point p_l in the intermediate feature map obtained by offsetting the original feature map (typically four feature points), and F(p) denotes the value of the p-th feature point in the intermediate feature map.
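Assuming the two-channel displacement of the sketch above, the offset-and-interpolate operation can be illustrated with PyTorch's grid_sample, which performs the shift and the bilinear interpolation in a single call; using grid_sample's implicit bilinear weights instead of predicted weights ω_p is a simplification of the formula above.

```python
import torch
import torch.nn.functional as F

def warp_body(feat, delta):
    """Illustrative sketch of the warp in step 301: shift every feature point of the
    original map F by the displacement in delta (2 channels, in pixels) and interpolate
    bilinearly. grid_sample does both at once; its implicit bilinear weights stand in
    for predicted weights, which is a simplification."""
    n, _, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=feat.device, dtype=feat.dtype),
        torch.arange(w, device=feat.device, dtype=feat.dtype),
        indexing='ij')                                  # base sampling grid (pixel coordinates)
    grid_x = xs.unsqueeze(0) + delta[:, 0]              # x1 + x2
    grid_y = ys.unsqueeze(0) + delta[:, 1]              # y1 + y2
    # normalise to [-1, 1], the coordinate range expected by grid_sample
    grid_x = 2.0 * grid_x / max(w - 1, 1) - 1.0
    grid_y = 2.0 * grid_y / max(h - 1, 1) - 1.0
    grid = torch.stack([grid_x, grid_y], dim=-1)        # (N, H, W, 2)
    return F.grid_sample(feat, grid, mode='bilinear',
                         padding_mode='border', align_corners=True)  # main feature map F_body
```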
And 302, generating an edge feature map corresponding to the image to be segmented based on the original feature map and the main feature map.
The original feature map includes a main feature and an edge feature, and after the main feature is obtained in step 301, other features in the original feature map except the main feature may be used as the edge feature.
Specifically, the values of the feature points at the positions corresponding to the original feature map and the main feature map may be subtracted, and according to the subtracted values, an edge feature map corresponding to the image to be segmented is generated.
In a possible implementation manner, the second feature extraction may be performed on the image to be segmented to obtain a low-level feature map, the convolution number corresponding to the low-level feature map is smaller than the convolution number corresponding to the original feature map, and then the edge feature map corresponding to the image to be segmented is generated based on the low-level feature map and the feature map obtained after subtraction.
When the second feature extraction is performed on the image to be segmented, the neural network used when the first feature extraction is performed on the image to be segmented in step 101 may be used, the more the number of convolutions is performed on the image to be segmented, the more the main feature in the image to be segmented is highlighted, the more the edge feature is weakened, the less the number of convolutions is, the more the edge feature is highlighted, the more the main feature is weakened, and the lower-level feature map is used for supplementing edge details, so that the number of convolutions corresponding to the lower-level feature map is smaller than the number of convolutions corresponding to the original feature map.
In specific implementation, when the edge feature map corresponding to the image to be segmented is generated, the values of the feature points of the original feature map and the main feature map at corresponding positions can be subtracted to obtain an initial edge feature map; then the low-level feature map and the initial edge feature map are cascaded, and feature extraction is performed on the cascaded feature map to obtain the edge feature map corresponding to the image to be segmented.
The feature extraction of the feature map after cascading may be a convolution operation of the feature map after cascading, so as to obtain an edge feature map corresponding to the image to be segmented.
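A minimal sketch of step 302, assuming PyTorch; the channel counts and the bilinear resizing of the low-level feature map to the resolution of the initial edge feature map are assumptions.

```python
import torch
import torch.nn.functional as F

class EdgeHead(torch.nn.Module):
    """Illustrative sketch of step 302; channel counts and the resizing of the low-level
    map are assumptions -- the disclosure only fixes the subtraction, the cascade with
    the low-level feature map and a convolution over the cascaded map."""
    def __init__(self, channels, low_channels):
        super().__init__()
        self.fuse = torch.nn.Conv2d(channels + low_channels, channels, kernel_size=1)

    def forward(self, feat, f_body, f_fine):
        f_beta = feat - f_body                          # initial edge feature map: F - F_body
        f_fine = F.interpolate(f_fine, size=f_beta.shape[-2:], mode='bilinear',
                               align_corners=False)     # match resolutions (assumption)
        cascaded = torch.cat([f_beta, f_fine], dim=1)   # cascade with low-level feature map F_fine
        return self.fuse(cascaded)                      # edge feature map F_edge
```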
For step 104,
The edge feature map and the main feature map have the same size as the original feature map, and when the semantic segmentation image corresponding to the image to be segmented is generated based on the edge feature map and the main feature map, the values of the feature points of the edge feature map and the main feature map at the corresponding positions can be added to obtain the semantic feature map corresponding to the image to be segmented, and then convolution operation is carried out on the semantic feature map to obtain the semantic segmentation image corresponding to the image to be segmented.
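A minimal sketch of this fusion step, assuming PyTorch; the number of classes and the final upsampling of the logits to the input resolution are assumptions.

```python
import torch
import torch.nn.functional as F

class SegHead(torch.nn.Module):
    """Illustrative sketch of step 104; num_classes and the upsampling of the logits to
    the input resolution are assumptions."""
    def __init__(self, channels, num_classes):
        super().__init__()
        self.classifier = torch.nn.Conv2d(channels, num_classes, kernel_size=1)

    def forward(self, f_body, f_edge, image_size):
        f_final = f_body + f_edge            # add values at corresponding positions -> semantic feature map
        logits = self.classifier(f_final)    # 1x1 convolution -> per-pixel class logits
        return F.interpolate(logits, size=image_size, mode='bilinear', align_corners=False)
```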
The image semantic segmentation method will be described below with reference to the detailed drawings.
Referring to fig. 4, a schematic diagram of an image semantic segmentation method according to an embodiment of the present disclosure is shown, where the method includes two parts, one is a process of generating a main feature map, and the other is a generating part of an edge feature map.
First, the process of generating the main feature map is described. In the figure, F denotes the original feature map corresponding to the image to be segmented. The original feature map F is first downsampled to obtain a feature map F_low, and F_low is then upsampled to obtain the depth feature map F_alpha corresponding to the original feature map F. The original feature map F and the depth feature map F_alpha are then concatenated, and a 3×3 convolution is applied to the concatenated map for feature extraction, which yields the offset feature map δ. A warp operation is then performed on the original feature map F according to the offset feature map δ (i.e., the feature points are shifted by the required offsets and bilinear interpolation is computed according to the weights), finally obtaining the main feature map F_body.
In the process of generating the edge feature map, the main feature map F_body is subtracted from the original feature map F to obtain a feature map F_beta; the feature map F_beta and the low-level feature map F_fine are then concatenated, and a 1×1 convolution is applied to the concatenated map to obtain the edge feature map F_edge.
After the main feature map F_body and the edge feature map F_edge are obtained, the values of the feature points of F_body and F_edge at corresponding positions are added to obtain the semantic feature map F_final; a 1×1 convolution is then applied to the semantic feature map F_final to obtain the semantic segmentation image corresponding to the image to be segmented.
The training process of the neural network used in the above process will be described below. Referring to fig. 5, a flowchart of a neural network training method provided by an embodiment of the disclosure includes the following steps:
step 501, a sample image with first labeling information and second labeling information is obtained, wherein the first labeling information is a label added to a pixel area of a target object in the sample image, and the second labeling information is a label added to an edge of the target object in the sample image.
The labeling information of the sample image may be adding a label to each pixel point in the sample image, or adding a label to a pixel region of a target object of the sample image, where the target object includes an object in the sample image and a background in the sample image.
In one possible implementation, the same label may be added to the pixel regions belonging to the same target object in the sample image, and the labels are different between different target objects.
When the second labeling information is added to the sample image, a label may be added to each pixel in the sample image, where the label is used to indicate whether the pixel is an edge pixel. For example, a 0-1 label may be added to each pixel, 0 indicating that the pixel is not an edge pixel, and 1 indicating that the pixel is an edge pixel.
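The disclosure does not prescribe how the 0-1 edge labels are produced; the sketch below shows one possible way, under that assumption, of deriving them automatically from the per-pixel class labels.

```python
import torch
import torch.nn.functional as F

def edge_labels_from_segmentation(label_map, dilation=1):
    """Illustrative sketch: derive the 0-1 edge annotation from the per-pixel class
    labels by marking pixels whose class differs from a neighbour. The disclosure does
    not prescribe how the edge labels are produced; this is one possible way."""
    lbl = label_map.float().unsqueeze(1)                        # (N, 1, H, W)
    k = 2 * dilation + 1
    local_max = F.max_pool2d(lbl, kernel_size=k, stride=1, padding=dilation)
    local_min = -F.max_pool2d(-lbl, kernel_size=k, stride=1, padding=dilation)
    return (local_max != local_min).squeeze(1).long()           # 1 = edge pixel, 0 = not
```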
Step 502, inputting the sample image into the neural network to obtain an edge feature map, a main feature map and a semantic feature map corresponding to the sample image.
Corresponding to the structure shown in FIG. 4, after the sample image is input into the neural network, the edge feature map F_edge, the main feature map F_body and the semantic feature map F_final corresponding to the sample image can be obtained.
Step 503, determining a predicted edge image corresponding to the sample image based on the edge feature map; determining a predicted main body image corresponding to the sample image based on the main feature map; and determining a predicted semantic segmentation image corresponding to the sample image based on the semantic feature map.
When the predicted edge image corresponding to the sample image is determined based on the edge feature map, a 1×1 convolution may be applied to the edge feature map to obtain the predicted edge image; when the predicted main body image corresponding to the sample image is determined based on the main feature map, a 1×1 convolution may be applied to the main feature map to obtain the predicted main body image; and when the predicted semantic segmentation image corresponding to the sample image is determined based on the semantic feature map, a 1×1 convolution may be applied to the semantic feature map to obtain the predicted semantic segmentation image.
Step 504, determining a loss value in the training process based on the predicted edge image, the predicted main image, the predicted semantic segmentation image, and the first labeling information and the second labeling information of the sample image, and training the neural network based on the loss value.
In a specific implementation, determining the loss value in the training process based on the predicted edge image, the predicted main body image, the predicted semantic segmentation image, and the first labeling information and the second labeling information of the sample image may include the following three parts:
(I) First, a first loss value is determined based on the predicted main body image and the first labeling information of the sample image.
The first loss value may be used to represent accuracy of the main body portion identification in the training process.
And (II) determining a second loss value based on the predicted edge image, the first labeling information and the second labeling information of the sample image.
The second loss value comprises two loss values, one loss value is used for identifying whether the pixel point is an edge pixel point or not, and the other loss value is used for identifying the semantic identification of the edge pixel point.
Specifically, when determining the second loss value, a first edge prediction loss in the training process (i.e., the loss for identifying whether a pixel point is an edge pixel point) can be determined based on the predicted edge image and the second labeling information of the sample image, and a second edge prediction loss in the training process (i.e., the loss for the semantic recognition of the edge pixel points) can be determined based on the predicted edge image and the first labeling information of the sample image; the first edge prediction loss and the second edge prediction loss are then weighted and summed, and the summation result is taken as the second loss value.
When determining the second edge prediction loss in the training process, to improve the calculation efficiency, the cross entropy loss value corresponding to each pixel point in the predicted edge image may be calculated first, then the top K pixel points are selected for optimization according to the order of the cross entropy loss values from large to small, then the pixel point with the confidence coefficient greater than the preset confidence coefficient threshold value in the top K pixel points is selected as the target pixel point, the cross entropy loss value of the target pixel point is calculated, and the calculation result is used as the second edge prediction loss.
Here, when edge pixels are classified semantically, an edge pixel may lie between two objects. For example, if the image contains a "person" standing outside a "vehicle", the model has relatively great difficulty predicting the pixels at the boundary where the person and the vehicle meet.
For this reason, the method provided in the present disclosure calculates the second edge prediction loss by hard sample mining. Specifically, the second edge prediction loss may be calculated from the loss values of the pixels whose cross-entropy loss ranks in the top K. The higher the cross-entropy loss of a pixel, the lower the accuracy of the model's prediction for that pixel. By selecting the pixels with higher cross-entropy loss (i.e., the hard samples) from the pixels of the sample image and calculating the second edge prediction loss based on these selected pixels, the network parameters adjusted according to this loss better fit the hard samples, thereby improving the accuracy of the neural network's edge prediction.
Illustratively, the second edge prediction loss may be calculated according to the following formula:
L_edge-sem = -(1/K) · Σ_{i=1}^{N} w_i · 1[ℓ_i ∈ top-K and σ(b_i) > t_b] · log p_i
wherein K represents the number of pixels to be optimized and is a preset value; N represents the number of pixels in the sample image; w_i represents the weight of the i-th pixel and is a preset value; 1[·] is the indicator function, whose value is 1 when the conditions in the brackets are satisfied and 0 otherwise; ℓ_i ∈ top-K indicates that the cross-entropy loss of the i-th pixel ranks in the top K, that is, the i-th pixel belongs to the pixels to be optimized; σ(b_i) > t_b indicates that the confidence corresponding to the i-th pixel is greater than the preset confidence threshold t_b; when both conditions are satisfied, 1[·] takes the value 1, otherwise it takes the value 0; and p_i represents the probability that the predicted result of the i-th pixel is the labeled class of the i-th pixel.
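As an illustration only, the following sketch shows how such a hard-sample-mined edge semantic loss could be computed in PyTorch; the tensor layout, the default value of K, the threshold t_b and the uniform weight w_i = 1 are assumptions rather than the implementation of the present disclosure.

```python
import torch
import torch.nn.functional as F

def edge_semantic_loss(seg_logits, edge_logits, labels, K=10000, t_b=0.8, ignore_index=255):
    """seg_logits: (N, C, H, W) class logits predicted on the edge branch;
    edge_logits: (N, 1, H, W) binary edge logits, so sigma(b_i) = sigmoid(edge_logits);
    labels: (N, H, W) semantic labels from the first labeling information."""
    # Per-pixel cross-entropy, kept unreduced so pixels can be ranked by loss.
    pixel_loss = F.cross_entropy(seg_logits, labels, reduction="none",
                                 ignore_index=ignore_index).view(-1)
    edge_conf = torch.sigmoid(edge_logits).view(-1)

    valid = labels.view(-1) != ignore_index
    pixel_loss, edge_conf = pixel_loss[valid], edge_conf[valid]

    # Condition 1: the pixel's cross-entropy loss ranks in the top K (a hard sample).
    K = min(K, pixel_loss.numel())
    topk_loss, topk_idx = pixel_loss.topk(K)
    # Condition 2: the edge confidence of the pixel exceeds the threshold t_b.
    keep = edge_conf[topk_idx] > t_b
    if keep.sum() == 0:
        return pixel_loss.sum() * 0.0      # no target pixels in this batch
    # Uniform weight w_i = 1 assumed; normalize by K as in the formula above.
    return topk_loss[keep].sum() / K
```

The first edge prediction loss (edge versus non-edge) could then be an ordinary binary cross-entropy on edge_logits against the second labeling information, and the two losses weighted and summed to give the second loss value.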
(III) A third loss value is determined based on the predicted semantic segmentation image and the first labeling information of the sample image.
After the first, second and third loss values are calculated as described above, they may be weighted and summed according to their respective weights to obtain the loss value in the current training process.
The training process of the neural network will be described in detail below with reference to specific embodiments.
Referring to FIG. 6, which is a schematic diagram of a neural network training process according to an embodiment of the present disclosure: first, an image to be segmented (which may be an RGB image) is input into a depth residual network, which outputs an original feature map F and a low-level feature map F_fine corresponding to the image to be segmented; semantic segmentation is then performed on the original feature map F by a semantic segmentation module (the ASPP module shown in FIG. 6), and the main body feature part and the edge feature part of the original feature map are separated to obtain the main body feature map; the low-level feature map F_fine is then added to the separated edge feature part to obtain the edge feature map; next, the first loss value L_body of the main body feature map and the second loss value L_edge of the edge feature map are determined respectively; the semantic feature map F_final is determined from the edge feature map and the main body feature map, and the third loss value L_final of the semantic feature map F_final is then determined; finally, the loss value in the current training process is calculated from the first loss value L_body, the second loss value L_edge and the third loss value L_final, and the neural network is trained according to this loss value.
The loss value in the training process can be determined by the following formula:
L = λ1·L_body + λ2·L_edge + λ3·L_final
wherein λ1, λ2 and λ3 respectively represent the weights corresponding to the first, second and third loss values; L_body represents the first loss value; L_edge represents the second loss value; L_final represents the third loss value; and L represents the loss value in the current training process.
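A minimal sketch of this weighted combination is given below; the weight values are illustrative assumptions, and the three loss tensors are assumed to have been computed as in parts (I) to (III) of step 504.

```python
import torch

# Illustrative preset weights; the actual values are a design choice.
lambda_body, lambda_edge, lambda_final = 1.0, 1.0, 1.0

def combine_losses(l_body: torch.Tensor, l_edge: torch.Tensor, l_final: torch.Tensor) -> torch.Tensor:
    # L = lambda1 * L_body + lambda2 * L_edge + lambda3 * L_final
    return lambda_body * l_body + lambda_edge * l_edge + lambda_final * l_final
```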
In the training process, the edge feature part and the main body feature part of the original feature map are supervised separately. Compared with the prior art, in which all the losses are added together for supervised training, the neural network trained in this way can be trained in a more targeted manner and achieves higher segmentation accuracy.
In addition, an embodiment of the present disclosure further provides an intelligent driving control method. Referring to FIG. 7, which is a schematic flowchart of the intelligent driving control method provided by an embodiment of the present disclosure, the method comprises the following steps:
Step 701, acquiring an image acquired by a running device in the running process.
The traveling device includes, but is not limited to, an autonomous vehicle, a vehicle equipped with an advanced driving assistance system (Advanced Driving Assistance System, ADAS), a robot, or the like.
Step 702, performing semantic segmentation on the image.
In specific implementation, the image may be semantically segmented by using the image semantic segmentation method shown in fig. 1.
Step 703, controlling the driving device based on the semantic segmentation result.
When the running device is controlled, the running device may be controlled to accelerate, decelerate, turn, brake, and so on; alternatively, voice prompt information may be played to prompt the driver to control the running device to accelerate, decelerate, turn, brake, and so on.
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
Based on the same inventive concept, the embodiment of the disclosure further provides an image semantic segmentation device corresponding to the image semantic segmentation method, and since the principle of solving the problem by the device in the embodiment of the disclosure is similar to that of the image semantic segmentation method in the embodiment of the disclosure, the implementation of the device can refer to the implementation of the method, and the repetition is omitted.
Referring to fig. 8, an architecture diagram of an image semantic segmentation apparatus according to an embodiment of the disclosure is shown, where the apparatus includes: a feature extraction module 801, a first generation module 802, a second generation module 803, an image segmentation module 804, and a training module 805; wherein,
the feature extraction module 801 is configured to perform first feature extraction on an image to be segmented to obtain an original feature map;
a first generating module 802, configured to generate an offset feature map corresponding to the original feature map based on the original feature map, where a value of each feature point in the offset feature map represents a value that needs to be offset for a feature point in the original feature map at a position corresponding to the position of the feature point;
a second generating module 803, configured to generate an edge feature map and a main feature map corresponding to the image to be segmented based on the offset feature map and the original feature map; the edge feature map comprises object edge features in the image to be segmented, and the main feature map comprises object main features in the image to be segmented;
The image segmentation module 804 is configured to generate a semantic segmentation image corresponding to the image to be segmented based on the edge feature map and the main feature map.
In a possible implementation manner, the first generating module 802 is configured to, when generating, based on the original feature map, an offset feature map corresponding to the original feature map:
extracting features of the original feature map to generate a depth feature map corresponding to the original feature map;
and generating an offset feature map corresponding to the original feature map according to the original feature map and the depth feature map.
In a possible implementation manner, the first generating module 802 is configured to, when performing feature extraction on the original feature map, generate a depth feature map corresponding to the original feature map:
and performing downsampling processing on the original feature map, and performing upsampling processing on the feature map subjected to the downsampling processing to obtain a depth feature map corresponding to the original feature map.
In a possible implementation manner, the first generating module 802 is configured to, when generating an offset feature map corresponding to the original feature map according to the original feature map and the depth feature map:
And cascading the original feature map and the depth feature map, and extracting features of the feature map after cascading to obtain the offset feature map.
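For ease of understanding, a minimal sketch of this two-step generation of the offset feature map is given below; the channel sizes, the kernel choices and the 2-channel offset encoding (one horizontal and one vertical shift per position) are illustrative assumptions, not the implementation of the present disclosure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OffsetGenerator(nn.Module):
    """Sketch of the first generation module: original -> depth -> offset feature map."""
    def __init__(self, channels=256):
        super().__init__()
        # Down-sampling convolution used to produce the depth feature map.
        self.down = nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1)
        # Feature extraction on the cascaded maps, predicting two offset channels.
        self.offset_conv = nn.Conv2d(channels * 2, 2, kernel_size=3, padding=1)

    def forward(self, original):
        # Down-sample, then up-sample back to the original resolution: the depth feature map.
        depth = self.down(original)
        depth = F.interpolate(depth, size=original.shape[-2:],
                              mode="bilinear", align_corners=False)
        # Cascade the original and depth feature maps along channels, then extract features.
        return self.offset_conv(torch.cat([original, depth], dim=1))
```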
In a possible implementation manner, the second generating module 803 is configured to, when generating, based on the offset feature map and the original feature map, an edge feature map and a main feature map corresponding to the image to be segmented:
generating a main feature map corresponding to the image to be segmented based on the offset feature map and the original feature map;
and generating an edge feature map corresponding to the image to be segmented based on the original feature map and the main feature map.
In a possible implementation manner, the second generating module 803 is configured to, when generating the main feature map corresponding to the image to be segmented based on the offset feature map and the original feature map:
shifting each feature point in the original feature map according to the value to be shifted corresponding to that feature point in the offset feature map, so as to obtain an intermediate feature map corresponding to the original feature map;
and performing bilinear interpolation on the values of the feature points in the intermediate feature map according to the weights at the corresponding positions in the offset feature map, to obtain a main feature map corresponding to the original feature map.
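For illustration, assuming the original feature map is a PyTorch tensor of shape (N, C, H, W) and the offset feature map carries a per-position shift of shape (N, 2, H, W) in pixel units, the shifting and bilinear interpolation above can be expressed with grid_sample; the per-position weighting mentioned above is folded into the bilinear resampling for simplicity, and the subtraction used for the edge feature map (described next) is also shown. This is a sketch under these assumptions, not the implementation of the present disclosure.

```python
import torch
import torch.nn.functional as F

def warp_to_body(original, offset):
    """Shift each feature point by its offset and resample with bilinear interpolation."""
    n, _, h, w = original.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=original.device, dtype=original.dtype),
        torch.arange(w, device=original.device, dtype=original.dtype),
        indexing="ij")
    # Base grid of pixel coordinates, shifted by the learned offsets (x, y order).
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
    grid = grid + offset.permute(0, 2, 3, 1)                      # (N, H, W, 2)
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    gx = 2.0 * grid[..., 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * grid[..., 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)
    # Bilinear interpolation at the shifted positions yields the main body feature map.
    return F.grid_sample(original, grid, mode="bilinear", align_corners=True)

def split_edge(original, body):
    # The edge feature part is what remains after removing the main body part.
    return original - body
```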
In a possible implementation manner, the second generating module 803 is configured to, when generating an edge feature map corresponding to the image to be segmented based on the original feature map and the main feature map:
and subtracting the values of the feature points at the positions corresponding to the original feature map and the main feature map, and generating an edge feature map corresponding to the image to be segmented according to the subtracted values.
In a possible implementation manner, the feature extraction module 801 is further configured to:
performing second feature extraction on the image to be segmented to obtain a low-level feature map; the number of convolutions corresponding to the low-level feature map is smaller than the number of convolutions corresponding to the original feature map;
the image segmentation module 804 is configured to, when subtracting the values of the feature points at the positions corresponding to the original feature map and the main feature map, generate an edge feature map corresponding to the image to be segmented according to the subtracted values:
subtracting the values of the feature points of the original feature map and the main feature map at the corresponding positions to obtain an initial edge feature map;
and cascading the low-level feature image and the initial edge feature image, and carrying out feature extraction on the feature images after cascading to obtain the edge feature image corresponding to the image to be segmented.
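A minimal sketch of this cascading and feature extraction step is given below; the channel sizes and the use of a single 3×3 convolution block are assumptions, not the implementation of the present disclosure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeRefiner(nn.Module):
    """Sketch: fuse the initial edge feature map with the low-level feature map."""
    def __init__(self, edge_channels=256, low_channels=256, out_channels=256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(edge_channels + low_channels, out_channels, kernel_size=3,
                      padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, initial_edge, low_level):
        # Up-sample the initial edge map to the (finer) resolution of the low-level map.
        initial_edge = F.interpolate(initial_edge, size=low_level.shape[-2:],
                                     mode="bilinear", align_corners=False)
        # Cascade along the channel dimension, then extract features.
        return self.fuse(torch.cat([initial_edge, low_level], dim=1))
```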
In a possible implementation manner, the image segmentation module 804 is configured to, when generating a semantic segmentation image corresponding to the image to be segmented based on the edge feature map and the principal feature map:
adding the values of the feature points of the edge feature map and the main feature map at the corresponding positions to obtain a semantic feature map corresponding to the image to be segmented;
and carrying out convolution operation on the semantic feature image to obtain a semantic segmentation image corresponding to the image to be segmented.
In a possible implementation manner, the semantic segmentation image is obtained by processing the image to be segmented through a neural network;
the apparatus further comprises a training module 805, the training module 805 being configured to train the neural network according to the following method:
acquiring a sample image with first annotation information and second annotation information, wherein the first annotation information is an annotation added to a pixel region of a target object in the sample image, and the second annotation information is an annotation added to an edge of the target object in the sample image;
inputting the sample image into the neural network to obtain an edge feature map, a main feature map and a semantic feature map corresponding to the sample image;
Determining a predicted edge image corresponding to the sample image based on the edge feature map; and determining a predicted subject image corresponding to the sample image based on the subject feature map; determining a predicted semantic segmentation image corresponding to the sample image based on the semantic feature map;
and determining a loss value in the training process based on the predicted edge image, the predicted main body image, the predicted semantic segmentation image, the first labeling information and the second labeling information of the sample image, and training the neural network based on the loss value.
In a possible implementation manner, the training module 805 is configured to, when determining the loss value in the training process based on the predicted edge image, the predicted main image, the predicted semantic segmentation image, and the first label information and the second label information of the sample image:
determining a first loss value based on the predicted main image and the first labeling information of the sample image; and
determining a second loss value based on the predicted edge image, the first labeling information and the second labeling information of the sample image; and
determining a third loss value based on the predicted semantic segmentation image and the first annotation information of the sample image;
and determining a loss value in the training process based on the first loss value, the second loss value and the third loss value.
In a possible implementation manner, the training module 805 is configured to, when determining the second loss value based on the predicted edge image and the first and second labeling information of the sample image:
determining a first edge prediction loss in the training process based on the predicted edge image and the second labeling information of the sample image; determining a second edge prediction loss in the training process based on the predicted edge image and the first labeling information of the sample image;
and carrying out weighted summation on the first edge prediction loss and the second edge prediction loss to obtain the second loss value.
In a possible implementation manner, the training module 805 is configured to, when determining a loss value in the current training process based on the first loss value, the second loss value, and the third loss value:
and carrying out weighted summation on the first loss value, the second loss value and the third loss value to obtain the loss value in the training process.
The process flow of each module in the apparatus and the interaction flow between the modules may be described with reference to the related descriptions in the above method embodiments, which are not described in detail herein.
Based on the same inventive concept, the embodiment of the disclosure further provides an intelligent running control device corresponding to the intelligent running control method, and since the principle of solving the problem by the device in the embodiment of the disclosure is similar to that of the intelligent running control method in the embodiment of the disclosure, the implementation of the device can be referred to the implementation of the method, and the repetition is omitted.
Referring to FIG. 9, which is an architecture diagram of an intelligent driving control device according to an embodiment of the present disclosure, the device includes an acquisition module 901, an image segmentation module 902, and a control module 903. Specifically:
the acquisition module is used for acquiring images acquired by the running device in the running process;
the image segmentation module is used for carrying out semantic segmentation on the image through the image semantic segmentation method provided by the embodiment of the disclosure;
and the control module is used for controlling the running device based on the semantic segmentation result.
Based on the same technical concept, an embodiment of the present application further provides a computer device. Referring to FIG. 10, which is a schematic structural diagram of a computer device 1000 provided by an embodiment of the present application, the device includes a processor 1001, a memory 1002 and a bus 1003. The memory 1002 is configured to store execution instructions and includes an internal memory 10021 and an external memory 10022; the internal memory 10021 is used to temporarily store operation data of the processor 1001 and data exchanged with the external memory 10022 such as a hard disk, and the processor 1001 exchanges data with the external memory 10022 through the internal memory 10021. When the computer device 1000 runs, the processor 1001 and the memory 1002 communicate with each other through the bus 1003, so that the processor 1001 executes the following instructions:
Extracting first features of the image to be segmented to obtain an original feature map;
generating an offset feature map corresponding to the original feature map based on the original feature map, wherein the value of each feature point in the offset feature map represents the value of the feature point to be offset at the position corresponding to the position of the feature point in the original feature map;
generating an edge feature map and a main feature map corresponding to the image to be segmented based on the offset feature map and the original feature map; the edge feature map comprises object edge features in the image to be segmented, and the main feature map comprises object main features in the image to be segmented;
and generating a semantic segmentation image corresponding to the image to be segmented based on the edge feature image and the main feature image.
Based on the same technical concept, an embodiment of the present application further provides a computer device. Referring to FIG. 11, which is a schematic structural diagram of a computer device 1100 provided by an embodiment of the present application, the device includes a processor 1101, a memory 1102 and a bus 1103. The memory 1102 is configured to store execution instructions and includes an internal memory 11021 and an external memory 11022; the internal memory 11021 is used to temporarily store operation data of the processor 1101 and data exchanged with the external memory 11022 such as a hard disk, and the processor 1101 exchanges data with the external memory 11022 through the internal memory 11021. When the computer device 1100 runs, the processor 1101 and the memory 1102 communicate with each other through the bus 1103, so that the processor 1101 executes the following instructions:
Acquiring an image acquired by a running device in the running process;
performing semantic segmentation on the image by using the image semantic segmentation method provided by the embodiment;
and controlling the driving device based on the semantic segmentation result.
The disclosed embodiments also provide a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, performs the steps of the image semantic segmentation method and the intelligent driving control method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The computer program product of the image semantic segmentation method provided by the embodiment of the disclosure includes a computer readable storage medium storing program codes, where the instructions included in the program codes may be used to execute the steps of the image semantic segmentation and intelligent driving control method described in the above method embodiment, and specifically, reference may be made to the above method embodiment, which is not repeated herein.
The disclosed embodiments also provide a computer program which, when executed by a processor, implements any of the methods of the previous embodiments. The computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and apparatus may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in essence or a part contributing to the prior art or a part of the technical solution, or in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present disclosure. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Finally, it should be noted that: the foregoing examples are merely specific embodiments of the present disclosure, and are not intended to limit the scope of the disclosure, but the present disclosure is not limited thereto, and those skilled in the art will appreciate that while the foregoing examples are described in detail, it is not limited to the disclosure: any person skilled in the art, within the technical scope of the disclosure of the present disclosure, may modify or easily conceive changes to the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of the technical features thereof; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the disclosure, and are intended to be included within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Claims (13)
1. An image semantic segmentation method, comprising:
extracting first features of the image to be segmented to obtain an original feature map;
generating an offset feature map corresponding to the original feature map based on the original feature map, wherein the value of each feature point in the offset feature map represents the value of the feature point to be offset at the position corresponding to the position of the feature point in the original feature map; wherein the generating an offset feature map corresponding to the original feature map based on the original feature map comprises: performing feature extraction on the original feature map to generate a depth feature map corresponding to the original feature map; and generating the offset feature map corresponding to the original feature map according to the original feature map and the depth feature map;
generating an edge feature map and a main feature map corresponding to the image to be segmented based on the offset feature map and the original feature map; the edge feature map comprises object edge features in the image to be segmented, and the main feature map comprises object main features in the image to be segmented;
generating a semantic segmentation image corresponding to the image to be segmented based on the edge feature image and the main feature image; the generating, based on the offset feature map and the original feature map, an edge feature map and a main feature map corresponding to the image to be segmented includes:
shifting each feature point in the original feature map according to the value to be shifted corresponding to that feature point in the offset feature map, so as to obtain an intermediate feature map corresponding to the original feature map;
performing bilinear interpolation on the values of the feature points in the intermediate feature map according to the weights at the corresponding positions in the offset feature map, to obtain a main feature map corresponding to the original feature map;
and subtracting the values of the feature points at the positions corresponding to the original feature map and the main feature map, and generating an edge feature map corresponding to the image to be segmented according to the subtracted values.
2. The method of claim 1, wherein the performing feature extraction on the original feature map to generate a depth feature map corresponding to the original feature map includes:
and performing downsampling processing on the original feature map, and performing upsampling processing on the feature map subjected to the downsampling processing to obtain a depth feature map corresponding to the original feature map.
3. The method according to claim 1, wherein the generating an offset feature map corresponding to the original feature map according to the original feature map and the depth feature map includes:
And cascading the original feature map and the depth feature map, and extracting features of the feature map after cascading to obtain the offset feature map.
4. The method according to claim 1, wherein the method further comprises:
performing second feature extraction on the image to be segmented to obtain a low-level feature map; the number of convolutions corresponding to the low-level feature map is smaller than the number of convolutions corresponding to the original feature map;
the step of subtracting the values of the feature points at the positions corresponding to the original feature map and the main feature map, and generating an edge feature map corresponding to the image to be segmented according to the subtracted values, includes:
subtracting the values of the feature points of the original feature map and the main feature map at the corresponding positions to obtain an initial edge feature map;
and cascading the low-level feature image and the initial edge feature image, and carrying out feature extraction on the feature images after cascading to obtain the edge feature image corresponding to the image to be segmented.
5. The method according to claim 1, wherein the generating the semantic segmentation image corresponding to the image to be segmented based on the edge feature map and the body feature map comprises:
Adding the values of the feature points of the edge feature map and the main feature map at the corresponding positions to obtain a semantic feature map corresponding to the image to be segmented;
and carrying out convolution operation on the semantic feature image to obtain a semantic segmentation image corresponding to the image to be segmented.
6. The method according to claim 1, wherein the semantically segmented image is obtained by processing the image to be segmented through a neural network;
the neural network is trained by the following method:
acquiring a sample image with first annotation information and second annotation information, wherein the first annotation information is an annotation added to a pixel region of a target object in the sample image, and the second annotation information is an annotation added to an edge of the target object in the sample image;
inputting the sample image into the neural network to obtain an edge feature map, a main feature map and a semantic feature map corresponding to the sample image;
determining a predicted edge image corresponding to the sample image based on the edge feature map; and determining a predicted subject image corresponding to the sample image based on the subject feature map; determining a predicted semantic segmentation image corresponding to the sample image based on the semantic feature map;
And determining a loss value in the training process based on the predicted edge image, the predicted main body image, the predicted semantic segmentation image, the first labeling information and the second labeling information of the sample image, and training the neural network based on the loss value.
7. The method of claim 6, wherein determining the loss value during the present training based on the predicted edge image, the predicted subject image, the predicted semantic segmentation image, and the first and second labeling information of the sample image comprises:
determining a first loss value based on the predicted subject image and the first labeling information of the sample image; and
determining a second loss value based on the predicted edge image, the first labeling information and the second labeling information of the sample image; and
determining a third loss value based on the predicted semantic segmentation image and the first annotation information of the sample image;
and determining a loss value in the training process based on the first loss value, the second loss value and the third loss value.
8. The method of claim 7, wherein the determining a second loss value based on the predicted edge image, the first annotation information for the sample image, and the second annotation information comprises:
Determining a first edge prediction loss in the training process based on the predicted edge image and the second labeling information of the sample image; determining a second edge prediction loss in the training process based on the predicted edge image and the first labeling information of the sample image;
and carrying out weighted summation on the first edge prediction loss and the second edge prediction loss to obtain the second loss value.
9. An intelligent travel control method is characterized by comprising the following steps:
acquiring an image acquired by a running device in the running process;
performing semantic segmentation on the image by using the image semantic segmentation method according to any one of claims 1 to 8;
and controlling the driving device based on the semantic segmentation result.
10. An image semantic segmentation apparatus, comprising:
the feature extraction module is used for carrying out first feature extraction on the image to be segmented to obtain an original feature image;
the first generation module is used for generating an offset feature map corresponding to the original feature map based on the original feature map, wherein the value of each feature point in the offset feature map represents the value of the feature point, which is required to be offset, at the position corresponding to the position of the feature point in the original feature map; when generating the offset feature map corresponding to the original feature map based on the original feature map, the first generation module is used for: performing feature extraction on the original feature map to generate a depth feature map corresponding to the original feature map; and generating the offset feature map corresponding to the original feature map according to the original feature map and the depth feature map;
The second generation module is used for generating an edge feature map and a main feature map corresponding to the image to be segmented based on the offset feature map and the original feature map; the edge feature map comprises object edge features in the image to be segmented, and the main feature map comprises object main features in the image to be segmented;
the image segmentation module is used for generating a semantic segmentation image corresponding to the image to be segmented based on the edge feature image and the main feature image; the second generating module is configured to, when generating an edge feature map and a main feature map corresponding to the image to be segmented based on the offset feature map and the original feature map:
shifting each feature point in the original feature map according to the value to be shifted corresponding to that feature point in the offset feature map, so as to obtain an intermediate feature map corresponding to the original feature map;
performing bilinear interpolation on the values of the feature points in the intermediate feature map according to the weights at the corresponding positions in the offset feature map, to obtain a main feature map corresponding to the original feature map;
and subtracting the values of the feature points at the positions corresponding to the original feature map and the main feature map, and generating an edge feature map corresponding to the image to be segmented according to the subtracted values.
11. An intelligent travel control device, comprising:
the acquisition module is used for acquiring images acquired by the running device in the running process;
an image segmentation module, configured to perform semantic segmentation on the image by using the image semantic segmentation method according to any one of claims 1 to 8;
and the control module is used for controlling the running device based on the semantic segmentation result.
12. A computer device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the computer device is running, and the machine-readable instructions, when executed by the processor, performing the steps of the image semantic segmentation method according to any one of claims 1 to 8 or the steps of the intelligent travel control method according to claim 9.
13. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the image semantic segmentation method according to any one of claims 1 to 8 or performs the steps of the intelligent travel control method according to claim 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010331448.7A CN111523548B (en) | 2020-04-24 | 2020-04-24 | Image semantic segmentation and intelligent driving control method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010331448.7A CN111523548B (en) | 2020-04-24 | 2020-04-24 | Image semantic segmentation and intelligent driving control method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111523548A CN111523548A (en) | 2020-08-11 |
CN111523548B true CN111523548B (en) | 2023-11-28 |
Family
ID=71904468
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010331448.7A Active CN111523548B (en) | 2020-04-24 | 2020-04-24 | Image semantic segmentation and intelligent driving control method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111523548B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112465834B (en) * | 2020-11-26 | 2024-05-24 | 中科麦迪人工智能研究院(苏州)有限公司 | Blood vessel segmentation method and device |
CN112581486A (en) * | 2020-11-27 | 2021-03-30 | 深圳点猫科技有限公司 | Edge detection method, device and equipment based on bidirectional cascade network |
CN112669338B (en) * | 2021-01-08 | 2023-04-07 | 北京市商汤科技开发有限公司 | Image segmentation method and device, electronic equipment and storage medium |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019100946A1 (en) * | 2017-11-22 | 2019-05-31 | 北京市商汤科技开发有限公司 | Object detection method, device, and apparatus |
CN108537292A (en) * | 2018-04-10 | 2018-09-14 | 上海白泽网络科技有限公司 | Semantic segmentation network training method, image, semantic dividing method and device |
CN108876793A (en) * | 2018-04-13 | 2018-11-23 | 北京迈格威科技有限公司 | Semantic segmentation methods, devices and systems and storage medium |
CN109784424A (en) * | 2019-03-26 | 2019-05-21 | 腾讯科技(深圳)有限公司 | A kind of method of image classification model training, the method and device of image procossing |
CN110490203A (en) * | 2019-07-05 | 2019-11-22 | 平安科技(深圳)有限公司 | Image partition method and device, electronic equipment and computer readable storage medium |
Non-Patent Citations (2)
Title |
---|
An instance segmentation scheme combining multiple image segmentation algorithms; Zhan Qiliang; Chen Shengyong; Hu Haigen; Li Xiaoxin; Zhou Qianwei; Journal of Chinese Computer Systems (04); full text *
Semantic segmentation of remote sensing images based on down-sampled feature fusion; Li Shuai; Guo Yanyan; Wei Xia; Journal of Test and Measurement Technology (04); full text *
Also Published As
Publication number | Publication date |
---|---|
CN111523548A (en) | 2020-08-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |