CN113421267B

CN113421267B - Point cloud semantic and instance joint segmentation method and system based on improved PointConv

Info

Publication number: CN113421267B
Application number: CN202110495434.3A
Authority: CN
Inventors: 顾寄南; 张文浩
Original assignee: Jiangsu University
Current assignee: Jiangsu University
Priority date: 2021-05-07
Filing date: 2021-05-07
Publication date: 2024-04-12
Anticipated expiration: 2041-05-07
Also published as: CN113421267A

Abstract

The invention provides a point cloud semantic and instance joint segmentation method and system based on an improved PointConv. A point cloud acquired by a lidar or a depth camera is used as the input of an improved PointConv feature extraction module. The points output by a shared encoding module undergo semantic segmentation decoding and instance segmentation decoding simultaneously, yielding an instance feature prediction and a semantic feature prediction. A bidirectional self-attention module then fuses the semantic feature prediction and the instance feature prediction obtained by the improved PointConv feature extraction module, and instance segmentation and semantic segmentation are performed respectively to obtain instance information containing semantic features and semantic information containing instance features. The method and system improve the speed of instance segmentation and reduce its dependence on semantic segmentation accuracy.

Description

Point cloud semantic and instance joint segmentation method and system based on improved PointConv

Technical Field

The invention belongs to the technical field of point cloud segmentation, and particularly relates to a point cloud semantic and instance joint segmentation method and system based on improved PointConv.

Background

Because neural networks have a strong feature learning capability when extracting image features, image semantic segmentation and instance segmentation tasks in computer vision have seen major breakthroughs. Since the PointNet algorithm, end-to-end point cloud segmentation algorithms have developed rapidly, but the following shortcomings remain: (1) when searching with KNN or radius NN, differences in the order in which points are found leave the point cloud unordered; most methods extract features with MLP and max-pooling, and the extracted point features cannot capture the local geometry of the point cloud or the interactions between points; (2) the first step of current point cloud algorithms uses farthest point sampling, a non-uniform sampling that causes points to cluster heavily in some local regions and to vanish in others, resulting in weak feature learning ability; (3) most networks combine the semantic segmentation and instance segmentation tasks serially, which leads to sub-optimal results, low efficiency, and an overly strong dependence between the two tasks.

Disclosure of Invention

Aiming at the defects in the prior art, the invention provides a point cloud semantic and instance joint segmentation method and system based on improved PointConv, which improve the speed of instance segmentation and reduce the dependence on semantic segmentation precision.

The present invention achieves the above technical object by the following means.

A point cloud semantic and instance joint segmentation method based on improved PointConv specifically comprises the following steps:

the acquired point cloud is input to the improved PointConv feature extraction module; the shared encoding module produces points with a feature dimension of 512, and these points undergo semantic segmentation decoding and instance segmentation decoding simultaneously to obtain the instance feature prediction F_ins and the semantic feature prediction F_sem; the instance segmentation decoding part introduces a context aggregation module and a gating propagation module to enhance feature learning;

the bidirectional self-attention module performs feature fusion on the semantic feature prediction F_sem and the instance feature prediction F_ins obtained by the improved PointConv feature extraction module, and instance segmentation and semantic segmentation are performed respectively to obtain instance information containing semantic features and semantic information containing instance features.

According to a further technical scheme, the point cloud input to the improved PointConv feature extraction module comprises the normalized absolute coordinates xyz of the points, the rgb color information, and the relative coordinates x'y'z' of the points with respect to a local coordinate system.

According to a further technical scheme, the instance segmentation decoding specifically comprises the following steps:

using PointDeconv, the final output of the shared encoding module is deconvolved and up-sampled to N_d points; the features of the N_d points are input to the context aggregation module, where three 1x1 convolutions produce feature Q, feature K and feature V respectively; feature Q is matrix-multiplied with the transposed feature K and the result is compressed by a sigmoid to obtain the weight matrix W_1; W_1 is matrix-multiplied with feature V, and the result is added element by element to feature V to obtain the final aggregated feature;

the final aggregated feature F_dec and the feature F_enc output by PointConv_3 of the shared encoding module serve as the inputs of the gating propagation module; F_dec and F_enc are concatenated along the channel dimension to obtain F_con, which then passes through a 1x1 convolution and sigmoid compression to obtain an N_d × 1 weight matrix W_2; W_2 is tiled 256 times along the feature dimension and multiplied element by element with F_enc to obtain F_enc'; the weight matrix 1 - W_2 is tiled 256 times along the feature dimension and multiplied element by element with F_dec to obtain F_dec'; F_enc' and F_dec' are then concatenated along the channel dimension and output as the final result;

the two features F_dec and F_enc are thereby fused, completing the first decoding step and yielding N_c points with a feature dimension of 256; a similar operation up-samples the points to N_b points with a feature dimension of 128; finally, PointDeconv up-samples the points to the input point count N_a while keeping the feature dimension at 128, giving the instance feature prediction F_ins.
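The context aggregation step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the patented implementation: a 1x1 convolution over points is treated as a per-point linear map, the function and variable names are hypothetical, and the channel width of 256 and the random weights are assumptions chosen only so the shapes line up.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def context_aggregation(f_in, w_q, w_k, w_v):
    """Sketch of the context aggregation module.

    f_in: (N_d, C) features of the N_d up-sampled points.
    w_q, w_k, w_v: (C, C) weights standing in for the three 1x1 convolutions.
    """
    q = f_in @ w_q                 # feature Q, (N_d, C)
    k = f_in @ w_k                 # feature K, (N_d, C)
    v = f_in @ w_v                 # feature V, (N_d, C)
    w1 = sigmoid(q @ k.T)          # weight matrix W_1, (N_d, N_d)
    return w1 @ v + v              # weighted features plus element-wise V

rng = np.random.default_rng(0)
n_d, c = 32, 256                   # illustrative sizes (assumed)
f_in = rng.standard_normal((n_d, c))
w_q, w_k, w_v = (rng.standard_normal((c, c)) * 0.01 for _ in range(3))
f_dec = context_aggregation(f_in, w_q, w_k, w_v)
```

Because W_1 has one row per point, every output row mixes information from all N_d points, while the added V term keeps each point's own feature in the result.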

According to a further technical scheme, the semantic segmentation decoding is specifically: the points are up-sampled step by step by deconvolution until the point count reaches the input point count N_a, with a feature dimension of 128, giving the semantic feature prediction F_sem.

According to a further technical scheme, the instance segmentation is specifically:

the semantic feature prediction F_sem and the instance feature prediction F_ins are input to the STOI module; F_sem passes through two 1x1 convolutions in parallel, the two outputs are multiplied after one is transposed, and sigmoid compression yields a weight matrix; the weight matrix is multiplied with F_ins and the product is concatenated with F_ins to obtain the instance feature F_stoi carrying semantic information; after buffering by two fully connected layers, an instance embedding F'_stoi of size N_a × N_e is obtained; after repeated back-propagation optimization, a single clustering operation completes the instance segmentation.

According to a further technical scheme, the semantic segmentation is specifically:

the instance segmentation part is buffered by the fully connected layer Fc1 to obtain instance feature information F'_ins of size N_a × 128; F'_ins and F_sem are input to the ITOS module; F'_ins passes through two 1x1 convolutions in parallel, the two outputs are multiplied after one is transposed, and sigmoid compression yields a weight matrix; the weight matrix is multiplied with F_sem and the product is concatenated with F_sem to obtain the semantic feature F_itos carrying instance information; a fully connected layer then produces an N_a × N_c output, and after repeated back-propagation optimization a single argmax completes the semantic segmentation.

A point cloud semantic and instance joint segmentation system based on improved PointConv, comprising:

an improved PointConv feature extraction module, configured to obtain the instance feature prediction F_ins and the semantic feature prediction F_sem;

a bidirectional self-attention module, configured to fuse the instance feature prediction F_ins and the semantic feature prediction F_sem and to perform instance segmentation and semantic segmentation respectively.

In the above technical solution, the improved PointConv feature extraction module has 9 input channels, which represent the normalized absolute coordinates xyz of the points, the rgb color information, and the relative coordinates x'y'z' of the points with respect to the local coordinate system.

The beneficial effects of the invention are as follows:

(1) The improved PointConv feature extraction module adds a context aggregation module and a gating propagation module to the instance segmentation part, enhances the instance information through weight learning, and improves the precision of joint segmentation.

(2) The improved PointConv feature extraction module adopts parallel branches for joint segmentation and obtains the semantic feature prediction F_sem and the instance feature prediction F_ins simultaneously; this serves as the baseline that improves the speed of instance segmentation and reduces its dependence on semantic segmentation accuracy.

(3) The bidirectional self-attention module adopts the STOI module and the ITOS module to fuse the semantic features and the instance features to obtain instance information endowed with the semantic features and semantic information rich in the instance features, and the mutual promotion of the two tasks is completed in a soft constraint mode.

Drawings

FIG. 1 is a flow chart of a point cloud semantic and instance joint segmentation method based on improved PointConv;

FIG. 2 is a block diagram of a context aggregation module according to the present invention;

FIG. 3 is a block diagram of a gating propagation module according to the present invention;

FIG. 4 is a block diagram of a STOI module according to the present invention;

FIG. 5 is a block diagram of an ITOS module according to the present invention.

Detailed Description

The invention will be further described with reference to the drawings and the specific embodiments, but the scope of the invention is not limited thereto.

As shown in FIG. 1, the point cloud semantic and instance joint segmentation system based on the improved PointConv comprises an improved PointConv feature extraction module and a bidirectional self-attention module. The system processes a point cloud acquired by a lidar or a depth camera: the point cloud passes through the improved PointConv feature extraction module to obtain the instance feature prediction F_ins and the semantic feature prediction F_sem, which are then fused by the bidirectional self-attention module to obtain instance information containing semantic features and semantic information containing instance features.

Table 1 is a specific network structure table of the point cloud semantic and instance joint segmentation system based on the improved PointConv.

Table 1 network structure table

With continued reference to fig. 1, the point cloud semantic and instance joint segmentation method based on the improved PointConv specifically includes the following steps:

The improved PointConv feature extraction module has 9 input channels, which represent the normalized absolute coordinates xyz of the points, the rgb color information, and the relative coordinates x'y'z' of the points with respect to a local coordinate system; the relative coordinates are introduced mainly to guarantee translation invariance of the input points. The N_a input points are convolved by PointConv_1 of the shared encoding module (every PointConv includes a BN operation) to obtain N_b points, lifting the input points into a higher-dimensional space with 64-dimensional features; the points are then convolved by PointConv_2, PointConv_3 and PointConv_4 in sequence, raising the feature dimension to 512 so that the subsequent decoding part has sufficient feature information.
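As an illustration of the 9-channel input described above, the sketch below assembles normalized absolute coordinates, colors and local relative coordinates into one array. The text does not specify how the normalization or the local coordinate system is defined, so this sketch assumes min-max normalization per block and a local frame centred on the block centroid; the function name is hypothetical.

```python
import numpy as np

def build_input_features(xyz, rgb):
    """Return an (N, 9) array: normalized xyz, rgb in [0, 1], local x'y'z'.

    Assumptions (not stated in the text): absolute coordinates are min-max
    normalized per block, and the local frame is centred on the block centroid.
    """
    xyz = np.asarray(xyz, dtype=np.float64)
    rgb = np.asarray(rgb, dtype=np.float64)
    lo = xyz.min(axis=0)
    span = xyz.max(axis=0) - lo
    span[span == 0] = 1.0                  # guard against flat axes
    xyz_norm = (xyz - lo) / span           # normalized absolute coordinates
    xyz_local = xyz - xyz.mean(axis=0)     # relative coordinates x'y'z'
    return np.concatenate([xyz_norm, rgb / 255.0, xyz_local], axis=1)

rng = np.random.default_rng(0)
pts = rng.uniform(-5.0, 5.0, size=(1024, 3))
cols = rng.integers(0, 256, size=(1024, 3))
feats = build_input_features(pts, cols)
shifted = build_input_features(pts + 10.0, cols)   # same cloud, translated
```

Under these assumptions the last three channels of `feats` and `shifted` agree, which is the translation invariance the relative coordinates are meant to provide.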

The decoding section is divided into two branches: one for semantic segmentation decoding and the other for instance segmentation decoding. The instance segmentation decoding part introduces a context aggregation module and a gating propagation module to enhance feature learning, and operates as follows.

First, using PointDeconv (each with a BN operation), the final output of the shared encoding module is deconvolved and up-sampled to N_d points. The features of the N_d points (corresponding to F_in in FIG. 2) pass through three 1x1 convolutions to obtain feature Q, feature K and feature V respectively; feature Q is matrix-multiplied with the transposed feature K, and sigmoid compression yields the weight matrix W_1; W_1 is matrix-multiplied with feature V, and the result is added element by element to feature V to obtain the final aggregated feature. The context aggregation module thus weights the features through learned weights, strengthening effective features and weakening ineffective ones.

The final aggregated feature (corresponding to F_dec in FIG. 3) and the feature output by PointConv_3 of the shared encoding module (corresponding to F_enc in FIG. 3) serve as the inputs of the gating propagation module. F_dec and F_enc are concatenated along the channel dimension to obtain F_con (512-dimensional features), which passes through a 1x1 convolution and sigmoid compression to obtain an N_d × 1 weight matrix W_2. W_2 is tiled 256 times along the feature dimension and multiplied element by element with F_enc to obtain F_enc'; the weight matrix 1 - W_2 is tiled 256 times along the feature dimension and multiplied element by element with F_dec to obtain F_dec'; F_enc' and F_dec' are then concatenated along the channel dimension and output as the final result. The gating propagation module thus selects the effective features from the two inputs through learned weights, reducing the flow of irrelevant information.

Finally, the two features F_dec and F_enc are fused, completing the first decoding step and yielding N_c points with a feature dimension of 256. In a similar manner, the points are up-sampled to N_b points with a feature dimension of 128; PointDeconv then up-samples the points to the input point count N_a while keeping the feature dimension at 128, giving the instance feature prediction F_ins. The semantic segmentation decoding part up-samples step by step by deconvolution until the point count reaches the input point count N_a (feature dimension 128), giving the semantic feature prediction F_sem.
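The gating propagation step can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the 1x1 convolution that produces the gate is modelled as a single linear map from 512 channels to 1, the names `gated_propagation`, `w_gate` and `b_gate` are hypothetical, and the sketch stops at the concatenated output without modelling the later reduction to 256 dimensions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_propagation(f_dec, f_enc, w_gate, b_gate=0.0):
    """Sketch of the gating propagation module.

    f_dec, f_enc: (N_d, 256) decoder and encoder features.
    w_gate: (512, 1) weights standing in for the 1x1 convolution.
    Returns the channel-wise concatenation of the gated features, (N_d, 512).
    """
    f_con = np.concatenate([f_dec, f_enc], axis=1)        # F_con, (N_d, 512)
    w2 = sigmoid(f_con @ w_gate + b_gate)                 # W_2, (N_d, 1)
    f_enc_g = np.tile(w2, (1, f_enc.shape[1])) * f_enc    # W_2 tiled 256 times
    f_dec_g = np.tile(1.0 - w2, (1, f_dec.shape[1])) * f_dec
    return np.concatenate([f_enc_g, f_dec_g], axis=1)

rng = np.random.default_rng(0)
n_d = 32                                                  # illustrative size
f_dec = rng.standard_normal((n_d, 256))
f_enc = rng.standard_normal((n_d, 256))
w_gate = rng.standard_normal((512, 1)) * 0.05
fused = gated_propagation(f_dec, f_enc, w_gate)
```

Since the two gates sum to one for every point, a point whose gate is near 1 passes mostly encoder features and a point whose gate is near 0 passes mostly decoder features.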

The bidirectional self-attention module fuses the semantic feature prediction F_sem and the instance feature prediction F_ins obtained by the improved PointConv feature extraction module to obtain instance information containing semantic features and semantic information containing instance features, as follows:

Instance segmentation: the semantic feature prediction F_sem and the instance feature prediction F_ins are input to the STOI module (see FIG. 4). F_sem passes through two 1x1 convolutions in parallel, the two outputs are multiplied after one is transposed, and sigmoid compression yields a weight matrix. The weight matrix is multiplied with the instance feature prediction F_ins and the product is concatenated with F_ins to obtain the instance feature F_stoi carrying semantic information. After buffering by two fully connected layers (Fc1, Fc2), an instance embedding F'_stoi of size N_a × N_e is obtained; after repeated back-propagation optimization, a single mean-shift clustering operation on this embedding completes the instance segmentation.

Semantic segmentation: the instance segmentation part is buffered by Fc1 to obtain instance feature information F'_ins of size N_a × 128, which is input to the ITOS module together with the initial semantic feature prediction F_sem (see FIG. 5). F'_ins passes through two 1x1 convolutions in parallel, the two outputs are multiplied after one is transposed, and sigmoid compression yields a weight matrix. The weight matrix is multiplied with the semantic feature prediction F_sem and the product is concatenated with F_sem to obtain the semantic feature F_itos carrying instance information. A fully connected layer (Fc) then produces an N_a × N_c output; after repeated back-propagation optimization, a single argmax completes the semantic segmentation.
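The STOI and ITOS modules follow the same pattern with the roles of the two branches swapped, so both can be sketched with one helper. The code below is an illustrative NumPy sketch, not the patented implementation: the helper name `cross_task_fusion` is hypothetical, 1x1 convolutions and fully connected layers are modelled as plain matrix products with random weights, the multiplication of the N_a × N_a weight matrix with the target features is assumed to be a matrix product, and the embedding size 5 and class count 13 are placeholder values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_task_fusion(source, target, w_a, w_b):
    """Weight `target` by a point-to-point matrix learned from `source`."""
    a = source @ w_a                         # first 1x1 convolution
    b = source @ w_b                         # second 1x1 convolution
    weights = sigmoid(a @ b.T)               # (N_a, N_a) weight matrix
    return np.concatenate([weights @ target, target], axis=1)

rng = np.random.default_rng(0)
n_a, c, n_e, n_classes = 64, 128, 5, 13      # n_e, n_classes are placeholders

def init(rows, cols):
    return rng.standard_normal((rows, cols)) * 0.05

f_sem = rng.standard_normal((n_a, c))
f_ins = rng.standard_normal((n_a, c))

# STOI: semantic features steer the instance branch.
f_stoi = cross_task_fusion(f_sem, f_ins, init(c, c), init(c, c))  # (N_a, 256)
f_ins_buf = f_stoi @ init(2 * c, c)          # Fc1 output F'_ins, (N_a, 128)
embedding = f_ins_buf @ init(c, n_e)         # Fc2 output F'_stoi, (N_a, N_e)

# ITOS: buffered instance features steer the semantic branch.
f_itos = cross_task_fusion(f_ins_buf, f_sem, init(c, c), init(c, c))
logits = f_itos @ init(2 * c, n_classes)     # Fc output, (N_a, N_c)
labels = logits.argmax(axis=1)               # semantic class per point
```

In this sketch `embedding` is what a clustering step would consume for instance segmentation, and `labels` is the per-point semantic class obtained by argmax.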

When training an algorithm of a point cloud semantic and instance joint segmentation system based on improved PointConv, the adopted loss function consists of two parts, wherein one part is the loss of a semantic segmentation part, and the other part is the loss of an instance segmentation part; and the two parts are optimized simultaneously to finish the training task.

The loss function expression is as follows:

$L = L_{sem} + L_{ins}$

where L_sem is the loss function for semantic segmentation and L_ins is the loss function for instance segmentation.

L_sem uses the classical cross-entropy loss function, expressed as:

$L_{sem} = -\sum_{i=1}^{n} p(x_i) \log q(x_i)$

where p(x) is the true probability distribution (determined from the input labels of the training dataset), n is the number of categories, and q(x) is the predicted probability distribution; the smaller the difference between the two probability distributions, the better the prediction and the better this part is optimized.
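As a small worked illustration of the cross-entropy loss for a single point, the sketch below compares a confident correct prediction with an uncertain one. The function name, the clamping constant `eps` and the example distributions are illustrative assumptions.

```python
import math

def cross_entropy(p, q, eps=1e-12):
    """Cross entropy between true distribution p and prediction q.

    Computes -sum_i p_i * log(q_i) over the n categories; q_i is clamped
    to eps to avoid log(0).
    """
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q))

p = [0.0, 1.0, 0.0]            # one-hot ground-truth label, n = 3 categories
good = [0.05, 0.90, 0.05]      # confident, correct prediction
poor = [0.40, 0.30, 0.30]      # uncertain prediction

loss_good = cross_entropy(p, good)
loss_poor = cross_entropy(p, poor)
```

With a one-hot label only the true category contributes, so the loss reduces to the negative log of the probability predicted for that category, and `loss_good` is smaller than `loss_poor`.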

L_ins uses the discriminative loss function, expressed as:

$L_{ins} = L_{var} + L_{dist} + \alpha \cdot L_{reg}$

where I is the number of ground-truth instances; N_i is the number of points in instance i; μ_i is the mean embedding of instance i; μ_{i_A} and μ_{i_B} are the mean embeddings of instances i_A and i_B; e_j is the embedding of a point; δ_d and δ_v are the loss function thresholds; and α is a balance coefficient, set to 0.001.

L_var pulls the embedding of each point toward the center of its instance, so that points belonging to the same instance lie close together in the feature space; L_dist makes points of different instances repel each other, pushing the instances apart; L_reg keeps the feature embeddings bounded by drawing the instance centers toward the origin of the local coordinate system.
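The component formulas for L_var, L_dist and L_reg are not reproduced in this text. The sketch below therefore assumes the commonly published form of the discriminative loss: L_var averages the squared hinge [||μ_i - e_j|| - δ_v]_+ over the points of each instance and then over instances, L_dist averages the squared hinge [2δ_d - ||μ_{i_A} - μ_{i_B}||]_+ over pairs of distinct instances, and L_reg averages ||μ_i||. The function name and the threshold defaults δ_v = 0.5 and δ_d = 1.5 are illustrative assumptions; only α = 0.001 comes from the text.

```python
import numpy as np

def discriminative_loss(emb, inst, delta_v=0.5, delta_d=1.5, alpha=0.001):
    """emb: (N, E) point embeddings; inst: (N,) ground-truth instance ids."""
    emb = np.asarray(emb, dtype=np.float64)
    inst = np.asarray(inst)
    ids = np.unique(inst)
    mus = np.stack([emb[inst == i].mean(axis=0) for i in ids])   # (I, E)

    # L_var: pull points to within delta_v of their instance centre.
    per_instance = []
    for i, mu in zip(ids, mus):
        d = np.linalg.norm(emb[inst == i] - mu, axis=1)
        per_instance.append(np.mean(np.maximum(d - delta_v, 0.0) ** 2))
    l_var = float(np.mean(per_instance))

    # L_dist: push distinct centres at least 2 * delta_d apart.
    if len(ids) > 1:
        gaps = np.linalg.norm(mus[:, None, :] - mus[None, :, :], axis=2)
        off_diag = ~np.eye(len(ids), dtype=bool)
        l_dist = float(np.mean(np.maximum(2.0 * delta_d - gaps[off_diag], 0.0) ** 2))
    else:
        l_dist = 0.0

    # L_reg: keep centres near the origin so embeddings stay bounded.
    l_reg = float(np.mean(np.linalg.norm(mus, axis=1)))
    return l_var + l_dist + alpha * l_reg

emb = np.array([[0.0, 0.0], [0.0, 0.2], [4.0, 0.0], [4.0, 0.2]])
tight = discriminative_loss(emb, [0, 0, 1, 1])   # compact, well separated
mixed = discriminative_loss(emb, [0, 1, 0, 1])   # instances interleaved
```

In this toy case the well-separated labelling incurs only the small regularization term, while the interleaved labelling is penalized by both the variance and the distance terms.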

Finally, when testing the algorithm of the point cloud semantic and instance joint segmentation system based on the improved PointConv, the instance embeddings generated by the instance segmentation part are clustered with the mean-shift method to obtain the final instance result, and an argmax operation on the semantic features generated by the semantic segmentation part gives the final semantic classification. This completes the algorithmic operation of the whole point cloud semantic and instance joint segmentation system based on the improved PointConv.
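The test-time post-processing can be sketched as below. This is a deliberately small flat-kernel mean-shift written in NumPy for illustration; a practical system would more likely use a library implementation such as scikit-learn's `MeanShift`. The bandwidth, the toy embeddings and the class count of 13 are assumptions, not values from the text.

```python
import numpy as np

def mean_shift(emb, bandwidth=1.0, n_iter=50, tol=1e-4):
    """Tiny flat-kernel mean-shift; returns one integer label per point."""
    emb = np.asarray(emb, dtype=np.float64)
    modes = emb.copy()
    for _ in range(n_iter):
        dist = np.linalg.norm(modes[:, None, :] - emb[None, :, :], axis=2)
        inside = (dist <= bandwidth).astype(np.float64)   # neighbours per mode
        shifted = (inside @ emb) / inside.sum(axis=1, keepdims=True)
        converged = np.abs(shifted - modes).max() < tol
        modes = shifted
        if converged:
            break
    # Merge modes that ended within one bandwidth of an existing centre.
    centres = []
    labels = np.empty(len(emb), dtype=int)
    for idx, mode in enumerate(modes):
        for cid, centre in enumerate(centres):
            if np.linalg.norm(mode - centre) <= bandwidth:
                labels[idx] = cid
                break
        else:
            centres.append(mode)
            labels[idx] = len(centres) - 1
    return labels

rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.1, size=(20, 2))   # embeddings, instance A
blob_b = rng.normal(loc=5.0, scale=0.1, size=(20, 2))   # embeddings, instance B
inst_labels = mean_shift(np.vstack([blob_a, blob_b]), bandwidth=1.0)

logits = rng.standard_normal((40, 13))                  # stand-in semantic output
sem_labels = logits.argmax(axis=1)                      # semantic class per point
```

Each point ends up with an instance label from the clustering and a semantic label from the argmax, which together form the joint segmentation result.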

The point cloud semantic and instance joint segmentation system based on the improved PointConv can be implemented in the form of a computer program, which can be run on a computer device, which can be a server or a terminal. The servers can be independent servers or server clusters; the terminal can be a notebook computer, a desktop computer, or other electronic devices.

The computer device includes a processor, a memory, and a network interface connected by a system bus, wherein the memory may include a non-volatile storage medium and an internal memory; the non-volatile storage medium may store an operating system and a computer program. The computer program comprises program instructions that, when executed, cause the processor to perform any of a number of point cloud semantic and instance joint segmentation methods based on improved PointConv. The processor is used to provide computing and control capabilities to support the operation of the entire computer device. The memory provides an environment for the execution of a computer program in a non-volatile storage medium that, when executed by the processor, causes the processor to perform any of a variety of point cloud semantic and instance joint segmentation methods based on the modified PointConv. The network interface is used for network communication such as transmitting assigned tasks and the like.

It should be appreciated that the processor may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. Wherein the general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

Embodiments of the present application further provide a computer readable storage medium, where the computer readable storage medium stores a computer program, where the computer program includes program instructions, and the processor executes the program instructions to implement the point cloud semantic and instance joint segmentation method based on improved PointConv.

The computer readable storage medium may be an internal storage unit of the computer device according to the foregoing embodiment, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, which are provided on the computer device.

The examples are preferred embodiments of the present invention, but the present invention is not limited to the above-described embodiments, and any obvious modifications, substitutions or variations that can be made by one skilled in the art without departing from the spirit of the present invention are within the scope of the present invention.

Claims

1. A point cloud semantic and instance joint segmentation method based on improved PointConv is characterized in that:

the bidirectional self-attention module performs feature fusion on the semantic feature prediction F_sem and the instance feature prediction F_ins obtained by the improved PointConv feature extraction module, and instance segmentation and semantic segmentation are performed respectively to obtain instance information containing semantic features and semantic information containing instance features;

the semantic segmentation decoding is specifically: the points are up-sampled step by step by deconvolution until the point count reaches the input point count N_a, with a feature dimension of 128, giving the semantic feature prediction F_sem;

the instance segmentation decoding is specifically:

using PointDeconv, the final output of the shared encoding module is deconvolved and up-sampled to N_d points; the features of the N_d points are input to the context aggregation module, where three 1x1 convolutions produce feature Q, feature K and feature V respectively; feature Q is matrix-multiplied with the transposed feature K and the result is compressed by a sigmoid to obtain the weight matrix W_1; W_1 is matrix-multiplied with feature V, and the result is added element by element to feature V to obtain the final aggregated feature;

the final aggregated feature F_dec and the feature F_enc output by PointConv_3 of the shared encoding module serve as the inputs of the gating propagation module; F_dec and F_enc are concatenated along the channel dimension to obtain F_con, which then passes through a 1x1 convolution and sigmoid compression to obtain an N_d × 1 weight matrix W_2; W_2 is tiled 256 times along the feature dimension and multiplied element by element with F_enc to obtain F_enc'; the weight matrix 1 - W_2 is tiled 256 times along the feature dimension and multiplied element by element with F_dec to obtain F_dec'; F_enc' and F_dec' are then concatenated along the channel dimension and output as the final result;

the two features F_dec and F_enc are thereby fused, completing the first decoding step and yielding N_c points with a feature dimension of 256; a similar operation up-samples the points to N_b points with a feature dimension of 128; finally, PointDeconv up-samples the points to the input point count N_a while keeping the feature dimension at 128, giving the instance feature prediction F_ins;

the instance segmentation is specifically: the semantic feature prediction F_sem and the instance feature prediction F_ins are input to the STOI module; F_sem passes through two 1x1 convolutions in parallel, the two outputs are multiplied after one is transposed, and sigmoid compression yields a weight matrix; the weight matrix is multiplied with F_ins and the product is concatenated with F_ins to obtain the instance feature F_stoi carrying semantic information; after buffering by two fully connected layers, an instance embedding F'_stoi of size N_a × N_e is obtained; after repeated back-propagation optimization, a single clustering operation completes the instance segmentation;

the semantic segmentation is specifically: the instance segmentation part is buffered by the fully connected layer Fc1 to obtain instance feature information F'_ins of size N_a × 128; F'_ins and F_sem are input to the ITOS module; F'_ins passes through two 1x1 convolutions in parallel, the two outputs are multiplied after one is transposed, and sigmoid compression yields a weight matrix; the weight matrix is multiplied with F_sem and the product is concatenated with F_sem to obtain the semantic feature F_itos carrying instance information; a fully connected layer then produces an N_a × N_c output, and after repeated back-propagation optimization a single argmax completes the semantic segmentation.

2. The point cloud semantic and instance joint segmentation method based on the improved PointConv as claimed in claim 1, wherein the point cloud input to the improved PointConv feature extraction module comprises xyz normalized absolute coordinates of points, rgb color information, and relative coordinates x ' y ' z ' of points with respect to a local coordinate system.

3. A segmentation system implementing the improved PointConv-based point cloud semantic and instance joint segmentation method according to any one of claims 1-2, comprising:

a bidirectional self-attention module, configured to fuse the instance feature prediction F_ins and the semantic feature prediction F_sem and to perform instance segmentation and semantic segmentation respectively.

4. A segmentation system according to claim 3, wherein the input channel of the modified PointConv feature extraction module is 9, representing xyz normalized absolute coordinates of the points, rgb color information, and relative coordinates x ' y ' z ' of the points with respect to the local coordinate system, respectively.