CN113421267A

CN113421267A - Point cloud semantic and instance joint segmentation method and system based on improved PointConv

Info

Publication number: CN113421267A
Application number: CN202110495434.3A
Authority: CN
Inventors: 顾寄南; 张文浩
Original assignee: Jiangsu University
Current assignee: Jiangsu University
Priority date: 2021-05-07
Filing date: 2021-05-07
Publication date: 2021-09-21
Anticipated expiration: 2041-05-07
Also published as: CN113421267B

Abstract

The invention provides a point cloud semantic and instance joint segmentation method and system based on improved PointConv. A point cloud acquired by a LiDAR or a depth camera is used as the input of an improved PointConv feature extraction module; the points output by a shared encoding module undergo semantic segmentation decoding and instance segmentation decoding simultaneously, yielding an instance feature prediction and a semantic feature prediction. A double-line self-attention module fuses the semantic feature prediction and the instance feature prediction obtained by the improved PointConv feature extraction module, and instance segmentation and semantic segmentation are then performed separately to obtain instance information containing semantic features and semantic information containing instance features. The method and system increase the speed of instance segmentation and reduce its dependency on semantic segmentation accuracy.

Description

Point cloud semantic and instance joint segmentation method and system based on improved PointConv

Technical Field

The invention belongs to the technical field of point cloud segmentation, and particularly relates to a point cloud semantic and instance joint segmentation method and system based on improved PointConv.

Background

Neural networks have strong feature learning capability in image feature extraction, which has brought significant breakthroughs to image semantic and instance segmentation in computer vision. Since PointNet, end-to-end point cloud segmentation algorithms have developed rapidly, but the following defects remain. First, when a KNN or radius-NN search is performed, the order of the searched neighbor points varies, so the point cloud is unordered; most methods extract features with MLPs and max pooling, and the extracted point features cannot capture the local geometric shape of the point cloud or the interactions between points. Second, point cloud algorithms usually apply farthest point sampling as the first step, which is a non-uniform sampling: points gather densely in some local regions while points in other regions disappear, weakening feature learning. Third, most networks combine the semantic segmentation and instance segmentation tasks serially, which leads to suboptimal performance, low efficiency, and excessive dependency between the two tasks.

Disclosure of Invention

To address these defects of the prior art, the invention provides a point cloud semantic and instance joint segmentation method and system based on improved PointConv, which increases the speed of instance segmentation and reduces its dependency on semantic segmentation accuracy.

The present invention achieves the above-described object by the following technical means.

A point cloud semantic and instance joint segmentation method based on improved PointConv comprises the following steps:

inputting the acquired point cloud into an improved PointConv feature extraction module, wherein a shared encoding module produces points with a feature dimension of 512, and these points undergo semantic segmentation decoding and instance segmentation decoding simultaneously to obtain an instance feature prediction F_ins and a semantic feature prediction F_sem; the instance segmentation decoding part introduces a context aggregation module and a gated propagation module to enhance feature learning;

the double-line self-attention module fuses the semantic feature prediction F_sem and the instance feature prediction F_ins obtained by the improved PointConv feature extraction module, and instance segmentation and semantic segmentation are then performed separately to obtain instance information containing semantic features and semantic information containing instance features.

In a further technical solution, the point cloud input into the improved PointConv feature extraction module comprises the normalized absolute coordinates xyz of each point, its rgb color information, and its relative coordinates x'y'z' with respect to a local coordinate system.
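The 9-channel input can be assembled as in the minimal sketch below. It relies on assumptions the patent does not state: absolute xyz are min-max normalized per axis to [0, 1] over the supplied points, rgb arrives as 8-bit values (0..255), and the local coordinate system is taken to be centred on the centroid of the supplied points. The function name `build_input_features` is hypothetical.

```python
import numpy as np

def build_input_features(xyz, rgb):
    """Return an N x 9 array: normalized xyz, rgb in [0, 1], relative x'y'z'."""
    xyz = np.asarray(xyz, dtype=np.float64)
    rgb = np.asarray(rgb, dtype=np.float64)
    mins, maxs = xyz.min(axis=0), xyz.max(axis=0)
    extent = np.where(maxs > mins, maxs - mins, 1.0)   # guard against flat axes
    xyz_norm = (xyz - mins) / extent                   # normalized absolute coordinates
    rgb_norm = rgb / 255.0                             # color scaled to [0, 1] (8-bit assumed)
    xyz_rel = xyz - xyz.mean(axis=0)                   # coordinates relative to local centroid
    return np.concatenate([xyz_norm, rgb_norm, xyz_rel], axis=1)
```

Because both the normalized and the relative coordinates subtract a statistic of the point set, translating the whole cloud leaves these features unchanged, which is the translation invariance the description attributes to the relative coordinates.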

In a further technical solution, the instance segmentation decoding specifically comprises:

the final output of the shared encoding module is upsampled to N_d points by deconvolution with PointDeconv; the features of the N_d points are input into the context aggregation module, where three 1x1 convolutions produce features Q, K and V; Q is matrix-multiplied with the transpose of K and compressed by a sigmoid to obtain a weight matrix W₁; W₁ is matrix-multiplied with V, and the result is added element-wise to V to obtain the final aggregated feature;

the final aggregated feature F_dec and the feature F_enc output by PointConv_3 of the shared encoding module are used as inputs to the gated propagation module; F_dec and F_enc are concatenated along the channel dimension to obtain F_con, and a 1x1 convolution followed by sigmoid compression gives an N_d×1 weight matrix W₂; W₂ is tiled 256 times along the feature dimension and multiplied element-wise with F_enc to obtain F_enc'; the weight matrix 1-W₂ is tiled 256 times along the feature dimension and multiplied element-wise with F_dec to obtain F_dec'; F_enc' and F_dec' are then concatenated along the channel dimension and output as the final result;

the two features F_dec and F_enc are fused in this way to complete the first decoding step, yielding N_c points with 256-dimensional features; this operation is repeated twice in a similar manner, upsampling the points to N_b with a feature dimension of 128; finally, one PointDeconv upsamples the points to the N_a input points while keeping the feature dimension at 128, giving the instance feature prediction F_ins.
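The context aggregation and gated propagation steps above can be sketched with NumPy on single (unbatched) feature matrices. This is an illustration under stated assumptions, not the patented implementation: each 1x1 convolution is modelled as a per-point matrix product with a hypothetical weight matrix (biases omitted, except the optional gate bias `b_g`), batch normalization is left out, and `context_aggregation` and `gated_propagation` are illustrative names.

```python
import numpy as np

def sigmoid(x):
    # tanh form of the logistic function; avoids overflow in np.exp
    return 0.5 * (1.0 + np.tanh(0.5 * x))

def context_aggregation(f_in, w_q, w_k, w_v):
    """Context aggregation sketch on N_d x C point features (Fin in FIG. 2)."""
    q = f_in @ w_q                      # 1x1 conv -> Q, N_d x C_qk
    k = f_in @ w_k                      # 1x1 conv -> K, N_d x C_qk
    v = f_in @ w_v                      # 1x1 conv -> V, N_d x C_v
    w1 = sigmoid(q @ k.T)               # N_d x N_d weight matrix W1
    return w1 @ v + v                   # weighted context, then element-wise add V

def gated_propagation(f_dec, f_enc, w_g, b_g=0.0):
    """Gated propagation sketch; F_dec and F_enc are both N_d x C."""
    f_con = np.concatenate([f_dec, f_enc], axis=1)   # N_d x 2C (512 when C = 256)
    w2 = sigmoid(f_con @ w_g + b_g)                  # N_d x 1 weight matrix W2
    gate = np.tile(w2, (1, f_enc.shape[1]))          # tile W2 C times (256 in the text)
    f_enc_p = gate * f_enc                           # F_enc'
    f_dec_p = (1.0 - gate) * f_dec                   # F_dec'
    return np.concatenate([f_enc_p, f_dec_p], axis=1)
```

In the decoder, the output of `context_aggregation` would play the role of F_dec and the PointConv_3 features that of F_enc; with 256 channels each, the tiled gate matches the "tiled 256 times" step and the concatenated output has 512 channels. A gate near 1 passes mostly encoder features, a gate near 0 mostly decoder features.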

In a further technical solution, the semantic segmentation decoding specifically comprises: upsampling step by step by deconvolution until the number of points is restored to the N_a input points with a feature dimension of 128, giving the semantic feature prediction F_sem.
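The text does not spell out how PointDeconv upsamples points. In the original PointConv work, deconvolution consists of an interpolation step followed by a PointConv layer; the sketch below shows only a k-nearest-neighbour inverse-distance-weighted interpolation of that kind, assuming k = 3 and Euclidean distances. The subsequent PointConv layer and the BN operation are omitted, and `idw_interpolate` is a hypothetical name.

```python
import numpy as np

def idw_interpolate(src_xyz, src_feat, dst_xyz, k=3, eps=1e-8):
    """Propagate features from a sparse point set (S points) to a denser one (M points).

    Each target point takes an inverse-distance-weighted average of the
    features of its k nearest source points.
    """
    d = np.linalg.norm(dst_xyz[:, None, :] - src_xyz[None, :, :], axis=2)  # M x S
    k = min(k, src_xyz.shape[0])
    nn = np.argsort(d, axis=1)[:, :k]                 # indices of the k nearest sources
    nn_d = np.take_along_axis(d, nn, axis=1)          # their distances, M x k
    w = 1.0 / (nn_d + eps)
    w /= w.sum(axis=1, keepdims=True)                 # normalized weights per target
    return (w[:, :, None] * src_feat[nn]).sum(axis=1) # M x C interpolated features
```

A target point midway between two sources receives the average of their features, while a target that coincides with a source essentially copies that source's feature.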

In a further technical scheme, the example segmentation specifically comprises:

first, the semantic feature prediction F_sem and the instance feature prediction F_ins are input into the STOI module; F_sem is passed through two separate 1x1 convolutions, one result is multiplied with the transpose of the other, and sigmoid compression gives a weight matrix; the weight matrix is multiplied with F_ins and the product is concatenated with F_ins to obtain instance features F_stoi carrying semantic information; after buffering by two fully connected layers, an N_a×N_e instance embedding F'_stoi is obtained, and after repeated back-propagation optimization, a single clustering operation completes the instance segmentation.

In a further technical scheme, the semantic segmentation specifically comprises the following steps:

the instance branch is buffered by a fully connected layer Fc1 to obtain N_a×128 instance feature information F'_ins; F'_ins and F_sem are input into the ITOS module, where F'_ins is passed through two separate 1x1 convolutions, one result is multiplied with the transpose of the other, and sigmoid compression gives a weight matrix; the weight matrix is multiplied with F_sem and the product is concatenated with F_sem to obtain semantic features F_itos carrying instance information; a fully connected layer then yields an N_a×N_c output, and after repeated back-propagation optimization, a single argmax completes the semantic segmentation.
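The STOI and ITOS steps share one pattern, in which a guide feature re-weights a target feature through a point-to-point attention matrix, so the sketch below writes that pattern once and wires it up as described: STOI uses F_sem as guide and F_ins as target, and ITOS uses F'_ins as guide and F_sem as target. Assumptions not taken from the patent: 1x1 convolutions and fully connected layers are plain matrix products without biases, a ReLU follows Fc1, and the parameter dictionary `p`, its keys and all weight shapes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    # tanh form of the logistic function; avoids overflow in np.exp
    return 0.5 * (1.0 + np.tanh(0.5 * x))

def cross_branch_attention(guide, target, w_a, w_b):
    """Guide features re-weight target features (STOI/ITOS pattern)."""
    a = guide @ w_a                           # first 1x1 conv on the guide
    b = guide @ w_b                           # second 1x1 conv on the guide
    weights = sigmoid(a @ b.T)                # N_a x N_a point-to-point weights
    return np.concatenate([weights @ target, target], axis=1)  # splice with target

def joint_head(f_sem, f_ins, p):
    """Fuse F_sem and F_ins; p is a hypothetical dict of weight matrices."""
    f_stoi = cross_branch_attention(f_sem, f_ins, p["stoi_a"], p["stoi_b"])
    f_ins_prime = np.maximum(f_stoi @ p["fc1"], 0.0)   # Fc1 -> F'_ins (ReLU assumed)
    embedding = f_ins_prime @ p["fc2"]                 # Fc2 -> N_a x N_e embedding F'_stoi
    f_itos = cross_branch_attention(f_ins_prime, f_sem, p["itos_a"], p["itos_b"])
    scores = f_itos @ p["fc"]                          # Fc -> N_a x N_c class scores
    return embedding, scores.argmax(axis=1)            # embedding and per-point class
```

At inference the returned embedding would go to the clustering step, and the argmax over the class scores gives each point's semantic label.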

A point cloud semantic and instance joint segmentation system based on improved PointConv comprises:

an improved PointConv feature extraction module, configured to obtain the instance feature prediction F_ins and the semantic feature prediction F_sem;

a double-line self-attention module, configured to fuse the instance feature prediction F_ins and the semantic feature prediction F_sem and to perform instance segmentation and semantic segmentation respectively.

In the above technical solution, the improved PointConv feature extraction module has 9 input channels, representing the normalized absolute coordinates xyz of a point, its rgb color information, and its relative coordinates x'y'z' with respect to the local coordinate system.

The invention has the beneficial effects that:

(1) In the improved PointConv feature extraction module, a context aggregation module and a gated propagation module are added to the instance segmentation branch; instance information is enhanced through weight learning, which improves the accuracy of joint segmentation.

(2) The improved PointConv feature extraction module adopts parallel joint segmentation branches that produce the semantic feature prediction F_sem and the instance feature prediction F_ins simultaneously; using this as the baseline increases the speed of instance segmentation and reduces its dependency on semantic segmentation accuracy.

(3) The double-line self-attention module uses an STOI module and an ITOS module to fuse semantic features and instance features, obtaining instance information endowed with semantic features and semantic information rich in instance features; the two tasks thus promote each other through soft constraints.

Drawings

FIG. 1 is a flow chart of a point cloud semantic and instance joint segmentation method based on improved PointConv according to the present invention;

FIG. 2 is a block diagram of a context aggregation module according to the present invention;

FIG. 3 is a block diagram of a gated propagation module according to the present invention;

FIG. 4 is a block diagram of the STOI module of the present invention;

FIG. 5 is a block diagram of the ITOS module of the present invention.

Detailed Description

The invention will be further described with reference to the following figures and specific examples, but the scope of the invention is not limited thereto.

As shown in FIG. 1, the point cloud semantic and instance joint segmentation system based on improved PointConv comprises an improved PointConv feature extraction module and a double-line self-attention module. The system processes point clouds acquired by a LiDAR or a depth camera as follows: the improved PointConv feature extraction module produces an instance feature prediction F_ins and a semantic feature prediction F_sem, which the double-line self-attention module then fuses to obtain instance information containing semantic features and semantic information containing instance features.

Table 1 is a specific network structure table of the point cloud semantic and instance joint segmentation system based on the improved PointConv.

Table 1 network structure table

With continued reference to fig. 1, the point cloud semantic and instance joint segmentation method based on the improved PointConv specifically includes the following steps:

the improved PointConv feature extraction module has 9 input channels, representing the normalized absolute coordinates xyz of a point, its rgb color information, and its relative coordinates x'y'z' with respect to the local coordinate system; the relative coordinates in the local coordinate system are introduced mainly to ensure translation invariance of the input points. The N_a input points are convolved by PointConv_1 of the shared encoding module (every PointConv layer includes a BN operation) to obtain N_b points, lifting the input into a high-dimensional space with 64-dimensional features; the points then pass through PointConv_2, PointConv_3 and PointConv_4 in sequence, which raise the feature dimension to 512 so that enough feature information is available for the subsequent decoding part.

The decoding part is divided into two branches: one for semantic segmentation decoding and the other for instance segmentation decoding. The instance segmentation decoding part introduces a context aggregation module and a gated propagation module to enhance feature learning, as follows. The final output of the shared encoding module is upsampled to N_d points by deconvolution with PointDeconv (every PointDeconv layer includes a BN operation). The features of the N_d points (Fin in FIG. 2) are input into the context aggregation module, where three 1x1 convolutions produce features Q, K and V; Q is matrix-multiplied with the transpose of K and compressed by a sigmoid to obtain a weight matrix W₁; W₁ is matrix-multiplied with V, and the result is added element-wise to V to obtain the final aggregated feature. By weighting the features through learned weights, the context aggregation module strengthens effective features and weakens ineffective ones. Next, the final aggregated feature (F_dec in FIG. 3) and the feature output by PointConv_3 of the shared encoding module (F_enc in FIG. 3) are input into the gated propagation module: F_dec and F_enc are concatenated along the channel dimension to obtain F_con (a 512-dimensional feature), and a 1x1 convolution followed by sigmoid compression gives an N_d×1 weight matrix W₂; W₂ is tiled 256 times along the feature dimension and multiplied element-wise with F_enc to obtain F_enc'; the weight matrix 1-W₂ is tiled 256 times along the feature dimension and multiplied element-wise with F_dec to obtain F_dec'; F_enc' and F_dec' are then concatenated along the channel dimension and output as the final result. Through learned weights, the gated propagation module selects the effective features from the two inputs and reduces the flow of irrelevant information. Fusing F_dec and F_enc in this way completes the first decoding step and yields N_c points with 256-dimensional features. The operation is repeated twice in a similar manner, upsampling the points to N_b with a feature dimension of 128; finally, one PointDeconv upsamples the points to the N_a input points while keeping the feature dimension at 128, giving the instance feature prediction F_ins. The semantic segmentation decoding part upsamples step by step by deconvolution until the number of points is restored to the N_a input points (feature dimension 128), giving the semantic feature prediction F_sem.

The double-line self-attention module fuses the semantic feature prediction F_sem and the instance feature prediction F_ins obtained by the improved PointConv feature extraction module to obtain instance information containing semantic features and semantic information containing instance features. Specifically:

Instance segmentation part: first, the semantic feature prediction F_sem and the instance feature prediction F_ins are input into the STOI module (see FIG. 4). F_sem is passed through two separate 1x1 convolutions, one result is multiplied with the transpose of the other, and sigmoid compression gives a weight matrix; this weight matrix is multiplied with the instance feature prediction F_ins, and the product is concatenated with F_ins to obtain instance features F_stoi carrying semantic information. After buffering by two fully connected layers (Fc1, Fc2), an N_a×N_e instance embedding F'_stoi is obtained; after repeated back-propagation optimization, a single mean-shift clustering operation on this embedding completes the instance segmentation.

Semantic segmentation part: the N_a×128 instance feature information F'_ins obtained after buffering the instance branch with Fc1, together with the initial semantic feature prediction F_sem, is input into the ITOS module (see FIG. 5). F'_ins is passed through two separate 1x1 convolutions, one result is multiplied with the transpose of the other, and sigmoid compression gives a weight matrix; this weight matrix is multiplied with the semantic feature prediction F_sem, and the product is concatenated with F_sem to obtain semantic features F_itos carrying instance information. A fully connected layer (Fc) then produces an N_a×N_c output; after repeated back-propagation optimization, a single argmax completes the semantic segmentation.

When the algorithm of the improved PointConv-based point cloud semantic and instance joint segmentation system is trained, the loss function consists of two parts: the loss of the semantic segmentation part and the loss of the instance segmentation part. Both parts are optimized simultaneously to complete training.

The loss function is expressed as follows:

$$L = L_{sem} + L_{ins}$$

where L_sem is the loss function of the semantic segmentation part and L_ins is the loss function of the instance segmentation part;

L_sem uses the classical cross-entropy loss function, expressed as follows:

$$L_{sem} = -\sum_{x=1}^{n} p(x)\log q(x)$$

where p(x) is the true probability distribution (determined by the labels of the training data set), n is the number of categories, and q(x) is the predicted probability distribution; the smaller the difference between the two distributions, the better the prediction and the better this part is optimized.
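For a batch of points, the cross-entropy above can be computed as in the sketch below. Averaging the per-point loss over all points, using a softmax to turn class scores into q(x), and clipping q(x) away from zero are assumptions of this sketch rather than details given in the patent.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax turning class scores into a predicted distribution q."""
    z = scores - scores.max(axis=-1, keepdims=True)    # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(p, q, eps=1e-12):
    """Mean over points of -sum_x p(x) log q(x); rows of p and q are distributions."""
    p = np.asarray(p, dtype=np.float64)
    q = np.clip(np.asarray(q, dtype=np.float64), eps, 1.0)  # avoid log(0)
    return float(-(p * np.log(q)).sum(axis=-1).mean())
```

With one-hot labels as p(x), only the predicted probability of the true class contributes, so the loss reaches zero exactly when that probability is 1.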

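L_ins uses the discriminative loss, expressed as follows:

$$L_{ins} = L_{var} + L_{dist} + \alpha \cdot L_{reg}$$

$$L_{var} = \frac{1}{I}\sum_{i=1}^{I}\frac{1}{N_i}\sum_{j=1}^{N_i}\left[\left\|\mu_i - e_j\right\|_1 - \delta_v\right]_+^2$$

$$L_{dist} = \frac{1}{I(I-1)}\sum_{i_A=1}^{I}\sum_{\substack{i_B=1 \\ i_B \neq i_A}}^{I}\left[2\delta_d - \left\|\mu_{i_A} - \mu_{i_B}\right\|_1\right]_+^2$$

$$L_{reg} = \frac{1}{I}\sum_{i=1}^{I}\left\|\mu_i\right\|_1$$

with [x]_+ = max(0, x) denoting the hinge.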

where I is the number of ground-truth instances; N_i is the number of points in instance i; μ_i is the mean embedding of instance i; μ_{i_A} is the mean embedding of instance i_A; μ_{i_B} is the mean embedding of instance i_B; e_j is the embedding of point j; δ_d and δ_v are the loss-function thresholds; α is the balance coefficient, set to 0.001.

L_var pulls the embedding of each point toward the center of its instance, so that points belonging to the same instance are close to each other in the feature space; L_dist pushes points of different instances apart, widening the distance between them; L_reg keeps the feature embeddings bounded by drawing the instance centers toward the origin of the local coordinate system.
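The three terms can be sketched in NumPy as follows. The sketch assumes the standard formulation of the discriminative loss introduced by De Brabandere et al. (2017): hinged L1 distances to each instance center, a margin of 2δ_d between centers, and an L1 regularizer on the centers. Only α = 0.001 comes from the text; δ_v and δ_d must be supplied because the patent gives no values, and `discriminative_loss` is a hypothetical name.

```python
import numpy as np

def discriminative_loss(emb, inst, delta_v, delta_d, alpha=0.001):
    """L_ins = L_var + L_dist + alpha * L_reg for N x E embeddings and N instance ids."""
    emb = np.asarray(emb, dtype=np.float64)
    inst = np.asarray(inst)
    ids = np.unique(inst)
    num = len(ids)                                             # I, number of instances
    mus = np.stack([emb[inst == i].mean(axis=0) for i in ids]) # I x E mean embeddings
    l_var = 0.0
    for k, i in enumerate(ids):                                # pull points to own center
        d = np.abs(emb[inst == i] - mus[k]).sum(axis=1)        # L1 distance to mu_i
        l_var += np.mean(np.maximum(d - delta_v, 0.0) ** 2)
    l_var /= num
    l_dist = 0.0
    if num > 1:                                                # push centers apart
        d = np.abs(mus[:, None, :] - mus[None, :, :]).sum(axis=2)
        hinge = np.maximum(2.0 * delta_d - d, 0.0) ** 2
        np.fill_diagonal(hinge, 0.0)                           # skip pairs with i_A == i_B
        l_dist = hinge.sum() / (num * (num - 1))
    l_reg = np.abs(mus).sum(axis=1).mean()                     # keep centers near the origin
    return l_var + l_dist + alpha * l_reg, (l_var, l_dist, l_reg)
```

Tight, well-separated instances drive L_var and L_dist to zero, leaving only the small α-weighted regularizer; two instances collapsed onto the same center incur the full (2δ_d)² penalty per ordered pair.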

Finally, when the algorithm of the improved PointConv-based point cloud semantic and instance joint segmentation system is tested, the instance embeddings generated by the instance segmentation part are clustered with the mean-shift method to obtain the final instance result, and an argmax is applied to the semantic features generated by the semantic segmentation part to obtain the final semantic classification. This completes the whole algorithm of the system.
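The mean-shift step can be illustrated with a small flat-kernel implementation. This sketch is an assumption-laden stand-in: the kernel shape, the bandwidth value and the rule that merges converged modes closer than half the bandwidth are illustrative choices, not parameters given in the patent, and `mean_shift` is a hypothetical name. A library implementation such as scikit-learn's `sklearn.cluster.MeanShift` (which takes a `bandwidth` argument) could be used instead.

```python
import numpy as np

def mean_shift(embeddings, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean-shift over N x E instance embeddings."""
    x = np.asarray(embeddings, dtype=np.float64)
    modes = x.copy()                                   # start one mode at every point
    for _ in range(max_iter):
        d = np.linalg.norm(modes[:, None, :] - x[None, :, :], axis=2)
        window = (d <= bandwidth).astype(np.float64)   # flat kernel: 1 inside, 0 outside
        new_modes = window @ x / window.sum(axis=1, keepdims=True)
        shift = np.abs(new_modes - modes).max()
        modes = new_modes
        if shift < tol:
            break
    centres, labels = [], np.empty(len(x), dtype=int)
    for i, m in enumerate(modes):                      # merge nearby converged modes
        for c, centre in enumerate(centres):
            if np.linalg.norm(m - centre) < bandwidth / 2:
                labels[i] = c
                break
        else:
            centres.append(m)
            labels[i] = len(centres) - 1
    return labels, np.array(centres)
```

Each returned label indexes one cluster center, i.e. one predicted instance; the semantic class of each point comes separately from the argmax over the semantic branch.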

The point cloud semantic and instance joint segmentation system based on the improved PointConv can be implemented in the form of a computer program, and the computer program can be run on a computer device, and the computer device can be a server or a terminal. The server can be an independent server or a server cluster; the terminal can be a notebook computer, a desktop computer, and other electronic equipment.

The computer device comprises a processor, a memory and a network interface connected through a system bus, wherein the memory may comprise a non-volatile storage medium and an internal memory; the non-volatile storage medium may store an operating system and a computer program. The computer program includes program instructions that, when executed, cause the processor to perform the improved PointConv-based point cloud semantic and instance joint segmentation method. The processor provides computing and control capability and supports the operation of the whole computer device. The internal memory provides an environment for running the computer program stored in the non-volatile storage medium; when executed by the processor, the program causes the processor to perform the improved PointConv-based point cloud semantic and instance joint segmentation method. The network interface is used for network communication, such as sending assigned tasks.

It should be understood that the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor or any conventional processor.

The embodiment of the application further provides a computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, the computer program comprises program instructions, and the processor executes the program instructions to realize the point cloud semantic and instance joint segmentation method based on the improved PointConv.

The computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiment, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the computer device.

The present invention is not limited to the above-described embodiments, and any obvious improvements, substitutions or modifications can be made by those skilled in the art without departing from the spirit of the present invention.

Claims

1. A point cloud semantic and instance joint segmentation method based on improved PointConv is characterized by comprising the following steps:

inputting the obtained point cloud into an improved PointConv feature extraction module, wherein a shared encoding module produces points with a feature dimension of 512, and these points undergo semantic segmentation decoding and instance segmentation decoding simultaneously to obtain an instance feature prediction F_ins and a semantic feature prediction F_sem; the instance segmentation decoding part introduces a context aggregation module and a gated propagation module to enhance feature learning;

2. The improved PointConv-based point cloud semantic and instance joint segmentation method according to claim 1, wherein the point cloud input into the improved PointConv feature extraction module comprises the normalized absolute coordinates xyz of points, rgb color information, and the relative coordinates x'y'z' of points with respect to a local coordinate system.

3. The method of claim 1, wherein the instance segmentation decoding specifically comprises:

the final aggregated feature F_dec and the feature F_enc output by PointConv_3 of the shared encoding module are used as inputs to the gated propagation module; F_dec and F_enc are concatenated along the channel dimension to obtain F_con, and a 1x1 convolution followed by sigmoid compression gives an N_d×1 weight matrix W₂; W₂ is tiled 256 times along the feature dimension and multiplied element-wise with F_enc to obtain F_enc'; the weight matrix 1-W₂ is tiled 256 times along the feature dimension and multiplied element-wise with F_dec to obtain F_dec'; F_enc' and F_dec' are then concatenated along the channel dimension and output as the final result;

the two features F_dec and F_enc are fused in this way to complete the first decoding step, yielding N_c points with 256-dimensional features; this operation is repeated twice in a similar manner, upsampling the points to N_b with a feature dimension of 128; finally, one PointDeconv upsamples the points to the N_a input points while keeping the feature dimension at 128, giving the instance feature prediction F_ins.

4. The point cloud semantic and instance joint segmentation method based on improved PointConv according to claim 1, wherein the semantic segmentation decoding specifically comprises: upsampling step by step by deconvolution until the number of points is restored to the N_a input points with a feature dimension of 128, giving the semantic feature prediction F_sem.

5. The improved PointConv-based point cloud semantic and instance joint segmentation method according to claim 1, wherein the instance segmentation specifically comprises:

6. The point cloud semantic and instance joint segmentation method based on the improved PointConv as claimed in claim 1, wherein the semantic segmentation is specifically as follows:

7. A segmentation system for implementing the point cloud semantic and instance joint segmentation method based on improved PointConv according to any one of claims 1 to 6, comprising:

8. The segmentation system according to claim 7, wherein the improved PointConv feature extraction module has 9 input channels, representing the normalized absolute coordinates xyz of a point, rgb color information, and the relative coordinates x'y'z' of a point with respect to the local coordinate system, respectively.