CN118351320B - Instance segmentation method based on three-dimensional point cloud
- Publication number: CN118351320B (application CN202410780784.8A)
- Authority: CN (China)
- Prior art keywords: point cloud, mask, instance, prediction, matching
- Prior art date
- Legal status: Active (assumed; not a legal conclusion)
Classifications
- G06V 10/26 — Segmentation of patterns in the image field; clustering-based techniques; detection of occlusion
- G06N 3/0464 — Convolutional networks [CNN, ConvNet]
- G06N 3/048 — Activation functions
- G06N 3/0499 — Feedforward networks
- G06T 7/10 — Segmentation; edge detection
- G06T 7/73 — Determining position or orientation of objects or cameras using feature-based methods
- G06V 10/28 — Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
- G06V 10/44 — Local feature extraction by analysis of parts of the pattern
- G06V 10/764 — Recognition or understanding using classification, e.g. of video objects
- G06V 10/806 — Fusion of extracted features
- G06V 10/82 — Recognition or understanding using neural networks
- G06T 2207/10028 — Range image; depth image; 3D point clouds
- G06T 2207/20081 — Training; learning
- G06T 2207/20084 — Artificial neural networks [ANN]
- Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses an instance segmentation method based on three-dimensional point cloud, which belongs to the technical field of image processing and computer vision and comprises the following steps: acquiring and preprocessing point cloud data to obtain a point cloud training data set; performing feature extraction on the point cloud data to obtain point-level data features and the comprehensive feature of each superpoint; performing classification and aggregation by position encoding and binary density clustering to obtain an initialized position encoding, initialized position query vectors and high-density point query vectors; performing vector fusion based on a cross-attention mechanism to obtain instance fusion features; predicting the center position and boundary position of each instance, generating a segmentation mask by convolution, and converting the segmentation mask into a prediction mask; and performing matching and matching-degree evaluation between the prediction masks and the real masks according to a bipartite matching method and a graph model, and carrying out instance segmentation of the three-dimensional point cloud. The method addresses the limitations of existing three-dimensional point cloud instance segmentation techniques in handling closely adjacent objects and in real-time scene understanding.
Description
Technical Field
The invention belongs to the technical field of image processing and computer vision, and in particular relates to an instance segmentation method for 3D point clouds that integrates query vectors and density clustering.
Background
Three-dimensional instance segmentation is a critical task in the field of computer vision, involving the accurate segmentation and identification of individual objects from three-dimensional point cloud data. Accurate three-dimensional instance segmentation not only improves the environment perception capability of systems in fields such as autonomous driving, robot navigation and virtual reality, but also contributes to safety and richer functionality.
Three-dimensional point cloud data is typically collected by lidar, structured-light scanners or stereo vision systems and is unstructured and highly complex. Processing such data requires an understanding of its particular spatial structure and dense point distribution. Traditional three-dimensional scene understanding methods tend to be limited to predefined categories and supervised learning techniques, which rely on large amounts of annotated data. Unlike conventional two-dimensional images, point cloud data is unevenly distributed in space and is often affected by occlusion and noise, so its three-dimensional spatial structure and topological relations must be understood effectively during processing.
In recent years, the introduction of deep learning has brought revolutionary progress to three-dimensional scene understanding: more complex and abstract feature representations can be learned, significantly improving segmentation accuracy and robustness. However, existing methods often depend on large amounts of labeled data and computing resources, which limits the flexibility and scalability of instance segmentation, and they struggle to effectively handle tightly overlapping instances, multiple categories, and real-time segmentation in dynamic environments.
Disclosure of Invention
Aiming at the above defects in the prior art, the instance segmentation method based on three-dimensional point cloud provided by the invention combines binary density clustering with query vectors to overcome the limitations of existing three-dimensional point cloud instance segmentation techniques in handling closely adjacent objects and in real-time scene understanding.
In order to achieve the aim of the invention, the invention adopts the following technical scheme:
the invention provides an example segmentation method based on three-dimensional point cloud, which comprises the following steps:
S1, acquiring and preprocessing point cloud data to obtain a point cloud training data set;
s2, extracting characteristics of point cloud data in the point cloud training data set to obtain point cloud level data characteristics and comprehensive characteristics of all super points;
S3, classifying and aggregating according to the basic data characteristics of the point cloud by using a coding and second class aggregation method to obtain an initialization position coding, an initialization position query vector and a high-density point query vector;
S4, vector fusion is carried out on the basis of a cross attention mechanism according to the comprehensive characteristics of each super point, the initialized position code, the initialized position query vector and the high-density point query vector to obtain example fusion characteristics;
S5, based on the example fusion characteristics, predicting the center position and the boundary position of the example, convoluting to generate a segmentation mask, and converting the segmentation mask of each prediction example into a prediction mask;
And S6, according to the bipartite matching method and the graph model, matching and matching degree evaluation are carried out on the prediction mask and the real mask, and instance segmentation of the three-dimensional point cloud is carried out.
The beneficial effects of the invention are as follows: in the instance segmentation method based on three-dimensional point cloud, feature extraction is performed on the point cloud data to obtain point-level data features and the comprehensive feature of each superpoint; extraction of high-density point query vectors is realized through binary density clustering, which improves the spatial discrimination of the query vectors, accurately represents the key feature points of dense regions, and enhances the understanding and processing of complex three-dimensional scenes. Based on the comprehensive feature of each superpoint, the initialized position encoding, the initialized position query vectors and the high-density point query vectors, vector fusion is performed through a cross-attention mechanism to obtain instance fusion features, improving the accuracy of the generated prediction masks. Through matching between prediction masks and real masks and optimization of the matching evaluation, the invention significantly improves the processing efficiency for unstructured three-dimensional point cloud data, and offers clear advantages for instance segmentation of three-dimensional point cloud data and for handling multiple spatially adjacent instances.
Further, the step S1 includes the following steps:
s11, acquiring point cloud data in a plurality of scenes, and matching corresponding labels with the point cloud data;
S12, carrying out standardization processing and data enhancement processing on the point cloud data after the label is matched;
S13, generating a point cloud data set based on the point cloud data subjected to the standardization processing and the data enhancement processing;
S14, voxelizing the point cloud data of size H×W×3 in the point cloud data set, so that the point cloud scene is voxelized to obtain a point cloud training data set, where H represents the height of the point cloud data and W represents the width of the point cloud data.
The beneficial effects of adopting the further scheme are as follows: through the standardization and data enhancement of the point cloud data, the richness of the point cloud data is effectively improved across different viewing angles and sizes.
Further, the step S2 includes the following steps:
S21, performing feature conversion on point cloud data in a point cloud training data set by using an input convolution layer to obtain initial point cloud data features;
S22, performing multi-scale feature extraction on the initial point cloud data features by using a pre-training sparse 3D U-Net model to obtain multi-scale point cloud data features;
s23, utilizing a linear layer to adjust feature dimensions of the multi-scale point cloud data features to obtain normalized point cloud data features;
s24, reconstructing a mapping relation between the normalized point cloud data characteristics and point cloud data in the point cloud training data set by using a mapping table to obtain point cloud level data characteristics;
s25, pooling the normalized point cloud data characteristics by using the identification of the super points to obtain pooling characteristics of each super point;
s26, according to the pooling type adopted by the pooling characteristics of each super point, the characteristics in the same super point are aggregated, and the comprehensive characteristics of each super point are obtained.
The beneficial effects of adopting the further scheme are as follows: through the encoder-decoder architecture and skip connections of the 3D U-Net model, feature extraction is effectively realized on multiple scales, ensuring that global-to-local features are effectively captured and utilized; the mapping relation is reconstructed using the mapping table, ensuring the spatial consistency of the features; and the comprehensive feature of each superpoint is obtained through pooling and aggregation, which reduces the data volume while retaining the necessary information and facilitates the processing of large-scale point cloud data.
Further, the step S3 includes the following steps:
S31, encoding the spatial position information of the point-level data features to obtain an initialized position encoding;
S32, generating initialized position query vectors from the initialized position encoding;
S33, for each point in the point cloud, counting the number of neighborhood points within a preset radius of that point according to the point-level data features, and taking this number as the local density of the point;
s34, setting a density classification threshold;
s35, taking points, among the point clouds, of which the local density is greater than a density classification threshold value as high-density points;
s36, gathering all the high-density points to obtain a high-density point set;
And S37, extracting the characteristics in the high-density point set to obtain a high-density point query vector.
The beneficial effects of adopting the further scheme are as follows: the invention obtains the initialized position encoding through the encoding process and correspondingly generates the initialized position query vectors; binary clustering is realized by setting the density classification threshold, the high-density points in the point cloud are obtained and aggregated into a high-density point set, and the high-density point query vectors are obtained through feature extraction, which effectively improves instance recognition efficiency and instance segmentation precision and is particularly suitable for dense or complex three-dimensional scenes.
Further, the step S4 includes the following steps:
S41, calculating the similarity between each initialized position query vector and the key and value vectors using a self-attention mechanism, and normalizing the similarities into first weights through a softmax function;
S42, matching the first weights with the corresponding point-level feature data and performing the corresponding weighted summation to obtain new initialized position query vectors;
S43, calculating the similarity between each high-density query vector and the key and value vectors using a self-attention mechanism, and normalizing the similarities into second weights through a softmax function;
S44, matching the second weights with the corresponding point-level feature data and performing the corresponding weighted summation to obtain new high-density query vectors;
S45, taking the initialized position encoding and the comprehensive features of the superpoints as the key vectors and value vectors to be fused, and taking the new initialized position query vectors and new high-density query vectors as the query vectors to be fused;
S46, fusing the query vectors to be fused with the key vectors and value vectors to be fused based on a cross-attention mechanism to obtain instance fusion features.
The beneficial effects of adopting the further scheme are as follows: the invention adopts a self-attention mechanism to construct new initialized position query vectors and new high-density query vectors from their similarities with the key and value vectors, and then adopts a cross-attention mechanism to combine the initialized position encoding with the comprehensive features of the superpoints, realizing vector fusion; this builds a new vector query scheme and provides a basis for improving the accuracy of the generated prediction masks.
Further, the calculation expression of the cross-attention mechanism in S46 is as follows:
Attention(Q, K, V) = softmax(QK^T / √d_k) · V,
wherein Attention() denotes the attention mechanism function, Q denotes the query vectors to be fused, K denotes the key vectors to be fused, V denotes the value vectors to be fused, softmax() denotes the softmax function, K^T denotes the transpose of the key vectors to be fused, and d_k denotes the dimension of the key vectors to be fused.
The beneficial effects of adopting the further scheme are as follows: the invention provides a calculation method of a cross attention mechanism, which can fuse the comprehensive characteristics of each super point with a new initialized position query vector and a new high-density query vector which are processed by the self attention mechanism, thereby realizing effective integration of information of different feature spaces and improving the analysis capability.
Further, the step S5 includes the following steps:
S51, based on the instance fusion features, constructing a feature mapping associated with the instance center to obtain the center position and boundary position of the predicted instance;
s52, based on the central position and the boundary position of the predicted instance, obtaining a segmentation mask of each predicted instance by carrying out convolution operation on the instance fusion characteristics;
s53, converting the segmentation mask of each prediction instance into a prediction mask;
the computational expression of the prediction mask is as follows:
M_i(x) = sigmoid(f(x)),
where M_i(x) represents the prediction mask of the i-th instance, sigmoid() represents the sigmoid function, and f(x) represents the segmentation mask of each predicted instance.
The beneficial effects of adopting the further scheme are as follows: the invention provides a method for carrying out mask segmentation and mask prediction based on an instance fusion feature, which obtains a prediction mask and provides a basis for executing an instance segmentation task.
Further, the calculation expressions of the center position and the boundary position of the predicted instance in S51 are as follows:
Center_i = σ(MLP(Q_position)),
Boundary_i = MLP_center(Q_ct, Q_p,t-1),
wherein Center_i represents the center position of the i-th predicted instance, σ represents the activation function, MLP() represents the multi-layer perceptron, Q_position represents the query vector corresponding to the center position of the instance, i denotes the i-th instance, Boundary_i represents the boundary position of the i-th predicted instance, MLP_center() represents the multi-layer perceptron used for instance center prediction, Q_ct represents the query vector corresponding to the boundary position of the current instance, and Q_p,t-1 represents the query vector corresponding to the center position of the previous instance.
The beneficial effects of adopting the further scheme are as follows: the invention provides a calculation method for the center position and the boundary position of a predicted instance, which can provide a basis for accurately performing mask segmentation and obtaining a predicted mask by accurately calculating the center position and the boundary position of the predicted instance.
Further, the step S6 includes the steps of:
s61, constructing a graph model according to a binary matching method, wherein nodes in the graph model are a prediction mask and a real mask respectively, and the weight of edges in the graph model is the similarity between the prediction mask and the real mask;
S62, calculating the matching degree between each prediction mask and each real mask through a mask intersection-over-union (IoU) model;
The calculation expression of the mask intersection-over-union model is as follows:
IOU(M_pred, M_gt) = |M_pred ∩ M_gt| / |M_pred ∪ M_gt|,
wherein IOU(M_pred, M_gt) represents the matching degree between the prediction mask and the real mask, M_pred represents the prediction mask, M_gt represents the real mask, |·| represents the number of elements in a set, ∩ represents the intersection operation, and ∪ represents the union operation;
s63, setting a matching degree threshold, and taking the prediction mask and the real mask as a matching mask pair when the matching degree between the prediction mask and the real mask is greater than the matching degree threshold;
S64, constructing a global optimal matching model according to the Hungarian algorithm, and optimizing the matching between the prediction masks and the real masks;
the computational expression of the global optimal matching model is as follows:
maximize Σ_i Σ_j IOU(M_pred^i, M_gt^j) × X_ij,
wherein maximize denotes maximization, IOU(M_pred^i, M_gt^j) denotes the matching degree between the i-th prediction mask and the j-th real mask, × denotes multiplication, M_pred^i denotes the i-th prediction mask, M_gt^j denotes the j-th real mask, and X_ij denotes the mask-pairing indicator function, which takes the value 1 when the i-th prediction mask and the j-th real mask form a matching mask pair and 0 when they are not selected as a matching pair;
S65, evaluating the matching degree between the optimized prediction mask and the real mask by using the matching loss function to obtain a matching degree evaluation result;
And S66, optimizing prediction masks and prediction of instance centers based on the matching degree evaluation result, and executing instance segmentation of the three-dimensional point cloud.
The beneficial effects of adopting the further scheme are as follows: the invention provides a method for matching the prediction masks and real masks and for optimizing and evaluating the matching result; performing the instance segmentation of the three-dimensional point cloud based on this optimization process and the matching-degree evaluation result greatly improves the instance segmentation efficiency as well as the accuracy and completeness of the segmentation result.
Further, the calculation expression of the matching loss function in S65 is as follows:
L = λ_cls·L_cls + λ_mask·L_mask + λ_dice·L_dice + λ_center·L_center,
L_cls = CE(pred_class, true_class),
L_mask = BCE(pred_mask, true_mask),
L_dice = 1 − 2·|pred_mask ∩ true_mask| / (|pred_mask| + |true_mask|),
L_center = ‖pred_center − true_center‖_1,
wherein L represents the matching loss function, λ_cls represents the classification loss weight coefficient, L_cls represents the classification loss function, λ_mask represents the mask binary loss weight coefficient, L_mask represents the mask binary loss function, λ_dice represents the intersection-over-union (Dice) loss weight coefficient, L_dice represents the intersection-over-union (Dice) loss function, λ_center represents the center regression loss weight coefficient, L_center represents the center regression loss function, CE() represents the cross-entropy loss function, BCE() represents the binary cross-entropy loss function, pred_class represents the class of a predicted instance, true_class represents the class of a real instance, pred_mask represents the prediction mask, true_mask represents the real mask, pred_center and true_center represent the predicted and real instance center points, and ‖·‖ denotes a norm operation.
The beneficial effects of adopting the further scheme are as follows: the invention provides a calculation method for the matching loss function that comprehensively evaluates the matching between the prediction mask and the real mask in terms of the predicted class, the pixel-level matching precision between prediction mask and real mask, the overlap area, and the error between center points; while preserving classification accuracy, it can optimize both the matching accuracy between prediction masks and real masks and the prediction of instance centers, thereby improving overall performance on the three-dimensional point cloud instance segmentation task.
Other advantages of the present invention will be described in more detail in the following embodiments.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart illustrating steps of an example segmentation method based on a three-dimensional point cloud according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present invention.
As shown in fig. 1, in one embodiment of the present invention, the present invention provides an example segmentation method based on a three-dimensional point cloud, including the following steps:
S1, acquiring and preprocessing point cloud data to obtain a point cloud training data set;
The step S1 comprises the following steps:
s11, acquiring point cloud data in a plurality of scenes, and matching corresponding labels with the point cloud data;
S12, carrying out standardization processing and data enhancement processing on the point cloud data after the label is matched;
In this embodiment, the standardization processing applies scale normalization or centering to the point cloud data, and data enhancement is performed by random rotation or scaling, expanding the point cloud data to different viewing angles and sizes, as sketched below;
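A minimal sketch of this kind of preprocessing, assuming the point cloud is an N×3 NumPy array; the function names and the rotation/scale ranges are illustrative choices, not values taken from the patent:

```python
import numpy as np

def normalize_points(xyz: np.ndarray) -> np.ndarray:
    """Center the cloud at the origin and scale it into a unit sphere."""
    xyz = xyz - xyz.mean(axis=0, keepdims=True)          # centering
    scale = np.max(np.linalg.norm(xyz, axis=1)) + 1e-8   # scale normalization
    return xyz / scale

def augment_points(xyz: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random rotation about the vertical axis plus random isotropic scaling."""
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    scale = rng.uniform(0.8, 1.2)                         # illustrative range
    return (xyz @ rot.T) * scale
```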
S13, generating a point cloud data set based on the point cloud data subjected to the standardization processing and the data enhancement processing;
In this embodiment, the information in the point cloud data set includes a scene ID, a voxel coordinate, a mapping of point cloud data to a voxel, a mapping of a voxel to point cloud data, a shape of a discrete voxel space, a feature of the point cloud data, a super point identifier, a batch offset, an instance tag, and a point cloud floating point coordinate;
The scene ID is used for uniquely identifying a scene; the voxel coordinates are used for representing coordinates of the point cloud data in a discrete voxel space; the mapping from the point cloud data to the voxels is used for mapping the points in the point cloud data to the corresponding voxels; the mapping from the voxels to the point cloud data is used for mapping the points in the voxels to the corresponding point cloud data; the shape of the discrete voxel space is used to represent the size of a voxel grid; the characteristics of the point cloud data comprise the characteristics of the position, the color, the normal vector and the like of the point; the super point mark is used for representing advanced features for improving the point cloud processing performance; the batch offset is used for identifying data boundaries of different scenes in a batch processing process; the instance tag is used for representing an instance to which each point in the point cloud data belongs; the point cloud floating point coordinates are used for representing floating point number coordinate information of the point cloud data.
S14, voxelizing the point cloud data of size H×W×3 in the point cloud data set, where the point cloud scene is voxelized using Open3D to obtain a point cloud training data set; H represents the height of the point cloud data and W represents the width of the point cloud data.
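A possible voxelization sketch, assuming the H×W×3 cloud has been flattened to an N×3 array; Open3D is used only to build the discrete scene grid, while the point-to-voxel bookkeeping is done with NumPy. The exact layout of the mapping tables stored in the data set is not spelled out in the patent, so the returned maps here are illustrative:

```python
import numpy as np
import open3d as o3d

def voxelize_scene(xyz: np.ndarray, voxel_size: float = 0.02):
    """Quantize an N x 3 point cloud into a discrete voxel grid."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(xyz)
    voxel_grid = o3d.geometry.VoxelGrid.create_from_point_cloud(pcd, voxel_size)

    # Integer voxel coordinates of every point.
    coords = np.floor((xyz - xyz.min(axis=0)) / voxel_size).astype(np.int64)
    # Unique voxels, one representative point per voxel (voxel -> point),
    # and for every point the index of its voxel (point -> voxel).
    voxel_coords, v2p_first, p2v = np.unique(
        coords, axis=0, return_index=True, return_inverse=True)
    return voxel_grid, voxel_coords, p2v, v2p_first
```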
S2, extracting characteristics of point cloud data in the point cloud training data set to obtain point cloud level data characteristics and comprehensive characteristics of all super points;
the step S2 comprises the following steps:
S21, performing feature conversion on point cloud data in a point cloud training data set by using an input convolution layer to obtain initial point cloud data features; the input convolution layer performs preliminary feature conversion on the original point cloud data, and prepares for multi-scale analysis and feature extraction of the deep network;
s22, performing multi-scale feature extraction on the initial point cloud data features by using a pre-training sparse 3D U-Net model to obtain multi-scale point cloud data features; the pre-training sparse 3D U-Net model is connected with the jump feature through the encoder-decoder architecture, so that feature extraction of initial point cloud data features on multiple scales is effectively realized, the feature from the whole world to the local can be effectively captured and utilized, and the method is applicable to complex 3D structures and objects of various scales.
S23, utilizing a linear layer to adjust feature dimensions of the multi-scale point cloud data features to obtain normalized point cloud data features; in this embodiment, the linear layer adopts a Normalization function and Relu activation functions for activating the Normalization result, so that feature dimensions of the normalized point cloud data features are matched with feature mapping and pooling operations.
S24, reconstructing a mapping relation between the normalized point cloud data characteristics and point cloud data in the point cloud training data set by using a mapping table to obtain point cloud level data characteristics; the remapping adopts a mapping table v2p_map to correspond the point cloud data characteristics subjected to multi-scale characteristic extraction and normalization processing with the point cloud data in the point cloud training data set, is very important for processing irregular point cloud data, and can ensure consistency of characteristic space. The remapped point cloud scale data features retain the resolution of the point cloud level, retain the detail information of the point cloud data, and can be directly used for tasks such as query vector generation and instance segmentation with high precision requirements.
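A tiny sketch of the remapping step; it assumes v2p_map is a length-N index tensor giving, for each point, the row of the voxel/U-Net feature table it belongs to (the direction of the patent's mapping table is not spelled out, so this indexing convention is an assumption):

```python
import torch

def remap_to_points(voxel_feats: torch.Tensor, v2p_map: torch.Tensor) -> torch.Tensor:
    """Broadcast (V, C) voxel-level features back to (N, C) point-level features."""
    return voxel_feats[v2p_map]  # gather: point i receives the features of its voxel
```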
S25, pooling the normalized point cloud data characteristics by using the identification of the super points to obtain pooling characteristics of each super point;
S26, according to the pooling type adopted by the pooling characteristics of each super point, the characteristics in the same super point are aggregated, and the comprehensive characteristics of each super point are obtained. In this embodiment, features in the same super point are aggregated according to mean pooling or maximum pooling, so as to obtain a comprehensive feature of each super point, thereby reducing data volume while retaining necessary information, and facilitating processing of large-scale point cloud data.
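A sketch of the superpoint pooling described in S25-S26, assuming point-level features as an (N, C) tensor and superpoint identifiers as consecutive integers in [0, S); the mean branch uses index_add_, the max branch index_reduce_ (available in recent PyTorch versions):

```python
import torch

def superpoint_pool(point_feats: torch.Tensor,
                    superpoint_ids: torch.Tensor,
                    mode: str = "mean") -> torch.Tensor:
    """Aggregate (N, C) point-level features into one feature per superpoint."""
    num_sp = int(superpoint_ids.max().item()) + 1
    channels = point_feats.size(1)
    if mode == "mean":
        pooled = point_feats.new_zeros(num_sp, channels)
        pooled.index_add_(0, superpoint_ids, point_feats)            # sum per superpoint
        counts = torch.bincount(superpoint_ids, minlength=num_sp).clamp(min=1)
        return pooled / counts.unsqueeze(1).to(pooled.dtype)         # mean pooling
    # max pooling: superpoints with no member point stay at -inf
    pooled = point_feats.new_full((num_sp, channels), float("-inf"))
    return pooled.index_reduce_(0, superpoint_ids, point_feats, reduce="amax")
```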
S3, classifying and aggregating according to the basic data characteristics of the point cloud by using a coding and second class aggregation method to obtain an initialization position coding, an initialization position query vector and a high-density point query vector;
the step S3 comprises the following steps:
S31, encoding the spatial position information of the point-level data features to obtain an initialized position encoding;
S32, generating initialized position query vectors from the initialized position encoding;
S33, for each point in the point cloud, counting the number of neighborhood points within a preset radius of that point according to the point-level data features, and taking this number as the local density of the point;
s34, setting a density classification threshold;
s35, taking points, among the point clouds, of which the local density is greater than a density classification threshold value as high-density points;
s36, gathering all the high-density points to obtain a high-density point set;
And S37, extracting the features in the high-density point set to obtain high-density point query vectors. In this embodiment, the high-density point set contains key structural information and provides important anchor points for instance identification and scene segmentation; the high-density point query vectors can be used to guide instance recognition and instance segmentation in the backbone architecture, improving instance recognition efficiency and instance segmentation accuracy, and are particularly suitable for dense and complex three-dimensional scenes where the boundaries between instances are not obvious.
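A sketch of the binary density split from S33-S36 using a k-d tree radius query; the radius and threshold are hyperparameters to be chosen per scene, not values given in the patent:

```python
import numpy as np
from scipy.spatial import cKDTree

def high_density_points(xyz: np.ndarray, radius: float, density_thr: int):
    """Count neighbours within `radius` of every point (local density) and
    keep the points whose density exceeds the classification threshold."""
    tree = cKDTree(xyz)
    neighbours = tree.query_ball_point(xyz, r=radius)
    local_density = np.array([len(n) for n in neighbours])   # includes the point itself
    high_mask = local_density > density_thr                  # binary split
    return xyz[high_mask], high_mask, local_density
```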
S4, vector fusion is carried out on the basis of a cross attention mechanism according to the comprehensive characteristics of each super point, the initialized position code, the initialized position query vector and the high-density point query vector to obtain example fusion characteristics;
The step S4 comprises the following steps:
S41, calculating the similarity between each initialized position query vector and the key vector and the value vector by adopting a self-attention mechanism, and normalizing the similarity into a first weight through a softmax function; in the embodiment, similarity between each initialization position query vector and key vector and value vector is calculated by adopting dot product operation;
S42, matching the first weights with the corresponding point-level feature data and performing the corresponding weighted summation to obtain new initialized position query vectors;
s43, calculating the similarity between each high-density query vector and the key vector and the value vector by adopting a self-attention mechanism, and normalizing the similarity into a second weight through a softmax function;
S44, matching the second weights with the corresponding point-level feature data and performing the corresponding weighted summation to obtain new high-density query vectors;
S45, taking the comprehensive characteristics of the initialization position codes and the super points as key vectors to be fused and value vectors to be fused, and taking a new initialization position query vector and a new high-density query vector as query vectors to be fused;
S46, fusing the query vector to be fused, the key vector to be fused and the value vector to be fused based on a cross attention mechanism to obtain an instance fusion characteristic;
The computational expression of the cross-attention mechanism is as follows:
Attention(Q, K, V) = softmax(QK^T / √d_k) · V,
wherein Attention() denotes the attention mechanism function, Q denotes the query vectors to be fused, K denotes the key vectors to be fused, V denotes the value vectors to be fused, softmax() denotes the softmax function, K^T denotes the transpose of the key vectors to be fused, and d_k denotes the dimension of the key vectors to be fused; scaling the dot product by the key dimension prevents gradient problems caused by excessively large dot products, and the softmax function normalizes the dot-product results so that the output weight distribution is reasonable. The instance fusion features can be used for instance identification and segmentation, accurate localization of the boundary of each instance, and prediction of the instance center and the corresponding class label.
S5, predicting the center position and boundary position of each instance based on the instance fusion features, generating a segmentation mask by convolution, and converting the segmentation mask of each predicted instance into a prediction mask;
The step S5 comprises the following steps:
S51, based on the instance fusion features, constructing a feature mapping associated with the instance center to obtain the center position and boundary position of the predicted instance;
the calculation expressions of the center position and the boundary position of the predicted instance in S51 are as follows:
Center_i = σ(MLP(Q_position)),
Boundary_i = MLP_center(Q_ct, Q_p,t-1),
wherein Center_i represents the center position of the i-th predicted instance, σ represents the activation function, MLP() represents the multi-layer perceptron, Q_position represents the query vector corresponding to the center position of the instance, i denotes the i-th instance, Boundary_i represents the boundary position of the i-th predicted instance, MLP_center() represents the multi-layer perceptron used for instance center prediction, Q_ct represents the query vector corresponding to the boundary position of the current instance, and Q_p,t-1 represents the query vector corresponding to the center position of the previous instance; in this embodiment, the activation function is the sigmoid activation function;
S52, based on the center position and boundary position of the predicted instance, obtaining the segmentation mask of each predicted instance by performing a convolution operation on the instance fusion features; the segmentation mask is a binary image in which a pixel value of 1 indicates the predicted instance and a pixel value of 0 indicates the background or other instances;
S53, converting the segmentation mask of each prediction instance into a prediction mask; in this embodiment, the prediction mask represents a possibility that each pixel belongs to a certain instance, and the segmentation mask may be converted into the prediction mask through a sigmoid function or a softmax function;
the computational expression of the prediction mask is as follows:
M_i(x) = sigmoid(f(x)),
where M_i(x) represents the prediction mask of the i-th instance, sigmoid() represents the sigmoid function, and f(x) represents the segmentation mask of each predicted instance.
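A hedged sketch of a prediction head consistent with the formulas above: an MLP predicts a (normalized) instance center from each fused query, and the mask logits f(x) are obtained by projecting the query and correlating it with the point-level features before the sigmoid. The module name, layer sizes and the dot-product mask decoder are assumptions for illustration, not the patent's exact architecture:

```python
import torch
import torch.nn as nn

class MaskHead(nn.Module):
    """Predict an instance center and a per-point prediction mask from fused query features."""

    def __init__(self, d_model: int, point_feat_dim: int):
        super().__init__()
        self.center_mlp = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, 3))
        # Projects a query so it can be correlated with the point-level features.
        self.mask_proj = nn.Linear(d_model, point_feat_dim)

    def forward(self, query: torch.Tensor, point_feats: torch.Tensor):
        # query:       (I, d_model) one fused feature per predicted instance
        # point_feats: (N, point_feat_dim) point-level data features
        center = torch.sigmoid(self.center_mlp(query))          # Center_i = σ(MLP(Q)), (I, 3)
        mask_logits = self.mask_proj(query) @ point_feats.t()   # f(x), shape (I, N)
        pred_mask = torch.sigmoid(mask_logits)                  # M_i(x) = sigmoid(f(x))
        return center, pred_mask
```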
And S6, according to the bipartite matching method and the graph model, matching and matching degree evaluation are carried out on the prediction mask and the real mask, and instance segmentation of the three-dimensional point cloud is carried out. In the instance segmentation task, particularly when there are multiple overlapping instances in a scene, it is necessary to match the generated prediction masks accurately.
The step S6 comprises the following steps:
S61, constructing a graph model according to a binary matching method, wherein nodes in the graph model are a prediction mask and a real mask respectively, and the weight of edges in the graph model is the similarity between the prediction mask and the real mask; the similarity of the prediction mask and the real mask can reflect the cross overlap area ratio between different instances in the scene;
S62, calculating the matching degree between each prediction mask and each real mask through a mask intersection-over-union (IoU) model;
The calculation expression of the mask intersection-over-union model is as follows:
IOU(M_pred, M_gt) = |M_pred ∩ M_gt| / |M_pred ∪ M_gt|,
wherein IOU(M_pred, M_gt) represents the matching degree between the prediction mask and the real mask, M_pred represents the prediction mask, M_gt represents the real mask, |·| represents the number of elements in a set, ∩ represents the intersection operation, and ∪ represents the union operation;
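A sketch of the mask IoU (matching degree) computed for all prediction/ground-truth pairs at once; the binarization threshold of 0.5 is an illustrative assumption:

```python
import torch

def mask_iou(pred_mask: torch.Tensor, gt_mask: torch.Tensor, thr: float = 0.5) -> torch.Tensor:
    """IoU between predicted and ground-truth masks.

    pred_mask: (P, N) predicted per-point probabilities.
    gt_mask:   (G, N) binary ground-truth masks.
    Returns a (P, G) matrix of matching degrees.
    """
    p = (pred_mask > thr).float()
    g = gt_mask.float()
    inter = p @ g.t()                                  # |M_pred ∩ M_gt|
    union = p.sum(1, keepdim=True) + g.sum(1) - inter  # |M_pred ∪ M_gt|
    return inter / union.clamp(min=1e-6)
```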
s63, setting a matching degree threshold, and taking the prediction mask and the real mask as a matching mask pair when the matching degree between the prediction mask and the real mask is greater than the matching degree threshold;
S64, constructing a global optimal matching model according to the Hungarian algorithm, and optimizing the matching between the prediction masks and the real masks;
The computational expression of the global optimal matching model is as follows:
maximize Σ_i Σ_j IOU(M_pred^i, M_gt^j) × X_ij,
wherein maximize denotes maximization, IOU(M_pred^i, M_gt^j) denotes the matching degree between the i-th prediction mask and the j-th real mask, × denotes multiplication, M_pred^i denotes the i-th prediction mask, M_gt^j denotes the j-th real mask, and X_ij denotes the mask-pairing indicator function, which takes the value 1 when the i-th prediction mask and the j-th real mask form a matching mask pair and 0 when they are not selected as a matching pair;
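A sketch of the globally optimal assignment using SciPy's Hungarian implementation; negating the IoU matrix turns the maximization into the minimization the routine expects, and the 0.5 threshold is illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match(iou_matrix: np.ndarray, iou_thr: float = 0.5):
    """Globally optimal one-to-one assignment that maximizes total IoU.

    iou_matrix: (P, G) matching degrees between prediction and real masks.
    Returns index pairs (pred_idx, gt_idx) whose IoU exceeds the threshold.
    """
    # linear_sum_assignment minimizes cost, so negate the IoU to maximize it.
    pred_idx, gt_idx = linear_sum_assignment(-iou_matrix)
    keep = iou_matrix[pred_idx, gt_idx] > iou_thr
    return pred_idx[keep], gt_idx[keep]
```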
S65, evaluating the matching degree between the optimized prediction mask and the real mask by using the matching loss function to obtain a matching degree evaluation result;
the calculation expression of the matching loss function in S65 is as follows:
L = λ_cls·L_cls + λ_mask·L_mask + λ_dice·L_dice + λ_center·L_center,
L_cls = CE(pred_class, true_class),
L_mask = BCE(pred_mask, true_mask),
L_dice = 1 − 2·|pred_mask ∩ true_mask| / (|pred_mask| + |true_mask|),
L_center = ‖pred_center − true_center‖_1,
wherein L represents the matching loss function, λ_cls represents the classification loss weight coefficient, L_cls represents the classification loss function, λ_mask represents the mask binary loss weight coefficient, L_mask represents the mask binary loss function, λ_dice represents the intersection-over-union (Dice) loss weight coefficient, L_dice represents the intersection-over-union (Dice) loss function, λ_center represents the center regression loss weight coefficient, L_center represents the center regression loss function, CE() represents the cross-entropy loss function, BCE() represents the binary cross-entropy loss function, pred_class represents the class of a predicted instance, true_class represents the class of a real instance, pred_mask represents the prediction mask, true_mask represents the real mask, pred_center and true_center represent the predicted and real instance center points, and ‖·‖ denotes a norm operation. The classification loss function evaluates the class agreement between the prediction and the ground truth; in this embodiment it is computed with the cross-entropy loss. The mask binary loss function evaluates the pixel-level matching precision between the prediction mask and the real mask; in this embodiment it is computed with the binary cross-entropy loss. The intersection-over-union loss function optimizes the overlap between the prediction mask and the real mask, improving their overlapping area. The center regression loss function optimizes the error between the predicted instance center point and the real instance center point; in this embodiment the least absolute deviation is used to measure the distance between them.
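A hedged sketch of the combined matching loss with unit weights as placeholders for λ_cls, λ_mask, λ_dice and λ_center; the tensor shapes (class logits per matched query, per-point mask probabilities, 3-D centers) are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def matching_loss(pred_logits, true_class, pred_mask, true_mask,
                  pred_center, true_center,
                  w_cls=1.0, w_mask=1.0, w_dice=1.0, w_center=1.0):
    """Weighted sum of classification, mask BCE, Dice and center-regression terms."""
    l_cls = F.cross_entropy(pred_logits, true_class)            # L_cls
    l_mask = F.binary_cross_entropy(pred_mask, true_mask)       # L_mask (probabilities vs. 0/1 targets)
    inter = (pred_mask * true_mask).sum(-1)
    l_dice = 1.0 - (2.0 * inter /
                    (pred_mask.sum(-1) + true_mask.sum(-1) + 1e-6)).mean()  # L_dice
    l_center = F.l1_loss(pred_center, true_center)              # L_center (least absolute deviation)
    return w_cls * l_cls + w_mask * l_mask + w_dice * l_dice + w_center * l_center
```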
And S66, optimizing prediction masks and prediction of instance centers based on the matching degree evaluation result, and executing instance segmentation of the three-dimensional point cloud.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention.
Claims (5)
1. An instance segmentation method based on three-dimensional point cloud is characterized by comprising the following steps:
S1, acquiring and preprocessing point cloud data to obtain a point cloud training data set;
s2, extracting characteristics of point cloud data in the point cloud training data set to obtain point cloud level data characteristics and comprehensive characteristics of all super points;
the step S2 comprises the following steps:
S21, performing feature conversion on point cloud data in a point cloud training data set by using an input convolution layer to obtain initial point cloud data features;
S22, performing multi-scale feature extraction on the initial point cloud data features by using a pre-training sparse 3D U-Net model to obtain multi-scale point cloud data features;
s23, utilizing a linear layer to adjust feature dimensions of the multi-scale point cloud data features to obtain normalized point cloud data features;
s24, reconstructing a mapping relation between the normalized point cloud data characteristics and point cloud data in the point cloud training data set by using a mapping table to obtain point cloud level data characteristics;
s25, pooling the normalized point cloud data characteristics by using the identification of the super points to obtain pooling characteristics of each super point;
s26, according to the pooling type adopted by the pooling characteristics of each super point, the characteristics in the same super point are aggregated to obtain the comprehensive characteristics of each super point;
S3, performing classification and aggregation on the point-level data features by position encoding and binary density clustering to obtain an initialized position encoding, initialized position query vectors and high-density point query vectors;
the step S3 comprises the following steps:
S31, encoding the spatial position information of the point-level data features to obtain an initialized position encoding;
S32, generating initialized position query vectors from the initialized position encoding;
S33, for each point in the point cloud, counting the number of neighborhood points within a preset radius of that point according to the point-level data features, and taking this number as the local density of the point;
s34, setting a density classification threshold;
s35, taking points, among the point clouds, of which the local density is greater than a density classification threshold value as high-density points;
s36, gathering all the high-density points to obtain a high-density point set;
S37, extracting features in the high-density point set to obtain a high-density point query vector;
S4, vector fusion is carried out on the basis of a cross attention mechanism according to the comprehensive characteristics of each super point, the initialized position code, the initialized position query vector and the high-density point query vector to obtain example fusion characteristics;
The step S4 comprises the following steps:
s41, calculating the similarity between each initialized position query vector and the key vector and the value vector by adopting a self-attention mechanism, and normalizing the similarity into a first weight through a softmax function;
S42, matching the first weights with the corresponding point-level feature data and performing the corresponding weighted summation to obtain new initialized position query vectors;
s43, calculating the similarity between each high-density query vector and the key vector and the value vector by adopting a self-attention mechanism, and normalizing the similarity into a second weight through a softmax function;
S44, matching the second weights with the corresponding point-level feature data and performing the corresponding weighted summation to obtain new high-density query vectors;
S45, taking the comprehensive characteristics of the initialization position codes and the super points as key vectors to be fused and value vectors to be fused, and taking a new initialization position query vector and a new high-density query vector as query vectors to be fused;
S46, fusing the query vector to be fused, the key vector to be fused and the value vector to be fused based on a cross attention mechanism to obtain an instance fusion characteristic;
S5, predicting the center position and boundary position of each instance based on the instance fusion features, generating a segmentation mask by convolution, and converting the segmentation mask of each predicted instance into a prediction mask;
The step S5 comprises the following steps:
S51, based on the instance fusion features, constructing a feature mapping associated with the instance center to obtain the center position and boundary position of the predicted instance;
the calculation expressions of the center position and the boundary position of the predicted instance in S51 are as follows:
Center_i = σ(MLP(Q_position)),
Boundary_i = MLP_center(Q_ct, Q_p,t-1),
wherein Center_i represents the center position of the i-th predicted instance, σ represents the activation function, MLP() represents the multi-layer perceptron, Q_position represents the query vector corresponding to the center position of the instance, i denotes the i-th instance, Boundary_i represents the boundary position of the i-th predicted instance, MLP_center() represents the multi-layer perceptron used for instance center prediction, Q_ct represents the query vector corresponding to the boundary position of the current instance, and Q_p,t-1 represents the query vector corresponding to the center position of the previous instance;
s52, based on the central position and the boundary position of the predicted instance, obtaining a segmentation mask of each predicted instance by carrying out convolution operation on the instance fusion characteristics;
s53, converting the segmentation mask of each prediction instance into a prediction mask;
the computational expression of the prediction mask is as follows:
M_i(x) = sigmoid(f(x)),
where M_i(x) represents the prediction mask of the i-th instance, sigmoid() represents the sigmoid function, and f(x) represents the segmentation mask of each predicted instance;
And S6, according to the bipartite matching method and the graph model, matching and matching degree evaluation are carried out on the prediction mask and the real mask, and instance segmentation of the three-dimensional point cloud is carried out.
2. The three-dimensional point cloud-based instance segmentation method according to claim 1, wherein the S1 comprises the steps of:
S11, acquiring point cloud data in a plurality of scenes, and matching corresponding labels with the point cloud data;
S12, carrying out standardization processing and data enhancement processing on the point cloud data after the label is matched;
S13, generating a point cloud data set based on the point cloud data subjected to the standardization processing and the data enhancement processing;
S14, voxelizing the point cloud data of size H × W × 3 in the point cloud data set so as to voxelize the point cloud scene and obtain a point cloud training data set, wherein H represents the height of the point cloud data and W represents the width of the point cloud data.
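As an illustration of the voxelization in step S14, the sketch below assigns the flattened H × W × 3 points to voxel cells and averages each occupied cell; the voxel size and the averaging strategy are assumptions for illustration, not values taken from the patent.

```python
# Minimal voxelization sketch (assumed voxel size and averaging scheme).
import numpy as np

def voxelize(points, voxel_size=0.05):
    """points: (N, 3) xyz coordinates; returns one averaged point per occupied voxel."""
    coords = np.floor(points / voxel_size).astype(np.int64)   # integer voxel indices
    _, inverse, counts = np.unique(coords, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)
    voxel_means = np.zeros((counts.size, 3))
    np.add.at(voxel_means, inverse, points)                   # sum the points falling into each voxel
    return voxel_means / counts[:, None]                      # average -> one representative point per voxel

points = np.random.rand(100_000, 3) * 10.0                    # stand-in for the flattened H*W x 3 point cloud
training_points = voxelize(points)
```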
3. The three-dimensional point cloud-based instance segmentation method according to claim 1, wherein the computation expression of the cross-attention mechanism in S46 is as follows:
Attention(Q, K, V) = softmax(QK^T / √d_k)V,
wherein Attention() represents the attention mechanism function, Q represents the query vector to be fused, K represents the key vector to be fused, V represents the value vector to be fused, softmax() represents the softmax function, K^T represents the transpose of the key vector to be fused, and d_k represents the dimension of the key vector to be fused.
4. The three-dimensional point cloud-based instance segmentation method according to claim 1, wherein the step S6 comprises the steps of:
S61, constructing a graph model according to the bipartite matching method, wherein the nodes in the graph model are the prediction masks and the real masks respectively, and the weights of the edges in the graph model are the similarities between the prediction masks and the real masks;
S62, calculating the matching degree between the prediction mask and the real mask through a mask intersection-over-union (IoU) model (an illustrative sketch of steps S62 to S64 appears after step S66 below);
the calculation expression of the mask intersection-over-union model is as follows:
IOU(M_pred, M_gt) = |M_pred ∩ M_gt| / |M_pred ∪ M_gt|,
wherein IOU(M_pred, M_gt) represents the matching degree between the prediction mask and the real mask, M_pred represents the prediction mask, M_gt represents the real mask, |·| represents the absolute value (set size), ∩ represents the intersection operation, and ∪ represents the union operation;
S63, setting a matching degree threshold, and taking the prediction mask and the real mask as a matching mask pair when the matching degree between the prediction mask and the real mask is greater than the matching degree threshold;
S64, constructing a global optimal matching model according to the Hungarian algorithm, and optimizing the matching degree between the prediction mask and the real mask;
the computational expression of the global optimal matching model is as follows:
maximize Σ_i Σ_j IOU(M_pred^i, M_gt^j) × X_ij,
wherein maximize denotes maximization, IOU(M_pred^i, M_gt^j) represents the matching degree between the i-th prediction mask and the j-th real mask, × represents multiplication, M_pred^i represents the i-th prediction mask, M_gt^j represents the j-th real mask, and X_ij represents the mask pairing indicator function, which takes the value 1 when the i-th prediction mask and the j-th real mask form a matching mask pair and the value 0 when they are not selected as a matching pair;
S65, evaluating the matching degree between the optimized prediction mask and the real mask by using the matching loss function to obtain a matching degree evaluation result;
S66, optimizing the prediction masks and the prediction of instance centers based on the matching degree evaluation result, and executing the instance segmentation of the three-dimensional point cloud.
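The sketch below illustrates steps S62 to S64 under assumed mask shapes and an assumed threshold of 0.5: a pairwise mask IoU matrix is computed, a globally optimal assignment is obtained with the Hungarian algorithm (here via scipy's linear_sum_assignment), and assigned pairs whose matching degree is below the threshold are discarded. Applying the threshold after the assignment is a simplification of the order recited in the claim.

```python
# Mask IoU matrix + Hungarian matching sketch (mask shapes and threshold are assumptions).
import numpy as np
from scipy.optimize import linear_sum_assignment

def mask_iou(pred_masks, gt_masks):
    """pred_masks: (P, N), gt_masks: (G, N) boolean per-point masks -> (P, G) IoU matrix."""
    inter = pred_masks.astype(np.float64) @ gt_masks.T.astype(np.float64)         # |M_pred ∩ M_gt|
    union = pred_masks.sum(1, keepdims=True) + gt_masks.sum(1)[None, :] - inter   # |M_pred ∪ M_gt|
    return inter / np.maximum(union, 1e-6)

def match(pred_masks, gt_masks, iou_threshold=0.5):
    iou = mask_iou(pred_masks, gt_masks)
    rows, cols = linear_sum_assignment(iou, maximize=True)   # maximize the total matching degree
    keep = iou[rows, cols] > iou_threshold                   # matching-degree threshold (step S63)
    return list(zip(rows[keep], cols[keep])), iou

pred = np.random.rand(20, 4096) > 0.5
gt = np.random.rand(15, 4096) > 0.5
pairs, iou_matrix = match(pred, gt)
```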
5. The three-dimensional point cloud-based instance segmentation method according to claim 4, wherein the calculation expression of the matching loss function in S65 is as follows:
L = λ_cls · L_cls + λ_mask · L_mask + λ_dice · L_dice + λ_center · L_center,
L_cls = CE(pred_class, true_class),
L_mask = CE(pred_mask, true_mask),
L_dice = 1 − 2|pred_mask ∩ true_mask| / (|pred_mask| + |true_mask|),
L_center = ‖Center_pred − Center_gt‖,
wherein L represents the matching loss function, λ_cls represents the classification loss weight coefficient, L_cls represents the classification loss function, λ_mask represents the mask binary loss weight coefficient, L_mask represents the mask binary loss function, λ_dice represents the Dice loss weight coefficient, L_dice represents the Dice loss function, λ_center represents the center regression loss weight coefficient, L_center represents the center regression loss function, CE() represents the cross-entropy loss function, pred_class represents the class of a predicted instance, true_class represents the class of a real instance, pred_mask represents the prediction mask, true_mask represents the real mask, Center_pred and Center_gt represent the predicted and real instance center positions respectively, and ‖·‖ is a norm operation.
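A hedged sketch of a weighted matching loss in the spirit of claim 5, assuming cross-entropy for the classification term, binary cross-entropy for the mask term, a Dice term, and an L2 center regression term; the exact loss definitions, the λ weights, and all tensor shapes are assumptions for illustration.

```python
# Illustrative weighted matching loss (assumed loss forms and default weights of 1.0).
import torch
import torch.nn.functional as F

def matching_loss(pred_logits, true_class, pred_mask, true_mask, pred_center, true_center,
                  l_cls=1.0, l_mask=1.0, l_dice=1.0, l_center=1.0):
    loss_cls = F.cross_entropy(pred_logits, true_class)                     # L_cls = CE(pred_class, true_class)
    loss_mask = F.binary_cross_entropy_with_logits(pred_mask, true_mask)    # binary mask loss
    prob = torch.sigmoid(pred_mask)
    inter = (prob * true_mask).sum(-1)
    loss_dice = 1 - (2 * inter + 1) / (prob.sum(-1) + true_mask.sum(-1) + 1) # Dice term
    loss_center = torch.norm(pred_center - true_center, dim=-1)             # center regression (norm)
    return (l_cls * loss_cls + l_mask * loss_mask
            + l_dice * loss_dice.mean() + l_center * loss_center.mean())

loss = matching_loss(torch.randn(20, 18), torch.randint(0, 18, (20,)),
                     torch.randn(20, 4096), (torch.rand(20, 4096) > 0.5).float(),
                     torch.randn(20, 3), torch.randn(20, 3))
```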