CN111489358B - Three-dimensional point cloud semantic segmentation method based on deep learning - Google Patents
Three-dimensional point cloud semantic segmentation method based on deep learning
- Publication number
- CN111489358B CN111489358B CN202010190589.1A CN202010190589A CN111489358B CN 111489358 B CN111489358 B CN 111489358B CN 202010190589 A CN202010190589 A CN 202010190589A CN 111489358 B CN111489358 B CN 111489358B
- Authority
- CN
- China
- Prior art keywords
- point cloud
- features
- semantic segmentation
- feature
- dimensional
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 230000011218 segmentation Effects 0.000 title claims abstract description 79
- 238000000034 method Methods 0.000 title claims abstract description 23
- 238000013135 deep learning Methods 0.000 title claims abstract description 12
- 238000000605 extraction Methods 0.000 claims abstract description 45
- 238000012549 training Methods 0.000 claims abstract description 38
- 230000006870 function Effects 0.000 claims abstract description 27
- 238000003062 neural network model Methods 0.000 claims abstract description 16
- 230000001737 promoting effect Effects 0.000 claims abstract description 4
- 238000011176 pooling Methods 0.000 claims description 16
- 238000010586 diagram Methods 0.000 claims description 8
- 239000000284 extract Substances 0.000 claims description 7
- 238000004590 computer program Methods 0.000 claims description 2
- 230000002401 inhibitory effect Effects 0.000 claims description 2
- 239000011159 matrix material Substances 0.000 claims description 2
- 230000000694 effects Effects 0.000 abstract description 7
- 238000003909 pattern recognition Methods 0.000 abstract description 2
- 238000010606 normalization Methods 0.000 description 8
- 238000013527 convolutional neural network Methods 0.000 description 7
- 238000013528 artificial neural network Methods 0.000 description 5
- 230000004913 activation Effects 0.000 description 4
- 238000005070 sampling Methods 0.000 description 4
- 230000006872 improvement Effects 0.000 description 3
- 230000008569 process Effects 0.000 description 3
- 238000012795 verification Methods 0.000 description 3
- 230000009286 beneficial effect Effects 0.000 description 2
- 238000012544 monitoring process Methods 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 238000004458 analytical method Methods 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 239000003086 colorant Substances 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 230000008034 disappearance Effects 0.000 description 1
- 238000004880 explosion Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
- G06T2207/10012—Stereo images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a three-dimensional point cloud semantic segmentation method based on deep learning, belonging to the fields of three-dimensional point clouds and pattern recognition. The method comprises the following steps: training a semantic segmentation neural network model on a three-dimensional point cloud training set, where the labels are the true semantic categories and the model comprises a feature extraction network and a semantic segmentation network; the feature extraction network extracts the global and local features of the three-dimensional point cloud; the semantic segmentation network fuses the global and local features of the point cloud, and its output feature map gives the probability that each point belongs to each semantic category; and inputting the point cloud to be segmented into the trained semantic segmentation neural network model to obtain the point cloud segmentation result. The method uses a local feature extraction module to extract local features of the point cloud at multiple scales, a channel attention boosting module to raise the attention paid to important feature channels and suppress unimportant ones, and a weighted multi-class loss function to improve training, thereby raising the accuracy of the semantic segmentation method.
Description
Technical Field
The invention belongs to the field of three-dimensional point cloud and pattern recognition, and particularly relates to a three-dimensional point cloud semantic segmentation method based on deep learning.
Background
Semantic segmentation of three-dimensional point clouds is the basis of semantic understanding and analysis of three-dimensional scenes, and is a research hotspot in fields such as navigation and positioning, pattern recognition, and autonomous driving. Three-dimensional point cloud semantic segmentation algorithms fall mainly into traditional feature-extraction algorithms and deep learning algorithms.
Three-dimensional point cloud segmentation algorithms based on traditional feature extraction cluster and classify the point cloud using extracted features such as boundary gradients, normal vectors, surface ratios and texture, and perform semantic segmentation on that basis.
Three-dimensional point cloud segmentation algorithms based on deep learning divide into voxel CNN, multi-view CNN and point cloud CNN approaches. Voxel CNN algorithms first convert the point cloud into a 3D grid and then apply three-dimensional convolutions analogous to the two-dimensional case; the added dimension makes the time and space complexity prohibitively high. Multi-view CNN algorithms project the three-dimensional point cloud into images from multiple viewpoints, segment the images with an image semantic segmentation algorithm, and fuse the results back into the three-dimensional point cloud; this ignores the spatial structure of the point cloud and is difficult to extend to tasks such as scene understanding. Point cloud CNN algorithms take the point cloud directly as input without conversion and allow end-to-end training and learning. They obtain a semantic segmentation model by training on labeled three-dimensional point cloud scenes and use that model to complete the segmentation; such algorithms have relatively high accuracy and generality and are the focus of current research, but for more precise semantic understanding the accuracy of semantic segmentation still has room for improvement.
Therefore, existing three-dimensional point cloud semantic segmentation algorithms still have room for improved accuracy.
Disclosure of Invention
The invention provides a three-dimensional point cloud semantic segmentation method based on deep learning, aiming to solve the problem of low accuracy in prior-art three-dimensional point cloud semantic segmentation algorithms and to improve their accuracy.
In order to achieve the above object, according to a first aspect of the present invention, there is provided a deep learning-based three-dimensional point cloud semantic segmentation method, including the following steps:
S1, training a semantic segmentation neural network model with a three-dimensional point cloud training set, wherein each training sample is a three-dimensional point cloud, each label is a true semantic category, and the semantic segmentation neural network model comprises a feature extraction network and a semantic segmentation network; the feature extraction network is used for extracting global features and local features of the three-dimensional point cloud; the semantic segmentation network is used for fusing the global and local features of the point cloud, and its output feature map gives the probability that each point in the point cloud belongs to each semantic category;
and S2, inputting the point cloud to be segmented into the trained semantic segmentation neural network model to obtain the point cloud segmentation result.
Preferably, the feature extraction network takes the point cloud as input and extracts features sequentially through an MLP(64), a first local feature extraction module, a first channel attention boosting module, a second local feature extraction module, a second channel attention boosting module, a connection structure, an MLP(1024) and a max pooling layer;
the local feature extraction module is used for extracting multi-scale local features of the point cloud;
the channel attention boosting module is used for extracting global features and point features of the point cloud, focusing attention on feature channels carrying a large amount of information and suppressing unimportant channel features;
the connection structure connects the local features output by the two local feature extraction modules with the point features output by the second channel attention boosting module, and the final features are obtained through MLP(1024) and max pooling;
here, the number in parentheses of MLP () represents the number of convolution kernels.
Preferably, the local feature extraction module includes four parallel branches, each extracting local neighborhood features at one scale, and each branch comprises a KNN local neighborhood search module, two 1×1 multilayer perceptrons (MLPs) and a Softmax function; the local feature extraction module concatenates the four parallel branch features to obtain the multi-scale local features.
Preferably, the specific structure of the channel attention boosting module is as follows:
a 1×1 MLP(C) for extracting C-dimensional features;
a first branch, in which the C-dimensional features pass through a global average pooling layer, a fully connected layer and a Sigmoid function to obtain a weight for each feature channel, and the weights are multiplied with the C-dimensional features to obtain the channel-attention-boosted point features;
a second branch, in which max pooling of the C-dimensional features yields a C-dimensional global feature; the global feature is copied back to the size of the original C-dimensional features and multiplied by the weights obtained in the first branch to obtain the channel-attention-boosted global feature;
here, the number in parentheses of MLP () represents the number of convolution kernels.
Preferably, the semantic segmentation sub-network of the semantic segmentation network obtains the probability of each semantic category of each point in the point cloud through a connection structure, MLP (512), MLP (256), MLP (128), MLP (c), and uses the category with the highest probability as the label of the point, thereby realizing semantic segmentation and obtaining a segmentation result, wherein the number in the parentheses of MLP () represents the number of convolution kernels.
Preferably, in the training stage, the current network weight parameter is calculated and updated, and the weighted multi-class loss function is:
where Loss is the loss function, f_{l(x)}(x) is the Softmax function, a_{l(x)}(x) is the value of the segmentation network's output feature map at the position of point x for its semantic category l(x), K is the number of semantic categories, w_{l(x)} is the weight of point x belonging to semantic category l(x), N is the total number of points in the point set Ω input into the network model, and N_{l(x)} is the number of points of category l(x).
To achieve the above object, according to a second aspect of the present invention, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method according to the first aspect.
Generally, by the above technical solution conceived by the present invention, the following beneficial effects can be obtained:
(1) The invention uses a local feature extraction module to extract local features of the point cloud. Local neighborhoods at different scales carry different useful information, so a parallel multi-scale network structure is used to extract multi-scale features and fuse them fully. A channel attention boosting module raises the attention paid to important feature channels and suppresses unimportant ones. A weighted multi-class loss function improves training: different classes receive different learning weights, which reduces the network's overfitting to the unbalanced class distribution during training; classes with fewer points receive larger learning weights and classes with more points receive smaller ones, raising the attention paid to under-represented classes; and the segmentation results improve as the Loss value is reduced during training.
(2) The invention trains directly on the point cloud as input and uses the neural network to extract global features of the input point cloud, addressing the unordered nature of point clouds. Finally, the global features and point features are concatenated and the classification probability of each point is obtained through MLP layers, realizing semantic segmentation of the three-dimensional point cloud.
(3) The invention extracts and fuses point cloud features through multilayer convolution operations and uses batch normalization and activation functions to optimize the network, improving the training effect; accuracy, loss and other quantities are monitored in real time during training.
(4) The invention preprocesses the S3DIS dataset, including splitting and sampling the rooms in the dataset, so that each point carries 9 dimensions of information; the expanded dimensionality helps improve the segmentation precision achieved by network training.
Drawings
FIG. 1 is a flowchart of an embodiment of the present invention for implementing a deep learning-based semantic segmentation method for three-dimensional point clouds;
FIG. 2 is a schematic structural diagram of a three-dimensional semantic segmented deep neural network provided by an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a local feature extraction module in a deep neural network according to an embodiment of the present invention;
fig. 4 is a schematic diagram of KNN neighborhood search in the local feature extraction module according to the embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a channel attention boosting module in a deep neural network according to an embodiment of the present invention;
fig. 6 is a schematic diagram of a connection structure of a convolutional layer, a batch normalization layer, and an activation layer in a deep neural network according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
As shown in fig. 1, the invention provides a deep learning-based three-dimensional point cloud semantic segmentation method, which includes:
s1, training a semantic segmentation neural network model by using a three-dimensional point cloud training set, wherein a training sample is a point cloud containing actual three-dimensional coordinates, RGB colors and normalized coordinate information, and a label is a real semantic category.
The indoor three-dimensional point cloud dataset is preprocessed by splitting, sampling and position normalization, and a training set and a validation set are established.
The indoor three-dimensional point cloud dataset Stanford Large-Scale 3D Indoor Spaces (S3DIS) contains the three-dimensional point clouds of 271 rooms, divided into six areas Area 1-Area 6; Area 1-Area 5 are used as training areas and Area 6 as the validation area. Each room is sampled and position-normalized: the room is split into a number of cuboid blocks with a 1 m × 1 m footprint; when the actual number of points N in a block exceeds 4096 the points are randomly downsampled, and when N is at most 4096 points are randomly duplicated, so that every block contains exactly 4096 points. Each point in each block is then position-normalized with respect to its room:
X_norm = X / X_room_max
Y_norm = Y / Y_room_max
Z_norm = Z / Z_room_max
where (X, Y, Z) are the coordinates of a point, X_room_max, Y_room_max and Z_room_max are the maximum values over the points of the room along the X, Y and Z axes respectively, and (X_norm, Y_norm, Z_norm) are the coordinates of the normalized point.
Further, each point in the dataset has 9 dimensions: the actual three-dimensional coordinates, the RGB color values and the normalized coordinates, i.e. [X, Y, Z, R, G, B, X_norm, Y_norm, Z_norm].
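A minimal NumPy sketch of this preprocessing, under the block size and point count stated above, might look as follows; the function name, the axis-aligned 1 m grid and the handling of empty blocks are illustrative assumptions rather than details taken from the patent.

```python
import numpy as np

def preprocess_room(points, colors, block_size=1.0, num_points=4096):
    """Split one room into 1 m x 1 m blocks, resample each block to a fixed
    point count, and append room-normalized coordinates.

    points: (N, 3) float array of XYZ coordinates
    colors: (N, 3) float array of RGB values
    Returns a list of (num_points, 9) arrays: [X, Y, Z, R, G, B, Xn, Yn, Zn].
    """
    room_max = points.max(axis=0)                      # per-room maxima for position normalization
    blocks = []
    x_bins = np.arange(points[:, 0].min(), points[:, 0].max() + block_size, block_size)
    y_bins = np.arange(points[:, 1].min(), points[:, 1].max() + block_size, block_size)
    for x0 in x_bins:
        for y0 in y_bins:
            mask = ((points[:, 0] >= x0) & (points[:, 0] < x0 + block_size) &
                    (points[:, 1] >= y0) & (points[:, 1] < y0 + block_size))
            idx = np.where(mask)[0]
            if idx.size == 0:
                continue                               # skip empty blocks (assumption)
            # Randomly downsample blocks with more than num_points points,
            # otherwise randomly duplicate points until num_points is reached.
            choice = np.random.choice(idx, num_points, replace=idx.size < num_points)
            xyz = points[choice]
            normed = xyz / room_max                    # normalization relative to the room
            blocks.append(np.concatenate([xyz, colors[choice], normed], axis=1))
    return blocks
```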
Further, the semantic categories of the point cloud in the S3DIS dataset fall into 13 classes: ceiling, floor, wall, beam, column, window, door, table, chair, sofa, bookcase, board, and clutter (others).
As shown in fig. 2, the semantic segmentation neural network model is composed of a feature extraction sub-network and a semantic segmentation sub-network. The network model of the invention adds a multi-scale local feature extraction module to extract multi-scale local features of the point cloud, a channel attention boosting module to raise the attention paid to important feature channels, and a weighted multi-class loss function to aid training, thereby improving the quality of three-dimensional semantic segmentation.
Feature extraction subnetwork
The feature extraction sub-network mainly comprises: two local feature extraction modules, two channel attention boosting modules, a connection structure, two multilayer perceptrons (a 1×9 MLP(64) and a 1×1 MLP(1024)), and a max pooling layer. The sub-network takes the point cloud directly as input and extracts features sequentially through the MLP(64), local feature extraction module 1, channel attention boosting module 1, local feature extraction module 2, channel attention boosting module 2, the connection structure, the MLP(1024) and the max pooling layer. N denotes the number of input points and is set to 4096, so the network input is 4096 × 9, i.e. 4096 points with 9 dimensions each.
The local feature extraction module extracts multi-scale local features of the point cloud. As shown in fig. 3, it contains four parallel branches, each extracting local neighborhood features at one scale; each branch consists of a KNN local neighborhood search, two 1×1 multilayer perceptrons (MLPs) and a Softmax function, and the module concatenates the four branch features to obtain the multi-scale local features. KNN(k) denotes a KNN neighborhood search, with the number in parentheses giving the number of points in each point's neighborhood; the local neighborhood search is illustrated in fig. 4, where a point's feature together with its edges to the features of the points in its local neighborhood forms the local neighborhood feature. The first branch extracts point features through a 1×1 MLP(O); its output feature dimension is O. The second branch searches the K-neighborhood of the point cloud with KNN(K), first extracts features through a 1×1 MLP(O), then extracts a weight for each edge in the local features through a 1×1 MLP(1) and a Softmax function, and multiplies the MLP(O) output with the weights to obtain the local features at this neighborhood scale; the output feature dimension is O. The third branch searches the 2K-neighborhood with KNN(2K) and otherwise matches the second branch, with output dimension O. The fourth branch searches the 3K-neighborhood with KNN(3K), likewise with output dimension O. Finally, the output features of the four branches are concatenated to give the multi-scale local feature of dimension 4×O. Two local feature extraction modules are used in the feature extraction sub-network: O is 64 in the first and 128 in the second, and K is 16 in both, so the extracted local features cover four scales, namely 1, 16, 32 and 48.
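The following PyTorch sketch illustrates one way such a four-branch module could be realized; it is not the patent's implementation. In particular, defining an edge as the difference between a neighbor's feature and the center point's feature, performing the KNN search in feature space, and using nn.Linear layers as 1×1 MLPs are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def knn(x, k):
    """x: (B, N, C) features; returns (B, N, k) indices of each point's k nearest neighbors."""
    dist = torch.cdist(x, x)                              # (B, N, N) pairwise distances
    return dist.topk(k, dim=-1, largest=False).indices

def gather_neighbors(x, idx):
    """x: (B, N, C), idx: (B, N, k) -> (B, N, k, C) neighbor features."""
    B, N, k = idx.shape
    batch = torch.arange(B, device=x.device).view(B, 1, 1).expand(B, N, k)
    return x[batch, idx]

class LocalBranch(nn.Module):
    """One KNN branch: edge MLP(O), per-edge weight via MLP(1) + Softmax, weighted aggregation."""
    def __init__(self, in_dim, out_dim, k):
        super().__init__()
        self.k = k
        self.mlp_o = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())
        self.mlp_1 = nn.Linear(out_dim, 1)                 # one weight logit per edge
    def forward(self, x):                                  # x: (B, N, in_dim)
        neighbors = gather_neighbors(x, knn(x, self.k))    # (B, N, k, in_dim)
        edges = neighbors - x.unsqueeze(2)                 # assumed edge feature: neighbor - center
        feat = self.mlp_o(edges)                           # (B, N, k, O)
        w = F.softmax(self.mlp_1(feat), dim=2)             # Softmax over the k neighbors
        return (w * feat).sum(dim=2)                       # (B, N, O) local feature at this scale

class MultiScaleLocalFeature(nn.Module):
    """Four parallel branches: point MLP(O) plus KNN(K), KNN(2K) and KNN(3K); output dim 4*O."""
    def __init__(self, in_dim, out_dim=64, k=16):
        super().__init__()
        self.point_mlp = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())
        self.branches = nn.ModuleList(LocalBranch(in_dim, out_dim, k * s) for s in (1, 2, 3))
    def forward(self, x):                                  # x: (B, N, in_dim)
        feats = [self.point_mlp(x)] + [branch(x) for branch in self.branches]
        return torch.cat(feats, dim=-1)                    # (B, N, 4*O)
```

With O = 64, K = 16 in the first module and O = 128, K = 16 in the second, the outputs would be 256- and 512-dimensional, consistent with the 4×O dimension stated above.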
As shown in fig. 5, the channel attention boosting module extracts high-dimensional features of the point cloud, focuses attention on the feature channels carrying the most information, and suppresses unimportant channel features. The module contains two branches and outputs two features: a global feature and a point feature. C-dimensional features are first extracted through a 1×1 MLP(C). Then, from bottom to top: in the first branch, the C-dimensional features pass through a global average pooling layer, a fully connected layer FC and a Sigmoid function to obtain a weight for each feature channel, and the weights are multiplied with the C-dimensional features to obtain the channel-attention-boosted point features; in the second branch, max pooling of the C-dimensional features yields a C-dimensional global feature, which is copied back to the size of the original C-dimensional features and multiplied by the weights from the first branch to obtain the channel-attention-boosted global feature. Two channel attention boosting modules are used in the feature extraction sub-network: the first uses C = 64 and outputs global and point features of dimension 64; the second uses C = 128 and outputs global and point features of dimension 128.
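A compact sketch of this module in the same PyTorch style is shown below; the single fully connected layer and the exact broadcasting of the channel weights are assumptions based on the description of fig. 5.

```python
import torch
import torch.nn as nn

class ChannelAttentionBoost(nn.Module):
    """Channel attention boosting: returns attention-weighted point features and global features."""
    def __init__(self, in_dim, c):
        super().__init__()
        self.mlp_c = nn.Sequential(nn.Linear(in_dim, c), nn.ReLU())   # 1x1 MLP(C)
        self.fc = nn.Linear(c, c)                                     # fully connected layer FC
    def forward(self, x):                                             # x: (B, N, in_dim)
        feat = self.mlp_c(x)                                          # (B, N, C)
        # First branch: global average pooling -> FC -> Sigmoid -> per-channel weights.
        w = torch.sigmoid(self.fc(feat.mean(dim=1)))                  # (B, C)
        point_feat = feat * w.unsqueeze(1)                            # boosted point features
        # Second branch: max pooling -> C-dim global feature, copied to all points and reweighted.
        global_feat = feat.max(dim=1).values * w                      # (B, C)
        global_feat = global_feat.unsqueeze(1).expand_as(feat)        # (B, N, C)
        return point_feat, global_feat
```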
The connection structure of the feature extraction sub-network connects the local features output by the two local feature extraction modules with the point features output by the second channel attention boosting module, and the final global features are then obtained through a 1×1 MLP(1024) and max pooling.
If only the point features and the global features were connected, the local features of the point cloud would not be fully utilized; the local structural information between points helps improve the precision of semantic segmentation.
Semantic segmentation sub-network
The semantic segmentation sub-network mainly comprises a connection structure and four 1×1 multilayer perceptrons MLP(512), MLP(256), MLP(128) and MLP(c), where c is the number of semantic categories, 13. The four 1×1 MLPs are convolution layers with 1×1 convolution kernels; the number in parentheses of MLP() is the number of convolution kernels, which is also the output feature dimension. N denotes the number of input points, 4096. The semantic segmentation sub-network obtains the segmentation result through the connection structure followed by MLP(512), MLP(256), MLP(128) and MLP(c).
The connection structure of the semantic segmentation sub-network connects the final global feature and the 64- and 128-dimensional global features from the two channel attention boosting modules with several levels of point features (the 896-dimensional features output by the connection structure in the feature extraction sub-network, the 1024-dimensional features output by MLP(1024), and the 1024-dimensional global feature output by max pooling); the result then passes sequentially through 1×1 MLP(512), 1×1 MLP(256), 1×1 MLP(128) and 1×1 MLP(c), where c is the number of semantic categories, to obtain each point's score for every semantic category. The category with the largest score is taken as the point's label, realizing semantic segmentation.
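The sketch below shows one way the segmentation head could broadcast global features to every point, concatenate them with per-point features, and apply the shared 1×1 MLPs; how the caller assembles the several feature levels, and hence the input dimensionality, is left open and is an assumption rather than the patent's code.

```python
import torch
import torch.nn as nn

class SegmentationHead(nn.Module):
    """Connection structure plus shared per-point MLPs: MLP(512) -> MLP(256) -> MLP(128) -> MLP(c)."""
    def __init__(self, in_dim, num_classes=13):
        super().__init__()
        layers = []
        for d_in, d_out in zip([in_dim, 512, 256], [512, 256, 128]):
            layers += [nn.Conv1d(d_in, d_out, kernel_size=1),   # 1x1 convolution = shared per-point MLP
                       nn.BatchNorm1d(d_out), nn.ReLU()]
        layers.append(nn.Conv1d(128, num_classes, kernel_size=1))
        self.head = nn.Sequential(*layers)
    def forward(self, point_feats, global_feats):
        # point_feats: (B, Cp, N) concatenated point-level features;
        # global_feats: (B, Cg) concatenated global features, broadcast to every point.
        # in_dim must equal Cp + Cg.
        B, _, N = point_feats.shape
        fused = torch.cat([point_feats, global_feats.unsqueeze(-1).expand(-1, -1, N)], dim=1)
        logits = self.head(fused)                      # (B, num_classes, N) per-class scores
        return logits.argmax(dim=1), logits            # per-point labels and raw scores
```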
Further, as shown in fig. 6, the convolution and deconvolution layers in the semantic segmentation model are followed by batch normalization and then a ReLU activation function. The batch normalization layer counters exploding or vanishing gradients during backpropagation and alleviates overfitting, while the activation layer adds nonlinearity to the neural network model so that it can approximate arbitrary functions.
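For reference, the convolution - batch normalization - ReLU connection of fig. 6 corresponds to a building block like the following sketch; the 1×1 kernel size and the Conv1d layer type follow the 1×1 convolutions used throughout the network and are assumptions.

```python
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch):
    """1x1 convolution followed by batch normalization and a ReLU activation, as in fig. 6."""
    return nn.Sequential(
        nn.Conv1d(in_ch, out_ch, kernel_size=1),
        nn.BatchNorm1d(out_ch),     # counters exploding/vanishing gradients and eases overfitting
        nn.ReLU(inplace=True))      # adds the nonlinearity that lets the model approximate general functions
```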
In training the three-dimensional point cloud semantic segmentation deep neural network model, loss values are computed with a weighted multi-class loss function based on the true categories to obtain the prediction error; the error is back-propagated and the current network weight parameters are computed and updated. Repeating this over the training set yields the final network weight parameters and hence the trained three-dimensional point cloud semantic segmentation deep neural network model. The weighted multi-class loss function is:
Here Loss is a weighted Softmax cross-entropy loss function, which measures the difference between the predicted and true values; the smaller the cross entropy, the better the model's predictions. f_{l(x)}(x) is the Softmax function, a_{l(x)}(x) is the value of the segmentation network's output feature map at the position of point x for its class l(x), and K is the number of semantic categories; the Softmax function makes the multi-class prediction probabilities at each position of the feature map sum to 1. w_{l(x)} is the weight of point x belonging to semantic category l(x), N is the total number of points in the dataset, and N_{l(x)} is the number of points of category l(x). The weights give categories with relatively few points a large learning weight and categories with relatively many points a small learning weight. The segmentation results improve as the value of Loss is reduced during training.
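The exact weighting formula is not reproduced in the text above; the sketch below therefore uses a common inverse-frequency weighting as a stand-in, which matches the stated behavior (fewer points, larger weight) but is an assumption.

```python
import torch
import torch.nn.functional as F

def weighted_multiclass_loss(logits, labels, class_counts):
    """Weighted Softmax cross-entropy over per-point predictions.

    logits:       (B, K, N) raw scores a_k(x) from the segmentation network
    labels:       (B, N) ground-truth category indices l(x)
    class_counts: (K,) number of points N_l of each category in the training set
    """
    # Inverse-frequency weights: categories with fewer points get larger learning weights.
    counts = class_counts.float().clamp(min=1)
    weights = counts.sum() / (len(counts) * counts)
    return F.cross_entropy(logits, labels, weight=weights)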
The training set consists of Areas 1-5 of the preprocessed S3DIS dataset. The training parameters are configured as follows: the number of points N in each input point cloud is 4096, with nine channels comprising XYZ coordinates, RGB colors and position-normalized coordinates. An Adam optimizer is used with an initial learning rate of 0.001, momentum 0.9, batch size 24, decay rate 0.5, decay step 300000, and a maximum of 50 iterations. Loss, IoU, recall and other metrics are monitored during training; after each iteration the current IoU is compared with the historical maximum, and if it is larger the current model is saved and the historical maximum is updated, so that the model saved at the end of training is the one with the highest IoU.
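Putting the stated hyper-parameters together, a training loop could look like the sketch below; the model, data loaders and mIoU evaluation function are placeholders supplied by the caller, the loss is the weighted_multiclass_loss sketch above, and mapping the decay step to a PyTorch StepLR scheduler is an assumption.

```python
import torch

def train(model, train_loader, val_loader, class_counts, evaluate_miou):
    """Adam (lr 0.001), step decay 0.5 every 300000 steps, 50 epochs, best-IoU checkpointing."""
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=300000, gamma=0.5)
    best_miou = 0.0
    for epoch in range(50):
        model.train()
        for points, labels in train_loader:            # batches of 24 blocks of 4096 x 9 points
            optimizer.zero_grad()
            _, logits = model(points)                  # model assumed to return (labels, logits)
            loss = weighted_multiclass_loss(logits, labels, class_counts)
            loss.backward()
            optimizer.step()
            scheduler.step()
        miou = evaluate_miou(model, val_loader)        # mean IoU on the held-out validation area
        if miou > best_miou:                           # keep only the highest-IoU weights
            best_miou = miou
            torch.save(model.state_dict(), "best_model.pth")
    return best_miou
```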
And S2, inputting the point cloud to be segmented into the trained three-dimensional point cloud semantic segmentation neural network model to obtain the point cloud segmentation result.
When the trained three-dimensional point cloud semantic segmentation model performs semantic segmentation, it effectively obtains the semantic category of each point in the point cloud and improves the accuracy of the segmentation method; training on Areas 1-5 of the S3DIS dataset and validating on Area 6, the accuracy reaches 90.14% and the mIoU reaches 72.83%.
In practical application, the method can more accurately perform semantic segmentation of the three-dimensional point cloud, can realize higher precision compared with the conventional method, and is suitable for complex scenes.
It will be understood by those skilled in the art that the foregoing is only an exemplary embodiment of the present invention, and is not intended to limit the invention to the particular forms disclosed, since various modifications, substitutions and improvements within the spirit and scope of the invention are possible and within the scope of the appended claims.
Claims (6)
1. A three-dimensional point cloud semantic segmentation method based on deep learning is characterized by comprising the following steps:
S1, training a semantic segmentation neural network model with a three-dimensional point cloud training set, wherein a training sample is a three-dimensional point cloud, a label is a true semantic category, and the semantic segmentation neural network model comprises: a feature extraction network and a semantic segmentation network;
the feature extraction network is used for extracting global features and local features of the three-dimensional point cloud;
the semantic segmentation network is used for fusing the global features and the local features of the point cloud, and the output feature map corresponds to the probability that each point in the point cloud belongs to each semantic category;
S2, inputting the point cloud to be segmented into the trained semantic segmentation neural network model to obtain a point cloud segmentation result;
the feature extraction network takes the point cloud as input and extracts features sequentially through a 1×9 MLP(64), a first local feature extraction module, a first channel attention boosting module, a second local feature extraction module, a second channel attention boosting module, a connection structure, a 1×1 MLP(1024) and a max pooling layer; wherein,
the local feature extraction module is used for extracting multi-scale local features of the point cloud;
the channel attention boosting module is used for extracting global features and point features of the point cloud, focusing attention on feature channels carrying a large amount of information and suppressing unimportant channel features;
the connection structure is used for connecting the local features output by the two local feature extraction modules with the point features output by the second channel attention boosting module, and obtaining the final features through a 1×1 MLP(1024) and the max pooling layer;
the number in parentheses of MLP () represents the number of convolution kernels.
2. The method of semantic segmentation of three-dimensional point clouds according to claim 1, wherein the local feature extraction module includes four parallel branches, each branch extracting a scale of local neighborhood features,
a first branch for extracting point features through a 1×1 MLP(O), the output feature dimension being O;
the second branch is used for searching the K-neighborhood of the point cloud through KNN(K), extracting features through a 1×1 MLP(O), extracting the weight of each edge in the local features through a 1×1 MLP(1) and a Softmax function, and performing matrix multiplication of the MLP(O) output with the weights to obtain the local features at this local neighborhood scale, the output feature dimension being O;
the third branch is used for searching the 2K-neighborhood of the point cloud through KNN(2K), the rest of the branch being the same as the second branch with output feature dimension O;
the fourth branch is used for searching the 3K-neighborhood of the point cloud through KNN(3K), the rest of the branch being the same as the second branch with output feature dimension O;
and finally, the output features of the four branches are concatenated to obtain the multi-scale local feature, whose output feature dimension is 4×O.
3. The three-dimensional point cloud semantic segmentation method according to claim 1, wherein the channel attention boosting module has a specific structure as follows:
a 1×1 MLP(C) for extracting C-dimensional features;
the first branch is used for enabling the C-dimensional features to pass through a global average pooling layer, a full connection layer and a Sigmoid function to obtain the weight of each feature channel, and multiplying the weight by the C-dimensional features to obtain point features with improved channel attention;
the second branch is used for obtaining a C-dimensional global feature by performing maximal pooling on the C-dimensional feature, copying the global feature to restore the size of the original C-dimensional feature, and multiplying the global feature by the weight obtained by the first branch to obtain the global feature with the channel attention promoted;
here, the number in parentheses of MLP () represents the number of convolution kernels.
4. The three-dimensional point cloud semantic segmentation method according to any one of claims 1 to 3, wherein the semantic segmentation sub-network of the semantic segmentation network obtains the probability of each point in the point cloud for each semantic category through a connection structure, a 1×1 MLP(512), a 1×1 MLP(256), a 1×1 MLP(128) and a 1×1 MLP(c), and takes the category with the highest probability as the label of the point, thereby realizing semantic segmentation and obtaining the segmentation result, wherein c represents the number of semantic categories.
5. The method for semantic segmentation of three-dimensional point cloud according to any one of claims 1 to 3, wherein in the training stage, the current network weight parameters are calculated and updated, and the weighted multi-class loss function is:
where Loss is the loss function, f_{l(x)}(x) is the Softmax function, a_{l(x)}(x) is the value of the segmentation network's output feature map at the position of point x for its semantic category l(x), K is the number of semantic categories, w_{l(x)} is the weight of point x belonging to semantic category l(x), N is the total number of points in the point set Ω input into the network model, and N_{l(x)} is the number of points of category l(x).
6. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the deep learning based three-dimensional point cloud semantic segmentation method according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010190589.1A CN111489358B (en) | 2020-03-18 | 2020-03-18 | Three-dimensional point cloud semantic segmentation method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010190589.1A CN111489358B (en) | 2020-03-18 | 2020-03-18 | Three-dimensional point cloud semantic segmentation method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111489358A CN111489358A (en) | 2020-08-04 |
CN111489358B true CN111489358B (en) | 2022-06-14 |
Family
ID=71810781
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010190589.1A Active CN111489358B (en) | 2020-03-18 | 2020-03-18 | Three-dimensional point cloud semantic segmentation method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111489358B (en) |
Families Citing this family (60)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111931790A (en) * | 2020-08-10 | 2020-11-13 | 武汉慧通智云信息技术有限公司 | Laser point cloud extraction method and device |
CN111882558A (en) * | 2020-08-11 | 2020-11-03 | 上海商汤智能科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN112017196B (en) * | 2020-08-27 | 2022-02-22 | 重庆邮电大学 | Three-dimensional tooth model mesh segmentation method based on local attention mechanism |
CN111950658B (en) * | 2020-08-28 | 2024-02-09 | 南京大学 | Deep learning-based LiDAR point cloud and optical image priori coupling classification method |
CN112070760B (en) * | 2020-09-17 | 2022-11-08 | 安徽大学 | Bone mass detection method based on convolutional neural network |
CN112200248B (en) * | 2020-10-13 | 2023-05-12 | 北京理工大学 | Point cloud semantic segmentation method, system and storage medium based on DBSCAN clustering under urban road environment |
CN112233124B (en) * | 2020-10-14 | 2022-05-17 | 华东交通大学 | Point cloud semantic segmentation method and system based on countermeasure learning and multi-modal learning |
CN112257597B (en) * | 2020-10-22 | 2024-03-15 | 中国人民解放军战略支援部队信息工程大学 | Semantic segmentation method for point cloud data |
CN112330699B (en) * | 2020-11-14 | 2022-09-16 | 重庆邮电大学 | Three-dimensional point cloud segmentation method based on overlapping region alignment |
CN112418235B (en) * | 2020-11-20 | 2023-05-30 | 中南大学 | Point cloud semantic segmentation method based on expansion nearest neighbor feature enhancement |
EP4009236A1 (en) * | 2020-12-02 | 2022-06-08 | Aptiv Technologies Limited | Method for determining a semantic segmentation of an environment of a vehicle |
CN112541535B (en) * | 2020-12-09 | 2024-01-05 | 中国科学院深圳先进技术研究院 | Three-dimensional point cloud classification method based on complementary multi-branch deep learning |
CN112541908B (en) * | 2020-12-18 | 2023-08-29 | 广东工业大学 | Casting flash recognition method based on machine vision and storage medium |
CN112560965B (en) * | 2020-12-18 | 2024-04-05 | 中国科学院深圳先进技术研究院 | Image semantic segmentation method, storage medium and computer device |
CN112560865B (en) * | 2020-12-23 | 2022-08-12 | 清华大学 | Semantic segmentation method for point cloud under outdoor large scene |
CN113160414B (en) * | 2021-01-25 | 2024-06-07 | 北京豆牛网络科技有限公司 | Automatic goods allowance recognition method, device, electronic equipment and computer readable medium |
CN112836734A (en) * | 2021-01-27 | 2021-05-25 | 深圳市华汉伟业科技有限公司 | Heterogeneous data fusion method and device and storage medium |
CN112907602B (en) * | 2021-01-28 | 2022-07-19 | 中北大学 | Three-dimensional scene point cloud segmentation method based on improved K-nearest neighbor algorithm |
CN112819080B (en) * | 2021-02-05 | 2022-09-02 | 四川大学 | High-precision universal three-dimensional point cloud identification method |
CN112837420B (en) * | 2021-03-09 | 2024-01-09 | 西北大学 | Shape complement method and system for terracotta soldiers and horses point cloud based on multi-scale and folding structure |
CN113011430B (en) * | 2021-03-23 | 2023-01-20 | 中国科学院自动化研究所 | Large-scale point cloud semantic segmentation method and system |
CN113850811B (en) * | 2021-03-25 | 2024-05-28 | 北京大学 | Three-dimensional point cloud instance segmentation method based on multi-scale clustering and mask scoring |
CN113012177A (en) * | 2021-04-02 | 2021-06-22 | 上海交通大学 | Three-dimensional point cloud segmentation method based on geometric feature extraction and edge perception coding |
CN113421267B (en) * | 2021-05-07 | 2024-04-12 | 江苏大学 | Point cloud semantic and instance joint segmentation method and system based on improved PointConv |
CN113345101B (en) * | 2021-05-20 | 2023-07-25 | 北京百度网讯科技有限公司 | Three-dimensional point cloud labeling method, device, equipment and storage medium |
CN113159232A (en) * | 2021-05-21 | 2021-07-23 | 西南大学 | Three-dimensional target classification and segmentation method |
CN113177555B (en) * | 2021-05-21 | 2022-11-04 | 西南大学 | Target processing method and device based on cross-level, cross-scale and cross-attention mechanism |
CN113554654B (en) * | 2021-06-07 | 2024-03-22 | 之江实验室 | Point cloud feature extraction system and classification segmentation method based on graph neural network |
CN113554653B (en) * | 2021-06-07 | 2024-10-29 | 之江实验室 | Semantic segmentation method based on mutual information calibration point cloud data long tail distribution |
CN113435461B (en) * | 2021-06-11 | 2023-07-14 | 深圳市规划和自然资源数据管理中心(深圳市空间地理信息中心) | Point cloud local feature extraction method, device, equipment and storage medium |
CN113361538B (en) * | 2021-06-22 | 2022-09-02 | 中国科学技术大学 | Point cloud classification and segmentation method and system based on self-adaptive selection neighborhood |
CN113516663B (en) * | 2021-06-30 | 2022-09-27 | 同济大学 | Point cloud semantic segmentation method and device, electronic equipment and storage medium |
CN113657387B (en) * | 2021-07-07 | 2023-10-13 | 复旦大学 | Semi-supervised three-dimensional point cloud semantic segmentation method based on neural network |
CN113538474B (en) * | 2021-07-12 | 2023-08-22 | 大连民族大学 | 3D point cloud segmentation target detection system based on edge feature fusion |
CN113449744A (en) * | 2021-07-15 | 2021-09-28 | 东南大学 | Three-dimensional point cloud semantic segmentation method based on depth feature expression |
CN113744186B (en) * | 2021-07-26 | 2024-09-24 | 南开大学 | Method for detecting surface defects of workpiece by fusing projection point set segmentation network |
CN113486988B (en) * | 2021-08-04 | 2022-02-15 | 广东工业大学 | Point cloud completion device and method based on adaptive self-attention transformation network |
CN113627440A (en) * | 2021-08-14 | 2021-11-09 | 张冉 | Large-scale point cloud semantic segmentation method based on lightweight neural network |
CN113781432B (en) * | 2021-09-10 | 2023-11-21 | 浙江大学 | Laser scanning automatic laying on-line detection method and device based on deep learning |
CN113807233B (en) * | 2021-09-14 | 2023-04-07 | 电子科技大学 | Point cloud feature extraction method, classification method and segmentation method based on high-order term reference surface learning |
CN113870272B (en) * | 2021-10-12 | 2024-10-08 | 中国联合网络通信集团有限公司 | Point cloud segmentation method and device and computer readable storage medium |
CN114241226A (en) * | 2021-12-07 | 2022-03-25 | 电子科技大学 | Three-dimensional point cloud semantic segmentation method based on multi-neighborhood characteristics of hybrid model |
CN114419372B (en) * | 2022-01-13 | 2024-11-01 | 南京邮电大学 | Multi-scale point cloud classification method and system |
CN114529757B (en) * | 2022-01-21 | 2023-04-18 | 四川大学 | Cross-modal single-sample three-dimensional point cloud segmentation method |
CN114359562B (en) * | 2022-03-20 | 2022-06-17 | 宁波博登智能科技有限公司 | Automatic semantic segmentation and labeling system and method for four-dimensional point cloud |
CN114419570B (en) * | 2022-03-28 | 2023-04-07 | 苏州浪潮智能科技有限公司 | Point cloud data identification method and device, electronic equipment and storage medium |
CN114419381B (en) * | 2022-04-01 | 2022-06-24 | 城云科技(中国)有限公司 | Semantic segmentation method and road ponding detection method and device applying same |
CN114792372B (en) * | 2022-06-22 | 2022-11-04 | 广东工业大学 | Three-dimensional point cloud semantic segmentation method and system based on multi-head two-stage attention |
CN115170585B (en) * | 2022-07-12 | 2024-06-14 | 上海人工智能创新中心 | Three-dimensional point cloud semantic segmentation method |
CN115294562B (en) * | 2022-07-19 | 2023-05-09 | 广西大学 | Intelligent sensing method for operation environment of plant protection robot |
CN115311274B (en) * | 2022-10-11 | 2022-12-23 | 四川路桥华东建设有限责任公司 | Weld joint detection method and system based on spatial transformation self-attention module |
CN115471513B (en) * | 2022-11-01 | 2023-03-31 | 小米汽车科技有限公司 | Point cloud segmentation method and device |
CN115578393B (en) * | 2022-12-09 | 2023-03-10 | 腾讯科技(深圳)有限公司 | Key point detection method, key point training method, key point detection device, key point training device, key point detection equipment, key point detection medium and key point detection medium |
CN116129207B (en) * | 2023-04-18 | 2023-08-04 | 江西师范大学 | Image data processing method for attention of multi-scale channel |
CN116246039B (en) * | 2023-05-12 | 2023-07-14 | 中国空气动力研究与发展中心计算空气动力研究所 | Three-dimensional flow field grid classification segmentation method based on deep learning |
CN116824188B (en) * | 2023-06-05 | 2024-04-09 | 腾晖科技建筑智能(深圳)有限公司 | Hanging object type identification method and system based on multi-neural network integrated learning |
CN116524197B (en) * | 2023-06-30 | 2023-09-29 | 厦门微亚智能科技股份有限公司 | Point cloud segmentation method, device and equipment combining edge points and depth network |
CN117708560A (en) * | 2023-11-27 | 2024-03-15 | 云南电网有限责任公司昭通供电局 | Multi-information PointNet++ fusion method for constructing DEM (digital elevation model) based on airborne laser radar data |
CN118154929B (en) * | 2024-01-03 | 2024-09-03 | 华中科技大学 | Rock-soil particle three-dimensional point cloud semantic classification method |
CN118351332B (en) * | 2024-06-18 | 2024-08-20 | 山东财经大学 | Automatic driving vehicle three-dimensional feature extraction method and system based on point cloud data |
-
2020
- 2020-03-18 CN CN202010190589.1A patent/CN111489358B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020036742A1 (en) * | 2018-08-17 | 2020-02-20 | Nec Laboratories America, Inc. | Dense three-dimensional correspondence estimation with multi-level metric learning and hierarchical matching |
CN110197223A (en) * | 2019-05-29 | 2019-09-03 | 北方民族大学 | Point cloud data classification method based on deep learning |
CN110660062A (en) * | 2019-08-31 | 2020-01-07 | 南京理工大学 | Point cloud instance segmentation method and system based on PointNet |
CN110827398A (en) * | 2019-11-04 | 2020-02-21 | 北京建筑大学 | Indoor three-dimensional point cloud automatic semantic segmentation algorithm based on deep neural network |
Non-Patent Citations (1)
Title |
---|
A multi-scale fully convolutional network for semantic labeling of 3D point clouds; Mohammed Yousefhussien et al.; ISPRS; 2018-05-16; pp. 191-204 *
Also Published As
Publication number | Publication date |
---|---|
CN111489358A (en) | 2020-08-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111489358B (en) | Three-dimensional point cloud semantic segmentation method based on deep learning | |
CN110930454B (en) | Six-degree-of-freedom pose estimation algorithm based on boundary box outer key point positioning | |
CN109919108B (en) | Remote sensing image rapid target detection method based on deep hash auxiliary network | |
WO2022252274A1 (en) | Point cloud segmentation and virtual environment generation method and apparatus based on pointnet network | |
CN110532920B (en) | Face recognition method for small-quantity data set based on FaceNet method | |
CN111242208A (en) | Point cloud classification method, point cloud segmentation method and related equipment | |
CN111898432B (en) | Pedestrian detection system and method based on improved YOLOv3 algorithm | |
CN112907602B (en) | Three-dimensional scene point cloud segmentation method based on improved K-nearest neighbor algorithm | |
CN111583263A (en) | Point cloud segmentation method based on joint dynamic graph convolution | |
CN111625667A (en) | Three-dimensional model cross-domain retrieval method and system based on complex background image | |
CN111612008A (en) | Image segmentation method based on convolution network | |
CN113435461B (en) | Point cloud local feature extraction method, device, equipment and storage medium | |
CN110263855B (en) | Method for classifying images by utilizing common-basis capsule projection | |
CN110532409B (en) | Image retrieval method based on heterogeneous bilinear attention network | |
CN115222998B (en) | Image classification method | |
CN112364747B (en) | Target detection method under limited sample | |
CN113989340A (en) | Point cloud registration method based on distribution | |
CN111899203A (en) | Real image generation method based on label graph under unsupervised training and storage medium | |
CN114187506B (en) | Remote sensing image scene classification method of viewpoint-aware dynamic routing capsule network | |
CN115049833A (en) | Point cloud component segmentation method based on local feature enhancement and similarity measurement | |
CN114066844A (en) | Pneumonia X-ray image analysis model and method based on attention superposition and feature fusion | |
CN117671666A (en) | Target identification method based on self-adaptive graph convolution neural network | |
CN117173595A (en) | Unmanned aerial vehicle aerial image target detection method based on improved YOLOv7 | |
CN116597267A (en) | Image recognition method, device, computer equipment and storage medium | |
Li et al. | Few-shot meta-learning on point cloud for semantic segmentation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |