CN111027559A - Point cloud semantic segmentation method based on expansion point convolution space pyramid pooling - Google Patents
Point cloud semantic segmentation method based on expansion point convolution space pyramid pooling
- Publication number
- CN111027559A (application CN201911048539.3A)
- Authority
- CN
- China
- Prior art keywords
- point
- point cloud
- convolution
- expansion
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a point cloud semantic segmentation method based on expansion point convolution spatial pyramid pooling. The method first obtains the center points of point cloud subsets through a farthest point sampling algorithm and determines each subset's range with the KNN algorithm; it then extracts the features of each point cloud subset through the expansion point convolution spatial pyramid pooling, which enlarges the receptive field of the point convolution and enriches the feature extraction of multi-scale targets in the scene; next, a simple and effective decoding module performs feature decoding, improving the segmentation accuracy on sparse point clouds; finally, the label of each point is obtained through fully connected layers. The method offers high segmentation accuracy and adapts to a wide variety of scenes.
Description
Technical Field
The invention belongs to the field of computer vision, and relates to a 3D semantic segmentation method based on expansion point convolution spatial pyramid pooling.
Background
Point cloud semantic segmentation is one of the main research difficulties and hot spots in 3D scene analysis, and efficiently acquiring the local features, global features, and scene context information of a point cloud is an urgent problem. Point cloud semantic segmentation classifies the scene point by point based on the acquired point cloud features, achieving scene analysis. However, scene point clouds are unordered, sparse, and unevenly dense, and scene targets exhibit multi-scale characteristics, all of which seriously hinder feature acquisition. Current schemes for point cloud semantic segmentation fall into three categories: multi-view schemes, voxelization schemes, and schemes that process the point cloud directly. Multi-view schemes project the point cloud scene onto images from different views and feed them into a conventional 2D convolutional neural network; voxelization schemes divide the point cloud into 3D grids and extract features with a 3D convolutional neural network. Both convert the irregular point cloud into regular data, which avoids the limitations of point clouds to some extent, but they lose part of the geometric information of the scene point cloud, introduce quantization errors, and tie the segmentation accuracy to the performance of a conventional convolutional neural network. Schemes that process the point cloud directly have received increasing attention because they retain the point cloud information to the greatest extent. To obtain context information at different scales, point cloud semantic segmentation networks usually adopt multi-scale grouping of the point cloud, which incurs considerable computation cost. In addition, the ability of existing point cloud semantic segmentation networks to acquire local features and context information still needs improvement.
How to improve the acquisition of local features and context information while reducing the network computation cost is the main technical problem that urgently needs to be solved in this field.
Disclosure of Invention
Aiming at the problems, the invention provides a point cloud semantic segmentation method based on expanded point convolution space pyramid pooling.
A point cloud semantic segmentation method based on an expansion point convolution space pyramid comprises the following steps:
Step 1: obtain the center points of the point cloud subsets from the ScanNet dataset point cloud with a farthest point sampling algorithm;
The input point cloud of the network is P = {p1, p2, p3, …, pn}. A subset Psub = {pi1, pi2, pi3, …, pim} is selected from the input point cloud using an iterative farthest point sampling algorithm, such that each pij is the point farthest from the other points in the subset;
Step 2: determine the range of each point cloud subset with the nearest neighbor algorithm, based on the subset center points obtained in step 1;
The inputs are a P×(D+C) matrix and a P1×D matrix; the output is a P1×K1×(D+C) matrix, where P is the number of input points, P1 is the number of sampled center points, D is the dimension of each point's coordinate information, C is the dimension of each point's feature information, and K1 is the number of neighborhood points per center point;
The KNN algorithm finds the K1 neighborhood points closest to each center point; the neighborhood points are sorted and numbered by their distance to the center point. The K1 neighborhood points together with the center point constitute a center point neighborhood, also referred to as a local neighborhood.
Step 3: extract local neighborhood features F1 with the improved expansion point convolution spatial pyramid pooling, obtaining P1 abstract point clouds;
The input is a P1×K1×(D+C) matrix; the output is a P1×(D+C′) matrix, where C′ is the point feature dimension abstracted within the local neighborhood by the improved expansion point convolution spatial pyramid pooling;
Step 4: downsample and group the P1 abstract point clouds obtained in step 3.
The input of this step is a P1×(D+C′) matrix; the output is a P2×K2×(D+C′) matrix, where P2 is the number of center points in the second downsampling and K2 is the number of neighborhood points per center point in the second downsampling. Steps 1 and 2 are repeated on the P1 abstract point clouds to obtain P2 downsampled center points, each with K2 neighborhood points, yielding P2 local neighborhoods;
Step 5: extract local neighborhood features F2 with PointNet;
The input is a P2×K2×(D+C′) matrix; the output is a P2×(D+C″) matrix, where C″ is the point feature dimension abstracted by PointNet within the local neighborhood;
Step 6: decode the P2 abstract point clouds carrying features F2 to obtain P1 abstract point clouds carrying features F3;
The input is a P2×(D+C″) matrix; the output is a P1×(D+C‴) matrix, where C‴ is the feature dimension of the decoded point cloud;
Step 7: decode the P1 abstract point clouds carrying features F3 to obtain P abstract point clouds carrying features F4;
The input is a P1×(D+C‴) matrix; the output is a P×(D+C⁗) matrix, where C⁗ is the feature dimension of the decoded point cloud;
Step 8: obtain the label of each point through fully connected layers.
The input of this step is a P×(D+C⁗) matrix; the output is a P×k matrix, where k is the number of scene point cloud categories.
Further, the extraction of point cloud local neighborhood features by the improved expansion point convolution spatial pyramid pooling comprises the following four steps: 1) improve the conventional expansion point convolution; 2) extract local neighborhood features with improved expansion point convolution channels of different expansion rates; 3) fuse the features of all channels; 4) reduce the dimension of the features;
the specific extraction process is as follows:
1) replace the point convolution kernel function (an MLP) to obtain the improved expansion point convolution;
The continuous convolution of a conventional expansion point convolution is defined as

$(H * G)(p_i) = \int H(p_j)\, G(p_j - p_i)\, dp_j$

where H is a continuous feature function that assigns a feature value to point $p_j$, and G is a continuous kernel function that maps the distance from $p_j$ to $p_i$ to a kernel weight. Using Monte Carlo integration, the continuous convolution definition is converted into

$(H * G)(p_i) \approx \frac{1}{|\mathcal{N}_d(p_i)|} \sum_{p_j \in \mathcal{N}_d(p_i)} H(p_j)\, G(p_j - p_i)$

where d is the expansion rate of the expansion point convolution and $\mathcal{N}_d(p_i)$ denotes the dilated neighborhood of $p_i$. The continuous kernel function G(·) is replaced with a multi-layer perceptron:

$G(p) \approx \mathrm{MLP}_\theta(p)$

where p is the relative position of the neighborhood point with respect to the center point (using the Euclidean distance), and θ is the set of parameters of the MLP.

To obtain more local neighborhood features, the local neighborhood point features are abstracted to a higher dimension to obtain richer information; the improved expansion point convolution therefore replaces the kernel function G(·) with PointNet:

$G(p) \approx \mathrm{PN}_{\theta'}(p)$

where PN is a PointNet network and θ′ is the set of parameters of PN;
2) extract local neighborhood features with each improved expansion point convolution;
The input is a P1×K1×(D+C) matrix; the output is a P1×(D+Ci) matrix, where Ci is the point feature dimension abstracted within the local neighborhood by the i-th improved expansion point convolution;
The spatial pyramid pooling has i channels; each channel carries an improved expansion point convolution with expansion rate d1, d2, …, di respectively, yielding i groups of local neighborhood context information P1×(D+Ci) at different scales;
3) fuse the feature information extracted by each improved expansion point convolution;
The inputs are the P1×(D+Ci) matrices; the output is a ΣP1×(D+Ci) matrix, where i is the number of spatial pyramid pooling channels, which is also the number of improved expansion point convolutions. Context information at different scales is fused by concatenation;
4) reduce the dimension of the feature information;
The input is the ΣP1×(D+Ci) matrix; the output is a P1×(D+C′) matrix.
As the number of channels increases, the number of local neighborhood features grows i-fold, which raises the computation cost at the back end of the encoding layer. Therefore, a 1×1 convolution is applied to the fused feature information to reduce the feature dimensionality.
Compared with conventional point convolution, the improved expansion point convolution enlarges the receptive field and obtains more scene context information without increasing the convolution computation cost. In addition, spatial pyramid pooling effectively encodes multi-scale scene content information. The method therefore combines the advantages of the improved expansion point convolution and spatial pyramid pooling, keeping the convolution computation cost in check while efficiently extracting local neighborhood features and context information.
Further, PointNet is selected as a feature extractor of local neighborhood point cloud, and the working principle of the PointNet feature extractor is as follows:
Given a set of unordered local neighborhood points $\{p_{l1}, p_{l2}, \dots, p_{ln}\}$, a function f can be defined that maps the point set to a vector:

$f(\{p_{l1}, \dots, p_{ln}\}) = \gamma\big(\max_{i=1,\dots,n} h(p_{li})\big)$

where γ and h are typically MLPs.
PointNet [Document 1] is often used as a point cloud feature extractor: it guarantees permutation invariance over unordered point clouds and abstracts low-dimensional point features into rich high-dimensional semantic features, improving segmentation accuracy.
Further, the decoding process in step 6 is as follows:
1) interpolation upsampling: the P2 abstract point clouds are upsampled with an interpolation algorithm to obtain P1 abstract point clouds whose point cloud features are F2. The input of this step is a P2×(D+C″) matrix; the output is a P1×(D+C″) matrix;
2) skip-link feature fusion: through a skip link, the features F1 of the P1 abstract point clouds from step 3 are fused by concatenation with the features F2 of the upsampled P1 abstract point clouds. The inputs of this step are a P1×(D+C′) matrix and a P1×(D+C″) matrix; the output is a P1×(D+C″+C′) matrix;
3) decoding: the point cloud is decoded from the fused features using a unit PointNet;
The input of this step is a P1×(D+C″+C′) matrix; the output is a P1×(D+C‴) matrix.
Further, the decoding process in step 7 is as follows:
1) interpolation upsampling: the input is a P1×(D+C‴) matrix; the output is a P×(D+C‴) matrix;
2) skip-link feature fusion: the inputs are a P×(D+C‴) matrix and a P×(D+C) matrix; the output is a P×(D+C‴+C) matrix;
3) decoding: the input is a P×(D+C‴+C) matrix; the output is a P×(D+C⁗) matrix.
Advantageous effects
The point cloud semantic segmentation method based on expansion point convolution spatial pyramid pooling first obtains the center points of point cloud subsets through a farthest point sampling algorithm and determines each subset's range with the KNN algorithm; it then extracts the features of each point cloud subset through the expansion point convolution spatial pyramid pooling, which enlarges the receptive field of the point convolution and enriches the feature extraction of multi-scale targets in the scene; next, a simple and effective decoding module performs feature decoding, improving the segmentation accuracy on sparse point clouds; finally, the label of each point is obtained through fully connected layers. The method offers high segmentation accuracy and adapts to a wide variety of scenes.
The invention realizes semantic segmentation of irregular, sparse, and unevenly dense point clouds; it offers high segmentation accuracy, low computation cost, and adaptability to many scenes, effectively addressing the low acquisition efficiency and high computation cost of point cloud local features and context information in indoor and outdoor scene semantic segmentation.
Compared with the existing point cloud semantic segmentation network, the invention has the advantages that:
1) Combining expansion point convolution with PointNet, the invention proposes an improved expansion point convolution that strengthens the acquisition of point cloud local features and context information;
2) Inspired by spatial pyramid pooling, the invention proposes the improved expansion point convolution spatial pyramid pooling, which effectively encodes multi-scale context information, enriches point cloud features, and improves scene semantic segmentation accuracy.
3) The invention proposes an improved encoding layer that fuses the expansion point convolution spatial pyramid pooling. Placing the pyramid pooling at the front end of the encoding layer avoids the loss of high-dimensional point cloud features and benefits the segmentation of small scene targets.
4) The invention proposes a simple and effective decoding layer that upsamples the high-dimensional point cloud features of the encoding layer and fuses them with the low-dimensional point cloud features, enriching scene detail information and improving segmentation accuracy.
Drawings
FIG. 1 is a block diagram of the overall network of the present invention;
FIG. 2 is a diagram of a conventional point convolution, an extended point convolution and an improved extended point convolution;
FIG. 3 is an expanded point convolution spatial pyramid pooling;
FIG. 4 is the PointNet feature extractor;
FIG. 5 is a point cloud semantic segmentation network framework.
Detailed Description
The present invention will be described in further detail below with reference to the accompanying drawings.
The point cloud data used by the invention can come from common indoor scene datasets such as ScanNet and common outdoor scene datasets such as Semantic3D. The ScanNet dataset provides various indoor scenes such as offices, apartments, and bedrooms; its point cloud data is acquired with an RGB-D camera and contains the coordinate information, color information, and alpha channel of each point, P(x, y, z, r, g, b, α). The Semantic3D dataset provides many types of outdoor scenes such as farms, open fields, and castles; its point cloud data is collected with a static ground laser scanner and contains the coordinate information, laser reflection intensity, and color information of each point, P(x, y, z, intensity, r, g, b). As an application example, test results on the public ScanNet dataset are given.
Fig. 1 shows the flowchart of the invention. The point cloud semantic segmentation method based on expansion point convolution spatial pyramid pooling comprises the following steps:
Step 1: obtain the center points of the point cloud subsets from the ScanNet dataset point cloud with a farthest point sampling algorithm;
The input point cloud of the network is P = {p1, p2, p3, …, pn}. A subset Psub = {pi1, pi2, pi3, …, pim} is selected from the input point cloud using an iterative farthest point sampling algorithm, such that each pij is the point farthest from the other points in the subset. Given the same number of center points, the farthest point sampling algorithm covers the input point cloud better than a random sampling algorithm.
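The following is a minimal NumPy sketch of the iterative farthest point sampling described above; the function name, array shapes, and seed choice are illustrative assumptions, not the patent's reference implementation.

```python
import numpy as np

def farthest_point_sampling(points, n_centers):
    """points: (P, 3) coordinate array; returns indices of n_centers subset points."""
    n = points.shape[0]
    chosen = np.zeros(n_centers, dtype=np.int64)
    min_dist = np.full(n, np.inf)   # distance to the nearest already-chosen center
    chosen[0] = 0                   # arbitrary seed point
    for i in range(1, n_centers):
        d = np.linalg.norm(points - points[chosen[i - 1]], axis=1)
        min_dist = np.minimum(min_dist, d)
        chosen[i] = int(np.argmax(min_dist))  # farthest from all chosen so far
    return chosen
```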
Step 2: determining the range of the point cloud subset by using a nearest neighbor algorithm (KNN) based on the point cloud subset center point obtained in the step 1;
The inputs of this step are a P×(D+C) matrix and a P1×D matrix; the output is a P1×K1×(D+C) matrix, where P is the number of input points, P1 is the number of sampled center points, D is the dimension of each point's coordinate information, C is the dimension of each point's feature information, and K1 is the number of neighborhood points per center point.
The KNN algorithm finds the K1 neighborhood points closest to each center point; the neighborhood points are sorted and numbered by their distance to the center point. The K1 neighborhood points together with the center point constitute a center point neighborhood, also referred to as a local neighborhood.
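A small sketch of this grouping step, under the assumption that the first three columns of each point are its coordinates (D = 3); `knn_group` and its shapes are illustrative, not the patent's code.

```python
import numpy as np

def knn_group(points, centers, k):
    """points: (P, D+C); centers: (P1, D+C); returns (P1, k, D+C) neighborhoods."""
    # Pairwise distances between every center and every point (coords in cols 0-2).
    d = np.linalg.norm(centers[:, None, :3] - points[None, :, :3], axis=-1)
    # Indices of the k nearest points, already sorted (numbered) by distance.
    idx = np.argsort(d, axis=1)[:, :k]
    return points[idx]
```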
Step 3: extract local neighborhood features F1 with the improved expansion point convolution spatial pyramid pooling, obtaining P1 abstract point clouds;
The input of this step is a P1×K1×(D+C) matrix; the output is a P1×(D+C′) matrix, where C′ is the point feature dimension abstracted within the local neighborhood by the improved expansion point convolution spatial pyramid pooling.
Compared with conventional point convolution, the improved expansion point convolution enlarges the receptive field and obtains more scene context information without increasing the convolution computation cost. In addition, spatial pyramid pooling effectively encodes multi-scale scene content information. The method therefore combines the advantages of the improved expansion point convolution and spatial pyramid pooling, keeping the convolution computation cost in check while efficiently extracting local neighborhood features and context information. The extraction of point cloud local neighborhood features by the improved expansion point convolution spatial pyramid pooling comprises the following four steps: 1) improve the conventional expansion point convolution; 2) extract local neighborhood features with improved expansion point convolution channels of different expansion rates; 3) fuse the features of all channels; 4) reduce the dimension of the features. The specific extraction process is as follows:
1) replace the point convolution kernel function (an MLP) to obtain the improved expansion point convolution;
As shown in Fig. 2, the continuous convolution of a conventional expansion point convolution is defined as

$(H * G)(p_i) = \int H(p_j)\, G(p_j - p_i)\, dp_j$

where H is a continuous feature function that assigns a feature value to point $p_j$, and G is a continuous kernel function that maps the distance from $p_j$ to $p_i$ to a kernel weight. In most practical applications the feature function H is not completely known; using Monte Carlo integration, the continuous convolution definition can be approximately converted into

$(H * G)(p_i) \approx \frac{1}{|\mathcal{N}_d(p_i)|} \sum_{p_j \in \mathcal{N}_d(p_i)} H(p_j)\, G(p_j - p_i)$

where d is the expansion rate of the expansion point convolution and $\mathcal{N}_d(p_i)$ denotes the dilated neighborhood of $p_i$. Here the continuous kernel function G(·) is replaced with a multi-layer perceptron (MLP):

$G(p) \approx \mathrm{MLP}_\theta(p)$

where p is the relative position of the neighborhood point with respect to the center point (using the Euclidean distance), and θ is the set of parameters of the MLP.

To obtain more local neighborhood features, the local neighborhood point features are abstracted to a higher dimension to obtain richer information; the improved expansion point convolution therefore replaces the kernel function G(·) with PointNet (the PointNet feature extractor is described in detail in step 5):

$G(p) \approx \mathrm{PN}_{\theta'}(p)$

where PN is a PointNet network and θ′ is the set of parameters of PN.
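A hedged sketch of the improved expansion point convolution in its Monte Carlo form: every d-th of the k·d nearest neighbors forms the dilated neighborhood, and a small per-position MLP stands in for the kernel $\mathrm{PN}_{\theta'}$ (a simplification; all function names and weight shapes are assumptions, and the first three columns of each point are taken as coordinates).

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def dilated_neighborhood(points, center, k, d):
    """Keep every d-th of the k*d nearest neighbors of `center`."""
    dist = np.linalg.norm(points[:, :3] - center[:3], axis=1)
    idx = np.argsort(dist)[: k * d : d]     # k neighbors at dilation rate d
    return points[idx]

def improved_expansion_point_conv(points, center, k, d, w1, w2):
    """Monte Carlo form: mean_j H(p_j) * G(p_j - p_i) over the dilated neighborhood.
    w1: (3, hidden) and w2: (hidden, C) are toy kernel weights; C must match
    the feature width of the points (columns after the first three)."""
    neigh = dilated_neighborhood(points, center, k, d)
    rel = neigh[:, :3] - center[:3]         # relative positions p_j - p_i
    w = relu(relu(rel @ w1) @ w2)           # (k, C) kernel weights from the MLP
    feats = neigh[:, 3:]                    # H(p_j), shape (k, C)
    return (w * feats).mean(axis=0)         # (C,) aggregated local feature
```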
2) Extract local neighborhood features with each improved expansion point convolution;
The input of this step is a P1×K1×(D+C) matrix; the output is a P1×(D+Ci) matrix, where Ci is the point feature dimension abstracted within the local neighborhood by the i-th improved expansion point convolution.
As shown in FIG. 3, the spatial pyramid pooling has i channels, each of which carries an improved dilation point convolution with a dilation rate d1,d2,…,diThen, the content information P of i groups of local neighborhoods with different scales is obtained1×(D+Ci)。
3) Fuse the feature information extracted by each improved expansion point convolution;
The inputs of this step are the P1×(D+Ci) matrices; the output is a ΣP1×(D+Ci) matrix, where i is the number of spatial pyramid pooling channels, which is also the number of improved expansion point convolutions. Context information at different scales is fused by concatenation.
4) Reduce the dimension of the feature information.
The input of this step is the ΣP1×(D+Ci) matrix; the output is a P1×(D+C′) matrix. As the number of channels increases, the number of local neighborhood features grows i-fold, which raises the computation cost at the back end of the encoding layer. Therefore, a 1×1 convolution is applied to the fused feature information to reduce the feature dimensionality.
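Combining steps 2)-4), the pyramid pooling might look as sketched below; the per-channel convolution is passed in as a callable (e.g. `improved_expansion_point_conv` from the earlier sketch), the 1×1 convolution is realized as a per-point linear map, and all names and shapes are illustrative assumptions.

```python
import numpy as np

def pyramid_pooling(points, centers, k, rates, kernels, w_reduce, conv):
    """rates: expansion rates [d1, ..., di]; kernels: matching [(w1, w2), ...]
    weight pairs; w_reduce: (sum(C_i), C') weights of the 1x1 convolution;
    conv: called as conv(points, center, k, d, w1, w2) -> (C_i,) feature."""
    rows = []
    for center in centers:
        # One improved expansion point convolution per pyramid channel.
        per_rate = [conv(points, center, k, d, w1, w2)
                    for d, (w1, w2) in zip(rates, kernels)]
        rows.append(np.concatenate(per_rate))   # splice the i channels
    fused = np.stack(rows)                      # (P1, sum(C_i))
    return fused @ w_reduce                     # 1x1 conv: reduce to C' dims
```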
Step 4: downsample and group the P1 abstract point clouds obtained in step 3.
The input of this step is a P1×(D+C′) matrix; the output is a P2×K2×(D+C′) matrix, where P2 is the number of center points in the second downsampling and K2 is the number of neighborhood points per center point in the second downsampling. Steps 1 and 2 are repeated on the P1 abstract point clouds to obtain P2 downsampled center points, each with K2 neighborhood points, yielding P2 local neighborhoods.
Step 5: extract local neighborhood features F2 with PointNet;
The input of this step is a P2×K2×(D+C′) matrix; the output is a P2×(D+C″) matrix, where C″ is the point feature dimension abstracted by PointNet within the local neighborhood.
PointNet [Document 1] is often used as a point cloud feature extractor: it guarantees permutation invariance over unordered point clouds and abstracts low-dimensional point features into rich high-dimensional semantic features, improving segmentation accuracy. The invention selects PointNet as the feature extractor for local neighborhood point clouds; as shown in Fig. 4, the PointNet feature extractor works as follows.
Given a set of unordered local neighborhood points $\{p_{l1}, p_{l2}, \dots, p_{ln}\}$, a function f can be defined that maps the point set to a vector:

$f(\{p_{l1}, \dots, p_{ln}\}) = \gamma\big(\max_{i=1,\dots,n} h(p_{li})\big)$

where γ and h are typically MLPs.
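A toy sketch of this mapping with single-layer γ and h; the weight shapes are assumptions.

```python
import numpy as np

def pointnet(points, wh, wg):
    """points: (n, c_in) unordered set; wh: (c_in, c_mid), wg: (c_mid, c_out)."""
    h = np.maximum(points @ wh, 0.0)     # shared per-point MLP h
    pooled = h.max(axis=0)               # symmetric max pooling: order-invariant
    return np.maximum(pooled @ wg, 0.0)  # gamma maps the pooled vector out
```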
Step 6: decode the P2 abstract point clouds carrying features F2 to obtain P1 abstract point clouds carrying features F3;
The input of this step is a P2×(D+C″) matrix; the output is a P1×(D+C‴) matrix, where C‴ is the feature dimension of the decoded point cloud. The decoding layer comprises three steps. 1) Interpolation upsampling: the P2 abstract point clouds are upsampled with an interpolation algorithm to obtain P1 abstract point clouds whose point cloud features are F2; this step inputs a P2×(D+C″) matrix and outputs a P1×(D+C″) matrix. 2) Skip-link feature fusion: through a skip link, the features F1 of the P1 abstract point clouds from step 3 are fused by concatenation with the features F2 of the upsampled P1 abstract point clouds; this step inputs a P1×(D+C′) matrix and a P1×(D+C″) matrix and outputs a P1×(D+C″+C′) matrix. 3) Decoding: the point cloud is decoded from the fused features using a unit PointNet [Document 1]; this step inputs a P1×(D+C″+C′) matrix and outputs a P1×(D+C‴) matrix.
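A hedged sketch of this decoding layer follows. The patent only states that an interpolation algorithm is used; the inverse-distance weighting over the three nearest coarse points below is the PointNet++ convention and is an assumption here, as are all names and shapes.

```python
import numpy as np

def interpolate_up(coarse_xyz, coarse_feat, fine_xyz, eps=1e-8):
    """Upsample coarse features onto fine points by inverse-distance weights."""
    d = np.linalg.norm(fine_xyz[:, None] - coarse_xyz[None], axis=-1)  # (P1, P2)
    idx = np.argsort(d, axis=1)[:, :3]                  # 3 nearest coarse points
    w = 1.0 / (np.take_along_axis(d, idx, axis=1) + eps)
    w /= w.sum(axis=1, keepdims=True)
    return (coarse_feat[idx] * w[..., None]).sum(axis=1)  # (P1, C'')

def decode_layer(coarse_xyz, f2, fine_xyz, f1, w_unit):
    """Interpolation upsampling, skip-link concatenation, then a unit PointNet
    (a shared per-point MLP, here a single linear layer with ReLU)."""
    up = interpolate_up(coarse_xyz, f2, fine_xyz)       # (P1, C'')
    fused = np.concatenate([up, f1], axis=1)            # skip link: (P1, C''+C')
    return np.maximum(fused @ w_unit, 0.0)              # (P1, C''')
```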
Step 7: decode the P1 abstract point clouds carrying features F3 to obtain P abstract point clouds carrying features F4;
The input of this step is a P1×(D+C‴) matrix; the output is a P×(D+C⁗) matrix, where C⁗ is the feature dimension of the decoded point cloud. The decoding process follows step 6, with the following data flow: 1) interpolation upsampling: the input is a P1×(D+C‴) matrix; the output is a P×(D+C‴) matrix. 2) skip-link feature fusion: the inputs are a P×(D+C‴) matrix and a P×(D+C) matrix; the output is a P×(D+C‴+C) matrix. 3) decoding: the input is a P×(D+C‴+C) matrix; the output is a P×(D+C⁗) matrix.
Step 8: obtain the label of each point through fully connected layers.
The input of this step is a P×(D+C⁗) matrix; the output is a P×k matrix, where k is the number of scene point cloud categories.
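A toy sketch of this final step with a single fully connected layer; weights and names are assumptions.

```python
import numpy as np

def classify_points(features, w_fc, b_fc):
    """features: (P, D+C'''') decoded per-point features; w_fc: (D+C'''', k);
    returns the per-point label in {0, ..., k-1}."""
    logits = features @ w_fc + b_fc   # fully connected layer -> (P, k) scores
    return logits.argmax(axis=1)      # label = highest-scoring category
```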
[Document 1] Charles R. Q., Hao S., Mo K., et al. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation [C] // IEEE Conference on Computer Vision & Pattern Recognition, 2017.
Claims (5)
1. A point cloud semantic segmentation method based on an expansion point convolution space pyramid is characterized by comprising the following steps:
Step 1: obtain the center points of the point cloud subsets from the ScanNet dataset point cloud with a farthest point sampling algorithm;
The input point cloud of the network is P = {p1, p2, p3, …, pn}. A subset Psub = {pi1, pi2, pi3, …, pim} is selected from the input point cloud using an iterative farthest point sampling algorithm, such that each pij is the point farthest from the other points in the subset;
Step 2: determine the range of each point cloud subset with the nearest neighbor algorithm, based on the subset center points obtained in step 1;
The inputs are a P×(D+C) matrix and a P1×D matrix; the output is a P1×K1×(D+C) matrix, where P is the number of input points, P1 is the number of sampled center points, D is the dimension of each point's coordinate information, C is the dimension of each point's feature information, and K1 is the number of neighborhood points per center point;
Step 3: extract local neighborhood features F1 with the improved expansion point convolution spatial pyramid pooling, obtaining P1 abstract point clouds;
The input is a P1×K1×(D+C) matrix; the output is a P1×(D+C′) matrix, where C′ is the point feature dimension abstracted within the local neighborhood by the improved expansion point convolution spatial pyramid pooling;
Step 4: downsample and group the P1 abstract point clouds obtained in step 3.
The input of this step is a P1×(D+C′) matrix; the output is a P2×K2×(D+C′) matrix, where P2 is the number of center points in the second downsampling and K2 is the number of neighborhood points per center point in the second downsampling. Steps 1 and 2 are repeated on the P1 abstract point clouds to obtain P2 downsampled center points, each with K2 neighborhood points, yielding P2 local neighborhoods;
Step 5: extract local neighborhood features F2 with PointNet;
The input is a P2×K2×(D+C′) matrix; the output is a P2×(D+C″) matrix, where C″ is the point feature dimension abstracted by PointNet within the local neighborhood;
Step 6: decode the P2 abstract point clouds carrying features F2 to obtain P1 abstract point clouds carrying features F3;
The input is a P2×(D+C″) matrix; the output is a P1×(D+C‴) matrix, where C‴ is the feature dimension of the decoded point cloud;
Step 7: decode the P1 abstract point clouds carrying features F3 to obtain P abstract point clouds carrying features F4;
The input is a P1×(D+C‴) matrix; the output is a P×(D+C⁗) matrix, where C⁗ is the feature dimension of the decoded point cloud;
Step 8: obtain the label of each point through fully connected layers;
The input is a P×(D+C⁗) matrix; the output is a P×k matrix, where k is the number of scene point cloud categories.
2. The method of claim 1, wherein the extraction of the point cloud local neighborhood features by the improved expansion point convolution spatial pyramid pooling comprises the following four steps: 1) improve the conventional expansion point convolution; 2) extract local neighborhood features with improved expansion point convolution channels of different expansion rates; 3) fuse the features of all channels; 4) reduce the dimension of the features;
the specific extraction process is as follows:
1) replace the point convolution kernel function (an MLP) to obtain the improved expansion point convolution;
The continuous convolution of a conventional expansion point convolution is defined as

$(H * G)(p_i) = \int H(p_j)\, G(p_j - p_i)\, dp_j$

where H is a continuous feature function that assigns a feature value to point $p_j$, and G is a continuous kernel function that maps the distance from $p_j$ to $p_i$ to a kernel weight. Using Monte Carlo integration, the continuous convolution definition is converted into

$(H * G)(p_i) \approx \frac{1}{|\mathcal{N}_d(p_i)|} \sum_{p_j \in \mathcal{N}_d(p_i)} H(p_j)\, G(p_j - p_i)$

where d is the expansion rate of the expansion point convolution and $\mathcal{N}_d(p_i)$ denotes the dilated neighborhood of $p_i$. The continuous kernel function G(·) is replaced with a multi-layer perceptron:

$G(p) \approx \mathrm{MLP}_\theta(p)$

where p is the relative position of the neighborhood point with respect to the center point (using the Euclidean distance), and θ is the set of parameters of the MLP.

To obtain more local neighborhood features, the local neighborhood point features are abstracted to a higher dimension to obtain richer information; the improved expansion point convolution therefore replaces the kernel function G(·) with PointNet:

$G(p) \approx \mathrm{PN}_{\theta'}(p)$

where PN is a PointNet network and θ′ is the set of parameters of PN;
2) extract local neighborhood features with each improved expansion point convolution;
The input is a P1×K1×(D+C) matrix; the output is a P1×(D+Ci) matrix, where Ci is the point feature dimension abstracted within the local neighborhood by the i-th improved expansion point convolution;
The spatial pyramid pooling has i channels; each channel carries an improved expansion point convolution with expansion rate d1, d2, …, di respectively, yielding i groups of local neighborhood context information P1×(D+Ci) at different scales;
3) fuse the feature information extracted by each improved expansion point convolution;
The inputs are the P1×(D+Ci) matrices; the output is a ΣP1×(D+Ci) matrix, where i is the number of spatial pyramid pooling channels, which is also the number of improved expansion point convolutions. Context information at different scales is fused by concatenation;
4) reduce the dimension of the feature information;
The input is the ΣP1×(D+Ci) matrix; the output is a P1×(D+C′) matrix.
3. The method of claim 2, wherein PointNet is selected as the feature extractor of the local neighborhood point cloud, and the working principle of the PointNet feature extractor is as follows:
Given a set of unordered local neighborhood points $\{p_{l1}, p_{l2}, \dots, p_{ln}\}$, a function f can be defined that maps the point set to a vector:

$f(\{p_{l1}, \dots, p_{ln}\}) = \gamma\big(\max_{i=1,\dots,n} h(p_{li})\big)$

where γ and h are typically MLPs.
4. The method of claim 1, wherein the decoding process in step 6 is as follows:
1) interpolation upsampling: the P2 abstract point clouds are upsampled with an interpolation algorithm to obtain P1 abstract point clouds whose point cloud features are F2. The input of this step is a P2×(D+C″) matrix; the output is a P1×(D+C″) matrix;
2) skip-link feature fusion: through a skip link, the features F1 of the P1 abstract point clouds from step 3 are fused by concatenation with the features F2 of the upsampled P1 abstract point clouds. The inputs of this step are a P1×(D+C′) matrix and a P1×(D+C″) matrix; the output is a P1×(D+C″+C′) matrix;
3) decoding: the point cloud is decoded from the fused features using a unit PointNet;
The input of this step is a P1×(D+C″+C′) matrix; the output is a P1×(D+C‴) matrix.
5. The method of claim 4, wherein the decoding process in step 7 is as follows:
1) interpolation upsampling: the input is a P1×(D+C‴) matrix; the output is a P×(D+C‴) matrix;
2) skip-link feature fusion: the inputs are a P×(D+C‴) matrix and a P×(D+C) matrix; the output is a P×(D+C‴+C) matrix;
3) decoding: the input is a P×(D+C‴+C) matrix; the output is a P×(D+C⁗) matrix.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911048539.3A CN111027559A (en) | 2019-10-31 | 2019-10-31 | Point cloud semantic segmentation method based on expansion point convolution space pyramid pooling |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911048539.3A CN111027559A (en) | 2019-10-31 | 2019-10-31 | Point cloud semantic segmentation method based on expansion point convolution space pyramid pooling |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111027559A true CN111027559A (en) | 2020-04-17 |
Family
ID=70200756
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911048539.3A Pending CN111027559A (en) | 2019-10-31 | 2019-10-31 | Point cloud semantic segmentation method based on expansion point convolution space pyramid pooling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111027559A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111860138A (en) * | 2020-06-09 | 2020-10-30 | 中南民族大学 | Three-dimensional point cloud semantic segmentation method and system based on full-fusion network |
CN112149725A (en) * | 2020-09-18 | 2020-12-29 | 南京信息工程大学 | Spectral domain graph convolution 3D point cloud classification method based on Fourier transform |
CN112418235A (en) * | 2020-11-20 | 2021-02-26 | 中南大学 | Point cloud semantic segmentation method based on expansion nearest neighbor feature enhancement |
CN112560965A (en) * | 2020-12-18 | 2021-03-26 | 中国科学院深圳先进技术研究院 | Image semantic segmentation method, storage medium and computer device |
CN112819833A (en) * | 2021-02-05 | 2021-05-18 | 四川大学 | Large scene point cloud semantic segmentation method |
CN112967296A (en) * | 2021-03-10 | 2021-06-15 | 重庆理工大学 | Point cloud dynamic region graph convolution method, classification method and segmentation method |
CN113378112A (en) * | 2021-06-18 | 2021-09-10 | 浙江工业大学 | Point cloud completion method and device based on anisotropic convolution |
CN113392841A (en) * | 2021-06-03 | 2021-09-14 | 电子科技大学 | Three-dimensional point cloud semantic segmentation method based on multi-feature information enhanced coding |
CN113486963A (en) * | 2021-07-12 | 2021-10-08 | 厦门大学 | Density self-adaptive point cloud end-to-end sampling method |
CN114693932A (en) * | 2022-04-06 | 2022-07-01 | 南京航空航天大学 | Large aircraft large component point cloud semantic segmentation method |
CN115496910A (en) * | 2022-11-07 | 2022-12-20 | 中国测绘科学研究院 | Point cloud semantic segmentation method based on full-connected graph coding and double-expansion residual error |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108319957A (en) * | 2018-02-09 | 2018-07-24 | 深圳市唯特视科技有限公司 | A kind of large-scale point cloud semantic segmentation method based on overtrick figure |
CN108345831A (en) * | 2017-12-28 | 2018-07-31 | 新智数字科技有限公司 | The method, apparatus and electronic equipment of Road image segmentation based on point cloud data |
CN109410307A (en) * | 2018-10-16 | 2019-03-01 | 大连理工大学 | A kind of scene point cloud semantic segmentation method |
US20190147302A1 (en) * | 2017-11-10 | 2019-05-16 | Nvidia Corp. | Bilateral convolution layer network for processing point clouds |
US10408939B1 (en) * | 2019-01-31 | 2019-09-10 | StradVision, Inc. | Learning method and learning device for integrating image acquired by camera and point-cloud map acquired by radar or LiDAR corresponding to image at each of convolution stages in neural network and testing method and testing device using the same |
CN110223348A (en) * | 2019-02-25 | 2019-09-10 | 湖南大学 | Robot scene adaptive bit orientation estimation method based on RGB-D camera |
- 2019-10-31 CN CN201911048539.3A patent/CN111027559A/en active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190147302A1 (en) * | 2017-11-10 | 2019-05-16 | Nvidia Corp. | Bilateral convolution layer network for processing point clouds |
CN108345831A (en) * | 2017-12-28 | 2018-07-31 | 新智数字科技有限公司 | The method, apparatus and electronic equipment of Road image segmentation based on point cloud data |
CN108319957A (en) * | 2018-02-09 | 2018-07-24 | 深圳市唯特视科技有限公司 | A kind of large-scale point cloud semantic segmentation method based on overtrick figure |
CN109410307A (en) * | 2018-10-16 | 2019-03-01 | 大连理工大学 | A kind of scene point cloud semantic segmentation method |
US10408939B1 (en) * | 2019-01-31 | 2019-09-10 | StradVision, Inc. | Learning method and learning device for integrating image acquired by camera and point-cloud map acquired by radar or LiDAR corresponding to image at each of convolution stages in neural network and testing method and testing device using the same |
CN110223348A (en) * | 2019-02-25 | 2019-09-10 | 湖南大学 | Robot scene adaptive bit orientation estimation method based on RGB-D camera |
Non-Patent Citations (3)
Title |
---|
HONGSHAN YU 等: "Methods and datasets on semantic segmentation: A review", 《NEUROCOMPUTING》 * |
YUAN WANG 等: "PointSeg: Real-Time Semantic Segmentation Based on 3D LiDAR Point Cloud", 《ARXIV》 * |
张祥甫 等: "基于深度学习的语义分割问题研究综述", 《激光与光电子学进展》 * |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111860138A (en) * | 2020-06-09 | 2020-10-30 | 中南民族大学 | Three-dimensional point cloud semantic segmentation method and system based on full-fusion network |
CN111860138B (en) * | 2020-06-09 | 2024-03-01 | 中南民族大学 | Three-dimensional point cloud semantic segmentation method and system based on full fusion network |
CN112149725A (en) * | 2020-09-18 | 2020-12-29 | 南京信息工程大学 | Spectral domain graph convolution 3D point cloud classification method based on Fourier transform |
CN112149725B (en) * | 2020-09-18 | 2023-08-22 | 南京信息工程大学 | Fourier transform-based spectrum domain map convolution 3D point cloud classification method |
CN112418235A (en) * | 2020-11-20 | 2021-02-26 | 中南大学 | Point cloud semantic segmentation method based on expansion nearest neighbor feature enhancement |
CN112560965B (en) * | 2020-12-18 | 2024-04-05 | 中国科学院深圳先进技术研究院 | Image semantic segmentation method, storage medium and computer device |
CN112560965A (en) * | 2020-12-18 | 2021-03-26 | 中国科学院深圳先进技术研究院 | Image semantic segmentation method, storage medium and computer device |
CN112819833A (en) * | 2021-02-05 | 2021-05-18 | 四川大学 | Large scene point cloud semantic segmentation method |
CN112819833B (en) * | 2021-02-05 | 2022-07-12 | 四川大学 | Large scene point cloud semantic segmentation method |
CN112967296A (en) * | 2021-03-10 | 2021-06-15 | 重庆理工大学 | Point cloud dynamic region graph convolution method, classification method and segmentation method |
CN112967296B (en) * | 2021-03-10 | 2022-11-15 | 重庆理工大学 | Point cloud dynamic region graph convolution method, classification method and segmentation method |
CN113392841A (en) * | 2021-06-03 | 2021-09-14 | 电子科技大学 | Three-dimensional point cloud semantic segmentation method based on multi-feature information enhanced coding |
CN113378112A (en) * | 2021-06-18 | 2021-09-10 | 浙江工业大学 | Point cloud completion method and device based on anisotropic convolution |
CN113486963A (en) * | 2021-07-12 | 2021-10-08 | 厦门大学 | Density self-adaptive point cloud end-to-end sampling method |
CN113486963B (en) * | 2021-07-12 | 2023-07-07 | 厦门大学 | Point cloud end-to-end sampling method with self-adaptive density |
CN114693932A (en) * | 2022-04-06 | 2022-07-01 | 南京航空航天大学 | Large aircraft large component point cloud semantic segmentation method |
CN115496910B (en) * | 2022-11-07 | 2023-04-07 | 中国测绘科学研究院 | Point cloud semantic segmentation method based on full-connected graph coding and double-expansion residual error |
CN115496910A (en) * | 2022-11-07 | 2022-12-20 | 中国测绘科学研究院 | Point cloud semantic segmentation method based on full-connected graph coding and double-expansion residual error |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111027559A (en) | Point cloud semantic segmentation method based on expansion point convolution space pyramid pooling | |
Qiu et al. | Semantic segmentation for real point cloud scenes via bilateral augmentation and adaptive fusion | |
Li et al. | Deep learning for remote sensing image classification: A survey | |
CN104036012B (en) | Dictionary learning, vision bag of words feature extracting method and searching system | |
Lin et al. | Local and global encoder network for semantic segmentation of Airborne laser scanning point clouds | |
CN113240683B (en) | Attention mechanism-based lightweight semantic segmentation model construction method | |
CN114792372A (en) | Three-dimensional point cloud semantic segmentation method and system based on multi-head two-stage attention | |
CN113870286B (en) | Foreground segmentation method based on multi-level feature and mask fusion | |
CN111652273A (en) | Deep learning-based RGB-D image classification method | |
CN114299285A (en) | Three-dimensional point cloud semi-automatic labeling method and system, electronic equipment and storage medium | |
CN115272696A (en) | Point cloud semantic segmentation method based on self-adaptive convolution and local geometric information | |
CN116129118A (en) | Urban scene laser LiDAR point cloud semantic segmentation method based on graph convolution | |
CN116028663A (en) | Three-dimensional data engine platform | |
Zheng et al. | Person re-identification in the 3D space | |
Hazer et al. | Deep learning based point cloud processing techniques | |
Han et al. | A Large-Scale Network Construction and Lightweighting Method for Point Cloud Semantic Segmentation | |
Bashmal et al. | Language Integration in Remote Sensing: Tasks, datasets, and future directions | |
CN117765258A (en) | Large-scale point cloud semantic segmentation method based on density self-adaption and attention mechanism | |
CN111597367B (en) | Three-dimensional model retrieval method based on view and hash algorithm | |
CN116503746A (en) | Infrared small target detection method based on multilayer nested non-full-mapping U-shaped network | |
Wang et al. | Hierarchical Kernel Interaction Network for Remote Sensing Object Counting | |
Tan et al. | 3D detection transformer: Set prediction of objects using point clouds | |
CN115497085A (en) | Point cloud completion method and system based on multi-resolution dual-feature folding | |
Zhu et al. | Gradient-based graph attention for scene text image super-resolution | |
Huang et al. | Remote sensing data detection based on multiscale fusion and attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20200417 |