CN112085066B - Voxelized three-dimensional point cloud scene classification method based on graph convolution neural network - Google Patents
Voxelized three-dimensional point cloud scene classification method based on graph convolution neural network
- Publication number
- CN112085066B (application CN202010812456.3A)
- Authority
- CN
- China
- Prior art keywords
- point
- voxel
- point cloud
- dimensional
- numbered
- Prior art date
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a voxelized three-dimensional point cloud scene classification method based on a graph convolution neural network, which specifically comprises the following steps: first, the scene point cloud data obtained by a visual sensor is voxelized in a manner adapted to rotation and translation transformations; then, for the point cloud within each voxel, the information of the points near each point is weighted onto that point by a method based on graph neural network spectral convolution, yielding a feature vector for each point; the points in each voxel are numbered one by one according to spatial distance, the feature vectors are max-pooled according to the numbering, and the pooled results are concatenated head to tail to obtain the feature vector of each voxel; finally, the voxel feature vectors are input into a fully connected network to obtain the scene class label. The method alleviates, to a certain extent, the high computational complexity of spectral convolution methods, and has a certain robustness to rotation and translation transformations of the point cloud.
Description
Technical Field
The invention belongs to the field of three-dimensional indoor scene recognition, and particularly relates to a voxelized three-dimensional point cloud scene classification method based on a graph convolution neural network.
Background
With the rapid development of computer hardware and theory, the acquisition and processing of three-dimensional data have become easier and easier. Recognition of three-dimensional point cloud scenes is currently a research hotspot in robotics and computer vision.
In three-dimensional point cloud scene recognition, there are methods based on hand-crafted descriptors such as the Signature of Histograms of Orientations (SHOT), but hand-crafted descriptors have a narrow range of applicability. There are methods that extract features from a voxelized point cloud with a 3D CNN, but they suffer from high computational complexity. Methods that project the three-dimensional point cloud from multiple angles and recognize the resulting two-dimensional images are convenient, but too much three-dimensional geometric information is lost in the projection. With the development of neural networks and their excellent performance on visual recognition, three-dimensional scene point cloud recognition with neural networks has become a research hotspot in point cloud processing.
At present, neural network methods for point clouds fall into two main categories. The first is the spatial-domain approach: the point cloud position information obtained by the sensor is used directly as input, without transforming the raw point cloud data. This line has many variants. For example, PointNet uses a T-net to mitigate the interference caused by rotation and translation of the point cloud, and uses a small multilayer perceptron as the convolution kernel to extract feature information; PointAtrousNet weights neighborhood information onto the center point 4 times with a multilayer perceptron. Spatial-domain methods extract feature information with multilayer perceptrons, but multilayer perceptrons are poorly interpretable and their structural design depends on extensive tuning. The second is the spectral-domain approach, which transforms the point cloud spatial information obtained by the sensor into Fourier space before designing the convolution kernel, such as the local spectral convolution kernel proposed by Michael et al. This form of convolution has a clear meaning, but computing the Laplacian matrix over the whole point cloud is computationally expensive.
Disclosure of Invention
Purpose of the invention: the invention aims to provide a voxelized three-dimensional point cloud scene classification method based on a graph convolution neural network that reduces the computational complexity of spectral convolution methods and has a certain robustness to rotation and translation transformations.
Technical scheme: the voxelized three-dimensional point cloud scene classification method based on a graph convolution neural network according to the invention comprises the following steps:
(1) transforming the three-dimensional space coordinates of the point cloud obtained by the visual sensor with the T-net network from PointNet, and voxelizing the T-net-transformed point cloud;
(2) weighting the information of the points adjacent to each point in each voxel onto that point to obtain the feature vector of each point;
(3) numbering the points in each voxel one by one according to their spatial distances; then max-pooling the feature vectors of points with adjacent numbers, and concatenating the pooled results head to tail to obtain the feature vector of each voxel;
(4) inputting the feature vector of each voxel from step (3) into a fully connected network, whose output is the category label of the scene point cloud.
In step (2), the spatial position information of several points adjacent to each point in each voxel is weighted onto that point multiple times, in a manner that fuses PointAtrousNet with a local spectral convolution kernel.
In step (3), the points within a voxel are numbered one by one in natural-number order, specifically comprising the following steps:
(3.1) defining the radius ρ of the node neighborhood;
(3.2) randomly selecting an unnumbered node and numbering it in sequence;
(3.3) selecting the node within the neighborhood closest to the currently highest-numbered node and assigning it the next number;
(3.4) determining whether all nodes in the neighborhood of the currently highest-numbered node are numbered;
(3.5) if not all are numbered, repeating step (3.3); if all are numbered, further checking whether all nodes in the voxel are numbered;
(3.6) if unnumbered nodes remain in step (3.5), repeating step (3.2); if all nodes in the voxel are numbered in step (3.5), the numbering is complete.
In step (3), max-pooling the feature vectors of points with adjacent numbers specifically comprises: taking several points with adjacent numbers at a time in sequence for max pooling, computing the norm of each point's feature vector within the pooling window, and taking the point with the maximum norm as the pooling result.
In step (4), the voxel feature vectors are used as the input of the fully connected network.
Beneficial effects: compared with the prior art, the invention has the following beneficial effects: (1) the method operates on three-dimensional point cloud data, which, compared with two-dimensional scene images, is more robust to interference from scale, viewing angle, and lighting; (2) the method has a certain robustness to rotation and translation transformations of the point cloud; (3) spectral convolution is applied to the point cloud within each voxel, which reduces the computational complexity compared with applying spectral convolution to the whole point cloud.
Drawings
FIG. 1 is a flow chart of the modeling of the present invention;
FIG. 2 is a flow chart of the numbering of points within voxels in accordance with the present invention.
Detailed Description
The invention is described in further detail below with reference to specific embodiments and the attached drawings.
As shown in FIG. 1, to implement the voxelized three-dimensional point cloud scene classification method based on a graph convolution neural network, a recognition model is first established and then trained on a large number of three-dimensional scene point clouds; training finishes after a preset number of iterations. To classify a new point cloud scene, its three-dimensional point cloud data is input into the trained model; the output is a class label vector for the scene, and the index of the maximum value in this vector gives the corresponding scene class. This embodiment is implemented with the PCL library and the PyTorch library. The specific steps are as follows:
(1) Transform the three-dimensional space coordinates of the point cloud obtained by the visual sensor with the T-net network from PointNet, and voxelize the T-net-transformed point cloud. The T-net is a fully connected network with shared parameters; its input is the three-dimensional space coordinates of every point in the point cloud, and its output is a three-dimensional vector for each corresponding point that reflects, to a certain extent, the spatial relationships among the points before rotation and translation. T-net processing improves the robustness of the whole algorithm to rotation and translation of the point cloud. In this embodiment the size of the T-net is changed slightly: a fully connected T-net is built with PyTorch whose hidden-layer neuron counts follow the symmetric structure 1024-256-64-256-1024. Voxelizing the T-net-transformed point cloud specifically comprises: voxelizing the initial point cloud with the PCL point cloud library according to the three-dimensional space coordinates of each point to obtain K voxels, and replacing the information of each point in each voxel, originally its initial three-dimensional space coordinates, with its T-net-transformed three-dimensional vector. The result is the voxel numbers and the three-dimensional vector of each point in the corresponding voxel, stored in a dictionary-like data structure:
{ "voxel 1": a point p three-dimensional vector, -; … "voxel k": a point q three-dimensional vector.
(2) Weight the spatial position information of several points adjacent to each point in each voxel onto that point multiple times, in a manner that fuses PointAtrousNet with a local spectral convolution kernel, to obtain each point's feature vector. Specifically: from the T-net-transformed three-dimensional vectors of the points, compute the distances between the points within each voxel, traverse each point to find its 20 closest points, connect each point to those 20 nearest neighbors to build a graph, and compute the Laplacian matrix. For each point in a voxel, the information of its 20 neighboring points is sampled at 4 rates (1, 2, 3, and 4) and fed into 4 independent spectral convolution kernels. That is, each convolution kernel takes the information of the target point and 5 neighboring points as input (input vector dimension 3) and outputs the weighted information of the target point (output vector dimension also 3). Each convolution kernel is built with PyTorch; this embodiment uses 4 independent spectral convolution kernels, each comprising N depths, each depth containing 6 parameters to be trained, for 24×N parameters in total. The parameters of each depth are trained independently, while being shared across the points of the cloud. The above constitutes one convolution layer, and the same convolution operation is performed 4 times: convolution layers of identical structure are connected end to end, and the output of a kernel in one layer serves as the input of the kernel with the same sampling rate in the next layer. After the last convolution layer, the convolution results of all depths of all kernels are concatenated head to tail, giving each point a 12×N-dimensional feature vector. The feature vector of each point is recorded in PyTorch in the following data structure:
{ "voxel 1": point 1 feature vector,. point p feature vector; … "voxel k": point 1 eigenvector,. point q eigenvector ] }.
(3) Number the points in each voxel one by one in natural-number order according to spatial distance, so that the two points closest in space receive adjacent numbers. Then max-pool the feature vectors of points with adjacent numbers: within each voxel, take several adjacently numbered points at a time in number order, compute the norm of each point's feature vector within the pooling window, and take the point with the maximum norm as the pooling result. The pooled results are concatenated head to tail to form the voxel's feature vector, whose dimension is 12×N×m, where m is the number of points retained per voxel after pooling. Because the number of points differs from voxel to voxel, the pooling size within each voxel is adjusted flexibly so that the feature vector size is the same for every voxel.
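A minimal sketch of the number-ordered max pooling follows. The rule that chooses the window size (keeping m points per voxel, assuming each voxel holds at least m points) is an assumption, since the embodiment states only that the pooling size is adjusted flexibly.

```python
import torch

def norm_max_pool(feats, m):
    """feats: (n, d) feature vectors already sorted by the spatial numbering.

    Splits the sequence into m windows of adjacently numbered points, keeps
    the vector with the largest norm in each window, and concatenates the m
    survivors head to tail into one (m*d,) voxel feature vector.
    """
    n, d = feats.shape
    window = max(1, n // m)                       # flexible per-voxel pooling size
    pooled = []
    for start in range(0, window * m, window):
        block = feats[start:start + window]
        norms = block.norm(dim=1)                 # norm of each point's feature
        pooled.append(block[norms.argmax()])      # max-norm point is the result
    return torch.cat(pooled)                      # (m * d,) voxel feature vector

# Usage: 50 numbered points with 12xN = 48-dim features pooled to m = 8 points.
voxel_vector = norm_max_pool(torch.rand(50, 48), m=8)   # shape (384,)
```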
(4) Build a fully connected network with PyTorch, choosing ReLU as the activation function, Adam as the optimizer, and PyTorch's MultiLabelSoftMarginLoss as the loss function. Input the K 12×N×m-dimensional voxel feature vectors obtained in step (3) into the fully connected layers; the output is the scene category label. The class labels of the training and test data are one-hot encoded, and the last layer of the fully connected network includes a softmax layer.
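A minimal sketch of this classification head follows. The optimizer, activation, loss, and final softmax layer are as described above; the hidden widths and the sizes K, N, m, and the class count are illustrative assumptions.

```python
import torch
import torch.nn as nn

K, N, m, num_classes = 64, 4, 8, 10               # assumed sizes for illustration
in_dim = K * 12 * N * m                           # concatenated voxel features

classifier = nn.Sequential(
    nn.Linear(in_dim, 512), nn.ReLU(),            # ReLU activation, assumed widths
    nn.Linear(512, 128), nn.ReLU(),
    nn.Linear(128, num_classes),
    nn.Softmax(dim=1),                            # softmax in the last layer
)
optimizer = torch.optim.Adam(classifier.parameters())
loss_fn = nn.MultiLabelSoftMarginLoss()

# One training step on a batch of two scenes with one-hot class labels.
x = torch.rand(2, in_dim)
y = torch.eye(num_classes)[torch.tensor([3, 7])]  # one-hot encoded labels
optimizer.zero_grad()
loss = loss_fn(classifier(x), y)
loss.backward()
optimizer.step()
```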
In step (2), the calculation formula of the local spectral convolution kernel is:

f = U g_α(Λ) U^T f_in, with g_α(Λ) = diag(α_1, …, α_6)

wherein f is the convolution result; f_in = [f_1, …, f_6]^T, where the f_j are the T-net-transformed information of the 1 target point and 5 adjacent points from step (1); U is the eigenvector matrix from the eigenvalue decomposition of the Laplacian matrix of the point cloud in the voxel; U^T is the transpose of U; Λ is the diagonal eigenvalue matrix from that decomposition, whose diagonal entries g_α replaces with the trainable parameters; and the α_j are the convolution kernel parameters to be trained.
As shown in FIG. 2, numbering the points within a voxel one by one in step (3) specifically comprises the following steps (a Python sketch of the procedure follows the list):
(3.1) defining the radius ρ of the node neighborhood;
(3.2) randomly selecting an unnumbered node and numbering it in sequence;
(3.3) selecting the node within the neighborhood closest to the currently highest-numbered node and assigning it the next number;
(3.4) determining whether all nodes in the neighborhood of the currently highest-numbered node are numbered;
(3.5) if not all are numbered, repeating step (3.3); if all are numbered, further checking whether all nodes in the voxel are numbered;
(3.6) if unnumbered nodes remain in step (3.5), repeating step (3.2); if all nodes in the voxel are numbered in step (3.5), the numbering is complete.
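The following NumPy sketch implements steps (3.1)-(3.6), assuming Euclidean distance; the function name and the arbitrary choice of seed node in step (3.2) are illustrative.

```python
import numpy as np

def number_points(points, rho):
    """Number points so that spatially close points receive adjacent numbers."""
    unnumbered = set(range(len(points)))
    order = []                                    # order[k] = index of point number k
    while unnumbered:
        current = unnumbered.pop()                # step (3.2): seed an unnumbered node
        order.append(current)
        while True:                               # steps (3.3)-(3.5): grow the chain
            cand = np.array(sorted(unnumbered))
            if cand.size == 0:
                break
            d = np.linalg.norm(points[cand] - points[current], axis=1)
            in_nbhd = d <= rho                    # step (3.1): rho-neighbourhood
            if not in_nbhd.any():                 # step (3.6): start a new seed
                break
            current = int(cand[in_nbhd][np.argmin(d[in_nbhd])])  # nearest gets next number
            unnumbered.remove(current)
            order.append(current)
    return order

# Usage: number 40 random points with neighbourhood radius 0.3.
numbering = number_points(np.random.rand(40, 3), rho=0.3)
```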
Claims (5)
1. A voxelized three-dimensional point cloud scene classification method based on a graph convolution neural network, characterized by comprising the following steps:
(1) transforming the three-dimensional space coordinates of the point cloud obtained by the visual sensor with the T-net network from PointNet, the size of the T-net being finely adjusted so that the numbers of hidden-layer neurons form a symmetric structure; voxelizing the T-net-transformed point cloud by voxelizing the initial point cloud according to the three-dimensional space coordinates of each point to obtain a plurality of voxels; then replacing the information of each point in a voxel, originally its initial three-dimensional space coordinates, with its T-net-transformed three-dimensional vector; and finally obtaining the voxel numbers and the three-dimensional vector of each point in the corresponding voxel;
(2) weighting the information of the points adjacent to each point in each voxel onto that point by fusing PointAtrousNet with a local spectral convolution kernel, to obtain the feature vector of each point;
(3) numbering the points in each voxel one by one according to their spatial distances; then max-pooling the feature vectors of points with adjacent numbers, and concatenating the pooled results head to tail to obtain the feature vector of each voxel;
(4) inputting the feature vector of each voxel from step (3) into a fully connected network, whose output is the category label of the scene point cloud.
2. The voxelized three-dimensional point cloud scene classification method based on a graph convolution neural network of claim 1, characterized in that: in step (2), the spatial position information of several points adjacent to each point in each voxel is weighted onto that point multiple times, in a manner that fuses PointAtrousNet with a local spectral convolution kernel.
3. The voxelized three-dimensional point cloud scene classification method based on a graph convolution neural network of claim 1, characterized in that in step (3), the points in a voxel are numbered one by one in natural-number order, specifically comprising the following steps:
(3.1) defining the radius ρ of the node neighborhood;
(3.2) randomly selecting an unnumbered node and numbering it in sequence;
(3.3) selecting the node within the neighborhood closest to the currently highest-numbered node and assigning it the next number;
(3.4) determining whether all nodes in the neighborhood of the currently highest-numbered node are numbered;
(3.5) if not all are numbered, repeating step (3.3); if all are numbered, further checking whether all nodes in the voxel are numbered;
(3.6) if unnumbered nodes remain in step (3.5), repeating step (3.2); if all nodes in the voxel are numbered in step (3.5), the numbering is complete.
4. The voxelized three-dimensional point cloud scene classification method based on a graph convolution neural network of claim 1, characterized in that: in step (3), max-pooling the feature vectors of points with adjacent numbers specifically comprises: taking several points with adjacent numbers at a time in sequence for max pooling, computing the norm of each point's feature vector within the pooling window, and taking the point with the maximum norm as the pooling result.
5. The voxelized three-dimensional point cloud scene classification method based on a graph convolution neural network of any one of claims 1 to 4, characterized in that: in step (4), the voxel feature vectors are used as the input of the fully connected network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010812456.3A CN112085066B (en) | 2020-08-13 | 2020-08-13 | Voxelized three-dimensional point cloud scene classification method based on graph convolution neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010812456.3A CN112085066B (en) | 2020-08-13 | 2020-08-13 | Voxelized three-dimensional point cloud scene classification method based on graph convolution neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112085066A CN112085066A (en) | 2020-12-15 |
CN112085066B true CN112085066B (en) | 2022-08-26 |
Family
ID=73728203
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010812456.3A Active CN112085066B (en) | 2020-08-13 | 2020-08-13 | Voxelized three-dimensional point cloud scene classification method based on graph convolution neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112085066B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB202207459D0 (en) * | 2022-05-20 | 2022-07-06 | Cobra Simulation Ltd | Content generation from sparse point datasets |
CN117409209B (en) * | 2023-12-15 | 2024-04-16 | 深圳大学 | Multi-task perception three-dimensional scene graph element segmentation and relationship reasoning method |
CN117773918B (en) * | 2023-12-19 | 2024-08-20 | 中信重工开诚智能装备有限公司 | Intelligent guniting robot guniting path planning method based on point cloud processing |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110135227A (en) * | 2018-02-09 | 2019-08-16 | 电子科技大学 | A kind of laser point cloud outdoor scene automatic division method based on machine learning |
CN109118564A (en) * | 2018-08-01 | 2019-01-01 | 湖南拓视觉信息技术有限公司 | A kind of three-dimensional point cloud labeling method and device based on fusion voxel |
CN109410307A (en) * | 2018-10-16 | 2019-03-01 | 大连理工大学 | A kind of scene point cloud semantic segmentation method |
CN110633640A (en) * | 2019-08-13 | 2019-12-31 | 杭州电子科技大学 | Method for identifying complex scene by optimizing PointNet |
Non-Patent Citations (1)
Title |
---|
LiDAR point cloud ground-object classification method based on multi-scale features and PointNet; Zhao Zhongyang et al.; Laser & Optoelectronics Progress; 2018-10-07 (No. 05); full text *
Also Published As
Publication number | Publication date |
---|---|
CN112085066A (en) | 2020-12-15 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| CB03 | Change of inventor or designer information | Inventor after: Zhu Bo; Fan Ximing; Gao Xiang. Inventor before: Gao Xiang; Fan Ximing; Zhu Bo.
| GR01 | Patent grant |