CN112541535B - Three-dimensional point cloud classification method based on complementary multi-branch deep learning
- Publication number: CN112541535B (application CN202011426512.6A)
- Authority: CN (China)
- Prior art keywords: point, point cloud, feature, features, three-dimensional point
- Prior art date: 2020-12-09
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/24: Pattern recognition; Analysing; Classification techniques
- G06F18/25: Pattern recognition; Analysing; Fusion techniques
- G06N3/045: Neural networks; Architecture; Combinations of networks
Abstract
The application discloses a three-dimensional point cloud classification method based on complementary multi-branch deep learning, comprising the following steps: acquiring a three-dimensional point cloud data set to be marked; performing feature classification on the three-dimensional point cloud data set, wherein the feature classification comprises local features, global features and point-by-point features; and concatenating the local features, the global features and the point-by-point features. By exploiting the complementary robustness of the local, global and point-by-point features, the scheme enables the model to achieve good performance on real-world data sets.
Description
Technical Field
The invention relates to the technical field of point cloud classification, and in particular to a method, an apparatus, a device and a storage medium for three-dimensional point cloud classification based on complementary multi-branch deep learning.
Background
Autonomous driving requires accurate and efficient 3D point cloud processing, where deep learning has shown great potential. However, most existing works, while achieving high accuracy on synthetic data, are unsatisfactory on real-world data sets, and although existing multi-view-based, volume-based and point-based approaches have achieved favourable results, they still suffer from high computational complexity or low memory efficiency. For example, the computation time and memory consumption of multi-view-based and volume-based methods grow up to cubically with increasing input resolution, while point-based methods lose runtime to their irregular memory accesses.
Disclosure of Invention
In view of the foregoing drawbacks or shortcomings of the prior art, it is desirable to provide a method, an apparatus, a device and a storage medium for three-dimensional point cloud classification based on complementary multi-branch deep learning.
In a first aspect, an embodiment of the present application provides a method for classifying a three-dimensional point cloud based on complementary multi-branch deep learning, where the method includes:
acquiring a three-dimensional point cloud data set to be marked;
performing feature classification on the three-dimensional point cloud data set, wherein the feature classification comprises local features, global features and point-by-point features;
concatenating the local features, the global features and the point-by-point features.
In one embodiment, the local features include:
selecting a first preset number of points from the three-dimensional point cloud data set based on a kernel-point convolution (KPConv) method;
for each point, selecting its neighboring points within a specific radius as the input for point cloud feature extraction;
convolving each point with predefined kernel points having different weights.
In one embodiment, the global features include:
in the combined spherical convolution, augmenting the raw data with the angle of each intersection between the grid normal and the ray;
constructing angle features;
by interpolation, making each grid cell contain a value indicating the distance between the sphere and the point;
and estimating the shape of the point cloud from the distance information.
In one embodiment, the point-by-point features comprise:
extracting point-wise features using shared MLP layers;
aggregating the point-wise features with a max-pooling layer.
In a second aspect, an embodiment of the present application further provides a three-dimensional point cloud classification device based on complementary multi-branch deep learning, where the device includes:
the acquisition unit is used for acquiring a three-dimensional point cloud data set to be marked;
the classification unit is used for performing feature classification on the three-dimensional point cloud data set, wherein the feature classification comprises local features, global features and point-by-point features;
and a concatenation unit for concatenating the local features, the global features and the point-by-point features.
In one embodiment, the local features include:
a selecting unit for selecting a first preset number of points from the three-dimensional point cloud data set based on the kernel-point convolution method;
an extraction unit for selecting, for each point, its neighboring points within a specific radius as the input for point cloud feature extraction;
a convolution unit for convolving each point with predefined kernel points having different weights.
In one embodiment, the global features include:
in the combined spherical convolution, augmenting the raw data with the angle of each intersection between the grid normal and the ray;
constructing angle features;
by interpolation, making each grid cell contain a value indicating the distance between the sphere and the point;
and estimating the shape of the point cloud from the distance information.
In one embodiment, the point-by-point features comprise:
extracting point-wise features using shared MLP layers;
aggregating the point-wise features with a max-pooling layer.
In a third aspect, embodiments of the present application further provide a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing a method as described in any of the embodiments of the present application when the program is executed.
In a fourth aspect, embodiments of the present application further provide a computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements a method as described in any of the embodiments of the present application.
The invention has the beneficial effects that:
according to the three-dimensional point cloud classification method based on complementary multi-branch deep learning, the robustness of local features, global features and point-by-point features is utilized, so that the model can obtain good performance on a data set in the real world.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings, in which:
Fig. 1 shows a schematic flow chart of a three-dimensional point cloud classification method based on complementary multi-branch deep learning according to an embodiment of the present application;
Fig. 2 shows a schematic flow chart of a three-dimensional point cloud classification method based on complementary multi-branch deep learning according to another embodiment of the present application;
Fig. 3 shows an exemplary block diagram of a three-dimensional point cloud classification apparatus 300 based on complementary multi-branch deep learning according to an embodiment of the present application;
Fig. 4 shows an exemplary block diagram of a three-dimensional point cloud classification apparatus 400 based on complementary multi-branch deep learning according to yet another embodiment of the present application;
Fig. 5 shows a schematic diagram of a computer system suitable for implementing the terminal device of the embodiments of the present application.
Detailed Description
In order that the above objects, features and advantages of the invention may be readily understood, a more particular description of the invention is given below with reference to the accompanying drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The invention may, however, be embodied in many forms other than those described herein, and those skilled in the art may make similar modifications without departing from its spirit; the invention is therefore not limited to the specific embodiments disclosed below.
In the description of the present invention, it should be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", "axial", "radial", "circumferential", etc. indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the device or element being referred to must have a specific orientation, be configured and operated in a specific orientation, and therefore should not be construed as limiting the present invention.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present invention, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
In the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," "secured," and the like are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; either directly or indirectly, through intermediaries, or both, may be in communication with each other or in interaction with each other, unless expressly defined otherwise. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
In the present invention, unless expressly stated or limited otherwise, a first feature "up" or "down" a second feature may be the first and second features in direct contact, or the first and second features in indirect contact via an intervening medium. Moreover, a first feature being "above," "over" and "on" a second feature may be a first feature being directly above or obliquely above the second feature, or simply indicating that the first feature is level higher than the second feature. The first feature being "under", "below" and "beneath" the second feature may be the first feature being directly under or obliquely below the second feature, or simply indicating that the first feature is less level than the second feature.
It will be understood that when an element is referred to as being "fixed" or "disposed" on another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. The terms "vertical," "horizontal," "upper," "lower," "left," "right," and the like are used herein for illustrative purposes only and are not meant to be the only embodiment.
Referring to fig. 1, fig. 1 is a schematic flow chart of a three-dimensional point cloud classification method based on complementary multi-branch deep learning according to an embodiment of the present application.
As shown in fig. 1, the method includes:
step 110, acquiring a three-dimensional point cloud data set to be marked;
step 120, performing feature classification on the three-dimensional point cloud data set, wherein the feature classification comprises local features, global features and point-by-point features;
step 130, concatenating the local feature, the global feature, and the point-by-point feature.
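To make the three-step flow concrete, the following is a minimal Python sketch of steps 110-130. The branch extractors are hypothetical stubs standing in for the local, global and point-by-point branches detailed below; the names and feature dimensions are illustrative assumptions, not from the patent.

```python
# A minimal sketch of steps 110-130; branch extractors are hypothetical stubs.
import numpy as np

def local_branch(points):      # stub for KPConv-style local features
    return np.zeros(128)

def global_branch(points):     # stub for spherical-CNN global features
    return np.zeros(128)

def pointwise_branch(points):  # stub for shared-MLP + max-pool features
    return np.zeros(128)

def classify_features(points):
    # Step 110: 'points' is the acquired (N, 3) cloud to be marked.
    # Step 120: extract the three complementary feature groups.
    f_local = local_branch(points)
    f_global = global_branch(points)
    f_point = pointwise_branch(points)
    # Step 130: concatenate them into one descriptor for the classifier head.
    return np.concatenate([f_local, f_global, f_point])

print(classify_features(np.random.rand(1024, 3)).shape)  # (384,)
```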
By adopting the technical scheme, the robustness of the local features, the global features and the point-by-point features is utilized, so that the model can obtain good performance on a real-world data set.
In some embodiments, please refer to fig. 2, fig. 2 is a flow chart illustrating a three-dimensional point cloud classification method based on complementary multi-branch deep learning according to another embodiment of the present application.
As shown in fig. 2, the method includes:
step 210, selecting a first preset number of points from the three-dimensional point cloud data set based on the kernel-point convolution method;
step 220, for each point, selecting its neighboring points within a specific radius as the input for point cloud feature extraction;
step 230, convolving each point with predefined kernel points having different weights.
In some embodiments, the global features include: in the combined spherical convolution, augmenting the raw data with the angle of each intersection between the grid normal and the ray; constructing angle features; by interpolation, making each grid cell contain a value indicating the distance between the sphere and the point; and estimating the shape of the point cloud from the distance information.
In some embodiments, the point-by-point features comprise: extracting point-wise features using shared MLP layers, and aggregating them with a max-pooling layer.
In particular, we build a fusion architecture named multi-branch CNN (MBCNN) to combine features from different perspectives. From the point-wise perspective, we extract per-point features using an MLP. To ensure that the model can detect local relationships between points, a point-based convolution is introduced into our MBCNN as the local feature branch. Finally, a spherical convolution is adopted to aggregate global features as an auxiliary signal to guide classification in 3D deep learning.
(a) Local features
Kernel-point convolution excels at learning local signals, so in feature extraction we use kernel-point convolution as the local feature branch. However, KPConv still faces a high computational cost due to unavoidable irregular memory accesses. To ease this problem, KD-tree-like methods have been proposed whose computational complexity is O(N log N), faster than conventional brute-force search; nevertheless, GPU efficiency may still drop because searching for neighbors accesses memory irregularly. We therefore propose a novel idea: indexing the neighbors in a point cloud to speed up neighbor finding. Inspired by voxel-based convolution, we first construct a mapping between each point and its corresponding voxel grid cell. When searching for neighboring points, we can then avoid scanning all points by searching only the neighboring voxel cells. Overall, the computational complexity of finding the neighbors of every point is significantly reduced from O(N log N) to O(N). In the implementation, a three-dimensional array is constructed in which each element holds either a linked list or an array. When each element holds a linked list, little memory is wasted thanks to the small size of pointers, although the irregular-access problem remains even though the computational complexity is reduced. When each element is an array, frequent irregular memory accesses are avoided at the cost of a larger memory footprint. A point can easily be converted to its associated grid cell from its coordinates as g = (floor(x/r), floor(y/r), floor(z/r)), where r represents the resolution of the voxel, which also determines whether the point falls within a given voxel grid cell. Whether inserting a point or searching for its neighbors, the cost per point is thus constant.
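The following is a minimal Python sketch of the voxel-grid neighbor index described above, under the assumption that the voxel size equals the search radius, so a radius query only has to inspect the 3x3x3 block of cells around the query point; function and variable names are illustrative, not from the patent.

```python
# Voxel-grid neighbour index: hash each point to cell floor(p / r), then a
# radius query inspects only the 27 surrounding cells instead of all N points.
from collections import defaultdict
import numpy as np

def build_voxel_index(points, r):
    index = defaultdict(list)                    # cell -> list of point ids
    cells = np.floor(points / r).astype(int)
    for i, c in enumerate(map(tuple, cells)):
        index[c].append(i)
    return index

def radius_neighbors(points, index, query, r):
    cq = tuple(np.floor(query / r).astype(int))
    ids = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                ids.extend(index.get((cq[0] + dx, cq[1] + dy, cq[2] + dz), []))
    ids = np.array(ids, dtype=int)
    keep = np.linalg.norm(points[ids] - query, axis=1) <= r
    return ids[keep]

pts = np.random.rand(1024, 3)
idx = build_voxel_index(pts, r=0.1)
print(radius_neighbors(pts, idx, pts[0], r=0.1))
```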
We use kernel-point convolution in the local feature branch. For each point, KPConv selects its neighbors within a certain radius as input and then convolves them with predefined kernel points carrying different weights. Furthermore, a deformable kernel is introduced so that the model can automatically adapt to the geometry of the input data.
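As a sketch only, the following shows a KPConv-style convolution for a single center point, using the linear correlation max(0, 1 - d/sigma) from the published KPConv formulation; the kernel positions, sigma and feature dimensions are assumptions for illustration, and the deformable variant is omitted.

```python
# KPConv-style convolution of one centre point over its radius neighbourhood.
import numpy as np

def kpconv_point(center, neighbors, feats, kernel_pts, weights, sigma):
    # neighbors: (M, 3) points inside the radius; feats: (M, C_in)
    # kernel_pts: (K, 3) predefined kernel positions around the origin
    # weights: (K, C_in, C_out), one weight matrix per kernel point
    rel = neighbors - center                                # local coordinates
    d = np.linalg.norm(rel[:, None, :] - kernel_pts[None], axis=-1)  # (M, K)
    corr = np.maximum(0.0, 1.0 - d / sigma)   # influence of each kernel point
    return np.einsum('mk,mc,kco->o', corr, feats, weights)  # (C_out,)

rng = np.random.default_rng(0)
out = kpconv_point(np.zeros(3),
                   rng.normal(size=(16, 3)),     # 16 neighbours
                   rng.normal(size=(16, 4)),     # 4 input channels
                   rng.normal(size=(8, 3)),      # 8 kernel points
                   rng.normal(size=(8, 4, 32)),  # 32 output channels
                   sigma=0.5)
print(out.shape)  # (32,)
```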
(b) Global feature branch
The main challenge of combining spherical convolution is that the original spherical CNN framework is not designed for point clouds. Therefore, similar to the data sampling method in spherical CNNs, we design a new spherical convolution framework that extracts features directly from the point cloud. Finally, the features extracted by the spherical CNN are combined into the global features.
Projection onto the sphere: similar to the sampling method in spherical CNNs, we design a backward rendering model that casts a ray from each input point back onto a sphere. We begin with a closed sphere covered by a mesh; then, for each point, a ray is cast from the centroid of the cloud through that point and eventually intersects the sphere.
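A minimal sketch of this backward projection, assuming a unit sphere centered at the cloud's centroid: each point maps to its ray direction (the intersection with the sphere), and the ray length is kept as the sphere-to-point distance signal used in the next step.

```python
# Backward projection of a point cloud onto a unit sphere at the centroid.
import numpy as np

def project_to_sphere(points):
    centroid = points.mean(axis=0)
    rays = points - centroid
    dist = np.linalg.norm(rays, axis=1, keepdims=True)
    unit = rays / np.clip(dist, 1e-8, None)      # intersection on the sphere
    return unit, dist.squeeze(-1)                # directions, distance signal

dirs, dists = project_to_sphere(np.random.rand(1024, 3))
print(dirs.shape, dists.shape)  # (1024, 3) (1024,)
```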
Interpolation:
based on the intersection points, we can use interpolation with four nearest grid-shared values on the sphere (distances between sphere and points) according to their relative weights (graph\ref { graph. Interpolation }). The weight of each grid is determined by the Euclidean distance between the intersection point and the grid equation. (consider that when the grids are dense, the geodesic distance is approximately equal to the Euclidean distance, weight represents the Weight of each grid, grid represents the location of each grid, and p represents the location of the intersection).
Linear regression: as previously described, in spherical CNN, the raw data is augmented with the angle of each intersection between the grid normal and the ray. However, there is no "grid" and its normal concept in point clouds. In this case we need to construct the angle feature manually. In the last step, each grid contains a value indicating the distance between the sphere and the point by interpolation. From the distance information, the shape of the point cloud can be estimated: based on one grid and the other 8 grids around, the relative position within the sphere can be calculated and then a linear regression can be performed using these 9 points. Finally, the regression outputs a plane almost equal to the grid and outputs an angle signal from the normal to the plane. It is worth mentioning that during the execution of the linear regression we did not use the sklearn library directly, since the speed was too slow. Instead, we obtain coefficients (when full order matrix) by simply computing the inverse Eq to implement the C-expansion independently, which is 20,000 times faster than using the sklearn library directly.
After this conversion, three-channel features have been assigned to the mesh on the sphere. Spherical convolution is then applied to these features, exploiting the fact that the sphere is a homogeneous space under rotation, and outputs features defined on SO(3). Spherical CNNs have been shown to perform excellently on classifying perturbed 3D objects owing to their rotation equivariance.
(c) Point-by-point feature branch
several shared MLP layers are used to extract punctiform features independently, followed by a max pooling layer to aggregate the most important features, which are insensitive to real-world noise, as reliable supplementary information for global and local features.
(d) Feature fusion:
based on three different branches of the elements, we put them together in series, as they provide complementary signals to each other. We do not use a simple vote classifier, but rather use a multi-layer MLP to fuse this function, since the weights of each learner are actually predetermined. Thus, by introducing an MLP, the weights of the individual learners can be adaptively defined by gradient descent. Through the age of training, the connection weights of neurons will be updated and the most useful information selected.
Further, referring to fig. 3, fig. 3 illustrates an exemplary block diagram of a three-dimensional point cloud classification apparatus 300 based on complementary multi-branch deep learning according to one embodiment of the present application.
As shown in fig. 3, the apparatus includes:
an acquiring unit 310, configured to acquire a three-dimensional point cloud data set to be marked;
a classification unit 320, configured to perform feature classification on the three-dimensional point cloud data set, wherein the feature classification comprises local features, global features and point-by-point features;
a concatenation unit 330, configured to concatenate the local features, the global features and the point-by-point features.
Referring to fig. 4, fig. 4 is a block diagram illustrating an exemplary structure of a three-dimensional point cloud classifying apparatus 400 according to another embodiment of the present application based on complementary multi-branch deep learning.
As shown in fig. 4, the apparatus includes:
a selecting unit 410, configured to select a first preset number of points from the three-dimensional point cloud data set based on the kernel-point convolution method;
an extraction unit 420, configured to select, for each point, its neighboring points within a specific radius as the input for point cloud feature extraction;
a convolution unit 430, configured to convolve each point with predefined kernel points having different weights.
It should be understood that the units or modules described in the apparatuses 300 and 400 correspond to the respective steps of the methods described with reference to figs. 1-2. Thus, the operations and features described above for the methods are equally applicable to the apparatuses 300 and 400 and the units contained therein, and are not described in detail again here. The apparatuses 300 and 400 may be implemented in advance in a browser or another security application of an electronic device, or may be loaded into the browser or security application of the electronic device by downloading or the like. The units in the apparatuses 300 and 400 may cooperate with units in the electronic device to implement the aspects of the embodiments of the present application.
Referring now to FIG. 5, there is illustrated a schematic diagram of a computer system 500 suitable for use in implementing a terminal device or server of an embodiment of the present application.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU) 501, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output portion 507 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker, and the like; a storage portion 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. The drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as needed so that a computer program read therefrom is mounted into the storage section 508 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to figs. 1-2 may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product for three-dimensional point cloud classification based on complementary multi-branch deep learning, comprising a computer program tangibly embodied on a machine-readable medium, the computer program including program code for performing the methods of figs. 1-2. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 509, and/or installed from the removable medium 511.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present application may be implemented by software, or may be implemented by hardware. The described units or modules may also be provided in a processor, for example, as: a processor includes a first sub-region generation unit, a second sub-region generation unit, and a display region generation unit. The names of these units or modules do not constitute a limitation of the unit or module itself in some cases, and for example, the display area generating unit may also be described as "a unit for generating a display area of text from the first sub-area and the second sub-area".
As another aspect, the present application also provides a computer-readable storage medium, which may be the computer-readable storage medium contained in the apparatus of the foregoing embodiments, or a stand-alone computer-readable storage medium not assembled into a device. The computer-readable storage medium stores one or more programs used by one or more processors to perform the methods described herein.
The foregoing description covers only the preferred embodiments of the present application and the principles of the technology employed. It will be appreciated by persons skilled in the art that the scope of the invention referred to in this application is not limited to the specific combinations of the features described above, and is also intended to cover other embodiments formed by any combination of the above features or their equivalents without departing from the spirit of the invention, for example, embodiments formed by interchanging the above features with (but not limited to) technical features of similar function disclosed in this application.
Claims (8)
1. A three-dimensional point cloud classification method based on complementary multi-branch deep learning is characterized by comprising the following steps:
acquiring a three-dimensional point cloud data set to be marked;
performing feature classification on the three-dimensional point cloud data set, wherein the feature classification comprises local features, global features and point-by-point features, and the local features comprise:
selecting a first preset number of points from the three-dimensional point cloud data set based on a kernel-point convolution method;
for each point, selecting its neighboring points within a specific radius as the input for point cloud feature extraction, wherein a mapping is constructed between each point and its corresponding voxel grid cell, so that when searching for neighboring points only the neighboring voxel grid cells are searched instead of all points, a point being converted to its associated grid cell according to its coordinates to judge whether the point falls within the voxel grid cell;
convolving each point with predefined kernel points having different weights;
the point-to-grid conversion for the local features is expressed as:
g = (floor(x/r), floor(y/r), floor(z/r)),
where r represents the resolution of the voxel;
concatenating the local features, the global features and the point-by-point features.
2. The three-dimensional point cloud classification method based on complementary multi-branch deep learning of claim 1, wherein the global features comprise:
in the combined spherical convolution, augmenting the raw data with the angle of each intersection between the grid normal and the ray;
constructing angle features;
by interpolation, making each grid cell contain a value indicating the distance between the sphere and the point;
and estimating the shape of the point cloud from the distance information.
3. The three-dimensional point cloud classification method based on complementary multi-branch deep learning of claim 1, wherein the point-by-point features comprise:
extracting point-wise features using shared MLP layers;
aggregating the point-wise features with a max-pooling layer.
4. A three-dimensional point cloud classification device based on complementary multi-branch deep learning, the device comprising:
the acquisition unit is used for acquiring a three-dimensional point cloud data set to be marked;
the classification unit is used for performing feature classification on the three-dimensional point cloud data set, wherein the feature classification comprises local features, global features and point-by-point features, and the local features comprise: selecting a first preset number of points from the three-dimensional point cloud data set based on a kernel-point convolution method; for each point, selecting its neighboring points within a specific radius as the input for point cloud feature extraction; and convolving each point with predefined kernel points having different weights; the point-to-grid conversion for the local features is expressed as:
g = (floor(x/r), floor(y/r), floor(z/r)),
wherein r represents the resolution of the voxel and is used for judging whether a point falls within its voxel grid cell;
and a concatenation unit for concatenating the local features, the global features and the point-by-point features.
5. The three-dimensional point cloud classification device based on complementary multi-branch deep learning of claim 4, wherein the global features comprise:
in the combined spherical convolution, augmenting the raw data with the angle of each intersection between the grid normal and the ray;
constructing angle features;
by interpolation, making each grid cell contain a value indicating the distance between the sphere and the point;
and estimating the shape of the point cloud from the distance information.
6. The three-dimensional point cloud classification device based on complementary multi-branch deep learning of claim 4, wherein the point-by-point features comprise:
extracting point-wise features using shared MLP layers;
aggregating the point-wise features with a max-pooling layer.
7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method according to any one of claims 1-3 when executing the program.
8. A computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-3.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011426512.6A | 2020-12-09 | 2020-12-09 | Three-dimensional point cloud classification method based on complementary multi-branch deep learning |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN112541535A | 2021-03-23 |
| CN112541535B | 2024-01-05 |
Family
ID=75019583
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202011426512.6A (Active, granted as CN112541535B) | Three-dimensional point cloud classification method based on complementary multi-branch deep learning | 2020-12-09 | 2020-12-09 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN112541535B (en) |
Families Citing this family (1)

| Publication Number | Priority Date | Publication Date | Assignee | Title |
|---|---|---|---|---|
| CN113870272B | 2021-10-12 | 2024-10-08 | 中国联合网络通信集团有限公司 | Point cloud segmentation method and device and computer readable storage medium |
Citations (2)

| Publication Number | Priority Date | Publication Date | Assignee | Title |
|---|---|---|---|---|
| CN109118564A | 2018-08-01 | 2019-01-01 | 湖南拓视觉信息技术有限公司 | A three-dimensional point cloud labeling method and device based on voxel fusion |
| CN111489358A | 2020-03-18 | 2020-08-04 | 华中科技大学 | Three-dimensional point cloud semantic segmentation method based on deep learning |
Non-Patent Citations (2)

| Title |
|---|
| Shi Xiaosong et al., "Multi-source fusion point cloud ground-object classification method based on Point-Net", Laser & Optoelectronics Progress, Vol. 57, No. 8, pp. 1-9 |
| Dang Jisheng et al., "Recognition and segmentation of three-dimensional models with multi-feature fusion", Journal of Xidian University, Vol. 47, No. 4, pp. 149-157 |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |