1. Introduction
Point clouds obtained by laser scanning or stereo matching of images are used in more and more fields, including 3D reconstruction [1,2], simultaneous localization and mapping (SLAM) [3], autonomous driving [4], and virtual reality [5]. However, only part of a 3D object's point cloud can be obtained in a single scan due to the limited field of view of the scanning device. When a single acquired point cloud is insufficient, several point clouds of the 3D object must be combined to measure the entire object, which requires all partial point clouds to be aligned into a global coordinate frame [6]. Registration between point clouds is therefore a necessary operation before further use of point clouds, such as segmentation, recognition, classification, and retrieval.
Minimizing the distance between a source point cloud and a target point cloud is the aim of point cloud registration, which then brings the pair of point clouds into a global coordinate frame. Nevertheless, without an accurate initial orientation or position, fine transformation estimation may converge to a local minimum rather than the global optimum during the distance-minimization process. Therefore, a coarse transformation estimation should be carried out first to obtain correct initial transformation parameters, which comprise a rotation matrix and a translation vector with six degrees of freedom. Extracting geometric features and establishing correspondences between these features are two crucial tasks that must be completed for point clouds in arbitrary initial orientations or positions during coarse registration. However, these tasks may be hindered by significant obstacles, such as noise and outliers, various types of occlusion, varying densities, and low overlap.
In this paper, we propose a local descriptor based on a multi-scale covariance matrix for point cloud registration that is robust to noise and invariant to rigid transformation, and we also put forward an accurate transformation estimation method that estimates point cloud transformations at multiple levels. The contributions are summarized as follows.
(1) A multi-scale covariance matrix feature descriptor is proposed, which describes the geometrical relationship, eigenvalue variation gradient, and surface variation gradient between a keypoint and its neighboring points. The mean operation in the process of producing the covariance filters out most noise samples, making the descriptor very robust to noise.
(2) A method for detecting boundary points is put forward, which can efficiently extract points on the border. Using this approach, keypoints near the boundary can be eliminated, improving the descriptiveness of our proposed feature descriptor at the remaining keypoints.
(3) An accurate transformation estimation between a pair of point clouds is proposed. In this method, we obtain corresponding keypoint pairs by feature matching, estimate an initial transformation using a coarse transformation estimation method based on clustering via these correspondences, and finally obtain an accurate transformation by a fine transformation estimation, such as an ICP variant.
2. Related Work
2.1. Traditional Methods
To expand the viability and effectiveness of traditional point cloud registration, many studies have proposed methods to tackle these challenges, including coarse registration methods and fine registration methods.
Coarse registration rapidly estimates an initial matrix without knowing any initial positions between the source and target point clouds. The coarse registration approach is typically composed of four stages: keypoint detection, feature descriptor generation, feature matching, and transformation estimation. Among these, generating feature descriptors is unquestionably the most significant, and numerous feature descriptors, particularly local feature descriptors, have been presented for point cloud registration. Johnson and Hebert [7] presented the spin image descriptor. Each keypoint described by a spin image is insensitive to rigid transformation, and keypoint correspondences between a source and a target point cloud can be constructed using spin image correspondences. Frome et al. [8] suggested the 3D shape context (3DSC) descriptor. They use the normal at a keypoint as the local reference axis; the spherical neighborhood is then partitioned evenly along the azimuth and elevation dimensions but logarithmically along the radial dimension, and the weighted number of points that fall within each bin is used to construct the 3DSC descriptor. Rusu et al. [9] offered the Point Feature Histogram (PFH) descriptor. A Darboux frame is defined for each pair of points in the neighborhood of a keypoint; four measures based on the angles between the normals of the points and the distance vector between them are then computed and accumulated over all pairs of points to form a 16-bin histogram. Point clouds are finally registered and aligned into a consistent global point cloud using corresponding feature point pairs. Salti et al. [10] put forward the SHOT descriptor, which encodes histograms of surface normals at various spatial positions in the reference point neighborhood. Yulan Guo et al. [11] presented the RoPS descriptor, which employs a rotation and projection mechanism to encode local surfaces. Radu Bogdan Rusu et al. suggested Fast Point Feature Histograms (FPFHs) [12], which modify the PFH descriptor's mathematical expression and perform a rigorous analysis of its robustness and complexity for the problem of 3D registration of overlapping point cloud views. Yu Zou et al. [13] put forward BRoPH, which is generated directly from the point cloud by turning the description of the 3D point cloud into serial binarization of 2D image patches. Yuhe Zhang et al. [14] proposed KDD, which encodes the information of the whole 3D space around the feature point via kernel density estimation and provides a strategy for selecting different matching metrics for datasets with diverse resolution qualities.
Fine registration is a more precise and refined registration based on coarse registration, and its essential idea is to directly acquire the correspondence between a source point cloud and a target point cloud. The ICP (Iterative Closest Point) technique [15,16] is the most extensively used fine registration method. It is based on a quaternion approach and reduces point-to-point distances in overlapping areas between point clouds. However, ICP often needs adequate starting transformation settings, and its iteration process is lengthy. As a result, various ICP variants [17,18,19,20,21,22,23,24,25] have been developed to solve these issues.
2.2. Learning-Based Methods
A number of deep learning approaches, including PCRNet [26], REGTR [27], PREDATOR [28], and others, have recently been suggested for registration. Learning-based registration can be divided into two categories: feature learning-based methods and end-to-end learning-based approaches.
In contrast to conventional point cloud registration techniques, feature learning-based approaches learn a robust feature for correspondence search using a deep neural network. The transformation matrix is then computed via a one-step estimate (e.g., RANSAC) without the need for iterations. PREDATOR uses an attention mechanism to identify soft correspondences in the overlap of two point clouds and to extract contextual information for learning more distinctive feature descriptors. REGTR uses self-attention and cross-attention to directly predict a set of consistent final point correspondences. All these techniques aim to estimate robust correspondences using the learned distinctive features, and they all use deep learning as a tool for feature extraction. Gojcic et al. [29] propose 3DSmoothNet for point cloud registration, a comprehensive approach that matches 3D point clouds with a Siamese deep learning architecture with fully convolutional layers via a voxelized smoothed density value (SDV) representation. Xuyang Bai et al. [30] put forward PointDSC, a deep neural network that explicitly uses spatial consistency to prune outlier correspondences during point cloud registration.
End-to-end learning-based approaches use end-to-end neural networks to overcome the registration challenge. Two point clouds are the network's input, and the transformation matrix that aligns them is its output. The network not only extracts features of point clouds but also estimates the transformation; in contrast, the network of a feature learning-based approach is separate from transformation estimation and concentrates on feature learning. After extracting global features with PointNet, PCRNet concatenates these features and feeds them into an MLP network to regress the transformation. DeepGMR [31] uses a neural network to learn pose-invariant point-to-distribution parameter correspondences; its GMM optimization module then uses these correspondences to estimate the transformation matrix. DGR [32] proposes a six-dimensional convolutional network design for inlier likelihood prediction and uses a weighted Procrustes module to estimate the transformation.
However, deep learning methods commonly rely on 3D point clouds with labeled tags, and the creation of these labels is time consuming, making such methods difficult to use in some practical applications. Moreover, point cloud registration based on deep learning still has problems, such as inaccurate error solving, a certain randomness and instability of registration results, high algorithm complexity and long computation time, and limited generalization ability. Specifically, the issues with 3DSmoothNet include low efficiency in processing large-scale point clouds, strong hardware dependency, and limited generalization ability. The shortcomings of PointDSC include severe performance dependence on the initial matching, high computational complexity, etc.
3. Methodology
Our registration method includes three steps: keypoint detection with boundary removal, feature descriptor generation, and accurate transformation estimation. In keypoint detection with boundary removal, we detect keypoints that are not located near the boundary, allowing us to describe the complete neighborhood of each keypoint and generate a good feature descriptor for it. In feature descriptor generation, we use our multi-scale covariance matrix descriptor to extract local features of keypoints, making the description robust to noise and invariant to rigid transformation. During accurate transformation estimation, we use feature matching, coarse transformation estimation based on clustering, and fine transformation estimation based on an ICP variant to estimate an accurate transformation between a pair of point clouds. The framework of our point cloud registration is shown in Figure 1.
3.1. Keypoint Detection with Boundary Removal
We put forward a method for keypoint detection with boundary removal that first detects keypoints and boundary points separately and then removes keypoints close to boundary points. This ensures that the detected keypoints are of high quality and can be better matched with the keypoints on another point cloud.
3.1.1. Keypoint Detector
The goal of the keypoint detector is to choose a subset of conspicuous points with good repeatability from the surface of a point cloud. The discovered keypoints should be very robust to noise and highly invariant to rigid transformation.
Given a point cloud $P = \{p_i \in \mathbb{R}^3 \mid i = 1, \dots, n\}$, the mean $\bar{p}$ and covariance matrix $C$ of $P$ can be computed as follows:

$\bar{p} = \frac{1}{n}\sum_{i=1}^{n} p_i$ (1)

$C = \frac{1}{n}\sum_{i=1}^{n} (p_i - \bar{p})(p_i - \bar{p})^{T}$ (2)

then, applying eigenvalue decomposition on the covariance matrix $C$ yields the following result:

$C = V E V^{T}$ (3)

where $V$ is the matrix of eigenvectors, $E$ is the diagonal eigenvalue matrix of $C$, and each column in $V$ corresponds to a diagonal eigenvalue in the diagonal matrix $E$.

Following that, using Formula (4), point cloud $P$ is aligned with the principal axes established by $V$, which is known as Principal Component Analysis (PCA):

$P' = \{\, p'_i = V^{T}(p_i - \bar{p}) \mid i = 1, \dots, n \,\}$ (4)

where $P'$ is the normalized point cloud of point cloud $P$.
For each point $p'_i$ in point cloud $P'$, a local neighborhood Nbhd($p'_i$) is pruned from $P'$ using a sphere with a radius $r$ (the so-called support radius) centered at $p'_i$. Let $X$ and $Y$ stand for the x and y components of Nbhd($p'_i$) as follows:

$X = \{\, x_j \mid (x_j, y_j, z_j)^{T} \in \mathrm{Nbhd}(p'_i),\ j = 1, \dots, l \,\}$ (5)

$Y = \{\, y_j \mid (x_j, y_j, z_j)^{T} \in \mathrm{Nbhd}(p'_i),\ j = 1, \dots, l \,\}$ (6)

where $l$ is the length of Nbhd($p'_i$). Then, a surface variation index $\sigma_i$ is defined as follows:

$\sigma_i = \dfrac{\delta(X)}{\delta(Y)}$ (7)

where $\delta(\cdot)$ denotes the spread (standard deviation) along the corresponding axis. The local neighborhood's geometric variation around the point $p'_i$ is reflected by the surface variation index $\sigma_i$, which is the ratio between the local neighborhood's first two principal axes centered at $p'_i$. $\sigma_i$ is 1 for a symmetric local point set, such as a plane or sphere, and greater than 1 for an asymmetric local point set. A point in $P'$ whose surface variation index is higher than a threshold $\tau_{\sigma}$ is regarded as a keypoint, and the keypoint set $K$ is defined as follows:

$K = \{\, p'_i \in P' \mid \sigma_i > \tau_{\sigma} \,\}$ (8)
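For concreteness, the following Python/NumPy sketch implements the keypoint detector as reconstructed above. Our experiments are implemented on PCL, so Python, the SciPy KD-tree, the neighborhood-size guard, and the parameter defaults here are illustrative assumptions only.

```python
import numpy as np
from scipy.spatial import cKDTree

def detect_keypoints(P, r=0.05, tau_sigma=1.5):
    """Surface-variation keypoint detector (sketch).

    P: (n, 3) array of points. Returns indices of keypoints in P.
    """
    # PCA normalization: align the cloud with its principal axes (Formulas 1-4).
    mean = P.mean(axis=0)
    C = np.cov((P - mean).T)            # 3x3 covariance matrix
    _, V = np.linalg.eigh(C)            # eigenvalues in ascending order
    V = V[:, ::-1]                      # sort columns by descending eigenvalue
    P_norm = (P - mean) @ V             # row-wise p' = V^T (p - mean)

    tree = cKDTree(P_norm)
    keypoints = []
    for i, p in enumerate(P_norm):
        idx = tree.query_ball_point(p, r)   # spherical neighborhood Nbhd(p'_i)
        if len(idx) < 5:                    # guard for sparse neighborhoods
            continue
        nbhd = P_norm[idx]
        # Surface variation index: ratio of the spreads along the first two
        # principal axes (Formula 7, as reconstructed above).
        sx, sy = nbhd[:, 0].std(), nbhd[:, 1].std()
        if min(sx, sy) > 1e-12 and max(sx, sy) / min(sx, sy) > tau_sigma:
            keypoints.append(i)
    return np.array(keypoints)
```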
3.1.2. Boundary Point Detector
We project a point's neighboring points onto the point's tangent plane, as seen in Figure 2. It can be observed that the majority of a boundary point's surrounding points, when projected onto the tangent plane, lie on one side of the boundary point. There will, therefore, be a notch in the fan of lines connecting the projection points to the boundary point. One can ascertain whether a point is a boundary point by measuring the angle of the notch.
The following illustrates how to decide whether a point $p$ in a point cloud is a boundary point.

(1) Find the surrounding points centered at $p$ within a sphere with radius $r_b$ and designate them as Nbhd($p$).

(2) Project each point $q_j$ in Nbhd($p$) onto the tangent plane centered at $p$ with the following formula:

$q'_j = q_j - \big((q_j - p) \cdot n\big)\, n$ (9)

where $n$ is the normal of $p$, and designate these projection points as Nbhd'($p$).

(3) Find the nearest projection point to $p$, designate it as $q'_1$, and then construct a local reference frame $(u, v, w)$ in which axis $w$ is $p$'s normal $n$, axis $u$ is $(q'_1 - p)/\lVert q'_1 - p \rVert$, and axis $v$ is $w \times u$.

(4) Find the included angle $\alpha_j$ by calculating the angle between each vector $q'_j - p$ and the $u$ axis along a clockwise manner. Then, designate the set of the included angles as $A = \{\alpha_1, \alpha_2, \dots, \alpha_k\}$, where $k$ represents the number of all the included angles.

(5) Sort $A$ in ascending order and then compute the differences between adjacent included angles as follows:

$d_j = \alpha_{j+1} - \alpha_j, \quad j = 1, \dots, k-1$ (10)

(6) Find the maximum difference $d_{\max}$ and determine that $p$ is a boundary point if $d_{\max} > \tau_{b}$. $\tau_{b}$ is a threshold in this case, whose value is modifiable to determine whether a point is a boundary point or not.
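A compact sketch of the angle-gap test in steps (1)-(6) follows, under the same assumptions as the previous sketch (per-point normals are taken as given; the wrap-around gap between the last and first angle is included for completeness, and the symbol names follow the reconstruction above):

```python
import numpy as np
from scipy.spatial import cKDTree

def is_boundary_point(p, n, P, tree, r_b, tau_b=np.pi / 3):
    """Angle-gap boundary test (sketch). p: query point, n: its unit normal,
    P: (m, 3) cloud, tree: a cKDTree built on P."""
    nbhd = P[tree.query_ball_point(p, r_b)]
    nbhd = nbhd[np.linalg.norm(nbhd - p, axis=1) > 1e-12]   # drop p itself
    if len(nbhd) < 3:
        return True
    # Project neighbors onto the tangent plane at p (Formula 9).
    d = nbhd - p
    proj = d - np.outer(d @ n, n)
    # Local frame in the tangent plane: u toward the nearest projection, v = n x u.
    u = proj[np.argmin(np.linalg.norm(proj, axis=1))]
    u = u / np.linalg.norm(u)
    v = np.cross(n, u)
    # Included angles of all projections, sorted; differences as in Formula (10).
    ang = np.sort(np.arctan2(proj @ v, proj @ u) % (2 * np.pi))
    gaps = np.append(np.diff(ang), 2 * np.pi - ang[-1] + ang[0])  # wrap-around gap
    return gaps.max() > tau_b
```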
3.1.3. Boundary Keypoint Removal
The process of boundary keypoint removal is depicted as follows:

(1) Traverse all points in the point cloud and detect its boundary points into a set named $B$, following the above boundary point detector.

(2) Traverse all points in the point cloud and detect its entire set of keypoints into a set named $K$, following the above keypoint detector.

(3) For every keypoint $k_i$ in $K$, find its nearest distance $d_i$ to a boundary point in $B$:

$d_i = \min_{b \in B} \lVert k_i - b \rVert$ (11)

and remove $k_i$ from $K$ if $d_i < \tau_{d}$.
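The removal step then reduces to a nearest-neighbor query against the boundary set, sketched below (tau_d follows the notation reconstructed above; SciPy's KD-tree is an implementation choice of ours):

```python
from scipy.spatial import cKDTree

def remove_boundary_keypoints(keypoints, boundary_points, tau_d):
    """Drop keypoints whose nearest boundary point is closer than tau_d (sketch).

    keypoints, boundary_points: (k, 3) and (b, 3) NumPy arrays.
    """
    dists, _ = cKDTree(boundary_points).query(keypoints)   # nearest-boundary distance
    return keypoints[dists >= tau_d]
```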
3.2. Feature Descriptor Generation
We put forward a multi-scale covariance matrix descriptor that gathers the geometric relationship, surface variation gradient, and eigenvalue variation gradient between a keypoint and its neighborhood within several support radii and encodes this information into covariance matrices.
3.2.1. Covariance Matrix Descriptor
In probability theory and statistics, covariance is a measure of the joint variability of two random variables, and it is used as a descriptor in image processing. In the context of the covariance matrix descriptor, a set of random variables relates to a set of observable properties that may be extracted from a keypoint and its neighborhood, such as normal values, curvature values, 3D coordinates, and so on. As a result, the first step of generating a covariance matrix descriptor is to construct a variable selection function $F(p, r)$ for a given keypoint $p$ as shown below:

$F(p, r) = \{\, \phi(p, q_i) \mid \lVert q_i - p \rVert \le r \,\}$ (12)

where $q_i$ is a neighboring point of $p$, $\phi(p, q_i)$ is a random variable feature vector obtained between $p$ and each of its neighboring points, and $r$ is the support radius of the neighborhood of $p$.

To make the selected random variables in the feature vector robust to noise and invariant to rigid transformation, these random variables should be computed relative to the keypoint $p$. As a result, we define $\phi(p, q_i)$ as follows:

$\phi(p, q_i) = \big(\cos\theta_{1},\ \cos\theta_{2},\ \cos\theta_{3},\ \nabla\sigma_{i},\ \nabla\lambda_{1,i},\ \nabla\lambda_{2,i},\ \nabla\lambda_{3,i}\big)^{T}$ (13)

where $\cos\theta_{1}$ represents the cosine between the segment from $p$ to $q_i$ and the normal vector $n$ of $p$; $\cos\theta_{2}$ represents the cosine between the segment from $p$ to $q_i$ and the normal vector $n_i$ of $q_i$; $\cos\theta_{3}$ represents the cosine between the normal vector $n_i$ of $q_i$ and the normal vector $n$ of $p$; $\nabla\sigma_{i}$ represents the surface variation gradient between $p$ and $q_i$; and $\nabla\lambda_{1,i}$, $\nabla\lambda_{2,i}$, and $\nabla\lambda_{3,i}$ represent the gradients of eigenvalue variation of the two covariance matrices constructed from the two neighborhoods centered at $p$ and $q_i$, respectively. It should be noted that we utilize the cosine rather than the angle itself to save computation time. The calculation formulae are provided in (14) to (20), and the three geometric relations $\theta_{1}$, $\theta_{2}$, and $\theta_{3}$ are shown in Figure 3:

$\cos\theta_{1} = \dfrac{(q_i - p) \cdot n}{\lVert q_i - p \rVert}$ (14)

$\cos\theta_{2} = \dfrac{(q_i - p) \cdot n_i}{\lVert q_i - p \rVert}$ (15)

$\cos\theta_{3} = n \cdot n_i$ (16)

$\nabla\sigma_{i} = \dfrac{\sigma(q_i) - \sigma(p)}{\lVert q_i - p \rVert}$ (17)

$\nabla\lambda_{k,i} = \dfrac{\lambda_{k}(q_i) - \lambda_{k}(p)}{\lVert q_i - p \rVert}, \quad k = 1, 2, 3$ (18)-(20)

where $(\cdot)$ represents the dot product between two vectors and $\lVert \cdot \rVert$ represents the Euclidean distance between two points.
Following the definition of the variable selection function $F(p, r)$ for the keypoint $p$, the covariance matrix descriptor of a keypoint $p$ is defined as follows:

$C(p, r) = \dfrac{1}{N}\sum_{i=1}^{N} \big(\phi(p, q_i) - \mu\big)\big(\phi(p, q_i) - \mu\big)^{T}$ (21)

where $N$ represents the number of neighboring points of $p$ within its support radius $r$ and $\mu = \frac{1}{N}\sum_{i=1}^{N} \phi(p, q_i)$ is the mean of these vectors. Our proposed feature descriptor offers a more flexible representation since it considers 3D points as samples of a distribution and, by construction, subtracts the mean of this sample distribution; therefore, noise will be naturally attenuated.
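As an illustration, the following Python/NumPy sketch computes the descriptor of Formula (21) from the feature vector of Formula (13) as reconstructed above. Per-point normals, surface variation indices, and covariance eigenvalues are assumed to be precomputed, and all names are ours:

```python
import numpy as np

def covariance_descriptor(p, n, neighbors, normals, sigma, lam, sigma_p, lam_p):
    """Covariance matrix descriptor at keypoint p (sketch of Formula 21).

    p, n           : keypoint and its unit normal
    neighbors      : (N, 3) neighboring points within the support radius
    normals        : (N, 3) their unit normals
    sigma, lam     : (N,), (N, 3) surface variation index / eigenvalues per neighbor
    sigma_p, lam_p : scalar, (3,) the same quantities at p
    """
    d = neighbors - p
    dist = np.linalg.norm(d, axis=1)
    dist[dist < 1e-12] = 1e-12                        # avoid division by zero
    phi = np.column_stack([
        (d @ n) / dist,                               # cos(theta1), Formula (14)
        np.einsum('ij,ij->i', d, normals) / dist,     # cos(theta2), Formula (15)
        normals @ n,                                  # cos(theta3), Formula (16)
        (sigma - sigma_p) / dist,                     # surface variation gradient (17)
        (lam - lam_p) / dist[:, None],                # eigenvalue gradients (18)-(20)
    ])                                                # phi is (N, 7)
    mu = phi.mean(axis=0)                             # mean of the feature vectors
    diff = phi - mu
    return diff.T @ diff / len(phi)                   # 7x7 covariance descriptor
```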
3.2.2. Multi-Scale Covariance Matrix Descriptor
We adjust the value of the support radius $r$ to generate covariance descriptors at various scales for a keypoint. A keypoint's multi-scale covariance descriptor is defined as follows:

$MC(p) = \{\, C(p, r_s) \mid s = 1, \dots, S \,\}$ (22)

where $r_1, \dots, r_S$ are the chosen support radii. Compared to a covariance matrix descriptor with a fixed radius, a multi-scale covariance descriptor for a keypoint improves its description.
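Given a function that computes the single-scale descriptor of Formula (21) at a chosen radius, the multi-scale descriptor of Formula (22) is simply the collection over the selected radii; the defaults below mirror the 6 mr to 9 mr values used in the experiments of Section 4, with units left to the caller:

```python
def multiscale_descriptor(descriptor_at, radii=(6.0, 7.0, 8.0, 9.0)):
    """Multi-scale covariance descriptor (sketch of Formula 22):
    a list of single-scale descriptors, one per support radius."""
    return [descriptor_at(r) for r in radii]
```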
3.3. Accurate Transformation Estimation
An accurate transformation estimation includes feature matching, coarse transformation estimation, and fine transformation estimation. By feature matching, we find coarse correspondences between a pair of point clouds, and by coarse transformation estimation we obtain a coarse transformation matrix between the pair. After applying the coarse transformation matrix to the source point cloud, which is the transformable point cloud in the pair, we use fine transformation estimation to estimate a fine matrix between the transformed source point cloud and the target point cloud, which is the fixed point cloud in the original pair. Through these three processes, the transformation between a pair of point clouds becomes increasingly accurate.
3.3.1. Feature Matching
The goal of feature matching is to find feature correspondences between two sets of multi-scale covariance matrix descriptors and then find keypoint correspondences between the source point cloud and the target point cloud. To establish feature correspondences, we must determine the similarity of two multi-scale covariance matrix descriptors followed by the feature correspondences between two sets of multi-scale covariance matrix descriptors.
Similarity between two multi-scale covariance matrix descriptors.
The distance between two covariance matrix descriptors can be used to represent their similarity: the smaller the distance, the greater the similarity. Because the covariance matrix is a symmetric positive definite matrix, the geodesic distance between two covariance matrices can be used to measure their similarity.
In recent years, various geodesic distance formulas have been proposed. In this paper, the Jensen–Bregman LogDet Divergence [33] is employed, and it is defined as follows:

$J(C_1, C_2) = \ln\!\left(\det\!\left(\dfrac{C_1 + C_2}{2}\right)\right) - \dfrac{1}{2}\ln\big(\det(C_1 C_2)\big)$ (23)

where $\det(\cdot)$ is the matrix determinant and $\ln(\cdot)$ is the natural logarithm. This metric has been proven to be more efficient in terms of speed without sacrificing application performance [33], which is why we use it to compare the similarity of two covariance matrix descriptors in this paper.
Additionally, the similarity between two multi-scale covariance matrix descriptors can be defined using formula (24) as follows:

$D(MC_1, MC_2) = \dfrac{1}{S}\sum_{s=1}^{S} J\big(C_1(r_s), C_2(r_s)\big)$ (24)

where $C_1(r_s)$ and $C_2(r_s)$ correspond to two covariance matrix descriptors relating to a radius scale ranging from $r_1$ to $r_S$, and $J(C_1(r_s), C_2(r_s))$ represents the similarity of two covariance matrix descriptors with a fixed scale $r_s$. Formula (24) considers similarity under multiple scales and utilizes the mean of the per-scale similarities as the similarity of multi-scale covariance matrix descriptors. A covariance matrix descriptor stated in the rest of this paper refers to a multi-scale covariance matrix descriptor unless otherwise specified.
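A direct transcription of Formulas (23) and (24) as reconstructed above might look as follows; numpy.linalg.slogdet is used for numerical stability, an implementation choice of ours:

```python
import numpy as np

def jbld(C1, C2):
    """Jensen-Bregman LogDet divergence between two SPD matrices (Formula 23)."""
    s_mid = np.linalg.slogdet((C1 + C2) / 2)[1]   # ln det((C1 + C2) / 2)
    s1 = np.linalg.slogdet(C1)[1]
    s2 = np.linalg.slogdet(C2)[1]
    return s_mid - 0.5 * (s1 + s2)                # ln det(C1 C2) = ln det C1 + ln det C2

def multiscale_similarity(MC1, MC2):
    """Mean per-scale JBLD between two multi-scale descriptors (Formula 24)."""
    return np.mean([jbld(a, b) for a, b in zip(MC1, MC2)])
```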
Correspondence between two sets of covariance matrix descriptors.
We present a bidirectional nearest neighbor distance ratio (BNNDR) to detect correspondences from two sets of covariance matrix descriptors, which differs from existing techniques, such as the nearest neighbor (NN) and the nearest neighbor distance ratio (NNDR). The detailed matching procedure via the BNNDR is as follows.
First, assume that $F_S$ and $F_T$ are the sets of covariance matrix descriptors from a source point cloud $P_S$ and a target point cloud $P_T$, respectively.

Next, each covariance matrix descriptor $f_i^{S}$ in $F_S$ is matched against all the covariance matrix descriptors in $F_T$ to yield the first and second nearest covariance matrix descriptors to $f_i^{S}$, $f_{1st}^{T}$ and $f_{2nd}^{T}$, respectively:

$f_{1st}^{T} = \arg\min_{f \in F_T} D(f_i^{S}, f), \qquad f_{2nd}^{T} = \arg\min_{f \in F_T \setminus \{f_{1st}^{T}\}} D(f_i^{S}, f)$ (25)

where $F_T \setminus \{f_{1st}^{T}\}$ is the set $F_T$ excluding the covariance matrix descriptor $f_{1st}^{T}$, and then the NNDR $\rho$ is defined as follows:

$\rho = \dfrac{D(f_i^{S}, f_{1st}^{T})}{D(f_i^{S}, f_{2nd}^{T})}$ (26)

If the ratio $\rho$ is less than a threshold $\tau_{\rho}$, $(f_i^{S}, f_{1st}^{T})$ can be thought of as a candidate-matched feature pair, and $(k_i^{S}, k_{1st}^{T})$ can also be thought of as a candidate-matched keypoint pair, with keypoint $k_i^{S}$ corresponding to $f_i^{S}$ and keypoint $k_{1st}^{T}$ corresponding to $f_{1st}^{T}$.

Then, $f_{1st}^{T}$ is also matched against all the covariance matrix descriptors in $F_S$ to generate its corresponding feature descriptor $f_{1st}^{S}$ following the preceding procedure, and the matched keypoint pair $(k_{1st}^{T}, k_{1st}^{S})$ can be acquired from the corresponding covariance matrix descriptor pair.

Finally, $(f_i^{S}, f_{1st}^{T})$ is a matched feature pair if $k_{1st}^{S}$ is the same keypoint as $k_i^{S}$; otherwise, $(f_i^{S}, f_{1st}^{T})$ is an unmatched feature pair.
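The following sketch implements the BNNDR procedure described above, reusing the multiscale_similarity function from the previous sketch; the threshold name tau_rho and the application of the ratio test in both directions follow our reading of the procedure:

```python
import numpy as np

def bnndr_matches(desc_src, desc_tgt, tau_rho=0.9):
    """Bidirectional nearest neighbor distance ratio matching (sketch).

    desc_src, desc_tgt: lists of multi-scale descriptors.
    Returns a list of (source index, target index) matches.
    """
    def two_nearest(f, pool):
        d = np.array([multiscale_similarity(f, g) for g in pool])
        order = np.argsort(d)
        return order[0], d[order[0]], d[order[1]]

    matches = []
    for i, f in enumerate(desc_src):
        j, d1, d2 = two_nearest(f, desc_tgt)
        if d1 / d2 >= tau_rho:                    # NNDR test, source -> target
            continue
        i_back, b1, b2 = two_nearest(desc_tgt[j], desc_src)
        if b1 / b2 < tau_rho and i_back == i:     # back-match must return to i
            matches.append((i, j))
    return matches
```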
3.3.2. Coarse Transformation Estimation
So far, a correspondence set $\Omega = \{(k_i^{S}, k_i^{T})\}$ has been constructed between $P_S$ and $P_T$. The goal of coarse transformation estimation is to find an optimal rigid transformation $M_{coarse}$ from $\Omega$. The RANSAC algorithm and its variants [29,30] are popular approaches to this problem. However, they face challenges due to a lack of robustness and high time consumption. To estimate $M_{coarse}$, we put forward a coarse transformation estimation method based on clustering. The steps are outlined below in detail.
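As an illustration of the building block that clustering-based (and RANSAC-based) coarse estimators operate on, the sketch below computes the least-squares rigid transformation from a subset of correspondences via SVD (the Kabsch method). This is a generic component, not our full clustering procedure:

```python
import numpy as np

def rigid_from_correspondences(src, tgt):
    """Least-squares rigid transform (R, t) mapping src -> tgt via SVD (Kabsch).

    src, tgt: (m, 3) corresponding keypoints. Clustering- or RANSAC-style
    coarse estimators typically run this on subsets of the correspondence
    set and then select or vote among the candidate transforms.
    """
    mu_s, mu_t = src.mean(axis=0), tgt.mean(axis=0)
    H = (src - mu_s).T @ (tgt - mu_t)                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflections
    R = Vt.T @ D @ U.T
    t = mu_t - R @ mu_s
    return R, t
```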
3.3.3. Fine Transformation Estimation
Following the completion of the preceding two phases, we acquire an initial transformation $M_{coarse}$ between the source and target point clouds and transform the source point cloud using $M_{coarse}$ before fine registration. We apply a recent ICP variant [21] to complete the fine transformation estimation, which can accurately register the source point cloud to the target point cloud.
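As a usage illustration, the fine stage can be prototyped with an off-the-shelf point-to-point ICP, e.g., Open3D's implementation; this is a stand-in for illustration only, since our experiments use the specific ICP variant of [21]:

```python
import numpy as np
import open3d as o3d

def refine(src_points, tgt_points, M_coarse, max_dist=0.02):
    """Run ICP from the coarse transform; returns the 4x4 fine transform."""
    src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(src_points))
    tgt = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(tgt_points))
    result = o3d.pipelines.registration.registration_icp(
        src, tgt, max_dist, M_coarse,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation
```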
4. Experiments
The experiments compare the performance and registration effectiveness of our proposed multi-scale covariance matrix feature descriptor with state-of-the-art feature descriptors, with the goal of testing both our proposed feature descriptor and our registration approach.
4.1. Experiments of Keypoint Detection with Boundary Removal
The task of this section is to validate whether our keypoint detection with boundary removal can detect boundary points in a point cloud and obtain keypoints far away from the boundary.
4.1.1. Dataset
To conduct these experiments, we initially employed partial point clouds of Armadillo, Bunny, and Happy Buddha, among others, from the Stanford 3D Scanning Repository [35]. For Armadillo, we included 11 partial point clouds in our collection, numbered from ArmadilloBack_0 to ArmadilloBack_330 with 60-degree intervals between them. For Bunny, we provided six partial point clouds in our dataset, labeled Bun000, Bun045, Bun090, Bun180, Bun270, and Bun315. For Happy Buddha, we chose 15 partial point clouds, named HappyStandRight_0 to HappyStandRight_336 with 24-degree intervals between them. Samples of point clouds from our dataset are shown in Figure 4.
4.1.2. Results of Keypoint Detection with Boundary Removal
To determine whether a point is a boundary point, we choose neighboring points within a radius of 4 mr (the support radius) and assign different $\tau_{b}$ values. Figure 5 depicts the boundary points of the point clouds shown in Figure 4 under various $\tau_{b}$, with the red points representing boundary points and the green points representing the original point clouds in Figure 4. Figure 5 clearly shows that the number of boundary points is the minimum, and the boundary points are discrete, when $\tau_{b}$ is set to π. There are more boundary points when $\tau_{b}$ is between π/2 and π/4. When $\tau_{b}$ is set to π/3 or π/4, the boundary points overlap and create many clusters. In subsequent studies, we set $\tau_{b}$ to π/3, recognizing that too many boundary points reduce the time efficiency of finding keypoints and that too few boundary points reduce the effectiveness of keypoint detection.
After acquiring boundary points for a point cloud, we use our proposed keypoint detector to extract original keypoints and delete boundary keypoints. In Figure 6, columns (a) and (c) illustrate keypoints detected by our keypoint detector for the point clouds shown in Figure 4, with red points representing keypoints and green points representing original points. After retrieving the original keypoints, we eliminate any keypoints that are less than 5 mr from the nearest boundary point. Columns (b) and (d) in Figure 6 show the remaining keypoints after deleting the boundary keypoints from the original keypoints. Comparing columns (a) and (c) with columns (b) and (d) in Figure 6, it is obvious that our approach for keypoint detection with boundary removal yields keypoints on non-boundaries, which is critical for generating feature descriptors and matching point pairs between a pair of point clouds.
4.2. Experiments on the Performance of Descriptors
4.2.1. Dataset
We chose four source point clouds from the Stanford 3D Scanning Repository to evaluate the performance of our proposed feature descriptor. Our target point clouds are created by superimposing Gaussian noise on the source point clouds at three distinct levels, with standard deviations of 0.1 mr, 0.3 mr, and 0.5 mr.
4.2.2. Evaluation Metrics of Performance of Descriptors
Our proposed multi-scale covariance matrix descriptor includes only one free design choice: the random variables in the feature vector $\phi$. As a result, the descriptor's performance with various variables in the vector must be evaluated. In the meantime, a performance comparison of our proposed feature descriptor with state-of-the-art feature descriptors is required.
Our experimental findings are shown using the recall versus 1-precision curve (RP curve), a common metric in the literature for evaluating a descriptor [36]. It is created in the following manner. Given a target point cloud, a source point cloud, and a ground truth matrix, keypoints from the target and source point clouds are detected and described as target feature descriptors and source feature descriptors, respectively, using our proposed multi-scale covariance matrix descriptor algorithm. The number of target feature descriptors is taken as the total number of positives. To determine the closest feature descriptor, each target feature descriptor is compared against all source feature descriptors. If the distance between the target feature descriptor and the nearest source feature descriptor is less than a certain threshold, the pair is considered matched. Furthermore, a matched feature pair is regarded as a true positive if the distance between the physical locations of the two keypoints is small enough; otherwise, it is considered a false positive. The RP curve can be calculated using these numbers.
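For concreteness, one point of the RP curve can be computed as follows (a sketch; the array names and thresholds are illustrative):

```python
import numpy as np

def rp_point(dist_matrix, gt_dist, match_thresh, inlier_thresh):
    """One (1-precision, recall) point of the RP curve (sketch).

    dist_matrix: (T, S) descriptor distances, target x source.
    gt_dist:     (T, S) physical keypoint distances after ground-truth alignment.
    """
    nearest = dist_matrix.argmin(axis=1)                       # closest source per target
    rows = np.arange(len(nearest))
    matched = dist_matrix[rows, nearest] < match_thresh        # descriptor-distance test
    true_pos = matched & (gt_dist[rows, nearest] < inlier_thresh)
    total_positives = dist_matrix.shape[0]                     # number of target descriptors
    recall = true_pos.sum() / total_positives
    precision = true_pos.sum() / max(matched.sum(), 1)
    return 1 - precision, recall
```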
4.2.3. Comparison between Different Feature Vectors for Covariance Descriptor
Random variables in the feature vector play a significant part in the generation of our proposed covariance matrix descriptor, impacting not only the information encapsulated in the covariance matrix but also the length of the feature vector. The following feature vectors, concatenated from different random variables, are compared in this paper:

$v_1 = (\cos\theta_1, \cos\theta_2, \cos\theta_3)$

$v_2 = (\nabla\sigma, \nabla\lambda_1, \nabla\lambda_2, \nabla\lambda_3)$

$v_3 = (\cos\theta_1, \cos\theta_2, \cos\theta_3, \nabla\sigma, \nabla\lambda_1, \nabla\lambda_2, \nabla\lambda_3)$

$v_4$: $v_3$ extended with the coordinates $(x, y, z)$ of neighboring points

$v_5$: $v_4$ extended with the normal vectors of neighboring points

$v_3$ is the feature vector employed in the covariance matrix descriptor described in this paper (Formula (13)). The other feature vectors are generated by adding or removing random variables relative to $v_3$. The values of each variable in $v_1$, $v_2$, and $v_3$ are computed relative to the keypoint and its neighboring points, and these random variables exhibit rotation and translation invariance and noise robustness. The feature vector $v_4$ adds the x, y, and z coordinates of neighboring points, which are not invariant to rotation and translation. On the basis of $v_4$, the feature vector $v_5$ adds the normal vector information of neighboring points; these normal vectors lack rotation and translation invariance as well.
We find keypoints from a source point cloud using our proposed keypoint detection method with boundary removal in these experiments. Simultaneously, to ensure that the total number of positives is known, we must construct keypoints of the target point cloud that correspond to keypoints of the source point cloud. The corresponding keypoints of the target point cloud are produced as follows:
Transform the source point cloud’s keypoints using the ground truth matrix;
For every keypoint in transformed keypoints, find the closest point to it in the target point cloud and designate the point as a keypoint of the target point cloud.
In the process of generating the covariance matrix descriptor, the support radius is set to 6 mr, 7 mr, 8 mr, and 9 mr, respectively. The average recall and 1-precision are determined under the three Gaussian noise levels. Figure 7 depicts the experimental results, where Figure 7a–c represent RP curves at 0.1 mr, 0.3 mr, and 0.5 mr, respectively.
Figure 7 shows that under the three noise levels, the performance of the covariance matrix descriptor formed by the feature vectors $v_1$, $v_2$, and $v_3$ is better than that of the covariance matrix descriptor formed by the feature vectors $v_4$ and $v_5$, and the performance of the descriptors formed by $v_4$ and $v_5$ declines rapidly as the noise deviation increases. This is due to two factors. First, the random variables in $v_1$, $v_2$, and $v_3$ have rotation and translation invariance, but the random variables in $v_4$ and $v_5$ do not, since they include absolute spatial coordinates x, y, z, and so on. Second, the random variables in $v_1$, $v_2$, and $v_3$ encode the geometric structure, surface variation gradient, and eigenvalue variation gradient, which are all robust to noise, whereas $v_4$ and $v_5$ include normal vectors and spatial coordinates that are not robust to noise.
In the meantime, when the performance of the covariance matrix descriptors constructed from the feature vectors $v_1$ and $v_2$ is compared, it can be seen that the descriptor formed by $v_2$ performs better than the one formed by $v_1$, owing to the fact that the surface variation gradient and eigenvalue variation gradient are more stable and more robust to noise than the geometric structure.
Additionally, by comparing the performance of the covariance matrix descriptors constructed from the feature vectors $v_1$, $v_2$, and $v_3$, it can also be seen that the performance of the covariance matrix descriptor is enhanced by adding more random variables with strong robustness to noise and invariance to rotation and translation to the feature vector. Kaiser et al. [37] obtain a similar result and recommend that the number of random variables in the feature vector be limited to roughly 15. In this study, the length of the feature vector is 7. Despite the limited number of random variables, the reduced dimensionality of the covariance matrix descriptor increases time efficiency and minimizes space overhead. Finally, Figure 7 shows that the performance of the covariance matrix descriptor formed by $v_3$ drops only slightly as noise increases from 0.1 mr to 0.5 mr, confirming that the feature vector used in this paper has good noise robustness and rotation and translation invariance.
4.2.4. Comparison between Different Feature Descriptors
We compare the performance of our proposed covariance matrix descriptor to that of state-of-the-art feature descriptors, namely spin image [7], 3DSC [8], PFH [9], RoPS [11], FPFH [12], BRoPH [13], and KDD [14], which are regarded as classical feature descriptors in the literature for successful surface matching and are implemented in the Point Cloud Library (PCL) [38,39]. Any required parameter (support radius, bin size) for the compared feature descriptors is set according to the original recommendations or to values similar to those of our proposed feature descriptor, with the goal of providing the fairest comparison possible.
Figure 8 clearly shows that the spin image descriptor performs worse than the other descriptors under the three noise circumstances. The fundamental explanation is that the spin image descriptor is sensitive to non-uniform sampling [40], whereas the keypoints in this set of trials are not produced via uniform sampling. At the same time, as noise levels rise, PFH's performance falls only slowly. There are two reasons for this. First, PFH establishes a Darboux coordinate system and computes four features between pairs of neighboring points that are rotation and translation invariant and noise resistant. Second, PFH combines three of its four feature variables to build a histogram with 16 intervals, which further increases its robustness to noise.
Meanwhile, Figure 8 shows that the performance of KDD is better than that of our proposed feature descriptor in the case of 0.1 mr noise, but as the noise increases, particularly at 0.3 mr and 0.5 mr, our proposed feature descriptor clearly outperforms all the other feature descriptors. Our proposed feature descriptor is based on a covariance matrix, which is noise resistant for two reasons. First, the feature vectors that comprise the covariance matrix are noise robust and rotation and translation invariant. Second, the mean operation in the process of producing the covariance filters out most of the noise-damaged samples, directly improving the robustness of the feature descriptor to noise.
4.3. Experiments on Registration Effectiveness
4.3.1. Dataset
To verify the registration of point clouds using our proposed covariance matrix descriptor, we conduct experiments on partial point clouds from the Stanford 3D Scanning Repository, the SpaceTime dataset [10,37,38], and the Kinect dataset [10,37,38]. In the Stanford 3D Scanning Repository, we test registration effectiveness for eleven of Armadillo's partial point clouds, six of Bunny's partial point clouds, fifteen of Happy Buddha's partial point clouds, and fifteen of Dragon's partial point clouds. In the SpaceTime dataset, we test registration effectiveness between two models and fifteen scenes, where models and scenes are regarded as source and target point clouds, respectively. In the Kinect dataset, we test registration effectiveness between partial views of six models and fifteen scenes, where the partial views of the six models and the scenes are regarded as source and target point clouds, respectively. Some of the point clouds taken from these datasets are presented in Figure 9.
4.3.2. Evaluation Metrics of Registration Effectiveness
In [41], two metrics, the rotation error $e_r$ and the translation error $e_t$, are utilized for a quantitative evaluation of the registration's performance. $e_r$ represents the error between the estimated rotation matrix $R_e$ and the ground truth rotation matrix $R_g$; $e_t$ represents the error between the estimated translation vector $t_e$ and the ground truth translation vector $t_g$. They are defined as follows:

$e_r = \arccos\!\left(\dfrac{\mathrm{trace}(R_e R_g^{-1}) - 1}{2}\right)$ (34)

$e_t = \dfrac{\lVert t_e - t_g \rVert}{mr}$ (35)

where $R_g$ and $t_g$ are known in advance and the normalization unit $mr$ is 1 mr. It should be mentioned how $R_e$ and $t_e$ can be obtained. A coarse transformation matrix $M_{coarse}$ between a source point cloud and a target point cloud can be estimated following coarse registration, as presented in formula (36). A transformation matrix $M_k$ is produced during each iteration of the fine registration procedure; therefore, the iterative transformation matrices can be multiplied to generate a fine transformation matrix, as indicated by formula (37). Ultimately, the final transformation matrix $M_{final}$ is obtained by multiplying $M_{fine}$ by $M_{coarse}$, as expressed in formula (38):

$M_{coarse} = \begin{bmatrix} R_{coarse} & t_{coarse} \\ 0 & 1 \end{bmatrix}$ (36)

$M_{fine} = \prod_{k} M_k = \begin{bmatrix} R_{fine} & t_{fine} \\ 0 & 1 \end{bmatrix}$ (37)

$M_{final} = M_{fine}\, M_{coarse} = \begin{bmatrix} R_e & t_e \\ 0 & 1 \end{bmatrix}$ (38)

where $M_{coarse}$ includes a rotation matrix $R_{coarse}$ and a translation vector $t_{coarse}$, $R_{fine}$ and $t_{fine}$ stand for the rotation matrix and translation vector of $M_{fine}$, respectively, and $R_e$ and $t_e$ stand for the rotation matrix and translation vector of $M_{final}$, respectively.
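Under the reconstruction above, the two error metrics can be computed as follows (a sketch; the exact conventions of [41], such as degrees versus radians, may differ):

```python
import numpy as np

def registration_errors(M_final, R_g, t_g, mr=1.0):
    """Rotation error (radians) and mr-normalized translation error (sketch)."""
    R_e, t_e = M_final[:3, :3], M_final[:3, 3]
    cos_angle = (np.trace(R_e @ R_g.T) - 1) / 2    # R_g^-1 = R_g^T for rotations
    e_r = np.arccos(np.clip(cos_angle, -1.0, 1.0)) # clip guards rounding error
    e_t = np.linalg.norm(t_e - t_g) / mr
    return e_r, e_t
```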
4.3.3. Settings of Experimental Parameters
Table 1 lists all parameters used in our experiments. $\tau_{\sigma}$ is the threshold for the surface variation index; a point is defined as a keypoint when its surface variation index is greater than $\tau_{\sigma}$. $\tau_{b}$ is the threshold for the maximum difference between adjacent included angles; a point is defined as a boundary point when its maximum difference is greater than $\tau_{b}$. $\tau_{d}$ is the threshold for the nearest distance between a keypoint and a boundary point; a keypoint is regarded as a boundary keypoint and removed when its distance to its nearest boundary point is lower than $\tau_{d}$. $r$ is the support radius used to generate the multi-scale feature descriptor proposed in our paper. $\tau_{\rho}$ is the threshold for the distance ratio between the first-nearest and second-nearest descriptor pairs; a pair of descriptors is regarded as a matched pair when this distance ratio is less than $\tau_{\rho}$. $\kappa$ is the number of clusters in the coarse transformation estimation, and we set it to seven in this paper. $\tau_{in}$ is the threshold for determining whether a point is an inlier; a point in the source point cloud, transformed by a transformation matrix, is regarded as an inlier when its distance to its nearest point in the target point cloud is less than $\tau_{in}$.
4.3.4. Comparison between Registration Effectiveness of Transformation Estimation
Based on our registration framework and our proposed feature descriptor, we compare the registration effectiveness of our accurate transformation estimation with state-of-the-art transformation estimations, such as feature matching, clustering, ICP, RANSAC, etc., on the above datasets.
Table 2 shows the registration effectiveness of the various transformation estimations based on the error metrics in formulas (34) and (35). It clearly shows that our proposed transformation estimation outperforms the other transformation estimations, followed by feature matching + clustering and RANSAC, while transformation estimation based on ICP alone obtains the lowest registration effectiveness among all methods. The core reason is that ICP alone is given no initial transformation; when an initial transformation is first obtained by feature matching, the resulting transformation between a pair of point clouds improves registration effectiveness. At the same time, comparing our proposed transformation estimation with feature matching + clustering and feature matching + ICP shows that the three steps in our proposed transformation estimation gradually improve registration effectiveness. Transformation estimation based on feature matching yields not only a transformation of the point cloud but also a series of correspondences between source and target keypoints via source and target feature descriptors. These correspondences are applied to the coarse transformation estimation based on clustering, which helps obtain the initial transformation for point cloud registration. Finally, a fine transformation estimation (based on ICP) starting from the initial transformation further obtains a more optimal transformation for point cloud registration.
4.3.5. Comparison between Registration Effectiveness of Point Clouds
We also compare our feature descriptor to state-of-the-art feature descriptors using our registration framework. The average registration errors of point clouds in our dataset are shown in
Table 3. The table shows that the average registration error for point clouds in the Stanford 3D Scanning Repository via our proposed feature descriptor is moderate, trailing BRoPH and KDD, and the average registration error for point clouds in the SpaceTime dataset via our proposed feature descriptor is also moderate, trailing 3DSC, BRoPH, and KDD. However, in the Kinect dataset, pairwise registration based on our proposed feature descriptor obtains the least registration error, meaning it achieves the best registration effectiveness. The underlying reason is that the sampling quality, and hence the noise level, of the three datasets differs, and our proposed feature descriptor has greater robustness to noise. The point clouds in the 3D Scanning Repository are scanned with a Cyberware 3030 MS scanner, a swept-stripe laser triangulation range scanner with high sampling quality; the point clouds in the SpaceTime dataset are captured by a SpaceTime Stereo scanner with high sampling quality; and the Kinect point clouds are scanned by a low-cost Microsoft Kinect scanner with low sampling quality. The sampling quality of the scanner determines the noise level of the captured point clouds: higher sampling quality results in less noise, and vice versa. Point clouds in the 3D Scanning Repository and the SpaceTime dataset have lower noise, so feature descriptors such as 3DSC, BRoPH, and KDD are more advantageous there, and registration based on these descriptors is naturally more effective than registration based on our proposed feature descriptor. However, as the noise in the point clouds increases, the advantage of our proposed feature descriptor in point cloud registration emerges: the mean operation in the process of producing the covariance filters out most of the noise-damaged samples, making our feature descriptor very robust to noise and yielding the highest registration performance on the Kinect dataset.
4.3.6. Comparison between Registration Efficiency of Point Clouds
Based on our registration framework and our proposed feature descriptor, we compare the average registration time of our accurate transformation estimation with state-of-the-art transformation estimations, such as feature matching, clustering, ICP, RANSAC, etc., on the above three datasets.
Table 4 shows the average registration time for the various transformation estimations. Meanwhile, we compare the average registration time of our feature descriptor with state-of-the-art feature descriptors via our registration framework and our accurate transformation estimation; the average registration time of point clouds based on the various feature descriptors on the above three datasets is shown in
Table 5.
The results in
Table 4 show that the registration time of our proposed transformation estimation approach is only slightly higher than that of the feature matching-based and ICP-based estimation methods. Our transformation estimation takes somewhat longer than feature matching-based estimation because our method relies on feature matching to create the initial matched point pairs; although this phase takes some time, it lays the groundwork for more precise transformation estimates in the next stage. On the other hand, although the ICP-based transformation estimation method outperforms our method in registration time, as shown in
Table 2, it has significant errors in estimating the transformation matrix; ICP thus has an advantage in time efficiency but a shortcoming in accuracy. In addition, it can be observed in
Table 4 that when feature matching-based transformation methods are combined with ICP or clustering methods, their computation time increases, resulting in lower time efficiency than our method, presumably because these combined methods require more computing resources and steps when processing complex data. Finally, the transformation approaches based on clustering/RANSAC alone require a significant number of iterative operations, which takes longer than our method. Together with the findings in
Table 2, we can infer that our transformation estimation approach maintains high accuracy while being time efficient.
According to
Table 5, the registration approach based on our proposed feature descriptor takes just slightly longer than the registration approaches based on PFH/FPFH. The registration approach based on PFH has the shortest registration time, which might be owing to the relatively easy and quick calculation of PFH descriptors. In contrast, registration methods based on 3DSC have the longest registration times, typically due to higher descriptor dimensions, which make the generation and matching processes more complex and time consuming. It is therefore clear that the descriptor dimension is one of the most important factors influencing registration time: generally speaking, the larger the dimension of a descriptor, the more processing and storage resources it demands, resulting in longer creation and matching times. Combined with the results shown in
Table 3, it is clear that our feature descriptor achieves a relatively rapid registration time while being low dimensional, resulting in a good balance of efficiency and accuracy.
5. Conclusions
In this paper, we propose a feature descriptor based on a multi-scale covariance matrix that describes the geometrical relationship, the eigenvalue variation gradient, and the surface variation gradient, and we use it to extract local features of keypoints detected by our keypoint detection method. We then present an accurate transformation estimation method that estimates point cloud transformations at multiple levels to obtain the optimal transformation between a pair of point clouds. Experimental findings reveal that our proposed feature descriptor is more resistant to noise than PFH, 3DSC, spin image, etc.; that our accurate transformation estimation outperforms state-of-the-art transformation estimations such as feature matching, clustering, ICP, RANSAC, etc.; and that our registration is highly effective on the noisy Kinect dataset. Of course, the approach we propose has certain limitations, which are as follows.
(1) Too many parameters and thresholds must be supplied in our proposed approach. For example, $r$ (the support radius used to detect keypoints), $\tau_{\sigma}$ (the threshold for the surface variation index), $\tau_{b}$ (the threshold for the maximum difference between adjacent included angles), $\tau_{d}$ (the threshold for the nearest distance between a keypoint and a boundary point), and so on, must all be provided. Setting these parameter values/thresholds is a hard and time-consuming operation, and the quality of the settings influences the registration outcomes. As a result, the next phase of work should concentrate on research in this area, such as establishing standards for acquiring the various parameter values/thresholds and adaptively choosing parameter values/thresholds for 3D point cloud registration.
(2) The rigid point cloud registration approach assumes that objects do not deform during the registration process, while non-rigid point cloud registration involves objects that do deform, that is, objects may change shape or size during the registration process due to factors such as pressure, temperature, etc. Therefore, the method used in this paper cannot be directly applied to non-rigid registration because of its inherent assumptions and limitations. However, if a non-rigid body is decomposed into multiple small rigid bodies, non-rigid registration can be defined as a non-uniform Euclidean transformation between multiple small rigid bodies, and the approach proposed in this paper then has practical applications. Future work therefore needs to research how to decompose non-rigid point cloud registration into multiple local rigid point cloud registrations and apply the method proposed in this paper to them.
(3) Our proposed point cloud registration approach should be extended to the registration of indoor and outdoor scenes. At present, the point clouds used in this paper only involve single, small objects and do not involve large point clouds, such as indoor and outdoor scenes. Compared with a single small object, the point clouds involved in indoor and outdoor scenes grow exponentially, and the computational complexity increases accordingly. These will be challenges for the point cloud registration approach proposed in this paper.