
CN114519772A - Three-dimensional reconstruction method and system based on sparse point cloud and cost aggregation - Google Patents

Three-dimensional reconstruction method and system based on sparse point cloud and cost aggregation Download PDF

Info

Publication number
CN114519772A
Authority
CN
China
Prior art keywords
cost
sparse point
feature
depth
point cloud
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210090256.0A
Other languages
Chinese (zh)
Inventor
陶文兵
齐雨航
刘李漫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Tuke Intelligent Technology Co ltd
Original Assignee
Wuhan Tuke Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Tuke Intelligent Technology Co ltd filed Critical Wuhan Tuke Intelligent Technology Co ltd
Priority to CN202210090256.0A priority Critical patent/CN114519772A/en
Publication of CN114519772A publication Critical patent/CN114519772A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T7/529 Depth or shape recovery from texture
    • G06T7/55 Depth or shape recovery from multiple images
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G06T2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a three-dimensional reconstruction method and system based on sparse point cloud and cost aggregation. The method comprises: acquiring multi-view images and the corresponding sparse point clouds, and preprocessing the sparse point clouds to obtain depth maps under multiple views; extracting features from the multi-view images, constructing one or more cost volumes, and modulating and regularizing each cost volume using the sparse point clouds to obtain multiple probability volumes; and recovering a depth map from each probability volume and fusing it with the filtered depth maps of the multiple views to obtain a reconstructed point cloud model. Through a sparse-point-guided strategy, the constructed cost volume is modulated with the sparse prior, and regularization and related means improve the accuracy of the cost volume when estimating the depth of weakly textured regions and fine structures, thereby improving the reconstruction quality of the three-dimensional point cloud model.

Description

A three-dimensional reconstruction method and system based on sparse point cloud and cost aggregation

Technical Field

The invention belongs to the technical field of computer vision, and in particular relates to a three-dimensional reconstruction method and system based on sparse point cloud and cost aggregation.

Background

Image-based 3D reconstruction, which aims to recover 3D geometry from multiple input images, is an important and challenging problem in the field of computer vision. Compared with active 3D reconstruction based on lidar, image-based 3D reconstruction has the advantages of low cost and strong versatility.

Traditional multi-view 3D reconstruction methods perform cross-view similarity search based on hand-crafted features and can achieve good reconstruction results in ideal Lambertian scenes. In weakly textured regions and regions with specular reflection, however, image features are difficult to extract, so the reconstruction results are unsatisfactory. In recent years, deep neural networks have been widely applied in computer vision. Deep learning methods rely on large amounts of labeled data and automatically learn features of the input images through deep neural networks. Compared with traditional methods, the features extracted by deep neural networks contain more semantic information.

In 2020, Xu and Tao of Huazhong University of Science and Technology replaced the variance-based cost metric with Average Group-wise Correlation, which reduced GPU memory overhead without degrading the reconstruction quality of the model. They also modeled multi-view depth estimation as an inverse depth regression problem, which makes the model perform better in scenes with a large depth range.

Although deep-learning-based methods have made great progress, they have not fully exploited the results of sparse reconstruction: only the camera pose information is used, while the sparse point cloud information is ignored or under-utilized.

Summary of the Invention

In order to make full use of sparse point cloud information and improve the accuracy of depth estimation, thereby improving the reconstruction quality of the 3D point cloud model, and in particular to address the difficulty of extracting image features in weakly textured regions and regions with specular reflection, a first aspect of the present invention provides a 3D reconstruction method based on sparse point cloud and cost aggregation, comprising: acquiring multi-view images and the corresponding sparse point clouds, and preprocessing the sparse point clouds to obtain depth maps under multiple views; performing feature extraction on the multi-view images and constructing one or more cost volumes, and modulating and regularizing each cost volume with the sparse point clouds to obtain multiple probability volumes; and recovering a depth map from each probability volume and fusing it with the filtered depth maps of the multiple views to obtain a reconstructed point cloud model.

In some embodiments of the present invention, performing feature extraction on the multi-view images and constructing one or more cost volumes comprises: using a convolutional neural network to extract features from each image of the multi-view images to obtain multiple feature maps; selecting one of the feature maps as the reference feature map and the remaining feature maps as source feature maps, and computing the feature volume of each source feature map with respect to the reference feature map to obtain the feature volumes of the multiple views; and aggregating the feature volumes of the multiple views into a cost volume.

Further, the aggregation of the feature volumes of the multiple views into a cost volume is implemented as follows:

C = M(V1, V2, …, VN) = (1/N) · Σ_{i=1}^{N} (Vi − V̄)²,

where C denotes the cost volume, M denotes the element-wise variance computation, Vi denotes the i-th feature volume, N denotes the total number of feature volumes, and V̄ denotes the mean of all feature volumes.

In some embodiments of the present invention, modulating and regularizing each cost volume with the multiple sparse point clouds to obtain multiple probability volumes comprises: constructing a Gaussian modulation function based on the depth maps under the multiple views; modulating each cost volume according to the Gaussian modulation function; and regularizing each cost volume with a 3D segmentation network to obtain the filtered probability volume.

Further, the regularization is implemented as follows:

C̃(v0) = Σ_{k} ωk · C(v0 + vk + Δvk),

where C(v0) is the cost of voxel v0 in each cost volume and C̃(v0) is the cost of voxel v0 after the regularization operation; ωk is the weight at the k-th sampling position, vk is the fixed offset within the convolution receptive field, and Δvk is the offset learned during adaptive cost aggregation.

In the above embodiments, preprocessing the multiple sparse point clouds to obtain depth maps under multiple views comprises: acquiring the 3D points corresponding to all key points in each view and filtering out the invisible 3D points; and for the filtered 3D points, obtaining the depth value of each 3D point in the image coordinate system of that view through projection and coordinate transformation according to the camera extrinsics of the current view.

A second aspect of the present invention provides a 3D reconstruction system based on sparse point cloud and cost aggregation, comprising: an acquisition module for acquiring multi-view images and the corresponding sparse point clouds, and preprocessing the sparse point clouds to obtain depth maps under multiple views; a construction module for performing feature extraction on the multi-view images, constructing one or more cost volumes, and modulating and regularizing each cost volume with the sparse point clouds to obtain multiple probability volumes; and a reconstruction module for recovering a depth map from each probability volume and fusing it with the filtered depth maps of the multiple views to obtain a reconstructed point cloud model.

Further, the construction module comprises: an extraction unit for extracting features from each image of the multi-view images with a convolutional neural network to obtain multiple feature maps; a computation unit for selecting one of the feature maps as the reference feature map and the remaining feature maps as source feature maps, and computing the feature volume of each source feature map with respect to the reference feature map to obtain the feature volumes of the multiple views; and an aggregation unit for aggregating the feature volumes of the multiple views into a cost volume.

A third aspect of the present invention provides an electronic device, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the 3D reconstruction method based on sparse point cloud and cost aggregation provided in the first aspect of the present invention.

A fourth aspect of the present invention provides a computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the 3D reconstruction method based on sparse point cloud and cost aggregation provided in the first aspect of the present invention.

The beneficial effects of the present invention are:

1. The present invention makes full use of the result of sparse reconstruction and integrates the sparse point cloud obtained by sparse reconstruction into the cost volume as prior information. The sparse depth map obtained by projecting the sparse point cloud serves as a geometric prior of the scene; by enhancing depth hypotheses near the sparse prior and suppressing depth hypotheses far from the prior depth value, the accuracy of depth estimation is improved, especially at fine structures and depth discontinuities.

2. An adaptive cost aggregation module is introduced in the cost volume regularization stage, giving the network model scene-awareness: it adaptively learns offsets in a data-driven manner and obtains more accurate depth estimates in weakly textured regions and at object boundaries.

Brief Description of the Drawings

FIG. 1 is a schematic flowchart of the basic flow of the 3D reconstruction method based on sparse point cloud and cost aggregation in some embodiments of the present invention;

FIG. 2 is a detailed flowchart of the 3D reconstruction method based on sparse point cloud and cost aggregation in some embodiments of the present invention;

FIG. 3 is a schematic structural diagram of the convolutional neural network used for image feature extraction in some embodiments of the present invention;

FIG. 4 is a schematic diagram of sparse point cloud data preprocessing in some embodiments of the present invention;

FIG. 5 is a schematic diagram of the working principle of the Gaussian modulation function in some embodiments of the present invention;

FIG. 6 is a schematic diagram of the 3D U-Net network structure used for adaptive cost aggregation in some embodiments of the present invention;

FIG. 7 is a schematic structural diagram of the 3D reconstruction system based on sparse point cloud and cost aggregation in some embodiments of the present invention;

FIG. 8 is a schematic structural diagram of the electronic device in some embodiments of the present invention.

Detailed Description of the Embodiments

The principles and features of the present invention are described below with reference to the accompanying drawings. The examples are given only to explain the present invention and are not intended to limit its scope.

Referring to FIG. 1 and FIG. 2, a first aspect of the present invention provides a 3D reconstruction method based on sparse point cloud and cost aggregation, comprising: S100. acquiring multi-view images and the corresponding sparse point clouds, and preprocessing the sparse point clouds to obtain depth maps under multiple views; S200. performing feature extraction on the multi-view images, constructing one or more cost volumes, and modulating and regularizing each cost volume with the sparse point clouds to obtain multiple probability volumes; S300. recovering a depth map from each probability volume and fusing it with the filtered depth maps of the multiple views to obtain a reconstructed point cloud model.

It can be understood that although deep learning methods have made great progress, they have not fully exploited the results of sparse reconstruction: only the camera pose information is used, while the sparse point cloud information is ignored or under-utilized. The above method overcomes this problem by making full use of the sparse point cloud information, thereby improving the accuracy of the 3D reconstruction model. Multi-view images refer to multiple images captured from different viewpoints (viewing angles) of the same scene, i.e., the multiple images originate from the same scene.

In step S200 of some embodiments of the present invention, performing feature extraction on the multi-view images and constructing one or more cost volumes comprises: S201. using a convolutional neural network to extract features from each image of the multi-view images to obtain multiple feature maps; S202. selecting one of the feature maps as the reference feature map and the remaining feature maps as source feature maps, and computing the feature volume of each source feature map with respect to the reference feature map to obtain the feature volumes of the multiple views; S203. aggregating the feature volumes of the multiple views into a cost volume.

Specifically, in step S201, the convolutional neural network shown in FIG. 3 performs feature extraction on the input images {Ii}, i = 1, …, N, to obtain the feature maps {Fi}, i = 1, …, N, corresponding to the N images. The convolutional neural network contains 8 convolutional layers; except for the last layer, each convolution is followed by a BN layer and a ReLU activation function. The image feature extraction module thus maps each 3×H×W input image to a feature map with C channels, where H and W are the height and width of the input image and C is the channel dimension of the feature map. The extracted feature maps are used for the subsequent cost volume construction.
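
By way of illustration, the following is a minimal PyTorch sketch of an eight-layer feature extraction network of the kind described above; the channel widths, strides and downsampling factor are assumptions, since only the layer count and the BN/ReLU pattern are specified here.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Eight convolutional layers; every layer except the last is followed by BN + ReLU."""
    def __init__(self, out_channels=32):
        super().__init__()
        def conv_bn_relu(cin, cout, stride=1):
            return nn.Sequential(
                nn.Conv2d(cin, cout, 3, stride=stride, padding=1, bias=False),
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
            )
        self.net = nn.Sequential(
            conv_bn_relu(3, 8),
            conv_bn_relu(8, 8),
            conv_bn_relu(8, 16, stride=2),                # assumed downsampling
            conv_bn_relu(16, 16),
            conv_bn_relu(16, 16),
            conv_bn_relu(16, 32, stride=2),               # assumed downsampling
            conv_bn_relu(32, 32),
            nn.Conv2d(32, out_channels, 3, padding=1),    # last layer: no BN / ReLU
        )

    def forward(self, img):          # img: (B, 3, H, W)
        return self.net(img)         # feature map with C = out_channels channels
```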

Next, in step S202, F1 is the feature map extracted from the reference image for which depth estimation is required, {Fi}, i = 2, …, N, are the feature maps of the source images to be matched with the reference image, and {Ki, Ri, ti}, i = 1, …, N, are the camera intrinsic matrices, rotation matrices and translation vectors of each view obtained from the sparse reconstruction. Taking the reference image feature F1 as the reference, for a pixel p on the reference feature and a depth hypothesis dj, the corresponding pixel pi,j on the source image feature Fi is computed as:

pi,j = Ki · (Ri · R1ᵀ · (dj · K1⁻¹ · p̃ − t1) + ti),

where p̃ is the homogeneous coordinate of p, dj is the j-th depth hypothesis with j ∈ {1, 2, …, Nd}, Nd is the number of depth hypotheses, and the subscript 1 (ref) denotes the reference view. The feature map of each image is warped by this projective transformation into the corresponding feature volume {Vi}, i = 1, …, N. In order to flexibly handle an arbitrary number of input views, the present invention uses a variance-based metric to aggregate the multi-view feature volumes {Vi} into the cost volume C:

C = M(V1, V2, …, VN) = (1/N) · Σ_{i=1}^{N} (Vi − V̄)²,

where V̄ is the mean of all feature volumes, M is the element-wise variance computation, Vi denotes the i-th feature volume, and N denotes the total number of feature volumes.
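
The warping and aggregation of steps S202 and S203 can be sketched as follows in PyTorch. The extrinsics are taken as world-to-camera transforms, consistent with Pcam = R · Pworld + t used later in the text; the bilinear sampling and the single-image (unbatched) tensor layout are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def build_cost_volume(feats, Ks, Rs, ts, depth_hyps):
    """feats: list of (C, H, W) feature maps, feats[0] is the reference view.
       Ks, Rs, ts: per-view intrinsics (3,3), rotations (3,3), translations (3,)
       (world-to-camera, i.e. P_cam = R * P_world + t).
       depth_hyps: (Nd,) depth hypothesis values d_j for the reference view."""
    C, H, W = feats[0].shape
    Nd = depth_hyps.shape[0]
    device = feats[0].device
    # pixel grid of the reference view in homogeneous coordinates
    y, x = torch.meshgrid(torch.arange(H, device=device, dtype=torch.float32),
                          torch.arange(W, device=device, dtype=torch.float32), indexing="ij")
    pix = torch.stack([x, y, torch.ones_like(x)], dim=0).reshape(3, -1)          # (3, H*W)

    volumes = [feats[0].unsqueeze(1).expand(C, Nd, H, W)]                          # reference feature volume
    K0_inv, R0, t0 = torch.inverse(Ks[0]), Rs[0], ts[0]
    for i in range(1, len(feats)):
        # back-project reference pixels at each depth hypothesis and re-project into view i
        cam0 = K0_inv @ pix                                                        # rays in the reference camera
        pts = cam0.unsqueeze(0) * depth_hyps.view(-1, 1, 1)                        # (Nd, 3, H*W)
        world = R0.t() @ (pts - t0.view(1, 3, 1))                                  # P_world = R^T (P_cam - t)
        cam_i = Rs[i] @ world + ts[i].view(1, 3, 1)
        proj = Ks[i] @ cam_i
        uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)                            # (Nd, 2, H*W)
        gx = uv[:, 0] / (W - 1) * 2 - 1                                            # normalize for grid_sample
        gy = uv[:, 1] / (H - 1) * 2 - 1
        grid = torch.stack([gx, gy], dim=-1).view(Nd, H, W, 2)
        warped = F.grid_sample(feats[i].unsqueeze(0).expand(Nd, C, H, W), grid,
                               align_corners=True)                                 # (Nd, C, H, W)
        volumes.append(warped.permute(1, 0, 2, 3))                                 # (C, Nd, H, W)
    stack = torch.stack(volumes, dim=0)                                            # (N, C, Nd, H, W)
    cost = ((stack - stack.mean(dim=0, keepdim=True)) ** 2).mean(dim=0)            # variance metric
    return cost                                                                     # (C, Nd, H, W)
```

In practice the warping is done for a whole batch at once and projections that fall outside the source image are masked out; those details are omitted for brevity.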

In step S200 of some embodiments of the present invention, modulating and regularizing each cost volume with the multiple sparse point clouds to obtain multiple probability volumes comprises: S204. constructing a Gaussian modulation function based on the depth maps under the multiple views; S205. modulating each cost volume according to the Gaussian modulation function; S206. regularizing each cost volume with a 3D segmentation network to obtain the filtered probability volume.

Referring to FIG. 5, schematically, step S204 comprises: constructing a Gaussian modulation function centered at the sparse depth value d′, with the depth hypothesis d as the independent variable. The Gaussian modulation function is used for feature enhancement of the cost volume; specifically, it enhances the response of depth hypotheses near d = d′ and suppresses depth hypotheses far from d′. The Gaussian modulation function is:

G(d) = k · exp(−(d − d′)² / (2c²)),

where d is the depth hypothesis value, d′ is the sparse depth value, k is the amplitude of the Gaussian function, and c is the bandwidth. Since a variance-based cost metric is used, a smaller cost means a higher confidence for the corresponding depth hypothesis, so the modulation is applied in inverted form: for every pixel in Ωsparse, the set of pixels for which a prior depth exists, the cost of depth hypotheses near d′ is kept low while the cost of hypotheses far from d′ is increased; pixels without a prior depth are left unmodulated.
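
Because the rewritten modulation formula is only characterized qualitatively above, the sketch below uses an assumed inverted-Gaussian weighting: the cost of hypotheses near the sparse prior d′ is left essentially unchanged while the cost of hypotheses far from d′ is inflated, and pixels outside Ωsparse are not modified. The default bandwidth c = 2Δd matches the parameter setting given at the end of the description.

```python
import torch

def modulate_cost_volume(cost, sparse_depth, depth_hyps, k=10.0, c=None):
    """cost: (Nd, H, W) variance-based cost volume for one reference view.
       sparse_depth: (H, W) depth projected from the sparse point cloud, 0 where no prior exists.
       depth_hyps: (Nd,) hypothesis values d_j.
       k, c: amplitude and bandwidth of the Gaussian (c defaults to twice the hypothesis spacing)."""
    if c is None:
        c = 2.0 * (depth_hyps[1] - depth_hyps[0])
    d = depth_hyps.view(-1, 1, 1)                                                 # (Nd, 1, 1)
    gauss = torch.exp(-(d - sparse_depth.unsqueeze(0)) ** 2 / (2 * c ** 2))       # peaks at d = d'
    # inverted modulation (assumed form): keep the cost near d', inflate it away from d'
    weight = 1.0 + k * (1.0 - gauss)
    has_prior = (sparse_depth > 0).unsqueeze(0)                                   # Omega_sparse mask
    return torch.where(has_prior, cost * weight, cost)
```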

Referring to FIG. 6, schematically, the regularization step comprises: using a 3D U-Net network to perform the regularization operation on the cost volume to obtain the filtered probability volume. The present invention introduces an adaptive cost aggregation module in the cost volume regularization step; this module adaptively learns offsets so that more accurate reconstruction is obtained at depth discontinuities. Optionally, variants of the U-Net family or other 3D image segmentation networks such as SegNet can be used to regularize the cost volume.

Specifically, in step S206 the probability volume is normalized: a 3D convolution converts the C-channel cost volume into a single-channel volume, and a soft-max normalization is applied along the depth direction to obtain the probability volume.
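
A small sketch of this normalization step, assuming a (B, C, Nd, H, W) layout for the regularized cost volume; the kernel size of the channel-collapsing 3D convolution is an assumption.

```python
import torch
import torch.nn as nn

class ProbabilityVolume(nn.Module):
    """Collapse the C-channel (regularized) cost volume to one channel and
       soft-max normalize it along the depth-hypothesis direction."""
    def __init__(self, in_channels):
        super().__init__()
        self.to_single = nn.Conv3d(in_channels, 1, kernel_size=3, padding=1)

    def forward(self, cost):                        # cost: (B, C, Nd, H, W)
        score = self.to_single(cost).squeeze(1)     # (B, Nd, H, W)
        return torch.softmax(score, dim=1)          # normalize along the depth direction
```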

In step S206, the regularization is implemented as follows:

C̃(v0) = Σ_{k} ωk · C(v0 + vk + Δvk),

where C(v0) is the cost of voxel v0 in each cost volume and C̃(v0) is the cost of voxel v0 after the regularization operation; ωk is the weight at the k-th sampling position, vk is the fixed offset within the convolution receptive field, and Δvk is the offset learned during adaptive cost aggregation.
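
The adaptive aggregation formula can be sketched as a deformable 3D sampling step. In the simplified PyTorch version below, the offsets Δvk are predicted by an auxiliary convolution and the weights ωk are shared learnable scalars; both choices, as well as the trilinear sampling via grid_sample, are assumptions made for illustration rather than the exact module of the invention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveCostAggregation3D(nn.Module):
    """Sketch of C~(v0) = sum_k w_k * C(v0 + v_k + dv_k) with learned offsets dv_k."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.k = kernel_size ** 3                                   # number of sampling positions K
        # predicts a 3D offset (dz, dy, dx) for each of the K positions at every voxel
        self.offset_conv = nn.Conv3d(channels, 3 * self.k, kernel_size=3, padding=1)
        # fixed grid offsets v_k of a regular kernel_size^3 receptive field
        r = kernel_size // 2
        zz, yy, xx = torch.meshgrid(torch.arange(-r, r + 1), torch.arange(-r, r + 1),
                                    torch.arange(-r, r + 1), indexing="ij")
        self.register_buffer("base_offsets", torch.stack([zz, yy, xx], dim=-1).reshape(-1, 3).float())
        # per-position aggregation weights w_k (shared across voxels in this sketch)
        self.weights = nn.Parameter(torch.full((self.k,), 1.0 / self.k))

    def forward(self, cost):                                        # cost: (B, C, D, H, W)
        B, C, D, H, W = cost.shape
        offsets = self.offset_conv(cost).view(B, self.k, 3, D, H, W)   # learned dv_k
        zs = torch.linspace(-1, 1, D, device=cost.device)
        ys = torch.linspace(-1, 1, H, device=cost.device)
        xs = torch.linspace(-1, 1, W, device=cost.device)
        gz, gy, gx = torch.meshgrid(zs, ys, xs, indexing="ij")
        base = torch.stack([gx, gy, gz], dim=-1)                        # (D, H, W, 3) in (x, y, z) order
        scale = torch.tensor([2.0 / (W - 1), 2.0 / (H - 1), 2.0 / (D - 1)], device=cost.device)
        out = torch.zeros_like(cost)
        for k in range(self.k):
            dv = offsets[:, k].permute(0, 2, 3, 4, 1)                   # (B, D, H, W, 3) as (dz, dy, dx)
            vk = self.base_offsets[k]                                    # fixed offset (dz, dy, dx)
            shift = (dv + vk).flip(-1) * scale                           # to (dx, dy, dz), normalized
            grid = base.unsqueeze(0) + shift
            sampled = F.grid_sample(cost, grid, align_corners=True)      # C(v0 + v_k + dv_k)
            out = out + self.weights[k] * sampled
        return out
```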

It can be understood that voxelization converts the geometric representation of an object into the voxel representation closest to that object, producing a volume data set that contains not only the surface information of the model but can also describe its internal properties. A voxel is the counterpart of a pixel in an image and can be understood as a pixel of a three-dimensional object, so that objects in three-dimensional space can be represented on a regular grid of spatial voxels.

Common voxel representations include the SDF and the TSDF. The SDF (signed distance field) models the object surface by assigning a signed distance to each voxel: an SDF value greater than 0 means the voxel lies in front of the surface, a value less than 0 means it lies behind the surface, and the closer the SDF is to 0, the closer the voxel is to the true surface of the scene. Because this representation consumes a large amount of resources, the TSDF was proposed.

The TSDF (truncated signed distance function) was proposed to reduce the resource consumption of the voxel representation. The TSDF uses a grid of cubes to represent three-dimensional space; each cell stores its (truncated) distance to the object surface, the sign of the TSDF value distinguishes the occluded side of the surface from the visible side, and points on the surface correspond to the zero crossing.
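
As a side note to the TSDF description above, the per-voxel truncation can be sketched in a single line; the normalization to [-1, 1] and the truncation-band parameter are conventions assumed here rather than details given in the text.

```python
import numpy as np

def tsdf_value(signed_distance, truncation):
    """Truncated signed distance for one voxel: the sign separates the two sides of the
       surface, the zero crossing lies on the surface, and values are clipped to the
       truncation band and normalized to [-1, 1] to bound memory and precision needs."""
    return float(np.clip(signed_distance / truncation, -1.0, 1.0))
```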

In step S300 of some embodiments of the present invention, recovering a depth map from each probability volume and fusing it with the filtered depth maps of the multiple views to obtain the reconstructed point cloud model comprises the following steps:

S301. Depth map regression: to achieve sub-pixel depth estimation accuracy, the present invention uses the weighted average of all depth hypotheses as the final depth output (the soft-argmin operation), where the weight of each term is the probability value of the corresponding hypothesis. The per-pixel depth estimate is computed as:

d̂ = Σ_{j=1}^{Nd} dj · P(dj),

where P(d) is the probability value corresponding to the depth hypothesis d;

S302. Computation of the photometric confidence map: the photometric confidence map measures the quality of multi-view photometric consistency matching. The present invention sums the probabilities of the four hypothesis values closest to the estimated depth to obtain the photometric confidence. The photometric confidence map is used for the depth map filtering described in the embodiments above;

S303. Depth map filtering: depth map filtering uses photometric consistency and geometric consistency for robust filtering. The present invention uses the probability map to measure the quality of the depth estimates and filters out points whose probability value is below τ0 as outliers. Geometric consistency measures the depth consistency across views: the depth value d1 at pixel p1 of the reference image is projected to the point pi of a neighboring view, and the depth value di at pi is then reprojected back to the reference image at preproj with depth value dreproj; if |preproj − p1| < τ1 and |dreproj − d1| / d1 < τ2, the depth estimate d1 at pixel p1 is said to be consistent between the two views. In the present invention, to guarantee cross-view consistency of the depth estimates, a depth estimate is retained only if it is consistent in at least nτ views. The filtered pixels are back-projected into three-dimensional space to obtain the dense three-dimensional point cloud model.
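
Steps S301 and S302 translate directly into a few tensor operations. The sketch below assumes a probability volume of shape (B, Nd, H, W) that has already been soft-max normalized along the depth dimension; the way the four neighbouring hypotheses are selected for the confidence (a window around the regressed depth index) is an assumption.

```python
import torch
import torch.nn.functional as F

def regress_depth_and_confidence(prob_volume, depth_hyps):
    """prob_volume: (B, Nd, H, W), soft-max normalized along dim 1.
       depth_hyps: (Nd,) depth hypothesis values d_j (assumed uniformly spaced)."""
    d = depth_hyps.view(1, -1, 1, 1)
    depth = torch.sum(prob_volume * d, dim=1)                      # soft-argmin regression
    # photometric confidence: sum of the probabilities of the 4 hypotheses around the estimate
    summed4 = 4 * F.avg_pool3d(prob_volume.unsqueeze(1), kernel_size=(4, 1, 1),
                               stride=1, padding=(2, 0, 0)).squeeze(1)[:, :prob_volume.shape[1]]
    step = depth_hyps[1] - depth_hyps[0]
    idx = ((depth - depth_hyps[0]) / step).round().long().clamp(0, prob_volume.shape[1] - 1)
    confidence = torch.gather(summed4, 1, idx.unsqueeze(1)).squeeze(1)
    return depth, confidence                                        # both (B, H, W)
```

The resulting confidence map is then used together with the geometric consistency test of step S303 (thresholds τ0, τ1, τ2 and nτ) to filter the depth maps before fusion.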

In step S100 of the above embodiments, preprocessing the multiple sparse point clouds to obtain depth maps under multiple views comprises: S101. acquiring the 3D points corresponding to all key points in each view and filtering out the invisible 3D points; S102. for the filtered 3D points, obtaining the depth value of each 3D point in the image coordinate system of that view through projection and coordinate transformation according to the camera extrinsics of the current view.

Schematically, step S101 can refer to FIG. 4: the 3D points corresponding to all key points in each view are acquired, the invisible 3D points are filtered out, and the 3D points visible in the view are denoted Pworld. The 3D points in the world coordinate system are transformed into the camera coordinate system according to the camera extrinsics of the current view, yielding the point Pcam in the camera coordinate system:

Pcam = R · Pworld + t,

where R is the rotation matrix and t is the translation vector. According to the projection relation, the projection of the 3D point in the image coordinate system and its depth value are obtained as

d · (u, v, 1)ᵀ = K · Pcam,

where (u, v, 1)ᵀ is the homogeneous coordinate of the image pixel, d is the depth value of the pixel, and K is the camera intrinsic matrix.
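
The two formulas above translate directly into code. The following NumPy sketch projects the visible sparse 3D points of one view into its image plane to obtain a sparse depth map; keeping the nearest point when several points fall on the same pixel is an added assumption.

```python
import numpy as np

def sparse_points_to_depth_map(P_world, K, R, t, H, W):
    """P_world: (M, 3) world-space points visible in this view.
       K, R, t: intrinsics (3,3), rotation (3,3), translation (3,) of the view.
       Returns an (H, W) depth map, 0 where no sparse point projects."""
    P_cam = P_world @ R.T + t                      # P_cam = R * P_world + t
    proj = P_cam @ K.T                             # d * (u, v, 1)^T = K * P_cam
    d = proj[:, 2]
    keep = d > 1e-6                                # keep points in front of the camera
    u = np.round(proj[keep, 0] / d[keep]).astype(int)
    v = np.round(proj[keep, 1] / d[keep]).astype(int)
    d = d[keep]
    depth_map = np.zeros((H, W), dtype=np.float32)
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    for ui, vi, di in zip(u[inside], v[inside], d[inside]):
        if depth_map[vi, ui] == 0 or di < depth_map[vi, ui]:
            depth_map[vi, ui] = di                 # keep the closest point per pixel
    return depth_map
```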

It should be understood that, before the method of the present invention is carried out, the parameters of cost volume construction, cost volume modulation, depth map filtering and point cloud fusion need to be set. In practice, cost volumes with different numbers of depth hypotheses regress different depth maps; Gaussian modulation functions with different amplitudes and bandwidths impose different constraints; and different depth map fusion parameters yield three-dimensional point cloud models with different visual quality. The parameters used in the present invention are: number of depth hypothesis planes Nd = 192, k = 10, c = 2Δd with Δd = dj+1 − dj, τ0 = 0.8, τ1 = 1, τ2 = 0.01, and nτ = 3.

Example 2

Referring to FIG. 7, a second aspect of the present invention provides a 3D reconstruction system 1 based on sparse point cloud and cost aggregation, comprising: an acquisition module 11 for acquiring multi-view images and the corresponding sparse point clouds, and preprocessing the sparse point clouds to obtain depth maps under multiple views; a construction module 12 for performing feature extraction on the multi-view images, constructing one or more cost volumes, and modulating and regularizing each cost volume with the sparse point clouds to obtain multiple probability volumes; and a reconstruction module 13 for recovering a depth map from each probability volume and fusing it with the filtered depth maps of the multiple views to obtain a reconstructed point cloud model.

Further, the construction module 12 comprises: an extraction unit for extracting features from each image of the multi-view images with a convolutional neural network to obtain multiple feature maps; a computation unit for selecting one of the feature maps as the reference feature map and the remaining feature maps as source feature maps, and computing the feature volume of each source feature map with respect to the reference feature map to obtain the feature volumes of the multiple views; and an aggregation unit for aggregating the feature volumes of the multiple views into a cost volume.

Example 3

Referring to FIG. 8, a third aspect of the present invention provides an electronic device, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of the first aspect of the present invention.

The electronic device 500 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 501, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the electronic device 500. The processing device 501, the ROM 502 and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

Generally, the following devices can be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a liquid crystal display (LCD), speakers, vibrators, etc.; storage devices 508 including, for example, a hard disk; and communication devices 509. The communication device 509 may allow the electronic device 500 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 8 shows the electronic device 500 with various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided. Each block shown in FIG. 8 may represent one device or multiple devices as required.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 509, installed from the storage device 508, or installed from the ROM 502. When the computer program is executed by the processing device 501, the above-described functions defined in the methods of the embodiments of the present disclosure are performed. It should be noted that the computer-readable medium described in the embodiments of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the embodiments of the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus or device. In the embodiments of the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus or device. The program code contained on the computer-readable medium may be transmitted using any suitable medium, including but not limited to an electric wire, an optical cable, RF (radio frequency), etc., or any suitable combination of the above.

The above computer-readable medium may be contained in the above electronic device, or it may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more computer programs which, when executed by the electronic device, cause the electronic device to perform the method described above.

Computer program code for carrying out the operations of the embodiments of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, C++ and Python, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures; for example, two blocks shown in succession may in fact be executed substantially concurrently, or sometimes in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (10)

1. A three-dimensional reconstruction method based on sparse point cloud and cost aggregation, characterized by comprising the following steps:
acquiring multi-view images and a plurality of corresponding sparse point clouds, and preprocessing the sparse point clouds to obtain depth maps under a plurality of views;
extracting features of the multi-view images, constructing one or more cost volumes, and modulating and regularizing each cost volume by using the plurality of sparse point clouds to obtain a plurality of probability volumes;
and restoring a depth map from each probability volume, and fusing the depth map with the filtered depth maps under the plurality of views to obtain a reconstructed point cloud model.
2. The sparse point cloud and cost aggregation-based three-dimensional reconstruction method of claim 1, wherein the feature extraction of the multi-view images and the construction of one or more cost volumes comprise:
respectively extracting the features of each image of the multi-view images by using a convolutional neural network to obtain a plurality of feature maps;
selecting one of the feature maps as a reference feature map, using the remaining feature maps as source feature maps, and calculating the feature volume of each source feature map on the reference feature map to obtain feature volumes of a plurality of views;
and aggregating the feature volumes of the plurality of views into a cost volume.
3. The sparse point cloud and cost aggregation-based three-dimensional reconstruction method according to claim 2, wherein the aggregation of the feature volumes of the plurality of views into the cost volume is achieved by:
C = M(V1, V2, …, VN) = (1/N) · Σ_{i=1}^{N} (Vi − V̄)²,
where C represents the cost volume, M represents the element-wise variance computation, Vi represents the i-th feature volume, N represents the total number of feature volumes, and V̄ represents the mean of all feature volumes.
4. The sparse point cloud and cost aggregation-based three-dimensional reconstruction method of claim 1, wherein modulating and regularizing each cost volume by using the plurality of sparse point clouds to obtain a plurality of probability volumes comprises:
constructing a Gaussian modulation function based on the depth maps under the plurality of views;
modulating each cost volume according to the Gaussian modulation function;
and regularizing each cost volume by using a 3D segmentation network to obtain a filtered probability volume.
5. The sparse point cloud and cost aggregation-based three-dimensional reconstruction method of claim 4, wherein the regularization is achieved by:
C̃(v0) = Σ_{k} ωk · C(v0 + vk + Δvk),
where C(v0) is the cost of voxel v0 in each cost volume and C̃(v0) is the cost of voxel v0 after the regularization operation; ωk is the weight at the k-th sampling position, vk is the fixed offset within the convolution receptive field, and Δvk is the offset learned during adaptive cost aggregation.
6. The sparse point cloud and cost aggregation-based three-dimensional reconstruction method according to any one of claims 1 to 5, wherein preprocessing the plurality of sparse point clouds to obtain depth maps at a plurality of viewing angles comprises:
acquiring the three-dimensional points corresponding to all key points under each view, and filtering out the invisible three-dimensional points;
and obtaining the depth value of each filtered three-dimensional point in the image coordinate system of its view through projection and coordinate transformation according to the camera extrinsics of the current view.
7. A three-dimensional reconstruction system based on sparse point cloud and cost aggregation, comprising:
an acquisition module, configured to acquire multi-view images and a plurality of corresponding sparse point clouds, and preprocess the sparse point clouds to obtain depth maps under a plurality of views;
a construction module, configured to extract features of the multi-view images, construct one or more cost volumes, and modulate and regularize each cost volume by using the plurality of sparse point clouds to obtain a plurality of probability volumes;
and a reconstruction module, configured to restore a depth map from each probability volume and fuse it with the filtered depth maps under the plurality of views to obtain a reconstructed point cloud model.
8. The sparse point cloud and cost aggregation-based three-dimensional reconstruction system of claim 7, wherein the construction module comprises:
an extraction unit, configured to respectively extract features of each image of the multi-view images by using a convolutional neural network to obtain a plurality of feature maps;
a computation unit, configured to select one of the feature maps as a reference feature map, use the remaining feature maps as source feature maps, and calculate the feature volume of each source feature map on the reference feature map to obtain feature volumes of a plurality of views;
and an aggregation unit, configured to aggregate the feature volumes of the plurality of views into a cost volume.
9. An electronic device, comprising: one or more processors; a storage device to store one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the sparse point cloud and cost aggregation based three-dimensional reconstruction method of any one of claims 1 to 6.
10. A computer readable medium having a computer program stored thereon, wherein the computer program when executed by a processor implements the sparse point cloud and cost aggregation based three-dimensional reconstruction method of any one of claims 1 to 6.
CN202210090256.0A 2022-01-25 2022-01-25 Three-dimensional reconstruction method and system based on sparse point cloud and cost aggregation Pending CN114519772A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210090256.0A CN114519772A (en) 2022-01-25 2022-01-25 Three-dimensional reconstruction method and system based on sparse point cloud and cost aggregation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210090256.0A CN114519772A (en) 2022-01-25 2022-01-25 Three-dimensional reconstruction method and system based on sparse point cloud and cost aggregation

Publications (1)

Publication Number Publication Date
CN114519772A true CN114519772A (en) 2022-05-20

Family

ID=81597577

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210090256.0A Pending CN114519772A (en) 2022-01-25 2022-01-25 Three-dimensional reconstruction method and system based on sparse point cloud and cost aggregation

Country Status (1)

Country Link
CN (1) CN114519772A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114820755A (en) * 2022-06-24 2022-07-29 武汉图科智能科技有限公司 Depth map estimation method and system
CN115861401A (en) * 2023-02-27 2023-03-28 之江实验室 Binocular and point cloud fusion depth recovery method, device and medium
CN117671163A (en) * 2024-02-02 2024-03-08 苏州立创致恒电子科技有限公司 Multi-view three-dimensional reconstruction method and system
CN118365805A (en) * 2024-06-19 2024-07-19 淘宝(中国)软件有限公司 Three-dimensional scene reconstruction method and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106688017A (en) * 2016-11-28 2017-05-17 深圳市大疆创新科技有限公司 Method and device for generating a point cloud map, and a computer system
US20180276793A1 (en) * 2017-03-23 2018-09-27 The Boeing Company Autonomous performance of an operation on an object using a generated dense 3d model of the object
CN110533774A (en) * 2019-09-09 2019-12-03 江苏海洋大学 A kind of method for reconstructing three-dimensional model based on smart phone
CN111583663A (en) * 2020-04-26 2020-08-25 宁波吉利汽车研究开发有限公司 Monocular perception correction method and device based on sparse point cloud and storage medium
CN112102472A (en) * 2020-09-01 2020-12-18 北京航空航天大学 Sparse three-dimensional point cloud densification method
CN113345082A (en) * 2021-06-24 2021-09-03 云南大学 Characteristic pyramid multi-view three-dimensional reconstruction method and system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106688017A (en) * 2016-11-28 2017-05-17 深圳市大疆创新科技有限公司 Method and device for generating a point cloud map, and a computer system
US20180276793A1 (en) * 2017-03-23 2018-09-27 The Boeing Company Autonomous performance of an operation on an object using a generated dense 3d model of the object
CN110533774A (en) * 2019-09-09 2019-12-03 江苏海洋大学 A kind of method for reconstructing three-dimensional model based on smart phone
CN111583663A (en) * 2020-04-26 2020-08-25 宁波吉利汽车研究开发有限公司 Monocular perception correction method and device based on sparse point cloud and storage medium
CN112102472A (en) * 2020-09-01 2020-12-18 北京航空航天大学 Sparse three-dimensional point cloud densification method
CN113345082A (en) * 2021-06-24 2021-09-03 云南大学 Characteristic pyramid multi-view three-dimensional reconstruction method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LU Q等: "Attention-Based Dense Point Cloud Reconstruction From a Single Image", IEEE ACCESS, 31 December 2019 (2019-12-31) *
杨洪飞等: "图像融合在空间目标三维重建中的应用", 红外与激光工程, 25 September 2018 (2018-09-25) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114820755A (en) * 2022-06-24 2022-07-29 武汉图科智能科技有限公司 Depth map estimation method and system
CN115861401A (en) * 2023-02-27 2023-03-28 之江实验室 Binocular and point cloud fusion depth recovery method, device and medium
CN117671163A (en) * 2024-02-02 2024-03-08 苏州立创致恒电子科技有限公司 Multi-view three-dimensional reconstruction method and system
CN117671163B (en) * 2024-02-02 2024-04-26 苏州立创致恒电子科技有限公司 Multi-view three-dimensional reconstruction method and system
CN118365805A (en) * 2024-06-19 2024-07-19 淘宝(中国)软件有限公司 Three-dimensional scene reconstruction method and electronic equipment

Similar Documents

Publication Publication Date Title
Schulter et al. Learning to look around objects for top-view representations of outdoor scenes
Vineet et al. Incremental dense semantic stereo fusion for large-scale semantic scene reconstruction
US10360718B2 (en) Method and apparatus for constructing three dimensional model of object
CN114519772A (en) Three-dimensional reconstruction method and system based on sparse point cloud and cost aggregation
US9454851B2 (en) Efficient approach to estimate disparity map
CN111105432B (en) Unsupervised end-to-end driving environment perception method based on deep learning
US11948310B2 (en) Systems and methods for jointly training a machine-learning-based monocular optical flow, depth, and scene flow estimator
CN106910242A (en) The method and system of indoor full scene three-dimensional reconstruction are carried out based on depth camera
Lu et al. An improved graph cut algorithm in stereo matching
Jaegle et al. Fast, robust, continuous monocular egomotion computation
CN113129352B (en) Sparse light field reconstruction method and device
WO2018133119A1 (en) Method and system for three-dimensional reconstruction of complete indoor scene based on depth camera
CN116310098A (en) Multi-view three-dimensional reconstruction method based on attention mechanism and variable convolution depth network
WO2022160897A1 (en) Binocular parallax estimation method, model training method and related device
CN116452752A (en) Intestinal wall reconstruction method combined with monocular dense SLAM and residual network
CN114998406A (en) Self-supervision multi-view depth estimation method and device
CN113313832A (en) Semantic generation method and device of three-dimensional model, storage medium and electronic equipment
CN114793457A (en) Apparatus and method for improving the process of determining depth map, relative pose or semantic segmentation
WO2023093085A1 (en) Method and apparatus for reconstructing surface of object, and computer storage medium and computer program product
CN114494395A (en) Method, Apparatus, Device and Storage Medium for Depth Map Generation Based on Plane Prior
CN117333627B (en) A method, system and storage medium for reconstruction and completion of autonomous driving scenes
Hu et al. 3D map reconstruction using a monocular camera for smart cities
Tanner et al. DENSER cities: A system for dense efficient reconstructions of cities
Liu et al. A depth map fusion algorithm with improved efficiency considering pixel region prediction
Shahbazi et al. High-density stereo image matching using intrinsic curves

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Country or region after: China

Address after: No. 548, 5th Floor, Building 10, No. 28 Linping Avenue, Donghu Street, Linping District, Hangzhou City, Zhejiang Province

Applicant after: Hangzhou Tuke Intelligent Information Technology Co.,Ltd.

Address before: 430000 B033, No. 05, 4th floor, building 2, international enterprise center, No. 1, Guanggu Avenue, Donghu New Technology Development Zone, Wuhan, Hubei (Wuhan area of free trade zone)

Applicant before: Wuhan Tuke Intelligent Technology Co.,Ltd.

Country or region before: China