CN106910242A

CN106910242A - The method and system of indoor full scene three-dimensional reconstruction are carried out based on depth camera

Info

Publication number: CN106910242A
Application number: CN201710051366.5A
Authority: CN
Inventors: 李建伟; 高伟; 吴毅红
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2017-01-23
Filing date: 2017-01-23
Publication date: 2017-06-30
Anticipated expiration: 2037-01-23
Also published as: CN106910242B

Abstract

The present invention discloses a kind of method and system that indoor full scene three-dimensional reconstruction is carried out based on consumer level depth camera.Wherein, the method includes that obtaining depth image carries out self adaptation bilateral filtering；Visual odometry estimation is carried out using filtered depth image, view-based access control model content does automatic segmentation to image sequence, closed loop detection is done between section and section, and carry out global optimization；Volume data fusion is weighted according to the camera trace information after optimization, so as to rebuild indoor full scene threedimensional model.The embodiment of the present invention realizes guarantor side, the denoising to depth map by self adaptation bilateral filtering algorithm, view-based access control model content automatic segmentation algorithm can effectively reduce the accumulated error during visual odometry is estimated and improve registration accuracy, weighted body data anastomosing algorithm is additionally used, the geometric detail of body surface can be effectively kept.Thus, the technical problem for how improving reconstruction accuracy under indoor scene is solved such that it is able to obtain indoor scene model that is complete, accurate, becoming more meticulous.

Description

Method and system for 3D reconstruction of indoor complete scene based on depth camera

技术领域technical field

本发明涉及计算机视觉技术领域，具体地，涉及一种基于消费级深度相机进行室内完整场景三维重建的方法及系统。The present invention relates to the technical field of computer vision, in particular to a method and system for three-dimensional reconstruction of a complete indoor scene based on a consumer-grade depth camera.

背景技术Background technique

室内场景高精度三维重建是计算机视觉中具有挑战性的研究课题之一，涉及计算机视觉、计算机图形学、模式识别、最优化等多个领域的理论与技术。实现三维重建有多种途径，传统方法是采用激光、雷达等测距传感器或结构光技术来获取场景或物体表面的结构信息进行三维重建，但这些仪器大多价格昂贵并且不易携带，所以应用场合有限。随着计算机视觉技术的发展，研究者们开始研究使用纯视觉的方法进行三维重建，其中涌现出来了大量有益的研究工作。High-precision 3D reconstruction of indoor scenes is one of the challenging research topics in computer vision, involving theories and technologies in many fields such as computer vision, computer graphics, pattern recognition, and optimization. There are many ways to achieve 3D reconstruction. The traditional method is to use laser, radar and other ranging sensors or structured light technology to obtain the structural information of the scene or object surface for 3D reconstruction. However, most of these instruments are expensive and not easy to carry, so the application occasions are limited. . With the development of computer vision technology, researchers began to study the use of pure vision methods for 3D reconstruction, and a lot of useful research work emerged.

消费级深度摄像机Microsoft Kinect推出后，人们可以直接利用深度数据比较便捷地进行室内场景三维重建。Newcombe等人提出的KinectFusion算法利用Kinect来获取图像中各点的深度信息，通过迭代近似最近邻点(Iterative Closest Point,ICP)算法将三维点在当前帧相机坐标系下的坐标与在全局模型中的坐标进行对齐来估计当前帧相机的姿态，再通过曲面隐函数(Truncated Signed Distance Function,TSDF)迭代进行体数据融合，得到稠密的三维模型。虽然Kinect获取深度不受光照条件和纹理丰富程度的影响，但其深度数据范围只有0.5-4m，而且网格模型的位置和大小是固定的，所以只适用于局部、静态的室内场景。After the launch of the consumer-grade depth camera Microsoft Kinect, people can directly use the depth data to perform 3D reconstruction of indoor scenes more conveniently. The KinectFusion algorithm proposed by Newcombe et al. uses Kinect to obtain the depth information of each point in the image, and uses the Iterative Closest Point (ICP) algorithm to compare the coordinates of the 3D points in the current frame camera coordinate system with those in the global model. Align the coordinates of the current frame to estimate the pose of the camera in the current frame, and then iteratively perform volume data fusion through the surface implicit function (Truncated Signed Distance Function, TSDF) to obtain a dense 3D model. Although the depth acquired by Kinect is not affected by lighting conditions and texture richness, its depth data range is only 0.5-4m, and the position and size of the grid model are fixed, so it is only suitable for local and static indoor scenes.

基于消费级深度相机进行室内场景三维重建，一般存在以下几个问题：(1)消费级深度相机获取的深度图像分辨率小、噪声大使得物体表面细节难以保持，而且深度值范围有限不能直接用于完整场景三维重建；(2)相机姿态估计产生的累积误差会造成错误、扭曲的三维模型；(3)消费级深度相机一般都是手持式拍摄，相机的运动状态比较随意，获取的数据质量有好有坏，影响重建效果。The 3D reconstruction of indoor scenes based on consumer-grade depth cameras generally has the following problems: (1) The resolution of depth images obtained by consumer-grade depth cameras is small and the noise is large, making it difficult to maintain the surface details of objects, and the range of depth values is limited and cannot be used directly. 3D reconstruction of complete scenes; (2) The cumulative error generated by camera pose estimation will cause errors and distorted 3D models; (3) Consumer-grade depth cameras are generally hand-held for shooting, and the movement state of the camera is relatively random. There are good and bad, which affect the reconstruction effect.

为了进行完整的室内场景三维重建，Whelan等人提出了Kintinuous算法，其是对KinectFusion的进一步扩展。该算法使用ShiftingTSDFVolume循环利用显存的方式解决大场景重建时网格模型显存消耗的问题，并通过DBoW寻找匹配的关键帧进行闭环检测，最后对位姿图和模型做优化，从而得到大场景三维模型。Choi等人提出了Elastic Fragment思想，先将RGBD数据流每隔50帧做分段，对每段单独做视觉里程计估计，从两两段间的点云数据中提取几何描述子FPFH寻找匹配进行闭环检测，再引入line processes约束对检测结果进行优化、去除错误的闭环，最后利用优化后的里程计信息进行体数据融合。通过分段处理和闭环检测实现了室内完整场景重建，但是没有考虑保留物体的局部几何细节，而且这种固定分段的方法在进行真实室内场景重建时并不鲁棒。Zeng等人提出了3D Match描述子概念，该算法先将RGBD数据流进行固定分段处理并重建得到局部模型，从每个分段的3D模型上提取关键点作为3D卷积网络(ConvNet)的输入，用该网络学习得到的特征向量作为另一矩阵网络(Metric network)的输入，通过相似度比较输出匹配结果。由于深度网络具有非常明显的特征学习优势，相对其他描述子用3D Match来做几何配准可以提高重建精度。但这种方法需要先进行局部三维重建，利用深度学习网络来做几何配准，再输出全局三维模型，而且网络训练需要大量的数据，整个重建流程效率较低。In order to perform a complete 3D reconstruction of indoor scenes, Whelan et al. proposed the Kintinuous algorithm, which is a further extension of KinectFusion. The algorithm uses ShiftingTSDFVolume to recycle video memory to solve the problem of grid model memory consumption during large scene reconstruction, and uses DBoW to find matching key frames for closed-loop detection, and finally optimizes the pose graph and model to obtain a 3D model of the large scene . Choi et al. proposed the idea of Elastic Fragment. First, the RGBD data stream is segmented every 50 frames, and the visual odometry is estimated for each segment separately, and the geometric descriptor FPFH is extracted from the point cloud data between two segments to find a match. Closed-loop detection, and then introduce line processes constraints to optimize the detection results, remove wrong closed-loops, and finally use the optimized odometer information for volume data fusion. Indoor complete scene reconstruction is achieved through segmentation processing and loop closure detection, but the local geometric details of objects are not considered, and this method of fixed segmentation is not robust for real indoor scene reconstruction. Zeng et al. proposed the concept of 3D Match descriptors. This algorithm first processes the RGBD data stream into fixed segments and reconstructs the local model, and extracts key points from the 3D model of each segment as the 3D convolutional network (ConvNet). Input, use the feature vector learned by the network as the input of another matrix network (Metric network), and output the matching result through similarity comparison. Since the deep network has a very obvious feature learning advantage, using 3D Match for geometric registration can improve the reconstruction accuracy compared to other descriptors. However, this method needs to perform local 3D reconstruction first, use the deep learning network to do geometric registration, and then output the global 3D model, and network training requires a large amount of data, so the efficiency of the entire reconstruction process is low.

在提高三维重建精度方面，Angela等人提出了VSBR算法，其主要思想是利用明暗恢复形状(Shape from Shading，SFS)技术对TSDF数据进行分层优化后再进行融合，以解决TSDF数据融合时过度平滑导致物体表面细节丢失的问题，从而得到较为精细的三维结构模型。但这种方法只对理想光源下的单体重建比较有效，室内场景由于光源变化较大精度提升不明显。In terms of improving the accuracy of 3D reconstruction, Angela et al. proposed the VSBR algorithm. The main idea is to use the Shape from Shading (SFS) technology to optimize the TSDF data layered and then fuse them to solve the problem of excessive TSDF data fusion. Smoothing leads to the loss of details on the surface of objects, so that a more refined three-dimensional structure model can be obtained. However, this method is only effective for single-body reconstruction under ideal light sources, and the accuracy of indoor scenes is not significantly improved due to large changes in light sources.

有鉴于此，特提出本发明。In view of this, the present invention is proposed.

发明内容Contents of the invention

为了解决现有技术中的上述问题，即为了解决如何提高室内场景下三维重建精度的技术问题，提供一种基于消费级深度相机进行室内完整场景三维重建的方法及系统。In order to solve the above problems in the prior art, that is, to solve the technical problem of how to improve the accuracy of 3D reconstruction in indoor scenes, a method and system for 3D reconstruction of complete indoor scenes based on consumer-grade depth cameras are provided.

为了实现上述目的，一方面，提供以下技术方案：In order to achieve the above purpose, on the one hand, the following technical solutions are provided:

一种基于消费级深度相机进行室内完整场景三维重建的方法，该方法可以包括：A method for 3D reconstruction of a complete indoor scene based on a consumer-grade depth camera, the method may include:

获取深度图像；Get the depth image;

对所述深度图像进行自适应双边滤波；performing adaptive bilateral filtering on the depth image;

对滤波后的深度图像进行基于视觉内容的分块融合和配准处理；Perform block fusion and registration processing based on visual content on the filtered depth image;

根据处理结果，进行加权体数据融合，从而重建室内完整场景三维模型。According to the processing results, the weighted volume data fusion is carried out to reconstruct the 3D model of the complete indoor scene.

优选地，所述对所述深度图像进行自适应双边滤波具体包括：Preferably, the performing adaptive bilateral filtering on the depth image specifically includes:

根据下式进行自适应双边滤波：Adaptive bilateral filtering is performed according to the following formula:

其中，所述u和所述u_k分别表示所述深度图像上的任一像素及其领域像素；所述Z(u)和所述Z(u_k)分别表示对应所述u和所述u_k的深度值；所述表示滤波后对应的深度值；所述W表示在领域上的归一化因子；所述w_s和所述w_c分别表示在空间域和值域滤波的高斯核函数。Wherein, the u and the u _k respectively represent any pixel on the depth image and its domain pixels; the Z(u) and the Z(u _k ) respectively represent the pixels corresponding to the u and the u The depth value of _k ; the Indicates the corresponding depth value after filtering; the W indicates that in the field The normalization factor on ; the w _s and the w _c respectively represent the Gaussian kernel function filtered in the spatial domain and the value domain.

优选地，所述在空间域和值域滤波的高斯核函数根据下式来确定：Preferably, the Gaussian kernel function filtered in the spatial domain and the value domain is determined according to the following formula:

其中，所述δ_s和所述δ_c分别是空间域和值域高斯核函数的方差；Wherein, described δ _s and described δ _c are respectively the variance of space domain and range Gaussian kernel function;

其中，所述δ_s和所述δ_c根据下式来确定：Wherein, described δ _s and described δ _c are determined according to the following formula:

其中，所述f表示所述深度相机的焦距，所述K_s和所述K_c表示常数。Wherein, the f represents the focal length of the depth camera, and the K _s and the K _c represent constants.

优选地，所述对滤波后的深度图像进行基于视觉内容的分块融合和配准处理具体包括：基于视觉内容对深度图像序列进行分段，并对每一分段进行分块融合，且所述分段间进行闭环检测，对闭环检测的结果做全局优化。Preferably, performing block fusion and registration processing on the filtered depth image based on visual content specifically includes: segmenting the depth image sequence based on visual content, and performing block fusion on each segment, and the Perform closed-loop detection between the above segments, and perform global optimization on the results of closed-loop detection.

优选地，所述基于视觉内容对深度图像序列进行分段，并对每一分段进行分块融合，且所述分段间进行闭环检测，对闭环检测的结果做全局优化具体包括：Preferably, the step of segmenting the depth image sequence based on the visual content, performing block fusion for each segment, and performing closed-loop detection between the segments, and performing global optimization on the result of the closed-loop detection specifically includes:

基于视觉内容检测自动分段方法对深度图像序列进行分段，将相似的深度图像内容分在一个分段中，并对每一分段进行分块融合，确定所述深度图像之间的变换关系，并根据所述变换关系在段与段之间做闭环检测，以实现全局优化。Segment the depth image sequence based on the automatic segmentation method of visual content detection, divide similar depth image content into one segment, and perform block fusion for each segment, and determine the transformation relationship between the depth images , and perform closed-loop detection between segments according to the transformation relationship, so as to achieve global optimization.

优选地，所述基于视觉内容检测自动分段方法对深度图像序列进行分段，将相似的深度图像内容分在一个分段中，并对每一分段进行分块融合，确定所述深度图像之间的变换关系，并根据所述变换关系在段与段之间做闭环检测，以实现全局优化，具体包括：Preferably, the automatic segmentation method based on visual content detection segments the depth image sequence, divides similar depth image content into one segment, and performs block fusion for each segment to determine the depth image The transformation relationship between them, and perform closed-loop detection between segments according to the transformation relationship to achieve global optimization, specifically including:

采用Kintinuous框架，进行视觉里程计估计，得到每帧深度图像下的相机位姿信息；Use the Kintinuous framework to estimate the visual odometry and obtain the camera pose information under each frame of the depth image;

根据所述相机位姿信息，将由所述每帧深度图像对应的点云数据反投影到初始坐标系下，用投影后得到的深度图像与初始帧的深度图像进行相似度比较，并当相似度低于相似度阈值时，初始化相机位姿，进行分段；According to the camera pose information, back-project the point cloud data corresponding to the depth image of each frame into the initial coordinate system, compare the similarity between the depth image obtained after projection and the depth image of the initial frame, and calculate the similarity When it is lower than the similarity threshold, initialize the camera pose and perform segmentation;

提取每一分段点云数据中的PFFH几何描述子，并在每两段之间进行粗配准，以及采用GICP算法进行精配准，得到段与段之间的匹配关系；Extract the PFFH geometric descriptor in each segment point cloud data, and perform rough registration between each two segments, and use the GICP algorithm for fine registration to obtain the matching relationship between segments;

利用每一分段的位姿信息以及所述段与段之间的匹配关系，构建图并采用G2O框架进行图优化，得到优化后的相机轨迹信息，从而实现所述全局优化。Using the pose information of each segment and the matching relationship between the segments, construct a graph and use the G2O framework to optimize the graph to obtain optimized camera trajectory information, thereby realizing the global optimization.

优选地，所述根据所述相机位姿信息，将由所述每帧深度图像对应的点云数据反投影到初始坐标系下，用投影后得到的深度图像与初始帧的深度图像进行相似度比较，并当相似度低于相似度阈值时，初始化相机位姿，进行分段，具体包括：Preferably, according to the camera pose information, the point cloud data corresponding to each frame of depth image is back-projected into the initial coordinate system, and the similarity is compared between the depth image obtained after projection and the depth image of the initial frame , and when the similarity is lower than the similarity threshold, initialize the camera pose and perform segmentation, including:

步骤1：计算所述每帧深度图像与第一帧深度图像的相似度；Step 1: Calculate the similarity between the depth image of each frame and the depth image of the first frame;

步骤2：判断所述相似度是否低于相似度阈值；Step 2: judging whether the similarity is lower than a similarity threshold;

步骤3：若是，则对所述深度图像序列进行分段；Step 3: If yes, segment the depth image sequence;

步骤4：将下一帧深度图像作为下一分段的起始帧深度图像，并重复执行步骤1和步骤2，直至处理完所有帧深度图像。Step 4: Use the next frame depth image as the starting frame depth image of the next segment, and repeat step 1 and step 2 until all frame depth images are processed.

优选地，所述步骤1具体包括：Preferably, said step 1 specifically includes:

根据投影关系和任一帧深度图像的深度值，并利用下式计算所述深度图像上每个像素所对应的第一空间三维点：According to the projection relationship and the depth value of any frame depth image, and use the following formula to calculate the first space three-dimensional point corresponding to each pixel on the depth image:

p＝π^-1(u_p,Z(u_p))p=π ^-1 (u _p ,Z(u _p ))

其中，所述u_p是所述深度图像上的任一像素；所述Z(u_p)和所述p分别表示所述u_p对应的深度值和所述第一空间三维点；所述π表示所述投影关系；Wherein, the up is any pixel on the depth image; the Z(up) and the _p represent the depth value corresponding to the up and the three- _dimensional point in the first space, respectively; the _π represent the projection relationship;

根据下式将所述第一空间三维点旋转平移变换到世界坐标系下，得到第二空间三维点：According to the following formula, the rotation and translation transformation of the three-dimensional point in the first space is transformed into the world coordinate system to obtain the three-dimensional point in the second space:

q＝T_ipq=T _i p

其中，所述T_i表示第i帧深度图对应空间三维点到世界坐标系下的旋转平移矩阵；所述p表示所述第一空间三维点，所述q表示所述第二空间三维点；所述i取正整数；Wherein, the T _i represents the rotation and translation matrix of the i-th frame depth map corresponding to the three-dimensional point in the space to the world coordinate system; the p represents the three-dimensional point in the first space, and the q represents the three-dimensional point in the second space; The i takes a positive integer;

根据下式将所述第二空间三维点反投影到二维图像平面，得到投影后的深度图像：The three-dimensional point in the second space is back-projected to the two-dimensional image plane according to the following formula to obtain the projected depth image:

其中，所述u_q是所述q对应的投影后深度图像上的像素；所述f_x、所述f_y、所述c_x和所述c_y表示深度相机的内参；所述x_q、y_q、z_q表示所述q的坐标；所述T表示矩阵的转置；Wherein, the u _q is the pixel on the projected depth image corresponding to the q; the f _x , the f _y , the c _x and the _cy represent the internal reference of the depth camera; the x _q , y _q and z _q represent the coordinates of the q; the T represents the transposition of the matrix;

分别计算所述起始帧深度图像和任一帧投影后的深度图像上的有效像素个数，并将两者比值作为相似度。The number of effective pixels on the depth image of the initial frame and the projected depth image of any frame are respectively calculated, and the ratio of the two is used as the similarity.

优选地，所述根据处理结果，进行加权体数据融合，从而重建室内完整场景三维模型具体包括：根据所述处理结果，利用截断符号距离函数网格模型融合各帧的深度图像，并使用体素网格来表示三维空间，从而得到室内完整场景三维模型。Preferably, performing weighted volume data fusion according to the processing results, so as to reconstruct the 3D model of the complete indoor scene specifically includes: according to the processing results, using the truncated signed distance function grid model to fuse the depth images of each frame, and using voxel The grid is used to represent the 3D space, so as to obtain the 3D model of the complete indoor scene.

优选地，根据所述处理结果，利用截断符号距离函数网格模型融合各帧的深度图像，并使用体素网格来表示三维空间，从而得到室内完整场景三维模型，具体包括：Preferably, according to the processing result, the depth images of each frame are fused using a truncated signed distance function grid model, and a voxel grid is used to represent a three-dimensional space, thereby obtaining a three-dimensional model of a complete indoor scene, specifically including:

基于噪声特点与兴趣区域模型，利用Volumetric method框架进行所述截断符号距离函数数据加权融合；Based on the noise characteristics and the region of interest model, the weighted fusion of the truncated signed distance function data is performed using the Volumetric method framework;

采用Marching cubes算法进行Mesh模型提取，从而得到所述室内完整场景三维模型。The Marching cubes algorithm is used to extract the Mesh model, so as to obtain the 3D model of the complete indoor scene.

优选地，所述截断符号距离函数根据下式来确定：Preferably, the truncated signed distance function is determined according to the following formula:

f_i(v)＝[K^-1z_i(u)[u^T,1]^T]_z-[v_i]_z f _i (v)＝[K ^-1 z _i (u)[u ^T ,1] ^T ] _z -[v _i ] _z

其中，f_i(v)表示截断符号距离函数，也即网格到物体模型表面的距离，正负表示该网格是在表面被遮挡一侧还是在可见一侧，而过零点就是表面上的点；所述K表示所述相机的内参数矩阵；所述u表示像素；所述z_i(u)表示所述像素u对应的深度值；所述v_i表示体素。Among them, f _i (v) represents the truncated sign distance function, that is, the distance from the grid to the surface of the object model, the positive and negative indicate whether the grid is on the side where the surface is occluded or on the visible side, and the zero-crossing point is the distance on the surface point; the K represents the internal parameter matrix of the camera; the u represents a pixel; the z _i (u) represents the depth value corresponding to the pixel u; the v _i represents a voxel.

优选地，所述数据加权融合根据下式进行：Preferably, the data weighted fusion is performed according to the following formula:

其中，所述v表示体素；所述f_i(v)和所述w_i(v)分别表示所述体素v对应的截断符号距离函数及其权值函数；所述n取正整数；所述F(v)表示融合后所述体素v所对应的截断符号距离函数值；所述W(v)表示融合后体素v所对应的截断符号距离函数值的权重；Wherein, the v represents a voxel; the f _i (v) and the w _i (v) respectively represent a truncated signed distance function and a weight function corresponding to the voxel v; the n is a positive integer; The F(v) represents the truncated signed distance function value corresponding to the voxel v after fusion; the W(v) represents the weight of the truncated signed distance function value corresponding to the voxel v after fusion;

其中，所述权值函数可以根据下式来确定：Wherein, the weight function can be determined according to the following formula:

其中，所述d_i表示兴趣区域的半径；所述δ_s是深度数据中的噪声方差；所述w为常数。Wherein, the d _i represents the radius of the region of interest; the δ _s is the noise variance in the depth data; the w is a constant.

为了实现上述目的，另一方面，还提供了一种基于消费级深度相机进行室内完整场景三维重建的系统，该系统包括：In order to achieve the above purpose, on the other hand, a system for 3D reconstruction of a complete indoor scene based on a consumer-grade depth camera is also provided, the system includes:

获取模块，用于获取深度图像；Obtaining module, used for obtaining depth image;

滤波模块，用于对所述深度图像进行自适应双边滤波；A filtering module, configured to perform adaptive bilateral filtering on the depth image;

分块融合与配准模块，用于对滤波后的深度图像进行基于视觉内容的分块融合和配准处理；The block fusion and registration module is used to perform block fusion and registration processing based on visual content on the filtered depth image;

体数据融合模块，用于根据处理结果，进行加权体数据融合，从而重建室内完整场景三维模型。The volume data fusion module is used to perform weighted volume data fusion according to the processing results, so as to reconstruct the 3D model of the complete indoor scene.

优选地，所述滤波模块具体用于：Preferably, the filtering module is specifically used for:

优选地，所述分块融合与配准模块具体可以用于：基于视觉内容对深度图像序列进行分段，并对每一分段进行分块融合，且所述分段间进行闭环检测，对闭环检测的结果做全局优化。Preferably, the block fusion and registration module can be specifically configured to: segment the depth image sequence based on visual content, perform block fusion for each segment, and perform loop closure detection between the segments, and The results of loop closure detection are globally optimized.

优选地，所述分块融合与配准模块还具体可以用于：Preferably, the block fusion and registration module can also be specifically used for:

基于视觉内容检测自动分段方法对深度图像序列进行分段，将相似的深度图像内容分在一个分段中，对每一分段进行分块融合，确定所述深度图像之间的变换关系，并根据所述变换关系在段与段之间做闭环检测，以实现全局优化。Segmenting the depth image sequence based on the automatic segmentation method of visual content detection, dividing similar depth image content into one segment, performing block fusion for each segment, and determining the transformation relationship between the depth images, And perform closed-loop detection between segments according to the transformation relationship, so as to realize global optimization.

优选地，所述分块融合与配准模块具体包括：Preferably, the block fusion and registration module specifically includes:

相机位姿信息获取单元，用于采用Kintinuous框架，进行视觉里程计估计，得到每帧深度图像下的相机位姿信息；The camera pose information acquisition unit is used to use the Kintinuous framework to perform visual odometry estimation and obtain the camera pose information under each frame of depth image;

分段单元，用于根据所述相机位姿信息，将由所述每帧深度图像对应的点云数据反投影到初始坐标系下，用投影后得到的深度图像与初始帧的深度图像进行相似度比较，并当相似度低于相似度阈值时，初始化相机位姿，进行分段；The segmentation unit is used to back-project the point cloud data corresponding to the depth image of each frame into the initial coordinate system according to the camera pose information, and use the depth image obtained after projection to perform similarity with the depth image of the initial frame Compare, and when the similarity is lower than the similarity threshold, initialize the camera pose and perform segmentation;

配准单元，用于提取每一分段点云数据中的PFFH几何描述子，并在每两段之间进行粗配准，以及采用GICP算法进行精配准，得到段与段之间的匹配关系；The registration unit is used to extract the PFFH geometric descriptor in each segment point cloud data, and perform rough registration between each two segments, and use the GICP algorithm for fine registration to obtain the matching between segments relation;

优化单元，用于利用每一分段的位姿信息以及所述段与段之间的匹配关系，构建图并采用G2O框架进行图优化，得到优化后的相机轨迹信息，从而实现所述全局优化。An optimization unit is configured to use the pose information of each segment and the matching relationship between the segments to construct a graph and optimize the graph using the G2O framework to obtain optimized camera trajectory information, thereby realizing the global optimization .

优选地，所述分段单元具体包括：Preferably, the segmentation unit specifically includes:

计算单元，用于计算所述每帧深度图像与第一帧深度图像的相似度；A calculation unit, configured to calculate the similarity between the depth image of each frame and the depth image of the first frame;

判断单元，用于判断所述相似度是否低于相似度阈值；a judging unit, configured to judge whether the similarity is lower than a similarity threshold;

分段子单元，用于当所述相似度低于相似度阈值时，对所述深度图像序列进行分段；a segmentation subunit, configured to segment the sequence of depth images when the similarity is lower than a similarity threshold;

处理单元，用于将下一帧深度图像作为下一分段的起始帧深度图像，并重复执行计算单元和判断单元，直至处理完所有帧深度图像。The processing unit is configured to use the next frame of depth image as the starting frame of depth image of the next segment, and repeatedly execute the calculation unit and the judging unit until all the frame depth images are processed.

优选地，所述体数据融合模块具体用于：根据所述处理结果，利用截断符号距离函数网格模型融合各帧的深度图像，并使用体素网格来表示三维空间，从而得到室内完整场景三维模型。Preferably, the volume data fusion module is specifically configured to: according to the processing result, use the truncated signed distance function grid model to fuse the depth images of each frame, and use the voxel grid to represent the three-dimensional space, so as to obtain the complete indoor scene 3D model.

优选地，所述体数据融合模块具体包括：Preferably, the volume data fusion module specifically includes:

加权融合单元，用于基于噪声特点与兴趣区域，利用Volumetric method框架进行所述截断符号距离函数数据加权融合；A weighted fusion unit is used to perform the weighted fusion of the truncated signed distance function data using the Volumetric method framework based on the noise characteristics and the region of interest;

提取单元，用于采用Marching cubes算法进行Mesh模型提取，从而得到所述室内完整场景三维模型。The extraction unit is configured to extract the Mesh model by using the Marching cubes algorithm, so as to obtain the 3D model of the complete indoor scene.

本发明实施例提供一种基于消费级深度相机进行室内完整场景三维重建的方法及系统。其中，该方法包括获取深度图像；对深度图像进行自适应双边滤波；对滤波后的深度图像进行基于视觉内容的分块融合和配准处理；根据处理结果，进行加权体数据融合，从而重建室内完整场景三维模型。本发明实施例通过对深度图像进行基于视觉内容的分块融合和配准，能有效地降低视觉里程计估计中的累积误差并提高配准精度，还采用加权体数据融合算法，这可以有效地保持物体表面的几何细节，由此，解决了如何提高室内场景下三维重建精度的技术问题，从而能够得到完整、准确、精细化的室内场景模型。Embodiments of the present invention provide a method and system for three-dimensional reconstruction of a complete indoor scene based on a consumer-grade depth camera. Among them, the method includes acquiring a depth image; performing adaptive bilateral filtering on the depth image; performing block fusion and registration processing based on visual content on the filtered depth image; performing weighted volume data fusion according to the processing results, thereby reconstructing indoor 3D model of the complete scene. The embodiment of the present invention can effectively reduce the cumulative error in visual odometry estimation and improve the registration accuracy by performing block fusion and registration based on visual content on the depth image, and also adopts a weighted volume data fusion algorithm, which can effectively The geometric details of the surface of the object are maintained, thereby solving the technical problem of how to improve the accuracy of 3D reconstruction in indoor scenes, so that a complete, accurate and refined indoor scene model can be obtained.

附图说明Description of drawings

图1为根据本发明实施例的基于消费级深度相机进行室内完整场景三维重建的方法的流程示意图；FIG. 1 is a schematic flowchart of a method for three-dimensional reconstruction of a complete indoor scene based on a consumer-grade depth camera according to an embodiment of the present invention;

图2a为根据本发明实施例的深度图像对应的彩色图像；Fig. 2a is a color image corresponding to a depth image according to an embodiment of the present invention;

图2b为根据本发明实施例的从深度图像得到的点云示意图；Fig. 2b is a schematic diagram of a point cloud obtained from a depth image according to an embodiment of the present invention;

图2c为根据本发明实施例的对深度图像进行双边滤波得到的点云示意图；2c is a schematic diagram of a point cloud obtained by performing bilateral filtering on a depth image according to an embodiment of the present invention;

图2d为根据本发明实施例的对深度图像进行自适应双边滤波得到的点云示意图Fig. 2d is a schematic diagram of a point cloud obtained by performing adaptive bilateral filtering on a depth image according to an embodiment of the present invention

图3为根据本发明实施例的基于视觉内容分段融合、配准的流程示意图；FIG. 3 is a schematic flow diagram of segmentation fusion and registration based on visual content according to an embodiment of the present invention;

图4为根据本发明实施例的加权体数据融合过程示意图；4 is a schematic diagram of a weighted volume data fusion process according to an embodiment of the present invention;

图5a为运用非加权体数据融合算法的三维重建结果示意图；Figure 5a is a schematic diagram of the 3D reconstruction result using the unweighted volume data fusion algorithm;

图5b为图5a中三维模型的局部细节示意图；Fig. 5b is a schematic diagram of local details of the three-dimensional model in Fig. 5a;

图5c为根据本发明实施例提出的加权体数据融合算法得到的三维重建结果示意图；FIG. 5c is a schematic diagram of a three-dimensional reconstruction result obtained by a weighted volume data fusion algorithm proposed according to an embodiment of the present invention;

图5d为图5c中三维模型的局部细节示意图；Fig. 5d is a schematic diagram of local details of the three-dimensional model in Fig. 5c;

图6为根据本发明实施例的在3D Scene Data数据集上使用本发明实施例提出的方法进行三维重建的效果示意图；6 is a schematic diagram of the effect of three-dimensional reconstruction using the method proposed by the embodiment of the present invention on the 3D Scene Data dataset according to the embodiment of the present invention;

图7为根据本发明实施例的在Augmented ICL-NUIM Dataset数据集上使用本发明实施例提出的方法进行三维重建的效果示意图；7 is a schematic diagram of the effect of three-dimensional reconstruction using the method proposed in the embodiment of the present invention on the Augmented ICL-NUIM Dataset dataset according to an embodiment of the present invention;

图8为根据本发明实施例的利用Microsoft Kinect for Windows采集的室内场景数据进行三维重建的效果示意图；8 is a schematic diagram showing the effect of three-dimensional reconstruction using indoor scene data collected by Microsoft Kinect for Windows according to an embodiment of the present invention;

图9为根据本发明实施例的基于消费级深度相机进行室内完整场景三维重建的系统的结构示意图。9 is a schematic structural diagram of a system for performing three-dimensional reconstruction of a complete indoor scene based on a consumer-grade depth camera according to an embodiment of the present invention.

具体实施方式detailed description

下面参照附图来描述本发明的优选实施方式。本领域技术人员应当理解的是，这些实施方式仅仅用于解释本发明的技术原理，并非旨在限制本发明的保护范围。Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art should understand that these embodiments are only used to explain the technical principles of the present invention, and are not intended to limit the protection scope of the present invention.

本发明实施例提供一种基于消费级深度相机进行室内完整场景三维重建的方法。如图1所示，该方法包括：An embodiment of the present invention provides a method for three-dimensional reconstruction of a complete indoor scene based on a consumer-grade depth camera. As shown in Figure 1, the method includes:

S100：获取深度图像。S100: Acquire a depth image.

具体地，本步骤可以包括：利用基于结构光原理的消费级深度相机来获取深度图像。Specifically, this step may include: using a consumer-grade depth camera based on the principle of structured light to acquire a depth image.

其中，基于结构光原理的消费级深度相机(Microsoft Kinect for Windows和Xtion，简称深度相机)，是通过发射结构光，接收反射信息来获取深度图像的深度数据的。Among them, consumer-grade depth cameras (Microsoft Kinect for Windows and Xtion, referred to as depth cameras) based on the principle of structured light acquire depth data of depth images by emitting structured light and receiving reflection information.

在实际应用中，可以利用手持式消费级深度相机Microsoft Kinect for Windows采集真实室内场景数据。In practical applications, real indoor scene data can be collected using a handheld consumer-grade depth camera Microsoft Kinect for Windows.

深度数据可以根据下式来计算：Depth data can be calculated according to the following formula:

其中，f表示消费级深度相机的焦距；B表示基线；D表示视差。Among them, f represents the focal length of the consumer-grade depth camera; B represents the baseline; D represents the disparity.

S110：对深度图像进行自适应双边滤波。S110: Perform adaptive bilateral filtering on the depth image.

本步骤利用基于结构光原理的消费级深度相机的噪声特点对获取的深度图像进行自适应双边滤波。In this step, adaptive bilateral filtering is performed on the obtained depth image by using the noise characteristics of the consumer-grade depth camera based on the principle of structured light.

其中，自适应双边滤波算法是指在深度图像的空间域和值域上都进行滤波。Among them, the adaptive bilateral filtering algorithm refers to performing filtering on both the spatial domain and the value domain of the depth image.

在实际应用中，可以根据深度相机的噪声特点及其内部参数来设置自适应双边滤波算法的参数，这样能有效地去除噪声并保留边缘信息。In practical applications, the parameters of the adaptive bilateral filtering algorithm can be set according to the noise characteristics of the depth camera and its internal parameters, which can effectively remove noise and preserve edge information.

对深度Z关于视差D求偏导，存在以下关系：For the partial derivative of the depth Z with respect to the parallax D, there is the following relationship:

深度数据的噪声主要产生于量化过程，从上式可以看出深度噪声的方差与深度值二次方成正比，也就是说深度值越大，噪声也越大。为了有效去除深度图像中的噪声，本发明实施例基于这个噪声特点来定义滤波算法。The noise of the depth data is mainly generated in the quantization process. From the above formula, it can be seen that the variance of the depth noise is proportional to the square of the depth value, that is to say, the greater the depth value, the greater the noise. In order to effectively remove the noise in the depth image, the embodiment of the present invention defines a filtering algorithm based on this noise characteristic.

具体地，上述自适应双边滤波可以根据下式进行：Specifically, the above adaptive bilateral filtering can be performed according to the following formula:

其中，u和u_k分别表示深度图像上的任一像素及其领域像素；Z(u)和Z(u_k)分别表示对应u和u_k的深度值；表示滤波后对应的深度值；W表示在领域上的归一化因子；w_s和w_c分别表示在空间域和值域滤波的高斯核函数。Among them, u and u _k respectively represent any pixel on the depth image and its domain pixels; Z(u) and Z(u _k ) represent the depth values corresponding to u and u _k respectively; Indicates the corresponding depth value after filtering; W indicates that in the field The normalization factor on ; w _s and w _c represent the Gaussian kernel function filtered in the spatial domain and the value domain, respectively.

在上述实施例中，w_s和w_c可以根据下式来确定：In the above embodiment, w _s and w _c can be determined according to the following formula:

其中，δ_s和δ_c分别是空间域和值域高斯核函数的方差。Among them, δ _s and δ _c are the variance of Gaussian kernel function in space domain and value domain respectively.

δ_s和δ_c与深度值大小有关，其取值不是固定的。δ _s and δ _c are related to the depth value, and their values are not fixed.

具体地，在上述实施例中，δ_s和δ_c可以根据下式来确定：Specifically, in the above-mentioned embodiment, δ _s and δ _c can be determined according to the following formula:

其中，f表示深度相机的焦距，K_s和K_c表示常数，其具体取值与深度相机的参数有关。Among them, f represents the focal length of the depth camera, K _s and K _c represent constants, and their specific values are related to the parameters of the depth camera.

图2a-d示例性地示出了不同滤波算法的效果比较示意图。其中，图2a示出了深度图像对应的彩色图像。图2b示出了从深度图像得到的点云。图2c示出了对深度图像进行双边滤波得到的点云。图2d示出了对深度图像进行自适应双边滤波得到的点云。Figures 2a-d exemplarily show the effect comparison diagrams of different filtering algorithms. Wherein, Fig. 2a shows a color image corresponding to a depth image. Figure 2b shows the point cloud obtained from the depth image. Figure 2c shows the point cloud resulting from bilateral filtering of the depth image. Figure 2d shows the point cloud obtained by adaptive bilateral filtering on the depth image.

本发明实施例通过采用自适应双边滤波方法，可以实现深度图的保边、去噪。In the embodiment of the present invention, by adopting an adaptive bilateral filtering method, edge preservation and denoising of the depth map can be realized.

S120：对深度图像进行基于视觉内容的分块融合和配准处理。S120: Perform block fusion and registration processing based on visual content on the depth image.

本步骤基于视觉内容对深度图像序列进行分段，并对每个分段进行分块融合，且分段间进行闭环检测，对闭环检测的结果做全局优化。其中，深度图像序列为深度图像数据流。In this step, the depth image sequence is segmented based on the visual content, and each segment is fused in blocks, and the closed-loop detection is performed between the segments, and the result of the closed-loop detection is globally optimized. Wherein, the depth image sequence is a depth image data stream.

优选地，本步骤可以包括：确定深度图像之间的变换关系，基于视觉内容自动分段的方法对深度图像序列进行分段，将相似的深度图像内容分在一个分段中，对每个分段进行分块融合，确定深度图像之间的变换关系，并根据变换关系在段与段之间做闭环检测，并实现全局优化。Preferably, this step may include: determining the transformation relationship between the depth images, segmenting the depth image sequence based on the method of automatic segmentation of visual content, dividing similar depth image contents into one segment, and dividing each segment Segments are merged into blocks, the transformation relationship between depth images is determined, and loop closure detection is performed between segments according to the transformation relationship, and global optimization is realized.

进一步地，本步骤可以包括：Further, this step may include:

S121：采用Kintinuous框架，进行视觉里程计估计，得到每帧深度图像下的相机位姿信息。S121: Use the Kintinuous framework to perform visual odometry estimation to obtain camera pose information under each frame of depth image.

S122：根据相机位姿信息，将由每帧深度图像对应的点云数据反投影到初始坐标系下，用投影后得到的深度图像与初始帧的深度图像进行相似度比较，并当相似度低于相似度阈值时，初始化相机位姿，进行分段。S122: According to the camera pose information, back-project the point cloud data corresponding to each frame of depth image into the initial coordinate system, compare the similarity between the depth image obtained after projection and the depth image of the initial frame, and when the similarity is lower than When the similarity threshold is reached, the camera pose is initialized for segmentation.

S123：提取每一分段点云数据中的PFFH几何描述子，并在每两段之间进行粗配准，以及采用GICP算法进行精配准，得到段与段之间的匹配关系。S123: Extract the PFFH geometric descriptor in each segment point cloud data, perform rough registration between every two segments, and use the GICP algorithm to perform fine registration to obtain the matching relationship between segments.

本步骤对段与段之间做闭环检测。In this step, a closed-loop detection is performed between segments.

S124：利用每一分段的位姿信息以及段与段之间的匹配关系，构建图并采用G2O框架进行图优化，得到优化后的相机轨迹信息，从而实现全局优化。S124: Use the pose information of each segment and the matching relationship between segments to construct a graph and use the G2O framework to optimize the graph to obtain optimized camera trajectory information, thereby achieving global optimization.

本步骤在优化时应用(Simultaneous Localization and Calibration,SLAC)模式改善非刚性畸变，引入line processes约束删除错误的闭环匹配。In this step, the (Simultaneous Localization and Calibration, SLAC) mode is applied to improve non-rigid distortion during optimization, and line processes constraints are introduced to delete wrong closed-loop matching.

上述步骤S122还可以具体包括：The above step S122 may also specifically include:

S1221：计算每帧深度图像与第一帧深度图像的相似度。S1221: Calculate the similarity between each frame of depth image and the first frame of depth image.

S1222：判断该相似度是否低于相似度阈值，若是，则执行步骤S1223；否则，执行步骤S1224。S1222: Determine whether the similarity is lower than the similarity threshold, if yes, execute step S1223; otherwise, execute step S1224.

S1223：对深度图像序列进行分段。S1223: Segment the depth image sequence.

本步骤基于视觉内容对深度图像序列进行分段处理。这样既可以有效地解决视觉里程计估计产生的累积误差问题，又可以将相似的内容融合在一起，从而提高配准精度。In this step, the depth image sequence is segmented based on the visual content. This can not only effectively solve the problem of cumulative error generated by visual odometry estimation, but also fuse similar content together to improve registration accuracy.

S1224：对深度图像序列不进行分段。S1224: Do not segment the depth image sequence.

S1225：将下一帧深度图像作为下一分段的起始帧深度图像，并重复执行步骤S1221和步骤S1222，直至处理完所有帧深度图像。S1225: Use the next frame depth image as the starting frame depth image of the next segment, and repeat step S1221 and step S1222 until all frame depth images are processed.

在上述实施例中，计算每帧深度图像与第一帧深度图像的相似度的步骤具体可以包括：In the above embodiment, the step of calculating the similarity between each frame of depth image and the first frame of depth image may specifically include:

S12211：根据投影关系和任一帧深度图像的深度值，并利用下式计算深度图像上每个像素所对应的第一空间三维点：S12211: According to the projection relationship and the depth value of any frame of depth image, and use the following formula to calculate the first space three-dimensional point corresponding to each pixel on the depth image:

p＝π^-1(u_p,Z(u_p))p=π ^-1 (u _p ,Z(u _p ))

其中，u_p是深度图像上的任一像素；Z(u_p)和p分别表示u_p对应的深度值和第一空间三维点；π表示投影关系，即每帧深度图像对应的点云数据反投影到初始坐标系下的2D-3D投影变换关系。Among them, u _p is any pixel on the depth image; Z(u _p ) and p represent the depth value corresponding to up _p and the 3D point in the first space, respectively; π represents the projection relationship, that is, the point cloud data corresponding to each frame of depth image Backprojection to the 2D-3D projection transformation relationship in the initial coordinate system.

S12212：根据下式将第一空间三维点旋转平移变换到世界坐标系下，得到第二空间三维点：S12212: Transform the rotation and translation of the 3D point in the first space into the world coordinate system according to the following formula to obtain the 3D point in the second space:

q＝T_ipq=T _i p

其中，T_i表示第i帧深度图对应空间三维点到世界坐标系下的旋转平移矩阵，其可以通过视觉里程计估计得到；i取正整数；p表示第一空间三维点，q表示第二空间三维点，p和q的坐标分别为：Among them, T _i represents the rotation and translation matrix from the three-dimensional point in the i-th frame depth map to the world coordinate system, which can be estimated by visual odometry; i takes a positive integer; p represents the first three-dimensional point in space, and q represents the second A three-dimensional point in space, the coordinates of p and q are:

p＝(x_p,y_p,z_p)，q＝(x_q,y_q,z_q)。p = (x _p , y _p , z _p ), q = (x _q , y _q , z _q ).

S12213：根据下式将第二空间三维点反投影到二维图像平面，得到投影后的深度图像：S12213: Back-project the 3D point in the second space onto the 2D image plane according to the following formula to obtain the projected depth image:

其中，u_q是q对应的投影后深度图像上的像素；f_x、f_y、c_x和c_y表示深度相机的内参；x_q、y_q、z_q表示q的坐标；T表示矩阵的转置。Among them, u _q is the pixel on the projected depth image corresponding to q; f _x , f _y , c _x and _cy represent the internal parameters of the depth camera; x _q , y _q , z _q represent the coordinates of q; T represents the matrix Transpose.

S12214：分别计算起始帧深度图像和任一帧投影后的深度图像上的有效像素个数，并将两者比值作为相似度。S12214: Calculate the number of effective pixels on the initial frame depth image and the projected depth image of any frame respectively, and use the ratio of the two as the similarity.

举例来说，根据下式来计算相似度：For example, the similarity is calculated according to the following formula:

其中，n⁰和nⁱ分别表示起始帧深度图像和任一帧投影后的深度图像上的有效像素个数；ρ表示相似度。Among them, n ⁰ and ⁿⁱ represent the number of effective pixels on the depth image of the initial frame and the projected depth image of any frame, respectively; ρ represents the similarity.

图3示例性地示出了基于视觉内容分段融合、配准的流程示意图。Fig. 3 exemplarily shows a flow chart of segmentation fusion and registration based on visual content.

本发明实施例采用基于视觉内容自动分段算法，能有效降低视觉里程计估计中的累积误差，提高了配准精度。The embodiment of the present invention adopts an automatic segmentation algorithm based on visual content, which can effectively reduce the cumulative error in visual odometry estimation and improve registration accuracy.

S130：根据处理结果，进行加权体数据融合，从而重建室内完整场景三维模型。S130: Perform weighted volume data fusion according to the processing result, so as to reconstruct the 3D model of the complete indoor scene.

具体地，本步骤可以包括：根据基于视觉内容的分块融合和配准处理结果，利用截断符号距离函数(TSDF)网格模型融合各帧的深度图像，并使用体素网格来表示三维空间，从而得到室内完整场景三维模型。Specifically, this step may include: according to the block fusion and registration processing results based on visual content, using the truncated signed distance function (TSDF) grid model to fuse the depth images of each frame, and using the voxel grid to represent the three-dimensional space , so as to obtain the 3D model of the complete indoor scene.

本步骤还可以进一步包括：This step can further include:

S131：基于噪声特点与兴趣区域，利用Volumetric method框架进行截断符号距离函数数据加权融合。S131: Based on the noise characteristics and the region of interest, use the Volumetric method framework to perform weighted fusion of truncated signed distance function data.

S132：采用Marching cubes算法进行Mesh模型提取。S132: Using the Marching cubes algorithm to extract the Mesh model.

在实际应用中，可以根据视觉里程计的估计结果，利用TSDF网格模型融合各帧的深度图像使用分辨率为m的体素网格来表示三维空间，即每个三维空间被分为m块，每个网格v存储两个值：截断符号距离函数f_i(v)及其权重w_i(v)。In practical applications, the TSDF grid model can be used to fuse the depth images of each frame according to the estimation results of the visual odometry. A voxel grid with a resolution of m can be used to represent the three-dimensional space, that is, each three-dimensional space is divided into m blocks , each grid v stores two values: the truncated signed distance function f _i (v) and its weight w _i (v).

其中，可以根据下式来确定截断符号距离函数：Among them, the truncated signed distance function can be determined according to the following formula:

其中，f_i(v)表示截断符号距离函数，也即网格到物体模型表面的距离，正负表示该网格是在表面被遮挡一侧还是在可见一侧，而过零点就是表面上的点；K表示相机的内参数矩阵；u表示像素；z_i(u)表示像素u对应的深度值；v_i表示体素。其中，该相机可以为深度相机或深度摄像机。Among them, f _i (v) represents the truncated sign distance function, that is, the distance from the grid to the surface of the object model, the positive and negative indicate whether the grid is on the side where the surface is occluded or on the visible side, and the zero-crossing point is the distance on the surface point; K represents the internal parameter matrix of the camera; u represents a pixel; z _i (u) represents the depth value corresponding to pixel u; v _i represents a voxel. Wherein, the camera may be a depth camera or a depth camera.

其中，可以根据下式进行数据加权融合：Among them, data weighted fusion can be performed according to the following formula:

其中，f_i(v)和w_i(v)分别表示体素v对应的截断符号距离函数(TSDF)及其权值函数；n取正整数；F(v)表示融合后体素v所对应的截断符号距离函数值；W(v)表示融合后体素v所对应的截断符号距离函数值的权重。Among them, f _i (v) and w _i (v) represent the truncated signed distance function (TSDF) and its weight function corresponding to voxel v respectively; n takes a positive integer; F(v) represents the fused voxel corresponding to The truncated signed distance function value of ; W(v) represents the weight of the truncated signed distance function value corresponding to voxel v after fusion.

在上述实施例中，权值函数可以根据深度数据的噪声特点以及兴趣区域来确定，其值是不固定的。为了保持物体表面的几何细节，将噪声小的区域以及感兴趣区域的权值设置得大，将噪声大的区域或不感兴趣区域的权值设置得小。In the above embodiments, the weight function may be determined according to the noise characteristics of the depth data and the region of interest, and its value is not fixed. In order to maintain the geometric details of the object surface, the weight of the area with low noise and the area of interest is set to be large, and the weight of the area with large noise or the area of no interest is set to be small.

具体地，权值函数可以根据下式来确定：Specifically, the weight function can be determined according to the following formula:

其中，d_i表示兴趣区域的半径，半径越小表示越感兴趣，权值越大；δ_s是深度数据中的噪声方差，其取值与自适应双边滤波算法空间域核函数的方差一致；w为常数，优选地，其可以取值为1或0。Among them, d _i represents the radius of the region of interest, the smaller the radius, the more interested it is, and the greater the weight; δ _s is the noise variance in the depth data, and its value is consistent with the variance of the spatial domain kernel function of the adaptive bilateral filtering algorithm; w is a constant, preferably, it can take a value of 1 or 0.

图4示例性地示出了加权体数据融合过程示意图。Fig. 4 exemplarily shows a schematic diagram of a weighted volume data fusion process.

本发明实施例采用加权体数据融合算法可以有效保持物体表面的几何细节，能够得到完整、准确、精细化的室内场景模型，具有良好的鲁棒性和扩展性。In the embodiment of the present invention, the weighted volume data fusion algorithm can effectively maintain the geometric details of the object surface, and can obtain a complete, accurate, and refined indoor scene model, which has good robustness and scalability.

图5a示例性地示出了运用非加权体数据融合算法的三维重建结果；图5b示例性地示出了图5a中三维模型的局部细节；图5c示例性地示出了利用本发明实施例提出的加权体数据融合算法得到的三维重建结果；图5d示例性地示出了图5c中三维模型的局部细节。Fig. 5a exemplarily shows the 3D reconstruction result using the unweighted volume data fusion algorithm; Fig. 5b exemplarily shows the local details of the 3D model in Fig. 5a; Fig. 5c exemplarily shows The 3D reconstruction results obtained by the proposed weighted volume data fusion algorithm; Figure 5d exemplarily shows the local details of the 3D model in Figure 5c.

图6示例性地示出了在3D Scene Data数据集上使用本发明实施例提出的方法进行三维重建的效果示意图；图7示例性地示出了在Augmented ICL-NUIM Dataset数据集上使用本发明实施例提出的方法进行三维重建的效果示意图；图8示例性地示出了利用Microsoft Kinect for Windows采集的室内场景数据进行三维重建的效果示意图。Fig. 6 exemplarily shows the schematic diagram of the effect of three-dimensional reconstruction using the method proposed by the embodiment of the present invention on the 3D Scene Data dataset; Fig. 7 exemplarily shows the use of the present invention on the Augmented ICL-NUIM Dataset dataset A schematic diagram of the effect of three-dimensional reconstruction by the method proposed in the embodiment; FIG. 8 exemplarily shows a schematic diagram of the effect of three-dimensional reconstruction using indoor scene data collected by Microsoft Kinect for Windows.

应指出的是，本文虽然以上述顺序来描述本发明实施例，但是，本领域技术人员能够理解，还可以采取不同于此处的描述顺序来实施本发明，这些简单的变化也应包含在本发明的保护范围之内。It should be pointed out that although the embodiments of the present invention are described in the above order, those skilled in the art can understand that the present invention can also be implemented in a sequence different from that described here, and these simple changes should also be included in this document. within the scope of protection of the invention.

基于与方法实施例相同的技术构思，本发明实施例还提供一种基于消费级深度相机进行室内完整场景三维重建的系统，如图9所示，该系统90包括：获取模块92、滤波模块94、分块融合与配准模块96和体数据融合模块98。其中，获取模块92用于获取深度图像。滤波模块94用于对深度图像进行自适应双边滤波。分块融合与配准模块96用于对滤波后的深度图像进行基于视觉内容的分块融合和配准处理。体数据融合模块98用于根据处理结果，进行加权体数据融合，从而重建室内完整场景三维模型。Based on the same technical concept as the method embodiment, the embodiment of the present invention also provides a system for 3D reconstruction of a complete indoor scene based on a consumer-grade depth camera. As shown in FIG. 9 , the system 90 includes: an acquisition module 92 and a filtering module 94 , block fusion and registration module 96 and volume data fusion module 98. Wherein, the obtaining module 92 is used for obtaining a depth image. The filtering module 94 is used for performing adaptive bilateral filtering on the depth image. The block fusion and registration module 96 is used to perform block fusion and registration processing based on visual content on the filtered depth image. The volume data fusion module 98 is used to perform weighted volume data fusion according to the processing results, so as to reconstruct the 3D model of the complete indoor scene.

本发明实施例通过采用上述技术方案，能有效地降低视觉里程计估计中的累积误差，并提高配准精度，可以有效保持物体表面的几何细节，能够得到完整、准确、精细化的室内场景模型。The embodiments of the present invention can effectively reduce the cumulative error in visual odometry estimation by adopting the above technical solution, and improve the registration accuracy, can effectively maintain the geometric details of the object surface, and can obtain a complete, accurate and refined indoor scene model .

在一些实施例中，滤波模块具体用于：根据下式进行自适应双边滤波：In some embodiments, the filtering module is specifically configured to: perform adaptive bilateral filtering according to the following formula:

在一些实施例中，分块融合与配准模块具体可以用于：基于视觉内容对深度图像序列进行分段，并对每一分段进行分块融合，且分段间进行闭环检测，对闭环检测的结果做全局优化。In some embodiments, the block fusion and registration module can be specifically used to: segment the depth image sequence based on visual content, perform block fusion for each segment, and perform closed-loop detection between segments, and perform closed-loop detection The detection results are globally optimized.

在另一些实施例中，分块融合与配准模块还具体可以用于：确定深度图像之间的变换关系，基于视觉内容检测自动分段方法对深度图像序列进行分段，将相似的深度图像内容分在一个分段中，并对每一分段进行分块融合，确定深度图像之间的变换关系，根据变换关系在段与段之间做闭环检测，并实现全局优化。In some other embodiments, the block fusion and registration module can also be specifically used to: determine the transformation relationship between depth images, segment the depth image sequence based on the automatic segmentation method of visual content detection, and divide similar depth images The content is divided into a segment, and each segment is fused in blocks to determine the transformation relationship between depth images, and perform closed-loop detection between segments according to the transformation relationship, and achieve global optimization.

在一些优选的实施例中，分块融合与配准模块具体可以包括：相机位姿信息获取单元、分段单元、配准单元和优化单元。其中，相机位姿信息获取单元用于采用Kintinuous框架，进行视觉里程计估计，得到每帧深度图像下的相机位姿信息。分段单元用于根据相机位姿信息，将由每帧深度图像对应的点云数据反投影到初始坐标系下，用投影后得到的深度图像与初始帧的深度图像进行相似度比较，并当相似度低于相似度阈值时，初始化相机位姿，进行分段。配准单元用于提取每一分段点云数据中的PFFH几何描述子，并在每两段之间进行粗配准，以及采用GICP算法进行精配准，得到段与段之间的匹配关系。优化单元用于利用每一分段的位姿信息以及段与段之间的匹配关系，构建图并采用G2O框架进行图优化，得到优化后的相机轨迹信息，从而实现全局优化。In some preferred embodiments, the block fusion and registration module may specifically include: a camera pose information acquisition unit, a segmentation unit, a registration unit and an optimization unit. Among them, the camera pose information acquisition unit is used to use the Kintinuous framework to perform visual odometry estimation and obtain the camera pose information under each frame of depth image. The segmentation unit is used to back-project the point cloud data corresponding to the depth image of each frame into the initial coordinate system according to the camera pose information, and compare the similarity between the depth image obtained after projection and the depth image of the initial frame, and when similar When the degree is lower than the similarity threshold, the camera pose is initialized for segmentation. The registration unit is used to extract the PFFH geometric descriptor in each segment point cloud data, and perform rough registration between each two segments, and use the GICP algorithm for fine registration to obtain the matching relationship between segments . The optimization unit is used to use the pose information of each segment and the matching relationship between segments to construct a graph and use the G2O framework for graph optimization to obtain optimized camera trajectory information, thereby achieving global optimization.

其中，上述分段单元具体可以包括：计算单元、判断单元、分段子单元和处理单元。其中，计算单元用于计算每帧深度图像与第一帧深度图像的相似度。判断单元用于判断相似度是否低于相似度阈值。分段子单元用于当相似度低于相似度阈值时，对深度图像序列进行分段。处理单元用于将下一帧深度图像作为下一分段的起始帧深度图像，并重复执行计算单元和判断单元，直至处理完所有帧深度图像。Wherein, the segmentation unit may specifically include: a computing unit, a judging unit, a segmentation subunit, and a processing unit. Wherein, the calculation unit is used to calculate the similarity between each frame of depth image and the first frame of depth image. The judging unit is used for judging whether the similarity is lower than a similarity threshold. The segmentation subunit is used to segment the depth image sequence when the similarity is lower than the similarity threshold. The processing unit is used to use the next frame of depth image as the starting frame of depth image of the next segment, and repeatedly execute the calculation unit and the judging unit until all frames of depth images are processed.

在一些实施例中，体数据融合模块具体可以用于根据处理结果，利用截断符号距离函数网格模型融合各帧的深度图像，并使用体素网格来表示三维空间，从而得到室内完整场景三维模型。In some embodiments, the volume data fusion module can be specifically used to fuse the depth images of each frame by using the truncated signed distance function grid model according to the processing results, and use the voxel grid to represent the three-dimensional space, so as to obtain the three-dimensional space of the complete indoor scene. Model.

在一些实施例中，体数据融合模块具体可以包括加权融合单元和提取单元。其中，加权融合单元用于基于噪声特点与兴趣区域，利用Volumetric method框架进行截断符号距离函数数据加权融合。提取单元用于采用Marching cubes算法进行Mesh模型提取，从而得到室内完整场景三维模型。In some embodiments, the volume data fusion module may specifically include a weighted fusion unit and an extraction unit. Among them, the weighted fusion unit is used to perform weighted fusion of the truncated sign distance function data based on the noise characteristics and the region of interest, using the Volumetric method framework. The extraction unit is used to extract the Mesh model by using the Marching cubes algorithm, so as to obtain the 3D model of the complete indoor scene.

下面以一优选实施例来详细说明本发明。The present invention will be described in detail below with a preferred embodiment.

基于消费级深度相机进行室内完整场景三维重建的系统包括采集模块、滤波模块、分块融合与配准模块和体数据融合模块。其中：The system for 3D reconstruction of indoor complete scenes based on consumer-grade depth cameras includes an acquisition module, a filtering module, a block fusion and registration module, and a volume data fusion module. in:

采集模块用于利用深度相机对室内场景进行深度图像采集。The collection module is used for collecting depth images of indoor scenes by using a depth camera.

滤波模块用于对获取的深度图像做自适应双边滤波处理。The filtering module is used to perform adaptive bilateral filtering on the obtained depth image.

该采集模块为上述获取模块的等同替换。在实际应用中，可以利用手持式消费级深度相机Microsoft Kinect for Windows采集真实室内场景数据。然后，对采集到的深度图像进行自适应双边滤波，根据深度相机的噪声特点及其内部参数来自动设置自适应双边滤波方法中的参数，故，本发明实施例能有效去除噪声并保留边缘信息。The collection module is an equivalent replacement of the above acquisition module. In practical applications, real indoor scene data can be collected using a handheld consumer-grade depth camera Microsoft Kinect for Windows. Then, adaptive bilateral filtering is performed on the collected depth image, and the parameters in the adaptive bilateral filtering method are automatically set according to the noise characteristics of the depth camera and its internal parameters, so the embodiment of the present invention can effectively remove noise and retain edge information .

分块融合与配准模块用于基于视觉内容对数据流做自动分段，每个分段进行分块融合，分段间进行闭环检测，对闭环检测的结果做全局优化。The block fusion and registration module is used to automatically segment the data stream based on the visual content, perform block fusion for each segment, perform closed-loop detection between segments, and perform global optimization on the results of closed-loop detection.

该分块融合与配准模块进行基于视觉内容的自动分块融合、配准。The block fusion and registration module performs automatic block fusion and registration based on visual content.

在一个更优选的实施例中，分块融合与配准模块具体包括：位姿信息获取模块、分段模块、粗配准模块、精配准模块和优化模块。其中，位姿信息获取模块用于采用Kintinuous框架，进行视觉里程计估计，得到每帧深度图像下的相机位姿信息。分段模块用于根据相机位姿信息将由每帧深度图像对应的点云数据反投影到初始坐标系下，用投影后的深度图像与初始帧的深度图像进行相似度比较，若相似度低于相似度阈值则初始化相机位姿，并进行新的分段。粗配准模块用于提取每一分段点云数据中的PFFH几何描述子，并在每两段间之间进行粗配准；精配准模块用于采用GICP算法进行精配准，以获取段与段之间的匹配关系。优化模块用于利用每一段的位姿信息以及段与段之间的匹配关系，构建图并采用G2O框架进行图优化。In a more preferred embodiment, the block fusion and registration module specifically includes: a pose information acquisition module, a segmentation module, a rough registration module, a fine registration module and an optimization module. Among them, the pose information acquisition module is used to use the Kintinuous framework to perform visual odometry estimation and obtain the camera pose information under each frame of depth image. The segmentation module is used to back-project the point cloud data corresponding to each frame of depth image to the initial coordinate system according to the camera pose information, and compare the similarity between the projected depth image and the depth image of the initial frame. If the similarity is lower than The similarity threshold initializes the camera pose and performs new segmentation. The coarse registration module is used to extract the PFFH geometric descriptor in each segment point cloud data, and perform coarse registration between each two segments; the fine registration module is used to use the GICP algorithm for fine registration to obtain The matching relationship between segments. The optimization module is used to use the pose information of each segment and the matching relationship between segments to construct a graph and use the G2O framework for graph optimization.

优选地，上述优化模块还进一步用于应用SLAC(Simultaneous Localization andCalibration)模式以优化非刚性畸变，并利用line processes约束删除错误的闭环匹配。Preferably, the above optimization module is further used to apply SLAC (Simultaneous Localization and Calibration) mode to optimize non-rigid distortion, and use line processes constraints to delete wrong closed-loop matching.

上述分块融合与配准模块基于视觉内容对RGBD数据流进行分段处理，既可以有效地解决视觉里程计估计产生的累积误差问题，又可以将相似的内容融合在一起，从而可以提高配准精度。The above block fusion and registration module processes RGBD data streams in segments based on visual content, which can not only effectively solve the problem of cumulative error generated by visual odometry estimation, but also fuse similar content together, thereby improving registration precision.

体数据融合模块用于根据优化后的相机轨迹信息进行加权体数据融合，得到场景的三维模型。The volume data fusion module is used to perform weighted volume data fusion according to the optimized camera trajectory information to obtain a 3D model of the scene.

该体数据融合模块根据深度相机的噪声特点和感兴趣区域来定义截断符号距离函数的权值函数，来实现物体表面的几何细节的保持。The volume data fusion module defines the weight function of the truncated sign distance function according to the noise characteristics of the depth camera and the region of interest, so as to maintain the geometric details of the object surface.

基于消费级深度相机进行室内完整场景三维重建的系统上的实验表明：基于消费级深度相机的高精度三维重建方法，能够得到完整、准确、精细化的室内场景模型，系统具有良好的鲁棒性和扩展性。Experiments on the system of 3D reconstruction of indoor complete scenes based on consumer-grade depth cameras show that: the high-precision 3D reconstruction method based on consumer-grade depth cameras can obtain a complete, accurate and refined indoor scene model, and the system has good robustness and scalability.

上述基于消费级深度相机进行室内完整场景三维重建的系统实施例可以用于执行基于消费级深度相机进行室内完整场景三维重建的方法实施例，其技术原理、所解决的技术问题及产生的技术效果相似，可以互相参考；为描述的方便和简洁，各个实施例之间省略了描述相同的部分。The above-mentioned system embodiment for 3D reconstruction of a complete indoor scene based on a consumer-grade depth camera can be used to implement a method embodiment for 3D reconstruction of a complete indoor scene based on a consumer-grade depth camera. Its technical principles, technical problems to be solved, and technical effects produced For similarity, reference may be made to each other; for convenience and brevity of description, descriptions of the same parts between the various embodiments are omitted.

需要说明的是，上述实施例提供的基于消费级深度相机进行室内完整场景三维重建的系统和方法在进行室内完整场景三维重建时，仅以上述各功能模块、单元或步骤的划分进行举例说明，例如，前述中的获取模块也可以作为采集模块，在实际应用中，可以根据需要而将上述功能分配由不同的功能模块、单元或步骤来完成，即将本发明实施例中的模块、单元或者步骤再分解或者组合，例如，可以将获取模块或采集和滤波模块合并为数据预处理模块。It should be noted that when the system and method for 3D reconstruction of a complete indoor scene based on a consumer-grade depth camera provided by the above-mentioned embodiments perform 3D reconstruction of a complete indoor scene, only the division of the above-mentioned functional modules, units or steps is used for illustration. For example, the acquisition module in the foregoing can also be used as an acquisition module. In practical applications, the above-mentioned functions can be allocated by different functional modules, units or steps according to needs, that is, the modules, units or steps in the embodiments of the present invention Then decompose or combine, for example, the acquisition module or the collection and filtering module can be combined into a data preprocessing module.

至此，已经结合附图所示的优选实施方式描述了本发明的技术方案，但是，本领域技术人员容易理解的是，本发明的保护范围显然不局限于这些具体实施方式。在不偏离本发明的原理的前提下，本领域技术人员可以对相关技术特征作出等同的更改或替换，这些更改或替换之后的技术方案都将落入本发明的保护范围之内。So far, the technical solutions of the present invention have been described in conjunction with the preferred embodiments shown in the accompanying drawings, but those skilled in the art will easily understand that the protection scope of the present invention is obviously not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art can make equivalent changes or substitutions to relevant technical features, and the technical solutions after these changes or substitutions will all fall within the protection scope of the present invention.

Claims

1. A method for performing indoor complete scene three-dimensional reconstruction based on a consumer-grade depth camera, the method comprising:

acquiring a depth image;

performing adaptive bilateral filtering on the depth image;

carrying out block fusion and registration processing based on visual contents on the filtered depth image;

and according to the processing result, performing weighted volume data fusion so as to reconstruct an indoor complete scene three-dimensional model.

2. The method according to claim 1, wherein the adaptive bilateral filtering of the depth image specifically comprises:

adaptive bilateral filtering is performed according to:

wherein said u and said u_kRespectively representing any pixel and a domain pixel thereof on the depth image; the Z (u) and the Z (u)_k) Respectively represent corresponding to the u and the u_kDepth value of (d); the above-mentionedRepresenting the corresponding depth value after filtering; said W represents in the fieldA normalization factor of (a); said w_sAnd said w_cRepresenting the gaussian kernel filtered in the spatial and value domains, respectively.

3. The method of claim 2, wherein the gaussian kernel function for spatial and value domain filtering is determined according to the following equation:

\{\begin{matrix} w_{s} = \exp (- \frac{{(u - u_{k})}^{2}}{2 δ_{s}^{2}}) \\ w_{c} = \exp (- \frac{{(Z (u) - Z (u_{k}))}^{2}}{2 δ_{c}^{2}}) \end{matrix}

wherein, the_sAnd said_cThe variance of the spatial domain and the value domain gaussian kernel function respectively;

wherein the sum is determined according to:

\{\begin{matrix} δ_{s} = \frac{K_{s} Z (u)}{f} \\ δ_{c} = K_{c} Z {(u)}^{2} \end{matrix}

wherein f represents a focal length of the depth camera, K_sAnd said K_cRepresenting a constant.

4. The method according to claim 1, wherein the process of visual content-based block fusion and registration of the filtered depth image comprises in particular: and segmenting the depth image sequence based on visual content, performing block fusion on each segment, performing closed-loop detection between the segments, and performing global optimization on the result of the closed-loop detection.

5. The method according to claim 4, wherein the segmenting the depth image sequence based on the visual content, and performing block fusion on each segment, and performing closed-loop detection between the segments, and performing global optimization on the result of the closed-loop detection specifically comprises:

segmenting a depth image sequence based on an automatic segmentation method for visual content detection, dividing similar depth image contents into segments, performing block fusion on each segment, determining a transformation relation between the depth images, and performing closed-loop detection between the segments according to the transformation relation so as to realize global optimization.

6. The method according to claim 5, wherein the automatic segmentation method based on visual content detection is configured to segment a depth image sequence, segment similar depth image contents into one segment, perform block fusion on each segment, determine a transformation relationship between the depth images, and perform closed-loop detection between segments according to the transformation relationship, so as to implement global optimization, and specifically includes:

estimating a visual odometer by adopting a Kintinuous frame to obtain camera pose information under each frame of depth image;

according to the camera pose information, back projecting the point cloud data corresponding to each frame of depth image to an initial coordinate system, comparing the similarity of the depth image obtained after projection with the depth image of the initial frame, and initializing a camera pose and segmenting when the similarity is lower than a similarity threshold value;

extracting a PFFH geometric descriptor in each segmented point cloud data, performing coarse registration between each two segments, and performing fine registration by adopting a GICP algorithm to obtain a matching relation between the segments;

and constructing a graph by using the pose information of each segment and the matching relation between the segments, and performing graph optimization by using a G2O frame to obtain optimized camera track information, thereby realizing the global optimization.

7. The method according to claim 6, wherein the back-projecting the point cloud data corresponding to each frame of depth image to an initial coordinate system according to the camera pose information, comparing the similarity between the depth image obtained after projection and the depth image of the initial frame, and initializing a camera pose for segmentation when the similarity is lower than a similarity threshold, specifically comprises:

step 1: calculating the similarity between each frame of depth image and a first frame of depth image;

step 2: judging whether the similarity is lower than a similarity threshold value;

and step 3: if yes, segmenting the depth image sequence;

and 4, step 4: and taking the depth image of the next frame as the depth image of the starting frame of the next segment, and repeatedly executing the step 1 and the step 2 until all the depth images of the frames are processed.

8. The method according to claim 7, wherein the step 1 specifically comprises:

according to the projection relation and the depth value of any frame of depth image, calculating a first space three-dimensional point corresponding to each pixel on the depth image by using the following formula:

p＝π^-1(u_p,Z(u_p))

wherein u is_pIs any pixel on the depth image; z (u) as defined above_p) And said p represents said u, respectively_pCorresponding depth values and the first spatial three-dimensional points; the pi represents the projection relation;

and rotationally translating the first space three-dimensional point to a world coordinate system according to the following formula to obtain a second space three-dimensional point:

q＝T_ip

wherein, T is_iRepresenting a rotation translation matrix from the spatial three-dimensional point corresponding to the depth map of the ith frame to a world coordinate system; the p represents the first three-dimensional point in space, and the q represents the second three-dimensional point in space; the i is a positive integer;

and back projecting the second space three-dimensional point to a two-dimensional image plane according to the following formula to obtain a projected depth image:

u_{q} = {(\frac{f_{x} x_{q}}{z_{q}} - c_{x}, \frac{f_{y} y_{q}}{z_{q}} - c_{y})}^{T}

wherein u is_qIs the pixel on the projected depth image corresponding to said q; f is_xThe above-mentioned f_yC to c_xAnd c is as described_yRepresenting an internal reference of the depth camera; said x_q、y_q、z_qCoordinates representing said q; the T represents a transpose of a matrix;

and respectively calculating the number of effective pixels on the depth image of the initial frame and the depth image projected by any frame, and taking the ratio of the two as the similarity.

9. The method according to claim 1, wherein the performing weighted volume data fusion according to the processing result to reconstruct the indoor complete scene three-dimensional model specifically comprises: and according to the processing result, fusing the depth image of each frame by using a truncated symbolic distance function grid model, and representing a three-dimensional space by using a voxel grid so as to obtain an indoor complete scene three-dimensional model.

10. The method according to claim 9, wherein, according to the processing result, fusing the depth image of each frame by using a truncated symbolic distance function mesh model, and representing a three-dimensional space by using a voxel mesh, thereby obtaining a three-dimensional model of an indoor complete scene, specifically comprising:

based on the noise characteristic and the interest region, performing weighted fusion on the truncated symbol distance function data by using a Volumetric method frame;

and (4) extracting a Mesh model by adopting a Marching cubes algorithm, thereby obtaining the indoor complete scene three-dimensional model.

11. The method according to claim 9 or 10, wherein the truncated symbol distance function is determined according to the following equation:

f_i(v)＝[K^-1z_i(u)[u^T,1]^T]_z-[v_i]_z

wherein f is_i(v) Representing a truncated sign distance function, namely the distance from the grid to the surface of the object model, wherein the positive and negative represent that the grid is positioned on the shielded side or the visible side of the surface, and the zero-crossing point is a point on the surface; the K represents an intrinsic parameter matrix of the camera; the u represents a pixel; z is_i(u) representing a depth value corresponding to the pixel u; v is_iThe voxels are represented.

12. The method of claim 10, wherein the data weighted fusion is performed according to the following equation:

\{\begin{matrix} F (v) = \frac{Σ_{i = 1}^{n} f_{i} (v) w_{i} (v)}{W (v)} \\ W (v) = Σ_{i = 1}^{n} w_{i} (v) \end{matrix}

wherein v represents a voxel; f is_i(v) And said w_i(v) Respectively representing a truncated symbol distance function and a weight function thereof corresponding to the voxel v; the n is a positive integer; f (v) represents a truncated sign distance function value corresponding to the voxel v after fusion; w (v) represents the weight of the truncated sign distance function value corresponding to the voxel v after fusion;

wherein the weight function may be determined according to the following equation:

w_{i} (v) = \{\begin{matrix} \frac{e^{\frac{- d_{i}^{2}}{2 δ_{s}^{2}}}}{z_{i}^{4}}, & 0 < z_{i} < 2.8 \\ w, & z_{i} &GreaterEqual; 2.8 \end{matrix}

wherein d is_iA radius representing the region of interest; the above-mentioned_sIs the noise variance in the depth data; w is a constant.

13. A system for three-dimensional reconstruction of a complete scene indoors based on a consumer-grade depth camera, the system comprising:

the acquisition module is used for acquiring a depth image;

the filtering module is used for carrying out self-adaptive bilateral filtering on the depth image;

the block fusion and registration module is used for carrying out block fusion and registration processing based on visual content on the filtered depth image;

and the volume data fusion module is used for carrying out weighted volume data fusion according to the processing result so as to reconstruct an indoor complete scene three-dimensional model.

14. The system of claim 13, wherein the filtering module is specifically configured to:

adaptive bilateral filtering is performed according to:

15. The system of claim 13, wherein the patch fusion and registration module is specifically configured to: and segmenting the depth image sequence based on visual content, performing block fusion on each segment, performing closed-loop detection between the segments, and performing global optimization on the result of the closed-loop detection.

16. The system of claim 15, wherein the patch fusion and registration module is further specific to:

17. The system according to claim 16, wherein the block fusion and registration module specifically comprises:

the camera pose information acquisition unit is used for estimating a visual odometer by adopting a Kintinuous frame to obtain camera pose information under each frame of depth image;

the segmentation unit is used for back projecting the point cloud data corresponding to each frame of depth image to an initial coordinate system according to the camera pose information, comparing the similarity of the depth image obtained after projection with the depth image of the initial frame, and initializing a camera pose for segmentation when the similarity is lower than a similarity threshold value;

the registration unit is used for extracting the PFFH geometric descriptor in each segmented point cloud data, performing coarse registration between each two segments, and performing fine registration by adopting a GICP algorithm to obtain the matching relationship between the segments;

and the optimization unit is used for constructing a graph by using the pose information of each segment and the matching relation between the segments, and performing graph optimization by adopting a G2O frame to obtain optimized camera track information so as to realize the global optimization.

18. The system according to claim 17, wherein the segmentation unit comprises in particular:

the calculating unit is used for calculating the similarity between each frame of depth image and the first frame of depth image;

the judging unit is used for judging whether the similarity is lower than a similarity threshold value;

a segmentation subunit, configured to segment the depth image sequence when the similarity is lower than a similarity threshold;

and the processing unit is used for taking the next frame depth image as the starting frame depth image of the next segmentation, and repeatedly executing the calculating unit and the judging unit until all the frame depth images are processed.

19. The system of claim 13, wherein the volumetric data fusion module is specifically configured to: and according to the processing result, fusing the depth image of each frame by using a truncated symbolic distance function grid model, and representing a three-dimensional space by using a voxel grid so as to obtain an indoor complete scene three-dimensional model.

20. The system according to claim 19, wherein the volume data fusion module specifically comprises:

the weighted fusion unit is used for performing weighted fusion on the truncated symbol distance function data by using a Volumetric method frame based on the noise characteristics and the interest region;

and the extraction unit is used for extracting the Mesh model by adopting a Marching cubes algorithm so as to obtain the indoor complete scene three-dimensional model.