
CN115861401A - Binocular and point cloud fusion depth recovery method, device and medium - Google Patents


Info

Publication number
CN115861401A
CN115861401A
Authority
CN
China
Prior art keywords
point cloud
depth
image
binocular
sparse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310170221.2A
Other languages
Chinese (zh)
Other versions
CN115861401B (en)
Inventor
许振宇
李月华
朱世强
邢琰
姜甜甜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Control Engineering
Zhejiang Lab
Original Assignee
Beijing Institute of Control Engineering
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Control Engineering, Zhejiang Lab filed Critical Beijing Institute of Control Engineering
Priority to CN202310170221.2A priority Critical patent/CN115861401B/en
Publication of CN115861401A publication Critical patent/CN115861401A/en
Application granted granted Critical
Publication of CN115861401B publication Critical patent/CN115861401B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Image Processing (AREA)

Abstract

The invention discloses a binocular and point cloud fusion depth recovery method, device and medium. The method constructs a depth recovery neural network comprising a sparse expansion module, a multi-scale feature extraction and fusion module, a variable-weight Gaussian modulation module and a cascaded three-dimensional convolutional neural network module. Taking sparse point cloud data and a binocular image as input, a semi-dense depth image is obtained through neighborhood expansion; features are extracted from this image and the binocular image and fused; a cost volume is constructed and modulated with a variable-weight Gaussian modulation function; and cost aggregation is performed through a deep learning network to recover dense depth information. On the basis of a binocular stereo matching network, a sparse point cloud is introduced, the density of the guide points is increased by a neighborhood expansion method, and Gaussian modulation is combined with multi-scale feature extraction and fusion, which helps to improve the accuracy and robustness of depth recovery and makes the method effective for dense depth recovery in real applications.

Description

A binocular and point cloud fusion depth recovery method, device and medium

Technical Field

The present invention relates to the field of computer vision, and in particular to a binocular and point cloud fusion depth recovery method, device and medium.

Background Art

Depth recovery is a very important task in computer vision and is widely used in many fields such as robotics, autonomous driving and 3D reconstruction.

Compared with traditional binocular stereo matching methods, depth recovery algorithms that fuse binocular images with a sparse point cloud introduce a high-precision sparse point cloud from sensors such as LiDAR or TOF cameras as prior information, which guides the recovery of depth. Especially in scenes with weak texture, occlusion or large domain changes, the depth information provided by the sparse point cloud can effectively improve the accuracy and robustness of depth recovery.

Existing binocular and point cloud fusion depth recovery algorithms fall into two categories, point-cloud-guided cost aggregation and point cloud information fusion, and both categories use the raw sparse point cloud directly for guidance or fusion. However, because the input point cloud is sparse, point-cloud-guided cost aggregation provides limited actual guidance and only modulates along the depth dimension, so it cannot supply sufficient prior information in the image dimensions. For point cloud information fusion, both direct fusion and feature fusion suffer from the discontinuity of the data, so the texture of the extracted fused information is weak.

Summary of the Invention

The purpose of the present invention is to provide a binocular and point cloud fusion depth recovery method, device and medium that address the deficiencies of the prior art.

The objective of the present invention is achieved through the following technical solutions. In a first aspect, an embodiment of the present invention provides a binocular and point cloud fusion depth recovery method, comprising the following steps:

(1) Constructing a depth recovery network, the depth recovery network comprising a sparse expansion module, a multi-scale feature extraction and fusion module, a variable-weight Gaussian modulation module and a cascaded three-dimensional convolutional neural network module; the input of the depth recovery network is a binocular image and sparse point cloud data, and the output is a dense depth image;

(2) Training the depth recovery network constructed in step (1): using a binocular data set, input the binocular image and sparse point cloud data, project the sparse point cloud data into the left camera coordinate system to generate a sparse depth map, perform data augmentation on the binocular image and the sparse depth map, compute the loss of the output dense depth image against the ground-truth depth image, and iteratively update the network weights by back propagation;

(3) Inputting the binocular image to be tested and the sparse point cloud data into the depth recovery network trained in step (2), projecting the sparse point cloud data into the left camera coordinate system using the sensor calibration parameters to generate a sparse depth image, and outputting a dense depth image.

Further, the sparse expansion module is specifically: guided by the multi-channel information of the image, the density of the sparse point cloud data is increased by a neighborhood expansion method, and a semi-dense depth map is output.

Further, constructing the sparse expansion module includes the following sub-steps:

(a1) Obtaining a sparse depth map according to the pose relationship between the point cloud data and the left camera image, and extracting the pixel coordinates of the valid points in the sparse depth map, the corresponding multi-channel image values and the multi-channel image values of the points in their neighborhoods;

(a2) Computing the average image value deviation from the multi-channel image values at the pixel coordinates of each valid point and the multi-channel image values of the points in its neighborhood;

(a3) Expanding the sparse depth map into a semi-dense depth map according to the average image value deviation of the valid points and a set fixed threshold, and outputting the semi-dense depth map.

Further, the multi-scale feature extraction and fusion module is specifically: taking the semi-dense depth map output by the sparse expansion module and the binocular image as input, a Unet encoder-decoder structure combined with spatial pyramid pooling is used to extract point cloud features, left-eye image features and right-eye image features, and the left-eye image features and point cloud features are then fused by concatenation at the feature level to obtain the fused features.

Further, constructing the multi-scale feature extraction and fusion module includes the following sub-steps:

(b1) Performing multi-layer downsampling encoding on the semi-dense depth map output by the sparse expansion module and on the binocular image, to obtain downsampled left-eye image features, right-eye image features and point cloud features at multiple scales;

(b2) Performing spatial pyramid pooling on the lowest-resolution downsampled left-eye image features, right-eye image features and point cloud features, to obtain the pooled results;

(b3) Performing multi-layer upsampling decoding on the pooled left-eye image features, right-eye image features and point cloud features, to obtain upsampled left-eye image features, right-eye image features and point cloud features at multiple scales;

(b4) Concatenating the upsampled left-eye image features and point cloud features along the feature dimension to obtain the fused features of the left-eye image and the point cloud.

Further, the variable-weight Gaussian modulation module is specifically: based on the data reliability of the semi-dense depth map, Gaussian modulation functions with different weights are generated to modulate the depth dimension of the cost volume at different pixel positions.

Further, constructing the variable-weight Gaussian modulation module includes the following sub-steps:

(c1) Constructing a cost volume by concatenation from the fused features and the right-eye image features;

(c2) Constructing Gaussian modulation functions with different weights according to the reliability of the sparse point cloud;

(c3) Modulating the cost volume with the constructed Gaussian modulation functions to obtain the modulated cost volume.

Further, constructing the cascaded three-dimensional convolutional neural network module includes the following sub-steps:

(d1) Fusing and aggregating the low-resolution cost volume with a three-dimensional convolutional neural network to obtain the aggregated cost volume;

(d2) Applying the softmax function over all depth values at each pixel coordinate to obtain a low-resolution depth map;

(d3) Upsampling the low-resolution depth map to obtain a prediction of the high-resolution depth map, and obtaining the dense depth map at full resolution through three cascaded iterations.

A second aspect of an embodiment of the present invention provides a binocular and point cloud fusion depth recovery device, comprising one or more processors, configured to implement the above binocular and point cloud fusion depth recovery method.

A third aspect of an embodiment of the present invention provides a computer-readable storage medium having a program stored thereon which, when executed by a processor, implements the above binocular and point cloud fusion depth recovery method.

The beneficial effects of the present invention are as follows. The invention recovers dense depth by fusing a point cloud with binocular images: taking sparse point cloud data and a binocular image as input, a semi-dense depth image is obtained through neighborhood expansion; features are extracted from this depth image and the binocular image and fused; a cost volume is constructed and modulated with a variable-weight Gaussian modulation function; and cost aggregation is performed through a deep learning network to recover dense depth information. On the basis of a binocular stereo matching depth recovery network, a sparse point cloud is introduced, the density of the guide points is increased by the neighborhood expansion method, and on this basis Gaussian modulation guidance together with multi-scale feature extraction and fusion is adopted to improve the accuracy and robustness of depth recovery. The invention relies on sensor equipment that can provide binocular image data and sparse point cloud data, helps to improve accuracy and robustness, and is an effective method for dense depth recovery in real applications.

Brief Description of the Drawings

Figure 1 is a diagram of the overall network architecture;

Figure 2 is a schematic diagram of sparse expansion;

Figure 3 is a schematic diagram of variable-weight Gaussian modulation;

Figure 4 shows the results of the present invention, where a is the input left-eye image, b is the input right-eye image, c is the input sparse point cloud reprojected into the left-eye coordinate system, and d is the recovered depth image;

Figure 5 is a schematic structural diagram of the binocular and point cloud fusion depth recovery device of the present invention.

Detailed Description

The present invention is described in detail below with reference to the accompanying drawings.

As shown in Figure 1, the binocular and point cloud fusion depth recovery method of the present invention comprises the following steps:

(1) Construct the depth recovery network.

The overall network architecture is implemented with the open-source deep learning framework PyTorch and is built on the public binocular stereo matching network CF-NET, adding four parts: a sparse expansion module, a multi-scale feature extraction and fusion module, a variable-weight Gaussian modulation module and a cascaded three-dimensional convolutional neural network module. The input of the depth recovery network is the binocular image and sparse point cloud data, and the output is a dense depth image.

(1.1) Construct the sparse expansion module.

The overall processing flow of this module is shown in Figure 2. Guided by the multi-channel information of the image, the neighborhood expansion method increases the density of the sparse point cloud data and outputs a semi-dense depth map.

(a1) According to the pose relationship between the point cloud data and the left camera image, the OpenCV reprojection function is used to project the input sparse point cloud data into the camera coordinate system, giving a sparse depth map D of size W×H. Points with depth value D(u,v) greater than 0 are defined as valid points. For each valid point of the sparse depth map D, its pixel coordinates (u,v), the corresponding multi-channel image values I_c(u,v) and the multi-channel image values I_c(u+α, v+β) of the points in its neighborhood are extracted, where D(u,v) denotes the depth value, at coordinate (u,v), of the sparse depth map D reprojected onto the left image; W and H denote the width and height of the image, with W=960 and H=512 in this embodiment; I_c(u,v) denotes the value of channel c of the image at pixel coordinate (u,v); C denotes the number of channels, with C=3 for an RGB image; α and β denote the offsets of a neighborhood point along the horizontal and vertical coordinates, with α, β ∈ [-r, r]; and r denotes the neighborhood distance, taken as r=2 in this embodiment. It should be understood that C may also take other values, for example C=4 for an RGBA image.

(a2) For each valid point, the average image value deviation E_{α,β}(u,v) between the multi-channel image values I_c(u,v) at its pixel coordinates (u,v) and the multi-channel image values I_c(u+α, v+β) of a point in its neighborhood is computed as

E_{α,β}(u,v) = (1/C) · Σ_{c=1..C} | I_c(u,v) − I_c(u+α, v+β) |

where C is the number of channels, I_c(u,v) denotes the value of channel c of the image at pixel coordinate (u,v), I_c(u+α, v+β) denotes the value of channel c at a point in the neighborhood of (u,v), α and β denote the offsets of the neighborhood point along the horizontal and vertical coordinates, α ∈ [-r, r], β ∈ [-r, r], and r denotes the neighborhood distance.

(a3) For the pixel coordinates (u,v) of each valid point, the average image value deviation E_{α,β}(u,v) is compared with a fixed threshold (Threshold). The threshold expresses how easily pixels are expanded and can be adjusted according to the accuracy of the final depth recovery; in this embodiment it is set to 8. The sparse depth map D is then expanded into a semi-dense depth map Dexp by assigning

Dexp(u+α, v+β) = D(u,v) if E_{α,β}(u,v) < Threshold, and Dexp(u+α, v+β) = D(u+α, v+β) otherwise,

where D(u,v) denotes the depth value, at coordinate (u,v), of the sparse depth map D reprojected onto the left image, D(u+α, v+β) denotes its depth value at coordinate (u+α, v+β), α and β denote the offsets of the neighborhood point along the horizontal and vertical coordinates, α ∈ [-r, r], β ∈ [-r, r], and r denotes the neighborhood distance. After the neighborhood expansion of all valid points is completed, the final semi-dense depth map is obtained and output.
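A minimal NumPy sketch of the neighborhood expansion of sub-steps (a1)-(a3) is given below, assuming the sparse depth map D and the multi-channel left image I are already aligned; the function name and the handling of overlapping expansions are illustrative, not taken from the original disclosure.

```python
import numpy as np

def expand_sparse_depth(D, I, r=2, threshold=8.0):
    """Expand a sparse depth map D (H, W) into a semi-dense map, guided by the
    multi-channel left image I (H, W, C): the depth of a valid point is copied
    to a neighbouring pixel only if the average per-channel colour deviation
    between the two pixels is below the threshold."""
    H, W, C = I.shape
    D_exp = D.copy()
    vs, us = np.nonzero(D > 0)                        # valid points (depth > 0)
    for v, u in zip(vs, us):
        for b in range(-r, r + 1):                    # vertical offset beta
            for a in range(-r, r + 1):                # horizontal offset alpha
                un, vn = u + a, v + b
                if not (0 <= un < W and 0 <= vn < H) or D[vn, un] > 0:
                    continue                          # skip out-of-image and original points
                dev = np.mean(np.abs(I[v, u].astype(np.float32) -
                                     I[vn, un].astype(np.float32)))
                if dev < threshold:
                    D_exp[vn, un] = D[v, u]           # expand the valid depth to the neighbour
    return D_exp
```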

(1.2) Construct the multi-scale feature extraction and fusion module.

This module takes the semi-dense depth map output by the sparse expansion module and the binocular image as input. A Unet encoder-decoder structure combined with spatial pyramid pooling is used to extract point cloud features, left-eye image features and right-eye image features, and the left-eye image features and point cloud features are then fused by concatenation at the feature level to obtain the fused features.

(b1) Multi-layer downsampling encoding is applied separately to the semi-dense depth map and to the binocular image, giving downsampled left-eye image features, right-eye image features and point cloud features at multiple scales, where F_i denotes the feature dimension of the i-th downsampling encoding layer. In this embodiment the semi-dense depth map and the binocular image are downsampled through five levels of residual blocks, with W=960 and H=512.

(b2) Spatial pyramid pooling is applied separately to the lowest-resolution downsampled left-eye image features, right-eye image features and point cloud features, giving the left-eye pooling result, the right-eye pooling result and the point cloud pooling result, where SPP(·) denotes the pooling function applied to the downsampled encoded features and N denotes the maximum number of downsampling encoding layers (the pooling is applied to the features of the N-th, lowest-resolution, layer).

In this embodiment, a spatial pyramid pooling method similar to that of the public network HSMNet is adopted: the lowest-resolution downsampled left-eye image features, right-eye image features and point cloud features are passed through four levels of average pooling, each level with its own pooling size, and the pooled results of the three branches are obtained accordingly.
(b3) The pooled left-eye image features, right-eye image features and point cloud features are then decoded by multi-layer upsampling, giving upsampled decoded left-eye image features, right-eye image features and point cloud features at multiple scales, where F_i denotes the feature dimension of the i-th upsampling decoding layer. Each decoded feature is obtained by applying the upsampling decoding module to a concatenation of the corresponding features, where concat(·) denotes the vector concatenation function, Dec(·) denotes the processing function of the upsampling decoding module, and N denotes the maximum number of upsampling decoding layers.

In this embodiment, upsampling decoding is performed through five corresponding upsampling decoding modules, with F1=64, F2=128, F3=192, F4=256 and F5=512.

(b4) The upsampled decoded left-eye image features and point cloud features are concatenated along the feature dimension, giving the fused feature of the left-eye image and the point cloud, i.e. concat(l_i, p_i), where concat(·) denotes the vector concatenation function, l_i denotes the upsampled decoded left-eye image feature, p_i denotes the upsampled decoded point cloud feature, and i denotes the feature level.
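A minimal PyTorch sketch of the feature-level fusion of sub-step (b4), together with a simple HSMNet-style spatial pyramid pooling layer in the spirit of sub-step (b2), is given below; the module structure, pooling sizes and channel counts are illustrative assumptions rather than the exact architecture of the embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleSPP(nn.Module):
    """Average-pool a low-resolution feature map at several scales, upsample
    the pooled maps back, and project the concatenation to the input width."""
    def __init__(self, channels, scales=(8, 16, 32, 64)):
        super().__init__()
        self.scales = scales
        self.proj = nn.Conv2d(channels * (len(scales) + 1), channels, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        pooled = [F.interpolate(F.avg_pool2d(x, s), size=(h, w),
                                mode='bilinear', align_corners=False)
                  for s in self.scales]
        return self.proj(torch.cat([x] + pooled, dim=1))

def fuse_left_and_point_cloud(left_feat, pc_feat):
    """Sub-step (b4): concatenate decoded left-image and point-cloud features
    along the channel (feature) dimension."""
    return torch.cat([left_feat, pc_feat], dim=1)

# usage on dummy tensors (batch 1, 64 channels, 1/4 resolution of a 512x960 input)
left = torch.randn(1, 64, 128, 240)
pc = torch.randn(1, 64, 128, 240)
fused = fuse_left_and_point_cloud(SimpleSPP(64)(left), SimpleSPP(64)(pc))
```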

(1.3) Construct the variable-weight Gaussian modulation module.

Based on the data reliability of the semi-dense depth image, Gaussian modulation functions with different weights are generated to modulate the depth dimension of the cost volume at different pixel positions.

(c1) According to the fused features and the right-eye image features, a cost volume is constructed in a concatenation (cascade) manner over the disparity search range, where D_max denotes the maximum disparity search range and takes the value 256 in this embodiment, F_c denotes the feature dimension of the cost volume, F_i denotes the feature dimension of the i-th upsampling decoding layer, and W=960, H=512.
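A minimal PyTorch sketch of a concatenation-style cost volume, as used in sub-step (c1), is given below: for each candidate disparity the right-image features are shifted and concatenated with the fused left-image features. The tensor layout, the reduced disparity range and the zero padding of out-of-range pixels are illustrative assumptions.

```python
import torch

def build_concat_cost_volume(fused_left, right_feat, max_disp=48):
    """fused_left, right_feat: (B, F, H, W) feature maps at reduced resolution.
       Returns a concatenation cost volume of shape (B, 2F, max_disp, H, W)."""
    B, Fc, H, W = fused_left.shape
    cost = fused_left.new_zeros(B, 2 * Fc, max_disp, H, W)
    for d in range(max_disp):
        if d == 0:
            cost[:, :Fc, d] = fused_left
            cost[:, Fc:, d] = right_feat
        else:
            cost[:, :Fc, d, :, d:] = fused_left[:, :, :, d:]
            cost[:, Fc:, d, :, d:] = right_feat[:, :, :, :-d]   # right features shifted by d
    return cost
```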

(c2) According to the reliability of the sparse point cloud, Gaussian modulation functions with different weights are constructed: a first Gaussian, with weight k1 and variance c1, is centred along the depth dimension on the value D(u,v) given by the original sparse depth map, and a second Gaussian, with weight k2 and variance c2, is centred on the value Dexp(u,v) given by the expanded semi-dense depth map. Here k1 and c1 denote the weight and variance of the modulation function corresponding to the original sparse point cloud, and k2 and c2 denote the weight and variance of the modulation function corresponding to the expanded point cloud; in this embodiment k1=10, c1=1, k2=2, c2=8. D(u,v) denotes the depth value, at coordinate (u,v), of the sparse depth map D reprojected onto the left image, Dexp(u,v) denotes the depth value, at coordinate (u,v), of the semi-dense depth map Dexp reprojected onto the left image, the masks of D and Dexp (written here as m1(u,v) and m2(u,v)) are set to 1 when the corresponding point is valid (depth value greater than 0) and to 0 otherwise, and d denotes the coordinate along the depth dimension.

(c3) The cost volume is modulated with the constructed Gaussian modulation functions, giving the modulated cost volume. Specifically, for every feature value of the cost volume at pixel position (u,v) and depth index d, the modulated feature value is obtained by weighting the original feature value with the Gaussian modulation functions defined in (c2).

The overall flow of the variable-weight Gaussian modulation module is shown in Figure 3. The corresponding sparse point cloud is divided into invalid points, original points and points obtained by neighborhood expansion.

Specifically, for an invalid point both masks are 0 (m1(u,v)=0 and m2(u,v)=0), so neither Gaussian is applied and the cost volume at the corresponding position remains unchanged. For an original point of the sparse point cloud m1(u,v)=1, so the cost volume at the corresponding position is modulated with the high-weight, low-variance Gaussian modulation function (k1=10, c1=1). For a point obtained by neighborhood expansion m1(u,v)=0 and m2(u,v)=1; since the reliability of such points is lower, the cost volume at the corresponding position is modulated with the low-weight, high-variance Gaussian modulation function (k2=2, c2=8).
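A minimal PyTorch sketch of the variable-weight Gaussian modulation is given below, under the interpretation that the (k1, c1) Gaussian is applied where the original sparse depth map is valid and the (k2, c2) Gaussian where only the expanded map is valid, following the case analysis above; the exact expression of the modulation function appears only as a formula image in the original publication, so this form is an assumption.

```python
import torch

def modulate_cost_volume(cost, D, D_exp, k1=10.0, c1=1.0, k2=2.0, c2=8.0):
    """cost:  (B, F, Dmax, H, W) cost volume
       D:     (B, H, W) sparse depth/disparity hints (0 = invalid)
       D_exp: (B, H, W) expanded semi-dense hints (0 = invalid)"""
    B, Fc, Dmax, H, W = cost.shape
    d = torch.arange(Dmax, device=cost.device).float().view(1, 1, Dmax, 1, 1)
    m1 = (D > 0).float().unsqueeze(1).unsqueeze(2)                           # original points
    m2 = ((D_exp > 0).float() * (D <= 0).float()).unsqueeze(1).unsqueeze(2)  # expanded-only points
    g1 = 1 + m1 * k1 * torch.exp(-(d - D.unsqueeze(1).unsqueeze(2)) ** 2 / (2 * c1 ** 2))
    g2 = 1 + m2 * k2 * torch.exp(-(d - D_exp.unsqueeze(1).unsqueeze(2)) ** 2 / (2 * c2 ** 2))
    return cost * g1 * g2        # invalid points keep the cost volume unchanged (g1 = g2 = 1)
```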

(1.4) Construct the cascaded three-dimensional convolutional neural network module.

(d1) Following the cascaded three-dimensional convolutional neural network of the public network CF-NET, the low-resolution cost volume is fused and aggregated through an hourglass-shaped three-dimensional convolutional neural network, giving the aggregated cost volume.

(d2) The softmax function is applied over all depth values at each pixel coordinate, giving a low-resolution depth map.

(d3) The low-resolution depth map is upsampled to obtain a prediction of the high-resolution depth map. Centred on this prediction, the range of depths actually predicted is chosen according to the reliability of the prediction and serves as the depth distribution range for the aggregation of the high-resolution cost volume. This range is passed recursively into the cost aggregation of the high-resolution cost volume, which is aggregated through an hourglass-shaped three-dimensional convolutional neural network to give the aggregated cost volume at the next higher resolution; its depth dimension equals the number of depth levels at the current stage, and each depth index corresponds to an actual depth value within the chosen range. As before, the softmax function is applied over all depth values at each pixel coordinate, giving the depth map at the current resolution.

Three cascaded iterations of the above process finally give the dense depth map at full resolution. The architecture of the cascaded three-dimensional convolutional neural network is shown in Figure 1.
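A minimal PyTorch sketch of the softmax-based depth regression of sub-step (d2) is given below: the aggregated cost volume is converted into per-pixel probabilities over the depth indices and the expectation is taken as the predicted depth. Reducing the aggregated volume to a single score per depth index is an assumption about how the volume is consumed.

```python
import torch
import torch.nn.functional as F

def soft_argmax_depth(agg_cost, depth_values):
    """agg_cost:     (B, Dmax, H, W) aggregated matching scores, one per depth index
       depth_values: (Dmax,) actual depth/disparity value of each index
       returns:      (B, H, W) regressed depth map"""
    prob = F.softmax(agg_cost, dim=1)                       # probability over depth indices
    depth = (prob * depth_values.view(1, -1, 1, 1)).sum(1)  # expectation over depth values
    return depth

# usage: 256 disparity levels at quarter resolution of a 512x960 input
cost = torch.randn(1, 256, 128, 240)
disp_values = torch.arange(256).float()
depth_map = soft_argmax_depth(cost, disp_values)
```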

(2) Train the depth recovery network constructed in step (1): using a binocular data set, input the binocular image and sparse point cloud data, project the sparse point cloud data into the left camera coordinate system to generate a sparse depth map, perform data augmentation on the binocular image and the sparse depth map, compute the loss of the output dense depth image against the ground-truth depth image, and iteratively update the network weights by back propagation.

In this embodiment the open-source SceneFlow binocular data set is used as the task sample. The data set contains 35454 binocular image pairs with ground-truth depth for training and 7349 pairs with ground-truth depth for testing. During training, 5% of the points of the ground-truth depth are randomly sampled to obtain a sparse depth map that simulates the sparse depth map of a reprojected point cloud and serves as the sparse depth input.
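A minimal sketch of this sampling step is given below, assuming the ground-truth depth is available as a NumPy array; the 5% ratio follows the text, the remaining names are illustrative.

```python
import numpy as np

def sample_sparse_depth(gt_depth, ratio=0.05, rng=None):
    """Randomly keep `ratio` of the valid ground-truth depth pixels to simulate
    the sparse depth map produced by reprojecting a point cloud."""
    rng = np.random.default_rng() if rng is None else rng
    sparse = np.zeros_like(gt_depth)
    valid = gt_depth > 0
    keep = valid & (rng.random(gt_depth.shape) < ratio)
    sparse[keep] = gt_depth[keep]
    return sparse
```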

The binocular images are augmented, in order, with random occlusion, asymmetric color transformation and random cropping. Random occlusion is implemented by randomly generating a rectangular region and replacing the image data at all coordinates of the corresponding region of the right image with the mean image value. Asymmetric color transformation applies different brightness, contrast and gamma transformations to the left and right images; the corresponding processing can be implemented directly with adjust_brightness, adjust_contrast and adjust_gamma under torchvision.transforms.functional, with the function parameters drawn from a random generator. Random cropping randomly generates a rectangular region of fixed size and discards the image information outside it. The sparse depth map is likewise augmented, in order, with random occlusion and random cropping; the position of its random occlusion is generated independently and need not coincide with that of the binocular images, whereas the randomly cropped region must coincide with the crop of the binocular images so that the binocular image information and the depth information remain aligned.
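A minimal sketch of the asymmetric color transformation is given below, using the torchvision functional API mentioned above; the sampling ranges of the brightness, contrast and gamma factors are illustrative choices, not values from the text.

```python
import random
import torchvision.transforms.functional as TF

def asymmetric_color_transform(left, right):
    """Apply independently sampled brightness / contrast / gamma adjustments
    to the left and right images (PIL images or tensors)."""
    out = []
    for img in (left, right):
        img = TF.adjust_brightness(img, random.uniform(0.8, 1.2))  # illustrative range
        img = TF.adjust_contrast(img, random.uniform(0.8, 1.2))
        img = TF.adjust_gamma(img, random.uniform(0.8, 1.2))
        out.append(img)
    return out[0], out[1]
```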

The data-augmented binocular images and sparse depth image are fed as input into the depth recovery network of step (1), which is trained end to end with the Adam optimizer. The L1 loss function evaluates the loss between the recovered depth map and the ground-truth depth, and iterative training follows the usual forward- and back-propagation procedure of a neural network. Training starts from a fixed initial learning rate and runs for 20 epochs in total; at the 16th and the 18th epoch the learning rate is reduced to half of its previous value. The learning rate and iteration parameters can be adjusted according to the actual depth recovery accuracy.
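A minimal PyTorch training-loop sketch matching this description (Adam optimizer, L1 loss, learning rate halved at the 16th and 18th epoch) is given below; the model, the data loader and the initial learning rate of 1e-3 are placeholders, the last because the original value appears only as a formula image.

```python
import torch
import torch.nn.functional as F

def train(model, loader, epochs=20, base_lr=1e-3, device='cuda'):
    """End-to-end training with Adam and an L1 loss; the learning rate is
    halved at epochs 16 and 18. base_lr is a placeholder value."""
    model = model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=base_lr)
    sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[16, 18], gamma=0.5)
    for epoch in range(epochs):
        for left, right, sparse, gt in loader:        # one augmented training batch
            left, right, sparse, gt = (t.to(device) for t in (left, right, sparse, gt))
            pred = model(left, right, sparse)         # dense depth prediction
            mask = gt > 0                             # supervise only valid ground truth
            loss = F.l1_loss(pred[mask], gt[mask])
            opt.zero_grad()
            loss.backward()
            opt.step()
        sched.step()
```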

(3) In the task verification process, as shown in Figure 4, the binocular image to be tested (Figure 4a and Figure 4b) and the sparse point cloud data are input into the depth recovery network trained in step (2). Using the sensor calibration parameters, the sparse point cloud data are projected into the left camera coordinate system to generate a sparse depth image (Figure 4c) as input, and the dense depth image (Figure 4d) is finally output, completing the visualization.

Corresponding to the above embodiment of the binocular and point cloud fusion depth recovery method, the present invention also provides an embodiment of a binocular and point cloud fusion depth recovery device.

Referring to Figure 5, a binocular and point cloud fusion depth recovery device provided by an embodiment of the present invention includes one or more processors for implementing the binocular and point cloud fusion depth recovery method of the above embodiment.

The embodiment of the binocular and point cloud fusion depth recovery device of the present invention can be applied to any device with data processing capability, such as a computer. The device embodiment can be implemented by software, by hardware, or by a combination of the two. Taking software implementation as an example, the device in the logical sense is formed by the processor of the device on which it runs reading the corresponding computer program instructions from non-volatile memory into memory and executing them. At the hardware level, Figure 5 shows a hardware structure diagram of a device with data processing capability on which the binocular and point cloud fusion depth recovery device of the present invention resides; besides the processor, memory, network interface and non-volatile memory shown in Figure 5, such a device may also include other hardware according to its actual function, which is not described further here.

The implementation of the functions and effects of the units in the above device is described in detail in the implementation of the corresponding steps of the above method and is not repeated here.

Since the device embodiment basically corresponds to the method embodiment, reference may be made to the description of the method embodiment for the relevant parts. The device embodiment described above is only illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present invention, and those of ordinary skill in the art can understand and implement it without inventive effort.

An embodiment of the present invention further provides a computer-readable storage medium on which a program is stored; when the program is executed by a processor, the binocular and point cloud fusion depth recovery method of the above embodiment is implemented.

The computer-readable storage medium may be an internal storage unit of any device with data processing capability described in any of the preceding embodiments, such as a hard disk or memory. It may also be an external storage device, such as a plug-in hard disk, smart media card (SMC), SD card or flash card provided on the device. Furthermore, the computer-readable storage medium may include both an internal storage unit and an external storage device. It is used to store the computer program and the other programs and data required by the device, and may also be used to temporarily store data that has been or will be output.

The above embodiments are only used to illustrate the design ideas and features of the present invention, so that those skilled in the art can understand the content of the present invention and implement it accordingly; the scope of protection of the present invention is not limited to the above embodiments. Therefore, any equivalent change or modification made according to the principles and design ideas disclosed by the present invention falls within the scope of protection of the present invention.

Claims (10)

1. A binocular and point cloud fusion depth recovery method is characterized by comprising the following steps:
(1) Constructing a depth recovery network, wherein the depth recovery network comprises a sparse extension module, a multi-scale feature extraction and fusion module, a variable weight Gaussian modulation module and a cascaded three-dimensional convolution neural network module; the input of the depth recovery network is a binocular image and sparse point cloud data, and the output of the depth recovery network is a dense depth image;
(2) Training the depth recovery network constructed in the step (1), inputting a binocular image and sparse point cloud data by using a binocular data set, projecting the sparse point cloud data to a left eye camera coordinate system to generate a sparse depth map, comparing a depth truth value image, performing data enhancement on the binocular image and the sparse depth map, calculating a loss value of an output dense depth image, and iteratively updating network weights by using a back propagation network;
(3) Inputting the binocular image and the sparse point cloud data to be tested into the depth recovery network obtained by training in the step (2), utilizing sensor calibration parameters, projecting the sparse point cloud data to a left eye camera coordinate system to generate a sparse depth image, and outputting the dense depth image.
2. The binocular and point cloud fusion depth restoration method according to claim 1, wherein the sparse extension module is specifically: taking the multi-channel information of the image as a guide, improving the density of the sparse point cloud data by a neighborhood expansion method, and outputting a semi-dense depth map.
3. The binocular and point cloud fusion depth restoration method according to claim 2, wherein constructing the sparse extension module comprises the sub-steps of:
(a1) Acquiring a sparse depth map according to the pose relationship between the point cloud data and the left eye camera image, and respectively extracting pixel coordinates of effective points in the sparse depth map, corresponding image multichannel values and image multichannel values of points in the neighborhood of the effective points;
(a2) Calculating the average image numerical value deviation according to the image multi-channel numerical value corresponding to the pixel coordinate of the effective point and the image multi-channel numerical value of the point in the neighborhood;
(a3) Expanding the sparse depth map into a semi-dense depth map according to the average image numerical deviation of the effective points and a set fixed threshold, and outputting the semi-dense depth map.
4. The binocular and point cloud fusion depth restoration method according to claim 1, wherein the multi-scale feature extraction and fusion module specifically comprises: taking a semi-dense depth map and a binocular image output by a sparse extension module as input, adopting a Unet encoder decoder structure and combining a space pyramid pooling method to extract point cloud features, left eye image features and right eye image features, and further fusing the left eye image features and the point cloud features in a cascade mode on a feature layer to obtain fused features.
5. The binocular and point cloud fusion depth restoration method according to claim 4, wherein constructing the multi-scale feature extraction and fusion module comprises the sub-steps of:
(b1) Respectively carrying out multi-layer down-sampling coding on the semi-dense depth map and the binocular image output by the sparse extension module so as to obtain a plurality of scales of down-sampling coded left eye image features, right eye image features and point cloud features;
(b2) Respectively carrying out spatial pyramid pooling on the left eye image feature, the right eye image feature and the point cloud feature after the down-sampling coding with the lowest resolution ratio so as to obtain a result after the pooling;
(b3) Performing multi-layer up-sampling decoding on the left eye image characteristic, the right eye image characteristic and the result of the pooling processing of the point cloud characteristic respectively to obtain the left eye image characteristic, the right eye image characteristic and the point cloud characteristic which are subjected to the up-sampling decoding in multiple scales;
(b4) Cascading the left eye image features and the point cloud features after the up-sampling decoding on feature dimensions to obtain fusion features of the left eye image features and the point cloud features.
6. The binocular and point cloud fusion depth restoration method according to claim 1, wherein the variable weight Gaussian modulation module is specifically: generating Gaussian modulation functions with different weights according to the data reliability of the semi-dense depth map, and modulating the depth dimensions of the cost volume at different pixel positions.
7. The binocular and point cloud fusion depth restoration method according to claim 6, wherein constructing the variable weight Gaussian modulation module comprises the sub-steps of:
(c1) Constructing a cost volume in a cascading mode according to the fusion characteristics and the right-eye image characteristics;
(c2) Respectively constructing Gaussian modulation functions with different weights according to the reliability of the sparse point cloud;
(c3) Modulating the cost volume according to the constructed Gaussian modulation function to obtain the modulated cost volume.
8. The binocular and point cloud fusion depth restoration method according to claim 1, wherein constructing the cascaded three-dimensional convolutional neural network module comprises the sub-steps of:
(d1) Performing cost volume fusion and cost volume aggregation on the low-resolution cost volume through a three-dimensional convolution neural network to obtain an aggregated cost volume;
(d2) Obtaining softmax values of all depth values on each pixel coordinate by adopting a softmax function so as to obtain a low-resolution depth map;
(d3) Performing up-sampling according to the low-resolution depth map to obtain a prediction result of the high-resolution depth map, and performing three cascaded iterations to obtain a dense depth map at the complete resolution.
9. A binocular and point cloud fused depth recovery device, comprising one or more processors, for implementing the binocular and point cloud fused depth recovery method of any one of claims 1 to 8.
10. A computer-readable storage medium, having stored thereon a program which, when being executed by a processor, is adapted to carry out the binocular and point cloud fusion depth restoration method according to any one of claims 1 to 8.
CN202310170221.2A 2023-02-27 2023-02-27 A binocular and point cloud fusion depth restoration method, device and medium Active CN115861401B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310170221.2A CN115861401B (en) 2023-02-27 2023-02-27 A binocular and point cloud fusion depth restoration method, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310170221.2A CN115861401B (en) 2023-02-27 2023-02-27 A binocular and point cloud fusion depth restoration method, device and medium

Publications (2)

Publication Number Publication Date
CN115861401A true CN115861401A (en) 2023-03-28
CN115861401B CN115861401B (en) 2023-06-09

Family

ID=85659135

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310170221.2A Active CN115861401B (en) 2023-02-27 2023-02-27 A binocular and point cloud fusion depth restoration method, device and medium

Country Status (1)

Country Link
CN (1) CN115861401B (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104346608A (en) * 2013-07-26 2015-02-11 株式会社理光 Sparse depth map densing method and device
CN109685842A (en) * 2018-12-14 2019-04-26 电子科技大学 A kind of thick densification method of sparse depth based on multiple dimensioned network
US10937178B1 (en) * 2019-05-09 2021-03-02 Zoox, Inc. Image-based depth data and bounding boxes
US10984543B1 (en) * 2019-05-09 2021-04-20 Zoox, Inc. Image-based depth data and relative depth data
CN110738731A (en) * 2019-10-16 2020-01-31 光沦科技(深圳)有限公司 3D reconstruction method and system for binocular vision
CN111028285A (en) * 2019-12-03 2020-04-17 浙江大学 Depth estimation method based on binocular vision and laser radar fusion
CN111563923A (en) * 2020-07-15 2020-08-21 浙江大华技术股份有限公司 Method for obtaining dense depth map and related device
CN112102472A (en) * 2020-09-01 2020-12-18 北京航空航天大学 Sparse three-dimensional point cloud densification method
CN112435325A (en) * 2020-09-29 2021-03-02 北京航空航天大学 VI-SLAM and depth estimation network-based unmanned aerial vehicle scene density reconstruction method
CN114004754A (en) * 2021-09-13 2022-02-01 北京航空航天大学 Scene depth completion system and method based on deep learning
CN114519772A (en) * 2022-01-25 2022-05-20 武汉图科智能科技有限公司 Three-dimensional reconstruction method and system based on sparse point cloud and cost aggregation
CN115512042A (en) * 2022-09-15 2022-12-23 网易(杭州)网络有限公司 Network training and scene reconstruction method, device, machine, system and equipment
CN115511759A (en) * 2022-09-23 2022-12-23 西北工业大学 A Point Cloud Image Depth Completion Method Based on Cascade Feature Interaction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YONG LUO等: "《Full Resolution Dense Depth Recovery by Fusing RGB Images and Sparse Depth》", 《2019 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND BIOMIMETICS (ROBIO)》 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118212337A (en) * 2024-05-21 2024-06-18 哈尔滨工业大学(威海) A new viewpoint rendering method for human body based on pixel-aligned 3D Gaussian point cloud representation

Also Published As

Publication number Publication date
CN115861401B (en) 2023-06-09

Similar Documents

Publication Publication Date Title
CN112396645B (en) Monocular image depth estimation method and system based on convolution residual learning
US11348270B2 (en) Method for stereo matching using end-to-end convolutional neural network
CN113658051A (en) A method and system for image dehazing based on recurrent generative adversarial network
CN110570522B (en) Multi-view three-dimensional reconstruction method
KR20210058683A (en) Depth image generation method and device
CN113850900B (en) Method and system for recovering depth map based on image and geometric clues in three-dimensional reconstruction
CN116416376A (en) Three-dimensional hair reconstruction method, system, electronic equipment and storage medium
CN110517352A (en) A three-dimensional reconstruction method, storage medium, terminal and system of an object
CN115035235A (en) Three-dimensional reconstruction method and device
CN117173229A (en) Monocular image depth estimation method and system integrating contrast learning
CN115861401B (en) A binocular and point cloud fusion depth restoration method, device and medium
CN112270701B (en) Parallax prediction method, system and storage medium based on packet distance network
CN117132516A (en) A new view synthesis method based on convolutional neural radiation field
CN114565789A (en) Text detection method, system, device and medium based on set prediction
CN118839745A (en) Step-by-step depth completion method and terminal based on nerve radiation field
CN117274066A (en) Image synthesis model, method, device and storage medium
CN114241052B (en) New perspective image generation method and system for multi-object scenes based on layout diagram
US12086965B2 (en) Image reprojection and multi-image inpainting based on geometric depth parameters
CN117218278A (en) Reconstruction method, device, equipment and storage medium of three-dimensional model
CN115797542A (en) Three-dimensional medical image geometric modeling method with direct volume rendering effect
KR20220154782A (en) Alignment training of multiple images
KR102648938B1 (en) Method and apparatus for 3D image reconstruction based on few-shot neural radiance fields using geometric consistency
Shi Svdm: Single-view diffusion model for pseudo-stereo 3d object detection
CN115482341B (en) Method, electronic device, program product and medium for generating mirage image
US20240161391A1 (en) Relightable neural radiance field model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant