Abstract
We propose a multi-view stereo network based on multi-distribution fitting (MDF-Net), which achieves high-resolution depth map prediction with low memory consumption and high efficiency. The method adopts a four-stage cascade structure and makes the following three contributions. First, view cost regularization is proposed to weaken the influence of matching noise on building the cost volume. Second, the depth refinement interval is computed adaptively using multi-distribution fitting (MDF): Gaussian distribution fitting refines and corrects depth within a large interval, and Laplace distribution fitting then accurately estimates depth within a small interval. Third, a lightweight image super-resolution network upsamples the depth map in the fourth stage to reduce running time and memory requirements. Experimental results on the DTU dataset indicate that MDF-Net achieves state-of-the-art performance, with the lowest memory consumption and running time among high-resolution reconstruction methods, requiring only about 4.29 GB of memory to predict a depth map at a resolution of 1600 × 1184. In addition, we validate its generalization ability on the Tanks and Temples dataset, achieving very competitive performance. The code has been released at https://github.com/zongh5a/MDF-Net.
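The coarse-to-fine interval refinement described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `refine_interval` and the interval-width constant `k` are hypothetical, and only the Gaussian stage is shown; the Laplace stage would proceed analogously with an L1-based scale (mean absolute deviation) in place of the standard deviation.

```python
import numpy as np

def refine_interval(depth_hyps, probs, k=3.0):
    """One coarse-to-fine step, sketched with Gaussian fitting.

    depth_hyps: (D,) depth hypotheses for one pixel
    probs:      (D,) per-plane probabilities (sum to 1)

    Fit a Gaussian to the depth distribution (mean and standard
    deviation under the probability weights) and return a narrowed
    search interval [mu - k*sigma, mu + k*sigma] for the next stage.
    The constant k is a hypothetical choice, not the paper's rule.
    """
    mu = float((probs * depth_hyps).sum())
    sigma = float(np.sqrt((probs * (depth_hyps - mu) ** 2).sum()))
    return mu - k * sigma, mu + k * sigma
```

A sharply peaked probability volume yields a very narrow next-stage interval, while a diffuse one keeps the search range wide, which is the adaptive behavior the cascade relies on.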
Funding
This work was supported by the National Natural Science Foundation of China under Grant 61971339 and 61471161, the Natural Science Basic Research Program of Shaanxi under Grant 2023-JC-YB-826, the Scientific Research Program Funded by Shaanxi Provincial Education Department under Grant 22JP028, and the Postgraduate Innovation Fund of Xi'an Polytechnic University under Grant chx2022019.
Appendix
1.1 Why use softmax to preprocess feature groups?
Ideally, the more similar the features from different views on the same depth plane, the closer that plane is to the true depth and the higher its probability. We therefore believe a good cost metric should satisfy three conditions. First, it should measure similarity well. Second, the more similar the features on a depth plane, the larger the cost should be; that is, cost should be proportional to similarity. Third, the value range of the cost metric should match the probability range [0, 1]. We choose the inner product as the main cost metric (meeting the first condition). In addition, compared with vector normalization, the gradient of softmax normalization is simpler to compute, and it keeps the inner product within [0, 1]. Preprocessing the feature groups with the softmax function simplifies the fitting process, making VCR-Net and the 3D CNN more efficient.
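The three conditions above can be checked on a minimal sketch of the metric. This is an illustration, not the paper's code: the function names and the (G, C) group layout are assumptions. Softmax puts each feature group on the probability simplex, so the inner product of two groups is a sum of products of probabilities, which is guaranteed to lie in [0, 1] and grows as the groups agree.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: shift by the max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def group_cost(ref_feat, src_feat):
    """Inner product of softmax-normalized feature groups.

    ref_feat, src_feat: (G, C) arrays, i.e. G groups of C-dim
    features (a hypothetical layout for illustration). Each group
    is softmax-normalized, so the per-group inner product is a
    similarity score bounded in [0, 1].
    """
    p = softmax(ref_feat, axis=-1)
    q = softmax(src_feat, axis=-1)
    return (p * q).sum(axis=-1)  # (G,) per-group cost in [0, 1]
```

Matching a feature group against itself yields a larger cost than matching it against a permuted (dissimilar) group, consistent with the second condition.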
1.2 Why can VCR-Net improve the cost volume quality?
VCR-Net serves a similar function to the 3D CNN regularization network: both regularize the cost volume to obtain a probability for each depth plane. The difference is that VCR-Net processes the cost volume of each view separately, uses the resulting probability volume as a weight, and applies the sigmoid activation function. Features at noisy locations are mismatched and have low similarity, so VCR-Net assigns them a small weight; when the weighted average is computed, this low weight suppresses the matching cost of the noise. The VCR-Net network structure is shown in Table 7.
1.3 Visualization of point cloud results
All qualitative results of our method are shown in Figs. 6 and 7.
About this article
Cite this article
Chen, J., Yu, Z., Ma, L. et al. Multi-distribution fitting for multi-view stereo. Machine Vision and Applications 34, 93 (2023). https://doi.org/10.1007/s00138-023-01449-4