DOI: 10.1145/3664647.3680787
research-article
Free access

High Fidelity Aggregated Planar Prior Assisted PatchMatch Multi-View Stereo

Published: 28 October 2024

Abstract

Reconstructing high-quality 3D models with PatchMatch Multi-View Stereo remains challenging because photometric consistency is unreliable at object boundaries and in textureless areas. Since textureless areas usually exhibit strong planarity, previous methods used planar priors to improve reconstruction performance. However, these planar priors ignore the depth discontinuity at object boundaries, which makes the reconstructed boundaries inaccurate (not sharp). In addition, because the planar models are unreliable on large-scale low-textured objects, the reconstruction results are incomplete. To address these issues, we introduce segmentation generated by the Segment Anything Model into PatchMatch. Because segmentation and depth share boundaries, we use the segmentation to determine whether the depth is continuous. We then construct a Boundary Plane that fits the object boundary and an Object Plane that increases the consistency of planes in large-scale textureless objects. Finally, we use a probabilistic graphical model to calculate an Aggregated Prior guided by Multiple Planes and embed it into the matching cost. Experimental results indicate that our method achieves state-of-the-art boundary sharpness on ETH3D and improves the completeness of weakly textured objects.
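
The abstract does not give formulas, so the following is only a minimal illustrative sketch of the general idea, not the authors' formulation. It assumes a precomputed integer segment-label map (for example from a segmentation model), treats two pixels as depth-continuous only when they share a label, fits a plane to sparse depth samples inside one segment by least squares, and adds a truncated prior penalty to a photometric cost. All function names, the `weight` and `trunc` parameters, and the penalty form are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]                 # (row, col)
Sample = Tuple[float, float, float]     # (x, y, depth)


def depth_continuous(seg: Sequence[Sequence[int]], p: Pixel, q: Pixel) -> bool:
    """Assumption: depth is continuous between p and q only if they share a segment label."""
    return seg[p[0]][p[1]] == seg[q[0]][q[1]]


def _det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(samples: List[Sample]) -> Tuple[float, float, float]:
    """Least-squares fit of depth = a*x + b*y + c via the 3x3 normal equations."""
    sxx = sum(x * x for x, _, _ in samples)
    sxy = sum(x * y for x, y, _ in samples)
    syy = sum(y * y for _, y, _ in samples)
    sx = sum(x for x, _, _ in samples)
    sy = sum(y for _, y, _ in samples)
    sxd = sum(x * d for x, _, d in samples)
    syd = sum(y * d for _, y, d in samples)
    sd = sum(d for _, _, d in samples)
    lhs = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(len(samples))]]
    rhs = [sxd, syd, sd]
    det = _det3(lhs)
    if abs(det) < 1e-12:
        raise ValueError("samples are degenerate (e.g. collinear); cannot fit a plane")
    coeffs = []
    for col in range(3):                # Cramer's rule: swap one column for rhs
        m = [row[:] for row in lhs]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(_det3(m) / det)
    return coeffs[0], coeffs[1], coeffs[2]


def prior_augmented_cost(photo_cost: float, depth: float, prior_depth: float,
                         weight: float = 0.5, trunc: float = 1.0) -> float:
    """Hypothetical cost: photometric term plus a truncated deviation from the plane prior."""
    return photo_cost + weight * min(abs(depth - prior_depth), trunc)


# Toy usage: a 1x3 label map with a segment boundary between columns 1 and 2.
seg = [[7, 7, 9]]
plane = fit_plane([(0, 0, 1.0), (1, 0, 3.0), (0, 1, 4.0), (1, 1, 6.0)])
prior_at_1_1 = plane[0] * 1 + plane[1] * 1 + plane[2]
```

Truncating the penalty keeps a wrong plane hypothesis from dominating the photometric term; the paper's Aggregated Prior instead combines multiple planes through a probabilistic graphical model, which this sketch does not attempt to reproduce.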

      Information & Contributors

      Information

      Published In

      MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
      October 2024
      11719 pages
      ISBN:9798400706868
      DOI:10.1145/3664647
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. depth estimation
      2. multi-view stereo
      3. patchmatch

      Qualifiers

      • Research-article

      Funding Sources

      • Shenzhen Science and Technology Program-Shenzhen Cultivation of Excellent Scientific and Technological Innovation Talents project
      • MIGU-PKU META VISION TECHNOLOGY INNOVATION LAB
      • Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology
      • Outstanding Talents Training Fund in Shenzhen

      Conference

      MM '24
      Sponsor:
      MM '24: The 32nd ACM International Conference on Multimedia
      October 28 - November 1, 2024
      Melbourne VIC, Australia

      Acceptance Rates

      MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
      Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
