Research Article
DOI: 10.1145/3503161.3547946

Geometric Warping Error Aware CNN for DIBR Oriented View Synthesis

Published: 10 October 2022

Abstract

Depth Image based Rendering (DIBR) oriented view synthesis is an important virtual view generation technique. It warps the reference view images to the target viewpoint based on their depth maps, without requiring many available viewpoints. However, in the 3D warping process, pixels are warped to fractional pixel locations and then rounded (or interpolated) to integer pixels, which introduces geometric warping error and degrades image quality. This resembles, to some extent, the image super-resolution problem, but with unfixed fractional pixel locations. To address this problem, we propose a geometric warping error aware CNN (GWEA) framework to enhance DIBR oriented view synthesis. First, a deformable convolution based geometric warping error aware alignment (GWEA-DCA) module is developed, taking advantage of the geometric warping error preserved in the DIBR module: the offsets learned by the deformable convolution account for the geometric warping error and facilitate the mapping from fractional pixels to integer pixels. Moreover, since pixels in the warped images are of different quality owing to the varying strength of the warping error, an attention enhanced view blending (GWEA-AttVB) module is further developed to adaptively fuse pixels from different warped images. Finally, a partial convolution based hole filling and refinement module fills the remaining holes and improves the quality of the overall image. Experiments show that our model synthesizes higher-quality images than existing methods, and an ablation study validates the effectiveness of each proposed module.
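To make the fractional-location problem concrete, here is a minimal sketch (our illustration, not the authors' code) of horizontal DIBR warping for a rectified stereo pair, assuming the standard disparity relation d = focal * baseline / depth; the function name, the z-buffer tie-breaking, and the hole mask are illustrative assumptions:

```python
import numpy as np

def dibr_warp_rectified(ref_img, depth, focal, baseline):
    """Toy DIBR 3D warp for rectified views (illustrative sketch only).

    Each reference pixel lands at a fractional x-coordinate in the
    target view and is rounded to the nearest integer pixel; this
    rounding is the geometric warping error the paper addresses.
    """
    h, w = depth.shape
    warped = np.zeros_like(ref_img)
    holes = np.ones((h, w), dtype=bool)      # pixels no source maps to
    z_buf = np.full((h, w), np.inf)          # keep the nearest surface
    for y in range(h):
        for x in range(w):
            d = focal * baseline / depth[y, x]   # disparity (fractional)
            xt = x - d                           # fractional target location
            xi = int(round(xt))                  # rounding -> warping error
            if 0 <= xi < w and depth[y, x] < z_buf[y, xi]:
                warped[y, xi] = ref_img[y, x]
                z_buf[y, xi] = depth[y, x]
                holes[y, xi] = False
                # |xt - xi| <= 0.5 is the per-pixel sub-pixel error that
                # an error-aware model could keep as a side input
    return warped, holes
```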

Supplementary Material

MP4 File (MM22-fp0810.mp4)
Depth Image based Rendering (DIBR) oriented view synthesis is an important virtual view generation technique. However, in the 3D warping process, pixels are warped to fractional pixel locations and then rounded (or interpolated) to integer pixels, resulting in geometric warping error and reduced image quality. This paper proposes the GWEA-CNN framework for DIBR oriented view synthesis, which addresses the geometric warping error in the 3D warping process to enhance both the warping and the subsequent view blending. GWEA-DCA exploits the geometric warping error to enhance the warped image, and GWEA-AttVB adaptively fuses pixels from different warped images according to their quality. A partial convolution based hole filling and refinement module then fills the remaining holes and improves the quality of the synthesized image. Extensive experiments, including an ablation study, demonstrate the effectiveness of the proposed method.
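For orientation only, the following PyTorch sketch shows one plausible wiring of deformable-convolution alignment followed by attention-based view blending; the module names, the one-channel warping-error map, and the toy per-view quality score are our assumptions, not the paper's actual architecture:

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class WarpErrorAwareAlign(nn.Module):
    """Deformable alignment sketch: predict sampling offsets from the
    warped features plus a 1-channel sub-pixel warping-error map, so the
    convolution can sample at fractional positions (hypothetical design)."""
    def __init__(self, ch=32, k=3):
        super().__init__()
        self.offset_net = nn.Conv2d(ch + 1, 2 * k * k, 3, padding=1)
        self.weight = nn.Parameter(torch.randn(ch, ch, k, k) * 0.01)

    def forward(self, feat, err_map):
        offsets = self.offset_net(torch.cat([feat, err_map], dim=1))
        return deform_conv2d(feat, offsets, self.weight, padding=1)

def attention_blend(view_feats):
    """Blend features from several warped views with a softmax over views,
    so pixels judged higher quality dominate (toy quality score here)."""
    stack = torch.stack(view_feats, dim=0)        # (V, B, C, H, W)
    score = stack.mean(dim=2, keepdim=True)       # stand-in quality estimate
    attn = torch.softmax(score, dim=0)            # per-pixel view weights
    return (attn * stack).sum(dim=0)

# Shape check with random tensors:
align = WarpErrorAwareAlign()
f1, f2 = torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64)
e1, e2 = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
blended = attention_blend([align(f1, e1), align(f2, e2)])  # (1, 32, 64, 64)
```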


Cited By

  • (2024) ST-4DGS: Spatial-Temporally Consistent 4D Gaussian Splatting for Efficient Dynamic Scene Rendering. ACM SIGGRAPH 2024 Conference Papers, 1-11. DOI: 10.1145/3641519.3657520. Online publication date: 13-Jul-2024.
  • (2024) Layered Hole Filling Based on Depth-Aware Decomposition and GAN-Enhanced Background Reconstruction for DIBR. IEEE Transactions on Circuits and Systems for Video Technology, 34(12), 12466-12479. DOI: 10.1109/TCSVT.2024.3429233. Online publication date: Dec-2024.
  • (2024) Geometric Warping Error Aware Spatial-Temporal Enhancement for DIBR Oriented View Synthesis. IEEE Signal Processing Letters, 31, 1219-1223. DOI: 10.1109/LSP.2024.3388995. Online publication date: 2024.
  • (2023) Dynamic View Synthesis with Spatio-Temporal Feature Warping from Sparse Views. Proceedings of the 31st ACM International Conference on Multimedia, 1565-1576. DOI: 10.1145/3581783.3612419. Online publication date: 26-Oct-2023.
  • (2023) VirtualClassroom: A Lecturer-Centered Consumer-Grade Immersive Teaching System in Cyber-Physical-Social Space. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 53(6), 3501-3513. DOI: 10.1109/TSMC.2022.3228270. Online publication date: Jun-2023.
  • (2023) As-Deformable-As-Possible Single-Image-Based View Synthesis Without Depth Prior. IEEE Transactions on Circuits and Systems for Video Technology, 33(8), 3989-4001. DOI: 10.1109/TCSVT.2023.3237815. Online publication date: 1-Aug-2023.


Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia
October 2022, 7537 pages
ISBN: 9781450392037
DOI: 10.1145/3503161

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. DIBR
  2. geometric warping error
  3. view synthesis


Acceptance Rates

Overall Acceptance Rate: 2,145 of 8,556 submissions, 25%
