
DOI: 10.1145/3474085.3475595

Fast, High-Quality Hierarchical Depth-Map Super-Resolution

Published: 17 October 2021

Abstract

The low spatial resolution of acquired depth maps is a major drawback of most RGBD sensors, yet many scenarios call for the fast acquisition of high-resolution, high-quality depth maps. One way to obtain higher-quality depth maps is super-resolution; however, edge preservation is challenging, and artifacts such as depth confusion and blurring are easily introduced near boundaries. To address this, we propose a method for fast, high-quality hierarchical depth-map super-resolution (HDS). In our method, a high-resolution RGB image is degraded layer by layer to guide the bilateral filtering of the depth map. To improve the quality of the upsampled depth map, we construct a feature-based bilateral filter (FBF) for the interpolation, using the extracted RGB shallow and multi-layer features. To accelerate the process, we perform filtering only near depth boundaries and through matrix operations. We also extend our HDS model to a Classification-based Hierarchical Depth-map Super-resolution (C-HDS) model, in which a context-aware trilateral filter reduces the contributions of unreliable neighbors to each missing depth location. Experimental results show that the proposed method generates high-resolution depth maps significantly faster than existing methods, while also significantly improving depth quality over the current state of the art, especially for large-scale 16× super-resolution.
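The guided-filtering principle the abstract describes (weighting low-resolution depth samples by both spatial distance and similarity in a high-resolution RGB guide) can be illustrated with a minimal joint bilateral upsampling sketch. This is the classical baseline technique, not the paper's FBF or C-HDS pipeline, and every function and parameter name below is illustrative:

```python
import numpy as np

def joint_bilateral_upsample(depth_lr, guide_hr, radius=2, sigma_s=1.0, sigma_r=0.1):
    """Upsample a low-res depth map using a high-res (grayscale) guide image.

    Each high-res output pixel is a weighted average of nearby low-res depth
    samples: a spatial Gaussian times a range Gaussian on guide-intensity
    difference, so depth is smoothed within regions but not across RGB edges.
    Brute-force loops for clarity; the paper instead restricts filtering to
    depth boundaries and uses matrix operations for speed.
    """
    H, W = guide_hr.shape
    h, w = depth_lr.shape
    sy, sx = H / h, W / w  # upsampling factors per axis
    out = np.zeros((H, W), dtype=np.float64)
    for y in range(H):
        for x in range(W):
            cy, cx = y / sy, x / sx          # position in low-res coordinates
            y0, x0 = round(cy), round(cx)    # nearest low-res sample
            wsum, vsum = 0.0, 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y0 + dy, x0 + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        # spatial weight, measured in low-res coordinates
                        ws = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2)
                                    / (2 * sigma_s ** 2))
                        # range weight from the high-res guide image
                        gy = min(int(yy * sy), H - 1)
                        gx = min(int(xx * sx), W - 1)
                        wr = np.exp(-((guide_hr[y, x] - guide_hr[gy, gx]) ** 2)
                                    / (2 * sigma_r ** 2))
                        wsum += ws * wr
                        vsum += ws * wr * depth_lr[yy, xx]
            out[y, x] = vsum / wsum if wsum > 0 else depth_lr[y0, x0]
    return out
```

A small `sigma_r` makes the range term dominate, so a depth discontinuity aligned with an RGB edge stays sharp instead of being blurred across the boundary; this is the edge-preservation behavior the paper's feature-based and trilateral filters refine further.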



Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN:9781450386517
DOI:10.1145/3474085
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. context-aware trilateral filter
  2. edge preservation
  3. feature-based bilateral filter
  4. hierarchical depth-map super-resolution

Qualifiers

  • Research-article

Funding Sources

  • EPSRC CAMERA

Conference

MM '21: ACM Multimedia Conference
October 20 - 24, 2021
Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


Cited By
  • Unpaired Depth Super-Resolution in the Wild. IEEE Access 12 (2024), 123322–123338. https://doi.org/10.1109/ACCESS.2024.3444452
  • Enhancement of guided thermal image super-resolution approaches. Neurocomputing 573:C (16 May 2024). https://doi.org/10.1016/j.neucom.2023.127197
  • Guided Depth Map Super-Resolution: A Survey. ACM Computing Surveys 55, 14s (17 July 2023), 1–36. https://doi.org/10.1145/3584860
  • Improved Upsampling Based Depth Image Super-Resolution Reconstruction. IEEE Access 11 (2023), 46782–46792. https://doi.org/10.1109/ACCESS.2023.3274966
