3D hierarchical optimization for multi-view depth map coding

Multimedia Tools and Applications

Abstract

Depth data has come into widespread use with the growing popularity of high-resolution 3D sensors. In multi-view sequences, depth information supplements the color data of each view. This article proposes a joint encoding of multiple depth maps with a single representation. The color and depth images of each view are segmented independently, and the two segmentations are combined in a Rate-Distortion optimal fashion. The resulting partitions are projected to a reference view, where a coherent hierarchy for the multiple views is built. A Rate-Distortion optimization then selects nodes of the hierarchy to obtain the final segmentation. This consistent segmentation is used to robustly encode the depth maps of multiple views, obtaining results competitive with HEVC coding standards.
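The node-selection step can be illustrated with a generic Lagrangian pruning of a region hierarchy. The following is only a minimal sketch of that general technique, not the authors' implementation: the `Node` class, the `prune` function, the per-node distortion and rate values, and the multiplier `lam` are all hypothetical names and numbers chosen for illustration. Each node is kept as a single region when its own cost D + λR is no larger than the summed best cost of its children.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical hierarchy node: a region with a coding cost."""
    name: str
    distortion: float  # error if the region is coded as one unit (assumed given)
    rate: float        # bits needed to code the region as one unit (assumed given)
    children: List["Node"] = field(default_factory=list)


def prune(node: Node, lam: float) -> Tuple[float, List[Node]]:
    """Bottom-up Lagrangian pruning.

    Returns the minimum cost J = D + lam * R over all partitions
    described by the subtree, and the regions forming that partition.
    """
    own_cost = node.distortion + lam * node.rate
    if not node.children:
        return own_cost, [node]

    child_cost = 0.0
    child_regions: List[Node] = []
    for child in node.children:
        cost, regions = prune(child, lam)
        child_cost += cost
        child_regions.extend(regions)

    # Keep the merged region only if it is at least as cheap as its split.
    if own_cost <= child_cost:
        return own_cost, [node]
    return child_cost, child_regions


# Toy hierarchy with made-up costs: root "abc" -> ("ab" -> "a", "b"), "c".
leaf_a = Node("a", distortion=1.0, rate=4.0)
leaf_b = Node("b", distortion=2.0, rate=4.0)
leaf_c = Node("c", distortion=1.0, rate=4.0)
ab = Node("ab", distortion=4.0, rate=5.0, children=[leaf_a, leaf_b])
root = Node("abc", distortion=30.0, rate=6.0, children=[ab, leaf_c])


def selected_names(lam: float) -> List[str]:
    """Names of the regions chosen for a given Lagrange multiplier."""
    return [n.name for n in prune(root, lam)[1]]
```

With these toy numbers, a small `lam` (rate is cheap) keeps the finest partition, while a large `lam` (rate is expensive) collapses the hierarchy to the root, which is the usual trade-off between segmentation detail and bit budget.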

Acknowledgements

This work has been developed in the framework of projects TEC2013-43935-R and TEC2016-75976-R, financed by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF).

Author information

Corresponding author

Correspondence to Marc Maceira Duch.

Cite this article

Duch, M.M., Varas, D., Rubió, J.R.M. et al. 3D hierarchical optimization for multi-view depth map coding. Multimed Tools Appl 77, 19869–19894 (2018). https://doi.org/10.1007/s11042-017-5409-z

  • DOI: https://doi.org/10.1007/s11042-017-5409-z
