Abstract
Depth data has come into widespread use with the popularity of high-resolution 3D sensors. In multi-view sequences, depth information supplements the color data of each view. This article proposes jointly encoding multiple depth maps with a single representation. The color and depth images of each view are segmented independently and then combined in a rate-distortion optimal fashion. The resulting partitions are projected to a reference view, where a coherent hierarchy for the multiple views is built. A rate-distortion optimization then selects nodes of this hierarchy to obtain the final segmentation. This consistent segmentation is used to robustly encode the depth maps of multiple views, obtaining results competitive with the HEVC coding standards.
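The selection of hierarchy nodes by rate-distortion optimization can be illustrated with a generic Lagrangian pruning of a region tree. The sketch below is not the method of the article: the `Node` structure, the `prune` function, the multiplier `lam` and all numeric values are assumptions made for illustration. Each node carries a hypothetical rate and distortion for coding its region as a single unit, and the pruning keeps a node whole unless splitting it into its children yields a lower cost D + λR.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One region in a segmentation hierarchy (hypothetical structure)."""
    name: str
    rate: float        # bits needed to code this region as a single unit
    distortion: float  # error incurred by coding it as a single unit
    children: List["Node"] = field(default_factory=list)


def prune(node: Node, lam: float) -> Tuple[float, List[Node]]:
    """Return (best Lagrangian cost, chosen regions) for the subtree at node."""
    own_cost = node.distortion + lam * node.rate
    if not node.children:
        return own_cost, [node]

    # Best cost achievable by coding the children (recursively) instead.
    split_cost = 0.0
    split_nodes: List[Node] = []
    for child in node.children:
        cost, nodes = prune(child, lam)
        split_cost += cost
        split_nodes.extend(nodes)

    # Keep the node whole unless splitting is strictly cheaper.
    if own_cost <= split_cost:
        return own_cost, [node]
    return split_cost, split_nodes


# Toy hierarchy with made-up rate and distortion values.
leaf_a = Node("a", rate=10.0, distortion=1.0)
leaf_b = Node("b", rate=10.0, distortion=1.0)
root = Node("root", rate=12.0, distortion=20.0, children=[leaf_a, leaf_b])

cost_low, cut_low = prune(root, lam=0.5)    # small lambda favors low distortion
cost_high, cut_high = prune(root, lam=5.0)  # large lambda favors low rate
```

With these toy values, a small λ selects the two leaves, while a large λ collapses the partition to the root. In practice λ would be swept to meet a target bit budget.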
Acknowledgements
This work has been developed in the framework of projects TEC2013-43935-R and TEC2016-75976-R, financed by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF).
Cite this article
Duch, M.M., Varas, D., Rubió, J.R.M. et al. 3D hierarchical optimization for multi-view depth map coding. Multimed Tools Appl 77, 19869–19894 (2018). https://doi.org/10.1007/s11042-017-5409-z
DOI: https://doi.org/10.1007/s11042-017-5409-z