Abstract
Recovering a 3D human pose from a single image with 2D joints is a challenging task in computer vision. The sparse representation (SR) model has been successfully adopted in 3D pose estimation approaches. However, because the available 3D training data are often collected in a constrained environment (i.e., indoors) with limited diversity of subjects and actions, most SR-based approaches generalize poorly to real-world scenarios, which may contain more complex cases. To alleviate this issue, this paper proposes SDM3d, a novel shape decomposition method that uses multiple geometric priors for 3D pose estimation. SDM3d separates a 3D pose into a global structure and body deformations, which are encoded explicitly via different prior constraints. Furthermore, a joint learning strategy is designed to learn two over-complete dictionaries from training data in order to capture more geometric prior information. We have evaluated SDM3d on four well-recognized benchmarks: Human3.6M, HumanEva-I, CMU MoCap, and MPII. The experimental results demonstrate the effectiveness of SDM3d.
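To make the sparse representation idea concrete, the following is a minimal sketch of generic sparse coding over an over-complete dictionary. It is not the SDM3d formulation: it does not model the split into global structure and body deformations, nor the joint dictionary learning. The dictionary `D` is random rather than learned, the pose is synthetic, and the solver (ISTA, a proximal gradient method), the function names, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_code(x, D, lam=0.01, n_iter=2000):
    """ISTA for min_c 0.5 * ||x - D c||^2 + lam * ||c||_1."""
    # Step size 1/L, where L is the Lipschitz constant of the gradient,
    # i.e. the squared spectral norm of D.
    L = np.linalg.norm(D, 2) ** 2
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ c - x)
        c = soft_threshold(c - grad / L, lam / L)
    return c


# Hypothetical setup: a pose with 15 joints flattened to a 45-vector,
# and a random (not learned) dictionary with twice as many atoms.
rng = np.random.default_rng(0)
n_joints = 15
dim = 3 * n_joints
n_atoms = 2 * dim
D = rng.standard_normal((dim, n_atoms))
D /= np.linalg.norm(D, axis=0)  # unit-norm atoms

# Synthetic pose built from only three atoms.
c_true = np.zeros(n_atoms)
c_true[[3, 17, 42]] = [1.5, -2.0, 0.8]
pose = D @ c_true

# Recover a sparse code and measure the reconstruction error.
c_hat = sparse_code(pose, D)
pose_hat = D @ c_hat
rel_err = np.linalg.norm(pose - pose_hat) / np.linalg.norm(pose)
print("relative reconstruction error:", rel_err)
print("coefficients above 1e-3:", int(np.sum(np.abs(c_hat) > 1e-3)))
```

In an SR-based pipeline the dictionary would be learned from training poses and the coefficients would be estimated from 2D joints through a camera projection rather than from the 3D pose itself; the sketch only shows the sparse coding step in isolation.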
Acknowledgements
This research was supported by the National Natural Science Foundation of China (Grant No. 61671397).
Ethics declarations
Conflict of interest
No conflict of interest.
Cite this article
Jiang, M., Yu, Z., Li, C. et al. SDM3d: shape decomposition of multiple geometric priors for 3D pose estimation. Neural Comput & Applic 33, 2165–2181 (2021). https://doi.org/10.1007/s00521-020-05086-0
DOI: https://doi.org/10.1007/s00521-020-05086-0