Abstract
Multiple human 3D pose estimation is a challenging task. It is mainly because of large variations in the scale and pose of humans, fast motions, multiple persons in the scene, and arbitrary number of visible body parts due to occlusion or truncation. Some of these ambiguities can be resolved by using multiview images. This is due to the fact that more evidences of body parts would be available in multiple views. In this work, a novel method for multiple human 3D pose estimation using evidences in multiview images is proposed. The proposed method utilizes a fully connected pairwise conditional random field that contains two types of pairwise terms. The first pairwise term encodes the spatial dependencies among human body joints based on an articulated human body configuration. The second pairwise term is based on the output of a 2D deep part detector. An approximate inference is then performed using the loopy belief propagation algorithm. The proposed method is evaluated on the Campus, Shelf, Utrecht Multi-Person Motion benchmark, Human3.6M, KTH Football II, and MPII Cooking datasets. Experimental results indicate that the proposed method achieves substantial improvements over the existing state-of-the-art methods in terms of the probability of correct pose and the mean per joint position error performance measures.
Similar content being viewed by others
References
Afrouzian R, Seyedarabi H, Kasaei S (2016) Pose estimation of soccer players using multiple uncalibrated cameras. Multimed Tools Appl 75(12):6809–6827. https://doi.org/10.1007/s11042-015-2611-8
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19(6):716–723. https://doi.org/10.1109/TAC.1974.1100705
Amin S, Andriluka M, Rohrbach M, Schiele B (2013) Multi-view pictorial structures for 3d human pose estimation. In: British Machine Vision Conference, vol. 2. BMVA Press
Amin S, Müller P, Bulling A, Andriluka M (2014) Test-time adaptation for 3d human pose estimation. In: German conference on pattern recognition, pp 253–264. Springer
Andriluka M, Pishchulin L, Gehler P, Schiele B (2014) 2d human pose estimation: New benchmark and state of the art analysis. In: IEEE conference on computer vision and pattern recognition (CVPR)
Belagiannis V, Zisserman A (2016). Recurrent human pose estimation. arXiv:1605.02914
Belagiannis V, Amann C, Navab N, Ilic S (2014) Holistic human pose estimation with regression forests. In: Articulated motion and deformable objects, pp 20–30. Springer
Belagiannis V, Amin S, Andriluka M, Schiele B, Navab N, Ilic S (2014) 3d pictorial structures for multiple human pose estimation. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 1669–1676. IEEE
Belagiannis V, Wang X, Schiele B, Fua P, Ilic S, Navab N (2014) Multiple human pose estimation with temporally consistent 3D pictorial structures. In: ChaLearn looking at people workshop, European conference on computer vision (ECCV2014). IEEE
Belagiannis V, Rupprecht C, Carneiro G, Navab N (2015) Robust optimization for deep regression. In: 2015 IEEE international conference on computer vision (ICCV), pp 2830–2838. IEEE
Belagiannis V, Amin S, Andriluka M, Schiele B, Navab N, Ilic S (2015) 3d pictorial structures revisited: Multiple human pose estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence
Berclaz J, Fleuret F, Turetken E, Fua P (2011) Multiple object tracking using k-shortest paths optimization. IEEE Trans Pattern Anal Mach Intell 33(9):1806–1819
Bishop MC (2006) Pattern Recognition and Machine Learning. Springer, Berlin
Bourdev L, Maji S, Brox T, Malik J (2010) Detecting people using mutually consistent poselet activations. In: Computer Vision–ECCV, pp 168–181. Springer
Burenius M, Sullivan J, Carlsson S (2013) 3d pictorial structures for multiple view articulated pose estimation. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 3618–3625. IEEE
Cao Z, Simon T, Wei SE, Sheikh Y (2017) Realtime multi-person 2d pose estimation using part affinity fields. In: CVPR
Charles J, Pfister T, Magee D, Hogg D, Zisserman A (2014) Upper body pose estimation with temporal sequential forests. In: Proceedings of the British machine vision conference, pp 1–12. BMVA Press
Charles J, Pfister T, Magee D, Hogg D, Zisserman A (2016) Personalizing human video pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3063– 3072
Chen X, Yuille AL (2014) Articulated pose estimation by a graphical model with image dependent pairwise relations. In: Advances in neural information processing systems, pp 1736–1744
Dantone M, Gall J, Leistner C, Van Gool L (2013) Human pose estimation using body parts dependent joint regressors. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 3041–3048. IEEE
Dong J, Chen Q, Xia W, Huang Z, Yan S (2013) A deformable mixture parsing model with parselets. In: IEEE international conference on computer vision (ICCV), pp 3408–3415. IEEE
Dong J, Chen Q, Shen X, Yang J, Yan S (2014) Towards unified human parsing and pose estimation. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 843–850. IEEE
Felzenszwalb PF, Huttenlocher DP (2006) Efficient belief propagation for early vision. Int J Comput Vis 70(1):41–54
Felzenszwalb PF, Girshick RB, McAllester D, Ramanan D (2010) Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell 32(9):1627–1645. https://doi.org/10.1109/TPAMI.2009.167
Ferrari V, Marin-Jimenez M, Zisserman A (2008) Progressive search space reduction for human pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition
Fischler MA, Elschlager RA (1973) The representation and matching of pictorial structures. IEEE Trans Comput 100(1):67–92
Holt B, Ong EJ, Cooper H, Bowden R (2011) Putting the pieces together: Connected poselets for human pose estimation. In: IEEE international conference on computer vision workshops (ICCV Workshops), pp 1196–1201. IEEE
Insafutdinov E, Pishchulin L, Andres B, Andriluka M, Schiele B (2016) DeeperCut: A deeper, stronger, and faster multi-person pose estimation model. In: Leibe B (ed) Computer Vision – ECCV 2016, Lecture Notes in Computer Science, vol. 9910, pp. 34–50. Springer, Amsterdam, The Netherlands. https://doi.org/10.1007/978-3-319-46466-4_3
Ionescu C, Papava D, Olaru V, Sminchisescu C (2014) Human3.6m: Large scale datasets and predictive methods for 3d human sensing in natural environments. IEEE Trans Pattern Anal Mach Intell 36(7):1325–1339
Jain A, Tompson J, Andriluka M, Taylor GW, Bregler C (2013) Learning human pose estimation features with convolutional networks. arXiv:1312.7302
Jain A, Tompson J, LeCun Y, Bregler C (2014) Modeep: A deep learning framework using motion features for human pose estimation. In: Asian conference on computer vision, pp 302–315. Springer
Jammalamadaka N, Zisserman A, Jawahar CV (2017) Human pose search using deep networks. Image Vis Comput 59:31–43. https://doi.org/10.1016/j.imavis.2016.12.002.
Kazemi V, Sullivan J (2012) Using richer models for articulated pose estimation of footballers. In: BMVC, pp 1–10
Kazemi V, Burenius M, Azizpour H, Sullivan J (2013) Multi-view body part recognition with random forests. In: 24th British machine vision conference. British machine vision association
Kiefel M, Gehler P (2014) Human pose estimation with fields of parts. In: Computer Vision–ECCV, pp 331–346. Springer
Li S, Zhang W, Chan AB (2017) Maximum-margin structured learning with deep networks for 3d human pose estimation. Int J Comput Vis 122(1):149–168. https://doi.org/10.1007/s11263-016-0962-x
Mooij JM (2010) libDAI: A free and open source C++ library for discrete approximate inference in graphical models. J. Mach Learn Res 11:2169–2173. http://www.jmlr.org/papers/volume11/mooij10a/mooij10a.pdf
Newell A, Yang K, Deng J (2016) Stacked hourglass networks for human pose estimation. In: ECCV
Pavlakos G, Zhou X, Derpanis KG, Daniilidis K (2017) Harvesting multiple views for marker-less 3d human pose annotations. In: Proceedings of the IEEE conference on computer vision and pattern recognition
Pfister T, Charles J, Zisserman A (2015) Flowing convnets for human pose estimation in videos. In: Proceedings of the IEEE international conference on computer vision, pp 1913–1921
Pishchulin L, Andriluka M, Gehler P, Schiele B (2013) Poselet conditioned pictorial structures. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 588–595. IEEE
Pishchulin L, Andriluka M, Gehler P, Schiele B (2013) Strong appearance and expressive spatial models for human pose estimation. In: IEEE international conference on computer vision (ICCV), pp 3487–3494. IEEE
Pishchulin L, Insafutdinov E, Tang S, Andres B, Andriluka M, Gehler P, Schiele B (2016) DeepCut: Joint subset partition and labeling for multi person pose estimation. In: 29th IEEE conference on computer vision and pattern recognition (CVPR 2016), pp. 4929–4937. IEEE Computer Society, Las Vegas, NV, USA. https://doi.org/10.1109/CVPR.2016.533
Rohrbach M, Amin S, Andriluka M, Schiele B (2012) A database for fine grained activity detection of cooking activities. In: 2012 IEEE conference on computer vision and pattern recognition (CVPR), pp 1194–1201. IEEE
Schick A, Stiefelhagen R (2015) 3d pictorial structures for human pose estimation with supervoxels. In: 2015 IEEE winter conference on applications of computer vision (WACV), pp. 140–147. IEEE
Shotton J, Sharp T, Kipman A, Fitzgibbon A, Finocchio M, Blake A, Cook M, Moore R (2013) Real-time human pose recognition in parts from single depth images. Commun ACM 56(1):116–124
Tekin B, Katircioglu I, Salzmann M, Lepetit V, Fua P (2016) Structured prediction of 3d human pose with deep neural networks. CoRR arXiv:1605.05180
Tompson JJ, Jain A, LeCun Y, Bregler C (2014) Joint training of a convolutional network and a graphical model for human pose estimation. In: Advances in neural information processing systems, pp 1799–1807
Toshev A, Szegedy C (2014) Deeppose: Human pose estimation via deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1653–1660
Tran D, Forsyth D (2010) Improved human parsing with a full relational model. In: Computer Vision–ECCV, pp 227–240. Springer
Van der Aa N, Luo X, Giezeman GJ, Tan RT, Veltkamp RC (2011) Umpm benchmark: A multi-person dataset with synchronized video and motion capture data for evaluation of articulated human motion and interaction. In: 2011 IEEE international conference on computer vision workshops (ICCV Workshops), pp 1264–1269. IEEE
Yan C, Zhang Y, Dai F, Wang X, Li L, Dai Q (2014) Parallel deblocking filter for hevc on many-core processor. Electron Lett 50(5):367–368
Yan C, Zhang Y, Dai F, Zhang J, Li L, Dai Q (2014) Efficient parallel hevc intra-prediction on many-core processor. Electron Lett 50(11):805–806
Yan C, Zhang Y, Xu J, Dai F, Li L, Dai Q, Wu F (2014) A highly parallel framework for hevc coding unit partitioning tree decision on many-core processors. IEEE Signal Process Lett 21(5):573–576
Yan C, Zhang Y, Xu J, Dai F, Zhang J, Dai Q, Wu F (2014) Efficient parallel framework for hevc motion estimation on many-core processors. IEEE Trans Circuits Syst Video Technol 24(12):2077–2089
Yang Y, Ramanan D (2011) Articulated pose estimation with flexible mixtures-of-parts. In: 2011 IEEE conference on computer vision and pattern recognition (CVPR), pp 1385–1392. IEEE
Yang Y, Ramanan D (2013) Articulated human detection with flexible mixtures of parts. IEEE Trans Pattern Anal Mach Intell 35(12):2878–2890
Zhou X, Sun X, Zhang W, Liang S, Wei Y (2016) Deep kinematic pose regression. In: Computer Vision–ECCV 2016 Workshops, pp 186–201. Springer
Zhou X, Zhu M, Leonardos S, Derpanis KG, Daniilidis K (2016) Sparseness meets deepness: 3d human pose estimation from monocular video. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4966–4975
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Ershadi-Nasab, S., Noury, E., Kasaei, S. et al. Multiple human 3D pose estimation from multiview images. Multimed Tools Appl 77, 15573–15601 (2018). https://doi.org/10.1007/s11042-017-5133-8
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-017-5133-8