Article

PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model

Authors: George Papandreou, Liang-Chieh Chen, Spyros Gidaris, Jonathan Tompson, Kevin Murphy

Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part XIV

Pages 282 - 299

https://doi.org/10.1007/978-3-030-01264-9_17

Published: 08 September 2018

Abstract

We present a box-free bottom-up approach for the tasks of pose estimation and instance segmentation of people in multi-person images using an efficient single-shot model. The proposed PersonLab model tackles both semantic-level reasoning and object-part associations using part-based modeling. Our model employs a convolutional network which learns to detect individual keypoints and predict their relative displacements, allowing us to group keypoints into person pose instances. Further, we propose a part-induced geometric embedding descriptor which allows us to associate semantic person pixels with their corresponding person instance, delivering instance-level person segmentations. Our system is based on a fully-convolutional architecture and allows for efficient inference, with runtime essentially independent of the number of people present in the scene. Trained on COCO data alone, our system achieves COCO test-dev keypoint average precision of 0.665 using single-scale inference and 0.687 using multi-scale inference, significantly outperforming all previous bottom-up pose estimation systems. We are also the first bottom-up method to report competitive results for the person class in the COCO instance segmentation task, achieving a person category average precision of 0.417.
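The geometric embedding idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that every person pixel carries a predicted 2-D offset pointing toward a reference keypoint of the person it belongs to, and that one reference keypoint position per detected person is already available. The function name `assign_pixels_to_instances`, the data layout, and the nearest-keypoint rule with an optional `max_dist` cutoff are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import math


def assign_pixels_to_instances(person_pixels, offsets, instance_keypoints,
                               max_dist=float("inf")):
    """Assign each person pixel to the closest detected person instance.

    person_pixels: list of (x, y) pixels labelled "person" by semantic segmentation.
    offsets: dict mapping (x, y) -> (dx, dy), the predicted offset from the pixel
        to a reference keypoint of its person (assumed input format).
    instance_keypoints: list of (x, y) reference keypoint positions, one per
        detected person instance.
    max_dist: pixels whose embedding is not strictly closer than this to any
        instance keypoint are left unassigned (None).
    """
    labels = {}
    for (x, y) in person_pixels:
        dx, dy = offsets[(x, y)]
        # The "embedding" of a pixel is the location its offset points to.
        ex, ey = x + dx, y + dy
        best_idx, best_dist = None, max_dist
        for idx, (kx, ky) in enumerate(instance_keypoints):
            dist = math.hypot(ex - kx, ey - ky)
            if dist < best_dist:
                best_idx, best_dist = idx, dist
        labels[(x, y)] = best_idx
    return labels


if __name__ == "__main__":
    # Two detected people, each summarised by one reference keypoint.
    keypoints = [(2.0, 2.0), (10.0, 2.0)]
    # Toy offsets standing in for network predictions.
    toy_offsets = {(1, 2): (1.0, 0.0), (9, 3): (1.0, -1.0), (6, 2): (3.5, 0.0)}
    print(assign_pixels_to_instances(list(toy_offsets), toy_offsets, keypoints))
```

Because each pixel is compared only against the detected instances, this step needs no bounding boxes, which matches the box-free framing of the abstract.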



Index Terms

Computing methodologies



Published In

Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part XIV

Sep 2018, 844 pages

ISBN: 978-3-030-01263-2

DOI: 10.1007/978-3-030-01264-9

Editors: Vittorio Ferrari (Google Research, Zurich, Switzerland), Martial Hebert (Carnegie Mellon University, Pittsburgh, PA, USA), Cristian Sminchisescu (Google Research, Zurich, Switzerland), Yair Weiss (Hebrew University of Jerusalem, Jerusalem, Israel)

© Springer Nature Switzerland AG 2018.

Publisher: Springer-Verlag, Berlin, Heidelberg


