research-article

Localization of hard joints in human pose estimation based on residual down-sampling and attention mechanism

Authors:

Yang Tang

The Visual Computer, Volume 38, Issue 7

Pages 2447 - 2459

https://doi.org/10.1007/s00371-021-02122-5

Published: 01 July 2022

Abstract

Hard-joint localization in human pose estimation is challenging for several reasons: joints can become invisible because of clothing and lighting, they can be occluded by a complex environment, and the dependencies among joints can be broken. Most existing approaches to hard-joint pose estimation achieve high accuracy by extracting more high-level feature information. However, most networks suffer from information loss caused by down-sampling, which in turn degrades joint location information. Compensating for this loss introduces useless information into network learning and hinders the extraction of useful information associated with hard joints. Herein, a residual down-sampling module is proposed to replace the pooling layer for down-sampling and to fuse high-level features with low-resolution feature maps, with the aim of addressing the information loss. A strategy based on the attention mechanism is also proposed to guide network learning so that the network focuses on useful feature information: a convolutional block attention module is combined with a residual module outside the basic sub-network, allowing the network to learn more effective high-level features. An eight-stack hourglass network is used as the basic network, and the proposed method is validated on the MPII and LSP human pose datasets. Compared with the eight-stack hourglass network and HRNet, the proposed method achieves higher accuracy for hard-joint localization. The experimental results show that the proposed methods are effective for hard-joint localization.
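To make the contrast between pooling and residual down-sampling concrete, the following is a minimal sketch and not the authors' module. The abstract does not specify the layer layout, so the two-branch design (a strided convolution with ReLU plus a strided shortcut, summed), the single-channel feature map, the hand-picked kernels, and all function names are assumptions made for illustration.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map (hypothetical type alias)


def max_pool_2x2(x: Grid) -> Grid:
    """2x2 max pooling with stride 2: only the largest value per window survives."""
    rows, cols = len(x), len(x[0])
    return [
        [
            max(x[r][c], x[r][c + 1], x[r + 1][c], x[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def strided_conv_2x2(x: Grid, kernel: Grid) -> Grid:
    """2x2 convolution with stride 2 and no padding: every value in a window
    contributes to the output through its weight."""
    rows, cols = len(x), len(x[0])
    return [
        [
            sum(kernel[i][j] * x[r + i][c + j] for i in range(2) for j in range(2))
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def residual_downsample(x: Grid, main_kernel: Grid, shortcut_kernel: Grid) -> Grid:
    """Assumed layout: strided conv + ReLU on the main branch, a strided
    projection on the shortcut branch, then an element-wise sum."""
    main = [[max(0.0, v) for v in row] for row in strided_conv_2x2(x, main_kernel)]
    shortcut = strided_conv_2x2(x, shortcut_kernel)
    return [[m + s for m, s in zip(m_row, s_row)] for m_row, s_row in zip(main, shortcut)]


# A 4x4 toy feature map; in a trained network the kernels would be learned.
feature_map = [
    [1.0, 2.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 6.0],
    [0.0, 0.0, 7.0, 8.0],
]
main_kernel = [[-1.0, 0.0], [0.0, 1.0]]        # responds to a diagonal difference
shortcut_kernel = [[0.25, 0.25], [0.25, 0.25]]  # window average as the projection

pooled = max_pool_2x2(feature_map)
residual = residual_downsample(feature_map, main_kernel, shortcut_kernel)
print(pooled)
print(residual)
```

Both functions halve the spatial resolution, but the pooled output depends on a single value per window, whereas the residual variant mixes all four values through weights that a network could learn. This illustrates the general motivation for replacing pooling; it does not reproduce the paper's results.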

Cited By

Shen X, Wang H, Cui T, Guo Z, Fu X (2024) Multiple information perception-based attention in YOLO for underwater object detection. The Visual Computer: International Journal of Computer Graphics 40:3 (1415-1438). DOI: 10.1007/s00371-023-02858-2. Online publication date: 1-Mar-2024
https://dl.acm.org/doi/10.1007/s00371-023-02858-2
Xu J, Liu W, Xing W, Wei X (2023) MSPENet: multi-scale adaptive fusion and position enhancement network for human pose estimation. The Visual Computer: International Journal of Computer Graphics 39:5 (2005-2019). DOI: 10.1007/s00371-022-02460-y. Online publication date: 1-May-2023
https://dl.acm.org/doi/10.1007/s00371-022-02460-y
Zhao X, Guo C, Zou Q (2023) Human pose estimation with gated multi-scale feature fusion and spatial mutual information. The Visual Computer: International Journal of Computer Graphics 39:1 (119-137). DOI: 10.1007/s00371-021-02317-w. Online publication date: 1-Jan-2023
https://dl.acm.org/doi/10.1007/s00371-021-02317-w

Index Terms

Localization of hard joints in human pose estimation based on residual down-sampling and attention mechanism
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
      2. Computer vision tasks
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks

Index terms have been assigned to the content through auto-classification.

Information & Contributors

Information

Published In

The Visual Computer: International Journal of Computer Graphics Volume 38, Issue 7

Jul 2022

350 pages

ISSN:0178-2789

© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021.

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 01 July 2022

Accepted: 22 March 2021

Qualifiers

Research-article
