
research-article

Localization of hard joints in human pose estimation based on residual down-sampling and attention mechanism

Published: 01 July 2022

Abstract

Hard-joint localization in human pose estimation is challenging for several reasons: joints may become invisible because of clothing and lighting, they may be occluded in complex environments, and the dependencies among joints may be broken. Most existing approaches to hard-joint pose estimation achieve high accuracy by extracting more high-level feature information. However, most networks suffer from information loss caused by down-sampling, which can cause joint locations to be lost. Compensating for this loss, in turn, introduces useless information into network learning and interferes with the extraction of useful information associated with hard joints. Herein, a residual down-sampling module is proposed that replaces the pooling layer for down-sampling and fuses high-level features with low-resolution feature maps, addressing the information-loss issue. In addition, a strategy based on the attention mechanism is proposed to guide network learning so that the network focuses on useful feature information: a convolutional block attention module is combined with a residual module outside the basic sub-network, allowing the network to learn more effective high-level features. An eight-stack hourglass network is used as the basic network, and the proposed method is validated on the MPII and LSP human pose datasets. Compared with the eight-stack hourglass and HRNet, the proposed method achieves higher accuracy for hard-joint localization. The experimental results show that the proposed method is effective for hard-joint localization.
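The abstract gives no layer-level details, so the NumPy sketch below only illustrates the two ideas under stated assumptions. The residual down-sampling block is modelled on a ResNet-style projection shortcut [26]: a stride-2 3×3 convolution plus a stride-2 1×1 shortcut, summed, so resolution is halved by learned convolutions rather than by pooling. This is a guess, not the authors' design. The attention part follows the CBAM formulation of Woo et al. [23] (channel attention from a shared MLP over average- and max-pooled descriptors, then spatial attention from a 7×7 convolution over channel-wise mean and max maps), wrapped as x + CBAM(x) as one assumed way of "combining it with a residual module". All function names, weight shapes, and the reduction ratio are hypothetical.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def conv2d(x, w, stride=1, pad=0):
    """Naive 2-D convolution without bias. x: (C_in, H, W); w: (C_out, C_in, k, k)."""
    k = w.shape[-1]
    x = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding on H and W
    h_out = (x.shape[1] - k) // stride + 1
    w_out = (x.shape[2] - k) // stride + 1
    out = np.zeros((w.shape[0], h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i * stride:i * stride + k, j * stride:j * stride + k]
            # Contract over (C_in, k, k) to get one value per output channel.
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def residual_downsample(x, w_main, w_skip):
    """Hypothetical residual down-sampling block (ResNet-style projection shortcut).

    Both branches use stride 2, so H and W are halved by learned convolutions
    instead of by a pooling layer.
    """
    main = conv2d(x, w_main, stride=2, pad=1)  # 3x3, stride 2
    skip = conv2d(x, w_skip, stride=2)         # 1x1, stride-2 projection
    return np.maximum(main + skip, 0.0)        # ReLU after the sum


def channel_attention(f, w1, w2):
    """CBAM channel attention: shared two-layer MLP (C -> C/r -> C) on pooled descriptors."""
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)
    m_c = sigmoid(mlp(f.mean(axis=(1, 2))) + mlp(f.max(axis=(1, 2))))  # (C,)
    return f * m_c[:, None, None]


def spatial_attention(f, w_sp):
    """CBAM spatial attention: 7x7 conv over the channel-wise mean and max maps."""
    desc = np.stack([f.mean(axis=0), f.max(axis=0)])  # (2, H, W)
    m_s = sigmoid(conv2d(desc, w_sp, pad=3))          # (1, H, W), values in (0, 1)
    return f * m_s


def cbam_residual(x, w1, w2, w_sp):
    """Assumed combination with a residual module: x + CBAM(x)."""
    return x + spatial_attention(channel_attention(x, w1, w2), w_sp)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))                  # (C, H, W) feature map
y = residual_downsample(x, 0.1 * rng.standard_normal((8, 8, 3, 3)),
                        0.1 * rng.standard_normal((8, 8, 1, 1)))
w1 = rng.standard_normal((2, 8))                      # reduction ratio r = 4 (illustrative)
w2 = rng.standard_normal((8, 2))
w_sp = 0.1 * rng.standard_normal((1, 2, 7, 7))
z = cbam_residual(y, w1, w2, w_sp)
print(y.shape, z.shape)  # (8, 8, 8) (8, 8, 8)
```

With 2×2 max-pooling, only one of every four activations in each window survives; the strided-convolution branches instead learn how to summarise each neighbourhood, which is the intuition behind replacing pooling. Because both attention masks pass through a sigmoid, they can only rescale features toward zero, and the residual addition keeps the original signal available to later stages.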

References

[1] Alp Güler, R., Neverova, N., Kokkinos, I.: DensePose: dense human pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7297–7306 (2018)
[2] Li, J., Wang, C., Zhu, H., Mao, Y., Fang, H.S., Lu, C.: CrowdPose: efficient crowded scenes pose estimation and a new benchmark. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 10863–10872 (2019)
[3] Fang, H.S., Xie, S., Tai, Y.W., Lu, C.: RMPE: regional multi-person pose estimation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2334–2343 (2017)
[4] Yang, W., Li, S., Ouyang, W., Li, H., Wang, X.: Learning feature pyramids for human pose estimation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1281–1290 (2017)
[5] Tang, W., Wu, Y.: Does learning specific features for related parts help human pose estimation? In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1107–1116 (2019)
[6] Sypetkowski, M., Sarwas, G., Trzcinski, T.: Synthetic image translation for football players pose estimation. J. UCS 25(6), 683–700 (2019)
[7] Sapp, B., Toshev, A., Taskar, B.: Cascaded models for articulated pose estimation. In: Proceedings of the 11th European Conference on Computer Vision: Part II, Lecture Notes in Computer Science, pp. 406–420 (2010)
[8] Ferrari, V., Marin-Jimenez, M., Zisserman, A.: Progressive search space reduction for human pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2008)
[9] Toshev, A., Szegedy, C.: DeepPose: human pose estimation via deep neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition (2014)
[10] Zhang, F., Zhu, X., Ye, M.: Fast human pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3517–3526 (2019)
[11] Chen, Y., Wang, Z., Peng, Y., Zhang, Z., Yu, G., Sun, J.: Cascaded pyramid network for multi-person pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7103–7112 (2018)
[12] Wei, S.E., Ramakrishna, V., Kanade, T., Sheikh, Y.: Convolutional pose machines. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4724–4732 (2016)
[13] Pishchulin, L., Insafutdinov, E., Tang, S., Andres, B., Andriluka, M., Gehler, P.V., Schiele, B.: DeepCut: joint subset partition and labeling for multi person pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4929–4937 (2016)
[14] Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: European Conference on Computer Vision, Lecture Notes in Computer Science, pp. 483–499. Springer, Cham (2016)
[15] Newell, A., Huang, Z., Deng, J.: Associative embedding: end-to-end learning for joint detection and grouping. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 2277–2287 (2017)
[16] Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence, pp. 4278–4284 (2017)
[17] Sun, K., Xiao, B., Liu, D., Wang, J.: Deep high-resolution representation learning for human pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5693–5703 (2019)
[18] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 5998–6008 (2017)
[19] Hu, J., Shen, L., Albanie, S., Sun, G., Wu, E.: Squeeze-and-excitation networks. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)
[20] Liu, X., Xu, Q., Wang, N.: A survey on deep neural network-based image captioning. Vis. Comput. 35(3), 445–470 (2019)
[21] Jiang, T., Zhang, Z., Yang, Y.: Modeling coverage with semantic embedding for image caption generation. Vis. Comput. 35(11), 1655–1665 (2019)
[22] Jaderberg, M., Simonyan, K., Zisserman, A.: Spatial transformer networks. Adv. Neural Inf. Process. Syst. 28, 2017–2025 (2015)
[23] Woo, S., Park, J., Lee, J.Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Proceedings of the European Conference on Computer Vision, Lecture Notes in Computer Science, pp. 3–19 (2018)
[24] Chu, X., Yang, W., Ouyang, W., Ma, C., Yuille, A.L., Wang, X.: Multi-context attention for human pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1831–1840 (2017)
[25] Su, K., Yu, D., Xu, Z., Geng, X., Wang, C.: Multi-person pose estimation with enhanced channel-wise and spatial information. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5674–5682 (2019)
[26] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition (2015). arXiv:1512.03385
[27] Andriluka, M., Pishchulin, L., Gehler, P., Schiele, B.: 2D human pose estimation: new benchmark and state of the art analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3686–3693 (2014)
[28] Johnson, S., Everingham, M.: Clustered pose and nonlinear appearance models for human pose estimation. In: BMVC, vol. 2, no. 4, p. 5 (2010)

Cited By

  • (2024) Multiple information perception-based attention in YOLO for underwater object detection. The Visual Computer: International Journal of Computer Graphics 40(3), 1415–1438. doi:10.1007/s00371-023-02858-2. Online publication date: 1 Mar 2024
  • (2023) MSPENet: multi-scale adaptive fusion and position enhancement network for human pose estimation. The Visual Computer: International Journal of Computer Graphics 39(5), 2005–2019. doi:10.1007/s00371-022-02460-y. Online publication date: 1 May 2023
  • (2023) Human pose estimation with gated multi-scale feature fusion and spatial mutual information. The Visual Computer: International Journal of Computer Graphics 39(1), 119–137. doi:10.1007/s00371-021-02317-w. Online publication date: 1 Jan 2023


Published In

The Visual Computer: International Journal of Computer Graphics, Volume 38, Issue 7
Jul 2022
350 pages

Publisher

Springer-Verlag
Berlin, Heidelberg

Publication History

Published: 01 July 2022
Accepted: 22 March 2021

Author Tags

1. Residual down-sampling
2. Attention mechanisms
3. Deep learning
4. Hard joints
5. Pose estimation
