
Efficient Spatial-Attention Module for Human Pose Estimation

  • Conference paper
  • Frontiers of Computer Vision (IW-FCV 2021)
  • Part of the book series: Communications in Computer and Information Science (CCIS, volume 1405)


Abstract

Convolutional neural networks (CNNs) currently achieve the best performance not only in human pose estimation but also in other machine vision tasks such as object recognition, semantic segmentation, and image classification. Attention modules (AMs) have further been shown to improve CNNs over traditional network designs. This paper therefore proposes an efficient feed-forward attention module for CNNs. The feature map produced by a stage of the backbone network is fed into the attention module and processed along two dimensions, channel and spatial. The module then combines the two resulting attention maps by multiplication and passes the refined feature map to the next backbone stage. By capturing long-range dependencies across channels together with spatial information, the network attains better accuracy with higher efficiency. Experiments compare the proposed attention module with existing methods: with the modified spatial branch, the predicted joint heatmaps retain their accuracy while the number of parameters decreases, and the proposed architecture outperforms the baseline by 1.3 points in AP. The network was trained on the public COCO 2017 benchmark.
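As a rough illustration of the mechanism described above, the following PyTorch sketch builds a channel branch and a spatial branch whose attention maps are combined by multiplication and applied to a stage output before it enters the next backbone stage. All class names, the reduction ratio, and the kernel size are illustrative assumptions; the paper's actual module may differ.

# Minimal sketch of a channel + spatial attention block between backbone stages.
# Assumption: names, reduction ratio, and kernel size are illustrative only.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    # Produces a (B, C, 1, 1) attention map capturing long-range channel dependencies.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # global spatial pooling
            nn.Conv2d(channels, channels // reduction, 1),  # bottleneck
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),  # restore channel count
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.mlp(x)

class SpatialAttention(nn.Module):
    # Produces a (B, 1, H, W) attention map from channel-pooled descriptors.
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg_map = x.mean(dim=1, keepdim=True)               # average over channels
        max_map = x.max(dim=1, keepdim=True).values         # max over channels
        return torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))

class AttentionModule(nn.Module):
    # Combines the channel and spatial attention maps by multiplication and
    # re-weights the stage output before it is passed to the next stage.
    def __init__(self, channels):
        super().__init__()
        self.channel = ChannelAttention(channels)
        self.spatial = SpatialAttention()

    def forward(self, x):
        return x * self.channel(x) * self.spatial(x)        # broadcast over C and (H, W)

# Example: refine a hypothetical ResNet stage output for a 256x192 input crop.
features = torch.randn(1, 256, 64, 48)
refined = AttentionModule(256)(features)                    # same shape, attention-weighted

Applying the two attention maps as broadcast multipliers keeps the block lightweight, which is consistent with the abstract's claim of retaining accuracy while reducing the number of parameters.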



Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1A2C2008972).

Author information

Corresponding author

Correspondence to Kang-Hyun Jo.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Tran, TD., Vo, XT., Nguyen, DL., Jo, KH. (2021). Efficient Spatial-Attention Module for Human Pose Estimation. In: Jeong, H., Sumi, K. (eds) Frontiers of Computer Vision. IW-FCV 2021. Communications in Computer and Information Science, vol 1405. Springer, Cham. https://doi.org/10.1007/978-3-030-81638-4_20


  • DOI: https://doi.org/10.1007/978-3-030-81638-4_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-81637-7

  • Online ISBN: 978-3-030-81638-4

  • eBook Packages: Computer Science (R0)
