Efficient Spatial-Attention Module for Human Pose Estimation

Tien-Dat Tran, Xuan-Thuy Vo, Duy-Linh Nguyen & Kang-Hyun Jo

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1405)

Included in the following conference series:

International Workshop on Frontiers of Computer Vision

Abstract

Convolutional neural networks (CNNs) currently achieve the best performance not only in human pose estimation but also in other machine vision tasks (e.g., object recognition, semantic segmentation, image classification). Attention modules (AMs) have been shown to improve their performance further over conventional networks. This paper therefore focuses on an effective feed-forward AM for CNNs. The feature map produced by a stage of the backbone network is fed into the attention module, where it is processed along two separate dimensions, channel and spatial. The AM then combines the two resulting feature maps by multiplication and passes the result to the next stage of the backbone. By capturing long-range dependencies (channel) together with spatial information, the network gathers more information and can attain better precision. The experimental results also show the difference between using the attention module and existing methods. With the modification to the spatial branch, the predicted joint heatmaps retain their accuracy while the number of parameters decreases. Compared with the baseline, the proposed architecture improves AP by 1.3 points. The proposed network was trained on the COCO 2017 benchmark, which is an open dataset.
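
The data flow described in the abstract (a backbone feature map split into a channel branch and a spatial branch, then recombined by multiplication) can be illustrated with a minimal NumPy sketch. The abstract does not specify the internals of either branch, so the global-average pooling and sigmoid gating used here are assumptions borrowed from common attention designs, and the function names are hypothetical. There are no learned weights; this is not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    """Squash values into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x):
    """Assumed channel branch: average over H and W, one weight per channel.

    x has shape (C, H, W); the result has shape (C, 1, 1).
    """
    pooled = x.mean(axis=(1, 2), keepdims=True)
    return sigmoid(pooled)

def spatial_attention(x):
    """Assumed spatial branch: average over channels, one weight per location.

    x has shape (C, H, W); the result has shape (1, H, W).
    """
    pooled = x.mean(axis=0, keepdims=True)
    return sigmoid(pooled)

def attention_module(x):
    """Combine both branches by multiplication and re-weight the input."""
    ch = channel_attention(x)   # (C, 1, 1)
    sp = spatial_attention(x)   # (1, H, W)
    # Broadcasting expands ch * sp to (C, H, W) before scaling x.
    return x * (ch * sp)

# Stand-in for the output of one backbone stage: 8 channels, 16 x 12 grid.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 16, 12))
out = attention_module(features)
print(out.shape)  # (8, 16, 12)
```

The output has the same shape as the input, so in this sketch the module can sit between two backbone stages without changing the layers around it.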

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1A2C2008972).

Author information

Authors and Affiliations

School of Electrical Engineering, University of Ulsan, Ulsan, 44610, South Korea
Tien-Dat Tran, Xuan-Thuy Vo, Duy-Linh Nguyen & Kang-Hyun Jo

Corresponding author

Correspondence to Kang-Hyun Jo.

Editor information

Editors and Affiliations

Chonnam National University, Gwangju, Korea (Republic of)
Hieyong Jeong
Aoyama Gakuin University, Kanagawa, Japan
Kazuhiko Sumi

About this paper

Cite this paper

Tran, TD., Vo, XT., Nguyen, DL., Jo, KH. (2021). Efficient Spatial-Attention Module for Human Pose Estimation. In: Jeong, H., Sumi, K. (eds) Frontiers of Computer Vision. IW-FCV 2021. Communications in Computer and Information Science, vol 1405. Springer, Cham. https://doi.org/10.1007/978-3-030-81638-4_20

DOI: https://doi.org/10.1007/978-3-030-81638-4_20
Published: 15 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-81637-7
Online ISBN: 978-3-030-81638-4
eBook Packages: Computer Science, Computer Science (R0)
