Abstract
Convolutional neural networks (CNNs) currently achieve state-of-the-art performance not only in human pose estimation but also in other machine vision tasks (e.g., object recognition, semantic segmentation, image classification). Attention modules (AMs) have further demonstrated performance gains over conventional networks. This paper therefore focuses on an effective feed-forward AM for CNNs. The feature map produced by a stage of the backbone network is fed into the attention module, where it is processed along two separate dimensions, channel and spatial. The AM then combines the two resulting feature maps by multiplication and passes the result to the next stage of the backbone. By capturing more information about long-range dependencies (channel) and spatial data, the network can achieve better precision. Our experiments also show the difference between using the attention module and existing methods. With the improved spatial branch, the predicted joint heatmaps retain their accuracy while the number of parameters decreases. The proposed architecture outperforms the baseline by 1.3 points in AP. The network was trained on the COCO 2017 benchmark, an open dataset.
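The data flow described in the abstract (a channel branch and a spatial branch whose outputs are combined by multiplication) can be illustrated with a minimal, parameter-free sketch. This is not the authors' implementation. The function names, the pooling-plus-sigmoid stand-ins for the learned branches, and the choice to apply the combined weights to the input feature map are all assumptions made for illustration only.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def channel_weights(x):
    # Assumption: global average pooling + sigmoid stands in for a learned channel branch.
    # x is a nested list of shape [C][H][W]; returns one weight per channel.
    return [sigmoid(sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))) for ch in x]

def spatial_weights(x):
    # Assumption: mean over channels + sigmoid stands in for a learned spatial branch.
    # Returns one weight per spatial location, shape [H][W].
    c, h, w = len(x), len(x[0]), len(x[0][0])
    return [[sigmoid(sum(x[k][i][j] for k in range(c)) / c) for j in range(w)]
            for i in range(h)]

def attention_module(x):
    # Combine the two branches by multiplication and reweight the input.
    # Applying the product to x itself is an assumption, not stated in the abstract.
    c, h, w = len(x), len(x[0]), len(x[0][0])
    ch = channel_weights(x)
    sp = spatial_weights(x)
    return [[[x[k][i][j] * ch[k] * sp[i][j] for j in range(w)]
             for i in range(h)]
            for k in range(c)]

# Hypothetical 2-channel, 2x2 feature map taken after a backbone stage.
feature_map = [[[1.0, 2.0], [3.0, 4.0]],
               [[0.5, -1.0], [0.0, 2.0]]]
refined = attention_module(feature_map)
print(len(refined), len(refined[0]), len(refined[0][0]))
```

The output keeps the shape of the input, so in this sketch the refined map could be passed directly to the next backbone stage. A real module would replace the pooling stand-ins with learned layers.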
Acknowledgement
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1A2C2008972).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Tran, TD., Vo, XT., Nguyen, DL., Jo, KH. (2021). Efficient Spatial-Attention Module for Human Pose Estimation. In: Jeong, H., Sumi, K. (eds) Frontiers of Computer Vision. IW-FCV 2021. Communications in Computer and Information Science, vol 1405. Springer, Cham. https://doi.org/10.1007/978-3-030-81638-4_20
DOI: https://doi.org/10.1007/978-3-030-81638-4_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-81637-7
Online ISBN: 978-3-030-81638-4
eBook Packages: Computer Science (R0)