Context-Guided Adaptive Network for Efficient Human Pose Estimation

Authors

  • Lei Zhao, Qiushi Academy for Advanced Studies, Zhejiang University, Hangzhou, China; College of Computer Science and Technology, Zhejiang University, Hangzhou, China
  • Jun Wen, Qiushi Academy for Advanced Studies, Zhejiang University, Hangzhou, China; College of Computer Science and Technology, Zhejiang University, Hangzhou, China
  • Pengfei Wang, Qiushi Academy for Advanced Studies, Zhejiang University, Hangzhou, China; College of Computer Science and Technology, Zhejiang University, Hangzhou, China
  • Nenggan Zheng, Qiushi Academy for Advanced Studies, Zhejiang University, Hangzhou, China; College of Computer Science and Technology, Zhejiang University, Hangzhou, China; Collaborative Innovation Center for Artificial Intelligence by MOE and Zhejiang Provincial Government (ZJU); Zhejiang Lab, Hangzhou, China

DOI:

https://doi.org/10.1609/aaai.v35i4.16463

Keywords:

Biometrics, Face, Gesture & Pose

Abstract

Although recent work has made great progress in human pose estimation (HPE), most methods are limited in either inference speed or accuracy. In this paper, we propose a fast and accurate end-to-end HPE method that is specifically designed to overcome the jitter box, defective box, and ambiguous box problems commonly encountered by box-based methods such as Mask R-CNN. Concretely, 1) we propose the ROIGuider to aggregate box instance features from all feature levels under the guidance of global context instance information. Further, 2) the proposed Center Line Branch is equipped with a Dichotomy Extended Area algorithm to adaptively expand each instance box area, and an Ambiguity Alleviation strategy to eliminate duplicated keypoints. Finally, 3) to achieve efficient multi-scale feature fusion and real-time inference, we design a novel Trapezoidal Network (TNet) backbone. In experiments on the COCO dataset, our method achieves 68.1 AP at 25.4 fps and outperforms Mask R-CNN by 8.9 AP at a similar speed. Its competitive performance against state-of-the-art models on the HPE and person instance segmentation tasks shows the promise of the proposed method. The source code will be made available at https://github.com/zlcnup/CGANet.

Published

2021-05-18

How to Cite

Zhao, L., Wen, J., Wang, P., & Zheng, N. (2021). Context-Guided Adaptive Network for Efficient Human Pose Estimation. Proceedings of the AAAI Conference on Artificial Intelligence, 35(4), 3492-3499. https://doi.org/10.1609/aaai.v35i4.16463

Section

AAAI Technical Track on Computer Vision III