DOI:10.1145/3343031.3350984
research-article

POINet: Pose-Guided Ovonic Insight Network for Multi-Person Pose Tracking

Published: 15 October 2019

Abstract

Multi-person pose tracking aims to jointly estimate and track the keypoints of multiple people in unconstrained videos. The most popular solution follows the tracking-by-detection strategy, which relies on human detection and data association. While deep learning has greatly improved human detection, existing works mainly realize data association through several separate stages with hand-crafted metrics, which leads to great uncertainty and poor adaptation in complex scenes. To address these problems, we propose an end-to-end pose-guided ovonic insight network (POINet) for data association in multi-person pose tracking, which jointly learns feature extraction, similarity estimation, and identity assignment. Specifically, we design a pose-guided representation network that integrates pose information into hierarchical convolutional features, generating a pose-aligned representation for each person, which helps handle partial occlusions. Moreover, we propose an ovonic insight network that adaptively encodes the cross-frame identity transformation and can cope with the difficult tracking cases of people leaving and entering the scene. Overall, the proposed POINet offers a new way to realize multi-person pose tracking in an end-to-end fashion. Extensive experiments on the PoseTrack benchmark demonstrate that POINet outperforms state-of-the-art methods.
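
For orientation, the staged, hand-crafted style of data association that the abstract contrasts with POINet can be sketched as below. This is a minimal illustration and not the POINet method: the function names `cosine_similarity` and `associate`, the similarity threshold of 0.5, and the toy feature vectors are all assumptions made for this example. It takes one feature vector per detected person in two consecutive frames, scores every cross-frame pair with a fixed metric, and assigns identities greedily. Unmatched people are reported as having left or entered the scene.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def associate(prev_feats, curr_feats, min_sim=0.5):
    """Greedy one-to-one matching of people across two frames.

    Hypothetical helper for illustration only. Returns a tuple
    (matches, left, entered):
      matches: dict mapping previous-frame index -> current-frame index
      left:    previous-frame indices with no match (person left the scene)
      entered: current-frame indices with no match (person entered the scene)
    """
    # Score every cross-frame pair, best pairs first.
    pairs = sorted(
        (
            (cosine_similarity(p, c), i, j)
            for i, p in enumerate(prev_feats)
            for j, c in enumerate(curr_feats)
        ),
        reverse=True,
    )
    matches = {}
    used_curr = set()
    for sim, i, j in pairs:
        if sim < min_sim:
            break  # all remaining pairs are below the threshold
        if i in matches or j in used_curr:
            continue  # each person is matched at most once
        matches[i] = j
        used_curr.add(j)
    left = [i for i in range(len(prev_feats)) if i not in matches]
    entered = [j for j in range(len(curr_feats)) if j not in used_curr]
    return matches, left, entered


if __name__ == "__main__":
    prev_frame = [[1.0, 0.0], [0.0, 1.0]]
    curr_frame = [[0.0, 1.0], [1.0, 0.1], [-1.0, -1.0]]
    print(associate(prev_frame, curr_frame))
```

The fixed metric and the threshold in this sketch are the kind of hand-tuned choices the abstract refers to. According to the abstract, POINet instead learns feature extraction, similarity estimation, and identity assignment jointly.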

    Information & Contributors

    Information

    Published In

    MM '19: Proceedings of the 27th ACM International Conference on Multimedia
    October 2019
    2794 pages
    ISBN:9781450368896
    DOI:10.1145/3343031

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. end-to-end network
    2. ovonic insight
    3. pose tracking
    4. pose-guided

    Qualifiers

    • Research-article

    Conference

    MM '19

    Acceptance Rates

    MM '19 Paper Acceptance Rate 252 of 936 submissions, 27%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Cited By

    • (2024) Improving Multi-Person Pose Tracking With a Confidence Network. IEEE Transactions on Multimedia 26 (5223-5233). DOI:10.1109/TMM.2023.3330532. Online publication date: 2024
    • (2023) Human Pose Estimation Using Deep Learning: A Systematic Literature Review. Machine Learning and Knowledge Extraction 5:4 (1612-1659). DOI:10.3390/make5040081. Online publication date: 13-Nov-2023
    • (2023) Color-Unrelated Head-Shoulder Networks for Fine-Grained Person Re-identification. ACM Transactions on Multimedia Computing, Communications, and Applications 19:6 (1-21). DOI:10.1145/3599730. Online publication date: 25-May-2023
    • (2023) TRACE: 5D Temporal Regression of Avatars with Dynamic Cameras in 3D Environments. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (8856-8866). DOI:10.1109/CVPR52729.2023.00855. Online publication date: Jun-2023
    • (2022) Transformer-Based Multimodal Infusion Dialogue Systems. Electronics 11:20 (3409). DOI:10.3390/electronics11203409. Online publication date: 20-Oct-2022
    • (2022) Recent Advances of Monocular 2D and 3D Human Pose Estimation: A Deep Learning Perspective. ACM Computing Surveys. DOI:10.1145/3524497. Online publication date: 31-Mar-2022
    • (2022) RMTrack: 6D Object Pose Tracking by Continuous Image Render Match. Proceedings of the 2022 5th International Conference on Image and Graphics Processing (303-309). DOI:10.1145/3512388.3512432. Online publication date: 7-Jan-2022
    • (2022) TICNet: A Target-Insight Correlation Network for Object Tracking. IEEE Transactions on Cybernetics 52:11 (12150-12162). DOI:10.1109/TCYB.2021.3070677. Online publication date: Nov-2022
    • (2022) Temporal Weighting Appearance-Aligned Network for Nighttime Video Retrieval. IEEE Signal Processing Letters 29 (2008-2012). DOI:10.1109/LSP.2022.3207620. Online publication date: 2022
    • (2022) Person Re-Identification on a Mobile Robot Using a Depth Camera. 2022 IEEE 31st International Symposium on Industrial Electronics (ISIE) (115-122). DOI:10.1109/ISIE51582.2022.9831515. Online publication date: 1-Jun-2022
