DOI: 10.1145/3474085.3475177
research-article

Self-supervised Multi-view Multi-Human Association and Tracking

Published: 17 October 2021

Abstract

Multi-view multi-human association and tracking (MvMHAT) aims to track a group of people over time in each view and, at the same time, to identify the same person across different views. This is a relatively new problem, but an important one for video surveillance of multi-person scenes. Unlike previous multiple object tracking (MOT) and multi-target multi-camera tracking (MTMCT) tasks, which only consider over-time human association, MvMHAT requires cross-view and over-time data association to be achieved jointly. In this paper, we model this problem with a self-supervised learning framework and use an end-to-end network to tackle it. Specifically, we propose a spatial-temporal association network with two self-supervised learning losses, a symmetric-similarity loss and a transitive-similarity loss, applied at each time to associate multiple humans over time and across views. In addition, to promote research on MvMHAT, we build a new large-scale benchmark for training and testing different algorithms. Extensive experiments on the proposed benchmark verify the effectiveness of our method. We have released the benchmark and code to the public.
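
The abstract names the two losses without defining them. As a rough illustration only, the sketch below shows one plausible reading based on cycle consistency: soft-matching people from one set of detections to another and back should return each person to themselves (symmetric), and the same should hold for a loop through three sets (transitive). The function names, the softmax temperature, the squared-error penalty, and the toy feature vectors are all assumptions made for this example; they are not taken from the paper or its released code.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def row_softmax(m, temperature=0.1):
    """Turn each row of raw similarities into a soft assignment (rows sum to 1)."""
    out = []
    for row in m:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp((v - peak) / temperature) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def soft_assignment(feats_a, feats_b, temperature=0.1):
    """Soft matching from the people in set A to the people in set B."""
    return row_softmax(matmul(feats_a, transpose(feats_b)), temperature)


def identity_gap(cycle):
    """Mean squared deviation of a square matrix from the identity."""
    n = len(cycle)
    return sum(
        (cycle[i][j] - (1.0 if i == j else 0.0)) ** 2
        for i in range(n)
        for j in range(n)
    ) / (n * n)


def symmetric_loss(a, b):
    """Assumed form: matching A -> B -> A should bring each person back to themselves."""
    return identity_gap(matmul(soft_assignment(a, b), soft_assignment(b, a)))


def transitive_loss(a, b, c):
    """Assumed form: matching A -> B -> C -> A should also close the loop."""
    cycle = matmul(
        matmul(soft_assignment(a, b), soft_assignment(b, c)),
        soft_assignment(c, a),
    )
    return identity_gap(cycle)


# Hypothetical appearance features for two people, seen in three places
# (for example two camera views, or two views plus a later frame).
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[0.0, 1.0], [1.0, 0.0]]  # same two people, listed in the other order
view_c = [[1.0, 0.0], [0.0, 1.0]]

print("symmetric loss:", symmetric_loss(view_a, view_b))
print("transitive loss:", transitive_loss(view_a, view_b, view_c))
```

With clearly distinguishable features, both values come out close to zero. If two people look identical in one set, the round trip can no longer tell them apart and the penalty grows, which is the kind of signal a self-supervised objective could use without identity labels.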

Supplementary Material

MP4 File (MM21-fp99.mp4)
Presentation video.

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. multi-view cameras
  2. multiple human association and tracking
  3. self-supervised learning

Qualifiers

  • Research-article

Conference

MM '21
MM '21: ACM Multimedia Conference
October 20 - 24, 2021
Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

  • Downloads (Last 12 months): 117
  • Downloads (Last 6 weeks): 6
Reflects downloads up to 22 Feb 2025

Cited By

  • (2025) Multi-View, Multi-Target Tracking in Low-Altitude Scenes with UAV Involvement. Drones 9:2 (138). DOI: 10.3390/drones9020138. Online publication date: 13-Feb-2025
  • (2025) Unveiling the Power of Self-Supervision for Multi-View Multi-Human Association and Tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence 47:1 (351-368). DOI: 10.1109/TPAMI.2024.3463966. Online publication date: Jan-2025
  • (2024) STCA: High-Altitude Tracking via Single-Drone Tracking and Cross-Drone Association. Remote Sensing 16:20 (3861). DOI: 10.3390/rs16203861. Online publication date: 17-Oct-2024
  • (2024) Enhancing Multi-view Pedestrian Detection Through Generalized 3D Feature Pulling. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (1185-1194). DOI: 10.1109/WACV57701.2024.00123. Online publication date: 3-Jan-2024
  • (2024) Quantifying the Accuracy of Collaborative IoT and Robot Sensing in Indoor Settings of Rigid Objects. 2024 21st International Conference on Ubiquitous Robots (UR) (550-557). DOI: 10.1109/UR61395.2024.10597533. Online publication date: 24-Jun-2024
  • (2024) Blockchain-Empowered Distributed Multicamera Multitarget Tracking in Edge Computing. IEEE Transactions on Industrial Informatics 20:1 (369-379). DOI: 10.1109/TII.2023.3261890. Online publication date: Jan-2024
  • (2024) Simultaneously Recovering Multi-Person Meshes and Multi-View Cameras With Human Semantics. IEEE Transactions on Circuits and Systems for Video Technology 34:6 (4229-4242). DOI: 10.1109/TCSVT.2023.3328371. Online publication date: Jun-2024
  • (2024) Learning to Track With Dynamic Message Passing Neural Network for Multi-Camera Multi-Object Tracking. IEEE Access 12 (63317-63333). DOI: 10.1109/ACCESS.2024.3383138. Online publication date: 2024
  • (2024) Multi-view domain-adaptive representation learning for EEG-based emotion recognition. Information Fusion 104:C. DOI: 10.1016/j.inffus.2023.102156. Online publication date: 12-Apr-2024
  • (2024) An end-to-end tracking framework via multi-view and temporal feature aggregation. Computer Vision and Image Understanding 249 (104203). DOI: 10.1016/j.cviu.2024.104203. Online publication date: Dec-2024
