Deep Human-Interaction and Association by Graph-Based Learning for Multiple Object Tracking in the Wild

Cong Ma¹,
Fan Yang¹,
Yuan Li¹ (ORCID: orcid.org/0000-0002-8479-3049),
Huizhu Jia¹,
Xiaodong Xie¹ &
Wen Gao¹

Abstract

Multiple Object Tracking (MOT) in the wild has a wide range of applications in surveillance retrieval and autonomous driving. Tracking-by-detection, which consists of feature extraction and data association, has become the mainstream paradigm for MOT. Most existing methods focus on extracting the individual features of each target and optimize the association with hand-crafted algorithms. In this paper, we explicitly consider the interrelation cues between targets and propose a Human-Interaction Model (HIM) that extracts interaction features between a tracked target and its surroundings. These interaction features are more discriminative for distinguishing objects, especially in crowded scenes. We also propose an efficient end-to-end model, the Deep Association Network (DAN), which optimizes the association with a graph-based learning mechanism. Both HIM and DAN are built from three kinds of deep networks: Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Graph Neural Networks (GNNs). The CNNs extract appearance features from bounding-box images, and the RNNs encode motion features from the historical positions of each trajectory. The GNNs then extract interaction features and optimize the graph structure to associate objects across frames. In addition, we present a novel end-to-end training strategy for DAN and HIM. Experimental results show that our method achieves state-of-the-art performance on the MOT15, MOT16, and DukeMTMCT datasets.
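To make the tracking-by-detection pipeline described above concrete, here is a minimal, hypothetical sketch of one association step between existing tracks and new detections. It is not the authors' HIM or DAN. Every component is a hand-set stand-in: cosine similarity on given feature vectors replaces CNN appearance embeddings, constant-velocity extrapolation replaces the RNN motion encoder, a single parameter-free neighbour-averaging step replaces a learned GNN interaction layer, and exhaustive search replaces the learned graph-based association. The function names (`interaction_features`, `predict_constant_velocity`, `best_assignment`, `associate`) and the parameters `radius`, `motion_weight`, and `threshold` are illustrative assumptions.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def interaction_features(feats: List[List[float]], centers: List[Point],
                         radius: float) -> List[List[float]]:
    """Hypothetical, parameter-free stand-in for one GNN message-passing step:
    append to each object's feature the mean feature of all other objects whose
    centres lie within `radius` in the same frame (zeros if there are none)."""
    if not feats:
        return []
    dim = len(feats[0])
    out = []
    for i, (fi, ci) in enumerate(zip(feats, centers)):
        neigh = [feats[j] for j, cj in enumerate(centers)
                 if j != i and math.dist(ci, cj) <= radius]
        ctx = [sum(col) / len(neigh) for col in zip(*neigh)] if neigh else [0.0] * dim
        out.append(fi + ctx)  # [own appearance | neighbourhood context]
    return out

def predict_constant_velocity(history: List[Point]) -> Point:
    """Stand-in for an RNN motion encoder: extrapolate the last displacement."""
    if len(history) < 2:
        return history[-1]
    (x0, y0), (x1, y1) = history[-2], history[-1]
    return (2 * x1 - x0, 2 * y1 - y0)

def best_assignment(aff: List[List[float]], threshold: float) -> List[Tuple[int, int]]:
    """Exhaustive one-to-one matching that maximises total affinity; pairs whose
    affinity is below `threshold` stay unmatched. Exponential cost, tiny inputs only."""
    n_rows, n_cols = len(aff), (len(aff[0]) if aff else 0)
    if n_rows <= n_cols:
        candidates = ([(r, c) for r, c in enumerate(p)]
                      for p in itertools.permutations(range(n_cols), n_rows))
    else:
        candidates = ([(r, c) for c, r in enumerate(p)]
                      for p in itertools.permutations(range(n_rows), n_cols))
    best_score, best_pairs = 0.0, []
    for pairs in candidates:
        kept = [(r, c) for r, c in pairs if aff[r][c] >= threshold]
        score = sum(aff[r][c] for r, c in kept)
        if score > best_score:
            best_score, best_pairs = score, kept
    return sorted(best_pairs)

def associate(track_feats, track_hist, det_feats, det_centers,
              radius=5.0, motion_weight=0.1, threshold=0.5):
    """Match existing tracks to new detections; returns (track, detection) pairs."""
    t_aug = interaction_features(track_feats, [h[-1] for h in track_hist], radius)
    d_aug = interaction_features(det_feats, det_centers, radius)
    predicted = [predict_constant_velocity(h) for h in track_hist]
    # Affinity = appearance+context similarity minus a penalty on motion error.
    aff = [[cosine(ta, da) - motion_weight * math.dist(p, dc)
            for da, dc in zip(d_aug, det_centers)]
           for ta, p in zip(t_aug, predicted)]
    return best_assignment(aff, threshold)

if __name__ == "__main__":
    tracks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    hist = [[(0.0, 0.0), (1.0, 0.0)], [(20.0, 0.0), (19.0, 0.0)]]
    dets = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]  # detections arrive in swapped order
    centers = [(18.0, 0.0), (2.0, 0.0)]
    print(associate(tracks, hist, dets, centers))  # prints [(0, 1), (1, 0)]
```

In a real system the affinity terms would be learned end to end, as the abstract describes. The exhaustive matcher would also be replaced by a learned association or, for a hand-crafted baseline, by the Hungarian algorithm (for example SciPy's `scipy.optimize.linear_sum_assignment`).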

Author information

Authors and Affiliations

National Engineering Laboratory for Video Technology, Peking University, Beijing, China
Cong Ma, Fan Yang, Yuan Li, Huizhu Jia, Xiaodong Xie & Wen Gao

Corresponding author

Correspondence to Yuan Li.

Additional information

Communicated by Mei Chen.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ma, C., Yang, F., Li, Y. et al. Deep Human-Interaction and Association by Graph-Based Learning for Multiple Object Tracking in the Wild. Int J Comput Vis 129, 1993–2010 (2021). https://doi.org/10.1007/s11263-021-01460-0

Received: 20 December 2019
Accepted: 20 March 2021
Published: 19 April 2021
Issue Date: June 2021
DOI: https://doi.org/10.1007/s11263-021-01460-0
