Abstract
In recent years, object detection and data association, the core components of multi-object tracking, have made remarkable progress. The dominant strategy in multi-object tracking is tracking-by-detection. Although detection-based tracking can achieve strong results, it relies on the performance of the detector. In complex scenes, the detector cannot provide reliable results, and incorrect detections in turn make the data association process unreliable. Motivated by this, this paper focuses on improving the accuracy of both detection and data association. We introduce an efficient channel attention module into the backbone network, which adaptively extracts important information from images. Furthermore, we apply switchable atrous convolution in the network to dynamically adjust the receptive field according to changes in the objects. In the data association process, the appearance features with minimum occlusion are saved for each existing trajectory and are used for re-association after objects are lost. Extensive experiments on the challenging MOT16, MOT17 and MOT20 datasets demonstrate that our method is comparable with state-of-the-art multi-object tracking methods.
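As a rough illustration of the efficient channel attention idea mentioned above, the following pure-Python sketch pools each channel to a single value, runs a small 1D convolution across neighbouring channels, and rescales every channel by the resulting sigmoid weight. It is a minimal sketch under stated assumptions, not the implementation used in the paper: the function names are hypothetical, the kernel is passed in as a fixed list (in a real network it would be learned), and the kernel-size rule follows the adaptive formula from the ECA-Net paper with its default values gamma=2 and b=1.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size from ECA-Net: nearest odd value of (log2(C) + b) / gamma."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def eca_weights(feature_map, kernel):
    """Return one attention weight per channel.

    feature_map: list of C channels, each an H x W nested list of floats.
    kernel: odd-length list of 1D convolution weights (hypothetical, fixed here).
    """
    # Global average pooling: one scalar per channel.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Zero-pad so the output has the same number of channels as the input.
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    weights = []
    for c in range(len(pooled)):
        # Local cross-channel interaction over k neighbouring channels.
        s = sum(kernel[j] * padded[c + j] for j in range(k))
        weights.append(1.0 / (1.0 + math.exp(-s)))  # sigmoid gate
    return weights


def eca_apply(feature_map, kernel):
    """Rescale every channel of the feature map by its attention weight."""
    weights = eca_weights(feature_map, kernel)
    return [
        [[value * w for value in row] for row in channel]
        for channel, w in zip(feature_map, weights)
    ]


if __name__ == "__main__":
    fm = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
    print(eca_kernel_size(256))
    print(eca_weights(fm, [0.1, 0.5, 0.1]))
```

Because the gate depends only on a handful of neighbouring channel averages, the module adds very few parameters, which is the property that makes it attractive as a lightweight addition to a detection backbone.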
This work was supported in part by the CAAI-Huawei MindSpore Open Fund, in part by the National Natural Science Foundation of China under Grant 61401113, in part by the Natural Science Foundation of Heilongjiang Province of China under Grant LC201426, and in part by the Fundamental Research Funds for the Central Universities of China under Grant 3072021CF0801.
Cite this article
Xiang, X., Ren, W., Qiu, Y. et al. Multi-object Tracking Method Based on Efficient Channel Attention and Switchable Atrous Convolution. Neural Process Lett 53, 2747–2763 (2021). https://doi.org/10.1007/s11063-021-10519-5
DOI: https://doi.org/10.1007/s11063-021-10519-5