Benchmarking 2D Multi-Object Detection and Tracking Algorithms in Autonomous Vehicle Driving Scenarios
Figure 1. General architecture of a multi-object tracking method. The object detector analyzes the frames, localizing the objects. For each object, the feature extraction module extracts the descriptors while the classifier predicts the class. The previously tracked objects are projected into the current frame by the motion predictor. The data association module matches the detected objects with the previously tracked ones in order to provide the matched pairs. The similarity function and the assignment algorithm completely characterize the approaches: the former estimates the similarity between tracked and detected objects by using appearance-based and position-based features, while the latter defines a strategy to associate them. Finally, the track management module defines and applies the rules for new and expired objects. (A minimal code sketch of this pipeline is given after the figure captions.)
Figure 2. Samples from the BDD100K validation set. The dataset has been collected under different illumination and weather conditions. (a) Daylight, cloudy. (b) Daylight, rainy. (c) Night.
Figure 3. Examples of a bicycle and a motorcycle and their riders from the BDD100K dataset. (a) Bicycle and Rider. (b) Motorcycle and Rider.
Figure 4. Comparison between the considered methods in terms of mMOTA, mIDF1, mHOTA and TETA. Each line represents a metric whose value is reported on the y-axis, while the methods are scattered on the x-axis. For all the metrics, the superiority of ConvNext, SwinT and HRNet combined with QDTrack is evident.
Figure 5. Ranking of the considered methods in terms of mMOTA, mIDF1, mHOTA and TETA. Each line represents a method whose position in the ranking is reported on the y-axis, while the metrics are scattered on the x-axis. ConvNext, SwinT and HRNet combined with QDTrack always keep the top three positions.
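To make the pipeline of Figure 1 concrete, the following is a minimal sketch of a generic tracking-by-detection association step. It is illustrative only: the function names, the equal weighting of appearance and position cues, and the cost threshold are assumptions for the example, not the configuration used by the benchmarked trackers.

```python
# Minimal sketch of the data association step of Figure 1 (illustrative only).
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)


def associate(track_boxes, det_boxes, track_feats, det_feats, max_cost=0.7):
    """Match projected tracks to detections with a mixed appearance/position cost."""
    cost = np.ones((len(track_boxes), len(det_boxes)))
    for i, (tb, tf) in enumerate(zip(track_boxes, track_feats)):
        for j, (db, df) in enumerate(zip(det_boxes, det_feats)):
            appearance = np.dot(tf, df) / (np.linalg.norm(tf) * np.linalg.norm(df) + 1e-9)
            cost[i, j] = 1.0 - 0.5 * (iou(tb, db) + appearance)  # similarity function
    rows, cols = linear_sum_assignment(cost)  # assignment algorithm (Hungarian)
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] < max_cost]
```

Unmatched detections would then start new tracks, while unmatched tracks would be aged and eventually expired by the track management module.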
Abstract
1. Introduction
- The most promising modern methods for multi-object detection (RetinaNet [25], EfficientDet [26], YOLOv5 [27], YOLOX [28], HRNet [29], Swin Transformer [30], ConvNext [31]) and tracking (DeepSORT [32], UniTrack [33], QDTrack [18]), together with a multi-task approach (FairMOT [34]), have been adapted and fine-tuned to detect and track the objects of interest, and evaluated in all their possible combinations, obtaining 22 new MOT algorithms for the context of autonomous vehicle driving.
- The considered multi-object detectors are evaluated separately in terms of Precision, Recall, F-Score and mIoU, also analyzing the Recall and the classification accuracy for each class of interest (the standard definitions are recalled in a short snippet after this list). This analysis, neglected in similar papers, is very useful for determining the impact of this module on the overall performance of the multi-object tracking methods.
- The considered multi-object tracking approaches are evaluated by computing not only the standard CLEAR [35], IDF1 [36] and HOTA [37] metrics, but also the recent TETA [23]. In this way, we can separately analyze the influence of the detector and of the data association strategy on the overall performance of the multi-object tracking methods, also with class-agnostic matching strategies.
- The analysis carried out in the proposed experimental framework makes it possible to identify the limitations of the metrics and of the multi-object detection and tracking algorithms applied in the context of self-driving vehicles, as well as the need for a framework to simulate the impact of perception errors on autonomous navigation, indicating useful future research directions.
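For reference, the detection-level scores mentioned above follow the standard definitions; the snippet below is a minimal illustration of how Precision, Recall and F-Score are obtained from true positive, false positive and false negative counts, and does not reproduce the authors' evaluation code.

```python
def precision_recall_fscore(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard detection metrics; the F-Score is the harmonic mean of P and R."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fscore = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, fscore
```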
2. Related Works
3. Experimental Framework
3.1. Dataset
3.2. Object Detectors
3.3. Object Tracking
3.4. Evaluation Metrics
3.4.1. Detection
3.4.2. Classification
3.4.3. Tracking
- TPA(p): True Positive Associations, the set of true positives whose predicted and ground-truth identifiers are the same as those of p.
- FNA(p): False Negative Associations, the set of true positives whose ground-truth identifier is the same as that of p but whose predicted identifier is different.
- FPA(p): False Positive Associations, the set of true positives whose predicted identifier is the same as that of p but whose ground-truth identifier is different (see the formulas right after this list).
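For completeness, these sets enter the association and overall scores of HOTA [37] as follows (the formulas are recalled from the original definition and use the notation above):

$$\mathcal{A}(p)=\frac{|TPA(p)|}{|TPA(p)|+|FNA(p)|+|FPA(p)|},\qquad \mathrm{AssA}_{\alpha}=\frac{1}{|TP|}\sum_{p\in TP}\mathcal{A}(p),\qquad \mathrm{HOTA}_{\alpha}=\sqrt{\mathrm{DetA}_{\alpha}\cdot \mathrm{AssA}_{\alpha}}$$

where DetA_α is the detection accuracy at localization threshold α and the final HOTA score averages HOTA_α over a range of thresholds.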
3.5. Experimental Setup
4. Results
4.1. Detection Results
4.2. Classification Results
4.3. Tracking Results
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Ahangar, M.N.; Ahmed, Q.Z.; Khan, F.A.; Hafeez, M. A survey of autonomous vehicles: Enabling communication technologies and challenges. Sensors 2021, 21, 706. [Google Scholar] [CrossRef] [PubMed]
- Hakak, S.; Gadekallu, T.R.; Maddikunta, P.K.R.; Ramu, S.P.; Parimala, M.; De Alwis, C.; Liyanage, M. Autonomous Vehicles in 5G and beyond: A Survey. Veh. Commun. 2022, 39, 100551. [Google Scholar] [CrossRef]
- Butt, F.A.; Chattha, J.N.; Ahmad, J.; Zia, M.U.; Rizwan, M.; Naqvi, I.H. On the integration of enabling wireless technologies and sensor fusion for next-generation connected and autonomous vehicles. IEEE Access 2022, 10, 14643–14668. [Google Scholar] [CrossRef]
- Yurtsever, E.; Lambert, J.; Carballo, A.; Takeda, K. A survey of autonomous driving: Common practices and emerging technologies. IEEE Access 2020, 8, 58443–58469. [Google Scholar] [CrossRef]
- Tampuu, A.; Matiisen, T.; Semikin, M.; Fishman, D.; Muhammad, N. A survey of end-to-end driving: Architectures and training methods. IEEE Trans. Neural Netw. Learn. Syst. 2020, 33, 1364–1384. [Google Scholar] [CrossRef] [PubMed]
- Prakash, A.; Chitta, K.; Geiger, A. Multi-modal fusion transformer for end-to-end autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 7077–7087. [Google Scholar]
- Greco, A.; Rundo, L.; Saggese, A.; Vento, M.; Vicinanza, A. Imitation Learning for Autonomous Vehicle Driving: How Does the Representation Matter? In Proceedings of the International Conference on Image Analysis and Processing (ICIAP), Lecce, Italy, 23 May 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 15–26. [Google Scholar]
- Tampuu, A.; Aidla, R.; van Gent, J.A.; Matiisen, T. Lidar-as-camera for end-to-end driving. Sensors 2023, 23, 2845. [Google Scholar] [CrossRef] [PubMed]
- Alaba, S.Y.; Ball, J.E. A survey on deep-learning-based lidar 3d object detection for autonomous driving. Sensors 2022, 22, 9577. [Google Scholar] [CrossRef]
- Ravindran, R.; Santora, M.J.; Jamali, M.M. Multi-object detection and tracking, based on DNN, for autonomous vehicles: A review. IEEE Sensors J. 2020, 21, 5668–5677. [Google Scholar] [CrossRef]
- Greco, A.; Saggese, A.; Vento, M.; Vigilante, V. Vehicles Detection for Smart Roads Applications on Board of Smart Cameras: A Comparative Analysis. IEEE Trans. Intell. Transp. Syst. 2021, 23, 8077–8089. [Google Scholar] [CrossRef]
- Li, J.; Ding, Y.; Wei, H.L.; Zhang, Y.; Lin, W. SimpleTrack: Rethinking and Improving the JDE Approach for Multi-Object Tracking. Sensors 2022, 22, 5863. [Google Scholar] [CrossRef] [PubMed]
- Lu, Z.; Rathod, V.; Votel, R.; Huang, J. Retinatrack: Online single stage joint detection and tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 14668–14678. [Google Scholar]
- Zaidi, S.S.A.; Ansari, M.S.; Aslam, A.; Kanwal, N.; Asghar, M.; Lee, B. A survey of modern deep learning based object detection models. Digit. Signal Process. 2022, 126, 103514. [Google Scholar] [CrossRef]
- Su, H.; Qi, W.; Schmirander, Y.; Ovur, S.E.; Cai, S.; Xiong, X. A human activity-aware shared control solution for medical human–robot interaction. Assem. Autom. 2022, 42, 388–394. [Google Scholar] [CrossRef]
- Qi, W.; Ovur, S.E.; Li, Z.; Marzullo, A.; Song, R. Multi-sensor guided hand gesture recognition for a teleoperated robot using a recurrent neural network. IEEE Robot. Autom. Lett. 2021, 6, 6039–6045. [Google Scholar] [CrossRef]
- Carletti, V.; Greco, A.; Saggese, A.; Vento, M. Multi-object tracking by flying cameras based on a forward-backward interaction. IEEE Access 2018, 6, 43905–43919. [Google Scholar] [CrossRef]
- Pang, J.; Qiu, L.; Li, X.; Chen, H.; Li, Q.; Darrell, T.; Yu, F. Quasi-dense similarity learning for multiple object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 164–173. [Google Scholar]
- Carletti, V.; Foggia, P.; Greco, A.; Saggese, A.; Vento, M. Automatic detection of long term parked cars. In Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Karlsruhe, Germany, 25–28 August 2015; pp. 1–6. [Google Scholar]
- Sun, P.; Kretzschmar, H.; Dotiwalla, X.; Chouard, A.; Patnaik, V.; Tsui, P.; Guo, J.; Zhou, Y.; Chai, Y.; Caine, B.; et al. Scalability in perception for autonomous driving: Waymo open dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2446–2454. [Google Scholar]
- Yu, F.; Chen, H.; Wang, X.; Xian, W.; Chen, Y.; Liu, F.; Madhavan, V.; Darrell, T. Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2636–2645. [Google Scholar]
- Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. Bytetrack: Multi-object tracking by associating every detection box. arXiv 2021, arXiv:2110.06864. [Google Scholar]
- Li, S.; Danelljan, M.; Ding, H.; Huang, T.E.; Yu, F. Tracking Every Thing in the Wild. In European Conference on Computer Vision (ECCV); Springer: Berlin/Heidelberg, Germany, 2022; pp. 498–515. [Google Scholar]
- Yan, B.; Jiang, Y.; Sun, P.; Wang, D.; Yuan, Z.; Luo, P.; Lu, H. Towards grand unification of object tracking. In European Conference on Computer Vision (ECCV); Springer: Berlin/Heidelberg, Germany, 2022; pp. 733–751. [Google Scholar]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790. [Google Scholar]
- Jocher, G.; Chaurasia, A.; Stoken, A.; Borovec, J.; NanoCode012; Kwon, Y.; Michael, K.; TaoXie; Fang, J.; imyhxy; et al. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. 2022. Available online: https://zenodo.org/record/7347926#.ZDZQX3ZBw2w (accessed on 1 January 2023).
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar]
- Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Deng, C.; Zhao, Y.; Liu, D.; Mu, Y.; Tan, M.; Wang, X.; et al. Deep High-Resolution Representation Learning for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 3349–3364. [Google Scholar] [CrossRef]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021. [Google Scholar]
- Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022. [Google Scholar]
- Wojke, N.; Bewley, A.; Paulus, D. Simple online and realtime tracking with a deep association metric. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3645–3649. [Google Scholar]
- Wang, Z.; Zhao, H.; Li, Y.L.; Wang, S.; Torr, P.; Bertinetto, L. Do Different Tracking Tasks Require Different Appearance Models? Adv. Neural Inf. Process. Syst. 2021, 34, 726–738. [Google Scholar]
- Zhang, Y.; Wang, C.; Wang, X.; Zeng, W.; Liu, W. Fairmot: On the fairness of detection and re-identification in multiple object tracking. Int. J. Comput. Vis. 2021, 129, 3069–3087. [Google Scholar] [CrossRef]
- Bernardin, K.; Stiefelhagen, R. Evaluating multiple object tracking performance: The clear mot metrics. EURASIP J. Image Video Process. 2008, 2008, 1–10. [Google Scholar] [CrossRef]
- Ristani, E.; Solera, F.; Zou, R.; Cucchiara, R.; Tomasi, C. Performance measures and a data set for multi-target, multi-camera tracking. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016; pp. 17–35. [Google Scholar]
- Luiten, J.; Osep, A.; Dendorfer, P.; Torr, P.; Geiger, A.; Leal-Taixé, L.; Leibe, B. Hota: A higher order metric for evaluating multi-object tracking. Int. J. Comput. Vis. 2021, 129, 548–578. [Google Scholar] [CrossRef] [PubMed]
- Ciaparrone, G.; Sánchez, F.L.; Tabik, S.; Troiano, L.; Tagliaferri, R.; Herrera, F. Deep learning in video multi-object tracking: A survey. Neurocomputing 2020, 381, 61–88. [Google Scholar] [CrossRef]
- Guo, S.; Wang, S.; Yang, Z.; Wang, L.; Zhang, H.; Guo, P.; Gao, Y.; Guo, J. A Review of Deep Learning-Based Visual Multi-Object Tracking Algorithms for Autonomous Driving. Appl. Sci. 2022, 12, 10741. [Google Scholar] [CrossRef]
- Pal, S.K.; Pramanik, A.; Maiti, J.; Mitra, P. Deep learning in multi-object detection and tracking: State of the art. Appl. Intell. 2021, 51, 6400–6429. [Google Scholar] [CrossRef]
- Rakai, L.; Song, H.; Sun, S.; Zhang, W.; Yang, Y. Data association in multiple object tracking: A survey of recent techniques. Expert Syst. Appl. 2022, 192, 116300. [Google Scholar] [CrossRef]
- Wang, Z.; Zheng, L.; Liu, Y.; Li, Y.; Wang, S. Towards real-time multi-object tracking. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2020; pp. 107–122. [Google Scholar]
- Zeng, F.; Dong, B.; Zhang, Y.; Wang, T.; Zhang, X.; Wei, Y. Motr: End-to-end multiple-object tracking with transformer. In European Conference on Computer Vision (ECCV); Springer: Berlin/Heidelberg, Germany, 2022; pp. 659–675. [Google Scholar]
- Chu, P.; Wang, J.; You, Q.; Ling, H.; Liu, Z. Transmot: Spatial-temporal graph transformer for multiple object tracking. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 2–7 January 2023; pp. 4870–4880. [Google Scholar]
- Meinhardt, T.; Kirillov, A.; Leal-Taixe, L.; Feichtenhofer, C. Trackformer: Multi-object tracking with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8844–8854. [Google Scholar]
- Milan, A.; Leal-Taixé, L.; Reid, I.; Roth, S.; Schindler, K. MOT16: A benchmark for multi-object tracking. arXiv 2016, arXiv:1603.00831. [Google Scholar]
- Pereira, R.; Carvalho, G.; Garrote, L.; Nunes, U.J. Sort and deep-SORT based multi-object tracking for mobile robotics: Evaluation with new data association metrics. Appl. Sci. 2022, 12, 1319. [Google Scholar] [CrossRef]
- Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; Upcroft, B. Simple online and realtime tracking. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3464–3468. [Google Scholar]
- Du, Y.; Zhao, Z.; Song, Y.; Zhao, Y.; Su, F.; Gong, T.; Meng, H. StrongSORT: Make DeepSORT Great Again. IEEE Trans. Multimed. 2023, 1–14. [Google Scholar] [CrossRef]
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361. [Google Scholar]
- Dendorfer, P.; Rezatofighi, H.; Milan, A.; Shi, J.; Cremers, D.; Reid, I.; Roth, S.; Schindler, K.; Leal-Taixé, L. Mot20: A benchmark for multi object tracking in crowded scenes. arXiv 2020, arXiv:2003.09003. [Google Scholar]
- Bergmann, P.; Meinhardt, T.; Leal-Taixe, L. Tracking Without Bells and Whistles. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
- Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Luiten, J.; Hoffhues, A. TrackEval. 2020. Available online: https://github.com/JonathonLuiten/TrackEval (accessed on 1 January 2023).
Detection results of the considered detectors (Section 4.1): overall Precision, Recall, F-Score and mIoU, together with the Recall for each class of interest.

Detector | Precision | Recall | F-Score | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|
SwinT | 0.73 | 0.82 | 0.78 | 0.84 | 0.74 | 0.70 | 0.85 | 0.76 | 0.80 | 0.63 | 0.58 |
ConvNext | 0.71 | 0.83 | 0.77 | 0.84 | 0.75 | 0.71 | 0.85 | 0.78 | 0.81 | 0.67 | 0.58 |
HRNet | 0.74 | 0.81 | 0.77 | 0.83 | 0.72 | 0.71 | 0.84 | 0.76 | 0.79 | 0.66 | 0.57 |
YOLOX | 0.75 | 0.79 | 0.77 | 0.84 | 0.68 | 0.64 | 0.83 | 0.71 | 0.73 | 0.59 | 0.53 |
YOLOv5 | 0.74 | 0.78 | 0.76 | 0.84 | 0.65 | 0.56 | 0.81 | 0.71 | 0.72 | 0.60 | 0.55 |
FairMOT | 0.93 | 0.56 | 0.70 | 0.80 | 0.43 | 0.30 | 0.60 | 0.48 | 0.42 | 0.15 | 0.20 |
EfficientDet | 0.71 | 0.64 | 0.67 | 0.82 | 0.45 | 0.41 | 0.69 | 0.57 | 0.57 | 0.47 | 0.30 |
RetinaNet | 0.58 | 0.74 | 0.65 | 0.81 | 0.62 | 0.60 | 0.77 | 0.67 | 0.71 | 0.51 | 0.48 |

Classification accuracy of the considered detectors for each class of interest (Section 4.2).

Detector | | | | | | | | |
---|---|---|---|---|---|---|---|---|
ConvNext | 0.97 | 0.99 | 0.68 | 1.00 | 0.68 | 0.78 | 0.78 | 0.95 |
SwinT | 0.97 | 0.99 | 0.67 | 1.00 | 0.68 | 0.79 | 0.79 | 0.94 |
YOLOX | 0.97 | 0.99 | 0.65 | 1.00 | 0.66 | 0.79 | 0.81 | 0.94 |
YOLOv5 | 0.97 | 0.99 | 0.65 | 1.00 | 0.66 | 0.78 | 0.74 | 0.95 |
HRNet | 0.97 | 0.99 | 0.65 | 0.99 | 0.68 | 0.76 | 0.79 | 0.94 |
FairMOT | 0.97 | 1.00 | 0.54 | 1.00 | 0.66 | 0.76 | 0.94 | 0.95 |
EfficientDet | 0.97 | 0.99 | 0.55 | 1.00 | 0.59 | 0.74 | 0.79 | 0.91 |
RetinaNet | 0.96 | 0.97 | 0.52 | 0.99 | 0.58 | 0.68 | 0.72 | 0.88 |
Tracking results in terms of the CLEAR metrics [35] (Section 4.3), sorted by decreasing mMOTA.

Detector | Tracker | mMOTA | mMOTP | FN | FP | ID Sw. |
---|---|---|---|---|---|---|
ConvNext | QDTrack | 0.42 | 0.83 | 137,827 | 19,160 | 7262 |
SwinT | QDTrack | 0.42 | 0.83 | 141,444 | 17,215 | 5222 |
HRNet | QDTrack | 0.41 | 0.83 | 135,707 | 20,572 | 5585 |
ConvNext | UniTrack | 0.30 | 0.77 | 180,729 | 22,821 | 38,772 |
HRNet | UniTrack | 0.29 | 0.77 | 188,400 | 22,706 | 35,820 |
RetinaNet | QDTrack | 0.28 | 0.83 | 207,003 | 19,715 | 8326 |
YOLOX | QDTrack | 0.28 | 0.86 | 229,427 | 5043 | 4051 |
SwinT | UniTrack | 0.28 | 0.77 | 207,309 | 16,980 | 21,608 |
YOLOX | UniTrack | 0.27 | 0.78 | 210,391 | 11,065 | 32,710 |
YOLOv5 | UniTrack | 0.24 | 0.79 | 235,107 | 7168 | 27,355 |
FairMOT | FairMOT | 0.23 | 0.77 | 190,537 | 21,262 | 43,140 |
RetinaNet | UniTrack | 0.21 | 0.77 | 234,119 | 19,524 | 30,992 |
YOLOv5 | QDTrack | 0.20 | 0.89 | 254,542 | 3823 | 2665 |
YOLOv5 | DeepSORT | 0.19 | 0.75 | 183,116 | 63,681 | 18,814 |
HRNet | DeepSORT | 0.16 | 0.74 | 156,518 | 98,360 | 23,299 |
SwinT | DeepSORT | 0.16 | 0.74 | 156,647 | 98,523 | 23,307 |
YOLOX | DeepSORT | 0.16 | 0.75 | 165,816 | 76,382 | 21,065 |
ConvNext | DeepSORT | 0.15 | 0.74 | 153,056 | 103,642 | 22,995 |
EfficientDet | DeepSORT | 0.14 | 0.77 | 272,843 | 36,474 | 9998 |
EfficientDet | QDTrack | 0.14 | 0.87 | 305,060 | 5244 | 2379 |
EfficientDet | UniTrack | 0.14 | 0.80 | 314,947 | 3784 | 15,085 |
RetinaNet | DeepSORT | 0.04 | 0.74 | 199,032 | 95,992 | 22,230 |
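As a reminder of how the error counts above enter the headline score, the formula below is the standard CLEAR definition [35], recalled here for reference; mMOTA averages MOTA over the classes of interest:

$$\mathrm{MOTA}=1-\frac{FN+FP+IDSW}{\mathrm{GT}}$$

where GT is the total number of ground-truth boxes.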
Tracking results in terms of the identity-based metrics [36] (Section 4.3), sorted by decreasing mIDF1.

Detector | Tracker | mIDF1 | mIDR | mIDP |
---|---|---|---|---|
ConvNext | QDTrack | 0.56 | 0.43 | 0.82 |
SwinT | QDTrack | 0.55 | 0.42 | 0.83 |
HRNet | QDTrack | 0.54 | 0.41 | 0.82 |
HRNet | DeepSORT | 0.41 | 0.34 | 0.52 |
SwinT | DeepSORT | 0.41 | 0.34 | 0.52 |
RetinaNet | QDTrack | 0.41 | 0.29 | 0.76 |
ConvNext | UniTrack | 0.41 | 0.30 | 0.67 |
ConvNext | DeepSORT | 0.40 | 0.33 | 0.51 |
YOLOX | QDTrack | 0.39 | 0.27 | 0.83 |
YOLOv5 | DeepSORT | 0.38 | 0.29 | 0.58 |
YOLOX | DeepSORT | 0.38 | 0.30 | 0.54 |
HRNet | UniTrack | 0.36 | 0.27 | 0.61 |
SwinT | UniTrack | 0.36 | 0.26 | 0.66 |
YOLOX | UniTrack | 0.36 | 0.26 | 0.69 |
FairMOT | FairMOT | 0.35 | 0.24 | 0.69 |
YOLOv5 | UniTrack | 0.33 | 0.22 | 0.74 |
RetinaNet | UniTrack | 0.32 | 0.22 | 0.63 |
RetinaNet | DeepSORT | 0.30 | 0.24 | 0.43 |
YOLOv5 | QDTrack | 0.30 | 0.20 | 0.88 |
EfficientDet | DeepSORT | 0.28 | 0.19 | 0.62 |
EfficientDet | QDTrack | 0.23 | 0.15 | 0.82 |
EfficientDet | UniTrack | 0.22 | 0.14 | 0.74 |
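For reference, the identity-based metrics [36] are defined from identity-matched true positives (IDTP), false positives (IDFP) and false negatives (IDFN); the relations below are the standard definitions and, since the table reports per-class averages, they hold only approximately across its columns:

$$\mathrm{IDP}=\frac{IDTP}{IDTP+IDFP},\qquad \mathrm{IDR}=\frac{IDTP}{IDTP+IDFN},\qquad \mathrm{IDF1}=\frac{2\,\mathrm{IDP}\cdot\mathrm{IDR}}{\mathrm{IDP}+\mathrm{IDR}}$$

For example, in the first row 2 · 0.82 · 0.43 / (0.82 + 0.43) ≈ 0.56, consistent with the reported mIDF1.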
Tracking results in terms of HOTA [37] (Section 4.3), sorted by decreasing mHOTA.

Detector | Tracker | mHOTA | mDetA | mAssA |
---|---|---|---|---|
ConvNext | QDTrack | 0.46 | 0.39 | 0.55 |
SwinT | QDTrack | 0.45 | 0.38 | 0.55 |
HRNet | QDTrack | 0.44 | 0.37 | 0.54 |
YOLOX | QDTrack | 0.35 | 0.27 | 0.47 |
RetinaNet | QDTrack | 0.35 | 0.28 | 0.44 |
ConvNext | DeepSORT | 0.34 | 0.27 | 0.44 |
SwinT | DeepSORT | 0.34 | 0.28 | 0.45 |
HRNet | DeepSORT | 0.34 | 0.27 | 0.45 |
ConvNext | UniTrack | 0.34 | 0.30 | 0.40 |
YOLOv5 | DeepSORT | 0.32 | 0.25 | 0.44 |
YOLOX | DeepSORT | 0.32 | 0.25 | 0.44 |
YOLOX | UniTrack | 0.31 | 0.27 | 0.39 |
HRNet | UniTrack | 0.31 | 0.29 | 0.34 |
SwinT | UniTrack | 0.31 | 0.26 | 0.37 |
FairMOT | FairMOT | 0.29 | 0.24 | 0.39 |
YOLOv5 | UniTrack | 0.29 | 0.23 | 0.39 |
RetinaNet | UniTrack | 0.28 | 0.23 | 0.35 |
RetinaNet | DeepSORT | 0.27 | 0.21 | 0.38 |
EfficientDet | DeepSORT | 0.26 | 0.17 | 0.40 |
YOLOv5 | QDTrack | 0.25 | 0.17 | 0.38 |
EfficientDet | QDTrack | 0.23 | 0.14 | 0.39 |
EfficientDet | UniTrack | 0.21 | 0.14 | 0.34 |
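Assuming the two component columns are the class-averaged detection and association accuracies as labelled above, the values can be sanity-checked against the geometric-mean relation of HOTA [37]; the per-class averaging makes the check approximate. A minimal example, with the first-row values hard-coded purely for illustration:

```python
import math

m_det_a, m_ass_a = 0.39, 0.55  # first row of the table (ConvNext + QDTrack)
m_hota = math.sqrt(m_det_a * m_ass_a)  # HOTA is the geometric mean of DetA and AssA
print(round(m_hota, 2))  # 0.46, matching the reported mHOTA
```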
Tracking results in terms of TETA [23] (Section 4.3), sorted by decreasing TETA.

Detector | Tracker | TETA | LocA | AssocA | ClsA |
---|---|---|---|---|---|
ConvNext | QDTrack | 0.49 | 0.41 | 0.46 | 0.60 |
SwinT | QDTrack | 0.49 | 0.39 | 0.46 | 0.61 |
HRNet | QDTrack | 0.48 | 0.39 | 0.46 | 0.59 |
YOLOX | QDTrack | 0.45 | 0.26 | 0.42 | 0.67 |
YOLOv5 | QDTrack | 0.43 | 0.21 | 0.40 | 0.68 |
RetinaNet | QDTrack | 0.41 | 0.29 | 0.36 | 0.57 |
YOLOX | UniTrack | 0.40 | 0.27 | 0.33 | 0.59 |
ConvNext | UniTrack | 0.39 | 0.32 | 0.32 | 0.53 |
YOLOv5 | UniTrack | 0.39 | 0.22 | 0.33 | 0.60 |
EfficientDet | UniTrack | 0.38 | 0.14 | 0.34 | 0.65 |
SwinT | UniTrack | 0.38 | 0.28 | 0.32 | 0.54 |
EfficientDet | QDTrack | 0.37 | 0.15 | 0.35 | 0.61 |
HRNet | UniTrack | 0.37 | 0.31 | 0.30 | 0.52 |
ConvNext | DeepSORT | 0.35 | 0.39 | 0.37 | 0.29 |
RetinaNet | UniTrack | 0.35 | 0.24 | 0.29 | 0.51 |
SwinT | DeepSORT | 0.34 | 0.37 | 0.36 | 0.29 |
HRNet | DeepSORT | 0.34 | 0.37 | 0.35 | 0.29 |
YOLOv5 | DeepSORT | 0.34 | 0.31 | 0.36 | 0.35 |
YOLOX | DeepSORT | 0.34 | 0.35 | 0.36 | 0.32 |
FairMOT | FairMOT | 0.34 | 0.30 | 0.28 | 0.46 |
EfficientDet | DeepSORT | 0.32 | 0.20 | 0.35 | 0.41 |
RetinaNet | DeepSORT | 0.30 | 0.31 | 0.34 | 0.26 |
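TETA [23] decouples localization, association and classification and averages the three sub-scores; the definition below is recalled from the original formulation, and the column labelling used above (LocA, AssocA, ClsA) is an assumption based on that canonical ordering:

$$\mathrm{TETA}=\frac{\mathrm{LocA}+\mathrm{AssocA}+\mathrm{ClsA}}{3}$$

As a consistency check on the first row, (0.41 + 0.46 + 0.60) / 3 ≈ 0.49, matching the reported TETA.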