Multi-Object Pedestrian Tracking Using Improved YOLOv8 and OC-SORT
Figure 1. The structure of the Improved-YOLOv8.
Figure 2. Schematic diagram of the convolutional layer and the GhostNet module [21].
Figure 3. The structure of YOLOv8 with GhostNet: (a) Ghost module and (b) C3Ghost.
Figure 4. Coincident position relationship between the detection box and the prediction box: (a) horizontal overlap and (b) cross overlap.
Figure 5. Training results of the improved YOLOv8 model.
Figure 6. Detection results of the baseline model (left) and the improved model (right).
Figure 7. Heatmap of the baseline model (left) and the improved model (right).
Figure 8. Performance indicators as a function of the alpha value.
Figure 9. Pedestrian tracking effect over three consecutive frames of the video.
Abstract
1. Introduction
- SoftNMS for improved pedestrian detection. In pedestrian detection, occlusion poses a common challenge, and traditional NMS often suppresses valid detections of partially hidden pedestrians, resulting in missed detections. To address this issue, we introduce SoftNMS, which decays the confidence scores of overlapping bounding boxes instead of removing them outright, improving detection accuracy under occlusion (a score-decay sketch follows this list).
- GhostNet for optimized model complexity. Traditional deep neural network models are typically complex and challenging to deploy on resource-constrained devices. In this study, we leverage a GhostNet module to optimize the YOLOv8 architecture: a small set of intrinsic feature maps is produced by ordinary convolution, and the remaining "ghost" maps are generated by cheap linear (depthwise) operations. This reduces parameters and FLOPs while maintaining performance, enabling efficient execution on resource-limited devices (see the Ghost module sketch after this list).
- Integration of the OC-SORT tracking algorithm and a ReID model. We combine the OC-SORT tracking algorithm, using GIoU as its association metric, with a MobileNetV2-based ReID model. OC-SORT handles occlusion effectively, while the ReID model ensures robust identity verification and tracking consistency. By integrating object detection and object tracking, our approach achieves strong pedestrian-tracking performance in complex scenarios (a GIoU sketch follows this list).
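To make the first contribution concrete, below is a minimal NumPy sketch of Gaussian Soft-NMS score decay. It illustrates the technique itself rather than our exact YOLOv8 post-processing code; the function names, the Gaussian penalty, and the default sigma are illustrative choices.

```python
import numpy as np

def iou(box, others):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[2], others[:, 2])
    y2 = np.minimum(box[3], others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area_a + area_b - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay the scores of boxes that overlap the
    current top detection instead of discarding them, so a partially
    occluded pedestrian behind a stronger detection can survive."""
    scores = scores.copy()
    keep, idxs = [], np.arange(len(scores))
    while idxs.size > 0:
        top = idxs[np.argmax(scores[idxs])]   # highest remaining score
        keep.append(top)
        idxs = idxs[idxs != top]
        if idxs.size == 0:
            break
        overlap = iou(boxes[top], boxes[idxs])
        scores[idxs] *= np.exp(-(overlap ** 2) / sigma)  # soft decay
        idxs = idxs[scores[idxs] > score_thresh]         # prune near-zero boxes
    return keep
```

Compared with hard NMS, which zeroes any box whose IoU with the winner exceeds a fixed threshold, the Gaussian decay keeps heavily overlapped but genuine detections alive with a reduced score.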
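Likewise, the following PyTorch sketch shows the core Ghost module idea (intrinsic feature maps from one standard convolution, ghost maps from cheap depthwise operations), following the structure of the reference GhostNet implementation [21]. It is not the exact C3Ghost block wired into our YOLOv8 backbone, and the hyperparameters shown are the common defaults.

```python
import math
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Ghost module: a regular convolution produces a few intrinsic
    feature maps; cheap depthwise operations generate the remaining
    "ghost" maps, cutting parameters and FLOPs roughly by the ratio."""
    def __init__(self, in_ch, out_ch, kernel_size=1, ratio=2, dw_size=3, stride=1):
        super().__init__()
        self.out_ch = out_ch
        init_ch = math.ceil(out_ch / ratio)   # intrinsic maps
        cheap_ch = init_ch * (ratio - 1)      # ghost maps
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, kernel_size, stride,
                      kernel_size // 2, bias=False),
            nn.BatchNorm2d(init_ch),
            nn.ReLU(inplace=True),
        )
        # depthwise conv: one cheap linear filter per intrinsic map
        self.cheap = nn.Sequential(
            nn.Conv2d(init_ch, cheap_ch, dw_size, 1,
                      dw_size // 2, groups=init_ch, bias=False),
            nn.BatchNorm2d(cheap_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        intrinsic = self.primary(x)
        ghost = self.cheap(intrinsic)
        return torch.cat([intrinsic, ghost], dim=1)[:, :self.out_ch]
```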
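Finally, a sketch of the GIoU score used for the detection-prediction association. The formula follows the generalized-IoU definition; its exact placement inside OC-SORT's cost matrix is omitted here.

```python
import torch

def giou(a, b):
    """Generalized IoU between matched box pairs in (x1, y1, x2, y2).

    GIoU = IoU - |C \\ (A U B)| / |C|, where C is the smallest box
    enclosing both A and B; unlike plain IoU, it stays informative
    even when the two boxes do not overlap."""
    inter_x1 = torch.max(a[..., 0], b[..., 0])
    inter_y1 = torch.max(a[..., 1], b[..., 1])
    inter_x2 = torch.min(a[..., 2], b[..., 2])
    inter_y2 = torch.min(a[..., 3], b[..., 3])
    inter = (inter_x2 - inter_x1).clamp(0) * (inter_y2 - inter_y1).clamp(0)

    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    union = area_a + area_b - inter
    iou = inter / union

    # smallest enclosing box C
    c_w = torch.max(a[..., 2], b[..., 2]) - torch.min(a[..., 0], b[..., 0])
    c_h = torch.max(a[..., 3], b[..., 3]) - torch.min(a[..., 1], b[..., 1])
    c_area = c_w * c_h
    return iou - (c_area - union) / c_area
```

Because GIoU is negative for disjoint boxes and grows as the enclosing box shrinks, it still provides a useful matching signal in the horizontal-overlap and cross-overlap cases of Figure 4, where plain IoU saturates at zero.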
2. Related Works
3. Proposed Method
3.1. The Proposed Network Structure
3.1.1. SoftNMS Implementation
3.1.2. GhostNet Module Utilization
3.1.3. GIoU Loss Function
3.1.4. Evaluation Index
- Pedestrian Detection Evaluation: (a) Precision (P): the ratio of correctly predicted positive detections to all predicted positive detections; it gauges the algorithm's ability to minimize false positives. (b) Recall (R): the ratio of correctly predicted positive detections to all actual positive instances in the ground truth; it assesses the algorithm's effectiveness in minimizing false negatives. (c) Mean Average Precision (mAP): a widely used metric in object detection that averages precision across object categories and IoU thresholds, giving a comprehensive measure of detection accuracy. These metrics are computed as follows:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i$$

In these equations, TP (true positives) denotes the count of positive samples correctly predicted as positive, FN (false negatives) the number of positive samples erroneously predicted as negative, and FP (false positives) the number of negative samples incorrectly predicted as positive; AP_i is the average precision of category i, and the total number of categories N is set to 2 in this study. Moreover, the number of parameters, model size, and FLOPs serve as benchmarks for assessing how lightweight a model is: the parameter count and model size depend primarily on the network architecture, while FLOPs quantifies computational complexity as the number of operations required to run the model. A sketch of the AP computation is given after this list.
- Pedestrian Tracking Evaluation: for object tracking, we used the CLEAR [22] metrics, which jointly consider FP, FN, and ID switches and whose best-known summary score is MOTA. CLEAR reflects the overall tracking quality of the tracker; however, because it ignores the identity consistency of multiple targets, we additionally report IDF1 [23] to compensate for this limitation of MOTA. In addition, HOTA [24] is a recently proposed metric that balances the contributions of detection and association. The defining formulas of these scores are given after this list.
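As a worked illustration of (c), here is a small NumPy sketch of the all-points average-precision computation used by mAP-style metrics; this is the standard precision-envelope procedure, not code extracted from our evaluation pipeline.

```python
import numpy as np

def average_precision(recall, precision):
    """Area under the precision-recall curve via the monotone
    precision envelope (all-points interpolation).

    `recall` and `precision` are cumulative values for detections
    sorted by descending confidence."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # enforce a non-increasing precision envelope, right to left
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # accumulate area only where recall actually increases
    step = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[step + 1] - r[step]) * p[step + 1]))
```

mAP at a given IoU threshold is then the mean of this AP over the N categories.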
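For reference, the tracking summary scores can be written out as follows (standard definitions from [22,23,24], where GT is the number of ground-truth boxes, IDTP/IDFP/IDFN are identity-level true positives, false positives, and false negatives, and DetA_α/AssA_α are detection and association accuracy at localization threshold α):

$$\mathrm{MOTA} = 1 - \frac{\mathrm{FN} + \mathrm{FP} + \mathrm{IDSW}}{\mathrm{GT}}, \qquad \mathrm{IDF1} = \frac{2\,\mathrm{IDTP}}{2\,\mathrm{IDTP} + \mathrm{IDFP} + \mathrm{IDFN}}, \qquad \mathrm{HOTA} = \frac{1}{|A|}\sum_{\alpha \in A}\sqrt{\mathrm{DetA}_\alpha \cdot \mathrm{AssA}_\alpha}$$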
4. Experiments
4.1. Datasets
4.2. Experimental Settings
4.3. Experimental Results
4.3.1. Pedestrian Detection Results
4.3.2. Pedestrian Tracking Results
4.3.3. Tracking Effect Visualization
5. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Bernardin, K.; Stiefelhagen, R. Evaluating multiple object tracking performance: The CLEAR MOT metrics. EURASIP J. Image Video Process. 2008, 2008, 246309. [Google Scholar] [CrossRef]
- Brasó, G.; Cetintas, O.; Leal-Taixé, L. Multi-Object Tracking and Segmentation via Neural Message Passing. Int. J. Comput. Vis. 2022, 130, 3035–3053. [Google Scholar] [CrossRef]
- Cheng, Y. Mean shift, mode seeking, and clustering. IEEE Trans. Pattern Anal. Mach. Intell. 1995, 17, 790–799. [Google Scholar] [CrossRef]
- Ciaparrone, G.; Sánchez, F.L.; Tabik, S.; Troiano, L.; Tagliaferri, R.; Herrera, F. Deep learning in video multi-object tracking: A survey. Neurocomputing 2020, 381, 61–88. [Google Scholar] [CrossRef]
- Dendorfer, P.; Rezatofighi, H.; Milan, A.; Shi, J.; Cremers, D.; Reid, I.; Roth, S.; Schindler, K.; Leal-Taixé, L. Mot20: A benchmark for multi object tracking in crowded scenes. arXiv 2020, arXiv:2003.09003. [Google Scholar]
- Guo, S.; Wang, S.; Yang, Z.; Wang, L.; Zhang, H.; Guo, P.; Gao, Y.; Guo, J. A Review of Deep Learning-Based Visual Multi-Object Tracking Algorithms for Autonomous Driving. Appl. Sci. 2022, 12, 10741. [Google Scholar] [CrossRef]
- He, L.; Wu, F.; Du, X.; Zhang, G. Cascade-SORT: A robust fruit counting approach using multiple features cascade matching. Comput. Electron. Agric. 2022, 200, 107223. [Google Scholar] [CrossRef]
- Hu, X.; Liu, Y.; Zhao, Z.; Liu, J.; Yang, X.; Sun, C.; Chen, S.; Li, B.; Zhou, C. Real-time detection of uneaten feed pellets in underwater images for aquaculture using an improved YOLO-V4 network. Comput. Electron. Agric. 2021, 185, 106135. [Google Scholar] [CrossRef]
- Keuper, M.; Tang, S.; Andres, B.; Brox, T.; Schiele, B. Motion segmentation & multiple object tracking by correlation co-clustering. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 42, 140–153. [Google Scholar]
- Kim, C.; Fuxin, L.; Alotaibi, M.; Rehg, J.M. Discriminative appearance modeling with multi-track pooling for real-time multi-object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 9553–9562. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- Laroca, R.; Zanlorensi, L.A.; Gonçalves, G.R.; Todt, E.; Schwartz, W.R.; Menotti, D. An efficient and layout-independent automatic license plate recognition system based on the YOLO detector. IET Intell. Transp. Syst. 2021, 15, 483–503. [Google Scholar] [CrossRef]
- Li, J.; Gao, X.; Jiang, T. Graph networks for multiple object tracking. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Seattle, WA, USA, 14–19 June 2020; pp. 719–728. [Google Scholar]
- Liang, C.; Zhang, Z.; Zhou, X.; Li, B.; Zhu, S.; Hu, W. Rethinking the competition between detection and ReID in multiobject tracking. IEEE Trans. Image Process. 2022, 31, 3182–3196. [Google Scholar] [CrossRef]
- Liang, T.; Lan, L.; Zhang, X.; Luo, Z. A generic MOT boosting framework by combining cues from SOT, tracklet and re-identification. Knowl. Inf. Syst. 2021, 63, 2109–2127. [Google Scholar] [CrossRef]
- Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Liu, C.-J.; Lin, T.-N. DET: Depth-enhanced tracker to mitigate severe occlusion and homogeneous appearance problems for indoor multiple object tracking. IEEE Access 2022, 10, 8287–8304. [Google Scholar] [CrossRef]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Luiten, J.; Osep, A.; Dendorfer, P.; Torr, P.; Geiger, A.; Leal-Taixé, L.; Leibe, B. Hota: A higher order metric for evaluating multi-object tracking. Int. J. Comput. Vis. 2021, 129, 548–578. [Google Scholar] [CrossRef] [PubMed]
- Meinhardt, T.; Kirillov, A.; Leal-Taixe, L.; Feichtenhofer, C. Trackformer: Multi-object tracking with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8844–8854. [Google Scholar]
- Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. Ghostnet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020. [Google Scholar]
- Milan, A.; Leal-Taixé, L.; Reid, I.; Roth, S.; Schindler, K. MOT16: A benchmark for multi-object tracking. arXiv 2016, arXiv:1603.00831. [Google Scholar]
- Okuma, K.; Taleghani, A.; De Freitas, N.; Little, J.J.; Lowe, D.G. A boosted particle filter: Multitarget detection and tracking. In Proceedings of the Computer Vision-ECCV 2004: 8th European Conference on Computer Vision, Prague, Czech Republic, 11–14 May 2004; Proceedings, Part I 8. Springer: Berlin/Heidelberg, Germany, 2004; pp. 28–39. [Google Scholar]
- Pang, J.; Qiu, L.; Li, X.; Chen, H.; Li, Q.; Darrell, T.; Yu, F. Quasi-dense similarity learning for multiple object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 164–173. [Google Scholar]
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 658–666. [Google Scholar]
- Shao, S.; Zhao, Z.; Li, B.; Xiao, T.; Yu, G.; Zhang, X.; Sun, J. Crowdhuman: A benchmark for detecting human in a crowd. arXiv 2018, arXiv:1805.00123. [Google Scholar]
- Sun, P.; Zhang, R.; Jiang, Y.; Kong, T.; Xu, C.; Zhan, W.; Tomizuka, M.; Li, L.; Yuan, Z.; Wang, C.; et al. Sparse r-cnn: End-to-end object detection with learnable proposals. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 14454–14463. [Google Scholar]
- Sun, P.; Cao, J.; Jiang, Y.; Zhang, R.; Xie, E.; Yuan, Z.; Wang, C.; Luo, P. Transtrack: Multiple object tracking with transformer. arXiv 2020, arXiv:2012.15460. [Google Scholar]
- Sun, Z.; Chen, J.; Mukherjee, M.; Liang, C.; Ruan, W.; Pan, Z. Online multiple object tracking based on fusing global and partial features. Neurocomputing 2022, 470, 190–203. [Google Scholar] [CrossRef]
- Xu, Y.; Ban, Y.; Delorme, G.; Gan, C.; Rus, D.; Alameda-Pineda, X. Transcenter: Transformers with dense queries for multiple object tracking. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 7820–7835. [Google Scholar] [CrossRef]
- Yang, L.; Zhang, R.Y.; Li, L.; Xie, X. Simam: A simple, parameter-free attention module for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 11863–11874. [Google Scholar]
- Ye, L.; Li, W.; Zheng, L.; Zeng, Y. Lightweight and Deep Appearance Embedding for Multiple Object Tracking. IET Comput. Vis. 2022, 16, 489–503. [Google Scholar] [CrossRef]
- Yu, E.; Li, Z.; Han, S. Towards discriminative representation: Multi-view trajectory contrastive learning for online multi-object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8834–8843. [Google Scholar]
- Zhang, Y.; Wang, C.; Wang, X.; Zeng, W.; Liu, W. Fairmot: On the fairness of detection and re-identification in multiple object tracking. Int. J. Comput. Vis. 2021, 129, 3069–3087. [Google Scholar] [CrossRef]
- Zhou, H.; Wu, T.; Sun, K.; Zhang, C. Towards high accuracy pedestrian detection on edge GPUs. Sensors 2022, 22, 5980. [Google Scholar] [CrossRef] [PubMed]
- Zhou, X.; Koltun, V.; Krähenbühl, P. Tracking objects as points. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part IV. Springer: Berlin/Heidelberg, Germany, 2020; pp. 474–490. [Google Scholar]
Model | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 | Parameters | Size/MB | FLOPs/G |
---|---|---|---|---|---|---|---|
YOLOv8n | 0.857 | 0.710 | 0.820 | 0.521 | 3,005,843 | 6.2 | 8.1 |
+SoftNMS | 0.865 | 0.721 | 0.825 | 0.574 | 3,005,843 | 6.2 | 8.1 |
+Ghost | 0.841 | 0.682 | 0.789 | 0.467 | 1,804,031 | 3.9 | 5.2 |
+SoftNMS+Ghost | 0.886 | 0.653 | 0.793 | 0.537 | 1,804,031 | 3.9 | 5.2 |
Detector | MOTA (↑) | MOTP (↑) | IDF1 (↑) | IDSW (↓) | HOTA (↑) | FP (↓) | FN (↓) | AssA (↑) | AssR (↑) |
---|---|---|---|---|---|---|---|---|---|
Pub_yolov8n | 33.599 | 81.432 | 44.349 | 244 | 37.604 | 3076 | 71,246 | 47.195 | 50.919 |
Pvt_yolov8n | 50.529 | 80.127 | 57.377 | 307 | 45.715 | 2222 | 53,026 | 50.391 | 55.168 |
+SN | 50.248 | 80.343 | 57.473 | 297 | 45.808 | 2142 | 53,431 | 50.668 | 55.668 |
+GH | 43.759 | 80.602 | 52.362 | 233 | 41.725 | 1349 | 61,575 | 48.763 | 52.636 |
+SN+GH | 42.746 | 80.347 | 50.948 | 246 | 41.053 | 1567 | 62,481 | 47.964 | 52.268 |
Detector | MOTA (↑) | MOTP (↑) | IDF1 (↑) | IDSW (↓) | HOTA (↑) | FP (↓) | FN (↓) | AssA (↑) | AssR (↑) |
---|---|---|---|---|---|---|---|---|---|
Pub_yolov8n | 39.698 | 81.278 | 48.346 | 689 | 40.639 | 4889 | 62,139 | 45.895 | 50.669 |
Pvt_yolov8n | 56.546 | 79.884 | 62.324 | 594 | 49.462 | 4046 | 44,157 | 52.255 | 57.625 |
+SN | 56.224 | 80.013 | 62.733 | 591 | 49.889 | 4068 | 44,500 | 53.096 | 58.286 |
+GH | 50.911 | 80.119 | 56.216 | 555 | 44.839 | 2516 | 52,054 | 48.285 | 53.351 |
+SN+GH | 49.593 | 79.907 | 56.201 | 490 | 44.675 | 2852 | 53,263 | 48.787 | 54.081 |
Detector | MOTA (↑) | MOTP (↑) | IDF1 (↑) | IDSW (↓) | HOTA (↑) | FP (↓) | FN (↓) | AssA (↑) | AssR (↑) |
---|---|---|---|---|---|---|---|---|---|
Pub_yolov8n | 39.698 | 81.279 | 48.392 | 686 | 40.644 | 4886 | 62,145 | 45.899 | 50.659 |
Pvt_yolov8n | 56.547 | 79.884 | 62.278 | 592 | 49.471 | 4044 | 44,160 | 52.308 | 57.682 |
+SN | 56.215 | 80.015 | 62.823 | 598 | 49.915 | 4068 | 44,503 | 53.136 | 58.324 |
+GH | 50.907 | 80.119 | 56.127 | 557 | 44.794 | 2516 | 52,057 | 48.192 | 53.229 |
+SN+GH | 49.592 | 79.906 | 56.615 | 489 | 44.857 | 2852 | 53,266 | 49.190 | 54.300 |
Detector | MOTA (↑) | MOTP (↑) | IDF1 (↑) | IDSW (↓) | HOTA (↑) | FP (↓) | FN (↓) | AssA (↑) | AssR (↑) |
---|---|---|---|---|---|---|---|---|---|
Pub_yolov8n | 21.400 | 72.685 | 28.313 | 1648 | 20.495 | 7500 | 882,658 | 26.067 | 27.431 |
Pvt_yolov8n | 57.436 | 78.622 | 55.998 | 2944 | 42.254 | 7691 | 472,297 | 40.142 | 43.498 |
+SN | 53.543 | 78.901 | 53.613 | 2799 | 40.490 | 6792 | 517,516 | 39.287 | 42.305 |
+GH | 45.576 | 78.746 | 46.547 | 2799 | 35.491 | 4594 | 610,104 | 35.339 | 37.921 |
+SN+GH | 43.353 | 78.937 | 45.068 | 2652 | 34.334 | 4330 | 635,744 | 34.721 | 37.173 |
Detector | MOTA (↑) | MOTP (↑) | IDF1 (↑) | IDSW (↓) | HOTA (↑) | FP (↓) | FN (↓) | AssA (↑) | AssR (↑) |
---|---|---|---|---|---|---|---|---|---|
Pub_yolov8n | 30.032 | 72.559 | 36.127 | 4359 | 25.419 | 17,586 | 771,924 | 28.226 | 30.146 |
Pvt_yolov8n | 64.933 | 79.036 | 64.315 | 4208 | 48.306 | 14,361 | 379,311 | 46.106 | 50.363 |
+SN | 61.077 | 79.323 | 61.731 | 3761 | 46.362 | 11,720 | 426,143 | 44.939 | 49.040 |
+GH | 54.595 | 79.115 | 54.547 | 4562 | 41.087 | 9511 | 501,098 | 39.501 | 43.403 |
+SN+GH | 53.357 | 79.259 | 53.765 | 4287 | 40.570 | 9237 | 515,697 | 39.257 | 42.938 |
Detector | MOTA (↑) | MOTP (↑) | IDF1 (↑) | IDSW (↓) | HOTA (↑) | FP (↓) | FN (↓) | AssA (↑) | AssR (↑) |
---|---|---|---|---|---|---|---|---|---|
Pub_yolov8n | 30.056 | 72.555 | 36.606 | 4250 | 25.625 | 17,658 | 771,688 | 28.648 | 30.745 |
Pvt_yolov8n | 64.933 | 79.035 | 64.322 | 4190 | 48.326 | 14,352 | 379,334 | 46.136 | 50.338 |
+SN | 61.077 | 79.323 | 61.824 | 3757 | 46.400 | 11,711 | 426,160 | 45.003 | 49.029 |
+GH | 54.595 | 79.115 | 54.557 | 4544 | 41.094 | 9508 | 501,120 | 39.506 | 43.365 |
+SN+GH | 53.358 | 79.259 | 53.788 | 4269 | 40.632 | 9229 | 515,708 | 39.372 | 43.015 |
Method | MOTA (↑) | MOTP (↑) | IDF1 (↑) | IDSW (↓) |
---|---|---|---|---|
CCC (2018) [28] | 51.200 | / | / | 1851 |
GN (2020) [29] | 50.200 | / | 47.000 | 5273 |
BLLSTM (2021) [30] | 51.500 | / | 54.900 | 2566 |
STN (2021) [31] | 50.000 | 76.300 | 51.000 | 3312 |
DET (2022) [32] | 43.210 | / | 51.910 | 799 |
Pub_yolov8n | 39.698 | 81.278 | 48.346 | 689 |
Ours | 56.215 | 80.015 | 62.823 | 598 |
Method | MOTA (↑) | MOTP (↑) | IDF1 (↑) | IDSW (↓) |
---|---|---|---|---|
FairMOT (2021) [33] | 61.800 | / | 67.300 | 5243 |
TransCenter (2021) [34] | 62.300 | 79.900 | 50.300 | 4545 |
DET (2022) [32] | 57.700 | / | 48.900 | 7303 |
MTrack (2022) [35] | 63.500 | / | 69.200 | 6031 |
LADE (2022) [36] | 54.900 | 79.100 | 59.100 | 1630 |
Pub_yolov8n | 30.032 | 72.559 | 36.127 | 4359 |
Ours | 64.933 | 79.035 | 64.322 | 4190 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Xiao, X.; Feng, X. Multi-Object Pedestrian Tracking Using Improved YOLOv8 and OC-SORT. Sensors 2023, 23, 8439. https://doi.org/10.3390/s23208439