Track initialization and re-identification for 3D multi-view multi-object tracking

Published: 25 September 2024

Abstract

We propose a 3D multi-object tracking (MOT) solution that uses only 2D detections from monocular cameras and automatically initiates/terminates tracks as well as resolves track appearance–reappearance and occlusions. Moreover, this approach does not require detector retraining when cameras are reconfigured; only the camera matrices of the reconfigured cameras need to be updated. Our approach is based on a Bayesian multi-object formulation that integrates track initiation/termination, re-identification, occlusion handling, and data association into a single Bayes filtering recursion. However, the exact filter that realizes all of these functionalities is numerically intractable due to the exponentially growing number of terms in the (multi-object) filtering density, while existing approximations trade off some of these functionalities for speed. To this end, we develop a more efficient approximation suitable for online MOT by incorporating object features and kinematics into the measurement model, which improves data association and thereby reduces the number of terms. Specifically, we exploit the 2D detections and features extracted from multiple cameras to provide a better approximation of the multi-object filtering density that realizes the track initiation/termination and re-identification functionalities. Further, incorporating a tractable geometric occlusion model based on 2D projections of 3D objects onto the camera planes realizes the occlusion-handling functionality of the filter. Evaluation of the proposed solution on challenging datasets demonstrates significant improvements over existing multi-view MOT solutions and robustness when camera configurations change on the fly.
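For readers less familiar with the random-finite-set machinery referenced above, the following is a minimal sketch of the generic multi-object Bayes recursion (prediction followed by measurement update) that the proposed filter approximates; the paper's specific GLMB densities, feature-augmented measurement model, and occlusion model are not reproduced here.

```latex
% Generic multi-object Bayes filter (random-finite-set form), shown for context only.
% X is a finite set of object states, Z_k the detection set at time k, and the
% integrals are set integrals; the paper's GLMB-based approximation of this
% recursion is not shown here.
\begin{align}
  \pi_{k|k-1}(X)
    &= \int f_{k|k-1}(X \mid X')\,\pi_{k-1}(X' \mid Z_{1:k-1})\,\delta X', \\
  \pi_{k}(X \mid Z_{1:k})
    &= \frac{g_{k}(Z_{k} \mid X)\,\pi_{k|k-1}(X)}
            {\int g_{k}(Z_{k} \mid X)\,\pi_{k|k-1}(X)\,\delta X}.
\end{align}
```

The exponential growth mentioned in the abstract arises because each measurement-to-track association hypothesis contributes a term to the filtering density in this recursion.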

Highlights

Novel 3D multi-object tracking models with re-identification features.
A filter that performs 3D tracking by fusing multi-view 2D camera detections.
Our method automatically initializes/terminates tracks, re-identifies objects, and handles occlusion (a geometric sketch follows this list).
An efficient filter with linear complexity in the number of detections.
Extensive experiments to evaluate the performance on challenging benchmark datasets.
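The Python sketch below illustrates the kind of geometric reasoning the highlights refer to: projecting a 3D object onto a camera plane through its camera matrix and flagging likely occlusion from overlapping 2D projections. It is an illustrative sketch under simple assumptions (a 3x4 pinhole camera matrix, axis-aligned 3D boxes, an IoU threshold, no depth ordering); the function names are hypothetical and this is not the paper's actual occlusion or measurement model.

```python
# Illustrative sketch only: pinhole projection of 3D objects onto a camera plane
# and a simple overlap test of the kind a geometric occlusion model could use.
# The function names, box parameterization, and IoU threshold are assumptions,
# not the paper's actual model (which also needs to reason about depth ordering).
import numpy as np

def project_point(P, x_world):
    """Project a 3D world point through a 3x4 camera matrix P to pixel coordinates."""
    x_h = np.append(np.asarray(x_world, dtype=float), 1.0)  # homogeneous coordinates
    u = P @ x_h
    return u[:2] / u[2]

def project_box(P, center, half_extents):
    """Project the 8 corners of an axis-aligned 3D box; return its 2D bounding box
    as [u_min, v_min, u_max, v_max]."""
    c = np.asarray(center, dtype=float)
    h = np.asarray(half_extents, dtype=float)
    corners = [c + np.array([sx, sy, sz]) for sx in (-h[0], h[0])
                                          for sy in (-h[1], h[1])
                                          for sz in (-h[2], h[2])]
    uv = np.array([project_point(P, p) for p in corners])
    return np.concatenate([uv.min(axis=0), uv.max(axis=0)])

def projected_iou(box_a, box_b):
    """Intersection-over-union of two 2D boxes [u_min, v_min, u_max, v_max]."""
    lo = np.maximum(box_a[:2], box_b[:2])
    hi = np.minimum(box_a[2:], box_b[2:])
    inter = float(np.prod(np.clip(hi - lo, 0.0, None)))
    area = lambda b: max(b[2] - b[0], 0.0) * max(b[3] - b[1], 0.0)
    union = area(box_a) + area(box_b) - inter
    return inter / union if union > 0.0 else 0.0

def likely_occluded(P, obj, others, iou_threshold=0.5):
    """Flag an object as possibly occluded in this camera view when another object's
    projection overlaps its own projection beyond the (assumed) IoU threshold."""
    box = project_box(P, obj["center"], obj["half_extents"])
    return any(
        projected_iou(box, project_box(P, o["center"], o["half_extents"])) > iou_threshold
        for o in others
    )
```

A full geometric occlusion model would additionally need the depth ordering of the objects along the camera ray; the overlap test above only indicates that two projections compete for the same image region.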

          Published In

Information Fusion, Volume 111, Issue C, November 2024, 518 pages

          Publisher

Elsevier Science Publishers B.V., Netherlands

          Author Tags

          1. Multi-view
          2. Multi-sensor
          3. Multi-object visual tracking
          4. Occlusion handling
          5. Generalized labeled multi-Bernoulli
          6. Re-identification
          7. Adaptive birth

          Qualifiers

          • Research-article
