Adaptive Affinity for Associations in Multi-Target Multi-Camera Tracking

Published: 01 January 2022 · IEEE Transactions on Image Processing, Volume 31, 2022 · IEEE Press

Abstract

Data association in multi-target multi-camera tracking (MTMCT) usually estimates affinity directly from re-identification (re-ID) feature distances. However, we argue that this might not be the best choice, given the difference in matching scopes between the re-ID and MTMCT problems. Re-ID systems focus on global matching, which retrieves targets from all cameras at all times. In contrast, data association in tracking is a local matching problem, since its candidates come only from neighboring locations and time frames. In this paper, we design experiments to verify this misfit between global re-ID feature distances and local matching in tracking, and propose a simple yet effective approach that adapts affinity estimation to the corresponding matching scope in MTMCT. Instead of trying to handle all possible appearance changes, we tailor the affinity metric to specialize in those that might actually emerge during data association. To this end, we introduce a new data sampling scheme based on the temporal windows originally used for data association in tracking. By minimizing this mismatch, the adaptive affinity module brings significant improvements over the global re-ID distance and produces competitive performance on the CityFlow and DukeMTMC datasets.
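
The core idea above, restricting affinity-training pairs to the temporal windows that data association actually operates over, can be illustrated with a minimal sketch. This is our illustration, not the authors' code: the detection tuple format, function name, and default window size are all hypothetical.

```python
import random
from collections import defaultdict

def sample_local_pairs(detections, window=50, num_pairs=1000, seed=0):
    """Hypothetical sketch: sample affinity-training pairs whose frame gap
    fits inside the data-association window, so the learned metric sees only
    the 'local' appearance changes that occur during tracking.

    detections: list of (track_id, frame, feature) tuples (assumed format).
    Returns (feature_a, feature_b, same_identity) triples.
    """
    if not detections:
        return []
    rng = random.Random(seed)
    by_frame = defaultdict(list)
    for det in detections:
        by_frame[det[1]].append(det)
    frames = sorted(by_frame)

    pairs, attempts = [], 0
    while len(pairs) < num_pairs and attempts < 20 * num_pairs:
        attempts += 1
        anchor = rng.choice(detections)
        # Candidates come only from nearby frames, as in data association,
        # rather than from "all cameras and all times" as in global re-ID.
        nearby = [d for f in frames
                  if 0 < abs(f - anchor[1]) <= window
                  for d in by_frame[f]]
        if not nearby:
            continue
        other = rng.choice(nearby)
        pairs.append((anchor[2], other[2], int(other[0] == anchor[0])))
    return pairs
```

In the paper's setting, the window would presumably match the one used by the tracker's association stage; pairs sampled this way could then train the affinity metric with a standard pairwise or triplet objective, rather than relying on globally trained re-ID distances.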

