Abstract
Matching video clips of people across non-overlapping surveillance cameras (video-based person re-identification) is important in many real-world applications. In this paper, we address video-based person re-identification by developing a Local and Global Aligned Spatiotemporal Attention (LGASA) network. The LGASA network consists of five cascaded modules: 3D convolutional layers, a residual block, a spatial transformer network (STN), a multi-stream recurrent network, and a multiple-attention module. Specifically, the 3D convolutional layers capture local, short-term, fast-varying motion information encoded in multiple adjacent original frames. The residual block extracts mid-level feature maps, and the STN aligns them. The multi-stream recurrent network exploits local and global long-term temporal dependencies in the aligned mid-level feature maps. The multiple-attention module aggregates the feature vectors of the same body part (or of the global stream) from different frames of a video into a single vector according to their importance. Experimental results on three video pedestrian datasets verify the effectiveness of the proposed local and global aligned spatiotemporal attention network.
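The aggregation step performed by the multiple-attention module can be illustrated with a minimal sketch. The abstract does not specify how frame importance is computed, so the dot-product scoring against a vector `scoring_vector` and the softmax normalization below are assumptions made for illustration only; the function names are hypothetical and this is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    frame_features: Sequence[Sequence[float]],
    scoring_vector: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Aggregate per-frame feature vectors into a single vector.

    frame_features: one feature vector per frame (e.g. for one body part).
    scoring_vector: hypothetical learned parameter used to score each frame
                    (an assumption; the paper's scoring function may differ).
    Returns the pooled vector and the per-frame importance weights.
    """
    # Importance score of each frame: dot product with the scoring vector.
    scores = [
        sum(f * w for f, w in zip(feat, scoring_vector))
        for feat in frame_features
    ]
    weights = softmax(scores)
    dim = len(frame_features[0])
    # Weighted sum of the frame vectors, dimension by dimension.
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights


# Two frames with 2-dimensional features; equal scores give a plain average.
pooled_vec, frame_weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

In a setup like the one the abstract describes, one such pooling would presumably be applied per body-part stream and once for the global stream, each producing its own video-level vector.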
Notes
ReduceLROnPlateau is a learning-rate scheduler provided by PyTorch; see https://pytorch.org/docs/stable/optim.html
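The rule behind a reduce-on-plateau schedule can be sketched in plain Python. This is a simplified illustration rather than PyTorch's code: the class name `PlateauSchedule` is hypothetical, the monitored metric is assumed to be a loss to minimize, and the `threshold`, `cooldown`, and `min_lr` options of the real scheduler are ignored. The `factor` and `patience` defaults mirror PyTorch's documented defaults.

```python
from dataclasses import dataclass


@dataclass
class PlateauSchedule:
    """Simplified reduce-on-plateau rule (illustrative, not PyTorch's class)."""

    lr: float
    factor: float = 0.1   # multiplicative decay applied on a plateau
    patience: int = 10    # epochs without improvement that are tolerated
    best: float = float("inf")
    bad_epochs: int = 0

    def step(self, metric: float) -> float:
        """Update the schedule with this epoch's metric; return the current lr."""
        if metric < self.best:
            # Improvement: remember it and reset the plateau counter.
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        if self.bad_epochs > self.patience:
            # Plateau exceeded the patience: shrink the learning rate.
            self.lr *= self.factor
            self.bad_epochs = 0
        return self.lr


# The validation loss stalls at 0.9, so the rate is halved once patience runs out.
schedule = PlateauSchedule(lr=0.1, factor=0.5, patience=2)
history = [schedule.step(loss) for loss in [1.0, 0.9, 0.9, 0.9, 0.9]]
```

With PyTorch itself, the equivalent object would be created from an optimizer and stepped with the validation metric after each epoch.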
Acknowledgments
The authors would like to thank the editors and anonymous reviewers for their constructive comments and suggestions. This work was supported by the NSFC-Key Project under Grant No. 61933013, the NSFC-Key Project of General Technology Fundamental Research United Fund under Grant No. U1736211, the Key Project of Natural Science Foundation of Hubei Province under Grant No. 2018CFA024, the Natural Science Foundation of Guangdong Province under Grant No. 2019A1515011076, the National Key Research and Development Program of China under Grant No. 2017YFB0202001, the National Natural Science Foundation of China under Grant No. 61672208, the Higher Education Institution Key Research Projects of Henan Province under Grant No. 19A520001, and the Key Scientific and Technological Project of Henan Province under Grant No. 192102210277.
Ethics declarations
Conflict of interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
About this article
Cite this article
Cheng, L., Jing, XY., Zhu, X. et al. Local and global aligned spatiotemporal attention network for video-based person re-identification. Multimed Tools Appl 79, 34489–34512 (2020). https://doi.org/10.1007/s11042-020-08765-1
DOI: https://doi.org/10.1007/s11042-020-08765-1