Spatial-temporal graph-guided global attention network for video-based person re-identification

Xiaobao Li¹,
Wen Wang²,
Qingyong Li² &
…
Jiang Zhang³

Abstract

Global attention learning has been extensively applied in video-based person re-identification because it captures contextual correlations well. However, existing global attention learning methods usually adopt conventional neural networks to model non-Euclidean contextual correlations, which limits their representation ability. Inspired by the graph-structured nature of these contextual correlations, we propose a spatial-temporal graph-guided global attention network (STG$^3$A) for video-based person re-identification. STG$^3$A comprises two graph-guided attention modules that capture the spatial contexts within a frame and the temporal contexts across all frames in a sequence for global attention learning. Furthermore, the graphs from both modules are encoded as graph representations, which are combined with the weighted representations to capture spatial-temporal contextual information adequately for video feature learning. To reduce the effect of noisy graph nodes and learn robust graph representations, a graph node attention is developed to weigh the importance of each graph node, leading to noise-tolerant graph models. Finally, we design a graph-guided fusion scheme to integrate the representations output by these two attention modules into a more compact video feature. Extensive experiments on the MARS and DukeMTMC-VideoReID datasets demonstrate the superior performance of STG$^3$A.
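
The abstract does not give the layer definitions, so the following is only a generic, minimal sketch of what graph-guided attention over a set of nodes can look like. It is not the authors' STG$^3$A implementation. The function name `graph_guided_attention`, the dot-product affinity, the single parameter-free propagation step, and the scoring rule are all assumptions made for illustration. The nodes may stand for spatial regions of one frame or for the frames of a sequence.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def graph_guided_attention(nodes):
    """Hypothetical sketch: attention weights derived from an affinity graph.

    nodes: list of N feature vectors of equal length.
    Returns (weights, pooled): N attention weights summing to 1 and the
    attention-weighted sum of the node features.
    """
    n = len(nodes)
    dim = len(nodes[0])
    # Affinity graph (assumption): row-normalised dot-product similarities.
    adj = [softmax([dot(nodes[i], nodes[j]) for j in range(n)])
           for i in range(n)]
    # One propagation step without learned weights: each node gathers
    # the features of its neighbours, weighted by the graph edges.
    context = [[sum(adj[i][j] * nodes[j][d] for j in range(n))
                for d in range(dim)]
               for i in range(n)]
    # Node attention (assumption): score a node by how well it agrees
    # with its graph context, then normalise the scores over all nodes.
    weights = softmax([dot(nodes[i], context[i]) for i in range(n)])
    # Weighted representation: attention-weighted sum of node features.
    pooled = [sum(weights[i] * nodes[i][d] for i in range(n))
              for d in range(dim)]
    return weights, pooled


if __name__ == "__main__":
    # Two similar nodes and one outlier; the outlier gets a lower weight.
    w, v = graph_guided_attention([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    print(w, v)
```

In this toy setting a node that disagrees with its neighbours receives a smaller weight, which is the intuition behind down-weighting noisy graph nodes. A trained model would replace the fixed similarity and scoring rules with learned projections.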

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grants 62276019 and 62006017.

Author information

Authors and Affiliations

School of Computer Science and Technology, Jiangsu Normal University, No.101 Shanghai Road Tongshan District, Xuzhou, 221116, Jiangsu, China
Xiaobao Li
Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, No.3 Shangyuancun Haidian District, Beijing, 100044, Beijing, China
Wen Wang & Qingyong Li
China Academy of Aerospace Aerodynamics, No.17 YunGang West Road Fengtai District, Beijing, 100074, Beijing, China
Jiang Zhang

Contributions

The proposed method was designed by XL, QL and WW. Material preparation, data collection and experimentation were performed by XL, and the first draft of the manuscript was written by XL. The analysis of experimental results was performed by XL, QL, WW and JZ. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Xiaobao Li.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, X., Wang, W., Li, Q. et al. Spatial-temporal graph-guided global attention network for video-based person re-identification. Machine Vision and Applications 35, 8 (2024). https://doi.org/10.1007/s00138-023-01489-w

Received: 09 May 2023
Revised: 26 September 2023
Accepted: 01 November 2023
Published: 03 December 2023
DOI: https://doi.org/10.1007/s00138-023-01489-w
