Abstract
Person re-identification, an essential research direction in intelligent security, has attracted growing attention from researchers. In practical scenarios, visible-light cameras depend heavily on illumination and have limited detection capability in poor light, so many researchers have gradually shifted their focus to cross-modality person re-identification. However, relevant studies remain scarce, and bridging the differences between images of different modalities is still challenging. To address these problems, this paper adopts an attention-based approach to narrow the gap between the two modalities and guide the network in a more suitable direction, thereby improving its recognition performance. Although attention mechanisms can improve training efficiency, they can also make model training unstable. This paper therefore proposes a cross-modality person re-identification method based on the attention mechanism. A new attention module is designed that allows the network to focus on the more critical features of a person in less time. In addition, a cross-modality hard center triplet loss is designed to better supervise model training. Extensive experiments with both methods on two publicly available datasets achieve better performance than comparable current methods, verifying the effectiveness and feasibility of the proposed approach.
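The abstract does not give the exact formulation of the cross-modality hard center triplet loss. As a rough sketch only, losses of this kind (e.g., hetero-center triplet variants) typically pull each identity's feature center in one modality toward the same identity's center in the other modality while pushing it away from the hardest (closest) different-identity center there. The function names, the margin value, and the center/hardest-negative formulation below are illustrative assumptions, not the paper's definition:

```python
import math

def center(features):
    """Mean feature vector (the 'center') of a list of same-length vectors."""
    n = len(features)
    return [sum(x[d] for x in features) / n for d in range(len(features[0]))]

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cm_hard_center_triplet_loss(vis_feats, ir_feats, margin=0.3):
    """Hinge loss over identity centers across two modalities (a sketch).

    vis_feats / ir_feats: dict mapping identity id -> list of feature
    vectors from the visible / infrared images of one mini-batch.
    For each identity center, the positive distance is to the same
    identity's center in the other modality; the negative distance is to
    the hardest (closest) different-identity center there.
    """
    vis_c = {i: center(f) for i, f in vis_feats.items()}
    ir_c = {i: center(f) for i, f in ir_feats.items()}
    loss, terms = 0.0, 0
    # Symmetric: visible anchors against infrared centers, and vice versa.
    for anchors, others in ((vis_c, ir_c), (ir_c, vis_c)):
        for i, c in anchors.items():
            pos = dist(c, others[i])
            neg = min(dist(c, others[j]) for j in others if j != i)
            loss += max(0.0, margin + pos - neg)
            terms += 1
    return loss / terms
```

With well-separated identities the hinge is inactive and the loss is zero; when all centers collapse, the loss equals the margin, which is the behavior that lets it supervise cross-modality training.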
Availability of data and materials
Not applicable.
Funding
This work is supported by the National Natural Science Foundation of China (51607059), Heilongjiang Postdoctoral Financial Assistance, China (LBH-Z20188) and the Basic Science Research Project of Heilongjiang University, China (KJCX201904, 2020-KYYWF-1001).
Author information
Authors and Affiliations
Contributions
Not applicable.
Corresponding author
Ethics declarations
Conflict of interest
Not applicable.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Code availability
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhao, Y., Zhou, H., Cheng, H. et al. Cross-modal pedestrian re-recognition based on attention mechanism. Vis Comput 40, 2405–2418 (2024). https://doi.org/10.1007/s00371-023-02926-7