Abstract
Visible-Infrared Person Re-identification (VI-ReID) is a challenging task that matches visible and infrared person images across multiple camera views. The large discrepancy between the visible and infrared modalities remains a significant bottleneck. Existing works typically employ dual-stream networks to extract modality-shared representations, yet they struggle to bridge this gap, resulting in inferior performance. To overcome this issue, we propose robust synthetic modality learning with dual-level alignment (RSDL) for VI-ReID, which generates a robust synthetic modality as a bridge to guide cross-modality alignment. Specifically, a hetero-modality fusion (HMF) strategy generates the robust synthetic modality through multi-scale feature fusion with a structure rebuild module (SRM) and a cross-modality spatial alignment (CSA) module. The strategy incorporates rich semantic structural patterns from visible and infrared images to handle modality variation. Additionally, we design a dual-level regulation loss that jointly explores stable feature relationships among the three modalities at both the instance and distribution levels for cross-modality alignment, which facilitates the discovery of modality-consistent and identity-aware representations. Extensive experiments on three VI-ReID benchmarks demonstrate the effectiveness of the proposed method.
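Since the abstract only names these components, the following PyTorch sketch is an assumption on our part rather than the authors' implementation. It illustrates the overall shape of the two ideas: a single-scale stand-in for hetero-modality fusion (the paper fuses at multiple scales and adds SRM and CSA modules), and a dual-level regulation loss combining an instance-level alignment term with a KL-based distribution-level term. All class names, the L2 instance term, and the specific KL formulation are illustrative placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HeteroModalityFusion(nn.Module):
    """Fuses visible and infrared feature maps into a synthetic modality.

    Simplified stand-in for the HMF strategy: features from the two
    modality-specific streams are concatenated and merged with a learned
    1x1 convolution at a single scale.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_vis: torch.Tensor, feat_ir: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) + (B, C, H, W) -> (B, 2C, H, W) -> (B, C, H, W)
        return self.fuse(torch.cat([feat_vis, feat_ir], dim=1))


def dual_level_regulation(emb_vis, emb_ir, emb_syn):
    """Sketch of a dual-level regulation loss over (B, D) embeddings.

    Instance level: pull each visible/infrared embedding toward the
    corresponding synthetic embedding (a simple L2 alignment).
    Distribution level: align modality-wide statistics by a KL divergence
    between softmax-normalized mean embeddings. The paper's exact
    formulation may differ; this only illustrates the two levels.
    """
    # Instance-level alignment: corresponding embeddings should agree.
    inst = F.mse_loss(emb_vis, emb_syn) + F.mse_loss(emb_ir, emb_syn)

    # Distribution-level alignment: match batch-level modality statistics.
    p_syn = F.log_softmax(emb_syn.mean(dim=0), dim=0)
    q_vis = F.softmax(emb_vis.mean(dim=0), dim=0)
    q_ir = F.softmax(emb_ir.mean(dim=0), dim=0)
    dist = (F.kl_div(p_syn, q_vis, reduction="sum")
            + F.kl_div(p_syn, q_ir, reduction="sum"))

    return inst + dist


# Example usage with random features (batch of 8, 256 channels, 18x9 maps):
hmf = HeteroModalityFusion(channels=256)
fv = torch.randn(8, 256, 18, 9)   # visible-stream feature map
fi = torch.randn(8, 256, 18, 9)   # infrared-stream feature map
syn = hmf(fv, fi)                 # synthetic-modality feature map
loss = dual_level_regulation(fv.mean((2, 3)), fi.mean((2, 3)), syn.mean((2, 3)))
```

Under these assumptions, the synthetic modality acts as a shared anchor: both real modalities are regulated toward it per instance and per distribution, rather than directly toward each other.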
Acknowledgements
This study was supported in part by the National Natural Science Foundation of China (Grant Nos. 61802058 and 62306061).
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Wang, Z., Cheng, X. (2025). Learning a Robust Synthetic Modality with Dual-Level Alignment for Visible-Infrared Person Re-identification. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15035. Springer, Singapore. https://doi.org/10.1007/978-981-97-8620-6_20