Abstract
Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality pedestrian retrieval problem. Due to the significant cross-modality discrepancy, it is difficult to learn discriminative features. Attention-based methods have been widely used to extract discriminative features for VI-ReID, but existing methods are confined to first-order structures that exploit only simple, coarse information and thus lack sufficient capacity to learn both modality-irrelevant and modality-relevant features. In this paper, we extract second-order information from mid-level features to complement first-order cues. Specifically, we design a flexible second-order module that models the correlations among common features and learns refined feature representations for pedestrian images. In addition, because a significant gap exists between the visible and infrared modalities, we propose a plug-and-play mixed intermediate modality module that generates intermediate modality representations to reduce the modality discrepancy between visible and infrared features. Extensive experiments on two challenging datasets, SYSU-MM01 and RegDB, demonstrate that our method achieves competitive performance compared to state-of-the-art methods.
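To make the two modules in the abstract concrete, the following is a minimal PyTorch sketch of how a second-order (covariance-based) channel attention and a mixed intermediate-modality blend might be implemented. The class names, the squeeze-and-excitation-style weighting, and the single learnable mixing scalar are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn


class SecondOrderAttention(nn.Module):
    # Sketch: reweights mid-level features using channel-wise second-order
    # (covariance) statistics. Structure is assumed for illustration.
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        b, c, h, w = x.shape
        feat = x.flatten(2)                           # (B, C, HW)
        feat = feat - feat.mean(dim=2, keepdim=True)  # center each channel
        cov = torch.bmm(feat, feat.transpose(1, 2)) / (h * w - 1)  # (B, C, C)
        stats = cov.mean(dim=2)                       # pool pairwise correlations -> (B, C)
        weights = self.fc(stats).view(b, c, 1, 1)
        return x * weights                            # second-order channel reweighting


class MixedIntermediateModality(nn.Module):
    # Sketch: blends visible and infrared features into an intermediate
    # representation; a learnable scalar controls the mixing ratio.
    def __init__(self):
        super().__init__()
        self.mix_logit = nn.Parameter(torch.zeros(1))  # sigmoid(0) = 0.5

    def forward(self, x_vis: torch.Tensor, x_ir: torch.Tensor) -> torch.Tensor:
        alpha = torch.sigmoid(self.mix_logit)          # keep ratio in (0, 1)
        return alpha * x_vis + (1 - alpha) * x_ir
```

In a full pipeline, both modules would typically be inserted as plug-and-play blocks between intermediate stages of a backbone (e.g., a ResNet-50) shared by the visible and infrared branches; the exact placement and loss functions follow the paper rather than this sketch.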
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grants U21A20514 and 62002302, by the FuXiaQuan National Independent Innovation Demonstration Zone Collaborative Innovation Platform Project under Grant 3502ZCQXT2022008, and by the China Fundamental Research Funds for the Central Universities under Grant 20720230038.
About this paper
Cite this paper
Tao, H., Zhang, Y., Lu, Y., Wang, H. (2024). An Effective Visible-Infrared Person Re-identification Network Based on Second-Order Attention and Mixed Intermediate Modality. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14433. Springer, Singapore. https://doi.org/10.1007/978-981-99-8546-3_10