Image-Centered Pseudo Label Generation for Weakly Supervised Text-Based Person Re-Identification

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2024)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 15042))

Abstract

Weakly supervised text-based person re-identification aims to identify a target person from textual descriptions when identity annotations are not available during training. Previous methods cluster images and texts simultaneously to generate pseudo identity labels. However, we observe that the number of text clusters is significantly smaller than the number of true identities, while the number of image clusters is closer to the actual number of identities. This mismatch makes the resulting pseudo identity labels uncertain. To address this issue, we propose a new approach called Image-Centered Pseudo Label Generation (ICPG) for weakly supervised text-based person re-identification. It generates pseudo labels for both images and texts directly from the image clustering results. First, we introduce a cross-modal distribution matching loss, which minimizes the KL divergence between the image-text similarity distributions and the normalized pseudo label matching distributions. Second, to enhance cross-modal associations, we propose a cross-modal hard sample mining method that explores challenging cross-modal examples. Experimental results demonstrate the effectiveness of the proposed methods. Compared to the state-of-the-art method, our approach improves rank-1 accuracy by 3.6%, 2.4% and 3.0% on three datasets, respectively.
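
The abstract names two steps that can be illustrated in a few lines: passing image-cluster labels on to the paired texts, and a KL-based loss between similarity distributions and normalized label-matching distributions. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The function names, the temperature value, the KL direction (predicted distribution against label distribution), and the summation of the image-to-text and text-to-image terms are all illustrative choices that the abstract does not specify. The cluster ids are assumed to come from any image clustering step.

```python
import numpy as np

def softmax(x, axis=1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def image_centered_labels(image_cluster_ids):
    """Assumption: the i-th text is paired with the i-th image, so each
    text simply inherits the cluster id of its paired image."""
    image_labels = np.asarray(image_cluster_ids)
    text_labels = image_labels.copy()
    return image_labels, text_labels

def distribution_matching_loss(img_feats, txt_feats, img_labels, txt_labels,
                               temperature=0.02, eps=1e-8):
    """KL divergence between softmax-normalized image-text similarities and
    row-normalized pseudo label matching matrices (hypothetical form)."""
    # L2-normalize so the dot product is cosine similarity.
    img = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    txt = txt_feats / np.linalg.norm(txt_feats, axis=1, keepdims=True)
    sim = img @ txt.T / temperature

    # match[i, j] = 1 when image i and text j share a pseudo label.
    match = (img_labels[:, None] == txt_labels[None, :]).astype(np.float64)
    q_i2t = match / match.sum(axis=1, keepdims=True)
    q_t2i = match.T / match.T.sum(axis=1, keepdims=True)

    def kl(p, q):
        # Mean over rows of sum_j p_ij * (log p_ij - log q_ij).
        return np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1))

    return kl(softmax(sim), q_i2t) + kl(softmax(sim.T), q_t2i)

# Toy usage: four image-text pairs falling into two image clusters.
img_labels, txt_labels = image_centered_labels([0, 0, 1, 1])
img_feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
aligned_txt = img_feats.copy()      # texts agree with their image clusters
shuffled_txt = img_feats[::-1]      # texts point at the wrong cluster
loss_aligned = distribution_matching_loss(img_feats, aligned_txt, img_labels, txt_labels)
loss_shuffled = distribution_matching_loss(img_feats, shuffled_txt, img_labels, txt_labels)
```

In this toy setting the loss is near zero when text features agree with the image clusters and large when they point to the wrong cluster, which is the behaviour a distribution matching objective of this kind is meant to have.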

Acknowledgement

This work was partially supported by the China Postdoctoral Science Foundation (2023M741305) and the Fundamental Research Funds for the Central Universities (CCNU23XJ001).

Author information

Corresponding author

Correspondence to Chengji Wang.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Nie, W., Wang, C., Sun, H., Xie, W. (2025). Image-Centered Pseudo Label Generation for Weakly Supervised Text-Based Person Re-Identification. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15042. Springer, Singapore. https://doi.org/10.1007/978-981-97-8858-3_33

  • DOI: https://doi.org/10.1007/978-981-97-8858-3_33

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-8857-6

  • Online ISBN: 978-981-97-8858-3

  • eBook Packages: Computer Science, Computer Science (R0)
