Fusion pose guidance and transformer feature enhancement for person re-identification

Multimedia Tools and Applications

Abstract

Although convolutional neural networks (CNNs) have dominated person re-identification, Transformer-based methods have emerged in computer vision over the past two years, owing to their strength in processing long sequences. In this work, to exploit the complementary advantages of Transformers and CNNs, we propose a concise method that combines convolution and Transformer components to boost performance. First, a convolutional network with an attention mechanism generates features that carry channel and inter-channel relationship information. Second, a feature enhancement module combines pose information with ViT information: the heatmap generated by a pose estimator guides the ViT features toward more discriminative representations. Finally, a relationship-reinforced Transformer layer strengthens the relationships between features. Experimental results show that the proposed method outperforms related advanced methods on two large-scale person re-identification benchmark datasets and one occlusion dataset. On Market-1501, our method, called Fusion Pose Guidance and Transformer Feature Enhancement for Person Re-Identification, achieves 94.3% Rank-1 and 87.0% mAP. On DukeMTMC-reID, it reaches 88.7% Rank-1 and 77.2% mAP. Notably, on Occluded-Duke, our method improves on the state-of-the-art model HONet by up to 2.7% in Rank-1 and 4.5% in mAP.
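The Rank-1 and mAP figures quoted above are the standard retrieval metrics in person re-identification. As a rough illustration of what they measure, the sketch below computes both from per-query ranked match lists. It is not taken from the article: the function names and the toy data are hypothetical, and it assumes each query's gallery has already been sorted by similarity and filtered according to the benchmark's protocol (for example, excluding same-camera images of the query identity), a step omitted here.

```python
from typing import List, Sequence, Tuple


def average_precision(ranked_matches: Sequence[bool]) -> float:
    """Average precision for one query.

    ranked_matches[i] is True when the gallery item at rank i + 1
    shares the query's identity. AP is the mean of the precision
    values taken at each rank where a true match occurs.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def rank1_and_map(all_ranked: List[Sequence[bool]]) -> Tuple[float, float]:
    """Return (Rank-1 accuracy, mean average precision) over all queries."""
    n = len(all_ranked)
    # Rank-1: fraction of queries whose top-ranked gallery item is a true match.
    rank1 = sum(1 for m in all_ranked if m and m[0]) / n
    # mAP: average of the per-query AP values.
    mean_ap = sum(average_precision(m) for m in all_ranked) / n
    return rank1, mean_ap


# Toy data (hypothetical): two queries, three gallery items each.
queries = [
    [True, False, True],   # AP = (1/1 + 2/3) / 2
    [False, True, False],  # AP = 1/2
]
rank1, mean_ap = rank1_and_map(queries)
print(f"Rank-1 = {rank1:.3f}, mAP = {mean_ap:.3f}")
```

Rank-1 only rewards placing one correct match first, whereas mAP rewards ranking all correct matches highly, which is why the two numbers are reported together.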



Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61972056, in part by the Hunan Provincial Natural Science Foundation of China under Grant 2021JJ30743, and in part by the Degree & Post-graduate Education Reform Project of Hunan Province of China under Grant 2020JGZD043.

Author information

Corresponding author

Correspondence to Shuren Zhou.

Ethics declarations

The authors declare that they have no conflicts of interest to report regarding the present study.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhou, S., Zou, W. Fusion pose guidance and transformer feature enhancement for person re-identification. Multimed Tools Appl 83, 21745–21763 (2024). https://doi.org/10.1007/s11042-023-15303-2

  • DOI: https://doi.org/10.1007/s11042-023-15303-2
