Neural Architectures for Feature Embedding in Person Re-Identification: A Comparative View

Published: 09 October 2023

Abstract

Solving Person Re-Identification (Re-Id) with Deep Convolutional Neural Networks is a daunting challenge due to the small size and variety of the training data, especially in Single-Shot Re-Id, where only two images per person are available. The lack of training data causes deep neural models to overfit, which degrades performance.
This article explores a wide assortment of neural architectures that are commonly used for object classification and analyzes their suitability for use in a Re-Id model. These architectures have been trained through a Triplet Model and evaluated on two challenging Single-Shot Re-Id datasets, PRID2011 and CUHK. This comparative study aims to identify the best-performing architectures and to provide guidance for optimizing the feature embedding for the Re-Identification task. The results show Inception-ResNet and DenseNet to be potentially useful models, especially when compared with other methods specifically designed for Re-Id.
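
As a rough illustration of the kind of objective a Triplet Model optimizes, the sketch below computes a standard margin-based triplet loss on toy embeddings. The abstract does not specify the article's exact loss formulation, so the Euclidean distance, the margin value of 1.0, and the function names `euclidean` and `triplet_loss` are assumptions made for this example only.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss (assumed form, not taken from the article).

    The loss is zero once the anchor is closer to the positive (same person)
    than to the negative (different person) by at least `margin`.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D embeddings: the positive lies near the anchor, the negative far away.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
negative = [3.0, 4.0]

# Well-separated triplet: the margin is already satisfied.
easy = triplet_loss(anchor, positive, negative)
# Swapping the roles violates the margin and produces a positive loss.
hard = triplet_loss(anchor, negative, positive)
```

In a Single-Shot setting each identity contributes only two images, so one serves as the anchor and the other as the positive, while negatives must be drawn from other identities.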



Published In

ACM Transactions on Intelligent Systems and Technology, Volume 14, Issue 5
October 2023
472 pages
ISSN:2157-6904
EISSN:2157-6912
DOI:10.1145/3615589
  • Editor: Huan Liu

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 October 2023
Online AM: 21 July 2023
Accepted: 13 July 2023
Revised: 24 May 2023
Received: 02 June 2021

Author Tags

  1. Single-shot person re-identification
  2. Deep Convolutional Neural Network
  3. neural architecture
  4. triplet loss

Qualifiers

  • Research-article

Funding Sources

  • Spanish Government through the CICYT projects
  • Universidad Carlos III of Madrid through PEAVAUTO-CMUC3M, and the Comunidad de Madrid through SEGVAUTO-4.0-CM
