research-article

Neural Architectures for Feature Embedding in Person Re-Identification: A Comparative View

Authors:

Javier Domínguez-Martín,

María J. Gómez-Silva,

Arturo De la Escalera

ACM Transactions on Intelligent Systems and Technology, Volume 14, Issue 5

Article No.: 91, Pages 1 - 21

https://doi.org/10.1145/3610298

Published: 09 October 2023

Abstract

Solving Person Re-Identification (Re-Id) with Deep Convolutional Neural Networks is challenging because the training data are small in size and limited in variety, especially in Single-Shot Re-Id, where only two images per person are available. This lack of training data causes deep neural models to overfit, which degrades their performance.

This article explores a wide assortment of neural architectures commonly used for object classification and analyzes their suitability for a Re-Id model. The architectures were trained with a Triplet Model and evaluated on two challenging Single-Shot Re-Id datasets, PRID2011 and CUHK. The comparative study aims to identify the best-performing architectures and to offer guidance for optimizing the feature embedding for the Re-Identification task. The results present Inception-ResNet and DenseNet as potentially useful models, especially when compared with other methods specifically designed for Re-Id.
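As an illustration of the kind of objective a Triplet Model optimizes, the following minimal sketch computes a standard margin-based triplet loss over three embedding vectors. The abstract does not give the article's exact loss formulation, so the hinge form, the Euclidean distance, the function names, and the margin value used here are all assumptions made for illustration only.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss (assumed form, not taken from the article).

    The loss is zero once the negative sample is farther from the anchor
    than the positive sample by at least `margin`.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D embeddings with made-up values.
anchor = [0.0, 0.0]
same_person = [3.0, 4.0]    # distance 5 from the anchor
other_person = [6.0, 8.0]   # distance 10 from the anchor

# Well-separated triplet: 5 - 10 + 1 is negative, so the loss is clipped to 0.
satisfied = triplet_loss(anchor, same_person, other_person, margin=1.0)

# Swapped roles: 10 - 5 + 1 = 6, so the triplet is penalized.
violated = triplet_loss(anchor, other_person, same_person, margin=1.0)
```

In a Re-Id setting the three vectors would be the embeddings that the network under comparison produces for an anchor image, a second image of the same person, and an image of a different person. The choice of backbone changes only how those embeddings are computed, not the loss itself.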


Cited By

J. Wu, R. Hong, and S. Tang. 2024. Intermediary-Generated Bridge Network for RGB-D Cross-Modal Re-Identification. ACM Transactions on Intelligent Systems and Technology 15, 6 (2024), 1–25. Online publication date: 29-Jul-2024.
https://dl.acm.org/doi/10.1145/3682066

Index Terms

Neural Architectures for Feature Embedding in Person Re-Identification: A Comparative View
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Matching
        Object identification
      2. Computer vision representations
        Appearance and texture representations



Published In

ACM Transactions on Intelligent Systems and Technology, Volume 14, Issue 5

October 2023

472 pages

ISSN: 2157-6904

EISSN: 2157-6912

DOI: 10.1145/3615589

Editor: Huan Liu, Arizona State University, USA

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 October 2023

Online AM: 21 July 2023

Accepted: 13 July 2023

Revised: 24 May 2023

Received: 02 June 2021

Published in TIST Volume 14, Issue 5


Funding Sources

Spanish Government through the CICYT projects
Universidad Carlos III of Madrid through PEAVAUTO-CMUC3M, and the Comunidad de Madrid through SEGVAUTO-4.0-CM

