Multi-level Similarity Perception Network for Person Re-identification

Authors:

Xian-Sheng Hua

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 15, Issue 2

Article No.: 32, Pages 1 - 19

https://doi.org/10.1145/3309881

Published: 05 June 2019

Abstract

In this article, we propose a novel deep Siamese architecture for person re-identification (re-ID), based on a convolutional neural network (CNN) and multi-level similarity perception. Because feature maps at different depths have distinct characteristics, we apply different similarity constraints to the low-level and high-level feature maps during the training stage. With an appropriate similarity comparison mechanism at each level, the proposed approach adaptively learns discriminative local and global feature representations, respectively; the local representations are more sensitive in localizing the prominent part-level patterns relevant to re-identifying people across cameras. In addition, a novel strong activation pooling strategy is applied to the last convolutional layer to aggregate abstract local features into more representative descriptors. On this basis, we propose a final feature embedding that jointly encodes the original global features and the discriminative local features. Our framework has two further benefits. First, classification constraints can be easily incorporated into the framework, forming a unified multi-task network together with the similarity constraints. Second, because similarity information is encoded in the network’s learned parameters via back-propagation, pairwise input is not necessary at test time. We can therefore extract the features of each gallery image and build an index offline, which is essential for large-scale real-world applications. Experimental results on multiple challenging benchmarks demonstrate that our method performs strongly compared with current state-of-the-art approaches.
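
The test-time workflow described in the abstract (single-image feature extraction, an index over gallery images built ahead of time, and a final embedding that combines global and local features) can be sketched roughly as follows. This is a minimal illustration, not the article's implementation: the function names, the plain concatenation with L2 normalization, the cosine-similarity ranking, and the toy feature values are all assumptions, and the hand-written vectors stand in for CNN outputs.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def final_embedding(global_feat, local_feat):
    """Combine global and local features into one descriptor.

    Assumption: plain concatenation of normalized parts; the article
    may fuse the two feature types differently.
    """
    return l2_normalize(l2_normalize(global_feat) + l2_normalize(local_feat))


def build_gallery_index(gallery_feats):
    """Offline step: map each gallery image id to its final embedding."""
    return {
        img_id: final_embedding(g, l)
        for img_id, (g, l) in gallery_feats.items()
    }


def rank_gallery(query_embedding, index):
    """Online step: sort gallery ids by cosine similarity to the query."""
    scores = {
        img_id: sum(q * e for q, e in zip(query_embedding, emb))
        for img_id, emb in index.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy (global, local) features standing in for network outputs.
gallery = {
    "cam2_person_a": ([1.0, 0.0], [0.9, 0.1]),
    "cam2_person_b": ([0.0, 1.0], [0.1, 0.9]),
}
index = build_gallery_index(gallery)  # computed once, before any query

# Each query is embedded alone; no image pair is fed through the network.
query = final_embedding([0.9, 0.1], [1.0, 0.0])
ranking = rank_gallery(query, index)
```

Because every image is embedded independently, the gallery embeddings never need to be recomputed per query. Each query then costs one forward pass plus a similarity search over the stored vectors.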

Index Terms

Multi-level Similarity Perception Network for Person Re-identification
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object identification
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks

Information & Contributors

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 15, Issue 2

May 2019

375 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3339884

Editor:
Alberto Del Bimbo
University of Firenze, Italy

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 June 2019

Accepted: 01 January 2019

Revised: 01 November 2018

Received: 01 June 2018

Published in TOMM Volume 15, Issue 2

Qualifiers

Research-article
Research
Refereed

Funding Sources

Fundamental Research Funds for the Central Universities

Citations

Cited By

Zhou J, Zhao S, Li S, Cheng B, Chen J. (2024). Research on Person Re-Identification through Local and Global Attention Mechanisms and Combination Poolings. Sensors 24:17 (5638). Online publication date: 30-Aug-2024.
https://doi.org/10.3390/s24175638
He Q, Zheng Z, Hu H. (2023). A Feature Map is Worth a Video Frame: Rethinking Convolutional Features for Visible-Infrared Person Re-identification. ACM Transactions on Multimedia Computing, Communications, and Applications. Online publication date: 24-Aug-2023.
https://doi.org/10.1145/3617375
Wang X, Zheng Z, He Y, Yan F, Zeng Z, Yang Y. (2023). Progressive Local Filter Pruning for Image Retrieval Acceleration. IEEE Transactions on Multimedia 25 (9597-9607). Online publication date: 2023.
https://doi.org/10.1109/TMM.2023.3256092
Yadav A, Vishwakarma D. (2023). Deep learning algorithms for person re-identification: sate-of-the-art and research challenges. Multimedia Tools and Applications 83:8 (22005-22054). Online publication date: 10-Aug-2023.
https://doi.org/10.1007/s11042-023-16286-w
Wang K, Ding C, Pang J, Xu X. (2022). Context Sensing Attention Network for Video-based Person Re-identification. ACM Transactions on Multimedia Computing, Communications, and Applications. Online publication date: Dec-2022.
https://doi.org/10.1145/3573203
Wang X, Zheng Z, He Y, Yan F, Zeng Z, Yang Y. (2022). Soft Person Reidentification Network Pruning via Blockwise Adjacent Filter Decaying. IEEE Transactions on Cybernetics 52:12 (13293-13307). Online publication date: Dec-2022.
https://doi.org/10.1109/TCYB.2021.3130047
Zeng D, Yu Y, Oyama K. (2020). Deep Triplet Neural Networks with Cluster-CCA for Audio-Visual Cross-Modal Retrieval. ACM Transactions on Multimedia Computing, Communications, and Applications 16:3 (1-23). Online publication date: 14-Jul-2020.
https://dl.acm.org/doi/10.1145/3387164
