research-article

Deep Residual Network with Self Attention Improves Person Re-Identification Accuracy

Authors:

Jean-Paul Ainam,

Guangchun Luo

ICMLC '19: Proceedings of the 2019 11th International Conference on Machine Learning and Computing

Pages 380 - 385

https://doi.org/10.1145/3318299.3318324

Published: 22 February 2019

Abstract

In this paper, we present an attention mechanism to improve person re-identification. Inspired by biology, we propose the Self Attention Grid (SAG), which discovers the most informative parts of a high-resolution image using its internal representation. In particular, the proposed model consists of two branches and is fed two copies of the same input image. The upper branch processes the high-resolution image and learns a high-dimensional feature representation, while the lower branch processes the low-resolution image and learns a filtering attention grid. We apply a max filter to non-overlapping sub-regions of the high-dimensional feature representation and then multiply the result element-wise with the output of the second branch. The feature maps of the second branch are subsequently weighted using a softmax operation to reflect the importance of each patch of the grid. Our attention module helps the network learn the most discriminative visual features of multiple image regions and is specifically optimized to attend to feature representations at different levels. Extensive experiments on three large-scale datasets show that our self-attention mechanism significantly improves the baseline model and outperforms various state-of-the-art models by a large margin.
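
The grid operation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a single-channel 2-D feature map, plain Python lists instead of network tensors, and a softmax taken over all grid cells at once. The function names (`max_pool_grid`, `softmax_grid`, `attend`) and the example values are hypothetical, chosen only for illustration.

```python
import math


def max_pool_grid(feature, cell_h, cell_w):
    """Max filter over non-overlapping cell_h x cell_w sub-regions of a 2-D map."""
    rows, cols = len(feature), len(feature[0])
    assert rows % cell_h == 0 and cols % cell_w == 0, "map must tile evenly"
    return [
        [
            max(feature[r + dr][c + dc]
                for dr in range(cell_h) for dc in range(cell_w))
            for c in range(0, cols, cell_w)
        ]
        for r in range(0, rows, cell_h)
    ]


def softmax_grid(scores):
    """Softmax over every cell of a 2-D grid, so the weights sum to 1."""
    flat = [s for row in scores for s in row]
    peak = max(flat)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in flat]
    total = sum(exps)
    width = len(scores[0])
    return [
        [exps[i * width + j] / total for j in range(width)]
        for i in range(len(scores))
    ]


def attend(high_res_features, low_res_scores, cell_h, cell_w):
    """Pool the high-resolution map, then weight each cell by the attention grid."""
    pooled = max_pool_grid(high_res_features, cell_h, cell_w)
    weights = softmax_grid(low_res_scores)
    return [
        [p * w for p, w in zip(pooled_row, weight_row)]
        for pooled_row, weight_row in zip(pooled, weights)
    ]


# Assumed toy inputs: a 4x4 "high-resolution" map and a 2x2 score grid
# standing in for the output of the low-resolution branch.
features = [
    [1.0, 2.0, 0.0, 1.0],
    [3.0, 4.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 6.0],
    [1.0, 0.0, 7.0, 8.0],
]
scores = [[0.0, 0.0], [0.0, 0.0]]  # equal scores give uniform weights
attended = attend(features, scores, 2, 2)
print(attended)
```

With equal scores every cell receives the same weight, so the output is simply the pooled map scaled by 1/4. Unequal scores would shift weight toward the cells that the low-resolution branch marks as more informative.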

Cited By

Duan W, Wang Z (2024). Person Re-Identification Based on Local Salient Feature Embedding Network. 2024 5th International Conference on Computer Vision, Image and Deep Learning (CVIDL), pp. 307-311. Online publication date: 19-Apr-2024.
https://doi.org/10.1109/CVIDL62147.2024.10603836
Antwi-Bekoe E, Liu G, Ainam J, Sun G, Xie X (2022). A deep learning approach for insulator instance segmentation and defect detection. Neural Computing and Applications 34:9, pp. 7253-7269. Online publication date: 30-Jan-2022.
https://doi.org/10.1007/s00521-021-06792-z
Du H, Li Z, Liu P, He L, Huo D (2022). Two‐level salient feature complementary network for person re‐identification. International Journal of Intelligent Systems 37:9, pp. 5971-5995. Online publication date: 14-Jan-2022.
https://doi.org/10.1002/int.22824

Index Terms

Deep Residual Network with Self Attention Improves Person Re-Identification Accuracy
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Matching
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Ranking

Information & Contributors

Information

Published In

ICMLC '19: Proceedings of the 2019 11th International Conference on Machine Learning and Computing

February 2019

563 pages

ISBN:9781450366007

DOI:10.1145/3318299

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

Southwest Jiaotong University

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Fundamental Research Funds for the Central Universities in China
Ministry of Science and Technology of Sichuan province

Conference

ICMLC '19

ICMLC '19: 2019 11th International Conference on Machine Learning and Computing

February 22 - 24, 2019

Zhuhai, China
