
Looking beyond appearances: Synthetic training data for deep CNNs in re-identification

Authors:

Igor Barros Barbosa,

Marco Cristani,

Barbara Caputo,

Aleksander Rognhaugen,

Theoharis Theoharis

Volume 167, Issue C

Pages 50 - 62

https://doi.org/10.1016/j.cviu.2017.12.002

Published: 01 February 2018

Highlights

• A new synthetic dataset for re-identification is presented.

• Fine-tuning of a re-id DNN with synthetic data is proposed.

• Experiments with synthetic data indicate improved re-id performance on other datasets.

Abstract

Re-identification is generally carried out by encoding the appearance of a subject in terms of outfit, which presumes scenarios where people do not change their attire. In this paper we overcome this restriction by proposing a framework based on a deep convolutional neural network, SOMAnet, that additionally models other discriminative aspects, namely structural attributes of the human figure (e.g. height, obesity, gender). Our method is unique in many respects. First, SOMAnet is based on the Inception architecture, departing from the usual siamese framework. This spares expensive data preparation (pairing images across cameras) and allows the understanding of what the network has learned. Second, and most notably, the training data consists of a synthetic 100K-instance dataset, SOMAset, created by photorealistic human body generation software. SOMAset will be released with an open-source license to enable further developments in re-identification. Synthetic data represents a cost-effective way of acquiring semi-realistic imagery (full realism is usually not required in re-identification since surveillance cameras capture low-resolution silhouettes), while at the same time providing complete control of the samples in terms of ground truth. Thus it is relatively easy to customize the data w.r.t. the surveillance scenario at hand, e.g. ethnicity. SOMAnet, trained on SOMAset and fine-tuned on recent re-identification benchmarks, matches subjects even with different apparel.
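The abstract does not describe how a probe image is matched against a gallery once the network has produced its features. As a minimal sketch only, assuming each image has already been reduced to a fixed-length descriptor (the three-dimensional vectors, the identity labels, and the function names below are hypothetical and are not taken from the paper), a gallery can be ranked by cosine distance to the probe:

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_gallery(probe, gallery):
    """Sort gallery identities from closest to farthest from the probe."""
    return sorted(gallery, key=lambda pid: cosine_distance(probe, gallery[pid]))


# Hypothetical descriptors; in practice these would come from a trained network.
gallery = {
    "id_a": [0.9, 0.1, 0.2],
    "id_b": [0.1, 0.8, 0.3],
    "id_c": [0.2, 0.2, 0.9],
}
probe = [0.85, 0.15, 0.25]

ranking = rank_gallery(probe, gallery)
print(ranking)  # the first entry is the rank-1 match
```

Cosine distance is one common choice for comparing learned descriptors; whether SOMAnet uses it, or another metric, is not stated in the abstract.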





Published In

Computer Vision and Image Understanding, Volume 167, Issue C

Feb 2018

153 pages

ISSN: 1077-3142

Elsevier Inc.

Publisher

Elsevier Science Inc.

United States

Qualifiers

Research-article
