Deep salient-Gaussian Fisher vector encoding of the spatio-temporal trajectory structures for person re-identification

Published: 01 January 2019

Abstract

In this paper, we propose a deep spatio-temporal appearance (DSTA) descriptor for person re-identification (re-ID). The proposed descriptor is based on the deep Fisher vector (FV) encoding of the spatio-temporal trajectory structures, which have the advantage of robustly handling misalignment in the pedestrian tracklets. The deep encoding exploits the richness of the spatio-temporal structural information around the trajectories. This is achieved by hierarchically encoding the trajectory structures, using a larger tracklet neighborhood scale when moving from one layer to the next. To eliminate the noisy background around the pedestrian and to model the uniqueness of the pedestrian's identity, the deep FV encoder is further enriched into the deep Salient-Gaussian weighted FV (deepSGFV) encoder by integrating the pedestrian Gaussian template and the saliency template, respectively, into the encoding process. On four challenging pedestrian video datasets (PRID2011, i-LIDS-VID, MARS and LPW), the proposed descriptor achieves accuracy competitive with state-of-the-art methods, in particular the deep CNN ones, without requiring either pre-training or data augmentation. Combining DSTA with a deep CNN further improves on the current state-of-the-art methods and demonstrates that the two are complementary.
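To make the idea of a weighted Fisher vector concrete, the following is a minimal single-layer sketch in Python with NumPy. It is not the authors' deepSGFV encoder: the hierarchical layers, the trajectory structures and the saliency template are not modeled, and the function names (`gaussian_template_weights`, `weighted_fisher_vector`), the isotropic Gaussian centred on the bounding box, and the choice to encode only the gradient with respect to the GMM means are all assumptions made for illustration.

```python
import numpy as np


def gaussian_template_weights(xy, center, sigma):
    """Hypothetical Gaussian template: weight each descriptor location by an
    isotropic 2-D Gaussian centred on `center` (e.g. the bounding-box centre),
    so that descriptors near the border (likely background) count less."""
    d2 = np.sum((xy - center) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def weighted_fisher_vector(X, w, pi, mu, var):
    """First-order (mean-gradient) Fisher vector with per-descriptor weights.

    X:   (N, D) local descriptors
    w:   (N,)   non-negative weights (assumed here to come from a template)
    pi:  (K,)   GMM mixture weights
    mu:  (K, D) GMM means
    var: (K, D) GMM diagonal variances
    """
    diff = X[:, None, :] - mu[None, :, :]                      # (N, K, D)
    # Log-density of every descriptor under every diagonal Gaussian.
    log_prob = -0.5 * (np.sum(diff ** 2 / var, axis=2)
                       + np.sum(np.log(2.0 * np.pi * var), axis=1))
    # Posterior (soft assignment) of each descriptor to each component.
    log_post = np.log(pi) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)            # stability
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)                  # (N, K)
    # Weighted sum of whitened residuals; the usual 1/N becomes 1/sum(w).
    g = np.einsum("n,nk,nkd->kd", w, gamma, diff / np.sqrt(var))
    g /= w.sum() * np.sqrt(pi)[:, None]
    fv = g.ravel()
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                     # power norm
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv                       # L2 norm


# Toy usage with random data standing in for real descriptors and a fitted GMM.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                       # 200 descriptors, D = 8
xy = rng.uniform(0, [64, 128], size=(200, 2))       # positions in a 64x128 box
w = gaussian_template_weights(xy, np.array([32.0, 64.0]), sigma=20.0)
pi = np.full(4, 0.25)                               # K = 4 components
mu = rng.normal(size=(4, 8))
var = np.ones((4, 8))
fv = weighted_fisher_vector(X, w, pi, mu, var)
print(fv.shape)
```

In this sketch the weights only rescale each descriptor's contribution to the gradient statistics; a saliency map could in principle be multiplied into `w` in the same way, though how the paper combines the two templates is not stated in the abstract.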

Cited By

  • (2023) From Collective Attribute Association of Groups to Precise Attribute Association of Individuals. IEEE Transactions on Multimedia 25:1547-1554. DOI: 10.1109/TMM.2023.3251097. Online publication date: 1-Jan-2023
  • (2021) Re-ranking person re-identification using distance aggregation of k-nearest neighbors hierarchical tree. Multimedia Tools and Applications 80(5):8015-8038. DOI: 10.1007/s11042-020-10123-0. Online publication date: 1-Feb-2021
  • (2020) Video-based person re-identification using a novel feature extraction and fusion technique. Multimedia Tools and Applications 79(17-18):12471-12491. DOI: 10.1007/s11042-019-08432-0. Online publication date: 16-Jan-2020
      Information

      Published In

      Multimedia Tools and Applications, Volume 78, Issue 2
      Jan 2019
      1394 pages

      Publisher

      Kluwer Academic Publishers, United States

      Author Tags

      1. Deep CNN
      2. Deep spatio-temporal appearance descriptor
      3. Deep weighted encoding
      4. Person re-identification
      5. Spatio-temporal trajectory structures

      Qualifiers

      • Article
