Cross-View Gait Recognition Using Pairwise Spatial Transformer Networks

Published: 01 January 2021

Abstract

In this paper, we propose a pairwise spatial transformer network (PSTN) for cross-view gait recognition, which reduces unwanted feature misalignment caused by view differences before the recognition step, thereby improving performance. The proposed PSTN is a unified CNN architecture consisting of a pairwise spatial transformer (PST) and a subsequent recognition network (RN). More specifically, given a matching pair of gait features from different source and target views, the PST estimates a non-rigid deformation field that registers the two features into their intermediate view, which incurs less registration distortion than deforming directly from the source view to the target view. The registered matching pair is then fed into the RN, which outputs a dissimilarity score. Although registration may reduce not only intra-subject variations but also inter-subject variations, we can still achieve a good trade-off between them using a loss function designed to optimize recognition accuracy. Experiments on three publicly available gait datasets demonstrate that combining the PST with any of the benchmark gait recognition networks yields superior performance in both verification and identification scenarios.
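
To make the pipeline concrete, below is a minimal PyTorch sketch of the idea described in the abstract. It is not the authors' implementation: the layer sizes, the names (PairwiseSpatialTransformer, RecognitionNetwork, contrastive_loss, margin), and the use of a plain dense displacement field in place of the paper's deformation model are all illustrative assumptions. The sketch covers the three steps the abstract names: a localization CNN estimates a deformation field from a matching pair, each feature is warped half-way along that field in opposite directions so both land in their intermediate view, and a recognition network maps the registered pair to a dissimilarity score trained with a contrastive-style loss.

# Minimal sketch of the PSTN idea (illustrative assumptions, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


def identity_grid(n, h, w, device):
    # Base sampling grid in [-1, 1]^2, the convention used by F.grid_sample.
    ys = torch.linspace(-1.0, 1.0, h, device=device)
    xs = torch.linspace(-1.0, 1.0, w, device=device)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    return torch.stack((gx, gy), dim=-1).expand(n, h, w, 2)


class PairwiseSpatialTransformer(nn.Module):
    """Estimates a dense displacement field from a matching pair and warps
    each side half-way along it, toward the pair's intermediate view."""

    def __init__(self, in_ch=1):
        super().__init__()
        self.localization = nn.Sequential(
            nn.Conv2d(2 * in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),  # per-pixel (dx, dy)
        )

    def forward(self, x_src, x_tgt):
        flow = self.localization(torch.cat((x_src, x_tgt), dim=1))
        flow = flow.permute(0, 2, 3, 1)  # N x H x W x 2
        n, h, w, _ = flow.shape
        grid = identity_grid(n, h, w, flow.device)
        # Half the deformation on each side, in opposite directions, so both
        # features register in a shared intermediate view; this is the point
        # the abstract makes about reduced distortion versus a full
        # source-to-target warp.
        src_reg = F.grid_sample(x_src, grid + 0.5 * flow, align_corners=True)
        tgt_reg = F.grid_sample(x_tgt, grid - 0.5 * flow, align_corners=True)
        return src_reg, tgt_reg


class RecognitionNetwork(nn.Module):
    """Maps a registered pair to a dissimilarity score (embedding distance)."""

    def __init__(self, in_ch=1):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Conv2d(in_ch, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    def forward(self, src_reg, tgt_reg):
        return (self.embed(src_reg) - self.embed(tgt_reg)).norm(dim=1)


def contrastive_loss(score, same_subject, margin=2.0):
    # Genuine pairs (same_subject == 1) are pulled together; impostor pairs
    # are pushed beyond the margin, so the learned registration trades off
    # intra- vs. inter-subject variation in favor of recognition accuracy.
    pos = same_subject * score.pow(2)
    neg = (1.0 - same_subject) * F.relu(margin - score).pow(2)
    return (pos + neg).mean()

Trained end to end on genuine and impostor cross-view pairs (e.g., score = rn(*pst(x_src, x_tgt)) followed by contrastive_loss), such a registration is tuned for recognition rather than pure appearance alignment: the warp removes view-induced misalignment within a subject, while the loss penalizes any warp that also erases inter-subject differences.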

        Information & Contributors

        Information

        Published In

        cover image IEEE Transactions on Circuits and Systems for Video Technology
        IEEE Transactions on Circuits and Systems for Video Technology  Volume 31, Issue 1
        Jan. 2021
        424 pages

        Publisher

        IEEE Press

        Qualifiers

        • Research-article

        Cited By

        • (2024) GaitDAN: Cross-View Gait Recognition via Adversarial Domain Adaptation. IEEE Transactions on Circuits and Systems for Video Technology, 34(9), pp. 8026–8040. DOI: 10.1109/TCSVT.2024.3384308. Online publication date: 1-Sep-2024.
        • (2024) Kinematic Diversity and Rhythmic Alignment in Choreographic Quality Transformers for Dance Quality Assessment. IEEE Transactions on Circuits and Systems for Video Technology, 34(7), pp. 5677–5692. DOI: 10.1109/TCSVT.2024.3360452. Online publication date: 1-Jul-2024.
        • (2024) Cloth-Imbalanced Gait Recognition via Hallucination. IEEE Transactions on Circuits and Systems for Video Technology, 34(7), pp. 5665–5676. DOI: 10.1109/TCSVT.2024.3360232. Online publication date: 1-Jul-2024.
        • (2024) Spatiotemporal multi-scale bilateral motion network for gait recognition. The Journal of Supercomputing, 80(3), pp. 3412–3440. DOI: 10.1007/s11227-023-05607-3. Online publication date: 1-Feb-2024.
        • (2024) Spatiotemporal smoothing aggregation enhanced multi-scale residual deep graph convolutional networks for skeleton-based gait recognition. Applied Intelligence, 54(8), pp. 6154–6174. DOI: 10.1007/s10489-024-05422-0. Online publication date: 1-Apr-2024.
        • (2024) Gait Recognition Based on Temporal Gait Information Enhancing. MultiMedia Modeling, pp. 451–463. DOI: 10.1007/978-3-031-53308-2_33. Online publication date: 29-Jan-2024.
        • (2023) Key Role Guided Transformer for Group Activity Recognition. IEEE Transactions on Circuits and Systems for Video Technology, 33(12), pp. 7803–7818. DOI: 10.1109/TCSVT.2023.3283282. Online publication date: 1-Dec-2023.
        • (2023) Pseudo-Mono for Monocular 3D Object Detection in Autonomous Driving. IEEE Transactions on Circuits and Systems for Video Technology, 33(8), pp. 3962–3975. DOI: 10.1109/TCSVT.2023.3237579. Online publication date: 1-Aug-2023.
        • (2023) TransGait: Multimodal-based gait recognition with set transformer. Applied Intelligence, 53(2), pp. 1535–1547. DOI: 10.1007/s10489-022-03543-y. Online publication date: 1-Jan-2023.
        • (2023) Gait recognition using free-area transformer networks. Machine Vision and Applications, 34(6). DOI: 10.1007/s00138-023-01467-2. Online publication date: 6-Oct-2023.
