

Hankelet-based dynamical systems modeling for 3D action recognition

Published: 01 December 2015

Abstract

This paper proposes to model an action as the output of a sequence of atomic Linear Time Invariant (LTI) systems. The sequence of LTI systems generating the action is modeled as a Markov chain, where a Hidden Markov Model (HMM) is used to model the transition from one atomic LTI system to another. In turn, the LTI systems are represented in terms of their Hankel matrices. For classification purposes, the parameters of a set of HMMs (one for each action class) are learned via a discriminative approach. This work proposes a novel method to learn the atomic LTI systems from training data, and analyzes in detail the action representation in terms of a sequence of Hankel matrices. Extensive evaluation on two publicly available datasets demonstrates that the proposed method attains state-of-the-art accuracy in action classification from the 3D locations of body joints (skeleton).

Highlights

  • We model an action as a sequence of outputs of Linear Time Invariant (LTI) systems.
  • We represent the outputs of LTI systems by means of Hankelets.
  • We adopt an HMM to model the transitions from one LTI system to another.
  • We formulate inference and discriminative supervised learning for our model.
  • We present a detailed analysis of the parameter settings for our action representation.
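The core representation above, an LTI system summarized by its Hankel matrix, can be sketched in a few lines: the columns of a block-Hankel matrix are overlapping, time-shifted windows of the joint-trajectory sequence, and these columns span the output trajectories of the underlying LTI system. The snippet below is a minimal illustration in NumPy; the function name, the window parameter `m`, and the Frobenius normalization are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def block_hankel(y, m):
    """Stack a (T x d) multivariate time series into a block-Hankel matrix.

    Block row i holds the time-shifted samples y[i], ..., y[i + n_cols - 1],
    so each column is a length-m window of the sequence; together the columns
    span the output trajectories of the underlying LTI system.
    """
    T, d = y.shape
    n_cols = T - m + 1
    if n_cols < 1:
        raise ValueError("sequence too short for the requested number of block rows")
    # vstack the m shifted copies: resulting shape is (m * d, n_cols)
    H = np.vstack([y[i:i + n_cols].T for i in range(m)])
    # normalize to unit Frobenius norm (an illustrative choice, so that
    # matrices from sequences of different amplitude become comparable)
    return H / np.linalg.norm(H)

# Example: a 6-frame sequence of d = 2 coordinates, m = 3 block rows
y = np.arange(12.0).reshape(6, 2)
H = block_hankel(y, 3)          # shape (6, 4), unit Frobenius norm
```

Given two such normalized matrices, a simple dissimilarity (for instance, a Frobenius-norm comparison of H_a H_a^T and H_b H_b^T) can score how similar the dynamics of two windows are; the precise score used with Hankelets may differ from this sketch.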


Cited By

  • (2022) Exploring Sub-skeleton Trajectories for Interpretable Recognition of Sign Language. Database Systems for Advanced Applications, pp. 241-249. DOI: 10.1007/978-3-031-00123-9_19. Online publication date: 11-Apr-2022.
  • (2018) Representation, Analysis, and Recognition of 3D Humans. ACM Transactions on Multimedia Computing, Communications, and Applications, 14(1s), pp. 1-36. DOI: 10.1145/3182179. Online publication date: 6-Mar-2018.
  • (2018) A Hierarchical Model for Action Recognition Based on Body Parts. 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1978-1985. DOI: 10.1109/ICRA.2018.8460516. Online publication date: 21-May-2018.
  • (2017) WALKING WALKing walking. Proceedings of the 26th International Joint Conference on Artificial Intelligence, pp. 2457-2463. DOI: 10.5555/3172077.3172230. Online publication date: 19-Aug-2017.


    Published In

    Image and Vision Computing, Volume 44, Issue C
    December 2015
    74 pages

    Publisher

    Butterworth-Heinemann

    United States


    Author Tags

    1. Action
    2. Discriminative learning
    3. Hankel Matrix
    4. Hidden Markov Model
    5. Linear time invariant system

    Qualifiers

    • Research-article

