

Hankelet-based dynamical systems modeling for 3D action recognition

Published: 01 December 2015

Abstract

This paper proposes to model an action as the output of a sequence of atomic Linear Time Invariant (LTI) systems. The sequence of LTI systems generating the action is modeled as a Markov chain, where a Hidden Markov Model (HMM) is used to model the transition from one atomic LTI system to another. In turn, the LTI systems are represented in terms of their Hankel matrices. For classification purposes, the parameters of a set of HMMs (one for each action class) are learned via a discriminative approach. This work proposes a novel method to learn the atomic LTI systems from training data, and analyzes in detail the action representation in terms of a sequence of Hankel matrices. Extensive evaluation on two publicly available datasets demonstrates that the proposed method attains state-of-the-art accuracy in action classification from the 3D locations of body joints (skeleton).

Highlights

  • We model an action as a sequence of outputs of Linear Time Invariant (LTI) systems.
  • We represent the outputs of LTI systems by means of Hankelets.
  • We adopt an HMM to model the transitions from one LTI system to another.
  • We formulate inference and discriminative supervised learning for our model.
  • We present a detailed analysis of the parameter settings for our action representation.
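The core representation above, an LTI system summarized by its Hankel matrix, can be sketched in a few lines: the columns of a block-Hankel matrix are overlapping, time-shifted windows of the joint-trajectory sequence, and these columns span the output trajectories of the underlying LTI system. The snippet below is a minimal illustration in NumPy; the function name, the window parameter `m`, and the Frobenius normalization are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def block_hankel(y, m):
    """Stack a (T x d) multivariate time series into a block-Hankel matrix.

    Block row i holds the time-shifted samples y[i], ..., y[i + n_cols - 1],
    so each column is a length-m window of the sequence; together the columns
    span the output trajectories of the underlying LTI system.
    """
    T, d = y.shape
    n_cols = T - m + 1
    if n_cols < 1:
        raise ValueError("sequence too short for the requested number of block rows")
    # vstack the m shifted copies: resulting shape is (m * d, n_cols)
    H = np.vstack([y[i:i + n_cols].T for i in range(m)])
    # normalize to unit Frobenius norm (an illustrative choice, so that
    # matrices from sequences of different amplitude become comparable)
    return H / np.linalg.norm(H)

# Example: a 6-frame sequence of d = 2 coordinates, m = 3 block rows
y = np.arange(12.0).reshape(6, 2)
H = block_hankel(y, 3)          # shape (6, 4), unit Frobenius norm
```

Given two such normalized matrices, a simple dissimilarity (for instance, a Frobenius-norm comparison of H_a H_a^T and H_b H_b^T) can score how similar the dynamics of two windows are; the precise score used with Hankelets may differ from this sketch.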


Cited By

  • (2022) Exploring Sub-skeleton Trajectories for Interpretable Recognition of Sign Language. Database Systems for Advanced Applications, pp. 241-249. DOI: 10.1007/978-3-031-00123-9_19. Online publication date: 11-Apr-2022.
  • (2018) Representation, Analysis, and Recognition of 3D Humans. ACM Transactions on Multimedia Computing, Communications, and Applications, 14(1s), pp. 1-36. DOI: 10.1145/3182179. Online publication date: 6-Mar-2018.
  • (2018) A Hierarchical Model for Action Recognition Based on Body Parts. 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1978-1985. DOI: 10.1109/ICRA.2018.8460516. Online publication date: 21-May-2018.
  • (2017) WALKING WALKing walking. Proceedings of the 26th International Joint Conference on Artificial Intelligence, pp. 2457-2463. DOI: 10.5555/3172077.3172230. Online publication date: 19-Aug-2017.


    Published In

    Image and Vision Computing, Volume 44, Issue C
    December 2015
    74 pages

    Publisher

    Butterworth-Heinemann

    United States


    Author Tags

    1. Action
    2. Discriminative learning
    3. Hankel Matrix
    4. Hidden Markov Model
    5. Linear time invariant system

    Qualifiers

    • Research-article

