Abstract
In this paper, we present a method for recognizing human actions from silhouette sequences. Inspired by locality preserving projection and its variants, we propose a novel manifold embedding method, maximum spatio-temporal dissimilarity embedding, which embeds each action frame into a manifold where frames from different action classes are well separated. Unlike existing methods that incorporate both inter-class and intra-class information in the embedding process, the proposed method focuses on maximizing the distances between frames that are similar in appearance but belong to different classes, and it takes temporal information into account. A variant of the Hausdorff distance is introduced for frame and sequence classification. Extensive experiments and comparisons with state-of-the-art methods demonstrate the effectiveness and robustness of the proposed method for human action silhouette analysis.
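The abstract does not define which Hausdorff variant is used, so the following is only an illustrative sketch and not the paper's formulation. It assumes the well-known modified Hausdorff distance (the larger of the two mean nearest-neighbour distances) and assumes that frames have already been embedded as row vectors. The function names `modified_hausdorff` and `classify_sequence` are hypothetical.

```python
import numpy as np


def directed_mean_min_distance(a, b):
    """Mean, over the rows of a, of the Euclidean distance to the nearest row of b."""
    # Pairwise distances via broadcasting; result has shape (len(a), len(b)).
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return dists.min(axis=1).mean()


def modified_hausdorff(a, b):
    """Assumed variant: the larger of the two directed mean-min distances."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return max(directed_mean_min_distance(a, b),
               directed_mean_min_distance(b, a))


def classify_sequence(query, references):
    """Return the label whose reference frame set is closest to the query set.

    references is a hypothetical mapping from class label to an array of
    embedded frames, one frame per row.
    """
    return min(references,
               key=lambda label: modified_hausdorff(query, references[label]))
```

Averaging the nearest-neighbour distances, rather than taking their maximum as the classical Hausdorff distance does, makes the measure less sensitive to a single outlying frame, which is one plausible motivation for using a variant in sequence classification.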
Notes
The results of LSTDE are obtained from its original paper.
References
Levin, E., Pieraccini, R., Eckert, W.: A stochastic model of human–machine interaction for learning dialog strategies. IEEE Trans. Speech Audio Process. 8, 11–23 (2000)
Dufaux, F., Ebrahimi, T.: Scrambling for video surveillance with privacy. IEEE Conference on Computer Vision and Pattern Recognition Workshop (2006)
Rougier, C., Meunier, J., St-Arnaud, A., Rousseau, J.: Fall detection from human shape and motion history using video surveillance. Int. Conf. Adv. Inf. Netw. Appl. Workshops 2, 875–880 (2007)
Niebles, J.C., Chen, C., Li, F.: Modeling temporal structure of decomposable motion segments for activity classification. European Conference on Computer Vision, pp. 392–405 (2010)
Petković, M., Jonker, W.: Content-based video retrieval by integrating spatio-temporal and stochastic recognition of events. IEEE Workshop on Detection and Recognition of Events in Video, pp. 75–82 (2001)
Geetha, P., Narayanan, V.: A survey of content-based video retrieval. J. Comput. Sci. 4, 474–486 (2008)
Efros, A., Berg, A.C., Mori, G., Malik, J.: Recognizing action at a distance. Int. Conf. Comput. Vis. 2, 726–733 (2003)
Collins, R., Gross, R., Shi, J.: Silhouette-based human identification from body shape and gait. IEEE Conference on Automatic Face and Gesture Recognition, pp. 366–371 (2002)
Schüldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local SVM approach. IEEE Conf. Autom. Face Gesture Recognit. 3, 32–36 (2004)
Ke, Y., Sukthankar, R., Hebert, M.H.: Efficient visual event detection using volumetric features. Int. Conf. Comput. Vis. 1, 166–173 (2005)
Laptev, I.: On space-time interest points. Int. J. Comput. Vis. 64, 107–123 (2005)
Wang, L., Suter, D.: Learning and matching of dynamic shape manifolds for human action recognition. IEEE Trans. Image Process. 16, 1646–1661 (2007)
Wang, L., Suter, D.: Recognizing human activities from silhouettes: motion subspace and factorial discriminative graphical model. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2007)
Bobick, A., Davis, J.: The recognition of human movement using temporal templates. IEEE Trans. Pattern Anal. Mach. Intell. 23, 257–267 (2001)
He, X., Niyogi, P.: Locality preserving projections. Neural Inf. Process. Syst. 16, 153–160 (2003)
Blackburn, J., Ribeiro, E.: Human motion recognition using isomap and dynamic time warping. Int. Conf. Comput. Vis. Workshop Human Motion 4814, 285–298 (2007)
Tenenbaum, J.B., de Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319–2323 (2000)
Rabiner, L., Juang, B.: Fundamentals of Speech Recognition. Prentice Hall, New York (1993)
Roweis, S., Saul, L.: Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326 (2000)
Belkin, M., Niyogi, P.: Laplacian eigenmaps and spectral techniques for embedding and clustering. Neural Inf. Process. Syst. 14, 585–591 (2001)
Zhang, Z., Zha, H.: Principal manifolds and nonlinear dimension reduction via local tangent space alignment. SIAM J. Sci. Comput. 8, 406–424 (2005)
Yan, S., Xu, D., Zhang, B., Zhang, H., Yang, Q., Lin, S.: Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Trans. Pattern Anal. Mach. Intell. 29, 40–51 (2007)
Jenkins, O., Mataric, M.: A spatio-temporal extension to isomap nonlinear dimension reduction. International Conference on Machine Learning, pp. 56–61 (2004)
Fang, C., Chen, J., Tseng, C., Lien, J.: Human action recognition using spatio-temporal classification. Asian Conf. Comput. Vis. 5995, 98–109 (2009)
Lewandowski, M., del Rincon, J.M., Makris, D., Nebel, J.: Temporal extension of Laplacian eigenmaps for unsupervised dimensionality reduction of time series. International Conference on Pattern Recognition, pp. 161–164 (2010)
Jia, K., Yeung, D.: Human action recognition using local spatio-temporal discriminant embedding. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2008)
Zheng, Z., Yang, F., Tan, W., Jia, J., Yang, J.: Gabor feature-based face recognition using supervised locality preserving projection. Signal Process. 87, 2473–2483 (2007)
Kokiopoulou, E., Saad, Y.: Orthogonal neighborhood preserving projections: a projection-based dimensionality reduction technique. IEEE Trans. Pattern Anal. Mach. Intell. 29, 2143–2156 (2007)
Cai, D., He, X., Zhou, K.: Locality sensitive discriminant analysis. International Joint Conference on Artificial Intelligence, pp. 708–713 (2007)
Cai, D., He, X.: Orthogonal locality preserving indexing. ACM SIGIR Conference on Research and development in Information Retrieval, pp. 3–10 (2005)
Wang, L., Suter, D.: Visual learning and recognition of sequential data manifolds with applications to human movement analysis. Comput. Vis. Image Underst. 110, 153–172 (2008)
Ma, J., Yuen, P.C., Zou, W., Lai, J.H.: Supervised neighborhood topology learning for human action recognition. International Conference on Computer Vision Workshops, pp. 476–481 (2009)
Gorelick, L., Blank, M., Shechtman, E., Irani, M., Basri, R.: Actions as space-time shapes. IEEE Trans. Pattern Anal. Mach. Intell. 29, 2247–2253 (2007)
Wang, L., Tan, T.: Silhouette analysis based gait recognition for human identification. IEEE Trans. Pattern Anal. Mach. Intell. 25, 1505–1518 (2003)
Weinland, D., Ronfard, R., Boyer, E.: Free viewpoint action recognition using motion history volumes. Comput. Vis. Image Underst. 104, 249–257 (2006)
Weinland, D., Boyer, E., Ronfard, R.: Action recognition from arbitrary views using 3D exemplars. International Conference on Computer Vision, pp. 1–7 (2007)
Acknowledgments
This work was supported by the National Science Foundation of China (Nos. 61301269 and 61201271), the Research Fund for the Doctoral Program of Higher Education (No. 20100185120021), and the Science and Technology Cooperation Program with the Academy of China and Sichuan Province (No. 2012JZ0001).
Cite this article
Cheng, J., Liu, H. & Li, H. Silhouette analysis for human action recognition based on maximum spatio-temporal dissimilarity embedding. Machine Vision and Applications 25, 1007–1018 (2014). https://doi.org/10.1007/s00138-013-0581-2
DOI: https://doi.org/10.1007/s00138-013-0581-2