Abstract
This paper presents a novel contextual spectral embedding (CSE) framework for human action recognition, which automatically learns high-level features (a motion semantic vocabulary) from a large vocabulary of abundant mid-level features (i.e., visual words). The novelty lies in exploiting the inter-video context between mid-level features for spectral embedding, where the context is captured by the Pearson product-moment correlation between mid-level features rather than by a Gaussian function computed over vectors of point-wise information used as the mid-level feature representation. The goal is to embed the mid-level features into a semantic low-dimensional space and to learn a much more compact semantic vocabulary with the CSE framework. Experiments on two action datasets demonstrate that the approach achieves significantly improved results compared with the state of the art.
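The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact graph construction, so the choices below are assumptions made for illustration. Each visual word is described by its occurrence profile across videos, negative correlations are clipped to zero, the embedding uses the leading eigenvectors of the symmetrically normalized affinity, and the compact vocabulary is obtained with plain k-means. All function names (pearson_affinity, spectral_embed, kmeans, requantize) are hypothetical.

```python
import numpy as np


def pearson_affinity(counts):
    """Affinity between visual words from a (n_videos, n_words) count matrix.

    Each word is represented by its column, i.e. how often it occurs in
    each video, and words are compared by Pearson correlation.
    """
    corr = np.corrcoef(counts.T)      # (n_words, n_words)
    corr = np.nan_to_num(corr)        # constant columns produce NaN
    W = np.clip(corr, 0.0, None)      # assumption: drop negative correlations
    np.fill_diagonal(W, 0.0)          # no self-loops in the graph
    return W


def spectral_embed(W, dim):
    """Embed graph nodes using the top eigenvectors of D^-1/2 W D^-1/2."""
    deg = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    S = W * inv_sqrt[:, None] * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(S)       # eigenvalues in ascending order
    return vecs[:, -dim:]             # keep the `dim` largest


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):   # leave empty clusters where they are
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def requantize(counts, labels, k):
    """Merge visual-word counts into counts over the k learnt clusters."""
    out = np.zeros((counts.shape[0], k))
    for j in range(k):
        out[:, j] = counts[:, labels == j].sum(axis=1)
    return out
```

Under these assumptions, a video histogram over the original visual words would be passed through requantize to obtain a shorter histogram over the learnt vocabulary, which can then be fed to any standard classifier.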
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhao, Q., Lu, Z., Ip, H.H.S. (2010). Action Recognition Based on Learnt Motion Semantic Vocabulary. In: Qiu, G., Lam, K.M., Kiya, H., Xue, XY., Kuo, CC.J., Lew, M.S. (eds) Advances in Multimedia Information Processing - PCM 2010. PCM 2010. Lecture Notes in Computer Science, vol 6297. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15702-8_18
DOI: https://doi.org/10.1007/978-3-642-15702-8_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15701-1
Online ISBN: 978-3-642-15702-8
eBook Packages: Computer Science (R0)