Abstract
This paper goes beyond recognizing human actions from a fixed view and focuses on action recognition from an arbitrary view. A novel learning algorithm, called latent kernelized structural SVM, is proposed for the view-invariant action recognition, which extends the kernelized structural SVM framework to include latent variables. Due to the changing and frequently unknown positions of the camera, we regard the view label of action as a latent variable and implicitly infer it during both learning and inference. Motivated by the geometric correlation between different views and semantic correlation between different action classes, we additionally propose a mid-level correlation feature which describes an action video by a set of decision values from the pre-learned classifiers of all the action classes from all the views. Each decision value captures both geometric and semantic correlations between the action video and the corresponding action class from the corresponding view. After that, we combine the low-level visual cue, mid-level correlation description, and high-level label information into a novel nonlinear kernel under the latent kernelized structural SVM framework. Extensive experiments on multi-view IXMAS and MuHAVi action datasets demonstrate that our method generally achieves higher recognition accuracy than other state-of-the-art methods.
Chapter PDF
Similar content being viewed by others
Keywords
References
Gorelick, L., Blank, M., Shechtman, E., Irani, M., Basri, R.: Actions as space-time shapes. PAMI 29, 2247–2253 (2007)
Yilmaz, A., Shah, M.: Actions sketch: a novel action representation. In: CVPR (2005)
Dollar, P., Rabaud, V., Cottrell, G., Belongie, S.: Behavior recognition via sparse spatio-temporal features. In: VS PETS (2005)
Niebles, J.C., Wang, H., Fei-fei, L.: Unsupervised learning of human action categories using spatial-temporal words. IJCV 79, 299–318 (2008)
Schuldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local svm approach. In: ICPR (2004)
Yilmaz, A., Shah, M.: Recognizing human actions in videos acquired by uncalibrated moving cameras. In: ICCV (2005)
Shen, Y., Foroosh, H.: View-invariant action recognition using fundamental ratios. In: CVPR (2008)
Weinland, D., Boyer, E., Ronfard, R.: Action recognition from arbitrary views using 3d exemplars. In: ICCV (2007)
Yan, P., Khan, S.M., Shah, M.: Learning 4d action feature models for arbitrary view action recognition. In: CVPR (2008)
Junejo, I.N., Dexter, E., Laptev, I., Perez, P.: View-independent action recognition from temporal self-similarities. PAMI 33, 172–185 (2011)
Lewandowski, M., Makris, D., Nebel, J.C.: View and Style-Independent Action Manifolds for Human Activity Recognition. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part VI. LNCS, vol. 6316, pp. 547–560. Springer, Heidelberg (2010)
Liu, J., Shah, M., Kuipers, B., Savarese, S.: Cross-view action recognition via view knowledge transfer. In: CVPR (2011)
Farhadi, A., Tabrizi, M.K.: Learning to Recognize Activities from the Wrong View Point. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part I. LNCS, vol. 5302, pp. 154–166. Springer, Heidelberg (2008)
Tsochantaridis, I., Hofmann, T., Joachims, T., Altun, Y.: Support vector machine learning for interdependent and structured output spaces. In: ICML (2004)
Yu, C.N.J., Joachims, T.: Learning structural svms with latent variables. In: ICML (2009)
Yu, C.N.J., Joachims, T.: Training structural svms with kernels using sampled cuts. In: ACM KDD (2008)
Wang, Y., Mori, G.: A Discriminative Latent Model of Object Classes and Attributes. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part V. LNCS, vol. 6315, pp. 155–168. Springer, Heidelberg (2010)
Wang, Y., Mori, G.: A discriminative latent model of image region and object tag correspondence. In: NIPS (2010)
Yang, W., Wang, Y., Mori, G.: Recognizing human actions from still images with latent poses. In: CVPR (2010)
Lan, T., Wang, Y., Yang, W., Mori, G.: Beyond actions: discriminative models for contextual group activities. In: NIPS (2010)
Artieres, T., Do, T.M.T.: Large margin training for hidden markov models with partially observed states. In: ICML (2009)
Zien, A., Ong, C.S.: Multiclass multiple kernel learning. In: ICML (2007)
Wu, X., Xu, D., Duan, L., Luo, J.: Action recognition using context and appearance distribution features. In: CVPR (2011)
Wang, H., Klaser, A., Schmid, C., Liu, C.L.: Action recognition by dense trajectories. In: CVPR (2011)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. IJCV 60, 91–110 (2004)
Xu, D., Chang, S.F.: Video event recognition using kernel methods with multilevel temporal alignment. PAMI 30, 1985–1997 (2008)
Liu, J., Kuipers, B., Savarese, S.: Recognizing human actions by attributes. In: CVPR (2011)
Parikh, D., Grauman, K.: Relative attributes. In: ICCV (2011)
Singh, S., Velastin, S., Ragheb, H.: Muhavi: a multicamera human action video dataset for the evaluation of action recognition methods. In: AVSS (2010)
Chang, C.C., Lin, C.J.: Libsvm: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (2001)
Weinland, D., Ozuysal, M., Fua, P.: Making action recognition robust to occlusions and viewpoint changes. In: ECCV (2010)
Liu, J., Shah, M.: Learning human actions via information maximization. In: CVPR (2008)
Reddy, K., Liu, J., Shah, M.: Incremental action recognition using feature-tree. In: ICCV (2009)
Kaaniche, M.B., Bremond, F.: Gesture recognition by learning local motion signatures. In: CVPR (2011)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wu, X., Jia, Y. (2012). View-Invariant Action Recognition Using Latent Kernelized Structural SVM. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds) Computer Vision – ECCV 2012. ECCV 2012. Lecture Notes in Computer Science, vol 7576. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33715-4_30
Download citation
DOI: https://doi.org/10.1007/978-3-642-33715-4_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33714-7
Online ISBN: 978-3-642-33715-4
eBook Packages: Computer ScienceComputer Science (R0)