Abstract
Classifying realistic human actions in video remains challenging because of the intra-class variability and inter-class ambiguity of action classes. Recently, local features based on Spatial-Temporal Interest Points (STIPs) have shown great promise in complex action analysis. However, these methods typically rely on the Bag-of-Words (BoW) model, which can hardly resolve ambiguity between actions because it ignores the spatial-temporal co-occurrence relations among visual words. In this paper, we propose a new model that captures this contextual relationship in terms of the co-occurrence of pairwise features. The Normalized Google-Like Distance (NGLD) is proposed to measure this co-occurrence numerically, due to its effectiveness in semantic correlation analysis. All pairwise distances compose an NGLD correlogram, whose normalized form is incorporated into the final action representation. Experiments on the WEIZMANN dataset and the more challenging UCF Sports dataset show that it is a much richer descriptor that observably reduces action ambiguity. The results also demonstrate that the proposed model is more effective and more robust than BoW under different setups.
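The abstract does not state the NGLD formula, so the following is only an illustrative sketch. It assumes the distance follows the Normalized Google Distance of Cilibrasi and Vitanyi, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), with f counted over local spatial-temporal neighborhoods of visual words. The function names `ngd` and `ngd_correlogram`, and the choice of a "window" as a set of visual-word ids, are hypothetical and not taken from the paper.

```python
import math
from itertools import combinations


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from occurrence counts.

    fx, fy: number of windows containing word x / word y
    fxy:    number of windows containing both words
    n:      total number of windows
    """
    if fxy == 0:
        # Words that never co-occur are treated as infinitely far apart.
        return math.inf
    lx, ly = math.log(fx), math.log(fy)
    denom = math.log(n) - min(lx, ly)
    if denom == 0.0:
        # Both words appear in every window, so they always co-occur.
        return 0.0
    return (max(lx, ly) - math.log(fxy)) / denom


def ngd_correlogram(windows, vocab_size):
    """Pairwise distance matrix over a visual vocabulary.

    windows: iterable of collections of visual-word ids; each collection is
    one local neighborhood (an assumption of this sketch, not of the paper).
    """
    windows = list(windows)
    n = len(windows)
    single = [0] * vocab_size
    pair = {}
    for w in windows:
        words = sorted(set(w))
        for x in words:
            single[x] += 1
        for x, y in combinations(words, 2):
            pair[(x, y)] = pair.get((x, y), 0) + 1

    d = [[0.0] * vocab_size for _ in range(vocab_size)]
    for x in range(vocab_size):
        for y in range(x + 1, vocab_size):
            d[x][y] = d[y][x] = ngd(single[x], single[y], pair.get((x, y), 0), n)
    return d


if __name__ == "__main__":
    # Four toy neighborhoods over a three-word vocabulary.
    toy = [{0, 1}, {0, 1}, {0}, {2}]
    for row in ngd_correlogram(toy, 3):
        print(row)
```

Smaller values indicate words that tend to appear in the same neighborhoods. How the paper normalizes the correlogram before adding it to the action representation is not specified in the abstract, so that step is left out here.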
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sun, Q., Liu, H. (2013). Action Disambiguation Analysis Using Normalized Google-Like Distance Correlogram. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds) Computer Vision – ACCV 2012. ACCV 2012. Lecture Notes in Computer Science, vol 7726. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37431-9_33
DOI: https://doi.org/10.1007/978-3-642-37431-9_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37430-2
Online ISBN: 978-3-642-37431-9
eBook Packages: Computer Science (R0)