Local velocity-adapted motion events for spatio-temporal recognition

Authors:

Barbara Caputo,

Christian Schüldt,

Tony Lindeberg

Computer Vision and Image Understanding, Volume 108, Issue 3

Pages 207 - 229

https://doi.org/10.1016/j.cviu.2006.11.023

Published: 01 December 2007

Abstract

In this paper, we address the problem of motion recognition using event-based local motion representations. We assume that similar patterns of motion contain similar events with consistent motion across image sequences. Using this assumption, we formulate the problem of motion recognition as a matching of corresponding events in image sequences. To enable the matching, we present and evaluate a set of motion descriptors that exploit the spatial and the temporal coherence of motion measurements between corresponding events in image sequences. As the motion measurements may depend on the relative motion of the camera, we also present a mechanism for local velocity adaptation of events and evaluate its influence when recognizing image sequences subjected to different camera motions. When recognizing motion patterns, we compare the performance of a nearest neighbor (NN) classifier with the performance of a support vector machine (SVM). We also compare event-based motion representations to motion representations in terms of global histograms. A systematic experimental evaluation on a large video database with human actions demonstrates that (i) local spatio-temporal image descriptors can be defined to carry important information of space-time events for subsequent recognition, and that (ii) local velocity adaptation is an important mechanism in situations when the relative motion between the camera and the interesting events in the scene is unknown. The particular advantage of event-based representations and velocity adaptation is further emphasized when recognizing human actions in unconstrained scenes with complex and non-stationary backgrounds.
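
As a rough illustration of the idea of recognizing motion by matching events across sequences with a nearest neighbor classifier, the sketch below treats each image sequence as a set of local event descriptors and assigns a query sequence the label of the closest training sequence. The abstract does not specify the descriptors, the matching rule, or the distance measure, so everything here is an assumption made for illustration: the function names, the Euclidean distance, the mean-of-closest-match dissimilarity, and the two-dimensional toy descriptors are all hypothetical.

```python
import math


def descriptor_distance(a, b):
    """Euclidean distance between two local event descriptors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sequence_dissimilarity(events_a, events_b):
    """Mean distance from each event in A to its closest event in B.

    This is a hypothetical matching rule, not the one used in the paper.
    """
    total = sum(
        min(descriptor_distance(a, b) for b in events_b) for a in events_a
    )
    return total / len(events_a)


def classify_nn(query_events, training_set):
    """Return the label of the training sequence most similar to the query."""
    best_label, _ = min(
        (
            (label, sequence_dissimilarity(query_events, events))
            for label, events in training_set
        ),
        key=lambda pair: pair[1],
    )
    return best_label


# Toy data: each sequence is a list of 2-D descriptors (illustrative only).
training_set = [
    ("walking", [(1.0, 0.0), (0.9, 0.1)]),
    ("waving", [(0.0, 1.0), (0.1, 0.9)]),
]
query = [(0.95, 0.05), (0.8, 0.2)]

print(classify_nn(query, training_set))
```

Under the same assumptions, an SVM would replace `classify_nn` with a kernel built on a sequence-to-sequence similarity, which is the comparison the abstract refers to.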

Index Terms

Local velocity-adapted motion events for spatio-temporal recognition
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
  2. Computer graphics
    1. Animation
      1. Motion capture
      2. Motion processing

Information & Contributors

Information

Published In

Computer Vision and Image Understanding Volume 108, Issue 3

December, 2007

83 pages

ISSN:1077-3142

Copyright © 2007 Elsevier Inc.

Publisher

Elsevier Science Inc.

United States

Qualifiers

Article

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 29
Total Downloads: 0
Downloads (last 12 months): 0
Downloads (last 6 weeks): 0

Reflects downloads up to 14 Dec 2024

Citations

Cited By

Mota V, de Oliveira H, Scalzo S, Dittz D, Santos R, dos Santos J, Araújo A (2020) From video pornography to cancer cells: a tensor framework for spatiotemporal description. Multimedia Tools and Applications 79:19-20 (13919-13949). Online publication date: 1-May-2020
https://dl.acm.org/doi/10.1007/s11042-020-08642-x
Lindeberg T (2018) Spatio-Temporal Scale Selection in Video Data. Journal of Mathematical Imaging and Vision 60:4 (525-562). Online publication date: 1-May-2018
https://dl.acm.org/doi/10.1007/s10851-017-0766-9
(2017) Sequential data feature selection for human motion recognition via Markov blanket. Pattern Recognition Letters 86:C (18-25). Online publication date: 15-Jan-2017
https://dl.acm.org/doi/10.5555/3063157.3063244
Chang X, Yu Y, Yang Y, Xing E (2017) Semantic Pooling for Complex Event Analysis in Untrimmed Videos. IEEE Transactions on Pattern Analysis and Machine Intelligence 39:8 (1617-1632). Online publication date: 29-Jun-2017
https://dl.acm.org/doi/10.1109/TPAMI.2016.2608901
Lindeberg T (2017) Temporal Scale Selection in Time-Causal Scale Space. Journal of Mathematical Imaging and Vision 58:1 (57-101). Online publication date: 1-May-2017
https://dl.acm.org/doi/10.1007/s10851-016-0691-3
Zhen X, Shao L (2016) Action recognition via spatio-temporal local features. Image and Vision Computing 50:C (1-13). Online publication date: 1-Jun-2016
https://dl.acm.org/doi/10.1016/j.imavis.2016.02.006
Hashemi S, Rahmati M (2016) View-independent action recognition. Multimedia Tools and Applications 75:12 (6755-6775). Online publication date: 1-Jun-2016
https://dl.acm.org/doi/10.1007/s11042-015-2606-5
Cheng S, Su J, Hsiao K, Rashvand H (2016) Latent semantic learning with time-series cross correlation analysis for video scene detection and classification. Multimedia Tools and Applications 75:20 (12919-12940). Online publication date: 1-Oct-2016
https://dl.acm.org/doi/10.1007/s11042-015-2548-y
Lindeberg T (2016) Time-Causal and Time-Recursive Spatio-Temporal Receptive Fields. Journal of Mathematical Imaging and Vision 55:1 (50-88). Online publication date: 1-May-2016
https://dl.acm.org/doi/10.1007/s10851-015-0613-9
Das Dawn D, Shaikh S (2016) A comprehensive survey of human action recognition with spatio-temporal interest point (STIP) detector. The Visual Computer: International Journal of Computer Graphics 32:3 (289-306). Online publication date: 1-Mar-2016
https://dl.acm.org/doi/10.1007/s00371-015-1066-2
