Local velocity-adapted motion events for spatio-temporal recognition

Published: 01 December 2007

Abstract

In this paper, we address the problem of motion recognition using event-based local motion representations. We assume that similar patterns of motion contain similar events with consistent motion across image sequences. Using this assumption, we formulate the problem of motion recognition as a matching of corresponding events in image sequences. To enable the matching, we present and evaluate a set of motion descriptors that exploit the spatial and the temporal coherence of motion measurements between corresponding events in image sequences. As the motion measurements may depend on the relative motion of the camera, we also present a mechanism for local velocity adaptation of events and evaluate its influence when recognizing image sequences subjected to different camera motions. When recognizing motion patterns, we compare the performance of a nearest neighbor (NN) classifier with the performance of a support vector machine (SVM). We also compare event-based motion representations to motion representations in terms of global histograms. A systematic experimental evaluation on a large video database with human actions demonstrates that (i) local spatio-temporal image descriptors can be defined to carry important information of space-time events for subsequent recognition, and that (ii) local velocity adaptation is an important mechanism in situations when the relative motion between the camera and the interesting events in the scene is unknown. The particular advantage of event-based representations and velocity adaptation is further emphasized when recognizing human actions in unconstrained scenes with complex and non-stationary backgrounds.
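
The idea of recognizing a sequence by matching its local events against those of labeled sequences can be sketched as follows. This is only an illustrative sketch, not the paper's method: the descriptor vectors, the greedy one-to-one matching, the Euclidean distance, and the names `sequence_distance` and `classify_nn` are all assumptions introduced here.

```python
import numpy as np


def sequence_distance(events_a, events_b):
    """Mean distance over greedily matched event pairs (hypothetical measure).

    events_a, events_b: arrays of shape (n_events, dim), one descriptor
    vector per local motion event. The descriptors are placeholders and
    do not reproduce the descriptors evaluated in the paper.
    """
    a = np.asarray(events_a, dtype=float)
    b = np.asarray(events_b, dtype=float)
    if a.shape[0] == 0 or b.shape[0] == 0:
        raise ValueError("each sequence needs at least one event")
    # Pairwise Euclidean distances between all events of the two sequences.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    n_pairs = min(d.shape)
    total = 0.0
    for _ in range(n_pairs):
        # Take the closest remaining pair, then exclude both events.
        i, j = np.unravel_index(np.argmin(d), d.shape)
        total += d[i, j]
        d[i, :] = np.inf
        d[:, j] = np.inf
    return total / n_pairs


def classify_nn(query_events, train_sequences, train_labels):
    """Return the label of the training sequence closest to the query."""
    dists = [sequence_distance(query_events, t) for t in train_sequences]
    return train_labels[int(np.argmin(dists))]


if __name__ == "__main__":
    # Toy data: two labeled sequences with two 2-D event descriptors each.
    train = [np.array([[0.0, 0.0], [1.0, 0.0]]),
             np.array([[5.0, 5.0], [6.0, 5.0]])]
    labels = ["walking", "running"]
    query = np.array([[0.1, 0.0], [0.9, 0.1]])
    print(classify_nn(query, train, labels))
```

A nearest neighbor rule of this kind needs no training beyond storing the labeled sequences. An SVM, which the abstract compares it against, would instead require a kernel defined over sets of events.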

Information & Contributors

Information

Published In

Computer Vision and Image Understand­ing, Volume 108, Issue 3
December, 2007
83 pages

Publisher

Elsevier Science Inc.

United States

Author Tags

  1. Action recognition
  2. Learning
  3. Local features
  4. Matching
  5. Motion
  6. Motion descriptors
  7. SVM
  8. Velocity adaptation

Qualifiers

  • Article

Cited By

  • (2020) From video pornography to cancer cells: a tensor framework for spatiotemporal description. Multimedia Tools and Applications 79:19-20 (13919-13949). DOI: 10.1007/s11042-020-08642-x. Online publication date: 1-May-2020
  • (2018) Spatio-Temporal Scale Selection in Video Data. Journal of Mathematical Imaging and Vision 60:4 (525-562). DOI: 10.1007/s10851-017-0766-9. Online publication date: 1-May-2018
  • (2017) Sequential data feature selection for human motion recognition via Markov blanket. Pattern Recognition Letters 86:C (18-25). DOI: 10.5555/3063157.3063244. Online publication date: 15-Jan-2017
  • (2017) Semantic Pooling for Complex Event Analysis in Untrimmed Videos. IEEE Transactions on Pattern Analysis and Machine Intelligence 39:8 (1617-1632). DOI: 10.1109/TPAMI.2016.2608901. Online publication date: 29-Jun-2017
  • (2017) Temporal Scale Selection in Time-Causal Scale Space. Journal of Mathematical Imaging and Vision 58:1 (57-101). DOI: 10.1007/s10851-016-0691-3. Online publication date: 1-May-2017
  • (2016) Action recognition via spatio-temporal local features. Image and Vision Computing 50:C (1-13). DOI: 10.1016/j.imavis.2016.02.006. Online publication date: 1-Jun-2016
  • (2016) View-independent action recognition. Multimedia Tools and Applications 75:12 (6755-6775). DOI: 10.1007/s11042-015-2606-5. Online publication date: 1-Jun-2016
  • (2016) Latent semantic learning with time-series cross correlation analysis for video scene detection and classification. Multimedia Tools and Applications 75:20 (12919-12940). DOI: 10.1007/s11042-015-2548-y. Online publication date: 1-Oct-2016
  • (2016) Time-Causal and Time-Recursive Spatio-Temporal Receptive Fields. Journal of Mathematical Imaging and Vision 55:1 (50-88). DOI: 10.1007/s10851-015-0613-9. Online publication date: 1-May-2016
  • (2016) A comprehensive survey of human action recognition with spatio-temporal interest point (STIP) detector. The Visual Computer: International Journal of Computer Graphics 32:3 (289-306). DOI: 10.1007/s00371-015-1066-2. Online publication date: 1-Mar-2016