Beyond Tracking: Modelling Activity and Understanding Behaviour

Tao Xiang¹ & Shaogang Gong¹

Abstract

In this work, we present a unified bottom-up and top-down approach, based on automatic model selection, for modelling complex activities of multiple objects in cluttered scenes. An activity of multiple objects is represented by discrete scene events, and the behaviours of those objects are modelled by reasoning about the temporal and causal correlations among different events. This differs significantly from the majority of existing techniques, which are centred on object tracking followed by trajectory matching. In our approach, object-independent events are detected and then classified by unsupervised clustering using Expectation-Maximisation (EM), with automatic model selection based on Schwarz's Bayesian Information Criterion (BIC). Dynamic Probabilistic Networks (DPNs) are formulated for modelling the temporal and causal correlations among discrete events for robust and holistic scene-level behaviour interpretation. In particular, we develop a Dynamically Multi-Linked Hidden Markov Model (DML-HMM) based on the discovery of salient dynamic interlinks among multiple temporal processes corresponding to multiple event classes. A DML-HMM is built using BIC-based factorisation, so that its topology is intrinsically determined by the underlying causality and temporal order among events. Extensive experiments are conducted on modelling activities captured in different indoor and outdoor scenes. Our experimental results demonstrate that a DML-HMM models group activities in a noisy and cluttered scene better than other comparable dynamic probabilistic networks, including a Multi-Observation Hidden Markov Model (MOHMM), a Parallel Hidden Markov Model (PaHMM) and a Coupled Hidden Markov Model (CHMM).
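
As an illustration of the EM-plus-BIC step named in the abstract, the following is a minimal sketch rather than the authors' implementation. It fits one-dimensional Gaussian mixtures by EM for several candidate numbers of clusters and scores each with BIC, here taken as -2 log L + p log N (lower is better), where p is the number of free parameters and N the number of samples. The toy data, the function names (`fit_gmm_em`, `bic`), the quantile initialisation and the variance floor are all assumptions made for this example; the event features used in the paper are not reproduced here.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_density(x, weights, means, variances):
    """Density of a Gaussian mixture at x."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def fit_gmm_em(data, k, iters=200):
    """Fit a k-component 1-D Gaussian mixture by EM; return the log-likelihood."""
    n = len(data)
    ordered = sorted(data)
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    # Assumption: deterministic initialisation at evenly spaced quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    var_floor = 1e-3 * grand_var  # assumption: floor to avoid collapsing components
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample.
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300
            resp.append([p / total for p in joint])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, var_floor)
    return sum(math.log(max(mixture_density(x, weights, means, variances), 1e-300))
               for x in data)


def bic(log_likelihood, k, n):
    """BIC = -2 log L + p log N, with p = 3k - 1 free parameters in 1-D."""
    num_params = 3 * k - 1  # k means, k variances, k - 1 independent weights
    return -2.0 * log_likelihood + num_params * math.log(n)


# Hypothetical toy data: two well-separated clusters standing in for event features.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(10.0, 1.0) for _ in range(200)]

scores = {k: bic(fit_gmm_em(data, k), k, len(data)) for k in range(1, 5)}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

The model with the lowest score is selected, so the number of clusters does not need to be fixed by hand. Note that some texts define BIC with the opposite sign and maximise it; the sign convention used here is a choice made for this sketch.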

References

Akaike, H. 1973. Information theory and an extension of the maximum likelihood principle. In 2nd International Symposium on Information Theory, pp. 267–281.
Babaguchi, N., Kawai, Y., and Kitahashi, T. 2002. Event based indexing of broadcasting sports video by intermodal collaboration. IEEE Transactions on Multimedia, 4(1):68–75.
Baum, L.E. and Petrie, T. 1966. Statistical inference for probabilistic functions of finite state Markov chains. Ann. Math. Stat., 37:1554–1563.
Biernacki, C., Celeux, G., and Govaert, G. 2000. Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7):719–725.
Bishop, C. 1995. Neural Networks for Pattern Recognition. Oxford University Press.
Bobick, A. and Wilson, A. 1997. A state-based approach to the representation and recognition of gesture. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(12):1325–1337.
Bobick, A.F. and Davis, J.W. 2001. The recognition of human movement using temporal templates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(3):257–267.
Brand, M. and Kettnaker, V. 2000. Discovery and segmentation of activities in video. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):844–851.
Brand, M., Oliver, N., and Pentland, A. 1996. Coupled hidden Markov models for complex action recognition. In IEEE Conference on Computer Vision and Pattern Recognition, Puerto Rico, pp. 994–999.
Bregler, C. 1997. Learning and recognizing human dynamics in video sequences. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 568–575.
Buxton, H. and Gong, S. 1995. Visual surveillance in a dynamic and uncertain world. Artificial Intelligence, 78:431–459.
Chomat, O., Martin, J., and Crowley, J. 2000. A probabilistic sensor for the perception and the recognition of activities. In European Conference on Computer Vision, pp. 487–503.
Figueiredo, M. and Jain, A.K. 2002. Unsupervised learning of finite mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3):381–396.
Forney, G.D. 1973. The Viterbi algorithm. Proceedings of the IEEE, 61:268–278.
Friedman, N., Murphy, K., and Russell, S. 1998. Learning the structure of dynamic probabilistic networks. In Uncertainty in AI, pp. 139–147.
Gath, I. and Geva, B. 1989. Unsupervised optimal fuzzy clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(7):773–781.
Ghahramani, Z. 1998. Learning dynamic Bayesian networks. In Adaptive Processing of Sequences and Data Structures. Lecture Notes in AI, pp. 168–197.
Gong, S. and Buxton, H. 1992. On the visual expectations of moving objects: A probabilistic approach with augmented hidden Markov models. In European Conference on Artificial Intelligence, Vienna, pp. 781–786.
Gong, S., Ng, J., and Sherrah, J. 2002. On the semantics of visual behaviour, structured events and trajectories of human action. Image Vision Computing, 20(12):873–888.
Gong, S., Walter, M., and Psarrou, A. 1999. Recognition of temporal structures: Learning prior and propagating observation augmented densities via hidden Markov states. In IEEE International Conference on Computer Vision, Corfu, pp. 157–162.
Gong, S. and Xiang, T. 2003. Recognition of group activities using dynamic probabilistic networks. In IEEE International Conference on Computer Vision, pp. 742–749.
Greenspan, H., Goldberger, J., and Mayer, A. 2004. Probabilistic space-time video modelling via piecewise GMM. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(3):384–396.
Haritaoglu, I., Harwood, D., and Davis, L.S. 2000. W⁴: Real-time surveillance of people and their activities. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):809–830.
Heckerman, D. 1995. A tutorial on learning with Bayesian networks. Technical Report MSR-TR-95-06, Microsoft Research.
Hongeng, S. and Nevatia, R. 2001. Multi-agent event recognition. In IEEE International Conference on Computer Vision, pp. 80–86.
Hung, H. and Gong, S. 2004. Quantifying temporal saliency. In British Machine Vision Conference, pp. 727–736.
Intille, S. and Bobick, A. 1998. Representation and visual recognition of complex multi-agent actions using belief networks. In ECCV Workshop on Perception of Human Action, Freiburg, Germany.
Intille, S., Davis, J., and Bobick, A. 1997. Real-time closed-world tracking. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 697–703.
Johnson, N., Galata, A., and Hogg, D. 1998. The acquisition and use of interaction behaviour models. In IEEE Conference on Computer Vision and Pattern Recognition, Santa Barbara, USA, pp. 866–871.
Kass, R. and Raftery, A. 1995. Bayes factors. Journal of the American Statistical Association, 90:377–395.
Rabiner, L.R. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286.
McKenna, S., Jabri, S., Duric, Z., Rosenfeld, A., and Wechsler, H. 2000. Tracking groups of people. Computer Vision and Image Understanding, 80:42–56.
McLachlan, G. and Peel, D. 1997. Finite Mixture Models. John Wiley & Sons.
Medioni, G., Cohen, I., Bremond, F., Hongeng, S., and Nevatia, R. 2001. Event detection and analysis from video streams. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(8):873–889.
Ng, J. and Gong, S. 2001. Learning pixel-wise signal energy for understanding semantics. In British Machine Vision Conference, pp. 695–704.
Oliver, N., Rosario, B., and Pentland, A. 2000. A Bayesian computer vision system for modelling human interactions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):831–843.
Pavlovic, V., Rehg, J.M., Cham, T., and Murphy, K.P. 1999. A dynamic Bayesian network approach to figure tracking using learned dynamic models. In IEEE International Conference on Computer Vision, pp. 94–101.
Piater, J.H. and Crowley, J.L. 2001. Multi-modal tracking of interacting targets using Gaussian approximation. In Proceedings of 2nd IEEE Workshop on Performance Evaluation of Tracking and Surveillance, pp. 141–147.
Raftery, A. 1995. Bayes model selection in social research. Sociological Methodology, 90:181–196.
Rao, C., Yilmaz, A., and Shah, M. 2002. View-invariant representation and recognition of actions. International Journal of Computer Vision, 50:203–226.
Rissanen, J. 1989. Stochastic Complexity in Statistical Inquiry. World Scientific.
Roberts, S. 1997. Parametric and non-parametric unsupervised cluster analysis. Pattern Recognition, 30(2):261–272.
Roberts, S., Husmeier, D., Rezek, I., and Penny, W. 1998. Bayesian approaches to Gaussian mixture modelling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1133–1142.
Schwarz, G. 1978. Estimating the dimension of a model. Annals of Statistics, 6:461–464.
Sherrah, J. and Gong, S. 2000. VIGOUR: A system for tracking and recognition of multiple people and their activities. In International Conference on Pattern Recognition, Barcelona, pp. 179–182.
Sherrah, J. and Gong, S. 2001. Automated detection of localised visual events over varying temporal scales. In Proc. European Workshop on Advanced Video-based Surveillance System.
Smyth, P. 2000. Model selection for probabilistic clustering using cross-validated likelihood. Statistics and Computing, 10:63–72.
Stauffer, C. and Grimson, W. 2000. Learning patterns of activity using real-time tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):747–758.
Vogler, C. and Metaxas, D. 2001. A framework for recognizing the simultaneous aspects of American Sign Language. Computer Vision and Image Understanding, 81:358–384.
Wada, T. and Matsuyama, T. 2000. Multiobject behavior recognition by event driven selective attention method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):873–887.
Xiang, T. and Gong, S. 2003. Discovering Bayesian causality among visual events in a complex outdoor scene. In IEEE International Conference on Advanced Video- and Signal-based Surveillance, pp. 177–182.
Xiang, T., Gong, S., and Parkinson, D. 2002. Autonomous visual events detection and classification without explicit object-centred segmentation and tracking. In British Machine Vision Conference, pp. 233–242.

Author information

Authors and Affiliations

Department of Computer Science, Queen Mary, University of London, E1 4NS, UK
Tao Xiang & Shaogang Gong

Corresponding author

Correspondence to Tao Xiang.

Additional information

First online version published in February, 2006

About this article

Cite this article

Xiang, T., Gong, S. Beyond Tracking: Modelling Activity and Understanding Behaviour. Int J Comput Vision 67, 21–51 (2006). https://doi.org/10.1007/s11263-006-4329-6

Received: 11 May 2004
Revised: 28 June 2005
Accepted: 27 July 2005
Issue Date: April 2006
DOI: https://doi.org/10.1007/s11263-006-4329-6
