Abstract
This work proposes to learn visual encodings of attention patterns that enable sequential attention for object detection in real-world environments. The system embeds a saccadic decision procedure in a cascaded process where visual evidence is probed at informative image locations. It is based on the extraction of information-theoretic saliency: informative local image descriptors are determined and provide selected foci of interest. The local information, in terms of codebook vector responses, and the geometric information in the shift of attention together define the recognition states of a Markov decision process. A Q-learner then searches for useful actions towards salient locations, developing a strategy of action sequences directed in state space towards maximizing information. The method is evaluated in outdoor object recognition and demonstrates efficient performance.
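The attention strategy described above can be illustrated with a minimal tabular Q-learning sketch. It is not the authors' implementation: the four shift directions in `ACTIONS`, the toy environment `toy_step`, the codebook labels `c0`, `c1`, `c2`, and the reward values (standing in for an information gain) are all assumptions made for illustration. A recognition state is modelled as a pair of the codebook label at the current focus and the last shift of attention.

```python
import random
from collections import defaultdict

# Hypothetical discretised shifts of attention (not from the paper).
ACTIONS = ["N", "E", "S", "W"]

# Recognition state: (codebook label at current focus, last shift of attention).
START = ("c0", None)


def toy_step(state, action):
    """Hypothetical toy environment returning (next_state, reward, done).

    Rewards stand in for an information gain; the values are assumptions.
    """
    label, _ = state
    if label == "c0" and action == "E":
        return ("c1", "E"), 0.2, False
    if label == "c1" and action == "S":
        return ("c2", "S"), 1.0, True  # object hypothesis confirmed
    # Uninformative shift: for simplicity the state is left unchanged.
    return state, 0.0, False


def q_learning(step, start_state, episodes=2000, max_steps=5,
               alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> estimated return
    for _ in range(episodes):
        state = start_state
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Standard Q-learning update towards the one-step bootstrapped target.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q


def greedy_action(q, state):
    """Return the action with the highest learned value in the given state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


if __name__ == "__main__":
    q = q_learning(toy_step, START)
    print(greedy_action(q, START), greedy_action(q, ("c1", "E")))
```

In this toy setting the learned greedy policy chains the two informative shifts, which mirrors the idea of action sequences directed towards informative locations. A real system would derive states and rewards from the local descriptors rather than from hand-set labels.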
This work is supported by the European Commission-funded projects MACS under grant number FP6-004381 and MOBVIS under grant number FP6-511051, and by the FWF Austrian Joint Research Project Cognitive Vision under sub-projects S9103-N04 and S9104-N04.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Paletta, L., Fritz, G., Seifert, C. (2005). Perception-Action Based Object Detection from Local Descriptor Combination and Reinforcement Learning. In: Kalviainen, H., Parkkinen, J., Kaarna, A. (eds) Image Analysis. SCIA 2005. Lecture Notes in Computer Science, vol 3540. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11499145_65
DOI: https://doi.org/10.1007/11499145_65
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26320-3
Online ISBN: 978-3-540-31566-7
eBook Packages: Computer Science, Computer Science (R0)