Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary

P. Duygulu⁷,
K. Barnard⁷,
J. F. G. de Freitas⁸ &
…
D. A. Forsyth⁷

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2353)

Included in the following conference series:

European Conference on Computer Vision

Abstract

We describe a model of object recognition as machine translation. In this model, recognition is a process of annotating image regions with words. First, images are segmented into regions, which are classified into region types using a variety of features. A mapping between region types and the keywords supplied with the images is then learned, using a method based on EM. This process is analogous to learning a lexicon from an aligned bitext. For the implementation we describe, these words are nouns taken from a large vocabulary. On a large test set, the method can predict numerous words with high accuracy. Simple methods identify words that cannot be predicted well. We show how to cluster words that individually are difficult to predict into clusters that can be predicted well. For example, we cannot predict the distinction between train and locomotive using the current set of features, but we can predict the underlying concept. The method is trained on a substantial collection of images. Extensive experimental results illustrate the strengths and weaknesses of the approach.
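
As a rough illustration of the lexicon-learning idea, the sketch below runs an IBM Model 1 style EM over toy data in which each image is a bag of region types paired with a bag of keywords. The abstract says only that the method is "based around EM", so the specific update rule, the function name `learn_lexicon`, the region-type labels, and the toy images are all assumptions made for illustration and should not be read as the authors' implementation.

```python
from collections import defaultdict


def learn_lexicon(pairs, iterations=20):
    """Estimate t[region][word] ~ p(word | region type) with Model 1 style EM.

    `pairs` is a list of (region_types, keywords) tuples, one per image.
    This is an illustrative sketch, not the procedure from the paper.
    """
    words = sorted({w for _, ws in pairs for w in ws})
    regions = sorted({r for rs, _ in pairs for r in rs})

    # Start from a uniform translation table.
    t = {r: {w: 1.0 / len(words) for w in words} for r in regions}

    for _ in range(iterations):
        counts = {r: defaultdict(float) for r in regions}

        # E-step: for each keyword, spread one unit of expected count over
        # the regions of the same image, in proportion to the current table.
        for rs, ws in pairs:
            for w in ws:
                z = sum(t[r][w] for r in rs)
                if z == 0.0:
                    continue  # no region currently explains this word
                for r in rs:
                    counts[r][w] += t[r][w] / z

        # M-step: renormalise expected counts per region type.
        for r in regions:
            total = sum(counts[r].values())
            if total > 0.0:
                for w in words:
                    t[r][w] = counts[r][w] / total

    return t


# Hypothetical toy corpus: region-type labels and keywords are made up.
pairs = [
    (["r_sky", "r_grass"], ["sky", "grass"]),
    (["r_sky", "r_water"], ["sky", "water"]),
    (["r_grass", "r_tiger"], ["grass", "tiger"]),
]

lexicon = learn_lexicon(pairs)
for region in sorted(lexicon):
    best = max(lexicon[region], key=lexicon[region].get)
    print(region, "->", best)
```

Even though no image says which keyword belongs to which region, co-occurrence across images is enough for EM to pull each region type toward one keyword, which is the same mechanism that lets a lexicon be learned from an aligned bitext.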

Author information

Authors and Affiliations

Computer Science Division, U.C. Berkeley, Berkeley, CA, 94720
P. Duygulu, K. Barnard & D. A. Forsyth
Department of Computer Science, University of British Columbia, Vancouver
J. F. G. de Freitas

Editor information

Editors and Affiliations

Centre for Mathematical Sciences, Lund University, Box 118, 22100, Lund, Sweden
Anders Heyden & Gunnar Sparr
The IT University of Copenhagen, Glentevej 67-69, 2400, Copenhagen, NW, Denmark
Mads Nielsen
University of Copenhagen, Universitetsparken 1, 2100, Copenhagen, Denmark
Peter Johansen

About this paper

Cite this paper

Duygulu, P., Barnard, K., de Freitas, J.F.G., Forsyth, D.A. (2002). Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds) Computer Vision — ECCV 2002. ECCV 2002. Lecture Notes in Computer Science, vol 2353. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47979-1_7

DOI: https://doi.org/10.1007/3-540-47979-1_7
Published: 29 April 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43748-2
Online ISBN: 978-3-540-47979-6
eBook Packages: Springer Book Archive
