Visual language modeling for image classification

Authors:

Nenghai Yu

MIR '07: Proceedings of the international workshop on Workshop on multimedia information retrieval

Pages 115 - 124

https://doi.org/10.1145/1290082.1290101

Published: 24 September 2007

Abstract

Although it has been studied for many years, image classification is still a challenging problem. In this paper, we propose a visual language modeling method for content-based image classification. It transforms each image into a matrix of visual words and assumes that each visual word is conditionally dependent on its neighbors. For each image category, a visual language model is constructed from a set of training images; the model captures both the co-occurrence and the proximity information of visual words. Depending on how many neighbors are taken into consideration, three kinds of language models can be trained: unigram, bigram and trigram, each corresponding to a different level of model complexity. Given a test image, its category is determined by estimating how likely the image is to have been generated under each category. Compared with traditional methods based on bag-of-words models, the proposed method can exploit the spatial correlation of visual words effectively in image classification. In addition, we propose to use absent words, that is, words that appear frequently in a category but not in the target image, to help image classification. Experimental results show that our method can achieve comparable accuracy while performing classification much more quickly.
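As a rough illustration of the idea in the abstract, the following Python sketch trains one bigram model per category over matrices of visual-word ids and assigns a test image to the category with the highest log-likelihood. It is not the authors' implementation. Several details are assumptions made only for this example: each word is conditioned on its left neighbor, the first word of each row falls back to a unigram estimate, probabilities use add-one smoothing, and the names `train_bigram_model`, `log_likelihood` and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_bigram_model(images, vocab_size):
    """Count unigrams and left-neighbor bigrams for one category.

    Each image is a list of rows; each row is a list of visual-word ids.
    Conditioning on the left neighbor is an assumption of this sketch.
    """
    unigrams = Counter()
    bigrams = defaultdict(Counter)  # bigrams[left][word] = count
    for image in images:
        for row in image:
            for col, word in enumerate(row):
                unigrams[word] += 1
                if col > 0:
                    bigrams[row[col - 1]][word] += 1
    return {"unigrams": unigrams, "bigrams": bigrams, "vocab_size": vocab_size}


def log_likelihood(model, image):
    """Log-probability of an image under a category model (add-one smoothing)."""
    v = model["vocab_size"]
    total = sum(model["unigrams"].values())
    score = 0.0
    for row in image:
        for col, word in enumerate(row):
            if col == 0:
                # No left neighbor: fall back to the smoothed unigram estimate.
                p = (model["unigrams"][word] + 1) / (total + v)
            else:
                context = model["bigrams"].get(row[col - 1], Counter())
                p = (context[word] + 1) / (sum(context.values()) + v)
            score += math.log(p)
    return score


def classify(models, image):
    """Return the category whose model gives the image the highest likelihood."""
    return max(models, key=lambda category: log_likelihood(models[category], image))


# Toy data: both categories contain the same words in the same proportions,
# so only the spatial arrangement tells them apart.
stripes = [[[0, 1, 0, 1], [0, 1, 0, 1]]]
blocks = [[[0, 0, 1, 1], [0, 0, 1, 1]]]
models = {
    "stripes": train_bigram_model(stripes, vocab_size=2),
    "blocks": train_bigram_model(blocks, vocab_size=2),
}
prediction = classify(models, [[0, 1, 0, 1]])
```

In this toy setup the two categories have identical word histograms, so an order-free bag-of-words score could not separate them, while the neighbor-conditioned model can. A trigram variant would condition on a second neighbor as well; which neighbors the paper uses is not stated in the abstract.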

Index Terms

Visual language modeling for image classification
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
      2. Computer vision tasks
        Scene understanding
  2. Machine learning

Published In

MIR '07: Proceedings of the international workshop on Workshop on multimedia information retrieval

September 2007

343 pages

ISBN: 9781595937780

DOI: 10.1145/1290082

General Chairs:
James Z. Wang, The Pennsylvania State University, USA
Nozha Boujemaa, INRIA Rocquencourt, France

Program Chairs:
Alberto Del Bimbo, University of Florence, Italy
Jia Li, The Pennsylvania State University, USA

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

MM07: The 15th ACM International Conference on Multimedia 2007

September 24 - 29, 2007

Augsburg, Bavaria, Germany

Bibliometrics

Total Citations: 54
Total Downloads: 1,301
Downloads (Last 12 months): 69
Downloads (Last 6 weeks): 12

Reflects downloads up to 05 Mar 2025
