Visual language modeling for image classification

Published: 24 September 2007

Abstract

Although it has been studied for many years, image classification is still a challenging problem. In this paper, we propose a visual language modeling method for content-based image classification. It transforms each image into a matrix of visual words and assumes that each visual word is conditionally dependent on its neighbors. For each image category, a visual language model is constructed from a set of training images; the model captures both the co-occurrence and the proximity information of visual words. Depending on how many neighbors are taken into consideration, three kinds of language models can be trained: unigram, bigram, and trigram, each corresponding to a different level of model complexity. Given a test image, its category is determined by estimating how likely the image is to have been generated under a specific category. Compared with traditional methods based on bag-of-words models, the proposed method can exploit the spatial correlation of visual words effectively in image classification. In addition, we propose using absent words, those that appear frequently in a category but not in the target image, to help image classification. Experimental results show that our method can achieve comparable accuracy while performing classification much more quickly.
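
The bigram case described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: it assumes each visual word depends only on its left neighbor in the same row, uses add-one smoothing, and all names (`train_bigram`, `log_likelihood`, `classify`) and the toy data are hypothetical. One model is trained per category, and a test image is assigned to the category whose model gives it the highest log-likelihood.

```python
import math
from collections import Counter


def train_bigram(images, vocab_size):
    """Count (left neighbor, word) pairs over one category's training images.

    Each image is a list of rows; each row is a list of visual-word ids.
    Conditioning on the left neighbor only is an assumption of this sketch.
    """
    pair_counts = Counter()
    context_counts = Counter()
    for grid in images:
        for row in grid:
            for left, word in zip(row, row[1:]):
                pair_counts[(left, word)] += 1
                context_counts[left] += 1
    return pair_counts, context_counts, vocab_size


def log_likelihood(grid, model):
    """Log-probability of an image's word matrix under one category model."""
    pair_counts, context_counts, vocab_size = model
    total = 0.0
    for row in grid:
        for left, word in zip(row, row[1:]):
            # Add-one smoothing (an assumption here) keeps unseen pairs
            # from driving the likelihood to zero.
            numerator = pair_counts[(left, word)] + 1
            denominator = context_counts[left] + vocab_size
            total += math.log(numerator / denominator)
    return total


def classify(grid, models):
    """Return the category whose model scores the image highest."""
    return max(models, key=lambda name: log_likelihood(grid, models[name]))


# Toy categories over a two-word vocabulary. Every row contains two 0s and
# two 1s, so word histograms are identical; only the arrangement differs.
stripes = [[[0, 1, 0, 1], [0, 1, 0, 1]], [[1, 0, 1, 0], [1, 0, 1, 0]]]
blocks = [[[0, 0, 1, 1], [0, 0, 1, 1]], [[1, 1, 0, 0], [1, 1, 0, 0]]]
models = {
    "stripes": train_bigram(stripes, vocab_size=2),
    "blocks": train_bigram(blocks, vocab_size=2),
}

print(classify([[0, 1, 0, 1], [1, 0, 1, 0]], models))  # prints "stripes"
```

Because both toy categories share the same word counts, a histogram-only (bag-of-words) representation could not tell them apart; the neighbor-conditioned counts are what separate them in this example.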

Published In

MIR '07: Proceedings of the international workshop on Workshop on multimedia information retrieval
September 2007
343 pages
ISBN: 9781595937780
DOI: 10.1145/1290082
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. absent word criterion
  2. image classification
  3. visual language model

Qualifiers

  • Article

Conference

MM07: The 15th ACM International Conference on Multimedia 2007
September 24 - 29, 2007
Augsburg, Bavaria, Germany

Article Metrics

  • Downloads (Last 12 months): 62
  • Downloads (Last 6 weeks): 6
Reflects downloads up to 16 Dec 2024

Cited By

  • (2018) An improved image classification based on K-means clustering and BoW model. International Journal of Grid and Utility Computing, 9:1 (37-42). DOI: 10.1504/IJGUC.2018.090225. Online publication date: 1-Jan-2018
  • (2018) A New Retrieval System Based on Low Dynamic Range Expansion and SIFT Descriptor. 2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP) (1-6). DOI: 10.1109/MMSP.2018.8547089. Online publication date: Aug-2018
  • (2017) Analysis and Comparison of Developed 2D Medical Image Database Design using Registration Scheme, Retrieval Scheme, and Bag-of-Visual-Words. Medical Imaging (1394-1413). DOI: 10.4018/978-1-5225-0571-6.ch058. Online publication date: 2017
  • (2017) Visual language model for keyword spotting on historical mongolian document images. 2017 29th Chinese Control And Decision Conference (CCDC) (1737-1742). DOI: 10.1109/CCDC.2017.7978797. Online publication date: May-2017
  • (2016) Analysis and Comparison of Developed 2D Medical Image Database Design using Registration Scheme, Retrieval Scheme, and Bag-of-Visual-Words. Classification and Clustering in Biomedical Signal Processing (149-168). DOI: 10.4018/978-1-5225-0140-4.ch007. Online publication date: 2016
  • (2016) Using language models to generate whole-body multi-contact motions. 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (5411-5418). DOI: 10.1109/IROS.2016.7759796. Online publication date: Oct-2016
  • (2016) The effect of region segmentation on object categorization. 2016 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC) (1-4). DOI: 10.1109/ICSPCC.2016.7753644. Online publication date: Aug-2016
  • (2016) A novel image classifier based on Gaussian mixture language model. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (1312-1316). DOI: 10.1109/ICASSP.2016.7471889. Online publication date: Mar-2016
  • (2016) Improving the BoVW via discriminative visual n-grams and MKL strategies. Neurocomputing, 175:PA (768-781). DOI: 10.1016/j.neucom.2015.10.053. Online publication date: 29-Jan-2016
  • (2016) Integrating multiple types of features for event identification in social images. Multimedia Tools and Applications, 75:6 (3301-3322). DOI: 10.1007/s11042-014-2436-x. Online publication date: 1-Mar-2016
