Fast Structured Prediction Using Large Margin Sigmoid Belief Networks

Xu Miao¹ &
Rajesh P. N. Rao¹

448 Accesses
1 Citation
Explore all metrics

Abstract

Images usually contain multiple objects that are semantically related to one another. Mapping from low-level visual features to mutually dependent high-level semantics can be formulated as a structured prediction problem. Current statistical models for structured prediction make simplifying assumptions about the underlying output graph structure, such as assuming a low-order Markov chain, because exact inference becomes intractable as the tree-width of the underlying graph increases. Approximate inference algorithms, on the other hand, force one to trade off representational power with computational efficiency. In this paper, we present large margin sigmoid belief networks (LMSBNs) for structured prediction in images. LMSBNs allow a very fast inference algorithm for arbitrary graph structures that runs in polynomial time with high probability. This probability is data-distribution dependent and is maximized in learning. The new approach overcomes the representation-efficiency trade-off in previous models and allows fast structured prediction with complicated graph structures. We present results from applying a fully connected model to semantic image annotation, image retrieval and optical character recognition (OCR) problems, and demonstrate that the proposed approach can yield significant performance gains over current state-of-the-art methods.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Artificial Intelligence

References

Abdou, S., & Scordilis, M. S. (2004). Beam search pruning in speech recognition using a posterior probability-based confidence measure. Speech Communication, 42(3–4), 409–428.
Article Google Scholar
Bartlett, P. L., & Mendelson, S. (2002). Rademacher and Gaussian complexities: risk bounds and structural results. Journal of Machine Learning Research, 3, 463–482.
MathSciNet Google Scholar
Bengio, Y., Lamblin, P., Popovici, D., & Larochelle, H. (2007). Greedy layer-wise training of deep networks. In Advances in neural information processing systems. Cambridge: MIT Press.
Google Scholar
Bottou, L., & Bousquet, O. (2008). The tradeoffs of large scale learning. In J. Platt, D. Koller, Y. Singer, & S. Roweis (Eds.), Advances in neural information processing systems (Vol. 20, pp. 161–168).
Google Scholar
Boutell, M. R., Luo, J., Shen, X., & Brown, C. M. (2004). Learning multi-label scene classification. Pattern Recognition, 37, 1757–1771.
Article Google Scholar
Carneiro, G., Chan, A., Moreno, P., & Vasconcelos, N. (2007). Supervised learning of semantic classes for image annotation and retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3), 394–410.
Article Google Scholar
Collins, M., Globerson, A., Koo, T., Carreras, X., & Bartlett, P. L. (2008). Exponentiated gradient algorithms for conditional random fields and max-margin Markov networks. Journal of Machine Learning Research, 9, 1775–1822.
MathSciNet MATH Google Scholar
Darwiche, A. (2003). A differential approach to inference in Bayesian networks. Journal of the ACM, 50, 123–132.
Article MathSciNet Google Scholar
Daumé III, H., Langford, J., & Marcu, D. (2009). Search-based structured prediction. Machine Learning Journal, 75(3), 297–325.
Article Google Scholar
Doucet, A., de Freitas, N., Murphy, K. P., & Russell, S. J. (2000). Rao-Blackwellised particle filtering for dynamic Bayesian networks. In Annual conference on uncertainty in artificial intelligence (pp. 176–183).
Google Scholar
Duygulu, P., Barnard, K., de Freitas, J., & Forsyth, D. (2006). Object recognition as machine translation: learning a lexicon for a fixed image vocabulary. In A. Heyden, G. Sparr, M. Nielsen, & P. Johansen (Eds.), Lecture notes in computer science: Vol. 2353. European conference on computer vision (pp. 349–354). Berlin: Springer.
Google Scholar
Fan, R. E., & Lin, C. J. (2007). A study on threshold selection for multi-label classification (Tech. rep.). National Taiwan University.
Feng, S., Manmatha R., & Lavrenko, V. (2004). Multiple Bernoulli relevance models for image and video annotation. In Computer vision and pattern recognition.
Google Scholar
Finley, T., & Joachims, T. (2008). Training structural SVMs when exact inference is intractable. In International conference on machine learning (pp. 304–311). New York: ACM.
Google Scholar
Guillaumin, M., Mensink, T., Verbeek, J., & Schmid, C. (2009). Tagprop: discriminative metric learning in nearest neighbor models for image auto-annotation. In International conference on computer vision (pp. 309–316).
Chapter Google Scholar
Guo, Y., Wilkinson, D., & Schuurmans, D. (2005). Maximum margin Bayesian networks. In Annual conference on uncertainty in artificial intelligence.
Google Scholar
Hinton, G. E., & Sejnowski, T. J. (1983). Optimal perceptual inference. In Computer vision and pattern recognition (pp. 448–453).
Google Scholar
Hoefel, G., & Elkan, C. (2008). Learning a two-stage SVM/CRF sequence classifier. In ACM conference on information and knowledge management (pp. 271–278). New York, NY, USA.
Google Scholar
Hsieh, C. J., Chang, K. W., Lin, C. J., Keerthi, S. S., & Sundararajan, S. (2008). A dual coordinate descent method for large-scale linear SVM. In International conference on machine learning (pp. 408–415). New York: ACM.
Google Scholar
Joachims, T., Finley, T., & Yu, C. N. J. (2009). Cutting-plane training of structural SVMs. Machine Learning, 77(1), 27–59.
Article MATH Google Scholar
Kassel, R. H. (1995). A comparison of approaches to on-line handwritten character recognition. PhD thesis, Cambridge, MA, USA.
Kolmogorov, V., & Zabih, R. (2002). What energy functions can be minimized via graph cuts? In European conference on computer vision (pp. 185–208).
Google Scholar
Kulesza, A., & Pereira, F. (2007). Structured learning with approximate inference. In Advances in neural information processing systems.
Google Scholar
Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: probabilistic models for segmenting and labeling sequence data. In International conference on machine learning.
Google Scholar
Lavrenko, V., Manmatha, R., & Jeon, J. (2003). A model for learning the semantics of pictures. In Advances in neural information processing systems. Cambridge: MIT Press.
Google Scholar
Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In Computer vision and pattern recognition.
Google Scholar
LeCun, Y., Chopra, S., Hadsell, R., Huang, F. J., Bakir, G., Hofman, T., Schölkopf, B., Smola, A., & Taskar, B. (Eds.) (2006). A tutorial on energy-based learning. In Predicting structured data. Cambridge: MIT Press.
Google Scholar
Liu, J., Li, M., Liu, Q., Lu, H., & Ma, S. (2009). Image annotation via graph learning. Pattern Recognition, 42, 218–228.
Article MATH Google Scholar
Lowd, D., & Domingos, P. (2008). Learning arithmetic circuits. In Annual conference on uncertainty in artificial intelligence (pp. 383–392). Corvallis: AUAI Press.
Google Scholar
Makadia, A., Pavlovic, V., & Kumar, S. (2008). A new baseline for image annotation. In European conference on computer vision (pp. 316–329). Berlin: Springer.
Google Scholar
McCallum, A., Freitag, D., & Pereira, F. (2000). Maximum entropy Markov models for information extraction and segmentation. In Proc. 17th international conf. on machine learning (pp. 591–598). San Francisco: Morgan Kaufmann.
Google Scholar
Metzler, D., & Manmatha, R. (2004). An inference network approach to image retrieval. In The international conference on image and video retrieval (pp. 42–50). Berlin: Springer.
Chapter Google Scholar
Miao, X., & Rao, R. P. (2009). Large margin Boltzmann machines. In International joint conference on artificial intelligence (pp. 1156–1162).
Google Scholar
Neal, R. M. (1992). Connectionist learning of belief networks. Artificial Intelligence, 56(1), 71–113.
Article MathSciNet MATH Google Scholar
Perez-Cruz, F., Ghahramani, Z., & Pontil, M. (2007). Conditional graphical models. Cambridge: MIT Press.
Google Scholar
Quattoni, A., Collins, M., & Darrell, T. (2004). Conditional random fields for object recognition. In Advances in neural information processing systems (pp. 1097–1104). Cambridge: MIT Press.
Google Scholar
Rosenberg, D., Klein, D., & Taskar, B. (2007). Mixture-of-parents maximum entropy Markov models. In Annual conference on uncertainty in artificial intelligence (pp. 318–325). Corvallis: AUAI Press.
Google Scholar
Russell, B., Torralba, A., Murphy, K., & Freeman, W. (2008). Labelme: a database and web-based tool for image annotation. International Journal of Computer Vision, 77(1), 157–173.
Article Google Scholar
Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: primal estimated sub-gradient solver for SVM. In International conference on machine learning (pp. 807–814). New York: ACM.
Google Scholar
Shalev-Shwartz, S., Srebro, N., & Sridharan, K. (2008). Fast rates for regularized objectives. In Advances in neural information processing systems.
Google Scholar
Tadepalli, P., & Natarajan, B. K. (1996). A formal framework for speedup learning from problems and solutions. The Journal of Artificial Intelligence Research, 4, 445–475.
MathSciNet MATH Google Scholar
Taskar, B., Guestrin, C., & Koller, D. (2004). Max-margin Markov networks. In Advances in neural information processing systems, Vancouver, Canada.
Google Scholar
Tsochantaridis, I., Hofmann, T., Joachims, T., & Altun, Y. (2004). Support vector machine learning for interdependent and structured output spaces. In International conference on machine learning.
Google Scholar
Tsoumakas, G., Katakis, I., & Vlahavas, I. P. (2010). Mining multi-label data. In Data mining and knowledge discovery handbook, pp. 667–685.
Google Scholar
Wainwright, M. J., Jaakkola, T., & Willsky, A. S. (2003). Tree-reweighted belief propagation algorithms and approximate Ml estimation via pseudo-moment matching. In AISTATS.
Google Scholar
Wainwright, M. J., Jaakkola, T., & Willsky, A. S. (2005a). A new class of upper bounds on the log partition function. IEEE Transactions on Information Theory, 51, 2313–2335.
Article MathSciNet Google Scholar
Wainwright, M. J., Jaakola, T., & Willsky, A. S. (2005b). MAP estimation via agreement on trees: message passing and linear programming. IEEE Transactions on Information Theory, 51, 3697–3717.
Article Google Scholar
Yavlinsky, A., Schofield, E., & Rüger, S. (2005). Automated image annotation using global features and robust nonparametric density estimation. In The international conference on image and video retrieval (pp. 507–517). Berlin: Springer.
Chapter Google Scholar
Yedidia, J. S., Freeman, W. T., & Weiss, Y. (2005). Constructing free energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory, 51, 2282–2312.
Article MathSciNet Google Scholar
Zhang, T. (2001). Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, 32(1), 56–85.
Article Google Scholar

Download references

Author information

Authors and Affiliations

Computer Science and Engineering, University of Washington, Seattle, WA, 98125, USA
Xu Miao & Rajesh P. N. Rao

Authors

Xu Miao
View author publications
You can also search for this author in PubMed Google Scholar
Rajesh P. N. Rao
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Xu Miao or Rajesh P. N. Rao.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Miao, X., Rao, R.P.N. Fast Structured Prediction Using Large Margin Sigmoid Belief Networks. Int J Comput Vis 99, 302–318 (2012). https://doi.org/10.1007/s11263-011-0423-5

Download citation

Received: 23 March 2010
Accepted: 17 January 2011
Published: 05 February 2011
Issue Date: September 2012
DOI: https://doi.org/10.1007/s11263-011-0423-5

Fast Structured Prediction Using Large Margin Sigmoid Belief Networks

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Semantic Texton Forests for Image Categorization and Segmentation

Mixing Low-Level and Semantic Features for Image Interpretation

Learning Hierarchical Bag of Words Using Naive Bayes Clustering

References

Author information

Authors and Affiliations

Corresponding authors

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Fast Structured Prediction Using Large Margin Sigmoid Belief Networks

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Semantic Texton Forests for Image Categorization and Segmentation

Mixing Low-Level and Semantic Features for Image Interpretation

Learning Hierarchical Bag of Words Using Naive Bayes Clustering

Explore related subjects

References

Author information

Authors and Affiliations

Corresponding authors

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation