Abstract
Images usually contain multiple objects that are semantically related to one another. Mapping from low-level visual features to mutually dependent high-level semantics can be formulated as a structured prediction problem. Current statistical models for structured prediction make simplifying assumptions about the underlying output graph structure, such as assuming a low-order Markov chain, because exact inference becomes intractable as the tree-width of the underlying graph increases. Approximate inference algorithms, on the other hand, force one to trade off representational power with computational efficiency. In this paper, we present large margin sigmoid belief networks (LMSBNs) for structured prediction in images. LMSBNs allow a very fast inference algorithm for arbitrary graph structures that runs in polynomial time with high probability. This probability is data-distribution dependent and is maximized in learning. The new approach overcomes the representation-efficiency trade-off in previous models and allows fast structured prediction with complicated graph structures. We present results from applying a fully connected model to semantic image annotation, image retrieval and optical character recognition (OCR) problems, and demonstrate that the proposed approach can yield significant performance gains over current state-of-the-art methods.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Abdou, S., & Scordilis, M. S. (2004). Beam search pruning in speech recognition using a posterior probability-based confidence measure. Speech Communication, 42(3–4), 409–428.
Bartlett, P. L., & Mendelson, S. (2002). Rademacher and Gaussian complexities: risk bounds and structural results. Journal of Machine Learning Research, 3, 463–482.
Bengio, Y., Lamblin, P., Popovici, D., & Larochelle, H. (2007). Greedy layer-wise training of deep networks. In Advances in neural information processing systems. Cambridge: MIT Press.
Bottou, L., & Bousquet, O. (2008). The tradeoffs of large scale learning. In J. Platt, D. Koller, Y. Singer, & S. Roweis (Eds.), Advances in neural information processing systems (Vol. 20, pp. 161–168).
Boutell, M. R., Luo, J., Shen, X., & Brown, C. M. (2004). Learning multi-label scene classification. Pattern Recognition, 37, 1757–1771.
Carneiro, G., Chan, A., Moreno, P., & Vasconcelos, N. (2007). Supervised learning of semantic classes for image annotation and retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3), 394–410.
Collins, M., Globerson, A., Koo, T., Carreras, X., & Bartlett, P. L. (2008). Exponentiated gradient algorithms for conditional random fields and max-margin Markov networks. Journal of Machine Learning Research, 9, 1775–1822.
Darwiche, A. (2003). A differential approach to inference in Bayesian networks. Journal of the ACM, 50, 123–132.
Daumé III, H., Langford, J., & Marcu, D. (2009). Search-based structured prediction. Machine Learning Journal, 75(3), 297–325.
Doucet, A., de Freitas, N., Murphy, K. P., & Russell, S. J. (2000). Rao-Blackwellised particle filtering for dynamic Bayesian networks. In Annual conference on uncertainty in artificial intelligence (pp. 176–183).
Duygulu, P., Barnard, K., de Freitas, J., & Forsyth, D. (2006). Object recognition as machine translation: learning a lexicon for a fixed image vocabulary. In A. Heyden, G. Sparr, M. Nielsen, & P. Johansen (Eds.), Lecture notes in computer science: Vol. 2353. European conference on computer vision (pp. 349–354). Berlin: Springer.
Fan, R. E., & Lin, C. J. (2007). A study on threshold selection for multi-label classification (Tech. rep.). National Taiwan University.
Feng, S., Manmatha R., & Lavrenko, V. (2004). Multiple Bernoulli relevance models for image and video annotation. In Computer vision and pattern recognition.
Finley, T., & Joachims, T. (2008). Training structural SVMs when exact inference is intractable. In International conference on machine learning (pp. 304–311). New York: ACM.
Guillaumin, M., Mensink, T., Verbeek, J., & Schmid, C. (2009). Tagprop: discriminative metric learning in nearest neighbor models for image auto-annotation. In International conference on computer vision (pp. 309–316).
Guo, Y., Wilkinson, D., & Schuurmans, D. (2005). Maximum margin Bayesian networks. In Annual conference on uncertainty in artificial intelligence.
Hinton, G. E., & Sejnowski, T. J. (1983). Optimal perceptual inference. In Computer vision and pattern recognition (pp. 448–453).
Hoefel, G., & Elkan, C. (2008). Learning a two-stage SVM/CRF sequence classifier. In ACM conference on information and knowledge management (pp. 271–278). New York, NY, USA.
Hsieh, C. J., Chang, K. W., Lin, C. J., Keerthi, S. S., & Sundararajan, S. (2008). A dual coordinate descent method for large-scale linear SVM. In International conference on machine learning (pp. 408–415). New York: ACM.
Joachims, T., Finley, T., & Yu, C. N. J. (2009). Cutting-plane training of structural SVMs. Machine Learning, 77(1), 27–59.
Kassel, R. H. (1995). A comparison of approaches to on-line handwritten character recognition. PhD thesis, Cambridge, MA, USA.
Kolmogorov, V., & Zabih, R. (2002). What energy functions can be minimized via graph cuts? In European conference on computer vision (pp. 185–208).
Kulesza, A., & Pereira, F. (2007). Structured learning with approximate inference. In Advances in neural information processing systems.
Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: probabilistic models for segmenting and labeling sequence data. In International conference on machine learning.
Lavrenko, V., Manmatha, R., & Jeon, J. (2003). A model for learning the semantics of pictures. In Advances in neural information processing systems. Cambridge: MIT Press.
Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In Computer vision and pattern recognition.
LeCun, Y., Chopra, S., Hadsell, R., Huang, F. J., Bakir, G., Hofman, T., Schölkopf, B., Smola, A., & Taskar, B. (Eds.) (2006). A tutorial on energy-based learning. In Predicting structured data. Cambridge: MIT Press.
Liu, J., Li, M., Liu, Q., Lu, H., & Ma, S. (2009). Image annotation via graph learning. Pattern Recognition, 42, 218–228.
Lowd, D., & Domingos, P. (2008). Learning arithmetic circuits. In Annual conference on uncertainty in artificial intelligence (pp. 383–392). Corvallis: AUAI Press.
Makadia, A., Pavlovic, V., & Kumar, S. (2008). A new baseline for image annotation. In European conference on computer vision (pp. 316–329). Berlin: Springer.
McCallum, A., Freitag, D., & Pereira, F. (2000). Maximum entropy Markov models for information extraction and segmentation. In Proc. 17th international conf. on machine learning (pp. 591–598). San Francisco: Morgan Kaufmann.
Metzler, D., & Manmatha, R. (2004). An inference network approach to image retrieval. In The international conference on image and video retrieval (pp. 42–50). Berlin: Springer.
Miao, X., & Rao, R. P. (2009). Large margin Boltzmann machines. In International joint conference on artificial intelligence (pp. 1156–1162).
Neal, R. M. (1992). Connectionist learning of belief networks. Artificial Intelligence, 56(1), 71–113.
Perez-Cruz, F., Ghahramani, Z., & Pontil, M. (2007). Conditional graphical models. Cambridge: MIT Press.
Quattoni, A., Collins, M., & Darrell, T. (2004). Conditional random fields for object recognition. In Advances in neural information processing systems (pp. 1097–1104). Cambridge: MIT Press.
Rosenberg, D., Klein, D., & Taskar, B. (2007). Mixture-of-parents maximum entropy Markov models. In Annual conference on uncertainty in artificial intelligence (pp. 318–325). Corvallis: AUAI Press.
Russell, B., Torralba, A., Murphy, K., & Freeman, W. (2008). Labelme: a database and web-based tool for image annotation. International Journal of Computer Vision, 77(1), 157–173.
Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: primal estimated sub-gradient solver for SVM. In International conference on machine learning (pp. 807–814). New York: ACM.
Shalev-Shwartz, S., Srebro, N., & Sridharan, K. (2008). Fast rates for regularized objectives. In Advances in neural information processing systems.
Tadepalli, P., & Natarajan, B. K. (1996). A formal framework for speedup learning from problems and solutions. The Journal of Artificial Intelligence Research, 4, 445–475.
Taskar, B., Guestrin, C., & Koller, D. (2004). Max-margin Markov networks. In Advances in neural information processing systems, Vancouver, Canada.
Tsochantaridis, I., Hofmann, T., Joachims, T., & Altun, Y. (2004). Support vector machine learning for interdependent and structured output spaces. In International conference on machine learning.
Tsoumakas, G., Katakis, I., & Vlahavas, I. P. (2010). Mining multi-label data. In Data mining and knowledge discovery handbook, pp. 667–685.
Wainwright, M. J., Jaakkola, T., & Willsky, A. S. (2003). Tree-reweighted belief propagation algorithms and approximate Ml estimation via pseudo-moment matching. In AISTATS.
Wainwright, M. J., Jaakkola, T., & Willsky, A. S. (2005a). A new class of upper bounds on the log partition function. IEEE Transactions on Information Theory, 51, 2313–2335.
Wainwright, M. J., Jaakola, T., & Willsky, A. S. (2005b). MAP estimation via agreement on trees: message passing and linear programming. IEEE Transactions on Information Theory, 51, 3697–3717.
Yavlinsky, A., Schofield, E., & Rüger, S. (2005). Automated image annotation using global features and robust nonparametric density estimation. In The international conference on image and video retrieval (pp. 507–517). Berlin: Springer.
Yedidia, J. S., Freeman, W. T., & Weiss, Y. (2005). Constructing free energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory, 51, 2282–2312.
Zhang, T. (2001). Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, 32(1), 56–85.
Author information
Authors and Affiliations
Corresponding authors
Rights and permissions
About this article
Cite this article
Miao, X., Rao, R.P.N. Fast Structured Prediction Using Large Margin Sigmoid Belief Networks. Int J Comput Vis 99, 302–318 (2012). https://doi.org/10.1007/s11263-011-0423-5
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11263-011-0423-5