Submodular field grammars: representation, inference, and application to image parsing

Authors: Abram L. Friesen, Pedro Domingos

NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems

Pages 4312 - 4322

Published: 03 December 2018

Abstract

Natural scenes contain many layers of part-subpart structure, and distributions over them are thus naturally represented by stochastic image grammars, with one production per decomposition of a part. Unfortunately, in contrast to language grammars, where the number of possible split points for a production A → BC is linear in the length of A, in an image there are an exponential number of ways to split a region into subregions. This makes parsing intractable and requires image grammars to be severely restricted in practice, for example by allowing only rectangular regions. In this paper, we address this problem by associating with each production a submodular Markov random field whose labels are the subparts and whose labeling segments the current object into these subparts. We call the resulting model a submodular field grammar (SFG). Finding the MAP split of a region into subregions is now tractable, and by exploiting this we develop an efficient approximate algorithm for MAP parsing of images with SFGs. Empirically, we show promising improvements in accuracy when using SFGs for scene understanding, and demonstrate exponential improvements in inference time compared to traditional methods, while returning comparable minima.
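
To make the tractability claim concrete, here is a minimal sketch of the simplest case: a single production A → BC whose field has two labels and nonnegative Potts pairwise terms. It is not the authors' parsing algorithm; it only shows the standard min-cut construction for such binary submodular energies, checked against exhaustive search over all 2^n splits on a tiny region. The names `unary`, `edges`, `map_split_mincut`, and `map_split_bruteforce`, and all cost values, are assumptions made for illustration.

```python
from collections import deque
from itertools import product


def energy(labels, unary, edges):
    """Unary cost of each pixel's label plus a penalty w per edge whose ends differ."""
    total = sum(unary[i][x] for i, x in enumerate(labels))
    total += sum(w for i, j, w in edges if labels[i] != labels[j])
    return total


def map_split_mincut(unary, edges):
    """Exact MAP two-way split via s-t min-cut (Edmonds-Karp max-flow).

    unary[i] = (cost of label 0, cost of label 1), assumed nonnegative integers.
    edges    = (i, j, w) with w >= 0, so the pairwise term is submodular.
    """
    n = len(unary)
    s, t = n, n + 1
    cap = [[0] * (n + 2) for _ in range(n + 2)]
    for i, (cost0, cost1) in enumerate(unary):
        cap[s][i] += cost1  # cut when pixel i ends on the sink side (label 1)
        cap[i][t] += cost0  # cut when pixel i stays on the source side (label 0)
    for i, j, w in edges:
        cap[i][j] += w
        cap[j][i] += w

    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            break  # no path left: parent now marks the source side of a min cut
        bottleneck = float("inf")
        v = t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck

    labels = [0 if parent[i] != -1 else 1 for i in range(n)]
    return labels, flow


def map_split_bruteforce(unary, edges):
    """Reference answer by enumerating all 2^n labelings (only feasible for tiny n)."""
    return min(product((0, 1), repeat=len(unary)),
               key=lambda x: energy(x, unary, edges))


# Hypothetical 2x3 region: pixels 0..5 in row-major order, 4-connected, weight 2.
unary = [(1, 5), (2, 4), (6, 1), (1, 6), (4, 3), (5, 1)]
edges = [(0, 1, 2), (1, 2, 2), (3, 4, 2), (4, 5, 2),
         (0, 3, 2), (1, 4, 2), (2, 5, 2)]
labels, flow = map_split_mincut(unary, edges)
best = map_split_bruteforce(unary, edges)
print(labels, flow, energy(best, unary, edges))
```

The min-cut value equals the energy of the returned split and matches the brute-force minimum, while the max-flow computation runs in time polynomial in the number of pixels rather than enumerating every split. Productions with more than two subparts, and parsing over a full grammar, need more machinery than this sketch covers.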

Cited By

Cambronero J, Dang T, Vasilakis N, Shen J, Wu J, Rinard M, Masuhara H, Petricek T (2019). Active learning for software engineering. Proceedings of the 2019 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software, pages 62-78. Online publication date: 23-Oct-2019.
https://dl.acm.org/doi/10.1145/3359591.3359732

Published In

NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems

December 2018

11021 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States
