DOI: 10.5555/3327144.3327343

Submodular field grammars: representation, inference, and application to image parsing

Published: 03 December 2018

Abstract

Natural scenes contain many layers of part-subpart structure, and distributions over them are thus naturally represented by stochastic image grammars, with one production per decomposition of a part. Unfortunately, in contrast to language grammars, where the number of possible split points for a production A → BC is linear in the length of A, in an image there are an exponential number of ways to split a region into subregions. This makes parsing intractable and requires image grammars to be severely restricted in practice, for example by allowing only rectangular regions. In this paper, we address this problem by associating with each production a submodular Markov random field whose labels are the subparts and whose labeling segments the current object into these subparts. We call the resulting model a submodular field grammar (SFG). Finding the MAP split of a region into subregions is now tractable, and by exploiting this we develop an efficient approximate algorithm for MAP parsing of images with SFGs. Empirically, we show promising improvements in accuracy when using SFGs for scene understanding, and demonstrate exponential improvements in inference time compared to traditional methods, while returning comparable minima.
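The tractability claim for a single production can be illustrated on a toy case. The sketch below is not the paper's parsing algorithm: it assumes a production with exactly two subparts (labels 0 and 1), non-negative unary costs per pixel, and a Potts smoothness penalty on neighboring pixels, which is one simple submodular energy. Under those assumptions the MAP split equals a minimum s-t cut, computed here with a small Edmonds-Karp max-flow. The names `energy`, `map_split`, `unary`, and `edges` are hypothetical and chosen for illustration only.

```python
from collections import deque
from itertools import product


def energy(labels, unary, edges):
    """Energy of a binary labeling: unary costs plus a Potts penalty per cut edge."""
    cost = sum(unary[i][x] for i, x in enumerate(labels))
    cost += sum(w for i, j, w in edges if labels[i] != labels[j])
    return cost


def map_split(unary, edges):
    """Return (min_energy, labels) for a binary Potts energy via an s-t min cut.

    unary[i] = (cost of label 0, cost of label 1), all non-negative (assumed).
    edges = [(i, j, w)] with w >= 0, so the pairwise term is submodular.
    """
    n = len(unary)
    s, t = n, n + 1
    cap = [[0] * (n + 2) for _ in range(n + 2)]  # residual capacities
    for i, (cost0, cost1) in enumerate(unary):
        cap[s][i] += cost1  # cut when pixel i ends on the sink side (label 1)
        cap[i][t] += cost0  # cut when pixel i stays on the source side (label 0)
    for i, j, w in edges:
        cap[i][j] += w
        cap[j][i] += w

    def bfs_parents():
        # Breadth-first search over edges with remaining capacity.
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = bfs_parents()
        if parent[t] == -1:
            break  # no augmenting path left: flow equals the min cut
        bottleneck = float("inf")
        v = t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

    # Pixels still reachable from the source take label 0, the rest label 1.
    labels = [0 if parent[i] != -1 else 1 for i in range(n)]
    return flow, labels


# Four pixels in a row: the left ones prefer subpart 0, the right ones subpart 1.
unary = [(0, 5), (1, 4), (4, 1), (5, 0)]
edges = [(0, 1, 2), (1, 2, 2), (2, 3, 2)]
best, labels = map_split(unary, edges)

# Brute force over all 2^n labelings agrees with the cut on this tiny example.
assert best == min(energy(l, unary, edges) for l in product((0, 1), repeat=4))
print(best, labels)
```

The brute-force check enumerates every one of the 2^n labelings, which is the exponential search the abstract refers to, while the cut-based routine runs in time polynomial in the number of pixels. Productions with more than two subparts need a multi-label method and are outside the scope of this sketch.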

Cited By

  • Active learning for software engineering. In Proceedings of the 2019 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software, pages 62-78, 2019. DOI: 10.1145/3359591.3359732. Online publication date: 23-Oct-2019.

Published In

NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems
December 2018
11021 pages

Publisher

Curran Associates Inc., Red Hook, NY, United States
