A stochastic grammar of images

Published: 29 January 2006

Abstract

This exploratory paper quests for a stochastic and context-sensitive grammar of images. The grammar should achieve the following four objectives and thus serve as a unified framework of representation, learning, and recognition for a large number of object categories. (i) The grammar represents both the hierarchical decompositions from scenes to objects, parts, primitives, and pixels by terminal and nonterminal nodes, and the contexts for spatial and functional relations by horizontal links between the nodes. It formulates each object category as the set of all possible valid configurations produced by the grammar. (ii) The grammar is embodied in a simple And-Or graph representation where each Or-node points to alternative sub-configurations and each And-node is decomposed into a number of components. This representation supports recursive top-down/bottom-up procedures for image parsing under the Bayesian framework and makes it convenient to scale up in complexity. Given an input image, the image parsing task constructs a most probable parse graph on the fly as the output interpretation, and this parse graph is a subgraph of the And-Or graph after making choices at the Or-nodes. (iii) A probabilistic model is defined on this And-Or graph representation to account for the natural occurrence frequency of objects and parts as well as their relations. This model is learned from a relatively small training set per category and then sampled to synthesize a large number of configurations to cover novel object instances in the test set. This generalization capability is mostly missing in discriminative machine learning methods and can substantially improve recognition performance in experiments. (iv) To fill the well-known semantic gap between symbols and raw signals, the grammar includes a series of visual dictionaries and organizes them through graph composition. At the bottom level, the dictionary is a set of image primitives, each having a number of anchor points with open bonds to link with other primitives. These primitives can be combined to form larger and larger graph structures for parts and objects. The ambiguities in inferring local primitives shall be resolved through top-down computation using larger structures. Finally, these primitives form a primal sketch representation which will generate the input image with every pixel explained. The proposed grammar integrates three prominent representations in the literature: stochastic grammars for composition, Markov (or graphical) models for contexts, and sparse coding with primitives (wavelets). It also combines the structure-based and appearance-based methods in the vision literature. Finally, the paper presents three case studies to illustrate the proposed grammar.
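
The And-Or graph idea in objectives (ii) and (iii) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `Node` class, the function names, the toy "clock" grammar, and the branch probabilities are all hypothetical. The sketch also covers only the context-free part of the representation (Or-node choices and And-node decompositions); the horizontal links for spatial and functional relations described in objective (i) are left out.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical And-Or graph node: kind is 'and', 'or', or 'leaf'."""
    name: str
    kind: str
    children: list = field(default_factory=list)
    probs: list = field(default_factory=list)  # branch probabilities (Or-nodes only)


def sample_parse(node, rng):
    """Sample one parse tree top-down.

    Returns (tree, prob), where prob is the product of the branch
    probabilities of the Or-node choices that were made.
    """
    if node.kind == "leaf":
        return node.name, 1.0
    if node.kind == "or":
        # An Or-node selects exactly one alternative.
        idx = rng.choices(range(len(node.children)), weights=node.probs)[0]
        sub, p = sample_parse(node.children[idx], rng)
        return (node.name, [sub]), node.probs[idx] * p
    # An And-node expands all of its components.
    subs, prob = [], 1.0
    for child in node.children:
        sub, p = sample_parse(child, rng)
        subs.append(sub)
        prob *= p
    return (node.name, subs), prob


def terminals(tree):
    """Collect the leaf names of a sampled parse tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, subs = tree
    return [leaf for sub in subs for leaf in terminals(sub)]


def count_configurations(node):
    """Number of distinct configurations the (acyclic) grammar can produce."""
    if node.kind == "leaf":
        return 1
    counts = [count_configurations(c) for c in node.children]
    if node.kind == "or":
        return sum(counts)
    total = 1
    for c in counts:
        total *= c
    return total


# Toy grammar (hypothetical): a clock is a frame AND a set of hands,
# and each of those is an OR over two alternatives.
FRAME = Node("frame", "or",
             [Node("round_frame", "leaf"), Node("square_frame", "leaf")],
             [0.7, 0.3])
HANDS = Node("hands", "or",
             [Node("two_hands", "leaf"), Node("three_hands", "leaf")],
             [0.6, 0.4])
CLOCK = Node("clock", "and", [FRAME, HANDS])

if __name__ == "__main__":
    rng = random.Random(0)
    tree, prob = sample_parse(CLOCK, rng)
    print(terminals(tree), prob)
    print(count_configurations(CLOCK))
```

Each sampled tree corresponds to one choice per Or-node, so it plays the role of a parse graph selected from the And-Or graph. Repeated sampling produces configurations that need not appear in any training set, which is the generalization behaviour the abstract attributes to the generative model.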

Published In

Foundations and Trends® in Computer Graphics and Vision, Volume 2, Issue 4
January 2006
104 pages

Publisher

Now Publishers Inc.

Hanover, MA, United States
