Minimizing Binding Errors Using Learned Conjunctive Features

Published: 01 April 2000

Abstract

We have studied some of the design trade-offs governing visual representations based on spatially invariant conjunctive feature detectors, with an emphasis on the susceptibility of such systems to false-positive recognition errors—Malsburg’s classical binding problem. We begin by deriving an analytical model that makes explicit how recognition performance is affected by the number of objects that must be distinguished, the number of features included in the representation, the complexity of individual objects, and the clutter load, that is, the amount of visual material in the field of view in which multiple objects must be simultaneously recognized, independent of pose, and without explicit segmentation. Using the domain of text to model object recognition in cluttered scenes, we show that with corrections for the nonuniform probability and nonindependence of text features, the analytical model achieves good fits to measured recognition rates in simulations involving a wide range of clutter loads, word sizes, and feature counts. We then introduce a greedy algorithm for feature learning, derived from the analytical model, which grows a representation by choosing those conjunctive features that are most likely to distinguish objects from the cluttered backgrounds in which they are embedded. We show that the representations produced by this algorithm are compact, decorrelated, and heavily weighted toward features of low conjunctive order. Our results provide a more quantitative basis for understanding when spatially invariant conjunctive features can support unambiguous perception in multiobject scenes, and lead to several insights regarding the properties of visual representations optimized for specific recognition tasks.
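The binding errors and the greedy feature growth described in the abstract can be illustrated with a small text-domain sketch. This is a toy under stated assumptions, not the authors' algorithm: the function names, the five-word vocabulary, the restriction to letters and adjacent letter pairs, and the raw false-positive count used as the selection criterion are all hypothetical choices made for illustration. The paper derives its criterion from an analytical model that the abstract does not spell out.

```python
def word_features(word):
    """Position-invariant features of a word: its letters (order 1)
    and its adjacent letter pairs (order 2)."""
    letters = set(word)
    pairs = {word[i:i + 2] for i in range(len(word) - 1)}
    return letters | pairs


def scene_features(scene):
    """Pool the features of all words in a scene, losing track of
    which word contributed which feature (no segmentation)."""
    pooled = set()
    for word in scene:
        pooled |= word_features(word)
    return pooled


def false_positives(vocab, scene, selected):
    """Words absent from the scene whose selected features are all
    present in the pooled scene features (binding errors)."""
    pooled = scene_features(scene)
    errors = set()
    for word in vocab:
        if word in scene:
            continue
        needed = word_features(word) & selected
        if needed <= pooled:  # nothing in the representation rules it out
            errors.add(word)
    return errors


def total_errors(vocab, scenes, selected):
    """Count binding errors summed over all scenes."""
    return sum(len(false_positives(vocab, s, selected)) for s in scenes)


def greedy_select(vocab, scenes, budget):
    """Grow a feature set one feature at a time, each time adding the
    feature that leaves the fewest binding errors (assumed criterion)."""
    pool = set()
    for word in vocab:
        pool |= word_features(word)
    selected = set()
    current = total_errors(vocab, scenes, selected)
    while len(selected) < budget and current > 0:
        remaining = sorted(pool - selected)  # sorted for deterministic ties
        if not remaining:
            break
        best = min(remaining,
                   key=lambda f: total_errors(vocab, scenes, selected | {f}))
        best_errors = total_errors(vocab, scenes, selected | {best})
        if best_errors >= current:
            break  # no single feature helps any further
        selected.add(best)
        current = best_errors
    return selected, current


# Hypothetical toy data: anagrams force the use of pair features.
VOCAB = ["cat", "act", "tac", "dog", "god"]
SCENES = [["cat"], ["dog"], ["act", "god"]]
```

Calling `greedy_select(VOCAB, SCENES, budget=15)` on this toy data returns a feature set and the number of binding errors that remain. Because single letters cannot separate anagrams such as "cat" and "act", any error-free set found here has to include at least one letter pair, which is the simplest case of a conjunctive feature resolving a binding error.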



Published In

Neural Computation, Volume 12, Issue 4
April 2000
257 pages

Publisher

MIT Press
Cambridge, MA, United States


Cited By

• (2015) Using surfaces and surface relations in an Early Cognitive Vision system. Machine Vision and Applications, 26:7-8, 933-954. doi:10.1007/s00138-015-0705-y. Online publication date: 1-Nov-2015
• (2005) Self-sustained thought processes in a dense associative network. Proceedings of the 28th annual German conference on Advances in Artificial Intelligence, 366-379. doi:10.1007/11551263_29. Online publication date: 11-Sep-2005
• (2003) Learning optimized features for hierarchical models of invariant object recognition. Neural Computation, 15:7, 1559-1588. doi:10.1162/089976603321891800. Online publication date: 1-Jul-2003
• (2002) Unsupervised Learning of Combination Features for Hierarchical Recognition Models. Proceedings of the International Conference on Artificial Neural Networks, 1225-1230. doi:10.5555/646259.684303. Online publication date: 28-Aug-2002
• (2001) A model of the phonological loop. Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, 83-90. doi:10.5555/2980539.2980551. Online publication date: 3-Jan-2001
• (2001) Generalizable relational binding from coarse-coded distributed representations. Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, 75-82. doi:10.5555/2980539.2980550. Online publication date: 3-Jan-2001
