Minimizing Binding Errors Using Learned Conjunctive Features

Authors:

Bartlett W. Mel,

Jósef W. Fiser

Neural Computation, Volume 12, Issue 4

Pages 731 - 762

https://doi.org/10.1162/089976600300015574

Published: 01 April 2000

Abstract

We have studied some of the design trade-offs governing visual representations based on spatially invariant conjunctive feature detectors, with an emphasis on the susceptibility of such systems to false-positive recognition errors—Malsburg’s classical binding problem. We begin by deriving an analytical model that makes explicit how recognition performance is affected by the number of objects that must be distinguished, the number of features included in the representation, the complexity of individual objects, and the clutter load, that is, the amount of visual material in the field of view in which multiple objects must be simultaneously recognized, independent of pose, and without explicit segmentation. Using the domain of text to model object recognition in cluttered scenes, we show that with corrections for the nonuniform probability and nonindependence of text features, the analytical model achieves good fits to measured recognition rates in simulations involving a wide range of clutter loads, word sizes, and feature counts. We then introduce a greedy algorithm for feature learning, derived from the analytical model, which grows a representation by choosing those conjunctive features that are most likely to distinguish objects from the cluttered backgrounds in which they are embedded. We show that the representations produced by this algorithm are compact, decorrelated, and heavily weighted toward features of low conjunctive order. Our results provide a more quantitative basis for understanding when spatially invariant conjunctive features can support unambiguous perception in multiobject scenes, and lead to several insights regarding the properties of visual representations optimized for specific recognition tasks.
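The setup described in the abstract can be illustrated with a toy sketch. It is not the authors' algorithm or their feature set: the contiguous letter n-gram features, the small vocabulary, the random "scenes," and the names ngrams, false_positives, and greedy_select are all assumptions made for illustration. A word counts as detected when every one of its selected features occurs somewhere in the pooled scene, so a binding error is a detection of a word that is not actually present. The greedy loop adds, at each step, the candidate feature that leaves the fewest such errors.

```python
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def ngrams(word, max_order=3):
    """Contiguous letter n-grams up to max_order, with position discarded
    (an assumed stand-in for spatially invariant conjunctive features)."""
    return frozenset(word[i:i + n]
                     for n in range(1, max_order + 1)
                     for i in range(len(word) - n + 1))


def false_positives(features, vocab, scenes):
    """Count (word, scene) pairs where the word is absent from the scene,
    yet all of its selected features occur in the pooled scene features."""
    errors = 0
    for scene in scenes:
        pooled = set().union(*(ngrams(w) for w in scene)) & features
        for word in vocab:
            if word in scene:
                continue
            needed = ngrams(word) & features
            if needed <= pooled:
                errors += 1
    return errors


def greedy_select(vocab, scenes, budget):
    """Grow a feature set one feature at a time, always taking the candidate
    that yields the fewest false positives on the sample scenes."""
    candidates = set().union(*(ngrams(w) for w in vocab))
    chosen = set()
    history = [false_positives(chosen, vocab, scenes)]
    for _ in range(budget):
        remaining = sorted(candidates - chosen)  # sorted for deterministic ties
        if not remaining:
            break
        best = min(remaining,
                   key=lambda f: false_positives(chosen | {f}, vocab, scenes))
        chosen.add(best)
        history.append(false_positives(chosen, vocab, scenes))
    return chosen, history


# Hypothetical toy data: 12 short words, 40 scenes of 3 words each.
vocab = ["cat", "act", "tack", "cart", "trace", "react",
         "crate", "arc", "rat", "tar", "car", "art"]
rng = random.Random(0)
scenes = [frozenset(rng.sample(vocab, 3)) for _ in range(40)]

chosen, history = greedy_select(vocab, scenes, budget=5)
print("selected features:", sorted(chosen))
print("false positives after each step:", history)
```

With single letters only, "cat" is falsely detected in a scene containing just "act," because the letters are all present but unbound; adding the pair "ca" removes that error. The error count in history can only fall or stay level as features are added, since each new feature adds a requirement for detection and never removes one.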

References

[1]

Barron, R., Mucciardi, A., Cook, F., Craig, J., & Barron, A. (1984). Adaptive learning networks: Development and applications in the United States of algorithms related to GMDH. In S. Farrow (Ed.), Self-organizing methods in modeling. New York: Marcel Dekker.

[2]

Biederman, I. (1995). Visual object recognition. In S. Kosslyn & D. Osherson (Eds.), An invitation to cognitive science (2nd ed.) (pp. 121-165). Cambridge, MA: MIT Press.

[3]

Biederman, I., & Gerhardstein, P. (1995). Viewpoint-dependent mechanisms in visual object recognition: Reply to Tarr and Bülthoff (1995). J. Exp. Psychol. (Human Perception and Performance), 21, 1506-1514.

[4]

Califano, A., & Mohan, R. (1994). Multidimensional indexing for recognizing visual shapes. IEEE Trans. on PAMI, 16, 373-392.

[5]

Charniak, E. (1993). Statistical language learning. Cambridge, MA: MIT Press.

[6]

Douglas, R., & Martin, K. (1998). Neocortex. In G. Shepherd (Ed.), The synaptic organization of the brain (pp. 459-509). Oxford: Oxford University Press.

[7]

Edelman, S., & Duvdevani-Bar, S. (1997). A model of visual recognition and categorization. Phil. Trans. R. Soc. Lond. B, 352, 1191-1202.

[8]

Fahlman, S., & Lebiere, C. (1990). The cascade-correlation learning architecture. In D. Touretzky (Ed.), Advances in neural information processing systems, 2 (pp. 524-532). San Mateo, CA: Morgan Kaufmann.

[9]

Fize, D., Boulanouar, K., Ranjeva, J., Fabre-Thorpe, M., & Thorpe, S. (1998). Brain activity during rapid scene categorization: A study using event-related fMRI. J. Cog. Neurosci., suppl S, 72-72.

[10]

Fukushima, K., Miyake, S., & Ito, T. (1983). Neocognitron: A neural network model for a mechanism of visual pattern recognition. IEEE Trans. Sys. Man & Cybernetics, SMC-13, 826-834.

[11]

Gilbert, C. (1983). Microcircuitry of the visual cortex. Ann. Rev. Neurosci, 89, 8366-8370.

[12]

Heller, J., Hertz, J., Kjær, T., & Richmond, B. (1995). Information flow and temporal coding in primate pattern vision. J. Comput. Neurosci., 2, 175-193.

[13]

Hubel, D., & Wiesel, T. (1968). Receptive fields and functional architecture of monkey striate cortex. J. Physiol., 195, 215-243.

[14]

Hummel, J., & Biederman, I. (1992). Dynamic binding in a neural network for shape recognition. Psych. Rev., 99, 480-517.

[15]

Johnston, A., Hill, H., & Carman, N. (1992). Recognizing faces: Effects of lighting direction, inversion, and brightness reversal. Perception, 21, 365-375.

[16]

Jones, E. (1981). Anatomy of cerebral cortex: Columnar input-output relations. In F. Schmitt, F. Worden, G. Adelman, & S. Dennis (Eds.), The organization of cerebral cortex. Cambridge, MA: MIT Press.

[17]

Kučera, H., & Francis, W. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.

[18]

Lades, M., Vorbrüggen, J., Buhmann, J., Lange, J., Malsburg, C., Würtz, R., & Konen, W. (1993). Distortion invariant object recognition in the dynamic link architecture. IEEE Trans. Computers, 42, 300-311.

[19]

Lang, G. K., & Seitz, P. (1997). Robust classification of arbitrary object classes based on hierarchical spatial feature-matching. Machine Vision and Applications, 10, 123-135.

[20]

Le Cun, Y., Matan, O., Boser, B., Denker, J., Henderson, D., Howard, R., Hubbard, W., Jackel, L., & Baird, H. (1990). Handwritten zip code recognition with multilayer networks. In Proc. of the 10th Int. Conf. on Patt. Rec. Los Alamitos, CA: IEEE Computer Science Press.

[21]

Logothetis, N., & Pauls, J. (1995). Psychophysical and physiological evidence for viewer-centered object representations in the primate. Cerebral Cortex, 3, 270-288.

[22]

Logothetis, N., & Sheinberg, D. (1996). Visual object recognition. Ann. Rev. Neurosci., 19, 577-621.

[23]

Malsburg, C. (1994). The correlation theory of brain function (reprint from 1981). In E. Domany, J. van Hemmen, & K. Schulten (Eds.), Models of neural networks II (pp. 95-119). Berlin: Springer.

[24]

McClelland, J., & Rumelhart, D. (1981). An interactive activation model of context effects in letter perception. Psych. Rev., 88, 375-407.

[25]

Mel, B. W. (1997). SEEMORE: Combining color, shape, and texture histogramming in a neurally-inspired approach to visual object recognition. Neural Computation, 9, 777-804.

[26]

Mel, B. W., Ruderman, D. L., & Archie, K. A. (1998). Translation-invariant orientation tuning in visual "complex" cells could derive from intradendritic computations. J. Neurosci., 17, 4325-4334.

[27]

Mozer, M. (1991). The perception of multiple objects. Cambridge, MA: MIT Press.

[28]

Oram, M., & Perrett, D. (1992). Time course of neural responses discriminating different views of the face and head. J. Neurophysiol., 68(1), 70-84.

[29]

Oram, M. W., & Perrett, D. I. (1994). Modeling visual recognition from neurobiological constraints. Neural Networks, 7(6/7), 945-972.

[30]

Pitts, W., & McCulloch, W. (1947). How we know universals: The perception of auditory and visual forms. Bull. Math. Biophys., 9, 127-147.

[31]

Potter, M. (1976). Short-term conceptual memory for pictures. J. Exp. Psychol.: Human Learning and Memory, 2, 509-522.

[32]

Sandon, P., & Uhr, L. (1988). An adaptive model for viewpoint-invariant object recognition. In Proc. of the 10th Ann. Conf. of the Cog. Sci. Soc. (pp. 209-215). Hillsdale, NJ: Erlbaum.

[33]

Schiele, B., & Crowley, J. (1996). Probabilistic object recognition using multidimensional receptive field histograms. In Proc. of the 13th Int. Conf. on Patt. Rec. (Vol. 2 pp. 50-54). Los Alamitos, CA: IEEE Computer Society Press.

[34]

Swain, M., & Ballard, D. (1991). Color indexing. Int. J. Computer Vision, 7, 11-32.

[35]

Szentagothai, J. (1977). The neuron network of the cerebral cortex: A functional interpretation. Proc. R. Soc. Lond. B, 201, 219-248.

[36]

Tanaka, K. (1996). Inferotemporal cortex and object vision. Ann. Rev. Neurosci, 19, 109-139.

[37]

Thorpe, S., Fize, D., & Marlot, C. (1996). Speed of processing in the human visual system. Nature, 381, 520-522.

[38]

Van Essen, D. (1985). Functional organization of primate visual cortex. In A. Peters & E. Jones (Eds.), Cerebral cortex (pp. 259-329). New York: Plenum Publishing.

[39]

Wallis, G., & Rolls, E. T. (1997). Invariant face and object recognition in the visual system. Prog. Neurobiol., 51, 167-194.

[40]

Weng, J., Ahuja, N., & Huang, T. S. (1997). Learning recognition and segmentation using the cresceptron. Int. J. Comp. Vis., 25(2), 109-143.

[41]

Wickelgren, W. (1969). Context-sensitive coding, associative memory, and serial order in (speech) behavior. Psych. Rev., 76, 1-15.

[42]

Yin, R. (1969). Looking at upside down faces. J. Exp. Psychol., 81, 141-145.

[43]

Zemel, R., Mozer, M., & Hinton, G. (1990). TRAFFIC: Recognizing objects using hierarchical reference frame transformations. In D. Touretzky (Ed.), Advances in neural information processing systems, 2 (pp. 266-273). San Mateo, CA: Morgan Kaufmann.

Index Terms: Computing methodologies

Published In

Neural Computation Volume 12, Issue 4

April 2000

257 pages

ISSN: 0899-7667

Publisher

MIT Press

Cambridge, MA, United States

Qualifiers

Article

Citations

Cited By

Kraft, D., Mustafa, W., Popović, M., Jessen, J., Buch, A., Savarimuthu, T., Pugeault, N., & Krüger, N. (2015). Using surfaces and surface relations in an Early Cognitive Vision system. Machine Vision and Applications, 26(7-8), 933-954. Published online 1 Nov 2015. https://dl.acm.org/doi/10.1007/s00138-015-0705-y

Gros, C. (2005). Self-sustained thought processes in a dense associative network. In Proceedings of the 28th annual German conference on Advances in Artificial Intelligence (pp. 366-379). Published online 11 Sep 2005. https://dl.acm.org/doi/10.1007/11551263_29

Wersing, H., & Körner, E. (2003). Learning optimized features for hierarchical models of invariant object recognition. Neural Computation, 15(7), 1559-1588. Published online 1 Jul 2003. https://dl.acm.org/doi/10.1162/089976603321891800

Wersing, H., & Körner, E. (2002). Unsupervised Learning of Combination Features for Hierarchical Recognition Models. In Proceedings of the International Conference on Artificial Neural Networks (pp. 1225-1230). Published online 28 Aug 2002. https://dl.acm.org/doi/10.5555/646259.684303

O'Reilly, R., & Soto, R. (2001). A model of the phonological loop. In Proceedings of the 15th International Conference on Neural Information Processing Systems: Natural and Synthetic (pp. 83-90). Published online 3 Jan 2001. https://dl.acm.org/doi/10.5555/2980539.2980551

O'Reilly, R., & Busby, R. (2001). Generalizable relational binding from coarse-coded distributed representations. In Proceedings of the 15th International Conference on Neural Information Processing Systems: Natural and Synthetic (pp. 75-82). Published online 3 Jan 2001. https://dl.acm.org/doi/10.5555/2980539.2980550
