
Learning optimized features for hierarchical models of invariant object recognition

Published: 01 July 2003

Abstract

There is an ongoing debate over the capabilities of hierarchical neural feedforward architectures for performing real-world invariant object recognition. Although a variety of hierarchical models exists, appropriate supervised and unsupervised learning methods are still an issue of intense research. We propose a feedforward model for recognition that shares components such as weight sharing, pooling stages, and competitive nonlinearities with earlier approaches but focuses on new methods for learning optimal feature-detecting cells in intermediate stages of the hierarchical network. We show that principles of sparse coding, which have previously been applied mostly to the initial feature detection stages, can also be employed to obtain optimized intermediate complex features. We suggest a new approach to optimize the learning of sparse features under the constraints of a weight-sharing or convolutional architecture that uses pooling operations to achieve gradual invariance in the feature hierarchy. The approach explicitly enforces symmetry constraints such as translation invariance on the feature set. This leads to a dimension reduction in the search space of optimal features and allows the basis representatives that achieve a sparse decomposition of the input to be determined more efficiently. We analyze the quality of the learned feature representation by investigating the recognition performance of the resulting hierarchical network on object and face databases. We show that a hierarchy with features learned on a single object data set can also be applied to face recognition without parameter changes and is competitive with other recent machine learning recognition approaches. To investigate the effect of the interplay between sparse coding and processing nonlinearities, we also consider alternative feedforward pooling nonlinearities such as presynaptic maximum selection and sum-of-squares integration. The comparison shows that a combination of strong competitive nonlinearities with sparse coding offers the best recognition performance in the difficult scenario of segmentation-free recognition in cluttered surround. We demonstrate that, for both learning and recognition, a precise segmentation of the objects is not necessary.
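The main ingredients named in the abstract — a sparse-coding objective, a translation-shared (convolutional) basis, and pooling nonlinearities such as maximum selection and sum-of-squares integration — can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, the L1 sparsity penalty, and the toy dimensions are assumptions made only for the example.

```python
import numpy as np

# Minimal illustrative sketch (not the paper's code): sparse coding with a
# translation-shared ("weight-sharing") basis, followed by two pooling
# nonlinearities of the kind mentioned in the abstract.

rng = np.random.default_rng(0)

def shifted_basis(w, n):
    """Build an n x n basis whose rows are circular shifts of one filter w,
    so every position shares the same weights (translation symmetry)."""
    return np.stack([np.roll(w, k) for k in range(n)])

def sparse_coding_cost(x, B, a, lam=0.1):
    """Reconstruction error plus an L1 penalty on the coefficients a
    (one common form of a sparse-coding objective; assumed here)."""
    residual = x - B.T @ a
    return 0.5 * np.sum(residual ** 2) + lam * np.sum(np.abs(a))

def max_pool(responses, pool=4):
    """Maximum selection: keep the strongest response per pooling window,
    which yields gradual translation invariance."""
    return responses.reshape(-1, pool).max(axis=1)

def sum_of_squares_pool(responses, pool=4):
    """Energy-style pooling: sum of squared responses per window."""
    return np.sum(responses.reshape(-1, pool) ** 2, axis=1)

# Toy 1-D example: a single shared filter replicated at every position.
n = 16
w = rng.normal(size=n)
w /= np.linalg.norm(w)
B = shifted_basis(w, n)      # weight sharing: only w itself would be learned

x = rng.normal(size=n)       # toy "input patch"
a = B @ x                    # simple linear responses stand in for properly
                             # optimized sparse coefficients

print("cost:", sparse_coding_cost(x, B, a))
print("max-pooled:", max_pool(a))
print("sum-of-squares pooled:", sum_of_squares_pool(a))
```

Because the basis rows are shifts of a single filter, only that one filter has to be optimized, which is the dimension reduction in the search space that the abstract refers to.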





Published In

Neural Computation, Volume 15, Issue 7
July 2003
271 pages

Publisher

MIT Press

Cambridge, MA, United States

Publication History

Published: 01 July 2003

Qualifiers

  • Article


Cited By

  • (2022) Brain-inspired models for visual object recognition: an overview. Artificial Intelligence Review, 55(7), 5263-5311. DOI: 10.1007/s10462-021-10130-z. Online publication date: 1-Oct-2022.
  • (2021) Biologically inspired visual computing: the state of the art. Frontiers of Computer Science: Selected Publications from Chinese Universities, 15(1). DOI: 10.1007/s11704-020-9001-8. Online publication date: 1-Feb-2021.
  • (2017) Cortex-inspired multilayer hierarchy based object detection system using PHOG descriptors and ensemble classification. The Visual Computer: International Journal of Computer Graphics, 33(1), 99-112. DOI: 10.1007/s00371-015-1155-2. Online publication date: 1-Jan-2017.
  • (2016) Dynamic attention priors. Neurocomputing, 197(C), 14-28. DOI: 10.1016/j.neucom.2016.01.036. Online publication date: 12-Jul-2016.
  • (2015) Topological sparse learning of dynamic form patterns. Neural Computation, 27(1), 42-73. DOI: 10.1162/NECO_a_00670. Online publication date: 1-Jan-2015.
  • (2014) Hierarchical kernel-based rotation and scale invariant similarity. Pattern Recognition, 47(4), 1674-1688. DOI: 10.1016/j.patcog.2013.10.008. Online publication date: 1-Apr-2014.
  • (2012) A life-long learning vector quantization approach for interactive learning of multiple categories. Neural Networks, 28(C), 90-105. DOI: 10.5555/2770423.2770460. Online publication date: 1-Apr-2012.
  • (2012) A spiking neural network based cortex-like mechanism and application to facial expression recognition. Computational Intelligence and Neuroscience, 2012, 19-19. DOI: 10.1155/2012/946589. Online publication date: 1-Jan-2012.
  • (2012) On cortex mechanism hierarchy model for facial expression recognition. Proceedings of the 9th International Conference on Advances in Neural Networks - Volume Part II, 141-148. DOI: 10.1007/978-3-642-31362-2_16. Online publication date: 11-Jul-2012.
  • (2011) Invariant object recognition and pose estimation with slow feature analysis. Neural Computation, 23(9), 2289-2323. DOI: 10.5555/2696389.2696393. Online publication date: 1-Sep-2011.
