Abstract
The human visual system is faced with the computationally difficult problem of achieving object constancy: identifying three-dimensional (3D) objects via two-dimensional (2D) retinal images that may be altered when the same object is seen from different viewpoints [1]. A widely accepted class of theories holds that we first reconstruct a description of the object's 3D structure from the retinal image, then match this representation to a remembered structural description. If the same structural description is reconstructed from every possible view of an object, object constancy will be obtained. For example, in Biederman's [2] oft-cited recognition-by-components (RBC) theory, structural descriptions are composed of sets of simple 3D volumes called geons (Fig. 1), along with the spatial relations in which the geons are placed. Thus a mug is represented in RBC as a noodle attached to the side of a cylinder, and a suitcase as a noodle attached to the top of a brick. The attraction of geons is that, unlike more complex objects, they possess a small set of defining properties that appear in their 2D projections when viewed from almost any position (e.g., all three views of the brick in Fig. 1 include a straight main axis, parallel edges, and a straight cross section). According to the RBC theory, a complex object can therefore be recognized from its constituent geons, which can themselves be recognized from any viewpoint.
References
1. Marr, D. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information (Freeman, San Francisco, 1982).
2. Biederman, I. Psychol. Rev. 94, 115–147 (1987).
3. Biederman, I. & Gerhardstein, P. C. J. Exp. Psychol. Hum. Percept. Perform. 19, 1162–1182 (1993).
4. Hayward, W. G. & Tarr, M. J. J. Exp. Psychol. Hum. Percept. Perform. 23, 1511–1521 (1997).
5. Jolicoeur, P. Mem. Cognit. 13, 289–303 (1985).
6. Bülthoff, H. H. & Edelman, S. Proc. Natl. Acad. Sci. USA 89, 60–64 (1992).
7. Humphrey, G. K. & Khan, S. C. Can. J. Psychol. 46, 170–190 (1992).
8. Tarr, M. J. Psychonomic Bull. Rev. 2, 55–82 (1995).
9. Tarr, M. J., Bülthoff, H. H., Zabinski, M. & Blanz, V. Psychol. Sci. 8, 282–289 (1997).
10. Perrett, D. I. et al. Proc. R. Soc. Lond. B 223, 293–317 (1985).
11. Logothetis, N. K., Pauls, J. & Poggio, T. Curr. Biol. 5, 552–563 (1995).
12. Poggio, T. & Edelman, S. Nature 343, 263–266 (1990).
13. Loftus, G. R. & Masson, M. E. J. Psychonomic Bull. Rev. 1, 476–490 (1994).
Acknowledgements
This research was supported by an Air Force Office of Scientific Research Grant. We thank Jay Servidea and Jaymz Rosoff for their assistance in running the psychophysical studies.
Supplementary information
A total of 220 Yale University undergraduates participated in exchange for course credit or cash payment; numbers of participants in each individual experiment are given in Fig. 2. Line drawings of three viewpoints of 10 single geons were scanned into a Macintosh computer from Fig. 12 of Biederman and Gerhardstein [3] for use in experiments 1a and 2a. Shaded images of the same 10 geons (Fig. 1; available at ftp://www.cog.brown.edu/pub/tarrlab/stim/geons8.sit.hqx), matching the views used by Biederman and Gerhardstein as closely as possible, were created and rendered using CAD software for use in all other experiments. All stimuli subtended approximately 7° by 7° of visual angle when viewed by participants approximately 60 cm from the computer screen. All experiments were performed on Macintosh computers using RSVP software (http://psych.umb.edu/rsvp).
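The relation between on-screen stimulus size, viewing distance, and visual angle can be sketched as below. This is a generic geometric conversion, not code from the study; the function names are hypothetical, and only the 60 cm viewing distance is taken from the text.

```python
import math

VIEWING_DISTANCE_CM = 60.0  # approximate viewing distance stated in the text


def stimulus_size_cm(visual_angle_deg: float,
                     distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """Physical extent (cm) that subtends `visual_angle_deg` at `distance_cm`."""
    half_angle_rad = math.radians(visual_angle_deg) / 2.0
    return 2.0 * distance_cm * math.tan(half_angle_rad)


def visual_angle_deg(size_cm: float,
                     distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """Visual angle (degrees) subtended by an extent of `size_cm` at `distance_cm`."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))
```

The two functions are inverses of each other, so either the required image size or the resulting visual angle can be checked for a given display setup.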
Each sequential matching trial (experiments 1a-e) consisted of the following sequence of events: blank screen for 1000 ms, fixation cross for 500 ms, object image for 200 ms, mask (consisting of random combinations of features from the line drawings or shaded images) for 750 ms, second object image for 100 ms, mask for 500 ms. The trial timed out 1500 ms later if no response was given. In experiments 1a, b and d, a two-key procedure was used, in which the participant pressed the 'V' key on the computer keyboard if the two images were of the same geon (even if shown in a different viewpoint), or the 'M' key if the two images were of different geons. In experiments 1c and e, a go/no-go procedure was used, in which the participant pressed the space bar if the two objects were the same or allowed the trial to time out otherwise. Participants in experiments 1d and e were informed after each trial of their response time and the accuracy of their response (as shown in Fig. 2, this feedback lowered overall response times, but had little impact on viewpoint effects). For 'same' trials, each of the three views of each geon was presented three times as the first object in a trial and three times as the second object, producing three trials in which the two views were identical, four trials in which they differed by 45°, and two trials in which they differed by 90°. In these and subsequent experiments, trials were presented in a different random order for each participant.
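The 'same'-trial counts and the trial timeline described above can be checked with a short sketch. It assumes that each geon's 'same' trials are the nine ordered pairings of its three views (each view three times first, three times second); the variable and function names are illustrative only.

```python
from collections import Counter
from itertools import product

VIEWS_DEG = (0, 45, 90)  # the three viewpoints of each geon

# Event durations (ms) for one sequential matching trial, as listed in the text.
TRIAL_EVENTS_MS = [
    ("blank screen", 1000),
    ("fixation cross", 500),
    ("first object", 200),
    ("mask", 750),
    ("second object", 100),
    ("mask", 500),
]
RESPONSE_TIMEOUT_MS = 1500  # trial ends this long after the final mask


def same_trial_differences(views=VIEWS_DEG):
    """Count ordered (first view, second view) pairs by angular difference."""
    return Counter(abs(first - second) for first, second in product(views, repeat=2))


counts = same_trial_differences()
presentation_ms = sum(duration for _, duration in TRIAL_EVENTS_MS)
max_trial_ms = presentation_ms + RESPONSE_TIMEOUT_MS

if __name__ == "__main__":
    for difference in sorted(counts):
        print(f"{difference:>2} deg difference: {counts[difference]} trials per geon")
    print(f"presentation sequence: {presentation_ms} ms, "
          f"maximum trial length: {max_trial_ms} ms")
```

Enumerating the ordered pairs reproduces the three identical-view, four 45° and two 90° trials per geon.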
Match-to-sample experiments (2a-c) each consisted of 10 blocks of trials. Each block included an initial presentation of a target object for 20 s, followed by 18 test trials in the following sequence: blank screen for 250 ms, fixation cross for 500 ms, object for 150 ms, mask for 500 ms, time out 1500 ms later if no response was given. Participants memorized the initial target, then pressed the space bar if the test object on subsequent trials matched the target object (even if the viewpoint varied), or let the trial time out if the test and target objects did not match. Participants in experiment 2c received feedback of the same sort as those in experiments 1d and e. The target object was always shown in the 0° viewpoint. Every test block included three 'same' trials in each of the three viewpoints (0°, 45°, and 90°) of the target object and nine 'different' trials (one for each of the non-target objects).
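One way to assemble a single match-to-sample block from this description is sketched below. The geon labels are placeholders, and the viewpoint used on 'different' trials is not specified in the text, so drawing it at random is an assumption made purely for illustration.

```python
import random

GEONS = [f"geon_{i}" for i in range(1, 11)]  # placeholder labels for the 10 geons
VIEWS_DEG = (0, 45, 90)
SAME_REPEATS_PER_VIEW = 3


def build_block(target, geons=GEONS, rng=None):
    """Return a shuffled list of (object, view_deg, is_same) test trials."""
    if rng is None:
        rng = random.Random()
    # Three 'same' trials at each of the three viewpoints of the target.
    same = [(target, view, True)
            for view in VIEWS_DEG
            for _ in range(SAME_REPEATS_PER_VIEW)]
    # One 'different' trial per non-target geon.
    # ASSUMPTION: the view shown on these trials is chosen at random.
    different = [(geon, rng.choice(VIEWS_DEG), False)
                 for geon in geons if geon != target]
    trials = same + different
    rng.shuffle(trials)  # a fresh random order for each block
    return trials


if __name__ == "__main__":
    block = build_block("geon_1", rng=random.Random(0))
    print(len(block), "test trials in the block")
```

Nine 'same' and nine 'different' trials give the 18 test trials per block stated above; passing a seeded `random.Random` makes a block reproducible.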
In experiment 3, participants learned labels (given in Fig. 1) for the 10 geons, then performed trials in which they verbally named test images as quickly as possible. Participants first studied a sheet of paper showing the 10 objects with their names, then performed 20 trials in which each object was shown twice, along with its name, on the computer screen. Four practice blocks of five trials with each object followed, in which participants saw objects without their names and spoke the names. Objects were always shown in the 0° viewpoint during practice trials. Participants then performed two test blocks of six trials with each object, distributed equally between the 0°, 45°, and 90° viewpoints. All practice and test trials consisted of a 500 ms blank screen, a 500 ms fixation cross and an object that stayed on the screen until the participant responded or until 2500 ms had elapsed. Response times were recorded via a voice trigger, but accuracy was not recorded. (The experimenter observed the first few participants, and found that accuracy was almost always perfect.)
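The trial counts implied by this naming procedure can be tallied as follows. The sketch assumes that "five trials with each object" and "six trials with each object" are per-block figures; the constant and function names are illustrative.

```python
N_OBJECTS = 10
VIEWS_DEG = (0, 45, 90)

NAMED_PRESENTATIONS_PER_OBJECT = 2   # objects shown with their names
PRACTICE_BLOCKS = 4
PRACTICE_TRIALS_PER_OBJECT = 5       # per block (assumed reading), always at 0 deg
TEST_BLOCKS = 2
TEST_TRIALS_PER_OBJECT = 6           # per block (assumed reading), split across views


def naming_trial_counts():
    """Tally trials per phase of the naming experiment."""
    per_view = TEST_TRIALS_PER_OBJECT // len(VIEWS_DEG)
    return {
        "named_intro": N_OBJECTS * NAMED_PRESENTATIONS_PER_OBJECT,
        "practice": PRACTICE_BLOCKS * PRACTICE_TRIALS_PER_OBJECT * N_OBJECTS,
        "test": TEST_BLOCKS * TEST_TRIALS_PER_OBJECT * N_OBJECTS,
        "test_per_view_per_object_per_block": per_view,
        "test_per_view_total": TEST_BLOCKS * per_view * N_OBJECTS,
    }


if __name__ == "__main__":
    for phase, count in naming_trial_counts().items():
        print(f"{phase}: {count}")
```

Under this reading the introductory phase has the 20 trials stated in the text, and each viewpoint contributes the same number of test trials.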
Cite this article
Tarr, M., Williams, P., Hayward, W. et al. Three-dimensional object recognition is viewpoint dependent. Nat Neurosci 1, 275–277 (1998). https://doi.org/10.1038/1089
DOI: https://doi.org/10.1038/1089
This article is cited by
- Variation of picture angles and its effect on the Concealed Information Test. Cognitive Research: Principles and Implications (2020)
- Invariant object recognition is a personalized selection of invariant features in humans, not simply explained by hierarchical feed-forward vision models. Scientific Reports (2017)
- The face inversion effect in non-human primates revisited - an investigation in chimpanzees (Pan troglodytes). Scientific Reports (2013)
- A distributed computational cognitive model for object recognition. Science China Information Sciences (2013)
- Perception of Fragmented Images of Three-Dimensional Objects as the Observation Angle Changes. Neuroscience and Behavioral Physiology (2010)