  • Scientific Correspondence

Three-dimensional object recognition is viewpoint dependent

Abstract

The human visual system is faced with the computationally difficult problem of achieving object constancy: identifying three-dimensional (3D) objects via two-dimensional (2D) retinal images that may be altered when the same object is seen from different viewpoints¹. A widely accepted class of theories holds that we first reconstruct a description of the object's 3D structure from the retinal image, then match this representation to a remembered structural description. If the same structural description is reconstructed from every possible view of an object, object constancy will be obtained. For example, in Biederman's² oft-cited recognition-by-components (RBC) theory, structural descriptions are composed of sets of simple 3D volumes called geons (Fig. 1), along with the spatial relations in which the geons are placed. Thus a mug is represented in RBC as a noodle attached to the side of a cylinder, and a suitcase as a noodle attached to the top of a brick. The attraction of geons is that, unlike more complex objects, they possess a small set of defining properties that appear in their 2D projections when viewed from almost any position (e.g., all three views of the brick in Fig. 1 include a straight main axis, parallel edges, and a straight cross section). According to the RBC theory, a complex object can therefore be recognized from its constituent geons, which can themselves be recognized from any viewpoint.

The leftmost figure in each row was arbitrarily designated the 0° view; the other two figures represent 45° and 90° rotations of the objects in the depth plane.

Figure 2: Results of the psychophysical experiments.

References

  1. Marr, D. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information (Freeman, San Francisco, 1982).

  2. Biederman, I. Psychol. Rev. 94, 115–147 (1987).

  3. Biederman, I. & Gerhardstein, P. C. J. Exp. Psychol. Hum. Percept. Perform. 19, 1162–1182 (1993).

  4. Hayward, W. G. & Tarr, M. J. J. Exp. Psychol. Hum. Percept. Perform. 23, 1511–1521 (1997).

  5. Jolicoeur, P. Mem. Cognit. 13, 289–303 (1985).

  6. Bülthoff, H. H. & Edelman, S. Proc. Natl. Acad. Sci. USA 89, 60–64 (1992).

  7. Humphrey, G. K. & Khan, S. C. Can. J. Psychol. 46, 170–190 (1992).

  8. Tarr, M. J. Psychonomic Bull. Rev. 2, 55–82 (1995).

  9. Tarr, M. J., Bülthoff, H. H., Zabinski, M. & Blanz, V. Psychol. Sci. 8, 282–289 (1997).

  10. Perrett, D. I. et al. Proc. R. Soc. Lond. B 223, 293–317 (1985).

  11. Logothetis, N. K., Pauls, J. & Poggio, T. Curr. Biol. 5, 552–563 (1995).

  12. Poggio, T. & Edelman, S. Nature 343, 263–266 (1990).

  13. Loftus, G. R. & Masson, M. E. J. Psychonomic Bull. Rev. 1, 476–490 (1994).

Acknowledgements

This research was supported by an Air Force Office of Scientific Research Grant. We thank Jay Servidea and Jaymz Rosoff for their assistance in running the psychophysical studies.

Author information

Corresponding author

Correspondence to Michael J. Tarr.

Supplementary information

A total of 220 Yale University undergraduates participated in exchange for course credit or cash payment; the number of participants in each experiment is given in Fig. 2. Line drawings of three viewpoints of 10 single geons were scanned into a Macintosh computer from Biederman and Gerhardstein's³ Fig. 12 for use in experiments 1a and 2a. Shaded images of the same 10 geons (Fig. 1; available at ftp://www.cog.brown.edu/pub/tarrlab/stim/geons8.sit.hqx), matching the views used by Biederman and Gerhardstein as closely as possible, were created and rendered using CAD software for use in all other experiments. All stimuli subtended approximately 7° by 7° of visual angle when viewed from approximately 60 cm from the computer screen. All experiments were performed on Macintosh computers using RSVP software (http://psych.umb.edu/rsvp).

Each sequential matching trial (experiments 1a–e) consisted of the following sequence of events: blank screen for 1000 ms, fixation cross for 500 ms, object image for 200 ms, mask (consisting of random combinations of features from the line drawings or shaded images) for 750 ms, second object image for 100 ms, mask for 500 ms. The trial timed out 1500 ms later if no response was given. In experiments 1a, 1b and 1d, a two-key procedure was used, in which the participant pressed the 'V' key on the computer keyboard if the two images were of the same geon (even if shown from a different viewpoint), or the 'M' key if the two images were of different geons. In experiments 1c and 1e, a go/no-go procedure was used, in which the participant pressed the space bar if the two objects were the same or allowed the trial to time out otherwise. Participants in experiments 1d and 1e were informed after each trial of their response time and the accuracy of their response (as shown in Fig. 2, this feedback lowered overall response times, but had little impact on viewpoint effects). For 'same' trials, each of the three views of each geon was presented three times as the first object in a trial and three times as the second object, producing, for each geon, three trials in which the two views were identical, four trials in which they differed by 45°, and two trials in which they differed by 90°. In these and subsequent experiments, trials were presented in a different random order for each participant.
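The 'same'-trial counts described above follow from fully crossing the three views as first and second image. The short Python sketch below checks that arithmetic; it assumes the nine 'same' trials per geon are exactly the nine ordered view pairs, which the counts in the text imply but do not state outright. The function names are illustrative and not taken from the original study.

```python
from collections import Counter
from itertools import product

# Depth-rotation viewpoints in degrees, as given in the text.
VIEWS = (0, 45, 90)

def same_trial_pairs(views=VIEWS):
    """Return every ordered (first_view, second_view) pair for one geon.

    Assumption: the 'same' trials are the full crossing of views, so each
    view appears three times as the first image and three times as the second.
    """
    return list(product(views, repeat=2))

def difference_counts(pairs):
    """Count trials by absolute angular difference between the two views."""
    return Counter(abs(first - second) for first, second in pairs)

pairs = same_trial_pairs()
counts = difference_counts(pairs)

print(len(pairs))                     # 9 'same' trials per geon
print(dict(sorted(counts.items())))   # {0: 3, 45: 4, 90: 2}
```

The 45° condition has the most trials because the middle view can differ by 45° from either neighbor, whereas a 90° difference is only possible between the two extreme views.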

Match-to-sample experiments (2a–c) each consisted of 10 blocks of trials. Each block included an initial presentation of a target object for 20 s, followed by 18 test trials in the following sequence: blank screen for 250 ms, fixation cross for 500 ms, object for 150 ms, mask for 500 ms, and a timeout 1500 ms later if no response was given. Participants memorized the initial target, then pressed the space bar if the test object on subsequent trials matched the target object (even if the viewpoint varied), or let the trial time out if the test and target objects did not match. Participants in experiment 2c received the same kind of feedback as participants in experiments 1d and 1e. The target object was always shown in the 0° viewpoint. Every test block included three 'same' trials in each of the three viewpoints (0°, 45°, and 90°) of the target object and nine 'different' trials (one for each of the non-target objects).
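As a rough illustration of the block structure described above, the Python sketch below assembles the 18 test trials of one match-to-sample block. Two details are assumptions rather than facts from the text: the viewpoint shown on each 'different' trial is drawn at random (the text does not specify it), and geons are identified by integer indices 0 to 9. The name `build_block` is hypothetical.

```python
import random

VIEWS = (0, 45, 90)   # viewpoints in degrees, from the text
N_GEONS = 10          # number of geons, from the text
SAME_REPEATS = 3      # 'same' trials per viewpoint, from the text

def build_block(target, rng):
    """Return one block's test trials as (geon, view, is_same) tuples.

    Nine 'same' trials (three per viewpoint of the target) plus nine
    'different' trials (one per non-target geon), in random order.
    """
    same = [(target, view, True) for view in VIEWS for _ in range(SAME_REPEATS)]
    distractors = [geon for geon in range(N_GEONS) if geon != target]
    # Assumption: distractor viewpoint chosen at random; not stated in the text.
    different = [(geon, rng.choice(VIEWS), False) for geon in distractors]
    trials = same + different
    rng.shuffle(trials)  # a different random order for each participant
    return trials

rng = random.Random(0)   # seeded only to make the sketch reproducible
block = build_block(target=0, rng=rng)
print(len(block))                                  # 18
print(sum(is_same for _, _, is_same in block))     # 9
```

Passing a separate `random.Random` instance per participant would give each participant an independent trial order, in line with the randomization described for these experiments.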

In experiment 3, participants learned labels (given in Fig. 1) for the 10 geons, then performed trials in which they verbally named test images as quickly as possible. Participants first studied a sheet of paper showing the 10 objects with their names, then performed 20 trials in which each object was shown twice, along with its name, on the computer screen. Four practice blocks of five trials with each object followed, in which participants saw objects without their names and spoke the names. Objects were always shown in the 0° viewpoint during practice trials. Participants then performed two test blocks of six trials with each object, distributed equally between the 0°, 45°, and 90° viewpoints. All practice and test trials consisted of a 500 ms blank screen, 500 ms fixation cross and an object that stayed on the screen until the participant responded or until 2500 ms had elapsed. Response times were recorded via the voice trigger, but accuracy was not recorded. (The experimenter observed the first few participants, and found that accuracy was almost always perfect.)
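The trial counts for the naming experiment can be tallied directly from the description above. The following Python sketch does that bookkeeping; it assumes that "distributed equally" means two trials per viewpoint for each object within each test block, which is one natural reading of the text. All variable names are illustrative.

```python
N_OBJECTS = 10          # geons, from the text
VIEWS = (0, 45, 90)     # viewpoints in degrees, from the text

# Named familiarization: each object shown twice with its name.
study_trials = 2 * N_OBJECTS

# Practice: four blocks of five trials with each object (0° view only).
practice_trials = 4 * 5 * N_OBJECTS

# Test: two blocks of six trials with each object.
test_trials_per_block = 6 * N_OBJECTS
test_trials = 2 * test_trials_per_block

# Assumption: the six trials per object are split evenly within a block.
trials_per_view_per_object = 6 // len(VIEWS)

print(study_trials)                 # 20
print(practice_trials)              # 200
print(test_trials)                  # 120
print(trials_per_view_per_object)   # 2
```

Under this reading, each participant contributes 40 test response times per viewpoint (2 trials × 10 objects × 2 blocks) before any exclusions.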

About this article

Cite this article

Tarr, M., Williams, P., Hayward, W. et al. Three-dimensional object recognition is viewpoint dependent. Nat Neurosci 1, 275–277 (1998). https://doi.org/10.1038/1089

  • DOI: https://doi.org/10.1038/1089
