Three-dimensional object recognition is viewpoint dependent

Tarr, Michael J.; Williams, Pepper; Hayward, William G.; Gauthier, Isabel

doi:10.1038/1089

Scientific Correspondence
Published: August 1998

Nature Neuroscience volume 1, pages 275–277 (1998)

Abstract

The human visual system is faced with the computationally difficult problem of achieving object constancy: identifying three-dimensional (3D) objects via two-dimensional (2D) retinal images that may be altered when the same object is seen from different viewpoints¹. A widely accepted class of theories holds that we first reconstruct a description of the object's 3D structure from the retinal image, then match this representation to a remembered structural description. If the same structural description is reconstructed from every possible view of an object, object constancy will be obtained. For example, in Biederman's² oft-cited recognition-by-components (RBC) theory, structural descriptions are composed of sets of simple 3D volumes called geons (Fig. 1), along with the spatial relations in which the geons are placed. Thus a mug is represented in RBC as a noodle attached to the side of a cylinder, and a suitcase as a noodle attached to the top of a brick. The attraction of geons is that, unlike more complex objects, they possess a small set of defining properties that appear in their 2D projections when viewed from almost any position (e.g., all three views of the brick in Fig. 1 include a straight main axis, parallel edges, and a straight cross section). According to the RBC theory, a complex object can therefore be recognized from its constituent geons, which can themselves be recognized from any viewpoint.
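The core RBC assumption described above, that every view of an object yields the same structural description, can be illustrated with a toy Python sketch. It encodes a description as a set of (geon, relation, geon) triples and recognizes an object by exact lookup. The relation labels (`side-attached`, `top-attached`), the `Relation` class and the `recognize` function are hypothetical illustrations, not Biederman's formalism or the authors' model.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class Relation:
    part: str      # geon being attached, e.g. "noodle"
    relation: str  # spatial relation (hypothetical label)
    base: str      # geon it attaches to, e.g. "cylinder"


Description = FrozenSet[Relation]


def describe(*relations: Relation) -> Description:
    """A structural description: the unordered set of geon relations."""
    return frozenset(relations)


def recognize(recovered: Description, memory: Dict[str, Description]) -> Optional[str]:
    """Return the stored object whose description matches exactly, if any."""
    for name, stored in memory.items():
        if stored == recovered:
            return name
    return None


MEMORY = {
    "mug": describe(Relation("noodle", "side-attached", "cylinder")),
    "suitcase": describe(Relation("noodle", "top-attached", "brick")),
}

# Under RBC, any view of a mug is assumed to yield this same description,
# so the lookup succeeds regardless of viewpoint.
view_of_mug = describe(Relation("noodle", "side-attached", "cylinder"))
print(recognize(view_of_mug, MEMORY))  # mug
```

Because the lookup never consults viewpoint, the sketch makes explicit why RBC predicts viewpoint-invariant recognition; the experiments test whether human performance actually behaves this way.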

**Figure 2: Results of the psychophysical experiments.**

References

1. Marr, D. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information (Freeman, San Francisco, 1982).
2. Biederman, I. Psychol. Rev. 94, 115–147 (1987).
3. Biederman, I. & Gerhardstein, P. C. J. Exp. Psychol. Hum. Percept. Perform. 19, 1162–1182 (1993).
4. Hayward, W. G. & Tarr, M. J. J. Exp. Psychol. Hum. Percept. Perform. 23, 1511–1521 (1997).
5. Jolicoeur, P. Mem. Cognit. 13, 289–303 (1985).
6. Bülthoff, H. H. & Edelman, S. Proc. Natl. Acad. Sci. USA 89, 60–64 (1992).
7. Humphrey, G. K. & Khan, S. C. Can. J. Psychol. 46, 170–190 (1992).
8. Tarr, M. J. Psychonomic Bull. Rev. 2, 55–82 (1995).
9. Tarr, M. J., Bülthoff, H. H., Zabinski, M. & Blanz, V. Psychol. Sci. 8, 282–289 (1997).
10. Perrett, D. I. et al. Proc. R. Soc. Lond. B 223, 293–317 (1985).
11. Logothetis, N. K., Pauls, J. & Poggio, T. Curr. Biol. 5, 552–563 (1995).
12. Poggio, T. & Edelman, S. Nature 343, 263–266 (1990).
13. Loftus, G. R. & Masson, M. E. J. Psychonomic Bull. Rev. 1, 476–490 (1994).

Acknowledgements

This research was supported by an Air Force Office of Scientific Research Grant. We thank Jay Servidea and Jaymz Rosoff for their assistance in running the psychophysical studies.

Author information

Authors and Affiliations

Department of Cognitive and Linguistic Sciences, Brown University, Box 1978, Providence, Rhode Island 02912, USA
Michael J. Tarr
Department of Psychology, University of Massachusetts, 100 Morrissey Boulevard, Boston, Massachusetts 02125-3393, USA
Pepper Williams
Department of Psychology, University of Wollongong, Northfields Avenue, Wollongong, NSW 2552, Australia
William G. Hayward
Department of Psychology, Yale University, PO Box 208205, New Haven, Connecticut 06520-8205, USA
Isabel Gauthier

Corresponding author

Correspondence to Michael J. Tarr.

Supplementary information

A total of 220 Yale University undergraduates participated in exchange for course credit or cash payment; the number of participants in each experiment is given in Fig. 2. For experiments 1a and 2a, line drawings of three viewpoints of 10 single geons were scanned into a Macintosh computer from Fig. 12 of Biederman and Gerhardstein³. For all other experiments, shaded images of the same 10 geons (Fig. 1; available at ftp://www.cog.brown.edu/pub/tarrlab/stim/geons8.sit.hqx) were created and rendered with CAD software, matching the views used by Biederman and Gerhardstein as closely as possible. All stimuli subtended approximately 7° × 7° of visual angle at a viewing distance of approximately 60 cm. All experiments were run on Macintosh computers using RSVP software (http://psych.umb.edu/rsvp).
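For readers reproducing the displays, the on-screen extent s that subtends a visual angle θ at viewing distance d is s = 2d·tan(θ/2). A minimal Python sketch, assuming a 7° extent and the 60 cm viewing distance; the physical screen dimensions of the original setup are not reported, so the result is only an estimate. The function names are hypothetical.

```python
import math


def size_for_angle(angle_deg: float, distance: float) -> float:
    """Physical extent that subtends angle_deg at the given viewing distance."""
    return 2 * distance * math.tan(math.radians(angle_deg) / 2)


def angle_for_size(size: float, distance: float) -> float:
    """Visual angle (degrees) subtended by an extent of the given size."""
    return math.degrees(2 * math.atan(size / (2 * distance)))


side_cm = size_for_angle(7.0, 60.0)  # assumed 7 deg stimulus viewed at 60 cm
print(f"{side_cm:.2f} cm")  # 7.34 cm
```

The two functions are exact inverses, so `angle_for_size(side_cm, 60.0)` recovers the 7° input.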

Each sequential matching trial (experiments 1a–e) consisted of the following sequence of events: blank screen for 1000 ms, fixation cross for 500 ms, first object image for 200 ms, mask (random combinations of features from the line drawings or shaded images) for 750 ms, second object image for 100 ms, and mask for 500 ms. If no response was given, the trial timed out 1500 ms later. In experiments 1a, 1b and 1d, a two-key procedure was used: participants pressed the 'V' key on the computer keyboard if the two images showed the same geon (even in a different viewpoint) or the 'M' key if they showed different geons. In experiments 1c and 1e, a go/no-go procedure was used: participants pressed the space bar if the two objects were the same and otherwise let the trial time out. After each trial, participants in experiments 1d and 1e were told their response time and whether their response was correct (as shown in Fig. 2, this feedback lowered overall response times but had little impact on viewpoint effects). For 'same' trials, each of the three views of each geon was presented three times as the first object and three times as the second object, producing, for each geon, three trials in which the two views were identical, four in which they differed by 45°, and two in which they differed by 90°. In these and subsequent experiments, trials were presented in a different random order for each participant.
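The trial timeline and the composition of the 'same' trials can be checked with a short Python sketch. It takes the event durations of the sequential matching trial, computes each event's onset, and enumerates every ordered pairing of the three views; the helper names are hypothetical. The pairing reproduces the stated counts of three identical-view, four 45° and two 90° trials per geon, and the timeline places the second image 950 ms after the first.

```python
from collections import Counter
from itertools import accumulate, product

VIEWS = (0, 45, 90)  # the three viewpoints of each geon, in degrees

# Event durations (ms) for one sequential-matching trial, in presentation order.
TRIAL_EVENTS = [
    ("blank", 1000),
    ("fixation", 500),
    ("first image", 200),
    ("mask", 750),
    ("second image", 100),
    ("mask", 500),
]


def onsets(events):
    """Return (event name, onset in ms from trial start) for each event."""
    starts = [0, *accumulate(duration for _, duration in events)][:-1]
    return [(name, start) for (name, _), start in zip(events, starts)]


def onset_of(name, events=TRIAL_EVENTS):
    """Onset of the first event with the given name."""
    return next(start for event, start in onsets(events) if event == name)


def same_trial_pairs(views=VIEWS):
    """All ordered (first view, second view) pairs for one geon.

    Each view is the first image three times and the second image three
    times, giving 3 x 3 = 9 'same' trials per geon.
    """
    return list(product(views, repeat=2))


soa = onset_of("second image") - onset_of("first image")
print(soa)  # 950

pairs = same_trial_pairs()
print(sorted(Counter(abs(a - b) for a, b in pairs).items()))  # [(0, 3), (45, 4), (90, 2)]
```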

Match-to-sample experiments (2a–c) each consisted of 10 blocks of trials. Each block began with a 20 s presentation of a target object, followed by 18 test trials, each consisting of a blank screen for 250 ms, fixation cross for 500 ms, object for 150 ms and mask for 500 ms; the trial timed out 1500 ms later if no response was given. Participants memorized the initial target, then pressed the space bar if the test object on a subsequent trial matched the target object (even if the viewpoint differed) or let the trial time out if it did not. Participants in experiment 2c received the same kind of feedback as in experiments 1d and 1e. The target object was always shown in the 0° viewpoint. Every test block included three 'same' trials in each of the three viewpoints (0°, 45° and 90°) of the target object and nine 'different' trials (one for each of the non-target objects).
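The composition of one match-to-sample block can be sketched in Python as below, confirming that nine 'same' and nine 'different' trials make up the 18 test trials. Geon names are hypothetical placeholders, and because the text does not state the viewpoint of the 'different' trials, they are recorded with `None`. The shuffle reflects the per-participant random trial order.

```python
import random
from collections import Counter

OBJECTS = [f"geon{i}" for i in range(1, 11)]  # hypothetical placeholder names for the 10 geons
VIEWS = (0, 45, 90)  # target viewpoints tested on 'same' trials


def build_block(target, objects=OBJECTS, views=VIEWS, rng=None):
    """Return one shuffled block of 18 test trials for the given target.

    Each trial is (object, viewpoint, kind). The viewpoint of 'different'
    trials is not specified in the text, so it is left as None.
    """
    rng = rng or random.Random()
    same = [(target, view, "same") for view in views for _ in range(3)]
    different = [(obj, None, "different") for obj in objects if obj != target]
    trials = same + different
    rng.shuffle(trials)
    return trials


block = build_block("geon1", rng=random.Random(0))
counts = Counter(kind for _, _, kind in block)
print(len(block), counts["same"], counts["different"])  # 18 9 9
```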

In experiment 3, participants learned labels (given in Fig. 1) for the 10 geons, then named test images aloud as quickly as possible. Participants first studied a sheet of paper showing the 10 objects with their names, then performed 20 trials in which each object was shown twice on the computer screen along with its name. Four practice blocks followed, each with five trials per object, in which participants saw the objects without their names and spoke the names. Objects were always shown in the 0° viewpoint during practice. Participants then performed two test blocks of six trials with each object, distributed equally among the 0°, 45° and 90° viewpoints. All practice and test trials consisted of a 500 ms blank screen, a 500 ms fixation cross and an object that remained on the screen until the participant responded or 2500 ms had elapsed. Response times were recorded with a voice trigger; accuracy was not recorded. (The experimenter observed the first few participants and found that accuracy was almost always perfect.)
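The trial counts of the naming procedure follow from its block structure. A Python sketch, reading "five trials with each object" as five trials per object in each practice block and using hypothetical placeholder geon names (the real labels are in Fig. 1). It yields 20 study trials, 200 practice trials, and 120 test trials with 40 per viewpoint.

```python
import random
from collections import Counter

OBJECTS = [f"geon{i}" for i in range(1, 11)]  # hypothetical placeholders; real labels are in Fig. 1
VIEWS = (0, 45, 90)


def practice_block(objects=OBJECTS, reps=5, rng=None):
    """Five trials per object, always in the 0 deg viewpoint, in random order."""
    rng = rng or random.Random()
    trials = [(obj, 0) for obj in objects for _ in range(reps)]
    rng.shuffle(trials)
    return trials


def test_block(objects=OBJECTS, views=VIEWS, reps_per_object=6, rng=None):
    """Six trials per object, split equally across the three viewpoints."""
    rng = rng or random.Random()
    per_view = reps_per_object // len(views)  # 2 trials per object per view
    trials = [(obj, view) for obj in objects for view in views for _ in range(per_view)]
    rng.shuffle(trials)
    return trials


rng = random.Random(1)
study = [obj for obj in OBJECTS for _ in range(2)]  # each object shown twice with its name
practice = [t for _ in range(4) for t in practice_block(rng=rng)]
test = [t for _ in range(2) for t in test_block(rng=rng)]
print(len(study), len(practice), len(test))  # 20 200 120
```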

About this article

Cite this article

Tarr, M., Williams, P., Hayward, W. et al. Three-dimensional object recognition is viewpoint dependent. Nat Neurosci 1, 275–277 (1998). https://doi.org/10.1038/1089

Received: 04 May 1998
Accepted: 04 June 1998
Issue Date: August 1998
DOI: https://doi.org/10.1038/1089

This article is cited by

Variation of picture angles and its effect on the Concealed Information Test
- Ann Hsu
- Yu-Hui Lo
- Philip Tseng
Cognitive Research: Principles and Implications (2020)
Invariant object recognition is a personalized selection of invariant features in humans, not simply explained by hierarchical feed-forward vision models
- Hamid Karimi-Rouzbahani
- Nasour Bagheri
- Reza Ebrahimpour
Scientific Reports (2017)
The face inversion effect in non-human primates revisited - an investigation in chimpanzees (Pan troglodytes)
- Christoph D. Dahl
- Malte J. Rasch
- Ikuma Adachi
Scientific Reports (2013)
A distributed computational cognitive model for object recognition
- YongJin Liu
- QiuFang Fu
- XiaoLan Fu
Science China Information Sciences (2013)
Perception of Fragmented Images of Three-Dimensional Objects as the Observation Angle Changes
- V. N. Chikhman
- Yu. E. Shelepin
- P. Pacemore
Neuroscience and Behavioral Physiology (2010)
