Abstract
Many surface cues support three-dimensional shape perception, but humans can sometimes still see shape when these features are missing—such as when an object is covered with a draped cloth. Here we propose a framework for three-dimensional shape perception that explains perception in both typical and atypical cases as analysis-by-synthesis, or inference in a generative model of image formation. The model integrates intuitive physics to explain how shape can be inferred from the deformations it causes to other objects, as in cloth draping. Behavioural and computational studies comparing this account with several alternatives show that it best matches human observers (total n = 174) in both accuracy and response times, and is the only model that correlates significantly with human performance on difficult discriminations. We suggest that bottom-up deep neural network models are not fully adequate accounts of human shape perception, and point to how machine vision systems might achieve more human-like robustness.
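The core idea of analysis-by-synthesis is to invert a generative model of image formation: propose candidate 3D shapes, synthesize the image each would produce, and prefer the candidate whose synthesized image best matches the observation. The following is a minimal illustrative sketch of that loop, not the authors' PbAS implementation; the one-parameter "renderer" and the grid of candidates are toy assumptions standing in for a graphics and physics engine.

```python
import numpy as np

def render(shape_param, grid):
    # Toy stand-in for a graphics engine: a smooth bump whose
    # width is controlled by the latent shape parameter.
    return np.exp(-(grid ** 2) / (2 * shape_param ** 2))

def infer_shape(observed, grid, candidates):
    # Analysis-by-synthesis: synthesize an image for each candidate
    # shape, score it against the observation, and keep the best
    # explanation (here, lowest squared-error discrepancy).
    scores = [np.sum((render(c, grid) - observed) ** 2) for c in candidates]
    return candidates[int(np.argmin(scores))]

grid = np.linspace(-3.0, 3.0, 61)
true_shape = 0.8
observed = render(true_shape, grid)  # stand-in for the sensed image

candidates = np.linspace(0.2, 2.0, 19)  # hypothesis space of shapes
estimate = infer_shape(observed, grid, candidates)
print(round(float(estimate), 2))  # recovers the generating parameter
```

In the full model, the forward pass would also include a physics simulation (e.g. cloth draping over the candidate shape) before rendering, so that the inferred shape explains the deformations it causes in other objects.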
Data availability
Our behavioural data are publicly available at https://github.com/CNCLgithub/PbAS-model-human-comparisons. The experimental stimuli underlying the object-under-cloth task are publicly available at https://github.com/CNCLgithub/intuitive-physics-3d-shape-perception-stimuli.
Code availability
Code implementing the PbAS model, scripts for replicating model simulations, and a container for full reproducibility are publicly available at https://github.com/CNCLgithub/PbAS. Our custom Python scripts for data analysis are publicly available at https://github.com/CNCLgithub/PbAS-model-human-comparisons.
Acknowledgements
This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216; ONR MURI N00014-13-1-0333 (to J.B.T.); a grant from Toyota Research Institute (to J.B.T.); and a grant from Mitsubishi MELCO (to J.B.T.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. A high-performance computing cluster (OpenMind) was provided by the McGovern Institute for Brain Research. We thank K. Smith, B. Egger, K. Allen, G. Erdogan, M. Tenenbaum, N. Kanwisher and V. Paulun for their comments on a previous version of this manuscript.
Author information
Contributions
I.Y., M.H.S. and J.B.T. conceived and designed the study. I.Y. analysed the data. I.Y., M.H.S., A.A.S. and J.B.T. designed stimuli and experiments, and wrote and edited the manuscript. All authors contributed to the models.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Human Behaviour thanks Yaniv Morgenstern, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–9.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yildirim, I., Siegel, M.H., Soltani, A.A. et al. Perception of 3D shape integrates intuitive physics and analysis-by-synthesis. Nat Hum Behav 8, 320–335 (2024). https://doi.org/10.1038/s41562-023-01759-7