

Perception of 3D shape integrates intuitive physics and analysis-by-synthesis

Abstract

Many surface cues support three-dimensional shape perception, but humans can sometimes still see shape when these features are missing—such as when an object is covered with a draped cloth. Here we propose a framework for three-dimensional shape perception that explains perception in both typical and atypical cases as analysis-by-synthesis, or inference in a generative model of image formation. The model integrates intuitive physics to explain how shape can be inferred from the deformations it causes to other objects, as in cloth draping. Behavioural and computational studies comparing this account with several alternatives show that it best matches human observers (total n = 174) in both accuracy and response times, and is the only model that correlates significantly with human performance on difficult discriminations. We suggest that bottom-up deep neural network models are not fully adequate accounts of human shape perception, and point to how machine vision systems might achieve more human-like robustness.


Fig. 1: Seeing 3D shape through a cloth.
Fig. 2: Matching a target shape to one of two unoccluded test objects.
Fig. 3: Overview of PbAS.
Fig. 4: PbAS explains how human accuracy increases with longer stimulus presentation time.
Fig. 5: Fine-grained analysis of human accuracy at the level of individual trials in the unlimited time condition.
Fig. 6: Trial-level response time comparisons.
Fig. 7: Seeing the shape of a single cloth-draped object, without the aid of unoccluded candidates (cf. Fig. 2a,b).


Data availability

Our behavioural data are publicly available at https://github.com/CNCLgithub/PbAS-model-human-comparisons. The experimental stimuli underlying the object-under-cloth task are publicly available at https://github.com/CNCLgithub/intuitive-physics-3d-shape-perception-stimuli.

Code availability

Code implementing the PbAS model, scripts for replicating model simulations, and a container for full reproducibility are publicly available at https://github.com/CNCLgithub/PbAS. Our custom Python scripts for data analysis are publicly available at https://github.com/CNCLgithub/PbAS-model-human-comparisons.


Acknowledgements

This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216; ONR MURI N00014-13-1-0333 (to J.B.T.); a grant from Toyota Research Institute (to J.B.T.); and a grant from Mitsubishi MELCO (to J.B.T.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. A high-performance computing cluster (OpenMind) was provided by the McGovern Institute for Brain Research. We thank K. Smith, B. Egger, K. Allen, G. Erdogan, M. Tenenbaum, N. Kanwisher and V. Paulun for their comments on a previous version of this manuscript.

Author information


Contributions

I.Y., M.H.S. and J.B.T. conceived and designed the study. I.Y. analysed the data. I.Y., M.H.S., A.A.S. and J.B.T. designed stimuli and experiments, and wrote and edited the manuscript. All authors contributed to the models.

Corresponding authors

Correspondence to Ilker Yildirim, Max H. Siegel or Joshua B. Tenenbaum.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Human Behaviour thanks Yaniv Morgenstern, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yildirim, I., Siegel, M.H., Soltani, A.A. et al. Perception of 3D shape integrates intuitive physics and analysis-by-synthesis. Nat Hum Behav 8, 320–335 (2024). https://doi.org/10.1038/s41562-023-01759-7

