
Perspective

Incorporating physics into data-driven computer vision

Abstract

Many computer vision techniques infer properties of our physical world from images. Although images are formed through the physics of light and mechanics, computer vision techniques are typically data driven. This trend is mostly performance related: classical techniques from physics-based vision often score lower on metrics than modern deep learning does. However, recent research, covered in this Perspective, has shown that physical models can be incorporated as constraints into data-driven pipelines. In doing so, one can combine the performance benefits of a data-driven method with the advantages offered by a physics-based method, such as interpretability, falsifiability and generalizability. The aim of this Perspective is to provide an overview of specific approaches for integrating physical models into artificial intelligence pipelines, referred to as physics-based machine learning. We discuss technical approaches that span modifications to the dataset, the network design, the loss functions, and the optimization and regularization schemes.
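As a concrete illustration of the first of these approaches, dataset modification, consider physically based data synthesis: hazy training images can be rendered from clear ones with the standard atmospheric scattering model. The sketch below is a minimal, hypothetical Python example rather than the pipeline of any specific work discussed here; the function name `add_synthetic_haze` and the chosen airlight and scattering-coefficient values are illustrative assumptions.

```python
import numpy as np

# A minimal sketch of physics-based dataset modification (illustrative, not
# any specific method from this Perspective). Hazy images are synthesized
# from clear ones with the atmospheric scattering model
#     I(x) = J(x) * t(x) + A * (1 - t(x)),    t(x) = exp(-beta * d(x)),
# where J is the clear image, d the scene depth, A the airlight and beta
# the scattering coefficient.

def add_synthetic_haze(clear_rgb, depth, airlight=0.9, beta=1.2):
    """Render a hazy version of clear_rgb (H x W x 3, values in [0, 1])
    from a per-pixel depth map (H x W); airlight and beta are assumed."""
    transmission = np.exp(-beta * depth)[..., None]  # t(x), broadcast over RGB
    hazy = clear_rgb * transmission + airlight * (1.0 - transmission)
    return np.clip(hazy, 0.0, 1.0)

# Usage on stand-in data: augment one training image.
rng = np.random.default_rng(0)
clear = rng.random((64, 64, 3))   # clear image J
depth = rng.random((64, 64)) * 3  # depth map d
hazy = add_synthetic_haze(clear, depth)
```

A network trained on such (hazy, clear) pairs inherits the physics through the data alone, without any change to its architecture or loss.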


Fig. 1: Incorporating physics in neural pipelines in modern computer vision.
Fig. 2: When to tackle a problem with a physics, a data-driven or a physics-based learning approach.
Fig. 3: Two techniques to incorporate physics into machine learning pipelines.
Fig. 4: Combined loss functions that use both data-driven annotations and physical constraints.
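To make the idea behind Fig. 4 concrete, the following minimal PyTorch sketch trains a toy monocular depth network with a weighted sum of a data-driven loss on annotations and a physics-motivated constraint. The network `TinyDepthNet`, the edge-aware smoothness residual and the weight `lambda_phys` are illustrative assumptions, not the implementation of any work covered here.

```python
import torch
import torch.nn as nn

class TinyDepthNet(nn.Module):
    """Toy convolutional network predicting a depth map from an RGB image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Softplus(),  # keep depth > 0
        )

    def forward(self, x):
        return self.net(x)

def physics_residual(depth, image):
    """Hypothetical physical constraint: an edge-aware smoothness prior,
    i.e. depth gradients should be small except across image edges."""
    d_dx = (depth[..., :, 1:] - depth[..., :, :-1]).abs()
    d_dy = (depth[..., 1:, :] - depth[..., :-1, :]).abs()
    i_dx = (image[..., :, 1:] - image[..., :, :-1]).abs().mean(1, keepdim=True)
    i_dy = (image[..., 1:, :] - image[..., :-1, :]).abs().mean(1, keepdim=True)
    return (d_dx * torch.exp(-i_dx)).mean() + (d_dy * torch.exp(-i_dy)).mean()

model = TinyDepthNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
data_loss = nn.L1Loss()
lambda_phys = 0.1  # assumed trade-off between data and physics terms

# One illustrative training step on random stand-in data.
image = torch.rand(4, 3, 64, 64)     # batch of RGB images
gt_depth = torch.rand(4, 1, 64, 64)  # annotated depth maps
pred = model(image)
loss = data_loss(pred, gt_depth) + lambda_phys * physics_residual(pred, image)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Setting `lambda_phys` to zero recovers a purely data-driven baseline, which makes the contribution of the physical constraint straightforward to ablate.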


Acknowledgements

The authors thank members of the Visual Machines Group for feedback and support, as well as P. Patwa, Y. Ba, H. Peters, A. Armouti, H. Zhang, E. Zhao, S. Zhou, S. Vilesov, P. Chari, Z. Wang, A. Gupta, D. Conover, A. Singh and A. Wong for technical discussions, contributions and pointers to references for this manuscript. This research was partially supported by Army Research Laboratory (ARL) grant W911NF-20-2-0158 under the cooperative A2I2 programme. A.K. was supported by a National Science Foundation (NSF) CAREER award (IIS-2046737), an Army Young Investigator Program award and a Defense Advanced Research Projects Agency (DARPA) Young Faculty Award.

Author information


Contributions

All authors contributed to the ideas in the manuscript. A.K. took the lead in coordinating the figures and writing the manuscript. S.S. and C.d.M. had a supporting role in writing the manuscript. All authors proofread the manuscript.

Corresponding author

Correspondence to Achuta Kadambi.

Ethics declarations

Competing interests

A.K. is an employee of, receives a salary from and owns stock in Intrinsic (an Alphabet company), and is a co-founder of and owns stock in Vayu Robotics. C.d.M. declares no competing interests. C.-J.H., M.S. and S.S. are employed by, draw salaries from and hold stock in Amazon.

Peer review

Peer review information

Nature Machine Intelligence thanks Fangwei Zhong and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Kadambi, A., de Melo, C., Hsieh, CJ. et al. Incorporating physics into data-driven computer vision. Nat Mach Intell 5, 572–580 (2023). https://doi.org/10.1038/s42256-023-00662-0

