  • Review

Scientific discovery in the age of artificial intelligence

A Publisher Correction to this article was published on 30 August 2023

This article has been updated

Abstract

Artificial intelligence (AI) is being increasingly integrated into scientific discovery to augment and accelerate research, helping scientists to generate hypotheses, design experiments, collect and interpret large datasets, and gain insights that might not have been possible using traditional scientific methods alone. Here we examine breakthroughs over the past decade that include self-supervised learning, which allows models to be trained on vast amounts of unlabelled data, and geometric deep learning, which leverages knowledge about the structure of scientific data to enhance model accuracy and efficiency. Generative AI methods can create designs, such as small-molecule drugs and proteins, by analysing diverse data modalities, including images and sequences. We discuss how these methods can help scientists throughout the scientific process and the central issues that remain despite such advances. Both developers and users of AI tools need a better understanding of when such approaches need improvement, and challenges posed by poor data quality and stewardship remain. These issues cut across scientific disciplines and require developing foundational algorithmic approaches that can contribute to scientific understanding or acquire it autonomously, making them critical areas of focus for AI innovation.

Fig. 1: Science in the age of artificial intelligence.
Fig. 2: Learning meaningful representations of scientific data.
Fig. 3: AI-guided generation of scientific hypotheses.
Fig. 4: Integration of AI with scientific experiments and simulation.

References

  1. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015). This survey summarizes key elements of deep learning and its development in speech recognition, computer vision and natural language processing.

  2. de Regt, H. W. Understanding, values, and the aims of science. Phil. Sci. 87, 921–932 (2020).

  3. Pickstone, J. V. Ways of Knowing: A New History of Science, Technology, and Medicine (Univ. Chicago Press, 2001).

  4. Han, J. et al. Deep potential: a general representation of a many-body potential energy surface. Commun. Comput. Phys. 23, 629–639 (2018). This paper introduced a deep neural network architecture that learns the potential energy surface of many-body systems while respecting the underlying symmetries of the system by incorporating group theory.

  5. Akiyama, K. et al. First M87 Event Horizon Telescope results. IV. Imaging the central supermassive black hole. Astrophys. J. Lett. 875, L4 (2019).

  6. Wagner, A. Z. Constructions in combinatorics via neural networks. Preprint at https://arxiv.org/abs/2104.14516 (2021).

  7. Coley, C. W. et al. A robotic platform for flow synthesis of organic compounds informed by AI planning. Science 365, eaax1566 (2019).

  8. Bommasani, R. et al. On the opportunities and risks of foundation models. Preprint at https://arxiv.org/abs/2108.07258 (2021).

  9. Davies, A. et al. Advancing mathematics by guiding human intuition with AI. Nature 600, 70–74 (2021). This paper explores how AI can aid the development of pure mathematics by guiding mathematical intuition.

  10. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). This study was the first to demonstrate the ability to predict protein folding structures using AI methods with a high degree of accuracy, achieving results that are at or near the experimental resolution. This accomplishment is particularly noteworthy, as predicting protein folding has been a grand challenge in the field of molecular biology for over 50 years.

  11. Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 180, 688–702 (2020).

  12. Bohacek, R. S., McMartin, C. & Guida, W. C. The art and practice of structure-based drug design: a molecular modeling perspective. Med. Res. Rev. 16, 3–50 (1996).

  13. Bileschi, M. L. et al. Using deep learning to annotate the protein universe. Nat. Biotechnol. 40, 932–937 (2022).

  14. Bellemare, M. G. et al. Autonomous navigation of stratospheric balloons using reinforcement learning. Nature 588, 77–82 (2020). This paper describes a reinforcement-learning algorithm for navigating a super-pressure balloon in the stratosphere, making real-time decisions in the changing environment.

  15. Tshitoyan, V. et al. Unsupervised word embeddings capture latent knowledge from materials science literature. Nature 571, 95–98 (2019).

  16. Zhang, L. et al. Deep potential molecular dynamics: a scalable model with the accuracy of quantum mechanics. Phys. Rev. Lett. 120, 143001 (2018).

  17. Deiana, A. M. et al. Applications and techniques for fast machine learning in science. Front. Big Data 5, 787421 (2022).

  18. Karagiorgi, G. et al. Machine learning in the search for new fundamental physics. Nat. Rev. Phys. 4, 399–412 (2022).

  19. Zhou, C. & Paffenroth, R. C. Anomaly detection with robust deep autoencoders. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 665–674 (2017).

  20. Hinton, G. E. & Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science 313, 504–507 (2006).

  21. Kasieczka, G. et al. The LHC Olympics 2020: a community challenge for anomaly detection in high energy physics. Rep. Prog. Phys. 84, 124201 (2021).

  22. Govorkova, E. et al. Autoencoders on field-programmable gate arrays for real-time, unsupervised new physics detection at 40 MHz at the Large Hadron Collider. Nat. Mach. Intell. 4, 154–161 (2022).

  23. Chamberland, M. et al. Detecting microstructural deviations in individuals with deep diffusion MRI tractometry. Nat. Comput. Sci. 1, 598–606 (2021).

  24. Rafique, M. et al. Delegated regressor, a robust approach for automated anomaly detection in the soil radon time series data. Sci. Rep. 10, 3004 (2020).

  25. Pastore, V. P. et al. Annotation-free learning of plankton for classification and anomaly detection. Sci. Rep. 10, 12142 (2020).

  26. Naul, B. et al. A recurrent neural network for classification of unevenly sampled variable stars. Nat. Astron. 2, 151–155 (2018).

  27. Lee, D.-H. et al. Pseudo-label: the simple and efficient semi-supervised learning method for deep neural networks. In ICML Workshop on Challenges in Representation Learning (2013).

  28. Zhou, D. et al. Learning with local and global consistency. In Advances in Neural Information Processing Systems 16, 321–328 (2003).

  29. Radivojac, P. et al. A large-scale evaluation of computational protein function prediction. Nat. Methods 10, 221–227 (2013).

  30. Barkas, N. et al. Joint analysis of heterogeneous single-cell RNA-seq dataset collections. Nat. Methods 16, 695–698 (2019).

  31. Tran, K. & Ulissi, Z. W. Active learning across intermetallics to guide discovery of electrocatalysts for CO2 reduction and H2 evolution. Nat. Catal. 1, 696–703 (2018).

  32. Jablonka, K. M. et al. Bias free multiobjective active learning for materials design and discovery. Nat. Commun. 12, 2312 (2021).

  33. Roussel, R. et al. Turn-key constrained parameter space exploration for particle accelerators using Bayesian active learning. Nat. Commun. 12, 5612 (2021).

  34. Ratner, A. J. et al. Data programming: creating large training sets, quickly. In Advances in Neural Information Processing Systems 29, 3567–3575 (2016).

  35. Ratner, A. et al. Snorkel: rapid training data creation with weak supervision. In International Conference on Very Large Data Bases 11, 269–282 (2017). This paper presents a weakly-supervised AI system designed to annotate massive amounts of data using labeling functions.

  36. Butter, A. et al. GANplifying event samples. SciPost Phys. 10, 139 (2021).

  37. Brown, T. et al. Language models are few-shot learners. In Advances in Neural Information Processing Systems 33, 1877–1901 (2020).

  38. Ramesh, A. et al. Zero-shot text-to-image generation. In International Conference on Machine Learning 139, 8821–8831 (2021).

  39. Littman, M. L. Reinforcement learning improves behaviour from evaluative feedback. Nature 521, 445–451 (2015).

  40. Cubuk, E. D. et al. Autoaugment: learning augmentation strategies from data. In IEEE Conference on Computer Vision and Pattern Recognition 113–123 (2019).

  41. Reed, C. J. et al. Selfaugment: automatic augmentation policies for self-supervised learning. In IEEE Conference on Computer Vision and Pattern Recognition 2674–2683 (2021).

  42. ATLAS Collaboration et al. Deep generative models for fast photon shower simulation in ATLAS. Preprint at https://arxiv.org/abs/2210.06204 (2022).

  43. Mahmood, F. et al. Deep adversarial training for multi-organ nuclei segmentation in histopathology images. IEEE Trans. Med. Imaging 39, 3257–3267 (2019).

  44. Teixeira, B. et al. Generating synthetic X-ray images of a person from the surface geometry. In IEEE Conference on Computer Vision and Pattern Recognition 9059–9067 (2018).

  45. Lee, D., Moon, W.-J. & Ye, J. C. Assessing the importance of magnetic resonance contrasts using collaborative generative adversarial networks. Nat. Mach. Intell. 2, 34–42 (2020).

  46. Kench, S. & Cooper, S. J. Generating three-dimensional structures from a two-dimensional slice with generative adversarial network-based dimensionality expansion. Nat. Mach. Intell. 3, 299–305 (2021).

  47. Wan, C. & Jones, D. T. Protein function prediction is improved by creating synthetic feature samples with generative adversarial networks. Nat. Mach. Intell. 2, 540–550 (2020).

  48. Repecka, D. et al. Expanding functional protein sequence spaces using generative adversarial networks. Nat. Mach. Intell. 3, 324–333 (2021).

  49. Marouf, M. et al. Realistic in silico generation and augmentation of single-cell RNA-seq data using generative adversarial networks. Nat. Commun. 11, 166 (2020).

  50. Ghahramani, Z. Probabilistic machine learning and artificial intelligence. Nature 521, 452–459 (2015). This survey provides an introduction to probabilistic machine learning, which involves the representation and manipulation of uncertainty in models and predictions, playing a central role in scientific data analysis.

  51. Cogan, J. et al. Jet-images: computer vision inspired techniques for jet tagging. J. High Energy Phys. 2015, 118 (2015).

  52. Zhao, W. et al. Sparse deconvolution improves the resolution of live-cell super-resolution fluorescence microscopy. Nat. Biotechnol. 40, 606–617 (2022).

  53. Brbić, M. et al. MARS: discovering novel cell types across heterogeneous single-cell experiments. Nat. Methods 17, 1200–1206 (2020).

  54. Qiao, C. et al. Evaluation and development of deep neural networks for image super-resolution in optical microscopy. Nat. Methods 18, 194–202 (2021).

  55. Andreassen, A. et al. OmniFold: a method to simultaneously unfold all observables. Phys. Rev. Lett. 124, 182001 (2020).

  56. Bergenstråhle, L. et al. Super-resolved spatial transcriptomics by deep data fusion. Nat. Biotechnol. 40, 476–479 (2021).

  57. Vincent, P. et al. Extracting and composing robust features with denoising autoencoders. In International Conference on Machine Learning 1096–1103 (2008).

  58. Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. In International Conference on Learning Representations (2014).

  59. Eraslan, G. et al. Single-cell RNA-seq denoising using a deep count autoencoder. Nat. Commun. 10, 390 (2019).

  60. Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).

  61. Olshausen, B. A. & Field, D. J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996).

  62. Bengio, Y. Deep learning of representations for unsupervised and transfer learning. In ICML Workshop on Unsupervised and Transfer Learning (2012).

  63. Detlefsen, N. S., Hauberg, S. & Boomsma, W. Learning meaningful representations of protein sequences. Nat. Commun. 13, 1914 (2022).

  64. Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 37, 38–44 (2019).

  65. Bronstein, M. M. et al. Geometric deep learning: going beyond Euclidean data. IEEE Signal Process Mag. 34, 18–42 (2017).

  66. Anderson, P. W. More is different: broken symmetry and the nature of the hierarchical structure of science. Science 177, 393–396 (1972).

  67. Qiao, Z. et al. Informing geometric deep learning with electronic interactions to accelerate quantum chemistry. Proc. Natl Acad. Sci. USA 119, e2205221119 (2022).

  68. Bogatskiy, A. et al. Symmetry group equivariant architectures for physics. Preprint at https://arxiv.org/abs/2203.06153 (2022).

  69. Bronstein, M. M. et al. Geometric deep learning: grids, groups, graphs, geodesics, and gauges. Preprint at https://arxiv.org/abs/2104.13478 (2021).

  70. Townshend, R. J. L. et al. Geometric deep learning of RNA structure. Science 373, 1047–1051 (2021).

  71. Wicky, B. I. M. et al. Hallucinating symmetric protein assemblies. Science 378, 56–61 (2022).

  72. Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (2017).

  73. Veličković, P. et al. Graph attention networks. In International Conference on Learning Representations (2018).

  74. Hamilton, W. L., Ying, Z. & Leskovec, J. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems 30, 1024–1034 (2017).

  75. Gilmer, J. et al. Neural message passing for quantum chemistry. In International Conference on Machine Learning 1263–1272 (2017).

  76. Li, M. M., Huang, K. & Zitnik, M. Graph representation learning in biomedicine and healthcare. Nat. Biomed. Eng. 6, 1353–1369 (2022).

  77. Satorras, V. G., Hoogeboom, E. & Welling, M. E(n) equivariant graph neural networks. In International Conference on Machine Learning 9323–9332 (2021). This study incorporates principles of physics into the design of neural models, advancing the field of equivariant machine learning.

  78. Thomas, N. et al. Tensor field networks: rotation- and translation-equivariant neural networks for 3D point clouds. Preprint at https://arxiv.org/abs/1802.08219 (2018).

  79. Finzi, M. et al. Generalizing convolutional neural networks for equivariance to Lie groups on arbitrary continuous data. In International Conference on Machine Learning 3165–3176 (2020).

  80. Fuchs, F. et al. SE(3)-transformers: 3D roto-translation equivariant attention networks. In Advances in Neural Information Processing Systems 33, 1970–1981 (2020).

  81. Zaheer, M. et al. Deep sets. In Advances in Neural Information Processing Systems 30, 3391–3401 (2017). This paper is an early study that explores the use of deep neural architectures on set data, which consists of an unordered list of elements.

  82. Cohen, T. S. et al. Spherical CNNs. In International Conference on Learning Representations (2018).

  83. Gordon, J. et al. Permutation equivariant models for compositional generalization in language. In International Conference on Learning Representations (2019).

  84. Finzi, M., Welling, M. & Wilson, A. G. A practical method for constructing equivariant multilayer perceptrons for arbitrary matrix groups. In International Conference on Machine Learning 3318–3328 (2021).

  85. Dijk, D. V. et al. Recovering gene interactions from single-cell data using data diffusion. Cell 174, 716–729 (2018).

  86. Gainza, P. et al. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat. Methods 17, 184–192 (2020).

  87. Hatfield, P. W. et al. The data-driven future of high-energy-density physics. Nature 593, 351–361 (2021).

  88. Bapst, V. et al. Unveiling the predictive power of static structure in glassy systems. Nat. Phys. 16, 448–454 (2020).

  89. Zhang, R., Zhou, T. & Ma, J. Multiscale and integrative single-cell Hi-C analysis with Higashi. Nat. Biotechnol. 40, 254–261 (2022).

  90. Sammut, S.-J. et al. Multi-omic machine learning predictor of breast cancer therapy response. Nature 601, 623–629 (2022).

  91. DeZoort, G. et al. Graph neural networks at the Large Hadron Collider. Nat. Rev. Phys. 5, 281–303 (2023).

  92. Liu, S. et al. Pre-training molecular graph representation with 3D geometry. In International Conference on Learning Representations (2022).

  93. The LIGO Scientific Collaboration et al. A gravitational-wave standard siren measurement of the Hubble constant. Nature 551, 85–88 (2017).

  94. Reichstein, M. et al. Deep learning and process understanding for data-driven Earth system science. Nature 566, 195–204 (2019).

  95. Goenka, S. D. et al. Accelerated identification of disease-causing variants with ultra-rapid nanopore genome sequencing. Nat. Biotechnol. 40, 1035–1041 (2022).

  96. Bengio, Y. et al. Greedy layer-wise training of deep networks. In Advances in Neural Information Processing Systems 19, 153–160 (2006).

  97. Hinton, G. E., Osindero, S. & Teh, Y.-W. A fast learning algorithm for deep belief nets. Neural Comput. 18, 1527–1554 (2006).

  98. Jordan, M. I. & Mitchell, T. M. Machine learning: trends, perspectives, and prospects. Science 349, 255–260 (2015).

  99. Devlin, J. et al. BERT: pre-training of deep bidirectional transformers for language understanding. In North American Chapter of the Association for Computational Linguistics 4171–4186 (2019).

  100. Rives, A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl Acad. Sci. USA 118, e2016239118 (2021).

  101. Elnaggar, A. et al. ProtTrans: towards cracking the language of life's code through self-supervised deep learning and high performance computing. In IEEE Transactions on Pattern Analysis and Machine Intelligence (2021).

  102. Hie, B. et al. Learning the language of viral evolution and escape. Science 371, 284–288 (2021). This paper modeled viral escape with machine learning algorithms originally developed for human natural language.

  103. Biswas, S. et al. Low-N protein engineering with data-efficient deep learning. Nat. Methods 18, 389–396 (2021).

  104. Ferruz, N. & Höcker, B. Controllable protein design with language models. Nat. Mach. Intell. 4, 521–532 (2022).

  105. Hsu, C. et al. Learning inverse folding from millions of predicted structures. In International Conference on Machine Learning 8946–8970 (2022).

  106. Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021). Inspired by AlphaFold2, this study reported RoseTTAFold, a novel three-track neural module capable of simultaneously processing a protein’s sequence, distance and coordinate information.

  107. Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31–36 (1988).

  108. Lin, T.-S. et al. BigSMILES: a structurally-based line notation for describing macromolecules. ACS Cent. Sci. 5, 1523–1531 (2019).

  109. Krenn, M. et al. SELFIES and the future of molecular string representations. Patterns 3, 100588 (2022).

  110. Flam-Shepherd, D., Zhu, K. & Aspuru-Guzik, A. Language models can learn complex molecular distributions. Nat. Commun. 13, 3293 (2022).

  111. Skinnider, M. A. et al. Chemical language models enable navigation in sparsely populated chemical space. Nat. Mach. Intell. 3, 759–770 (2021).

  112. Chithrananda, S., Grand, G. & Ramsundar, B. ChemBERTa: large-scale self-supervised pretraining for molecular property prediction. In Machine Learning for Molecules Workshop at NeurIPS (2020).

  113. Schwaller, P. et al. Predicting retrosynthetic pathways using transformer-based models and a hyper-graph exploration strategy. Chem. Sci. 11, 3316–3325 (2020).

  114. Tetko, I. V. et al. State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis. Nat. Commun. 11, 5575 (2020).

  115. Schwaller, P. et al. Mapping the space of chemical reactions using attention-based neural networks. Nat. Mach. Intell. 3, 144–152 (2021).

  116. Kovács, D. P., McCorkindale, W. & Lee, A. A. Quantitative interpretation explains machine learning models for chemical reaction prediction and uncovers bias. Nat. Commun. 12, 1695 (2021).

  117. Pesciullesi, G. et al. Transfer learning enables the molecular transformer to predict regio- and stereoselective reactions on carbohydrates. Nat. Commun. 11, 4874 (2020).

  118. Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems 30, 5998–6008 (2017). This paper introduced the transformer, a modern neural network architecture that can process sequential data in parallel, revolutionizing natural language processing and sequence modeling.

  119. Mousavi, S. M. et al. Earthquake transformer—an attentive deep-learning model for simultaneous earthquake detection and phase picking. Nat. Commun. 11, 3952 (2020).

  120. Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021).

  121. Meier, J. et al. Language models enable zero-shot prediction of the effects of mutations on protein function. In Advances in Neural Information Processing Systems 34, 29287–29303 (2021).

  122. Kamienny, P.-A. et al. End-to-end symbolic regression with transformers. In Advances in Neural Information Processing Systems 35, 10269–10281 (2022).

  123. Jaegle, A. et al. Perceiver: general perception with iterative attention. In International Conference on Machine Learning 4651–4664 (2021).

  124. Chen, L. et al. Decision transformer: reinforcement learning via sequence modeling. In Advances in Neural Information Processing Systems 34, 15084–15097 (2021).

  125. Dosovitskiy, A. et al. An image is worth 16x16 words: transformers for image recognition at scale. In International Conference on Learning Representations (2020).

  126. Choromanski, K. et al. Rethinking attention with performers. In International Conference on Learning Representations (2021).

  127. Li, Z. et al. Fourier neural operator for parametric partial differential equations. In International Conference on Learning Representations (2021).

  128. Kovachki, N. et al. Neural operator: learning maps between function spaces. J. Mach. Learn. Res. 24, 1–97 (2023).

  129. Russell, J. L. Kepler’s laws of planetary motion: 1609–1666. Br. J. Hist. Sci. 2, 1–24 (1964).

  130. Huang, K. et al. Artificial intelligence foundation for therapeutic science. Nat. Chem. Biol. 18, 1033–1036 (2022).

  131. Guimerà, R. et al. A Bayesian machine scientist to aid in the solution of challenging scientific problems. Sci. Adv. 6, eaav6971 (2020).

  132. Liu, G. et al. Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii. Nat. Chem. Biol. https://doi.org/10.1038/s41589-023-01349-8 (2023).

  133. Gómez-Bombarelli, R. et al. Design of efficient molecular organic light-emitting diodes by a high-throughput virtual screening and experimental approach. Nat. Mater. 15, 1120–1127 (2016). This paper proposes using a black-box AI predictor to accelerate high-throughput screening of molecules in materials science.

  134. Sadybekov, A. A. et al. Synthon-based ligand discovery in virtual libraries of over 11 billion compounds. Nature 601, 452–459 (2022).

  135. The NNPDF Collaboration. Evidence for intrinsic charm quarks in the proton. Nature 606, 483–487 (2022).

  136. Graff, D. E., Shakhnovich, E. I. & Coley, C. W. Accelerating high-throughput virtual screening through molecular pool-based active learning. Chem. Sci. 12, 7866–7881 (2021).

  137. Janet, J. P. et al. Accurate multiobjective design in a space of millions of transition metal complexes with neural-network-driven efficient global optimization. ACS Cent. Sci. 6, 513–524 (2020).

  138. Bacon, F. Novum Organon Vol. 1620 (2000).

  139. Schmidt, M. & Lipson, H. Distilling free-form natural laws from experimental data. Science 324, 81–85 (2009).

  140. Petersen, B. K. et al. Deep symbolic regression: recovering mathematical expressions from data via risk-seeking policy gradients. In International Conference on Learning Representations (2020).

  141. Zhavoronkov, A. et al. Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat. Biotechnol. 37, 1038–1040 (2019). This paper describes a reinforcement-learning algorithm for navigating molecular combinatorial spaces, and it validates generated molecules using wet-lab experiments.

  142. Zhou, Z. et al. Optimization of molecules via deep reinforcement learning. Sci. Rep. 9, 10752 (2019).

  143. You, J. et al. Graph convolutional policy network for goal-directed molecular graph generation. In Advances in Neural Information Processing Systems 31, 6412–6422 (2018).

  144. Bengio, Y. et al. GFlowNet foundations. Preprint at https://arxiv.org/abs/2111.09266 (2021). This paper describes a generative flow network that generates objects by sampling them from a distribution optimized for drug design.

  145. Jain, M. et al. Biological sequence design with GFlowNets. In International Conference on Machine Learning 9786–9801 (2022).

  146. Malkin, N. et al. Trajectory balance: improved credit assignment in GFlowNets. In Advances in Neural Information Processing Systems 35, 5955–5967 (2022).

  147. Borkowski, O. et al. Large scale active-learning-guided exploration for in vitro protein production optimization. Nat. Commun. 11, 1872 (2020).

  148. Flecker, A. S. et al. Reducing adverse impacts of Amazon hydropower expansion. Science 375, 753–760 (2022). This study introduced a dynamic programming approach to determine the optimal locations and capacities of hydropower dams in the Amazon basin, achieving a balance between the benefits of energy production and the potential environmental impacts.

  149. Pion-Tonachini, L. et al. Learning from learning machines: a new generation of AI technology to meet the needs of science. Preprint at https://arxiv.org/abs/2111.13786 (2021).

  150. Kusner, M. J., Paige, B. & Hernández-Lobato, J. M. Grammar variational autoencoder. In International Conference on Machine Learning 1945–1954 (2017). This paper describes a grammar variational autoencoder that generates novel symbolic laws and drug molecules.

  151. Brunton, S. L., Proctor, J. L. & Kutz, J. N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl Acad. Sci. USA 113, 3932–3937 (2016).

  152. Liu, Z. & Tegmark, M. Machine learning hidden symmetries. Phys. Rev. Lett. 128, 180201 (2022).

  153. Gabbard, H. et al. Bayesian parameter estimation using conditional variational autoencoders for gravitational-wave astronomy. Nat. Phys. 18, 112–117 (2022).

  154. Chen, D. et al. Automating crystal-structure phase mapping by combining deep learning with constraint reasoning. Nat. Mach. Intell. 3, 812–822 (2021).

  155. Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).

  156. Anishchenko, I. et al. De novo protein design by deep network hallucination. Nature 600, 547–552 (2021).

  157. Fu, T. et al. Differentiable scaffolding tree for molecular optimization. In International Conference on Learning Representations (2021).

  158. Sanchez-Lengeling, B. & Aspuru-Guzik, A. Inverse molecular design using machine learning: generative models for matter engineering. Science 361, 360–365 (2018).

  159. Huang, K. et al. Therapeutics Data Commons: machine learning datasets and tasks for drug discovery and development. In NeurIPS Datasets and Benchmarks (2021). This study describes an initiative with open AI models, datasets and education programmes to facilitate advances in therapeutic science across all stages of drug discovery and development.

  160. Dance, A. Lab hazard. Nature 458, 664–665 (2009).

  161. Segler, M. H. S., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018). This paper describes an approach that combines deep neural networks with Monte Carlo tree search to plan chemical synthesis.

  162. Gao, W., Raghavan, P. & Coley, C. W. Autonomous platforms for data-driven organic synthesis. Nat. Commun. 13, 1075 (2022).

  163. Kusne, A. G. et al. On-the-fly closed-loop materials discovery via Bayesian active learning. Nat. Commun. 11, 5966 (2020).

  164. Gormley, A. J. & Webb, M. A. Machine learning in combinatorial polymer chemistry. Nat. Rev. Mater. 6, 642–644 (2021).

  165. Ament, S. et al. Autonomous materials synthesis via hierarchical active learning of nonequilibrium phase diagrams. Sci. Adv. 7, eabg4930 (2021).

  166. Degrave, J. et al. Magnetic control of tokamak plasmas through deep reinforcement learning. Nature 602, 414–419 (2022). This paper describes an approach for controlling tokamak plasmas, using a reinforcement-learning agent to command-control coils and satisfy physical and operational constraints.

  167. Melnikov, A. A. et al. Active learning machine learns to create new quantum experiments. Proc. Natl Acad. Sci. USA 115, 1221–1226 (2018).

  168. Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 8, 3192–3203 (2017).

  169. Wang, D. et al. Efficient sampling of high-dimensional free energy landscapes using adaptive reinforced dynamics. Nat. Comput. Sci. 2, 20–29 (2022). This paper describes a neural network for reliable uncertainty estimations in molecular dynamics, enabling efficient sampling of high-dimensional free energy landscapes.

  170. Wang, W. & Gómez-Bombarelli, R. Coarse-graining auto-encoders for molecular dynamics. npj Comput. Mater. 5, 125 (2019).

  171. Hermann, J., Schätzle, Z. & Noé, F. Deep-neural-network solution of the electronic Schrödinger equation. Nat. Chem. 12, 891–897 (2020). This paper describes a method to learn the wavefunction of quantum systems using deep neural networks in conjunction with variational quantum Monte Carlo.

  172. Carleo, G. & Troyer, M. Solving the quantum many-body problem with artificial neural networks. Science 355, 602–606 (2017).

  173. Karniadakis, G. E. et al. Physics-informed machine learning. Nat. Rev. Phys. 3, 422–440 (2021).

  174. Li, Z. et al. Physics-informed neural operator for learning partial differential equations. Preprint at https://arxiv.org/abs/2111.03794 (2021).

  175. Kochkov, D. et al. Machine learning–accelerated computational fluid dynamics. Proc. Natl Acad. Sci. USA 118, e2101784118 (2021). This paper describes an approach to accelerating computational fluid dynamics by training a neural network to interpolate from coarse to fine grids and generalize to varying forcing functions and Reynolds numbers.

  176. Ji, W. et al. Stiff-PINN: physics-informed neural network for stiff chemical kinetics. J. Phys. Chem. A 125, 8098–8106 (2021).

  177. Smith, J. D., Azizzadenesheli, K. & Ross, Z. E. EikoNet: solving the Eikonal equation with deep neural networks. IEEE Trans. Geosci. Remote Sens. 59, 10685–10696 (2020).

  178. Waheed, U. B. et al. PINNeik: Eikonal solution using physics-informed neural networks. Comput. Geosci. 155, 104833 (2021).

  179. Chen, R. T. Q. et al. Neural ordinary differential equations. In Advances in Neural Information Processing Systems 31, 6572–6583 (2018). This paper established a connection between neural networks and differential equations by introducing the adjoint method, in place of backpropagation through the solver, to learn continuous-time dynamical systems from data.

  180. Raissi, M., Perdikaris, P. & Karniadakis, G. E. Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019). This paper describes a deep-learning approach for solving forward and inverse problems involving nonlinear partial differential equations; the approach can find solutions to differential equations from data.

  181. Lu, L. et al. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nat. Mach. Intell. 3, 218–229 (2021).

  182. Brandstetter, J., Worrall, D. & Welling, M. Message passing neural PDE solvers. In International Conference on Learning Representations (2022).

  183. Noé, F. et al. Boltzmann generators: sampling equilibrium states of many-body systems with deep learning. Science 365, eaaw1147 (2019). This paper presents an efficient sampling algorithm using normalizing flows to simulate equilibrium states in many-body systems.

  184. Rezende, D. & Mohamed, S. Variational inference with normalizing flows. In International Conference on Machine Learning 37, 1530–1538 (2015).

  185. Dinh, L., Sohl-Dickstein, J. & Bengio, S. Density estimation using real NVP. In International Conference on Learning Representations (2017).

  186. Nicoli, K. A. et al. Estimation of thermodynamic observables in lattice field theories with deep generative models. Phys. Rev. Lett. 126, 032001 (2021).

  187. Kanwar, G. et al. Equivariant flow-based sampling for lattice gauge theory. Phys. Rev. Lett. 125, 121601 (2020).

  188. Gabrié, M., Rotskoff, G. M. & Vanden-Eijnden, E. Adaptive Monte Carlo augmented with normalizing flows. Proc. Natl Acad. Sci. USA 119, e2109420119 (2022).

  189. Jasra, A., Holmes, C. C. & Stephens, D. A. Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling. Stat. Sci. 20, 50–67 (2005).

  190. Bengio, Y. et al. Better mixing via deep representations. In International Conference on Machine Learning 552–560 (2013).

  191. Pompe, E., Holmes, C. & Łatuszyński, K. A framework for adaptive MCMC targeting multimodal distributions. Ann. Stat. 48, 2930–2952 (2020).

  192. Townshend, R. J. L. et al. ATOM3D: tasks on molecules in three dimensions. In NeurIPS Datasets and Benchmarks (2021).

  193. Kearnes, S. M. et al. The open reaction database. J. Am. Chem. Soc. 143, 18820–18826 (2021).

  194. Chanussot, L. et al. Open Catalyst 2020 (OC20) dataset and community challenges. ACS Catal. 11, 6059–6072 (2021).

  195. Brown, N. et al. GuacaMol: benchmarking models for de novo molecular design. J. Chem. Inf. Model. 59, 1096–1108 (2019).

  196. Notin, P. et al. Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval. In International Conference on Machine Learning 16990–17017 (2022).

  197. Mitchell, M. et al. Model cards for model reporting. In Conference on Fairness, Accountability, and Transparency 220–229 (2019).

  198. Gebru, T. et al. Datasheets for datasets. Commun. ACM 64, 86–92 (2021).

  199. Bai, X. et al. Advancing COVID-19 diagnosis with privacy-preserving collaboration in artificial intelligence. Nat. Mach. Intell. 3, 1081–1089 (2021).

  200. Warnat-Herresthal, S. et al. Swarm learning for decentralized and confidential clinical machine learning. Nature 594, 265–270 (2021).

  201. Hie, B., Cho, H. & Berger, B. Realizing private and practical pharmacological collaboration. Science 362, 347–350 (2018).

  202. Rohrbach, S. et al. Digitization and validation of a chemical synthesis literature database in the ChemPU. Science 377, 172–180 (2022).

  203. Gysi, D. M. et al. Network medicine framework for identifying drug-repurposing opportunities for COVID-19. Proc. Natl Acad. Sci. USA 118, e2025581118 (2021).

  204. King, R. D. et al. The automation of science. Science 324, 85–89 (2009).

  205. Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).

  206. Doerr, S. et al. TorchMD: a deep learning framework for molecular simulations. J. Chem. Theory Comput. 17, 2355–2363 (2021).

  207. Schoenholz, S. S. & Cubuk, E. D. JAX MD: a framework for differentiable physics. In Advances in Neural Information Processing Systems 33, 11428–11441 (2020).

  208. Peters, J., Janzing, D. & Schölkopf, B. Elements of Causal Inference: Foundations and Learning Algorithms (MIT Press, 2017).

  209. Bengio, Y. et al. A meta-transfer objective for learning to disentangle causal mechanisms. In International Conference on Learning Representations (2020).

  210. Schölkopf, B. et al. Toward causal representation learning. Proc. IEEE 109, 612–634 (2021).

  211. Goyal, A. & Bengio, Y. Inductive biases for deep learning of higher-level cognition. Proc. R. Soc. A 478, 20210068 (2022).

  212. Deleu, T. et al. Bayesian structure learning with generative flow networks. In Conference on Uncertainty in Artificial Intelligence 518–528 (2022).

  213. Geirhos, R. et al. Shortcut learning in deep neural networks. Nat. Mach. Intell. 2, 665–673 (2020).

  214. Koh, P. W. et al. WILDS: a benchmark of in-the-wild distribution shifts. In International Conference on Machine Learning 5637–5664 (2021).

  215. Luo, Z. et al. Label efficient learning of transferable representations across domains and tasks. In Advances in Neural Information Processing Systems 30, 165–177 (2017).

  216. Mahmood, R. et al. How much more data do I need? Estimating requirements for downstream tasks. In IEEE Conference on Computer Vision and Pattern Recognition 275–284 (2022).

  217. Coley, C. W., Eyke, N. S. & Jensen, K. F. Autonomous discovery in the chemical sciences part II: outlook. Angew. Chem. Int. Ed. 59, 23414–23436 (2020).

  218. Gao, W. & Coley, C. W. The synthesizability of molecules proposed by generative models. J. Chem. Inf. Model. 60, 5714–5723 (2020).

  219. Kogler, R. et al. Jet substructure at the Large Hadron Collider. Rev. Mod. Phys. 91, 045003 (2019).

  220. Acosta, J. N. et al. Multimodal biomedical AI. Nat. Med. 28, 1773–1784 (2022).

  221. Alayrac, J.-B. et al. Flamingo: a visual language model for few-shot learning. In Advances in Neural Information Processing Systems 35, 23716–23736 (2022).

  222. Elmarakeby, H. A. et al. Biologically informed deep neural network for prostate cancer discovery. Nature 598, 348–352 (2021).

  223. Qin, Y. et al. A multi-scale map of cell structure fusing protein images and interactions. Nature 600, 536–542 (2021).

  224. Schaffer, L. V. & Ideker, T. Mapping the multiscale structure of biological systems. Cell Systems 12, 622–635 (2021).

  225. Stiglic, G. et al. Interpretability of machine learning-based prediction models in healthcare. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 10, e1379 (2020).

  226. Erion, G. et al. A cost-aware framework for the development of AI models for healthcare applications. Nat. Biomed. Eng. 6, 1384–1398 (2022).

  227. Lundberg, S. M. et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat. Biomed. Eng. 2, 749–760 (2018).

  228. Sanders, L. M. et al. Beyond low Earth orbit: biological research, artificial intelligence, and self-driving labs. Preprint at https://arxiv.org/abs/2112.12582 (2021).

  229. Gagne, D. J. II et al. Interpretable deep learning for spatial analysis of severe hailstorms. Mon. Weather Rev. 147, 2827–2845 (2019).

  230. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1, 206–215 (2019).

  231. Koh, P. W. & Liang, P. Understanding black-box predictions via influence functions. In International Conference on Machine Learning 1885–1894 (2017).

  232. Mirzasoleiman, B., Bilmes, J. & Leskovec, J. Coresets for data-efficient training of machine learning models. In International Conference on Machine Learning 6950–6960 (2020).

  233. Kim, B. et al. Interpretability beyond feature attribution: quantitative testing with concept activation vectors (TCAV). In International Conference on Machine Learning 2668–2677 (2018).

  234. Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).

  235. Baum, Z. J. et al. Artificial intelligence in chemistry: current trends and future directions. J. Chem. Inf. Model. 61, 3197–3212 (2021).

  236. Finlayson, S. G. et al. Adversarial attacks on medical machine learning. Science 363, 1287–1289 (2019).

  237. Urbina, F. et al. Dual use of artificial-intelligence-powered drug discovery. Nat. Mach. Intell. 4, 189–191 (2022).

  238. Norgeot, B. et al. Minimum information about clinical artificial intelligence modeling: the MI-CLAIM checklist. Nat. Med. 26, 1320–1324 (2020).

Acknowledgements

M.Z. gratefully acknowledges the support of the National Institutes of Health under R01HD108794, U.S. Air Force under FA8702-15-D-0001, awards from Harvard Data Science Initiative, Amazon Faculty Research, Google Research Scholar Program, Bayer Early Excellence in Science, AstraZeneca Research, Roche Alliance with Distinguished Scientists, and Kempner Institute for the Study of Natural and Artificial Intelligence. C.P.G. and Y.D. acknowledge the support from the U.S. Air Force Office of Scientific Research under Multidisciplinary University Research Initiatives Program (MURI) FA9550-18-1-0136, Defense University Research Instrumentation Program (DURIP) FA9550-21-1-0316, and awards from Scientific Autonomous Reasoning Agent (SARA), and AI for Discovery Assistant (AIDA). Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funders. We thank D. Hassabis, A. Davies, S. Mohamed, Z. Li, K. Ma, Z. Qiao, E. Weinstein, A. V. Weller, Y. Zhong and A. M. Brandt for discussions on the paper.

Author information

Contributions

All authors contributed to the design and writing of the paper, helped shape the research, provided critical feedback, and commented on the paper and its revisions. H.W., T.F., Y.D. and M.Z. conceived the study and were responsible for overall direction and planning. W.G., K.H. and Z.L. contributed equally to this work (equal second authorship) and are listed alphabetically.

Corresponding author

Correspondence to Marinka Zitnik.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks Brian Gallagher and Benjamin Nachman for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, H., Fu, T., Du, Y. et al. Scientific discovery in the age of artificial intelligence. Nature 620, 47–60 (2023). https://doi.org/10.1038/s41586-023-06221-2

  • DOI: https://doi.org/10.1038/s41586-023-06221-2
