Main

The foundation for forming scientific insights and theories is laid by how data are collected, transformed and understood. The rise of deep learning in the early 2010s has significantly expanded the scope and ambition of these scientific discovery processes1. Artificial intelligence (AI) is increasingly used across scientific disciplines to integrate massive datasets, refine measurements, guide experimentation, explore the space of theories compatible with the data, and provide actionable and reliable models integrated with scientific workflows for autonomous discovery.

Data collection and analysis are fundamental to scientific understanding and discovery, two of the central aims in science2, and quantitative methods and emerging technologies, from physical instruments such as microscopes to research techniques such as bootstrapping, have long been used to reach these aims3. The introduction of digitization in the 1950s paved the way for the general use of computing in scientific research. The rise of data science since the 2010s has enabled AI to provide valuable guidance by identifying scientifically relevant patterns from large datasets.

Although scientific practices and procedures vary across stages of scientific research, the development of AI algorithms cuts across traditionally isolated disciplines (Fig. 1). Such algorithms can enhance the design and execution of scientific studies. They are becoming indispensable tools for researchers by optimizing parameters and functions4, automating procedures to collect, visualize, and process data5, exploring vast spaces of candidate hypotheses to form theories6, and generating hypotheses and estimating their uncertainty to suggest relevant experiments7.

Fig. 1: Science in the age of artificial intelligence.

Scientific discovery is a multifaceted process that involves several interconnected stages, including hypothesis formation, experimental design, data collection and analysis. AI is poised to reshape scientific discovery by augmenting and accelerating research at each stage of this process. The principles and illustrative studies shown here highlight how AI contributes to scientific understanding and discovery.

The power of AI methods has vastly increased since the early 2010s because of the availability of large datasets, aided by fast and massively parallel computing and storage hardware (graphics processing units and supercomputers) and coupled with new algorithms. The latter includes deep representation learning (Box 1), particularly multilayered neural networks capable of identifying essential, compact features that can simultaneously solve many tasks that underlie a scientific problem. Of these, geometric deep learning (Box 1) has proved to be helpful in integrating scientific knowledge, presented as compact mathematical statements of physical relationships, prior distributions, constraints and other complex descriptors, such as the geometry of atoms in molecules. Self-supervised learning (Box 1) has enabled neural networks trained on labelled or unlabelled data to transfer learned representations to a different domain with few labelled examples, for example, by pre-training large foundation models8 and adapting them to solve diverse tasks across different domains. In addition, generative models (Box 1) can estimate the underlying data distribution of a complex system and support new designs. Distinct from other uses of AI, reinforcement-learning methods (Box 1) find optimal strategies for an environment by exploring many possible scenarios and assigning rewards to different actions based on metrics such as the information gain expected from a considered experiment.

In AI-driven scientific discovery, scientific knowledge can be incorporated into AI models using appropriate inductive biases (Box 1), which are assumptions representing structure, symmetry, constraints and prior knowledge as compact mathematical statements. However, applying the underlying scientific laws directly can lead to equations that are too complex for humans to solve, even with traditional numerical methods9. An emerging approach is to incorporate scientific knowledge into AI models by including information about fundamental equations, such as the laws of physics or principles of molecular structure and binding in protein folding. Such inductive biases can enhance AI models by reducing the number of training examples needed to achieve the same level of accuracy10 and scaling analyses to a vast space of unexplored scientific hypotheses11.

Using AI for scientific innovation and discovery presents unique challenges compared with other areas of human endeavour where AI is utilized. One of the biggest challenges is the vastness of hypothesis spaces in scientific problems, making systematic exploration infeasible. For instance, in biochemistry, an estimated 10^60 drug-like molecules exist to explore12. AI systems have the potential to revolutionize scientific workflows by accelerating processes and providing predictions with near-experimental accuracy. However, there are challenges to obtaining reliably annotated datasets for AI models, which can involve time-consuming and resource-intensive experimentation and simulations13. Despite these challenges, AI systems can enable efficient, intelligent and highly autonomous experimental design and data collection, where AI systems can operate under human supervision to assess, evaluate and act on results. Such capabilities have facilitated the development of artificially intelligent agents that continuously interact in dynamic environments and can, for example, make real-time decisions to navigate stratospheric balloons14. AI systems can play a valuable role in interpreting scientific datasets and extracting relationships and knowledge from scientific literature in a generalized manner. Recent findings demonstrate the potential for unsupervised language AI models to capture complex scientific concepts15, such as the periodic table, and predict applications of functional materials years before their discovery, suggesting that latent knowledge regarding future discoveries may be embedded in past publications.

Recent advances, including the successful unraveling of the 50-year-old protein-folding problem10 and AI-driven simulations of molecular systems with millions of particles16, demonstrate the potential of AI to address challenging scientific problems. However, the remarkable promise of discovery is accompanied by significant challenges for the emerging field of ‘AI for Science’ (AI4Science). As with any new technology, the success of AI4Science depends on our ability to integrate it into routine practices and understand its potential and limitations. Barriers to the widespread adoption of AI in scientific discovery include internal and external factors specific to each stage of the discovery process and concerns regarding the utility of methods, theory, software and hardware, as well as potential misuse. We explore the developments and address critical questions in AI4Science, including the conduct of science, traditional scepticism and implementation challenges.

AI-aided data collection and curation for scientific research

The ever-increasing scale and complexity of datasets collected by experimental platforms have led to a growing dependence on real-time processing and high-performance computing in scientific research to selectively store and analyse data generated at high rates17.

Data selection

A typical particle collision experiment generates over 100 terabytes of data every second18. Such scientific experiments are pushing the limits of existing data transmission and storage technologies. In these physics experiments, over 99.99% of raw instrument data represents background events that must be detected in real time and discarded to manage the data rates18. To identify rare events for future scientific enquiry, deep-learning methods18 replace pre-programmed hardware event triggers with algorithms that search for outlying signals to detect unforeseen or rare phenomena that may otherwise be missed during compression. The background processes can be modelled generatively using a deep autoencoder19 (Box 1). The autoencoder20 returns a higher loss value (anomaly score) for previously unseen signals (rare events) that are out of the background distribution. Unlike supervised anomaly detection, unsupervised anomaly detection does not require annotations and has been widely used in physics21,22, neuroscience23, Earth science24, oceanography25 and astronomy26.
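The anomaly-scoring idea above can be sketched in a few lines. This is a minimal illustration under loud assumptions: the "background" events are synthetic, and a linear autoencoder fitted in closed form (equivalent to principal component analysis) stands in for the deep autoencoder described in the text. The names `anomaly_score` and `threshold` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical background events: 8 detector features driven by 2 latent factors.
latent = rng.normal(size=(2000, 2))
mixing = rng.normal(size=(2, 8))
background = latent @ mixing + 0.05 * rng.normal(size=(2000, 8))

# Linear autoencoder fitted in closed form: the top right singular vectors
# act as both encoder and decoder (a stand-in for a deep autoencoder).
mean = background.mean(axis=0)
_, _, vt = np.linalg.svd(background - mean, full_matrices=False)
components = vt[:2]  # shape (2, 8)

def anomaly_score(events):
    """Per-event reconstruction error (mean squared error)."""
    centred = events - mean
    codes = centred @ components.T        # encode to 2 dimensions
    reconstruction = codes @ components   # decode back to 8 dimensions
    return np.mean((centred - reconstruction) ** 2, axis=1)

# Keep only events scoring above the 99.9th percentile of background scores.
threshold = np.quantile(anomaly_score(background), 0.999)

# A synthetic signal that does not follow the background's latent structure.
rare_event = 3.0 * rng.normal(size=(1, 8))
is_rare = anomaly_score(rare_event) > threshold
```

No labels are used at any point: the model only learns what background looks like, and anything it reconstructs poorly is flagged for storage.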

Data annotation

Training supervised models requires datasets with annotated labels that provide supervised information to guide model training and estimate a function or a conditional distribution over target variables from inputs. Pseudo-labelling27 and label propagation28 are enticing alternatives to laborious data labelling, allowing automatic annotation of massive unlabelled datasets based on only a small set of accurate annotations. In biology, techniques that assign functional and structural labels to newly characterized molecules are vital for downstream training of supervised models owing to the difficulty of experimentally generating labels. For instance, less than 1% of sequenced proteins are annotated with biological functions despite the proliferation of next-generation sequencing29. Pseudo-labelling leverages surrogate models trained on manually labelled data to annotate unlabelled samples and uses these predicted pseudo-labels to supervise downstream predictive models. In contrast, label propagation diffuses labels to unlabelled samples via similarity graphs constructed based on feature embeddings13,30 (Box 1). In addition to automatic labelling, active learning31,32,33 (Box 1) can identify the most informative data points to be labelled by humans or the most informative experiments to be performed. This approach allows models to be trained with fewer expert-provided labels. Another strategy in data annotation is to develop labelling rules that leverage domain knowledge34,35.
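As a rough illustration of label propagation, the sketch below diffuses two seed labels over a Gaussian similarity graph built from feature embeddings. It uses one common iterative variant (a normalized-graph "label spreading" update). The function name `propagate_labels`, the parameters `sigma` and `alpha`, and the synthetic two-cluster data are all assumptions made for the example, not details taken from the cited studies.

```python
import numpy as np

def propagate_labels(features, labels, n_classes, sigma=1.0, alpha=0.9, n_iter=50):
    """Diffuse known labels over a similarity graph; -1 marks unlabelled samples."""
    # Gaussian affinities between all pairs of embeddings.
    sq_dists = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    affinity = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)

    # Symmetric normalization D^{-1/2} W D^{-1/2}.
    degree = affinity.sum(axis=1)
    norm_affinity = affinity / np.sqrt(np.outer(degree, degree))

    # One-hot seed matrix; rows of unlabelled samples stay zero.
    seeds = np.zeros((len(labels), n_classes))
    known = labels >= 0
    seeds[known, labels[known]] = 1.0

    scores = seeds.copy()
    for _ in range(n_iter):
        # Mix scores received from neighbours with the original seeds.
        scores = alpha * norm_affinity @ scores + (1 - alpha) * seeds
    return scores.argmax(axis=1)

# Two synthetic clusters with a single annotated sample each.
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=(-3.0, 0.0), scale=0.5, size=(20, 2))
cluster_b = rng.normal(loc=(3.0, 0.0), scale=0.5, size=(20, 2))
features = np.vstack([cluster_a, cluster_b])
labels = np.full(40, -1)
labels[0], labels[20] = 0, 1

predicted = propagate_labels(features, labels, n_classes=2)
```

With only two annotations, every sample inherits the label of its own cluster because affinities between clusters are negligible compared with those within a cluster.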

Data generation

Deep-learning performance improves with increased quality, diversity and scale36 of training datasets37,38. A fruitful approach to creating better models is to augment training datasets by generating additional synthetic data points through automatic data augmentation and deep generative models. In addition to manually designing such data augmentations (Box 1), reinforcement-learning methods39 can discover a policy for automatic data augmentation40,41 that is flexible and agnostic of downstream models. Deep generative models, including variational autoencoders, generative adversarial networks, normalizing flows and diffusion models, learn the underlying data distribution and can sample training points from the optimized distribution. Generative adversarial networks (Box 1) have proven to be beneficial for scientific images because they synthesize realistic images in many domains ranging from particle collision events42, pathology slides43, chest X-rays44, magnetic resonance contrasts45, three-dimensional (3D) material microstructure46, protein functions47,48 to genetic sequences49. An emerging technique in generative modelling is probabilistic programming50, in which data generation models are expressed as computer programs.

Data refinements

Precision instruments such as ultrahigh-resolution lasers and non-invasive microscopy systems enable direct measurement of physical quantities or indirect measurement, in which quantities of real-world objects are calculated, producing highly accurate results. AI techniques have significantly increased measurement resolution, reduced noise and eliminated errors in measuring roundness, resulting in high precision consistent across sites. Examples of AI applications in scientific experiments include visualizing regions of spacetime such as black holes5, capturing physics particle collisions51, improving the resolution of live-cell images52 and better detection of cell types across biological contexts53. Deep convolutional methods, which utilize algorithmic advances such as spectral deconvolution54,55, flexible sparsity52 and generative capability56, can transform poor spatiotemporally resolved measurements into high-quality, super-resolved and structured images. An important AI task in various scientific disciplines is denoising, which involves differentiating relevant signals from noise and learning to remove noise. Denoising autoencoders57 can project high-dimensional input data into more compact representations of essential features. These autoencoders minimize the difference between uncorrupted input data points and their reconstruction from the compressed representation of their noise-corrupted version. Other forms of distribution-learning autoencoders, such as variational autoencoders (VAEs; Box 1)58, are also frequently used. VAEs learn a stochastic representation via latent autoencoding that retains essential data features while ignoring non-essential sources of variation, which probably represent random noise. For example, in single-cell genomics, autoencoders optimizing count-based vectors of gene activation across millions of cells59 are routinely used to improve protein-RNA expression analyses.

Learning meaningful representations of scientific data

Deep learning can extract meaningful representations of scientific data at various levels of abstraction and optimize them to guide research, often through end-to-end learning (Box 1). A high-quality representation should retain as much information about the data as possible while remaining simple and accessible60. Scientifically meaningful representations are compact21, discriminative61, disentangle underlying factors of variation62 and encode underlying mechanisms that generalize across numerous tasks63,64. Here we introduce three emerging strategies that fulfil these requirements: geometric priors, self-supervised learning and language modelling.

Geometric priors

Integrating geometric priors65 in learned representations has proved effective as geometry and structure play a central role in scientific domains66,67,68. Symmetry is a widely studied concept in geometry69. It can be described in terms of invariance and equivariance (Box 1) to represent the behaviour of a mathematical function, such as a neural feature encoder, under a group of transformations, such as the SE(3) group in rigid body dynamics. Important structural properties, such as the secondary structural content of molecular systems, solvent accessibility, residue compactness and hydrogen-bonding patterns, are invariant to spatial orientations. In the analysis of scientific images, objects do not change when translated in the image, meaning that image segmentation masks are translationally equivariant as they change equivalently when input pixels are translated. Incorporating symmetry into models can benefit AI applications with limited labelled data, such as 3D RNA and protein structures70,71, by augmenting training samples, and can improve extrapolative prediction to inputs markedly different from those encountered during model training.
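The distinction between invariance and equivariance can be checked numerically. The sketch below applies one rigid-body transformation (a rotation about the z axis plus a translation) to hypothetical atom coordinates. The two feature functions are illustrative choices: pairwise distances as an invariant feature and centroid-relative offsets as a rotation-equivariant one.

```python
import numpy as np

def rotation_z(theta):
    """Rotation matrix about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def pairwise_distances(coords):
    """Invariant feature: unchanged by rotations and translations."""
    diffs = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diffs, axis=-1)

def centroid_offsets(coords):
    """Equivariant feature: rotates with the input and ignores translations."""
    return coords - coords.mean(axis=0)

rng = np.random.default_rng(0)
atoms = rng.normal(size=(5, 3))           # hypothetical atom coordinates
R = rotation_z(0.7)
shift = np.array([1.0, -2.0, 0.5])
moved = atoms @ R.T + shift               # one SE(3) transformation

# Invariance: f(g.x) == f(x)
invariant_ok = np.allclose(pairwise_distances(moved), pairwise_distances(atoms))
# Equivariance: f(g.x) == g.f(x), here with g acting by rotation
equivariant_ok = np.allclose(centroid_offsets(moved), centroid_offsets(atoms) @ R.T)
```

A model built only from features of the first kind needs no rotated copies of the training data, which is one way symmetry reduces the number of labelled examples required.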

Geometric deep learning

Graph neural networks have emerged as a leading approach for deep learning on datasets with underlying geometric and relational structure72,73,74,75,76 (Fig. 2a). In a broader sense, geometric deep learning involves discovering relational patterns65 and equipping neural network models with inductive biases that explicitly make use of localized information encoded in the form of graphs and transformation groups77,78,79 through neural message-passing algorithms80,81,82,83,84. Depending on the scientific problem, various graph representations were developed to capture complex systems85,86,87. Directional edges can facilitate the physical modelling of glassy systems88, hypergraphs with edges connecting multiple nodes are used in chromatin structure understanding89, models trained on multimodal graphs are used to create predictive models in genomics90, and sparse, irregular and highly relational graphs have been applied to a number of Large Hadron Collider physics tasks, including the reconstruction of particles from detector readouts and the discrimination of physics signals against background processes91.
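A single round of neural message passing can be written compactly. The following is a minimal sketch, assuming sum aggregation over neighbours and random weights in place of trained ones. The function `message_passing_layer` and the four-node path graph are hypothetical and do not reproduce any specific architecture from the cited work.

```python
import numpy as np

def message_passing_layer(node_features, adjacency, w_self, w_neigh):
    """One message-passing round: sum neighbour features, transform, apply ReLU."""
    messages = adjacency @ node_features                    # aggregate along edges
    updated = node_features @ w_self + messages @ w_neigh   # combine self and neighbours
    return np.maximum(updated, 0.0)

rng = np.random.default_rng(0)
# Path graph 0-1-2-3 as a symmetric adjacency matrix.
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)
node_features = rng.normal(size=(4, 3))
w_self = rng.normal(size=(3, 3))     # stand-ins for trained weights
w_neigh = rng.normal(size=(3, 3))

embeddings = message_passing_layer(node_features, adjacency, w_self, w_neigh)

# Relabelling the nodes permutes the embeddings in the same way.
perm = np.array([2, 0, 3, 1])
permuted = message_passing_layer(node_features[perm], adjacency[perm][:, perm],
                                 w_self, w_neigh)
is_equivariant = np.allclose(permuted, embeddings[perm])
```

The final check shows the built-in inductive bias: the layer depends only on graph connectivity, not on the arbitrary order in which nodes are stored.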

Fig. 2: Learning meaningful representations of scientific data.

a, Geometric deep learning integrates information about scientific data’s geometry, structure and symmetry, such as molecules and materials, by leveraging graphs and employing neural message-passing strategies. This approach generates latent representations (embeddings) by exchanging neural messages along edges within graphs while considering other geometric priors, such as invariance and equivariance constraints. As a result, geometric deep learning can incorporate complex structural information into deep-learning models, allowing for a better understanding and manipulation of the underlying geometric datasets. b, To effectively represent diverse samples such as satellite images, it is crucial to capture both their similarities and differences. Self-supervised learning strategies, such as contrastive learning, achieve this by generating augmented counterparts and aligning positive while separating negative pairs. This iterative process enhances the embeddings, leading to informative latent representations and better performance on downstream prediction tasks. c, Masked-language modelling effectively captures the semantics of sequential data, such as natural language and biological sequences. This approach involves feeding masked elements of the input into a transformer block, which includes pre-processing steps, such as positional encodings. The self-attention mechanism, represented by grey lines with colour intensity reflecting the magnitude of attention weights, combines representations of non-masked input to accurately predict the masked input. This approach produces high-quality representations of sequences by repeating this autocompletion process across many elements of the input.

Self-supervised learning

Supervised learning may be insufficient when only a few labelled samples are available for model training or when labelling data for a specific task is prohibitively expensive. In such cases, leveraging both labelled and unlabelled data can improve model performance and learning capacity. Self-supervised learning is a technique that enables models to learn the general features of a dataset without relying on explicit labels. Effective self-supervised strategies include predicting occluded regions of images, forecasting past or future frames in a video, and using contrastive learning to teach the model to distinguish between similar and dissimilar data points92 (Fig. 2b). Self-supervised learning can be a crucial pre-processing step to learn transferable features in large unlabelled datasets92,93,94,95 before fine-tuning models on small labelled datasets to perform downstream tasks. Such pretrained models96,97,98 with a broad understanding of a scientific domain are general-purpose predictors that can be adapted for various tasks, thereby improving label efficiency and surpassing purely supervised methods8.
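Contrastive learning is usually trained with a loss that pulls each sample towards its augmented counterpart and pushes it away from other samples. The sketch below implements one widely used form of such a loss (often called InfoNCE) on synthetic embeddings. The function name, the `temperature` value and the data are assumptions for illustration only.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Contrastive loss: row i of `positives` is the positive pair of row i of
    `anchors`; every other row serves as a negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                          # cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
views_a = rng.normal(size=(8, 16))                    # embeddings of 8 samples
views_b = views_a + 0.05 * rng.normal(size=(8, 16))   # embeddings of augmented copies

aligned_loss = info_nce_loss(views_a, views_b)
mismatched_loss = info_nce_loss(views_a, np.roll(views_b, 1, axis=0))
```

The loss is low when each embedding sits closest to its own augmented view and high when the pairing is scrambled, which is the training signal that shapes the representation without any labels.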

Language modelling

Masked-language modelling is a popular method for self-supervised learning of both natural language and biological sequences (Fig. 2c). The arrangement of atoms or amino acids (tokens) into structures to produce molecular and biological function is similar to how letters form words and sentences to define the meaning of a document. As both natural language and biological sequence processing continue to evolve, they inform the development of each other. In autoregressive training, the goal is to predict the next token in a sequence, whereas in mask-based training99, the self-supervised task is to recover a masked token in a sequence using a bidirectional sequence context. Protein language models can encode amino acid sequences to capture structural and functional properties100,101 and evaluate the evolutionary fitness of viral variants102. Such representations are transferable across various tasks, ranging from sequence design103,104,105 to structure prediction10,106. In handling biochemical sequences107,108,109, chemical language models facilitate efficient exploration of vast chemical space110,111. They have been used to predict properties112, plan multi-step syntheses113,114 and explore the space of chemical reactions115,116,117.
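The two self-supervised tasks mentioned here, next-token prediction and masked-token recovery, differ mainly in how training examples are built from a sequence. The sketch below constructs both kinds of example from a made-up amino acid string. The helper names, the `<mask>` symbol and the 15% masking fraction are illustrative assumptions.

```python
import numpy as np

MASK = "<mask>"

def mask_tokens(tokens, mask_fraction=0.15, seed=0):
    """Hide a random subset of tokens; return the masked sequence and the targets."""
    rng = np.random.default_rng(seed)
    n_mask = max(1, int(round(mask_fraction * len(tokens))))
    positions = np.sort(rng.choice(len(tokens), size=n_mask, replace=False))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[int(pos)] = masked[pos]   # what the model must recover
        masked[pos] = MASK                # context on both sides stays visible
    return masked, targets

def next_token_pairs(tokens):
    """Next-token examples: (left context only, token to predict)."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

sequence = list("MKTAYIAKQRQISFVKSHFSRQ")   # hypothetical protein fragment
masked_sequence, targets = mask_tokens(sequence)
causal_examples = next_token_pairs(sequence)
```

In the masked task the model sees residues on both sides of each gap, whereas each next-token example exposes only the prefix.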

Transformer architectures

Transformers (Box 1)118 are neural network architectures that process sequences of tokens by flexibly modelling interactions between arbitrary token pairs, surpassing earlier efforts using recurrent neural networks for sequential modelling. Transformers dominate natural language processing37,99 and have been successfully applied to a range of problems, including earthquake signal detection119, DNA and protein sequence modelling10,120, modelling the effect of sequence variation on biological function100,121, and symbolic regression122. Although transformers unify graph neural networks and language models123,124,125, the run-time and memory footprint of transformers can scale quadratically with the length of sequences, leading to efficiency challenges addressed by long-range modelling120 and linearized attention mechanisms126. As a result, unsupervised or self-supervised generative pretrained transformers, followed by parameter-efficient fine-tuning, are widely used.
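The quadratic cost comes from the attention-weight matrix, which holds one entry per pair of tokens. A minimal single-head sketch, with random matrices standing in for trained projection weights, makes this visible. The function name and dimensions are assumptions for illustration.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])            # shape (n_tokens, n_tokens)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ v, weights

rng = np.random.default_rng(0)
d_model = 8
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

short_out, short_weights = self_attention(rng.normal(size=(16, d_model)), w_q, w_k, w_v)
long_out, long_weights = self_attention(rng.normal(size=(32, d_model)), w_q, w_k, w_v)
```

Doubling the sequence length from 16 to 32 tokens quadruples the number of attention weights, from 256 to 1,024, which is the scaling that linearized attention mechanisms aim to avoid.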

Neural operators

Standard neural network models can be inadequate for scientific applications as they assume a fixed data discretization. This approach is unsuitable for many scientific datasets collected at varying resolutions and grids. Moreover, data are often sampled from an underlying physical phenomenon in a continuous domain, such as seismic activity or fluid flow. Neural operators learn representations invariant to discretization by learning mappings between function spaces127,128. Neural operators are guaranteed to be discretization invariant, meaning that they can work on any discretization of inputs and converge to a limit upon mesh refinement. Once neural operators are trained, they can be evaluated at any resolution without the need for re-training. In contrast, the performance of standard neural networks can degrade when the data resolution at deployment differs from that used during model training.
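Discretization invariance can be illustrated with a single spectral layer of the kind used in Fourier-type neural operators, where parameters act on Fourier modes rather than on grid points. This is a sketch under stated assumptions: the weights are random stand-ins for trained parameters, the input is a smooth periodic function with only two frequencies, and a full neural operator would add pointwise transformations and nonlinearities that are omitted here.

```python
import numpy as np

def spectral_layer(u, weights):
    """Scale the lowest Fourier modes of a periodic 1D signal by fixed complex
    weights and discard all higher modes."""
    n = u.shape[0]
    modes = weights.shape[0]
    coeffs = np.fft.rfft(u, norm="forward")    # coefficients independent of grid size
    out = np.zeros(n // 2 + 1, dtype=complex)
    out[:modes] = weights * coeffs[:modes]
    return np.fft.irfft(out, n=n, norm="forward")

def sample_input(n):
    """The same underlying function sampled on a grid of n points in [0, 1)."""
    x = np.arange(n) / n
    return np.sin(2 * np.pi * x) + 0.5 * np.cos(4 * np.pi * x)

rng = np.random.default_rng(0)
weights = rng.normal(size=4) + 1j * rng.normal(size=4)   # stand-in for trained weights

coarse = spectral_layer(sample_input(64), weights)
fine = spectral_layer(sample_input(256), weights)

# Compare both outputs at the grid points they share.
max_gap = np.max(np.abs(fine[::4] - coarse))
```

The same four weights produce matching outputs on the 64-point and 256-point grids. The agreement is exact up to rounding only because this input is band-limited; for general inputs the outputs converge as the mesh is refined.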

AI-based generation of scientific hypotheses

Testable hypotheses are central to scientific discovery. They can take many forms, from symbolic expressions in mathematics to molecules in chemistry and genetic variants in biology. Formulating meaningful hypotheses can be a laborious process, as exemplified by Johannes Kepler, who spent four years analysing stellar and planetary data before arriving at a hypothesis that led to the discovery of the laws of planetary motion129. AI methods can be helpful at several stages of this process. They can generate hypotheses by identifying candidate symbolic expressions from noisy observations. They can help design objects, such as a molecule that binds to a therapeutic target130 or a counterexample that contradicts a mathematical conjecture9, suggesting experimental evaluation in the laboratory. Moreover, AI systems can learn a Bayesian posterior distribution (Box 1) of hypotheses and use it to generate hypotheses compatible with scientific data and knowledge131.

Black-box predictors of scientific hypotheses

Identifying promising hypotheses for scientific enquiry requires efficiently examining many candidates and selecting those that can maximize the yield of downstream simulations and experimentation. In drug discovery, high-throughput screening can assess thousands to millions of molecules, and algorithms can prioritize which molecules to investigate experimentally132. Models can be trained to anticipate the utility of an experiment, such as relevant molecular properties133,134 or symbolic formulas that fit the observations122. However, experimental ground-truth data for these predictors may be unavailable for many molecules. Thus, weak supervision-learning approaches (Box 1) can be used to train these models, where noisy, limited or imprecise supervision is used as a training signal. These serve as a cost-effective proxy for annotations from human experts, expensive in silico calculations or higher-fidelity experiments (Fig. 3a).

Fig. 3: AI-guided generation of scientific hypotheses.

a, High-throughput screening involves using AI predictors trained on experimentally generated datasets to select a small number of screened objects with desirable properties, thus reducing the size of the total candidate pool by orders of magnitude. This approach can leverage self-supervised learning to pre-train predictors on vast amounts of unscreened objects, followed by fine-tuning predictors on datasets of screened objects with labelled readouts. Laboratory evaluations and uncertainty quantification can refine this approach to streamline the screening process, making it more cost effective and time efficient, ultimately accelerating the identification of candidate chemical compounds, materials and biomolecules. b, The AI navigator employs rewards predicted by reinforcement-learning agents and design criteria, such as Occam’s razor, to focus on the most promising elements of a candidate hypothesis during symbolic regression. Shown is an example illustrating the inference of the mathematical expression representing Newton’s gravitational law. The low-scoring search routes are shown as grey branches in the symbolic expression tree. Guided by actions associated with the highest predicted rewards, this iterative process converges towards mathematical expressions consistent with the data and satisfying other design criteria. c, AI differentiators are autoencoder models that map discrete objects, such as chemical compounds, to points in a differentiable, continuous latent space. This space allows for the optimization of the objects, such as selecting compounds from a vast chemical library that maximize a specific biochemical endpoint. The idealized landscape plot depicts the learned latent space, with deeper colours indicating regions enriched for objects with higher predicted scores. By leveraging this latent space, the AI differentiator can efficiently identify objects that maximize the desired property indicated by the red star.

AI methods trained on high-fidelity simulations have been used to efficiently screen large libraries of molecules, such as 1.6 million organic-light-emitting-diode material candidates133 and 11 billion synthon-based ligand candidates134. In genomics, transformer architectures trained to predict gene expression values from DNA sequences can help prioritize genetic variants120. In particle physics, identifying intrinsic charm quarks in protons involves screening all possible structures and fitting experimental data on each candidate structure135. To further increase the efficiency of these processes, AI-selected candidates can be sent to medium or low-throughput experiments for continual refinement of candidates using experimental feedback. The results can be fed back into the AI models using active learning136 and Bayesian optimization137 (Box 1), allowing the algorithms to refine their predictions and focus on the most promising candidates.
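The feedback loop between prediction and experiment can be sketched as follows. Everything here is hypothetical: the candidate library is synthetic, `run_experiment` is a stand-in for a laboratory assay, and an upper-confidence-bound rule applied to a bootstrap ensemble of ridge regressors is used as a simple substitute for Bayesian optimization with a full probabilistic surrogate model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical candidate library: 500 objects described by 5 features.
library = rng.normal(size=(500, 5))
true_weights = np.array([1.0, -2.0, 0.5, 0.0, 1.5])   # hidden from the model

def run_experiment(indices):
    """Stand-in for an assay: a hidden linear response plus measurement noise."""
    noise = 0.1 * rng.normal(size=len(indices))
    return library[indices] @ true_weights + noise

def ensemble_predict(x_train, y_train, x_query, n_models=20, ridge=1e-2):
    """Bootstrap ensemble of ridge regressors; the spread estimates uncertainty."""
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x_train), size=len(x_train))
        xb, yb = x_train[idx], y_train[idx]
        w = np.linalg.solve(xb.T @ xb + ridge * np.eye(xb.shape[1]), xb.T @ yb)
        preds.append(x_query @ w)
    preds = np.stack(preds)
    return preds.mean(axis=0), preds.std(axis=0)

tested = rng.choice(len(library), size=10, replace=False).tolist()
readouts = run_experiment(np.array(tested)).tolist()
initial_best = max(readouts)

for _ in range(5):                                    # five rounds of feedback
    untested = np.setdiff1d(np.arange(len(library)), tested)
    mean, std = ensemble_predict(library[tested], np.array(readouts), library[untested])
    ucb = mean + 1.0 * std                            # favour high and uncertain predictions
    batch = untested[np.argsort(ucb)[-5:]]            # five most promising candidates
    tested.extend(batch.tolist())
    readouts.extend(run_experiment(batch).tolist())

final_best = max(readouts)
```

Only 35 of the 500 candidates are ever measured. The acquisition rule decides which ones, trading off candidates predicted to score highly against those the ensemble is least certain about.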

AI methods have become invaluable when hypotheses involve complex objects such as molecules. For instance, in protein folding, AlphaFold210 can predict the 3D atom coordinates of proteins from amino acid sequences with atomic accuracy, even for proteins whose structure is unlike any of the proteins in the training dataset. This breakthrough has led to the development of various AI-driven protein-folding methods, such as RoseTTAFold106. In addition to forward problems, AI approaches are increasingly used for inverse problems that aim to understand the causal factors that produced a set of observations. In inverse problems such as inverse folding or fixed backbone design, the amino acid sequence can be predicted from the protein’s backbone 3D atom coordinates using a black-box predictor trained on millions of protein structures105. However, such black-box AI predictors require large training datasets and offer limited interpretability despite reducing the dependence on the availability of prior scientific knowledge.

Navigating combinatorial hypothesis spaces

Although sampling all the hypotheses compatible with the data is daunting, a manageable goal is to search for a single good one, which can be formulated as an optimization problem. Instead of traditional methods that rely on manually engineered rules138, AI policies can be used to estimate the reward of each search and prioritize search directions with higher values. An agent trained by a reinforcement-learning algorithm is typically employed to learn the policy. The agent learns to take actions in the search space that maximize a reward signal, which can be defined to reflect the quality of the generated hypotheses or other relevant criteria.

To solve the optimization problem in symbolic regression, evolutionary algorithms can be used, which generate random symbolic laws as the initial set of solutions. Within each generation, slight variations are imposed on candidate solutions. The algorithm checks whether any modification produced a symbolic law that fits the observations better than prior solutions, keeping the best ones for the next generation139. However, reinforcement-learning approaches are increasingly replacing this standard strategy. Reinforcement learning uses neural networks to generate a mathematical expression sequentially by adding mathematical symbols from a predefined vocabulary and using the learned policy to decide which symbol to add next140. The mathematical formula is represented as a parse tree. The learned policy takes the parse tree as input to determine which leaf node to expand and which symbol (from the vocabulary) to add (Fig. 3b). Another approach for using neural networks to solve mathematical problems is transforming a mathematical formula into a binary sequence of symbols. A neural network policy can then probabilistically and sequentially grow the sequence one binary character at a time6. By designing a reward that measures the ability to refute the conjecture, this approach can find a refutation to a mathematical conjecture without prior knowledge about the mathematical problem.
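The generate, vary and select cycle of an evolutionary search can be shown on a toy problem. The sketch below is not a full symbolic regression system: it searches a deliberately restricted hypothesis space of power laws F = C · m1^a · m2^b · r^c with integer exponents, starts from a single constant law rather than a random population, and uses synthetic, noise-free data generated from Newton's gravitational law. All names are hypothetical.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Synthetic observations on a full factorial grid of masses and distances.
G = 6.674e-11
grid = [1.0, 2.0, 4.0]
m1, m2, r = (np.array(col) for col in zip(*itertools.product(grid, grid, grid)))
force = G * m1 * m2 / r ** 2

log_inputs = np.log(np.stack([m1, m2, r], axis=1))   # shape (27, 3)
log_force = np.log(force)

def misfit(exponents):
    """Variance of the log-residual; the constant prefactor C is fitted implicitly."""
    residual = log_force - log_inputs @ np.array(exponents)
    return float(np.var(residual))

def mutate(exponents):
    """Slight variation: change one exponent by +/-1, staying within [-3, 3]."""
    child = list(exponents)
    i = rng.integers(len(child))
    child[i] = int(np.clip(child[i] + rng.choice([-1, 1]), -3, 3))
    return tuple(child)

best = (0, 0, 0)                                     # initial candidate: F = constant
for _ in range(60):                                  # generations
    children = [mutate(best) for _ in range(20)]
    best = min(children + [best], key=misfit)        # keep the best-fitting law
```

The search settles on exponents (1, 1, -2), the inverse-square law, after evaluating a small fraction of the 343 candidate laws in this space. Real symbolic regression faces a far larger space of expression trees, which is why learned policies are used to decide where to search.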

Combinatorial optimization also applies to tasks such as discovering molecules with desirable pharmaceutical properties, where each step in molecular design is a discrete decision-making process. In this process, a partially generated molecular graph is given as input to the learned policy, making discrete choices on where to add a new atom and which atom to add at the selected position in the molecule. By iteratively performing this process, the policy can generate a series of possible molecular structures evaluated based on their fitness to the target properties. The search space is too vast to explore all possible combinations, but reinforcement learning can efficiently guide the search by prioritizing the most promising branches worth investigating141,142,143,144,145. Reinforcement-learning methods can be trained with a training objective that encourages the resulting policy to sample from all reasonable solutions (with a high reward) rather than to focus on a single good solution, as is the case with standard reward maximization in reinforcement learning144,145,146. These reinforcement-learning approaches have been successfully applied to various optimization problems, including maximizing protein expression147, planning hydropower to reduce adverse impact in the Amazon Basin148 and exploring the parameter space of particle accelerators33.

Policies learned by AI agents can take foresighted actions that seem unconventional initially but prove to be effective149. For instance, in mathematics, supervised models can identify patterns and relations between mathematical objects and help guide intuition and propose conjectures9. These analyses have pointed to previously unknown patterns or even new models of the world. However, reinforcement-learning methods may not generalize well to data unseen during model training, as the agent may get stuck in a local optimum once it finds a sequence of actions that work well. To improve generalization, some exploration strategy is required to collect broader search trajectories that could help the agent perform better in new and modified settings.

Optimizing differentiable hypothesis spaces

Scientific hypotheses often take the form of discrete objects, such as symbolic formulas in physics or chemical compounds in pharmaceutical and materials science. Although combinatorial optimization techniques have been successful for some of these problems, a differentiable space can also be used for optimization as it is amenable to gradient-based methods, which can efficiently find local optima. To enable the use of gradient-based optimization, two approaches are frequently used. The first is to use models such as VAEs to map discrete candidate hypotheses to points in a latent differentiable space. The second approach is to relax discrete hypotheses into differentiable objects that can be optimized in the differentiable space. This relaxation can take different forms, such as replacing discrete variables with continuous ones or using a soft version of the original constraints.

Applications of symbolic regression in physics use grammar VAEs150. These models represent discrete symbolic expressions as parse trees using context-free grammar and map the trees into a differentiable latent space. Bayesian optimization is then employed to optimize the latent space for symbolic laws while ensuring that the expressions are syntactically valid. In a related study, Brunton and colleagues151 introduced a method for differentiating symbolic rules by assigning trainable weights to predefined basis functions. Sparse regression was used to select a linear combination of the basis functions that accurately represented the dynamic system while maintaining compactness. Unlike equivariant neural networks, which use a predefined inductive bias to enforce symmetry, symmetry can be discovered as the characteristic behaviour of a domain. For instance, Liu and Tegmark152 described asymmetry as a smooth loss function and minimized the loss function to extract previously unknown symmetries. This approach was applied to uncover hidden symmetries in black-hole waveform datasets, revealing unexpected space–time structures that were historically challenging to find.
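The idea of selecting a sparse linear combination of predefined basis functions can be sketched as follows. This is not the algorithm of the cited study: as a stand-in, the sketch uses an L1-penalized least-squares fit solved by proximal gradient descent, and the basis library, the toy system dx/dt = 2x, the penalty, the step size and the names `soft_threshold` and `sparse_fit` are all assumptions made for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink a coefficient towards zero; small values become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_fit(features, targets, penalty=0.05, step=0.5, iterations=5000):
    """Minimize mean squared error / 2 + penalty * sum(|w|) over weights w."""
    n, k = len(features), len(features[0])
    weights = [0.0] * k
    for _ in range(iterations):
        residuals = [sum(w * f for w, f in zip(weights, row)) - y
                     for row, y in zip(features, targets)]
        grads = [sum(r * row[j] for r, row in zip(residuals, features)) / n
                 for j in range(k)]
        # Gradient step on the smooth part, then shrinkage for the L1 term.
        weights = [soft_threshold(w - step * g, step * penalty)
                   for w, g in zip(weights, grads)]
    return weights


# Hypothetical library of candidate terms: 1, x, x^2, x^3.
BASIS = [lambda x: 1.0, lambda x: x, lambda x: x ** 2, lambda x: x ** 3]

# Toy measurements of a system assumed to obey dx/dt = 2x.
states = [-1.0, -0.5, 0.0, 0.5, 1.0]
derivatives = [2.0 * x for x in states]

library = [[phi(x) for phi in BASIS] for x in states]
weights = sparse_fit(library, derivatives)
```

On this toy problem only the coefficient of the linear term survives; the penalty shrinks it slightly below the true value of 2, which is the usual price of an L1 penalty. The step size of 0.5 was chosen to be small enough for this particular library and would need to be adjusted for other data.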

In astrophysics, VAEs have been used to estimate gravitational-wave detector parameters based on pretrained black-hole waveform models. This method is up to six orders of magnitude faster than traditional methods, making it practical to capture transient gravitational-wave events153. In materials science, thermodynamic rules are combined with an autoencoder to design an interpretable latent space for identifying phase maps of crystal structures154. In chemistry, models such as simplified molecular-input line-entry system (SMILES)-VAE155 can transform SMILES strings, which are molecular notations of chemical structures in the form of a discrete series of symbols that computers can easily understand, into a differentiable latent space that can be optimized using Bayesian optimization techniques (Fig. 3c). By representing molecular structures as points in the latent space, we can design differentiable objectives and optimize them using self-supervised learning to predict molecular properties based on latent representations of molecules. This means that we can optimize discrete molecular structures by backpropagating gradients of the AI predictor all the way to the continuous-valued representation of molecular inputs. A decoder can turn these molecular representations into approximately corresponding discrete inputs. This approach is used in the design of proteins156 and small molecules157,158.

Performing optimization in the latent space can more flexibly model underlying data distributions than mechanistic approaches in the original hypothesis space. However, extrapolative prediction in sparsely explored regions of the hypothesis space can be poor. In many scientific disciplines, hypothesis spaces can be vastly larger than what can be examined through experimentation. For instance, it is estimated that there are approximately 10^60 molecules, whereas even the largest chemical libraries contain fewer than 10^10 molecules12,159. Therefore, there is a pressing need for methods to efficiently search through and identify high-quality candidate solutions in these largely unexplored regions.

AI-driven experimentation and simulation

Evaluating scientific hypotheses through experimentation is critical to scientific discovery. However, laboratory experiments can be costly and impractical. Computer simulations have emerged as a promising alternative, offering the potential for more efficient and flexible experimentation. Simulations nevertheless rely on handcrafted parameters and heuristics to imitate real-world scenarios and trade accuracy against speed when compared with physical experimentation, so they require an understanding of the underlying mechanisms. With the advent of deep learning, these challenges are being addressed by identifying and optimizing hypotheses for efficient testing and empowering computer simulations to link observations with hypotheses.

Efficient evaluation of scientific hypotheses

AI systems have provided experimental design and optimization tools, which can enhance traditional scientific methods, decrease the number of experiments needed and save resources. Specifically, AI systems can assist with two essential steps of experimental testing: planning and steering. In traditional approaches, these steps often require trial and error, which can be inefficient, costly and even life-threatening at times160. AI planning provides a systematic approach to designing experiments, optimizing their efficiency and exploring uncharted territory. At the same time, AI steering directs experimental processes towards high-yield hypotheses, allowing the system to learn from previous observations and adjust the course of experiments. These AI approaches can be model based, using simulations and prior knowledge, or model free, based on machine-learning algorithms alone.

AI systems can aid in the planning of experiments by optimizing the use of resources and reducing unnecessary investigations. Unlike hypothesis searching, experimental planning pertains to the procedures and steps involved in the design of scientific experiments. One example is synthesis planning in chemistry. Synthesis planning involves finding a sequence of steps by which a target chemical compound can be synthesized from available chemicals. AI systems can design synthetic routes to a desired chemical compound, reducing the need for human intervention161,162. Active learning has also been employed in materials discovery and synthesis32,163,164,165. Active learning involves iteratively interacting with and learning from experimental feedback to refine hypotheses. Material synthesis is a complex and resource-intensive process that requires efficient exploration of high-dimensional parameter space. Active learning uses uncertainty estimation to explore the parameter space and reduce uncertainty with as few steps as possible165.
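The active-learning loop described above, which repeatedly runs the experiment the model is least certain about, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the distance to the nearest measured point stands in for a model-based uncertainty estimate (such as the predictive variance of a Gaussian process or the disagreement of an ensemble), and `run_experiment`, `uncertainty` and `active_learning` are hypothetical names.

```python
def uncertainty(x, measured):
    """Assumed proxy: uncertainty grows with distance to measured points."""
    return min(abs(x - m) for m in measured)


def active_learning(candidates, run_experiment, n_queries, initial):
    """Iteratively run the experiment with the highest uncertainty."""
    measured = {x: run_experiment(x) for x in initial}
    for _ in range(n_queries):
        pool = [x for x in candidates if x not in measured]
        if not pool:
            break
        query = max(pool, key=lambda x: uncertainty(x, measured))
        measured[query] = run_experiment(query)  # experimental feedback
    return measured


# Toy one-dimensional parameter space and a stand-in for a real experiment.
candidates = [i / 10 for i in range(11)]
results = active_learning(candidates, lambda x: x * x,
                          n_queries=2, initial=[0.0])
```

Starting from a single measurement at 0.0, the loop first queries the far end of the parameter range and then its midpoint, so each experiment removes the largest remaining gap in coverage.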

During an ongoing experiment, decision-making must often be adapted in real time. However, this process can be difficult and error prone when driven solely by human experience and intuition. Reinforcement learning provides an alternative approach that can continually react to the evolving environment and maximize the safety and success of the experiments. For example, reinforcement-learning approaches have proven to be effective for magnetic control of tokamak plasmas, where the algorithm interacts with the tokamak simulator to optimize a policy for controlling the process166 (Fig. 4a). In another study, a reinforcement-learning agent used real-time feedback such as wind speed and solar elevation to control a stratospheric balloon and find favourable wind currents for navigation14. In quantum physics, experiment design needs to be dynamically adjusted as the best choice for a future materialization of a complex experiment can be counterintuitive. Reinforcement-learning methods can overcome this issue by iteratively designing the experiment and receiving feedback from it. For instance, reinforcement-learning algorithms have been used to optimize the measurement and control of quantum systems, where they improve experimental efficiency and accuracy167.

Fig. 4: Integration of AI with scientific experiments and simulation.

a, Leveraging AI for nuclear fusion control of complex and dynamic systems: Degrave et al.166 developed an AI controller to regulate nuclear fusion through magnetic fields in a tokamak reactor. The AI agent receives real-time measurements of electrical voltage levels and plasma configurations and takes actions to control the magnetic field and meet experimental targets, such as maintaining a functional power supply. The controller is trained using simulations with a reward function to update model parameters. b, In computational simulations of complex systems, AI systems can accelerate the detection of rare events, such as transitions between different conformational structures of a protein. Wang et al.169 used a neural-network-based uncertainty estimator to guide the addition of potentials that compensate for the original potential energy, allowing the system to escape local minima (in grey) and explore a configuration space more rapidly. This approach, illustrated here, can enhance the efficiency and accuracy of simulations, leading to a deeper understanding of complex biological phenomena. c, A neural framework for solving partial differential equations, where the AI solver is a physics-informed neural network trained to estimate target function f. The derivative of variable x is calculated by automatically differentiating the neural network’s outputs. When the expression for the differential equation is unknown (parameterized by η), it can be estimated by solving a multi-objective loss that optimizes both the functional form of the equation and its fit to observations y. Credit: Nuclear fusion icon in a, iStockphoto/VectorMine.

Deducing observables from hypotheses using simulations

Computer simulation is a powerful tool to deduce observables from hypotheses, enabling the evaluation of hypotheses that are not directly testable. However, existing simulation techniques heavily rely on human understanding and knowledge about the underlying mechanisms of the studied systems, which can be suboptimal and inefficient. AI systems can enhance computer simulation with more accurate and efficient learning by better fitting key parameters of complex systems, solving differential equations that govern complex systems and modelling states in complex systems.

Scientists often study complex systems by creating a model that involves parameterized forms, which requires domain knowledge to identify initial symbolic expressions for the parameters. An example is molecular force fields, which are interpretable but limited in their ability to represent a wide range of functions and require strong inductive biases or scientific knowledge to generate. To improve the accuracy of molecular simulations, an AI-based neural potential that fits expensive yet accurate quantum-mechanical data has been developed to replace traditional force fields16,168. In addition, uncertainty quantification has been used to locate the energy barrier in the high-dimensional free-energy surface, thereby improving the efficiency of molecular dynamics169 (Fig. 4b). For coarse-grained molecular dynamics, AI models have been utilized to reduce the computational cost for large systems by determining the degree to which the system needs to be coarsened from the learned hidden complex structures170. In quantum physics, neural networks have replaced manually estimated symbolic forms in parameterizing wave functions or density functionals due to their flexibility and ability to fit the data accurately171,172.

Differential equations are crucial for modelling complex systems’ dynamics in space and time. In contrast to numerical algebra solvers, AI-based neural solvers integrate data and physics more seamlessly173,174. These neural solvers combine physics with the flexibility of deep learning by grounding neural networks in domain knowledge (Fig. 4c). AI methods have been applied to solve differential equations in various fields, including computational fluid dynamics175, predicting the structures of glassy systems88, solving stiff chemical kinetic problems176 and solving the Eikonal equation to characterize the travel times of seismic waves177,178. In dynamics modelling, continuous time can be modelled by neural ordinary differential equations179. Neural networks can parameterize solutions of Navier–Stokes equations in a spatiotemporal domain using physics-informed losses180. However, standard convolutional neural networks have limited ability to model fine-structured characteristics of the solution. This issue can be addressed by learning operators that model mappings between functions using neural networks127,181. In addition, solvers must be able to adapt to different domains and boundary conditions. This can be achieved by combining neural differential equations with graph neural networks to discretize arbitrary domains by graph partitioning182.
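A physics-informed loss of the kind mentioned above is commonly written as the sum of a data-fit term and an equation-residual term. The form below is a generic sketch rather than the objective of any cited study, and the notation is assumed: f_θ is the neural network, (x_i, y_i) are the N_d observations, the governing equation is written as N_η[f] = 0 with possibly unknown parameters η, the N_c collocation points x̃_j are where the residual is evaluated by automatic differentiation, and λ weights the two terms.

```latex
\mathcal{L}(\theta, \eta)
  = \frac{1}{N_d} \sum_{i=1}^{N_d} \bigl| f_\theta(x_i) - y_i \bigr|^2
  + \lambda \, \frac{1}{N_c} \sum_{j=1}^{N_c}
      \bigl| \mathcal{N}_\eta[f_\theta](\tilde{x}_j) \bigr|^2
```

When η is known, only θ is optimized. When the equation itself is uncertain, θ and η are optimized jointly, which corresponds to the multi-objective setting described in the caption of Fig. 4c.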

Statistical modelling is a powerful tool to provide a full quantitative description of complex systems by modelling the distributions of states in those systems. Owing to its capability to capture highly complex distributions, deep generative modelling has recently emerged as a valuable approach in complex system simulations. One well known example is the Boltzmann generator183 based on normalizing flows184,185 (Box 1). Normalizing flows can map any complex distribution to a prior distribution (for example, a simple Gaussian) and back using a series of invertible neural networks. Although computationally expensive (often requiring hundreds or thousands of neural layers), normalizing flows provide an exact density function, which enables sampling and training. Unlike conventional simulations, normalizing flows can generate equilibrium states by directly sampling from the prior distribution and applying the neural network, which has a fixed computational cost. This enhances sampling in lattice field186 and gauge theories187 and improves Markov chain Monte Carlo methods188 that otherwise might not converge due to mode mixing189,190,191.
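The two properties highlighted above, an exact density and sampling at fixed cost, both follow from invertibility and the change-of-variables formula. The sketch below shows them for the simplest possible case, a single one-dimensional affine map with a standard Gaussian prior. Real flows stack many learned invertible layers; the class `AffineFlow` and its parameters are hypothetical and chosen only for illustration.

```python
import math
import random


class AffineFlow:
    """One invertible layer x = shift + exp(log_scale) * z, with z ~ N(0, 1)."""

    def __init__(self, shift, log_scale):
        self.shift = shift
        self.log_scale = log_scale

    def forward(self, z):
        """Map a prior sample z to a data-space sample x."""
        return self.shift + math.exp(self.log_scale) * z

    def inverse(self, x):
        """Map a data-space point x back to the prior space."""
        return (x - self.shift) * math.exp(-self.log_scale)

    def log_prob(self, x):
        """Exact log-density via the change-of-variables formula."""
        z = self.inverse(x)
        log_prior = -0.5 * (z * z + math.log(2.0 * math.pi))
        # log |dz/dx| = -log_scale for this affine map.
        return log_prior - self.log_scale

    def sample(self, rng):
        """Draw one sample: one prior draw plus one forward pass."""
        return self.forward(rng.gauss(0.0, 1.0))


flow = AffineFlow(shift=1.0, log_scale=math.log(2.0))
draw = flow.sample(random.Random(0))
```

With these parameters the flow represents a Gaussian with mean 1 and standard deviation 2, so `flow.log_prob` can be checked against the closed-form Gaussian log-density. Each sample costs one draw from the prior and one forward pass, however complicated the target distribution is.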

Grand challenges

To harness scientific data, models must be built and employed with simulation and human expertise. Such integration has opened up opportunities for scientific discovery. However, to further enhance the impact of AI across scientific disciplines, significant progress is needed in theory, methods, software and hardware infrastructure. Cross-disciplinary collaborations are crucial to realize a comprehensive and practical approach towards advancing science through AI.

Practical considerations

Scientific datasets are often not directly amenable to AI analyses because of measurement technology limitations that produce incomplete datasets and biased or conflicting readouts, and limited accessibility owing to privacy and safety concerns. Standardized and transparent formats are required to alleviate the workload of data processing159,192,193,194,195,196. Model cards197 and datasheets198 are examples of efforts to document the operating characteristics of scientific datasets and models. In addition, federated learning199,200 and cryptographic201 algorithms can be used to prevent releasing sensitive data with high commercial value to the public domain. Leveraging open scientific literature, natural language processing and knowledge graph techniques can facilitate literature mining to support material discovery15, chemistry synthesis202 and therapeutic science203.

The use of deep learning poses a complex challenge for human-in-the-loop AI-driven design, discovery and evaluation. To automate scientific workflows, optimize large-scale simulation codes and operate instruments, autonomous robotic control can leverage predictions and conduct experiments on high-throughput synthesis and testing lines, creating self-driving laboratories. The early application of generative models in materials exploration suggests that millions of possible materials could be identified with desired properties and functions and evaluated for synthesizability. For instance, King et al.204 combined logical AI and robotics to autonomously generate functional genomics hypotheses about yeast and experimentally test the hypotheses using laboratory automation. In chemical synthesis, AI optimizes candidate synthesis routes, followed by robots steering chemical reactions in predicted synthesis routes7.

The practical implementation of an AI system involves complex software and hardware engineering, requiring a series of interdependent steps that go from data curation and processing to algorithm implementation and design of user and application interfaces. Minor variations in implementation can lead to considerable changes in performance and impact the success of integrating AI models within scientific practice. Therefore, both data and model standardization need to be considered. AI approaches can suffer from poor reproducibility due to the stochastic nature of model training, varying model parameters and evolving training datasets, which are both data dependent and task dependent. Standardized benchmarks and experimental design can alleviate such issues205. Another direction towards improving reproducibility is through open-source initiatives that release open models, datasets and education programmes4,130,206,207.

Algorithmic innovations

To contribute to scientific understanding or acquire it autonomously, algorithmic innovation is required to establish a foundational ecosystem with the most appropriate algorithms for use throughout the scientific process.

The question of out-of-distribution generalization is at the frontier of AI research. A neural network trained on data from a specific regime may discover regularities that do not generalize in a different regime whose underlying distribution has shifted (Box 1). Although many scientific laws are not universal, their applicability is generally broad. Compared with state-of-the-art AI, human brains can generalize better and faster to modified settings. An attractive hypothesis is that this is because humans build not just a statistical model of what they observe but a causal model, that is, a family of statistical models indexed by all possible interventions (for example, different initial states, actions of agents or different regimes). Incorporating causality in AI is still a young field208,209,210,211,212 where much remains to be done. Techniques such as self-supervised learning have great potential for scientific problems because they can leverage massive unlabelled data and transfer their knowledge to low-data regimes. However, current transfer-learning schemes can be ad hoc, lack theoretical guidance213 and are vulnerable to shifts in underlying distributions214. Although preliminary attempts have addressed this challenge215,216, more exploration is needed to systematically measure transferability across domains and prevent negative transfer. Moreover, to address the difficulties that scientists care about, the development and evaluation of AI methods must be done in real-world scenarios, such as plausibly realizable synthesis paths in drug design217,218, and include well calibrated uncertainty estimators to assess the model’s reliability before transitioning it to real-world implementation.

Scientific data are multimodal and include images (such as black-hole images in cosmology), natural language (such as scientific literature), time series (such as thermal yellowing of materials), sequences (such as biological sequences), graphs (such as complex systems) and structures (such as 3D protein–ligand conformations). For instance, in high-energy physics, jets are collimated sprays of particles produced from quarks and gluons at high energy. Identifying their substructures from radiation patterns can aid in the search for new physics. The jet substructures can be described by images, sequences, binary trees, generic graphs and sets of tensors18. Although using neural networks to process images has been extensively researched, processing particle images alone is insufficient. Similarly, using other representations of jet substructures in isolation cannot give a holistic and integrated systems view of the complex system219. Although integrating multimodal observations remains a challenge, the modular nature of neural networks implies that distinct neural modules can transform diverse data modalities into universal vector representations220,221.

Scientific knowledge, such as rotational equivariance in molecules77, equality constraints in mathematics182, disease mechanisms in biology222 and multi-scale structures in complex systems223,224, can be incorporated into AI models. However, which principles and knowledge are most helpful and practical to implement is still unclear. As AI models require massive data to fit, incorporating scientific knowledge into models can aid learning when datasets are small or sparsely annotated. Therefore, research must establish principled methods for integrating knowledge into AI models and understanding the trade-offs between domain knowledge and learning from measured data.

AI methods often operate as black boxes, meaning that users cannot fully explain how outputs have been generated and what inputs have been critical in producing the outputs. Black-box models can decrease user trust in predictions and have limited applicability in areas where model outputs must be understood before real-world implementation225,226,227, such as in human space exploration228, and where predictions inform policy, such as in climate science229. Transparent deep-learning models remain elusive230 despite a plethora of explainability techniques231,232,233. However, the fact that human brains can synthesize high-level explanations, even if imperfect, that can convince other humans offers hope that by modelling phenomena at similarly high levels of abstraction, future AI models will provide interpretable explanations at least as valuable as those offered by human brains. This also suggests that studying higher-level cognition could potentially inspire future deep-learning models to incorporate both current deep-learning abilities and the abilities to manipulate verbalizable abstractions, reason causally, and generalize out of distribution.

Conduct of science and scientific enterprise

Looking towards the future, the demand for AI expertise will be influenced by two forces. The first is the existence of problems that are on the verge of benefiting from the application of AI, such as self-driving labs. The second is the ability of intelligent tools to enhance the state of the art and create new opportunities, such as examining biological, chemical or physical processes that happen at length and time scales not accessible in experiments. On the basis of these two forces, we anticipate that research teams will change in composition to include AI specialists, software and hardware engineers, and novel forms of collaboration involving government at all levels, educational institutions and corporations. Recent state-of-the-art deep-learning models continue to grow in size10,234. These models consist of millions or even billions of parameters and have experienced a tenfold increase in size year on year. Training these models involves transmitting data through complex parameterized mathematical operations, with parameters updated to nudge the model outputs towards the desired values. However, the computational and data requirements to calculate these updates are colossal, resulting in a large energy footprint and high computational costs. As a result, big tech companies have heavily invested in computational infrastructure and cloud services, pushing the limits on scale and efficiency. Although for-profit and non-academic organizations have access to vast computational infrastructure, higher-education institutions can be better integrated across multiple disciplines. Furthermore, academic institutions tend to host unique historical databases and measurement technology that might not exist elsewhere but are necessary for AI4Science. These complementary assets have facilitated new modes of industry–academia partnerships, which can impact the selection of research questions pursued.

As AI systems approach performance that rivals and surpasses that of humans, employing them as drop-in replacements for routine laboratory work is becoming feasible. This approach enables researchers to develop predictive models from experimental data iteratively and select experiments to improve them without manually performing laborious and repetitive tasks217,235. To support this paradigm shift, educational programmes are emerging to train scientists in designing, implementing and applying laboratory automation and AI in scientific research. These programmes help scientists understand when the use of AI is appropriate and prevent misinterpreted conclusions from AI analyses.

The misapplication of AI tools and misinterpretation of their results can have significant negative impacts236. The broad range of applications compounds these risks237. However, the misuse of AI is not solely a technological problem; it also depends on the incentives of those leading AI innovation and investing in AI implementation. Establishing ethics review processes and responsible implementation tactics is essential, including a comprehensive overview of the scope and applicability of AI238. Furthermore, security risks associated with AI must be considered, as it has become easier to repurpose algorithm implementations for dual use237. As algorithms are adaptable to a broad range of applications, they can be developed for one purpose but used for another, creating vulnerabilities to threats and manipulation.

Conclusion

AI systems can contribute to scientific understanding, enable the investigation of processes and objects that cannot be visualized or probed in any other way, and systematically inspire ideas by building models from data and combining them with simulation and scalable computing. To realize this potential, safety and security concerns that come with the use of AI must be addressed through responsible and thoughtful deployment of the technology. To use AI responsibly in scientific research, we need to measure the levels of uncertainty, error, and utility of AI systems. This understanding is essential for accurately interpreting AI outputs and ensuring that we do not rely too heavily on potentially flawed results. As AI systems continue to evolve, prioritizing reliable implementation with proper safeguards in place is key to minimizing risks and maximizing benefits. AI has the potential to unlock scientific discoveries that were previously out of reach.