The demonstration of DeepMind’s AlphaFold2 at the 14th Critical Assessment of Structure Prediction (CASP14) competition1 and its subsequent release to the community ushered in a new era of structural biology. In this era, the structures of many soluble proteins can be predicted with an accuracy comparable to that achieved with expensive and time-consuming laboratory experiments. Subsequent advances have expanded the computational toolbox and the range of structures available to the community. These include the release of the AlphaFold Protein Structure Database, the development of similar computational methods such as RoseTTAFold2, extensions such as AlphaFold Multimer3 and the recent report of AlphaFold3 (ref. 4). As a result, for many proteins a structure is readily available without the need to obtain experimental data. Furthermore, the AlphaFold Protein Structure Database and open-access tools such as ColabFold5 and OpenFold6 mean that researchers do not need to set up their own dedicated servers.

In a Comment, Matthieu Schapira et al. discuss the challenges of using predicted protein structures in virtual drug screens and explore the limitations that remain for computation- and structure-guided drug discovery. They also outline their thoughts on the importance of public-domain data. They remind the reader that the vast number of experimental protein structures publicly available in the Protein Data Bank was essential for the development of AlphaFold. In addition, they call for a benchmarking exercise for small-molecule computational drug design, analogous to the CASP competition, which would help drive advances in the field. The authors close with an outlook on the use of predictive technologies in drug discovery.

Drug discovery is a costly process in which most lead compounds fail at some point in the pipeline. In their Perspective on machine learning in preclinical drug discovery, Denise B. Catacutan et al. survey how machine learning methods are being applied throughout the preclinical phases of drug discovery. These methods can accelerate initial hit identification, help elucidate mechanisms of action and help optimize pharmacochemical properties. The authors discuss how deep learning models can be used to analyze structure–activity relationships. They also describe how generative deep learning approaches can expand the chemical space explored in virtual screens of in silico chemical libraries. Finally, they comment on how diffusion models can be used for protein docking.

Despite the huge and undoubted success of AlphaFold2, there are cases in which the predicted model is chemically or energetically unlikely, or in which confidence in the modeled structure is low, showing that there is still a role for experimental data. In a Perspective, Vinayak Agarwal and Andrew C. McShan outline how AlphaFold2 models can be integrated with experimental data such as small-angle X-ray scattering (SAXS), solution NMR spectroscopy, cryo-electron microscopy (cryo-EM) and X-ray diffraction. They also discuss how to evaluate the quality and reliability of AlphaFold2 models, and present examples in which the predicted model differs considerably from the experimentally observed structure. They argue that, although current deep learning approaches can provide accurate working models for most well-folded globular proteins, caution is needed for other classes of proteins. This is a useful reminder that proteins adopt an ensemble of conformations.

AlphaFold2 was, of course, not the first computational method that enabled the modeling of protein structure. Reaching this point required numerous advances over many years, and a number of challenges still remain. Proteins are not static structures. Expanding the computational toolbox to capture the intricacies of protein dynamics and (partial) protein unfolding could unlock new chemical reactivity in designed enzymes, or facilitate the design of new drugs or other ligands. There has already been progress in identifying unfolded proteins or protein regions from the stretches of sequence for which structure prediction software struggles to identify a robust structure7. However, identifying cooperative movements, and the timescales over which they occur, still relies on computationally intensive molecular dynamics simulations. Several goals will require new tools that are far less computationally demanding than current approaches. These include a more complete understanding of protein dynamics and of the effects of solution conditions, and the creation of a ‘protein dynamics bank’. Calculation of chemical reactivity is another area in which future advances could lessen the computational requirements. It will be interesting to see how new machine learning models and other technologies are developed to address these challenges.
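To make the idea of flagging poorly predicted regions concrete, here is a minimal sketch. It is not the method of ref. 7. It assumes AlphaFold2-style PDB output, in which the per-residue confidence score (pLDDT) is written into the B-factor column. It also uses a pLDDT cutoff of 50, below which the AlphaFold Protein Structure Database notes that a region may be unstructured in isolation. The function names, the minimum segment length and the synthetic input are hypothetical and were chosen only for illustration.

```python
from typing import List, Tuple


def plddt_per_residue(pdb_text: str) -> List[Tuple[int, float]]:
    """Return (residue number, pLDDT) for every C-alpha atom in a PDB-format string.

    Assumes AlphaFold2-style output, where pLDDT is stored in the
    B-factor field (columns 61-66) of each atom.
    """
    values = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_num = int(line[22:26])   # residue sequence number, columns 23-26
            plddt = float(line[60:66])   # B-factor field, holding pLDDT here
            values.append((res_num, plddt))
    return values


def low_confidence_segments(values, threshold=50.0, min_length=5):
    """Return (first, last) residue numbers of runs of consecutive residues
    with pLDDT below `threshold` spanning at least `min_length` residues.

    Both defaults are illustrative assumptions, not published settings.
    """
    segments = []
    current: List[int] = []
    for res_num, plddt in values:
        if plddt < threshold and (not current or res_num == current[-1] + 1):
            current.append(res_num)  # extend the current low-confidence run
        else:
            if len(current) >= min_length:
                segments.append((current[0], current[-1]))
            # start a new run only if this residue is itself low confidence
            current = [res_num] if plddt < threshold else []
    if len(current) >= min_length:
        segments.append((current[0], current[-1]))
    return segments


def _ca_line(serial: int, res_num: int, plddt: float) -> str:
    """Build a minimal fixed-width ATOM record (hypothetical test data)."""
    return (
        "ATOM  " + f"{serial:5d}" + " " + " CA " + " " + "ALA" + " " + "A"
        + f"{res_num:4d}" + "    "
        + f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}" + f"{1.0:6.2f}" + f"{plddt:6.2f}"
    )


# Synthetic 12-residue model in which residues 4-9 have low confidence.
scores = [90.0] * 3 + [30.0] * 6 + [90.0] * 3
demo_pdb = "\n".join(_ca_line(i, i, s) for i, s in enumerate(scores, start=1))
print(low_confidence_segments(plddt_per_residue(demo_pdb)))  # [(4, 9)]
```

Low pLDDT on its own does not distinguish a genuinely disordered region from one that is simply poorly modeled. Segments flagged in this way are therefore better treated as candidates for experimental follow-up than as conclusions.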

The recently announced AlphaFold3 (ref. 4) uses an updated, diffusion-based architecture. It extends structure prediction to complexes of proteins with nucleic acids, small-molecule ligands and metal ions, as well as to proteins that carry modified residues. The pace of development in this area is high, with new algorithms providing much higher accuracy and an expanded range of features. Undoubtedly, these advances will facilitate new chemical biology insights and enable practical applications that require knowledge of biomolecular structures. Virtual drug screening would benefit from methods that reliably predict protein binders and ligands while accounting for the inherent flexibility and dynamics of proteins. Such methods would ease the identification of promising lead compounds.

Computational approaches can now also go well beyond predicting structure. Generative artificial intelligence (AI) methods based on deep neural networks or large language models can design desired structural features into proteins, for example through protein hallucination8. Currently, computationally designed enzymes often require extensive experimental optimization to achieve sufficient stability, activity and selectivity for their intended application. This need for experimental engineering shows that there is still plenty of room to improve computational design. Although it is difficult to predict which breakthroughs will come next, future advances in machine learning methods are clearly poised to yield a wealth of new chemical biology insights and to help with the development of new drugs.