The latest structure prediction methods, most prominently AlphaFold2 and RoseTTAFold1,2, have reduced the effort needed to obtain a structural model from months or years of laboratory work to a few keystrokes. While the predicted models do not reproduce experimental results in every case, this leap is dramatic enough to have provoked a series of self-reflective commentaries in the structural community3,4,5, some directly debating whether structural biology is ‘solved’ or not6,7.

AlphaFold2 and its contemporaries aim to predict a single structure per sequence, yet proteins do not adopt a unique structural state. They move, and while not all motion is biologically meaningful, many lines of evidence have shown specific motions are necessary for protein function8,9. Nuclear magnetic resonance (NMR), crystallography and cryo-electron microscopy (cryo-EM) have each been used to measure such motions10,11,12, which are often conceptualized as an energy landscape that describes the distribution of conformations a protein adopts and the rates at which those conformations interconvert13.

While the dynamic nature of proteins is widely accepted, the fact that structural heterogeneity manifests regularly in experimental structures, including those in the Protein Data Bank (PDB) today, appears underappreciated. Indeed, it is common to hear talk of ‘the’ structure of a particular protein, reflecting our biased thinking. This bias may be a byproduct of the expense of determining even a single structure for a given sequence; the PDB is strongly skewed toward single structures per sequence (Supplementary Fig. 1). Nonetheless, whether a structure is derived from crystallography, cryo-EM or NMR data, it would be better to speak of it as ‘a’ structure, one of many possible conformations.

Experimentally determined structures are not unique given a sequence

A distribution of structures emerges from experimental data in two ways. First, multiple conformations are often required to model diffraction data, cryo-EM images or NMR nuclear Overhauser effects from a single dataset. All three techniques average over an ensemble of many molecules to produce a signal. These averages are blurred by the structural differences from molecule to molecule, requiring models that incorporate structural heterogeneity to explain the data. This is accomplished by means such as B-factors, multiple classes, alternative conformers and ensemble models. Irrespective of the technique, it is essential to model the conformational heterogeneity to obtain satisfactory agreement with the measured experimental data.
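
As a concrete illustration of the first point, a B-factor encodes the positional spread of an atom: for an isotropic B-factor, B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean-square displacement. A minimal sketch of this conversion (plain Python, no external dependencies):

```python
import math

def b_factor_to_rms_displacement(b_factor_A2: float) -> float:
    """Convert an isotropic B-factor (in Å^2) to a root-mean-square
    displacement (in Å) using B = 8 * pi^2 * <u^2>."""
    return math.sqrt(b_factor_A2 / (8.0 * math.pi ** 2))

# Example: a B-factor of 30 Å^2 corresponds to roughly 0.6 Å of positional spread.
print(round(b_factor_to_rms_displacement(30.0), 2))  # 0.62
```

Note that this displacement lumps together thermal motion and conformational heterogeneity; B-factors alone cannot distinguish the two, which is one reason alternative conformers and ensemble models are also needed.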

A second level of variability exists. Repeated structural measurements of the same protein can yield different structures. While this may seem concerning, these variations are often intentional. For instance, different solution conditions can be used to produce crystals with different lattices, and the changes in packing can generate different structures14. While the structural changes are sometimes subtle, they exceed the coordinate error of the data. Alternatively, some structures may be determined in the presence of specific ligands, cofactors, ions or other perturbations that change the protein structure from one experiment to the next15. Studies over the past decade show that the temperature of data collection can alter the observed protein structure16. Additional variability arises from the experiments themselves: in crystallography, for instance, radiation damage incurred during data collection can likewise alter the structure17. Even when these factors are controlled, variability persists. For instance, protein crystals are typically handled manually and plunge-frozen in liquid nitrogen before data collection, inducing idiosyncratic mechanical strain on the crystal lattice and frequently altering the determined structure.

The fact that a single sequence may give rise to many valid protein structures has implications for structure prediction. Because experimental structure determination can yield multiple outcomes, the accuracy of any single experiment is not an appropriate benchmark of success for structure prediction algorithms. Since no experimental data or model is perfect, every structure in the PDB will have some errors in the reported coordinate positions. There is some set of acceptable models that are ‘within error’ for any given experiment. It might seem reasonable to say that a prediction has achieved experimental accuracy if its atomic coordinates fall within this error, and that it has failed if they fall outside this limit. This would be too strict, however. Determining the structure of the same sequence under different experimental conditions can produce appreciably different structures. These different structures would not be considered within error by any reasonable method of quantifying that error. Instead, if a predicted structure is contained within the set of possible experimental outcomes, it should be considered to have achieved experimental accuracy, even if no documented experiment precisely matches that result.

AlphaFold2 has reached the frontier of the single-structure approximation

The main protease (Mpro) from SARS-CoV-2 provides an excellent case study, highlighting why a single experiment cannot be used to assess the experimental accuracy of a prediction. During viral replication, much of the viral genome is translated into long polyproteins. Mpro processes these into individual proteins and is essential for viral function. Intense interest in Mpro as a drug-discovery target since late 2019 has resulted in a large number of high-quality structures. As of June 2022, 452 structures of full-length, wild-type Mpro had been deposited in the PDB, spanning many different crystallization conditions, bound ligands and data-collection temperatures. They show a correspondingly rich structural distribution (Fig. 1).

Fig. 1: AlphaFold2’s model of Mpro is on the frontier of the set of experimentally determined structures.

a, The Mpro functional unit, a dimer (gray surface), is shown with the backbone traces of 64 ligand-free structures deposited in the PDB (blue) aligned to AlphaFold2’s model (magenta). b, R.m.s. deviation between all pairs of Mpro PDB entries (blue) and from AlphaFold2’s model to each PDB entry (yellow). Top: filtered for Mpro structures without specific ligands bound (“ligands” excludes crystallization reagents, buffers, salts). Bottom: all Mpro PDB entries. RMSDs are computed for all non-hydrogen protein atoms modeled in all structures. PDB IDs included are those with 100% sequence identity match to PDB ID 7AR6; a list is included in the Supplementary Information.

An AlphaFold2 model of Mpro compares favorably to this distribution. The smallest root mean square deviation (RMSD) between the AlphaFold2 model and any PDB entry is 1.2 Å (PDB ID 7VLP, RMSD of non-hydrogen atoms modeled in all structures). The largest is 2.0 Å (PDB ID 7T46). Compare this to 1.75 Å, the largest RMSD between any pair of Mpro PDB depositions. Notably, therefore, on the basis of this simple metric, AlphaFold2 provides — in a few cases — a more accurate prediction of an experimental structure than would be provided by a different experimental structure.
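
The comparisons above rest on nothing more exotic than superposing two sets of matched atoms and computing their RMSD. The sketch below (Python/NumPy) uses the Kabsch algorithm for the superposition; the arrays named `coords` and `af2_coords` in the usage comment are hypothetical, and this is not the exact pipeline used to generate Fig. 1.

```python
import numpy as np

def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD (Å) between two (n_atoms, 3) arrays of matched atomic coordinates,
    after optimal rigid-body superposition (Kabsch algorithm)."""
    P = P - P.mean(axis=0)                     # center both coordinate sets
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)          # SVD of the covariance matrix
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T    # optimal rotation of P onto Q
    return float(np.sqrt(((P @ R.T - Q) ** 2).sum() / len(P)))

# Hypothetical usage, with one coordinate array per PDB deposition
# (same atoms, same order) and one for the AlphaFold2 model:
#   pairwise = [kabsch_rmsd(a, b) for a, b in itertools.combinations(coords, 2)]
#   to_af2   = [kabsch_rmsd(af2_coords, c) for c in coords]
```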

The AlphaFold2 model is not in the geometric center of the experimental distribution (Fig. 1). On average, two randomly selected experimental structures will be more similar to each other than to the structure prediction. Still, the experimental-to-experimental distribution of atomic RMSDs is large enough that it overlaps with the experimental-to-AlphaFold2 distribution. This result holds even when the experimental set is restricted to structures with no specific ligands bound (Fig. 1). AlphaFold2 has reached the frontier of the set of experimentally determined structures.

If a predicted structure is of sufficient quality to be contained in the set of experimental outcomes, this has direct implications for applications in which one might substitute a prediction for an experimental structure. Consider structure-based drug discovery, where atomically resolved structures are desirable. For targets with high-quality structure predictions and structural variability in the ligand-binding site, in silico ligand-screening procedures that employ a rigid protein receptor will be limited more by the lack of protein flexibility than by the accuracy of the structure prediction. Mpro is a prominent drug target that appears to fit this paradigm.

Putative drug-discovery applications highlight why the set of possible experimental outcomes is a more useful definition of ‘experimental accuracy’ than coordinate error. For each individual experimental structure, the all-atom RMSD to the AlphaFold2 model is at least twice as large as the reported coordinate error (Supplementary Fig. 2). For any single Mpro experimental dataset, the AlphaFold2 structure would therefore not be an acceptable model to explain the data. Still, this does not mean it is less informative, in terms of scientific insight, than an experimental structure of Mpro randomly selected from the PDB.

A single structure cannot capture functional motion

Structural heterogeneity underpins protein function. An archetypical example is hemoglobin, the protein that transports oxygen in humans and all other known vertebrates except icefish. In the process, hemoglobin toggles between a ‘tense’ oxygen-free state (T) and a ‘relaxed’ oxygen-bound state (R)9. A survey of 16 human hemoglobin structures deposited in the PDB reveals that the oxygen-free T-state structures are highly similar to one another, reflecting the rigid, low-entropy nature of this state. In contrast, the R-state structures, bound to either O2 or CO, are considerably more diverse. The atomic coordinates of AlphaFold2’s model of hemoglobin lie geometrically halfway between these two structural extremes (Fig. 2). This is, perhaps, expected: with no information about the presence or absence of oxygen, a prediction intermediate between the R and T structures seems ideal. This case demonstrates both how good current structure prediction algorithms are and the opportunity to better understand biology through protein dynamics. The latter cannot be captured by a single structure.

Fig. 2: AlphaFold2’s prediction of hemoglobin lies between the R and T states.

Shown are experimental structures of O2- or CO-bound (R, orange) and ligand-free (T, blue) hemoglobin, superimposed with the AlphaFold2 prediction (magenta). Right inset: detail of the structural superimposition. Left inset: principal component analysis of the dihedral angles projects each structure into a space of reduced dimensionality. Plotted are the first two principal components (PC1 and PC2), which explain 45% of the total variance (Supplementary Fig. 3). Note: the included T-state structure 1HGC is partially O2 bound, with the two α-subunits ligand bound and the β-subunits ligand free. Structures were selected from those highlighted in ref. 9.
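
The dimensionality reduction in the left inset can be sketched as follows. Because dihedral angles are periodic, a common approach is to embed each angle as a (sin, cos) pair before running PCA; whether the published analysis used exactly this embedding is an assumption, so treat this as a generic recipe rather than the figure’s precise method.

```python
import numpy as np
from sklearn.decomposition import PCA

def dihedral_pca(dihedrals_deg: np.ndarray, n_components: int = 2):
    """Project structures into a low-dimensional dihedral space.

    dihedrals_deg: (n_structures, n_angles) array of dihedral angles in
    degrees, one row per structure (experimental or predicted).
    """
    radians = np.deg2rad(dihedrals_deg)
    # the sin/cos embedding respects the periodicity of the angles
    features = np.concatenate([np.sin(radians), np.cos(radians)], axis=1)
    pca = PCA(n_components=n_components)
    projection = pca.fit_transform(features)
    return projection, pca.explained_variance_ratio_
```

Plotting the first two columns of `projection` for the R-state, T-state and AlphaFold2 structures gives a map in the spirit of Fig. 2, with the prediction falling between the two experimental clusters.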

In fact, AlphaFold2 may be capable of self-reporting when its single-structure approximation breaks down. ABL, a tyrosine kinase, provides an example. ABL is involved in cell differentiation, division, adhesion and DNA repair, and is an oncology drug target. Like other kinases, ABL exhibits a flexible ‘activation’ loop containing a three-amino acid DFG motif, which toggles between an active DFG-in and inactive DFG-out state. This structural flexibility is essential for the protein’s regulation and function. In the 25 human ABL kinase structures deposited in the PDB, the activation loop adopts a variety of conformations. Each conformation is generated by the experimental conditions, notably ligands and crystallization reagents. In each crystal dataset, the various loop positions are unambiguous but, on average, less well-ordered than the surrounding atoms (as judged by B-factors and map correlation; Supplementary Fig. 4).

AlphaFold2 produces a per-residue prediction of confidence, the predicted local distance difference test (pLDDT)1,18. In ABL’s regions of high structural variability, the confidence is correspondingly lower (Fig. 3). Notably, AlphaFold2 correctly reports reduced confidence in the functional activation loop, where the ensemble is not well captured by a single structure. In this case, the algorithm simply predicts a common conformation and down-weights the prediction confidence in regions of high variability, highlighting the limit of the single-structure approximation. Structural variability is only one reason prediction confidence might be lowered; for instance, the AlphaFold authors note that regions of low sequence coverage also exhibit reduced pLDDT1. A precise accounting of the contributions to AlphaFold2’s errors is beyond the scope of this work, but the ABL case study suggests structural heterogeneity should be considered a significant factor.

Fig. 3: AlphaFold2 uncertainty correlates with structural variability in ABL kinase.

Left: for 25 ABL kinase structures, the local distance difference test (LDDT) computed across the experimental ensemble plotted against the AlphaFold2 model’s certainty metric, predicted LDDT (Spearman correlation coefficient 0.47). Color of markers shows position in sequence. The dynamic N terminus and activation loop exhibit notably low confidence. Right: structural superimposition of the same experimental structures from the PDB, colored by AlphaFold2’s confidence (predicted LDDT; AlphaFold2 model not shown). In regions of high variability between structures, such as the activation loop, the pLDDT reports lower confidence. PDB IDs listed in the Supplementary Information.
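
The correlation in the left panel can be approximated with a few lines of analysis. Fig. 3 uses a per-residue LDDT computed across the experimental ensemble; the sketch below substitutes a simpler variability measure, Cα RMSF over the superposed ensemble, as a rough proxy, so the resulting Spearman coefficient will not match the published 0.47 (and its sign will be inverted, because RMSF measures variability while LDDT measures agreement). The per-residue pLDDT is taken from the AlphaFold2 model, where it is written into the B-factor column of AlphaFold2-produced PDB files.

```python
import numpy as np
from scipy.stats import spearmanr

def residue_rmsf(ensemble_ca: np.ndarray) -> np.ndarray:
    """Per-residue Cα RMSF for a superposed ensemble.

    ensemble_ca: (n_structures, n_residues, 3) array of Cα coordinates,
    already superposed onto a common reference frame.
    """
    mean_pos = ensemble_ca.mean(axis=0)                    # (n_residues, 3)
    sq_dev = ((ensemble_ca - mean_pos) ** 2).sum(axis=-1)  # (n_structures, n_residues)
    return np.sqrt(sq_dev.mean(axis=0))                    # (n_residues,)

def variability_vs_confidence(ensemble_ca: np.ndarray, plddt: np.ndarray) -> float:
    """Spearman correlation between ensemble variability and per-residue pLDDT.

    A negative coefficient indicates that the more variable a residue is
    across the experimental ensemble, the lower AlphaFold2's confidence.
    """
    rho, _ = spearmanr(residue_rmsf(ensemble_ca), plddt)
    return float(rho)
```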

Supporting the notion that AlphaFold2 might already contain information about the distribution of protein structure is work on fold-switching proteins. Chakravarty and Porter have reported that, for these systems, AlphaFold2 typically succeeds in predicting the structure of one fold, but not the other19. Predictions of fold-switchers have moderately reduced pLDDT versus single-fold proteins, but the pLDDTs are still substantially higher than for disordered proteins or disordered regions of specific proteins. Even more strikingly, Wayment-Steele and colleagues have shown that by clustering the sequences used by AlphaFold2, the algorithm can actually predict both folds in specific fold-switching systems20. This exciting result suggests that predictions of structural distributions may not be far off, as AlphaFold2 can already produce multiple correct structural outputs for the same protein. Further, these results reinforce the hypothesis that current models are limited by a single-structure output space.
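
The clustering step at the heart of that result can be sketched generically. The code below groups the sequences of a multiple sequence alignment by one-hot encoding them and running DBSCAN; each resulting cluster can then be used as a separate, shallower alignment for prediction. This is only an illustrative sketch of the general idea with arbitrary parameters (`eps`, `min_samples`), not the published implementation of ref. 20.

```python
import numpy as np
from sklearn.cluster import DBSCAN

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus the gap character
AA_INDEX = {aa: i for i, aa in enumerate(ALPHABET)}

def one_hot(seqs: list[str]) -> np.ndarray:
    """One-hot encode aligned sequences (all rows must have equal length)."""
    n, length = len(seqs), len(seqs[0])
    X = np.zeros((n, length * len(ALPHABET)), dtype=np.float32)
    for i, seq in enumerate(seqs):
        for j, aa in enumerate(seq.upper()):
            # unknown characters are treated as gaps
            X[i, j * len(ALPHABET) + AA_INDEX.get(aa, len(ALPHABET) - 1)] = 1.0
    return X

def cluster_msa(seqs: list[str], eps: float = 6.0, min_samples: int = 3) -> np.ndarray:
    """Assign each MSA sequence a cluster label (-1 marks unclustered outliers)."""
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(one_hot(seqs))
```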

Distributions of conformations are the future of structural biology

Structure prediction has advanced significantly, right up to the frontier where the single-sequence, single-structure approximation breaks down. While AlphaFold2 and its peers enable breakthroughs today, tomorrow’s challenge is already clear: modeling protein structural distributions. Mpro, hemoglobin and ABL kinase each provide a strong argument for what we might learn if we could model and predict such distributions, and for why a single-structure view cannot capture all of protein function.

Consider hemoglobin. A single structure cannot describe both the T and R states, cannot capture how oxygen is bound and released, and therefore cannot capture the protein’s biological function. It is not difficult, however, to imagine an extension to the current structure prediction model in which the output space is a distribution of protein conformations, not just a single structure. The scientific impact of such models would be greatly enhanced if the distribution could be modeled as a function of relevant conditions: ligands, pH, binding partners, oxygen concentration, temperature and other factors. How to build such models, manage the potentially endless list of possible model inputs, benchmark them against experimental data, and ensure humans can enjoy learning from such models remains hard work. But while the path is not known, the direction is clear. Machine learning algorithms capable of modeling continuous distributions of protein structure are already emerging, and we can expect significant progress in the coming years21,22,23,24.

The mission of learning protein distributions will require new abstractions and representations of protein structure, but also new experiments to train and guide those algorithms. Time-resolved crystallography25, the modeling of continuous structural distributions from cryo-EM data21,22, and even the analysis of large, statistically meaningful sets of crystals of the same protein are already pushing the frontier of our knowledge. These efforts must continue and expand to support the training of models of protein structure distributions.

Understanding how protein structure changes upon ligand binding, interface formation or changes in the surrounding environment represents an opportunity to get to the heart of protein function. In light of this, we might look back on this time not as the age in which structural biology was solved, but as its golden age.