A community-wide challenge yields recommendations for improving cryo-EM structure validation.
Validation metrics are an essential part of how biomolecular structures are vetted before publication and interpreted after publication, but judging the accuracy of features within structural models is almost always challenging. Lawson and colleagues report on a community-wide model building and validation challenge, highlight progress in developing robust validation of atomic models of biological macromolecules, and offer recommendations on how to improve cryogenic electron microscopy (cryo-EM) structure validation1.
In cryo-EM experiments, large numbers of noisy, two-dimensional projection images of individual macromolecules are processed computationally to yield three-dimensional (3D) volumetric maps of their electron scattering potential2. The technique's popularity has exploded in recent years because it can be used to study the 3D structure of virtually any protein or macromolecular complex, regardless of biochemists' ability to coax it into forming rigid crystals for X-ray diffraction experiments. If the protein or complex is large enough (currently ~50 kDa or more) and can be purified and placed into a cryo-EM instrument for imaging, 3D structures at resolutions sufficient to answer biological questions and/or guide drug discovery are usually attainable. Recently, several groups have even obtained maps so detailed that each individual atom in the macromolecule is resolved as a distinct, sharp spheroid3. When such high (so-called 'atomic') resolution is achieved, structural biologists can assign the 3D coordinates of each of the thousands of atoms in a macromolecule with very little ambiguity.
However, the vast majority of 3D maps obtained by cryo-EM do not resolve individual atoms. At these more common, lower resolutions, the task of assigning precise 3D coordinates to each atom in the macromolecule ('model building') is much more arduous because the 3D shape features corresponding to individual atoms are blurred together: the smallest resolvable features may be entire amino acid residues rather than individual atoms or chemical groups. When the placement of atoms is ambiguous, even armed with prior knowledge (typical bond distances, angles and torsions, the amino acid sequence of a protein, and so on), structural biologists and the software they use are likely to make errors during model building and refinement (Fig. 1).
How, then, do cryo-EM practitioners ensure their models are sound given imperfect experimental results? Thankfully, other structural biologists have been there before. The X-ray crystallography community, for example, developed metrics and algorithms to ascertain the quality and 'believability' of 3D models of biological macromolecules long before cryo-EM maps warranted the building of models at all. Some of these metrics were quickly adopted by the cryo-EM community, but the two techniques are sufficiently different that new tools and metrics were needed, and academic groups have been working to fill this gap.
The search for a complete set of robust validation metrics for cryo-EM maps and models is far from over, but a recent model validation challenge, reported in this issue1, marks substantial progress. In the challenge, four high-quality 3D maps were distributed and set as targets for model building. Anyone wishing to participate could download the target maps, build atomic models into them, and submit models for consideration. Anonymized models were then processed using validation pipelines that included both well-established and more experimental validation metrics, with the dual goals of judging the quality of the submitted models and of characterizing the validation metrics themselves.
On the first count, most submitted models were of higher quality than the reference structures for the chosen targets, and the results were highly reproducible across submissions. This is encouraging but perhaps not surprising: as model building and refinement tools improve, so should the overall quality of models. Moreover, many participants in this challenge were experts in model building and refinement methods; expertise remains an important factor, and most participants reported that manual modification of models was required to obtain optimized structures. Even among this group of participants, some typical errors recurred. In a nice demonstration that the field is still evolving (and fast!), one of the validation tools, called CaBLAM4, which has been gaining in popularity but is not yet routinely used by all practitioners, found geometrical errors or imperfections in at least two-thirds of submitted models; indeed, only two of the thirteen teams taking part in the challenge managed to completely avoid this type of error. In other words, some aspects of model building are still challenging, but tools that will help the community root out more and more errors are becoming available, provided they are widely adopted.
And this is where this challenge and report can really make a difference: the authors' recommendations with regard to validation practices should not only help individual practitioners but also guide future improvements to public data repositories such as the Protein Data Bank, whose validation reports are widely used during peer review and remain available for inspection online after a structure is published. The group's work to characterize the behavior and performance of a number of validation methods leads to specific recommendations about which metrics are robust enough, and sufficiently orthogonal to existing metrics, to warrant consideration for inclusion in validation reports. This should hasten the community-wide acceptance of these metrics and, eventually, improve the accuracy of all cryo-EM structures.
1. Lawson, C. L. et al. Nat. Methods https://doi.org/10.1038/s41592-020-01051-w (2020).
2. Cheng, Y., Grigorieff, N., Penczek, P. A. & Walz, T. Cell 161, 438–449 (2015).
3. Herzik, M. A. Jr. Nature 587, 39–40 (2020).
4. Prisant, M. G., Williams, C. J., Chen, V. B., Richardson, J. S. & Richardson, D. C. Protein Sci. 29, 315–329 (2020).
Competing interests: A.R. is an employee of Genentech, a subsidiary of Roche, and holds Roche stocks.
Rohou, A. Improving cryo-EM structure validation. Nat Methods 18, 130–131 (2021). https://doi.org/10.1038/s41592-021-01062-1