Residue-wise local quality estimation for protein models from cryo-EM maps

Abstract

An increasing number of protein structures are being determined by cryogenic electron microscopy (cryo-EM). Although the resolution of determined cryo-EM density maps is improving in general, there are still many cases where amino acids of a protein are assigned with different levels of confidence. Here we developed a method that identifies potential misassignment of residues in the map, including residue shifts along an otherwise correct main-chain trace. The score, named DAQ, computes the likelihood that the local density corresponds to different amino acids, atoms, and secondary structures, estimated via deep learning, and assesses the consistency of the amino acid assignment in the protein structure model with that likelihood. When DAQ was applied to different versions of model structures in the Protein Data Bank that were derived from the same density maps, a clear improvement in the DAQ score was observed in the newer versions of the models. DAQ also found potential misassignment errors in a substantial number of deposited protein structure models built into cryo-EM maps.
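The consistency check described in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' implementation: it assumes that per-residue amino acid probabilities have already been estimated from the map (here they are hand-written toy values), and it assumes a log-ratio form in which the probability of the assigned amino acid at a position is compared with the average probability of that amino acid over all positions. The function names, the three-class toy alphabet, and the smoothing window are all hypothetical choices made for illustration.

```python
import math


def consistency_scores(probs, assigned, eps=1e-9):
    """Log-ratio consistency score per residue (illustrative assumption).

    probs:    list of rows, one per residue position; row[k] is the estimated
              probability that the local density is amino acid type k.
    assigned: amino acid type index assigned to each position in the model.
    """
    n = len(probs)
    scores = []
    for i, aa in enumerate(assigned):
        # Average probability of this amino acid type over all positions,
        # used as a background level for normalization.
        background = sum(row[aa] for row in probs) / n
        scores.append(math.log((probs[i][aa] + eps) / (background + eps)))
    return scores


def window_average(scores, half_width=1):
    """Smooth scores along the chain with a simple moving average."""
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half_width)
        hi = min(len(scores), i + half_width + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


# Toy example: three positions, three amino acid types (hypothetical values).
toy_probs = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
correct = consistency_scores(toy_probs, [0, 1, 2])   # assignment agrees with map
shifted = consistency_scores(toy_probs, [1, 2, 0])   # sequence shifted by one

print([round(s, 3) for s in correct])
print([round(s, 3) for s in shifted])
print([round(s, 3) for s in window_average(shifted)])
```

In this toy setting the correct assignment gives positive scores at every position, while the shifted assignment gives negative scores, which is the kind of signal a residue shift along an otherwise correct main-chain trace would be expected to produce.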

Fig. 1: Overview of DAQ.
Fig. 2: Comparison of DAQ scores between first and revised protein models in the same PDB entry.
Fig. 3: Analysis of the DAQ score distribution for PDB entry 7JSN-B (EMD-22458).
Fig. 4: DAQ score analysis of misaligned residues in the PDBNR90 dataset.
Fig. 5: Analysis of 4,485 non-redundant PDB chain models in PDBNR1Å by DAQ score.

Data availability

The lists of IDs of PDB and EMDB entries used in the datasets are provided in Supplementary Tables 2, 3, 5, and 6.

Code availability

The DAQ program is freely available for academic use from GitHub at https://github.com/kiharalab/DAQ. The program can also be run on Google Colab at https://bit.ly/daq-score.

Acknowledgements

This work was partly supported by the National Institutes of Health (R01GM133840, R01GM123055, and 3R01GM133840-02S1 to D.K.; R01CA254402, R01CA221289, and R01HL071818 to J.J.G.T.); the National Science Foundation (CMMI1825941, MCB1925643, DBI2003635, and DBI2146026) to D.K.; and the Walther Foundation for Cancer Research to J.J.G.T.

Author information

Contributions

J.J.G.T. and D.K. conceived the study. G.T. designed and implemented the DAQ score. X.W. coded and trained Emap2sec+ and computed probability values of structure features for cryo-EM maps. S.R.M.V.S. participated in coding Emap2sec+. G.T. and X.W. constructed datasets. G.T. and X.W. performed the computation and G.T., D.K., X.W., and J.J.G.T. analyzed the data. J.J.G.T. examined individual examples of potentially misassigned models. G.T. drafted the manuscript and J.J.G.T. and D.K. edited it. All the authors read and approved the manuscript.

Corresponding author

Correspondence to Daisuke Kihara.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Grigore Pintilie, Alexis Rohou, and Carlos Óscar Sánchez-Sorzano for their contribution to the peer review of this work. Primary Handling Editor: Rita Strack, in collaboration with the Nature Methods team. Peer reviewer reports are available.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs 1–12 and Supplementary Tables 1–9.

Reporting Summary

Peer Review File

Supplementary Table

Supplementary Tables 1, 2, 4, 6, and 7

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Terashi, G., Wang, X., Maddhuri Venkata Subramaniya, S.R. et al. Residue-wise local quality estimation for protein models from cryo-EM maps. Nat Methods 19, 1116–1125 (2022). https://doi.org/10.1038/s41592-022-01574-4
