CryoREAD: de novo structure modeling for nucleic acids in cryo-EM maps using deep learning

Abstract

DNA and RNA play fundamental roles in various cellular processes, where their three-dimensional structures provide information critical to understanding the molecular mechanisms of their functions. Although an increasing number of nucleic acid structures and their complexes with proteins are determined by cryogenic electron microscopy (cryo-EM), structure modeling for DNA and RNA remains challenging, particularly when the map is determined at a resolution coarser than atomic level. Moreover, computational methods for nucleic acid structure modeling are relatively scarce. Here, we present CryoREAD, a fully automated de novo DNA/RNA atomic structure modeling method using deep learning. CryoREAD uses deep learning to identify phosphate, sugar and base positions in a cryo-EM map; these positions are then traced and modeled into a three-dimensional structure. When tested on cryo-EM maps determined at 2.0 to 5.0 Å resolution, CryoREAD built substantially more accurate models than existing methods. We also applied the method to cryo-EM maps of biomolecular complexes in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

Fig. 1: Workflow of CryoREAD.
Fig. 2: Performance of modeling structures of nucleic acids by CryoREAD.
Fig. 3: Examples of modeled atomic structure by CryoREAD for experimental maps from our testing set.
Fig. 4: Atomic structure modeling by CryoREAD for experimental maps from SARS-CoV-2 benchmark.
Fig. 5: Comparison with models by Phenix.

Data availability

The entries of the maps and corresponding structure models used in this study are listed in Supplementary Tables 1 and 4. The experimental EM maps can be downloaded from the EMDB (https://www.emdataresource.org/), and the corresponding experimentally determined structures from the RCSB PDB (https://www.rcsb.org/). The structures modeled by CryoREAD are available at https://doi.org/10.5281/zenodo.8274164. Source data are provided with this paper.

Code availability

The source code of CryoREAD is available at https://github.com/kiharalab/CryoREAD (ref. 41). The webserver is available at https://em.kiharalab.org/algorithm/CryoREAD, where users can upload a map and obtain structure models without installing any software. A Google Colab notebook is also available at https://bit.ly/CryoREAD. A detailed tutorial for CryoREAD is available at https://kiharalab.org/emsuites/cryoread.php.

References

  1. Warner, K. D., Hajdin, C. E. & Weeks, K. M. Principles for targeting RNA with drug-like small molecules. Nat. Rev. Drug Discov. 17, 547–558 (2018).

  2. Huang, P.-S., Boyken, S. E. & Baker, D. The coming of age of de novo protein design. Nature 537, 320–327 (2016).

  3. Churkin, A. et al. Design of RNAs: comparing programs for inverse RNA folding. Brief. Bioinform. 19, 350–358 (2018).

  4. Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).

  5. Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of COOT. Acta Crystallogr. D Biol. Crystallogr. 66, 486–501 (2010).

  6. Liebschner, D. et al. Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix. Acta Crystallogr. D Struct. Biol. 75, 861–877 (2019).

  7. Winn, M. D. et al. Overview of the CCP4 suite and current developments. Acta Crystallogr. D Biol. Crystallogr. 67, 235–242 (2011).

  8. Alnabati, E. & Kihara, D. Advances in structure modeling methods for cryo-electron microscopy maps. Molecules 25, 82 (2020).

  9. Pfab, J., Phan, N. M. & Si, D. DeepTracer for fast de novo cryo-EM protein structure modeling and special studies on CoV-related complexes. Proc. Natl Acad. Sci. USA 118, e2017525118 (2021).

  10. Terashi, G. & Kihara, D. De novo main-chain modeling for EM maps using MAINMAST. Nat. Commun. 9, 1618 (2018).

  11. Maddhuri Venkata Subramaniya, S. R., Terashi, G. & Kihara, D. Protein secondary structure detection in intermediate-resolution cryo-EM maps using deep learning. Nat. Methods 16, 911–917 (2019).

  12. Song, Y. et al. High-resolution comparative modeling with RosettaCM. Structure 21, 1735–1742 (2013).

  13. Emsley, P. & Cowtan, K. COOT: model-building tools for molecular graphics. Acta Crystallogr. D Biol. Crystallogr. 60, 2126–2132 (2004).

  14. Pettersen, E. F. et al. UCSF Chimera—a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).

  15. Schlick, T. & Pyle, A. M. Opportunities and challenges in RNA structural modeling and design. Biophys. J. 113, 225–234 (2017).

  16. Keating, K. S. & Pyle, A. M. RCrane: semi-automated RNA model building. Acta Crystallogr. D Biol. Crystallogr. 68, 985–995 (2012).

  17. Kappel, K. et al. Accelerated cryo-EM-guided determination of three-dimensional RNA-only structures. Nat. Methods 17, 699–707 (2020).

  18. Huang, H. et al. UNet 3+: a full-scale connected UNet for medical image segmentation. in 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 1055–1059 (IEEE, 2020).

  19. Ronneberger, O., Fischer, P. & Brox, T. U-Net: convolutional networks for biomedical image segmentation. in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 234–241 (Springer, 2015).

  20. Carreira-Perpinan, M. A. Acceleration strategies for Gaussian mean-shift image segmentation. in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) 1160–1167 (IEEE, 2006).

  21. Psaraftis, H. N. Dynamic vehicle routing problems. Veh. Routing Methods Stud. 16, 223–248 (1988).

  22. Rossi, F., Van Beek, P. & Walsh, T. Handbook of Constraint Programming (Elsevier, 2006).

  23. Afonine, P. V. et al. Real-space refinement in PHENIX for cryo-EM and crystallography. Acta Crystallogr. D Biol. Crystallogr. 74, 531–544 (2018).

  24. Wang, X. et al. Detecting protein and DNA/RNA structures in cryo-EM maps of intermediate resolution using deep learning. Nat. Commun. 12, 2302 (2021).

  25. Kim, M.-S. et al. Cracking the DNA code for V(D)J recombination. Mol. Cell 70, 358–370 (2018).

  26. Grimm, C. et al. Structural basis of poxvirus transcription: vaccinia RNA polymerase complexes. Cell 179, 1537–1550 (2019).

  27. Li, S. et al. Structural basis of amino acid surveillance by higher-order tRNA–mRNA interactions. Nat. Struct. Mol. Biol. 26, 1094–1105 (2019).

  28. Nikolay, R. et al. Snapshots of native pre-50S ribosomes reveal a biogenesis factor network and evolutionary specialization. Mol. Cell 81, 1200–1215 (2021).

  29. Shi, M. et al. SARS-CoV-2 Nsp1 suppresses host but not viral translation through a bipartite mechanism. Preprint at BioRxiv https://doi.org/10.1101/2020.09.18.302901 (2020).

  30. Schubert, K. et al. SARS-CoV-2 Nsp1 binds the ribosomal mRNA channel to inhibit translation. Nat. Struct. Mol. Biol. 27, 959–966 (2020).

  31. Thoms, M. et al. Structural basis for translational shutdown and immune evasion by the Nsp1 protein of SARS-CoV-2. Science 369, 1249–1255 (2020).

  32. Naydenova, K. et al. Structure of the SARS-CoV-2 RNA-dependent RNA polymerase in the presence of favipiravir-RTP. Proc. Natl Acad. Sci. USA 118, e2021946118 (2021).

  33. Wang, Q. et al. Structural basis for RNA replication by the SARS-CoV-2 polymerase. Cell 182, 417–428 (2020).

  34. Chen, J. et al. Structural basis for helicase-polymerase coupling in the SARS-CoV-2 replication-transcription complex. Cell 182, 1560–1573 (2020).

  35. Terwilliger, T. C., Adams, P. D., Afonine, P. V. & Sobolev, O. V. A fully automatic method yielding initial models from high-resolution cryo-electron microscopy maps. Nat. Methods 15, 905–908 (2018).

  36. Sudre, C. H., Li, W., Vercauteren, T., Ourselin, S. & Jorge Cardoso, M. Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. in Deep learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support 240–248 (Springer, 2017).

  37. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. in Proceedings of International Conference on Learning Representations (2015).

  38. Fukunaga, K. & Hostetler, L. The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Trans. Inform. Theory 21, 32–40 (1975).

  39. Toth, P. & Vigo, D. The Vehicle Routing Problem (SIAM, 2002).

  40. Lorenz, R. et al. ViennaRNA Package 2.0. Algorithms Mol. Biol. 6, 26 (2011).

  41. Wang, X., Terashi, G. & Kihara, D. CryoREAD: de novo structure modeling for nucleic acids in cryo-EM maps using deep learning. Zenodo. https://doi.org/10.5281/zenodo.8274181

Acknowledgements

The authors thank J. C. Verburgt, H. Kannan, A. Jain and C. Christoffer for their help with literature search, discussion and proofreading. The authors also thank J. A. Nash, S. Ellis and J. Chen for their suggestions on optimizing the released software. This work was partly supported by the National Institutes of Health (R01GM133840, 3R01 GM133840-02S1) and the National Science Foundation (DMS2151678, DBI2003635, CMMI1825941, MCB2146026 and MCB1925643). X.W. is a recipient of the MolSSI graduate fellowship.

Author information

Contributions

D.K. conceived the study. X.W. designed and implemented CryoREAD and computed results. G.T. designed the core strategy of molecular structure building pipeline and participated in implementing the algorithm. All the authors analyzed the results. X.W. drafted the manuscript and D.K. edited it. All the authors read and approved the manuscript.

Corresponding author

Correspondence to Daisuke Kihara.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Allison Doerr, in collaboration with the Nature Methods team. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 The detailed network architecture of CryoREAD.

a, The network architecture. The entire network consists of two stages of U-Net networks; shown here is the stage 1 network, which concatenates two U-Net architectures. Each is a three-dimensional (3D) U-shaped convolutional network (U-Net) with full-scale skip connections and deep supervision. The channel sizes of the different layers are also illustrated. b, The Encoder Block (Enc1 in panel a); c, the Merge Encoder Block (MEnc); and d, the Decoder Block (Dec). Conv3D, a 3D convolutional layer with a filter size of 3×3×3, stride 1 and padding 1. BatchNorm, a normalization layer that uses statistics computed over a batch to normalize the input data. ReLU, rectified linear unit, a commonly used activation layer. The stage 1 network is a cascaded U-Net: the first U-Net (left) focuses on high-level detection of sugar, phosphate, base and protein, while the second U-Net (right) focuses on predicting the base types A, C, G and T/U. The processed information from the encoder of the first U-Net is also passed as input to the second U-Net to help its predictions (dashed orange lines). We applied deep supervision to the outputs of the different decoder levels, which was shown to improve performance. The stage 2 network includes only the first U-Net architecture of the stage 1 network. It takes the predicted probabilities from the stage 1 network over a 64×64×64 Å³ box (8 probabilities: protein, phosphate, sugar, base and the four base types) and outputs refined probabilities in a box of 8×64×64×64 Å³.
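
For illustration, the Conv3D, BatchNorm and ReLU layers of the Encoder Block can be sketched in NumPy as follows. This is a minimal sketch, not the CryoREAD implementation; the function names, the 16 output channels and the small 16³ box are hypothetical choices for a quick run, and the learnable BatchNorm scale and shift are omitted. It shows why a 3×3×3 filter with stride 1 and padding 1 preserves the spatial size of its input, consistent with the stage 2 network returning refined probabilities on a box of the same 64³ size as its 8-channel input.

```python
import numpy as np


def conv3d_same(x, weight, bias):
    """3x3x3 convolution with stride 1 and zero padding 1.

    x: (N, C_in, D, H, W); weight: (C_out, C_in, 3, 3, 3); bias: (C_out,).
    As in deep-learning frameworks, this is cross-correlation (the kernel is not flipped).
    The padding of 1 on each side keeps D, H and W unchanged.
    """
    n, _, d, h, w = x.shape
    xp = np.pad(x, ((0, 0), (0, 0), (1, 1), (1, 1), (1, 1)))  # zero-pad spatial axes only
    out = np.zeros((n, weight.shape[0], d, h, w))
    for i in range(3):
        for j in range(3):
            for k in range(3):
                # Volume shifted by one kernel offset, mixed across channels by weight[:, :, i, j, k].
                window = xp[:, :, i:i + d, j:j + h, k:k + w]
                out += np.einsum("oc,ncdhw->nodhw", weight[:, :, i, j, k], window)
    return out + bias[None, :, None, None, None]


def batch_norm(x, eps=1e-5):
    """Normalize each channel with mean and variance over the batch and spatial axes
    (training-mode statistics; learnable scale and shift omitted)."""
    mean = x.mean(axis=(0, 2, 3, 4), keepdims=True)
    var = x.var(axis=(0, 2, 3, 4), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def conv_bn_relu(x, weight, bias):
    """Conv3D -> BatchNorm -> ReLU."""
    return np.maximum(batch_norm(conv3d_same(x, weight, bias)), 0.0)


rng = np.random.default_rng(0)
probs = rng.random((1, 8, 16, 16, 16))            # 8 input channels on a toy 16^3 box
weight = rng.normal(size=(16, 8, 3, 3, 3)) * 0.1  # 16 output channels (hypothetical)
features = conv_bn_relu(probs, weight, np.zeros(16))
print(features.shape)  # (1, 16, 16, 16, 16)
```

A loop over the 27 kernel offsets keeps the padding logic explicit; a framework would use an optimized convolution kernel instead.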

Extended Data Fig. 2 The running time of CryoREAD on 11 structures of different sizes.

The experiments were carried out on a server with one NVIDIA TITAN RTX 24 GB GPU and 24 CPUs. The five colors correspond to the five steps of the CryoREAD pipeline: (1) structure detection by deep learning; (2) representative node clustering; (3) backbone tracing; (4) sequence assignment; (5) full-atom model building. The data points for the 11 maps are shown as dots.

Extended Data Fig. 3 The distribution of the size of nucleic acids in the testing set.

The test set includes 68 cryo-EM maps. The x-axis shows the resolution from 2.0 to 5.0 Å and the y-axis denotes the number of nucleotides in each map, ranging from 57 to 4,286.

Extended Data Fig. 4 Grid-level detection accuracy (recall) of 8 structural classes.

Each grid point was assigned the structure class of a structure within 2 Å of it. If multiple different structures were within 2 Å, the closest one was assigned to the grid point. A detection by the deep network at a grid point was considered correct if the predicted probability of the correct structure class exceeded 0.5. pho, phosphate. Results of the stage 1 and stage 2 networks are shown. The statistics are calculated over n = 68 independent experimental EM maps, with each point's value derived from Supplementary Table 2. For stage 1, the values of minima, maxima, center, bounds of box and whiskers of the different categories, in order: Sugar(0.333,0.893,0.729,0.601/0.804,0.333/0.893), Phos(0.138,0.849,0.656,0.463/0.754,0.138/0.849), Base(0.380,0.947,0.836,0.772/0.889,0.669/0.947), Protein(0.349,0.929,0.808,0.763/0.879,0.621/0.929), A-Base(0.034,0.890,0.549,0.362/0.746,0.034/0.890), U/T-Base(0.019,0.797,0.460,0.294/0.646,0.019/0.797), C-Base(0.088,0.886,0.539,0.369/0.711,0.088/0.886), G-Base(0.153,0.933,0.637,0.490/0.823,0.153/0.933), Overall(0.438,0.909,0.764,0.685/0.822,0.478/0.909). For stage 2, the values of minima, maxima, center, bounds of box and whiskers of the different categories, in order: Sugar(0.357,0.920,0.775,0.654/0.846,0.453/0.920), Phos(0.228,0.883,0.693,0.509/0.810,0.228/0.883), Base(0.408,0.952,0.858,0.795/0.905,0.695/0.952), Protein(0.490,0.969,0.903,0.867/0.936,0.772/0.969), A-Base(0.013,0.893,0.501,0.296/0.745,0.013/0.893), U/T-Base(0.024,0.824,0.479,0.291/0.682,0.024/0.824), C-Base(0.104,0.920,0.611,0.449/0.775,0.104/0.920), G-Base(0.237,0.951,0.737,0.609/0.858,0.237/0.951), Overall(0.492,0.952,0.827,0.756/0.878,0.577/0.952). For the moiety-level accuracy, see Fig. 2 in the main text.
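
The grid labeling and recall criterion in this legend can be sketched as follows. This is a minimal pure-Python sketch under stated assumptions: structures are represented by labeled atom coordinates, distances are Euclidean in Å, and grid points with no atom within 2 Å are treated as unlabeled background. The function names and data layout are hypothetical, and because the legend does not say how the "Overall" value pools the classes, that aggregate is not reproduced.

```python
import math


def label_grid(grid_points, atoms, cutoff=2.0):
    """Label each grid point with the class of its closest atom lying within `cutoff` Å.

    grid_points: list of (x, y, z) tuples.
    atoms: list of ((x, y, z), class_name) tuples, e.g. "sugar", "phosphate", "base", "protein".
    Returns one label per grid point; None means no atom within the cutoff.
    """
    labels = []
    for point in grid_points:
        best_class, best_dist = None, cutoff
        for position, class_name in atoms:
            dist = math.dist(point, position)
            if dist < best_dist:  # strictly within the cutoff and closer than any earlier hit
                best_class, best_dist = class_name, dist
        labels.append(best_class)
    return labels


def class_recall(labels, probabilities, class_name, threshold=0.5):
    """Fraction of grid points labeled `class_name` whose predicted probability
    for that class exceeds `threshold` (0.5 in the legend).

    probabilities: list of dicts {class_name: probability}, aligned with labels.
    """
    hits = [probs[class_name] > threshold
            for label, probs in zip(labels, probabilities) if label == class_name]
    return sum(hits) / len(hits) if hits else float("nan")


grid = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
atoms = [((0.0, 0.0, 0.5), "phosphate"), ((1.2, 0.0, 0.0), "sugar")]
labels = label_grid(grid, atoms)  # ['phosphate', 'sugar', None]
probs = [{"phosphate": 0.7, "sugar": 0.2},
         {"phosphate": 0.1, "sugar": 0.4},
         {"phosphate": 0.0, "sugar": 0.0}]
print(class_recall(labels, probs, "phosphate"), class_recall(labels, probs, "sugar"))  # 1.0 0.0
```

The second grid point lies within 2 Å of both atoms, so the tie-break to the closer atom (the sugar) is what decides its label.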

Extended Data Fig. 5 Nucleotide moiety-based accuracy relative to the resolution.

A nucleotide moiety was considered correctly detected if the majority of its atoms were correctly detected. The data are the same as those used for Fig. 2a. a. Base detection accuracy relative to the map resolution. The equation of the regression line is y = −0.032x + 1.019 (Pearson correlation coefficient: −0.256, p-value: 0.035, standard error: 0.015). b. Moiety-based accuracy of detecting 2-ring bases (A/G). An A or G base detected as either A or G was counted as a correct detection. The equation of the regression line is y = −0.116x + 1.239 (Pearson correlation coefficient: −0.582, p-value: 1.925e-7, standard error: 0.020). c. Accuracy of detecting 1-ring bases (U/T/C). The equation of the regression line is y = −0.191x + 1.412 (Pearson correlation coefficient: −0.658, p-value: 1.109e-9, standard error: 0.027). d. Accuracy of detecting Adenine (A). The equation of the regression line is y = −0.312x + 1.712 (Pearson correlation coefficient: −0.758, p-value: 6.951e-14, standard error: 0.033). e. Accuracy of detecting Uracil/Thymine (U/T). The equation of the regression line is y = −0.277x + 1.561 (Pearson correlation coefficient: −0.754, p-value: 1.098e-13, standard error: 0.030). f. Accuracy of detecting Cytosine (C). The equation of the regression line is y = −0.206x + 1.423 (Pearson correlation coefficient: −0.679, p-value: 1.891e-10, standard error: 0.027). g. Accuracy of detecting Guanine (G). The equation of the regression line is y = −0.094x + 1.158 (Pearson correlation coefficient: −0.446, p-value: 1.367e-4, standard error: 0.023).
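
The moiety-level rule (a majority of atoms correct) and the pooled 2-ring criterion of panel b can be sketched as below. Assumptions, labeled as such: per-atom correctness flags are taken as given, a tie (exactly half of the atoms correct) counts as not detected, and the 1-ring grouping of panel c is assumed to mirror the 2-ring rule. All names are hypothetical.

```python
PURINES = {"A", "G"}           # 2-ring bases
PYRIMIDINES = {"C", "U", "T"}  # 1-ring bases


def moiety_detected(atom_flags):
    """A moiety is detected if a strict majority of its atoms were detected correctly.
    atom_flags: iterable of booleans, one per atom in the moiety (assumed given)."""
    flags = list(atom_flags)
    return sum(flags) > len(flags) / 2  # a tie counts as not detected (assumption)


def moiety_accuracy(moieties):
    """Fraction of moieties detected; `moieties` is a list of per-atom flag lists."""
    return sum(moiety_detected(m) for m in moieties) / len(moieties)


def ring_class_correct(true_base, predicted_base):
    """Panel b style check: an A or G counts as correct if predicted as either A or G.
    The 1-ring (U/T/C) version is assumed to be analogous."""
    group = PURINES if true_base in PURINES else PYRIMIDINES
    return predicted_base in group


print(moiety_accuracy([[True, True, False], [True, False], [False, False, True]]))
print(ring_class_correct("A", "G"), ring_class_correct("C", "A"))  # True False
```

Pooling A with G (and U/T with C) asks only whether the number of rings was recognized, which is an easier question than the exact base type in panels d to g.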

Extended Data Fig. 6 Correlation between sequence recall and sequence recall (match).

Sequence recall (match) only considers nucleotides in the reference structure that have a corresponding nucleotide in the model (an average atom pair distance of less than 5 Å).
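
The difference between the two measures can be sketched as follows, reading "sequence recall" as the fraction of all reference nucleotides whose base is recovered and "sequence recall (match)" as the same count divided only by the reference nucleotides that have a corresponding model nucleotide. That reading of plain sequence recall is an interpretation, since this legend defines only the match variant; the function name and input layout are hypothetical.

```python
def sequence_recalls(reference_outcomes):
    """reference_outcomes: one (has_match, same_base) tuple per reference nucleotide.

    has_match: a model nucleotide lies within 5 Å (average atom pair distance).
    same_base: that model nucleotide carries the same base type.
    Returns (sequence recall, sequence recall (match)).
    """
    correct = sum(1 for has_match, same in reference_outcomes if has_match and same)
    matched = sum(1 for has_match, _ in reference_outcomes if has_match)
    recall = correct / len(reference_outcomes)                     # over all reference nucleotides
    recall_match = correct / matched if matched else float("nan")  # over matched ones only
    return recall, recall_match


print(sequence_recalls([(True, True), (True, False), (False, False), (True, True)]))
```

Because unmatched nucleotides drop out of the denominator, the match variant is never lower than the plain recall, and the two diverge most for maps where large parts of the reference are not traced.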

Extended Data Fig. 7 Sequence match relative to the map resolution.

To compute sequence match, we first identified, for each nucleotide in the reference structure, the corresponding nucleotide in the model as the one with the closest average atom distance, and then checked whether the two bases are identical. Sequence match only considers nucleotides in the reference structure that have a corresponding nucleotide in the model (an average atom pair distance of less than 5 Å). In this figure, we compare the sequence match of the initial assignment with that after sequence alignment. The initial assignment uses the base type obtained from the base predictions at the base nodes of the atomic structure being built. This initial assignment differs from the base moiety accuracy reported in Fig. 2a and Extended Data Fig. 5: those figures concern the initial grid-based accuracy of base detection by deep learning, whereas the initial sequence assignment here measures the accuracy of the base assignment in the modeled tertiary structure, where base positions are determined in consideration of other atoms of the nucleic acid, including phosphate and sugar positions. Seq match refers to the base types reassigned by sequence assignment to the backbone paths. a. Overall sequence match. For initial assignment, the equation of the regression line is y = −0.125x + 0.984 (Pearson correlation coefficient: −0.782, p-value: 3.380e-15, standard error: 0.012). For seq match, the equation of the regression line is y = −0.110x + 0.997 (Pearson correlation coefficient: −0.684, p-value: 1.230e-10, standard error: 0.014). b. Sequence match of Adenine (A) relative to the map resolution. For initial assignment, the equation of the regression line is y = −0.169x + 1.045 (Pearson correlation coefficient: −0.684, p-value: 1.307e-11, standard error: 0.022). For seq match, the equation of the regression line is y = −0.143x + 1.040 (Pearson correlation coefficient: −0.591, p-value: 1.127e-7, standard error: 0.024). c. Sequence match of Uracil/Thymine (U/T). For initial assignment, the equation of the regression line is y = −0.137x + 1.018 (Pearson correlation coefficient: −0.671, p-value: 3.771e-10, standard error: 0.019). For seq match, the equation of the regression line is y = −0.132x + 1.042 (Pearson correlation coefficient: −0.680, p-value: 1.881e-10, standard error: 0.018). d. Sequence match of Cytosine (C). For initial assignment, the equation of the regression line is y = −0.072x + 0.873 (Pearson correlation coefficient: −0.482, p-value: 3.141e-8, standard error: 0.016). For seq match, the equation of the regression line is y = −0.074x + 0.911 (Pearson correlation coefficient: −0.452, p-value: 1.095e-4, standard error: 0.018). e. Sequence match of Guanine (G). For initial assignment, the equation of the regression line is y = −0.114x + 0.997 (Pearson correlation coefficient: −0.578, p-value: 2.381e-7, standard error: 0.020). For seq match, the equation of the regression line is y = −0.100x + 1.012 (Pearson correlation coefficient: −0.595, p-value: 9.017e-8, standard error: 0.017).
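
The correspondence step and the sequence match computation described in this legend can be sketched as follows. Assumptions: the "average atom pair distance" is taken over atoms that share a name in both nucleotides (for example P and C4'), and nucleotides are plain dicts. The data layout and function names are hypothetical; this is not CryoREAD's evaluation code.

```python
import math


def avg_atom_distance(res_a, res_b):
    """Mean distance between same-named atoms present in both nucleotides (assumption).

    Each nucleotide is a dict such as {"base": "A", "atoms": {"P": (x, y, z), "C4'": (x, y, z)}}.
    """
    shared = res_a["atoms"].keys() & res_b["atoms"].keys()
    if not shared:
        return math.inf
    total = sum(math.dist(res_a["atoms"][name], res_b["atoms"][name]) for name in shared)
    return total / len(shared)


def sequence_match(reference, model, cutoff=5.0):
    """Fraction of matched reference nucleotides whose base equals that of the closest
    model nucleotide; a match requires an average atom distance below `cutoff` Å."""
    matched = identical = 0
    for ref in reference:
        distances = [(avg_atom_distance(ref, m), m) for m in model]
        if not distances:
            break  # empty model: nothing can match
        best_dist, best = min(distances, key=lambda pair: pair[0])
        if best_dist >= cutoff:
            continue  # no corresponding nucleotide in the model
        matched += 1
        if ref["base"] == best["base"]:
            identical += 1
    return identical / matched if matched else float("nan")


reference = [
    {"base": "A", "atoms": {"P": (0.0, 0.0, 0.0), "C4'": (1.0, 0.0, 0.0)}},
    {"base": "G", "atoms": {"P": (10.0, 0.0, 0.0), "C4'": (11.0, 0.0, 0.0)}},
    {"base": "C", "atoms": {"P": (50.0, 0.0, 0.0)}},
]
model = [
    {"base": "A", "atoms": {"P": (0.5, 0.0, 0.0), "C4'": (1.5, 0.0, 0.0)}},
    {"base": "U", "atoms": {"P": (10.0, 1.0, 0.0), "C4'": (11.0, 1.0, 0.0)}},
]
print(sequence_match(reference, model))  # 0.5
```

In the usage example, the third reference nucleotide has no model nucleotide within 5 Å, so it is excluded rather than counted as a mismatch.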

Extended Data Fig. 8 The number of atom clashes before and after the structure refinement.

An atom clash is defined as a pair of heavy atoms closer than 3.0 Å. The line shown is y = x.
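
Counting clashes under this definition can be sketched as below. Applied literally, a 3.0 Å cutoff would also count covalently bonded heavy atoms (roughly 1.4 to 1.6 Å apart) and atoms two bonds apart, so the sketch accepts an optional set of index pairs to ignore; that exclusion, the input layout and the function name are assumptions, not details from the legend. The all-pairs loop is quadratic in the number of atoms, which is fine for illustration; a spatial index such as a k-d tree is the usual choice for large ribosomal models.

```python
import math
from itertools import combinations


def count_clashes(atoms, cutoff=3.0, ignore=frozenset()):
    """Count pairs of heavy atoms closer than `cutoff` Å.

    atoms: list of (element, (x, y, z)); hydrogens ("H") are skipped.
    ignore: set of (i, j) index pairs with i < j to leave out, e.g. bonded atoms (assumption).
    """
    heavy = [(i, xyz) for i, (element, xyz) in enumerate(atoms) if element.upper() != "H"]
    return sum(
        1
        for (i, a), (j, b) in combinations(heavy, 2)
        if (i, j) not in ignore and math.dist(a, b) < cutoff
    )


atoms = [("C", (0.0, 0.0, 0.0)), ("O", (1.5, 0.0, 0.0)), ("H", (0.5, 0.0, 0.0)), ("N", (10.0, 0.0, 0.0))]
print(count_clashes(atoms))                   # 1
print(count_clashes(atoms, ignore={(0, 1)}))  # 0
```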

Extended Data Fig. 9 Examples of modeled atomic structure by CryoREAD for experimental maps without full atomic structures.

Detailed evaluation results are shown in Supplementary Table 3. From left to right, the three columns show (1) the EM map and its corresponding structure; (2) only the RNA structures in the map, where, in addition to the RNA structure modeled by the authors, we also show homologous structures of the RNAs missing from the map, which we identified with BLAST¹; and (3) the atomic structure model by CryoREAD. a. The initial Shwachman-Bodian-Diamond syndrome (SBDS) protein closed state of the nascent 60S ribosomal subunit (EMD-3145, PDB 5AN9, Resolution: 3.3 Å; protein lengths: 1905 aa; RNA length: 1162 nt): Backbone recall: 0.888; Sequence match: 0.696. Identified homologous structure (PDB 5XXB, RNA length: 3352 nt, Sequence Identity: 84.3%, RMSD: 1.2 Å): Backbone recall: 0.832. b. The SBDS open state of the nascent 60S ribosomal subunit (EMD-3146, PDB 5ANB, Resolution: 4.1 Å; protein lengths: 3025 aa; RNA length: 1162 nt): Backbone recall: 0.871; Sequence match: 0.544. Identified homologous RNA structure (PDB 5XXB, RNA length: 3352 nt, Sequence Identity: 84.3%, RMSD: 1.3 Å): Backbone recall: 0.821. c. The ELF1 accommodated state of the nascent 60S ribosomal subunit (EMD-3147, PDB 5ANC, Resolution: 4.2 Å; protein lengths: 2801 aa; RNA length: 1162 nt): Backbone recall: 0.881; Sequence match: 0.535. Identified homologous structure (PDB 5XXB, RNA length: 3352 nt, Sequence Identity: 84.3%, RMSD: 1.3 Å): Backbone recall: 0.804. d. TnaC-stalled ribosome complex with the titin I27 domain folding close to the ribosomal exit tunnel (EMD-0322, PDB 6I0Y, Resolution: 3.2 Å; protein lengths: 3552 aa; RNA length: 3049 nt): Backbone recall: 0.901; Sequence match: 0.632. Identified homologous RNA structure (PDB 7D80, RNA length: 4761 nt, Sequence Identity: 100%, RMSD: 1.2 Å): Backbone recall: 0.834. e. RNC-SRP-SR complex early state (EMD-8000, PDB 5GAD, Resolution: 3.7 Å; protein lengths: 4087 aa; RNA length: 3049 nt): Backbone recall: 0.914; Sequence match: 0.655. Identified homologous structure (PDB 7D80, RNA length: 4761 nt, Sequence Identity: 100%, RMSD: 1.0 Å): Backbone recall: 0.847. f. Structure of the 40S ABCE1 post-splitting complex (EMD-4071, PDB 5LL6, Resolution: 3.9 Å; protein lengths: 3429 aa; RNA length: 1325 nt): Backbone recall: 0.856; Sequence match: 0.586. Identified homologous structure (PDB 7OSM, RNA length: 1740 nt, Sequence Identity: 99%, RMSD: 1.0 Å): Backbone recall: 0.784.

In Extended Data Fig. 9, we show cases where only part of the structures in an EM map was modeled by the authors. The maps come from three different sets of EM maps of ribosomes and ribosomal subunits. The first set (panels a–c) includes three different states of eIF6 release from the nascent 60S ribosomal subunit² of Dictyostelium discoideum, where only part of the 26S ribosomal unit was modeled by the authors. The second set (panels d–e) presents two different forms of the 70S ribosome³ of Escherichia coli, where the authors modeled only the 50S ribosomal subunit and left the 30S ribosomal subunit unmodeled. The third example (panel f) is the 40S ribosomal subunit of Saccharomyces cerevisiae, where only part of the 18S ribosomal RNA was modeled. We filled in the missing RNA structures in the maps with homologous RNA structures found by a BLAST¹ search against the PDB. Sequence identities of the identified RNAs ranged from 84.3% to 100%. CryoREAD models for the missing RNA structures had backbone recall of 0.784 to 0.847 when the homologous structures were used as the reference. Backbone recall of CryoREAD models for RNAs with the authors' model ranged from 0.856 to 0.914.

Extended Data Fig. 10 Structure model evaluation on the 68 experimental EM maps with Phenix.

a, The number of nucleotides modeled by Phenix map_to_model and CryoREAD. For Phenix, two models were generated: one built from map regions predicted to contain nucleic acid atoms (Phenix (Mask), blue) and one built from the entire map (Phenix, orange). b, Comparison of backbone atom/sequence recall of Phenix (Mask) and Phenix. c, Backbone atom recall of Phenix (Mask), Phenix and CryoREAD relative to map resolution. For CryoREAD, the equation of the regression line is y = −0.042x + 1.012 (Pearson correlation coefficient: −0.320, p-value: 0.008, standard error: 0.015). For Phenix (Mask), the equation of the regression line is y = −0.091x + 0.877 (Pearson correlation coefficient: −0.445, p-value: 1.456e-4, standard error: 0.023). For Phenix, the equation of the regression line is y = −0.108x + 0.839 (Pearson correlation coefficient: −0.492, p-value: 2.006e-5, standard error: 0.023). d, Sequence recall of Phenix (Mask), Phenix and CryoREAD relative to map resolution. For CryoREAD, the equation of the regression line is y = −0.117x + 0.961 (Pearson correlation coefficient: −0.632, p-value: 7.280e-9, standard error: 0.017). For Phenix (Mask), the equation of the regression line is y = −0.064x + 0.444 (Pearson correlation coefficient: −0.550, p-value: 1.190e-6, standard error: 0.012). For Phenix, the equation of the regression line is y = −0.063x + 0.408 (Pearson correlation coefficient: −0.525, p-value: 4.254e-6, standard error: 0.013).
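
Each regression line in these legends is reported with a slope, intercept, Pearson correlation coefficient, p-value and standard error, which matches the output of an ordinary least-squares fit such as `scipy.stats.linregress`. Whether the authors used that function, and whether "standard error" refers to the slope, are assumptions. The pure-Python sketch below computes the slope, intercept, Pearson r and standard error of the slope; the p-value needs the t distribution (for example via `scipy.stats.linregress`) and is omitted. `fit_line` and `line_value` are hypothetical names.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, Pearson r, standard error of the slope).
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points for a slope standard error")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    # Residual sum of squares with n - 2 degrees of freedom (two fitted parameters).
    rss = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    stderr = math.sqrt(rss / (n - 2) / sxx)
    return slope, intercept, r, stderr


# Backbone-recall lines reported in panel c: (slope, intercept) versus resolution in Å.
BACKBONE_RECALL_LINES = {
    "CryoREAD": (-0.042, 1.012),
    "Phenix (Mask)": (-0.091, 0.877),
    "Phenix": (-0.108, 0.839),
}


def line_value(slope, intercept, resolution):
    """Value of a fitted line at a given resolution (a trend, not a measured recall)."""
    return slope * resolution + intercept


for method, (m, b) in BACKBONE_RECALL_LINES.items():
    print(f"{method}: {line_value(m, b, 3.0):.3f}")
```

Evaluated at 3.0 Å, the reported backbone-recall lines give about 0.886 for CryoREAD, 0.604 for Phenix (Mask) and 0.515 for Phenix; these are values of the fitted trends, not measured recalls for any particular map.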

Supplementary information

Supplementary Information

Supplementary Figs. 1–4 and legends for Supplementary Tables 1–7.

Reporting Summary

Peer Review File

Supplementary Tables 1–7.

Source data

Source Data

Source data for Figs. 2–5 and Extended Data Figs. 3–7, 9 and 10.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, X., Terashi, G. & Kihara, D. CryoREAD: de novo structure modeling for nucleic acids in cryo-EM maps using deep learning. Nat Methods 20, 1739–1747 (2023). https://doi.org/10.1038/s41592-023-02032-5

  • DOI: https://doi.org/10.1038/s41592-023-02032-5
