CryoREAD: de novo structure modeling for nucleic acids in cryo-EM maps using deep learning

Wang, Xiao; Terashi, Genki; Kihara, Daisuke

doi:10.1038/s41592-023-02032-5

Article
Published: 02 October 2023

Nature Methods volume 20, pages 1739–1747 (2023)

Abstract

DNA and RNA play fundamental roles in various cellular processes, where their three-dimensional structures provide information critical to understanding the molecular mechanisms of their functions. Although an increasing number of nucleic acid structures and their complexes with proteins are determined by cryogenic electron microscopy (cryo-EM), structure modeling for DNA and RNA remains challenging particularly when the map is determined at a resolution coarser than atomic level. Moreover, computational methods for nucleic acid structure modeling are relatively scarce. Here, we present CryoREAD, a fully automated de novo DNA/RNA atomic structure modeling method using deep learning. CryoREAD identifies phosphate, sugar and base positions in a cryo-EM map using deep learning, which are traced and modeled into a three-dimensional structure. When tested on cryo-EM maps determined at 2.0 to 5.0 Å resolution, CryoREAD built substantially more accurate models than existing methods. We also applied the method to cryo-EM maps of biomolecular complexes in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

**Fig. 2: Performance of modeling structures of nucleic acids by CryoREAD.**

**Fig. 3: Examples of modeled atomic structure by CryoREAD for experimental maps from our testing set.**

**Fig. 4: Atomic structure modeling by CryoREAD for experimental maps from SARS-CoV-2 benchmark.**

**Fig. 5: Comparison with models by Phenix.**

Data availability

The entries of the maps and corresponding structure models utilized in this study are provided in Supplementary Tables 1 and 4. The experimental EM maps utilized can be downloaded from the EMDB (https://www.emdataresource.org/). The corresponding experimentally determined structures can be downloaded from the RCSB (https://www.rcsb.org/). The structures modeled by CryoREAD are available at https://doi.org/10.5281/zenodo.8274164. Source data are provided with this paper.

Code availability

The source code of CryoREAD is available at https://github.com/kiharalab/CryoREAD (ref. ⁴¹). The webserver is available at https://em.kiharalab.org/algorithm/CryoREAD, where users can simply upload a map and obtain the structures without installation. Users can also access the Google Colab notebook at https://bit.ly/CryoREAD. A detailed tutorial for CryoREAD is available at https://kiharalab.org/emsuites/cryoread.php.

References

Warner, K. D., Hajdin, C. E. & Weeks, K. M. Principles for targeting RNA with drug-like small molecules. Nat. Rev. Drug Discov. 17, 547–558 (2018).
Huang, P. -S., Boyken, S. E. & Baker, D. The coming of age of de novo protein design. Nature 537, 320–327 (2016).
Churkin, A. et al. Design of RNAs: comparing programs for inverse RNA folding. Brief. Bioinform. 19, 350–358 (2018).
Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of COOT. Acta Crystallogr. D Biol. Crystallogr. 66, 486–501 (2010).
Liebschner, D. et al. Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix. Acta Crystallogr. D Struct. Biol. 75, 861–877 (2019).
Winn, M. D. et al. Overview of the CCP4 suite and current developments. Acta Crystallogr. D Biol. Crystallogr. 67, 235–242 (2011).
Alnabati, E. & Kihara, D. Advances in structure modeling methods for cryo-electron microscopy maps. Molecules 25, 82 (2020).
Pfab, J., Phan, N. M. & Si, D. DeepTracer for fast de novo cryo-EM protein structure modeling and special studies on CoV-related complexes. Proc. Natl Acad. Sci. USA 118, e2017525118 (2021).
Terashi, G. & Kihara, D. De novo main-chain modeling for EM maps using MAINMAST. Nat. Commun. 9, 1618 (2018).
Maddhuri Venkata Subramaniya, S. R., Terashi, G. & Kihara, D. Protein secondary structure detection in intermediate-resolution cryo-EM maps using deep learning. Nat. Methods 16, 911–917 (2019).
Song, Y. et al. High-resolution comparative modeling with RosettaCM. Structure 21, 1735–1742 (2013).
Emsley, P. & Cowtan, K. COOT: model-building tools for molecular graphics. Acta Crystallogr. D Biol. Crystallogr. 60, 2126–2132 (2004).
Pettersen, E. F. et al. UCSF Chimera—a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).
Schlick, T. & Pyle, A. M. Opportunities and challenges in RNA structural modeling and design. Biophys. J. 113, 225–234 (2017).
Keating, K. S. & Pyle, A. M. RCrane: semi-automated RNA model building. Acta Crystallogr. D Biol. Crystallogr. 68, 985–995 (2012).
Kappel, K. et al. Accelerated cryo-EM-guided determination of three-dimensional RNA-only structures. Nat. Methods 17, 699–707 (2020).
Huang, H. et al. Unet 3+: a full-scale connected unet for medical image segmentation. in 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 1055–1059 (IEEE, 2020).
Ronneberger, O., Fischer, P. & Brox, T. U-Net: convolutional networks for biomedical image segmentation. in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 234–241 (Springer, 2015).
Carreira-Perpinan, M. A. Acceleration strategies for Gaussian mean-shift image segmentation. in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) 1160–1167 (IEEE, 2006).
Psaraftis, H. N. Dynamic vehicle routing problems. Veh. Routing Methods Stud. 16, 223–248 (1988).
Rossi, F., Van Beek, P. & Walsh, T. Handbook of Constraint Programming (Elsevier, 2006).
Afonine, P. V. et al. Real-space refinement in PHENIX for cryo-EM and crystallography. Acta Crystallogr. D Biol. Crystallogr. 74, 531–544 (2018).
Wang, X. et al. Detecting protein and DNA/RNA structures in cryo-EM maps of intermediate resolution using deep learning. Nat. Commun. 12, 2302 (2021).
Kim, M.-S. et al. Cracking the DNA code for V(D)J recombination. Mol. Cell 70, 358–370 (2018).
Grimm, C. et al. Structural basis of poxvirus transcription: vaccinia RNA polymerase complexes. Cell 179, 1537–1550 (2019).
Li, S. et al. Structural basis of amino acid surveillance by higher-order tRNA–mRNA interactions. Nat. Struct. Mol. Biol. 26, 1094–1105 (2019).
Nikolay, R. et al. Snapshots of native pre-50S ribosomes reveal a biogenesis factor network and evolutionary specialization. Mol. Cell 81, 1200–1215 (2021).
Shi, M. et al. SARS-CoV-2 Nsp1 suppresses host but not viral translation through a bipartite mechanism. Preprint at BioRxiv https://doi.org/10.1101/2020.09.18.302901 (2020).
Schubert, K. et al. SARS-CoV-2 Nsp1 binds the ribosomal mRNA channel to inhibit translation. Nat. Struct. Mol. Biol. 27, 959–966 (2020).
Thoms, M. et al. Structural basis for translational shutdown and immune evasion by the Nsp1 protein of SARS-CoV-2. Science 369, 1249–1255 (2020).
Naydenova, K. et al. Structure of the SARS-CoV-2 RNA-dependent RNA polymerase in the presence of favipiravir-RTP. Proc. Natl Acad. Sci. USA 118, e2021946118 (2021).
Wang, Q. et al. Structural basis for RNA replication by the SARS-CoV-2 polymerase. Cell 182, 417–428 (2020).
Chen, J. et al. Structural basis for helicase-polymerase coupling in the SARS-CoV-2 replication-transcription complex. Cell 182, 1560–1573 (2020).
Terwilliger, T. C., Adams, P. D., Afonine, P. V. & Sobolev, O. V. A fully automatic method yielding initial models from high-resolution cryo-electron microscopy maps. Nat. Methods 15, 905–908 (2018).
Sudre, C. H., Li, W., Vercauteren, T., Ourselin, S. & Jorge Cardoso, M. Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. in Deep learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support 240–248 (Springer, 2017).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. in Proceedings of International Conference on Learning Representations (2015).
Fukunaga, K. & Hostetler, L. The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Trans. Inform. Theory 21, 32–40 (1975).
Toth, P. & Vigo, D. The Vehicle Routing Problem (SIAM, 2002).
Lorenz, R. et al. ViennaRNA Package 2.0. Algorithms Mol. Biol. 6, 26 (2011).
Wang, X., Terashi, G. & Kihara, D. CryoREAD: de novo structure modeling for nucleic acids in cryo-EM maps using deep learning. Zenodo. https://doi.org/10.5281/zenodo.8274181

Acknowledgements

The authors thank J. C. Verburgt, H. Kannan, A. Jain and C. Christoffer for their help in literature search, discussion and proofreading. The authors also thank J. A. Nash, S. Ellis and J. Chen for their suggestions on optimizing the released software. This work was partly supported by the National Institutes of Health (R01GM133840, 3R01 GM133840-02S1) and the National Science Foundation (DMS2151678, DBI2003635, CMMI1825941, MCB2146026 and MCB1925643). X.W. is a recipient of the MolSSI graduate fellowship.

Author information

Authors and Affiliations

Department of Computer Science, Purdue University, West Lafayette, IN, USA
Xiao Wang & Daisuke Kihara
Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
Genki Terashi & Daisuke Kihara

Contributions

D.K. conceived the study. X.W. designed and implemented CryoREAD and computed the results. G.T. designed the core strategy of the molecular structure building pipeline and participated in implementing the algorithm. All the authors analyzed the results. X.W. drafted the manuscript and D.K. edited it. All the authors read and approved the manuscript.

Corresponding author

Correspondence to Daisuke Kihara.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Allison Doerr, in collaboration with the Nature Methods team. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 The detailed network architecture of CryoREAD.

a, The network architecture. The entire network consists of two stages of U-Net networks; the stage 1 network is shown here. It concatenates two U-Net architectures, each a 3D U-shaped convolutional network (UNet) with full-scale skip connections and deep supervision. The channel size of each layer is also indicated in the figure. b, The Encoder Block (Enc1 in panel a). c, The Merge Encoder Block (MEnc). d, The Decoder Block (Dec). Conv3D, a three-dimensional (3D) convolutional layer with a filter size of 3 × 3 × 3, stride 1 and padding 1. BatchNorm, a normalization layer that uses batch statistics to normalize the input data. ReLU, rectified linear unit, a commonly used activation layer. The stage 1 network is a cascaded U-Net, where the first U-Net (on the left) focuses on high-level detection of sugar, phosphate, base and protein, while the second U-Net (on the right) focuses on predicting the base types: A, C, G and T/U. The processed information from the encoder of the first U-Net is also passed as input to the second U-Net to help its predictions (dashed lines in orange). We applied deep supervision to the loss on the outputs of the different decoders, which was shown to improve the performance. The stage 2 network includes only the first U-Net architecture of the stage 1 network. It takes the predicted probabilities from the stage 1 network in a box of 8 × 64 × 64 × 64 Å³ (8 probabilities: protein, phosphate, sugar, base and four different base types) and outputs the refined probabilities in a box of the same size.
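
The Conv3D setting described in this caption (filter size 3, stride 1, padding 1) leaves the spatial size of its input unchanged, which follows from the standard convolution output-size formula. The snippet below is a minimal sketch of that arithmetic only. The function name is hypothetical and not taken from the CryoREAD code, and it assumes that each edge of the 64-unit box corresponds to 64 grid points.

```python
def conv_output_size(size: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    """Output length along one axis for a standard (non-dilated) convolution."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumed box of 64 grid points per edge (hypothetical mapping of the 64 Å box).
box = (64, 64, 64)

# With kernel 3, stride 1 and padding 1, every axis keeps its length.
out = tuple(conv_output_size(n) for n in box)
print(out)
```

Because the size is preserved, any change of resolution inside a U-Net built from such layers has to come from separate down-sampling and up-sampling operations rather than from these convolutions.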

Extended Data Fig. 2 The running time of CryoREAD on 11 structures of different sizes.

The experiments were carried out on a computer server with one NVIDIA TITAN RTX 24 GB GPU and 24 CPUs. The five colors correspond to the five steps in the CryoREAD pipeline: (1) structure detection by deep learning; (2) representative node clustering; (3) backbone tracing; (4) sequence assignment; (5) full-atom model. The actual data points of the 11 maps are shown by dots.

Extended Data Fig. 3 The distribution of the size of nucleic acids in the testing set.

The test set includes 68 cryo-EM maps. The x-axis shows the resolution from 2.0 to 5.0 Å and the y-axis denotes the number of nucleotides in each map, ranging from 57 to 4,286.

Extended Data Fig. 4 Grid level detection accuracy (recall) of 8 structural classes.

A grid point was assigned the structure class of a structure closer than 2 Å to it. If multiple different structures were within 2 Å, the closest one was assigned. A detection by the deep network for a grid point was considered correct if the probability of the correct structure class was over 0.5. pho, phosphate. Results of the stage 1 and stage 2 networks are shown. The statistics are calculated over n = 68 independent experimental EM maps, with each point's value derived from Supplementary Table 2. For stage 1, the values of minima, maxima, center, bounds of box and whiskers of the different categories are, in order: Sugar(0.333,0.893,0.729,0.601/0.804,0.333/0.893), Phos(0.138,0.849,0.656,0.463/0.754,0.138/0.849), Base(0.380,0.947,0.836,0.772/0.889,0.669/0.947), Protein(0.349,0.929,0.808,0.763/0.879,0.621/0.929), A-Base(0.034,0.890,0.549,0.362/0.746,0.034/0.890), U/T-Base(0.019,0.797,0.460,0.294/0.646,0.019/0.797), C-Base(0.088,0.886,0.539,0.369/0.711,0.088/0.886), G-Base(0.153,0.933,0.637,0.490/0.823,0.153/0.933), Overall(0.438,0.909,0.764,0.685/0.822,0.478/0.909). For stage 2, the values of minima, maxima, center, bounds of box and whiskers of the different categories are, in order: Sugar(0.357,0.920,0.775,0.654/0.846,0.453/0.920), Phos(0.228,0.883,0.693,0.509/0.810,0.228/0.883), Base(0.408,0.952,0.858,0.795/0.905,0.695/0.952), Protein(0.490,0.969,0.903,0.867/0.936,0.772/0.969), A-Base(0.013,0.893,0.501,0.296/0.745,0.013/0.893), U/T-Base(0.024,0.824,0.479,0.291/0.682,0.024/0.824), C-Base(0.104,0.920,0.611,0.449/0.775,0.104/0.920), G-Base(0.237,0.951,0.737,0.609/0.858,0.237/0.951), Overall(0.492,0.952,0.827,0.756/0.878,0.577/0.952). For the moiety-level accuracy, see Fig. 2 in the main text.
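
The labeling and scoring rule in this caption can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' evaluation code: the function names, the data layout (atoms as coordinate and class pairs, predictions as per-point dictionaries) and the toy values are all hypothetical.

```python
import math


def label_grid_point(point, atoms, cutoff=2.0):
    """Return the class of the nearest atom closer than `cutoff` Å, or None."""
    best_class, best_dist = None, cutoff
    for coord, cls in atoms:
        d = math.dist(point, coord)
        if d < best_dist:  # the closest structure wins when several are in range
            best_class, best_dist = cls, d
    return best_class


def recall_per_class(grid_points, atoms, probs, threshold=0.5):
    """Per-class recall; probs[i] maps class name to probability at grid_points[i]."""
    hits, totals = {}, {}
    for point, p in zip(grid_points, probs):
        true_cls = label_grid_point(point, atoms)
        if true_cls is None:
            continue  # no structure within the cutoff, so the point is not scored
        totals[true_cls] = totals.get(true_cls, 0) + 1
        if p.get(true_cls, 0.0) > threshold:
            hits[true_cls] = hits.get(true_cls, 0) + 1
    return {c: hits.get(c, 0) / n for c, n in totals.items()}


# Made-up toy data: two atoms and four grid points.
atoms = [((0.0, 0.0, 0.0), "phosphate"), ((3.0, 0.0, 0.0), "sugar")]
grid_points = [(0.5, 0.0, 0.0), (2.5, 0.0, 0.0), (1.6, 0.0, 0.0), (10.0, 0.0, 0.0)]
probs = [{"phosphate": 0.9}, {"sugar": 0.4}, {"sugar": 0.7}, {"base": 0.8}]

result = recall_per_class(grid_points, atoms, probs)
print(result)
```

In the toy data, the third grid point lies within 2 Å of both atoms and takes the class of the closer one, while the fourth point is too far from any atom and is excluded from the recall.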

Extended Data Fig. 5 Nucleotide moiety-based accuracy relative to the resolution.

A nucleotide moiety was considered correctly detected if the majority of the atoms in the moiety were correctly detected. The data used here are the same as those used for Fig. 2a. a. Base detection accuracy relative to the map resolution. The equation of the regression line is y = −0.032x + 1.019 (Pearson correlation coefficient: −0.256, p-value: 0.035, standard error: 0.015). b. Moiety-based accuracy of detecting 2-ring bases (A/G). If A or G was detected as either A or G, it was considered a correct detection. The equation of the regression line is y = −0.116x + 1.239 (Pearson correlation coefficient: −0.582, p-value: 1.925e-7, standard error: 0.020). c. Accuracy of detecting 1-ring bases (U/T/C). The equation of the regression line is y = −0.191x + 1.412 (Pearson correlation coefficient: −0.658, p-value: 1.109e-9, standard error: 0.027). d. Accuracy of detecting Adenine (A). The equation of the regression line is y = −0.312x + 1.712 (Pearson correlation coefficient: −0.758, p-value: 6.951e-14, standard error: 0.033). e. Accuracy of detecting Uracil/Thymine (U/T). The equation of the regression line is y = −0.277x + 1.561 (Pearson correlation coefficient: −0.754, p-value: 1.098e-13, standard error: 0.030). f. Accuracy of detecting Cytosine (C). The equation of the regression line is y = −0.206x + 1.423 (Pearson correlation coefficient: −0.679, p-value: 1.891e-10, standard error: 0.027). g. Accuracy of detecting Guanine (G). The equation of the regression line is y = −0.094x + 1.158 (Pearson correlation coefficient: −0.446, p-value: 1.367e-4, standard error: 0.023).
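
As a rough illustration of the two ingredients of this caption, the sketch below implements a majority rule for a moiety and an ordinary least-squares fit with the Pearson correlation coefficient. It is not the authors' analysis code: the function names are hypothetical, ties are treated as "not detected" by assumption, and the four data points are made up rather than taken from the paper.

```python
import math


def moiety_detected(atom_hits):
    """True if more than half of the atoms in the moiety were correctly detected."""
    return sum(atom_hits) > len(atom_hits) / 2


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept, plus the Pearson r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    pearson_r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, pearson_r


# Made-up toy values: map resolution (Å) against per-map accuracy.
resolutions = [2.0, 3.0, 4.0, 5.0]
accuracies = [0.9, 0.8, 0.7, 0.6]

slope, intercept, pearson_r = fit_line(resolutions, accuracies)
print(slope, intercept, pearson_r)
```

A negative slope, as in every panel of this figure, means that accuracy decreases as the resolution value grows, that is, as the map becomes coarser.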

Extended Data Fig. 6 Correlation between sequence recall and sequence recall (match).

Sequence recall (match) only considers nucleotides in the reference structure that have a corresponding nucleotide in the model (an average atom pair distance of less than 5 Å).

Extended Data Fig. 7 Sequence match relative to the map resolution.

To compute sequence match, we first identified the nucleotide in the model that corresponds to each nucleotide in the reference structure by assigning the nucleotide in the model with the closest average atom distance, and then checked whether the bases are identical. Sequence match only considers nucleotides in the reference structure that have a corresponding nucleotide in the model (an average atom pair distance of less than 5 Å). In this figure, we compared the sequence match of the initial assignment and after the sequence alignment. The initial assignment considers the base type obtained from the base predictions at base nodes of the atomic structures being built. The initial assignment here differs from the base moiety accuracy reported in Fig. 2a and Extended Data Fig. 5: those concern the initial grid-based accuracy of bases by deep learning, whereas the initial sequence assignment here considers the accuracy of the base assignment in the modeled tertiary structure, where the base positions are determined in consideration of other atoms in the nucleic acids, including phosphate and sugar positions. Seq match is the base type reassigned by sequence assignment to backbone paths. a. Overall sequence match. For initial assignment, the equation of the regression line is y = −0.125x + 0.984 (Pearson correlation coefficient: −0.782, p-value: 3.380e-15, standard error: 0.012). For seq match, the equation of the regression line is y = −0.110x + 0.997 (Pearson correlation coefficient: −0.684, p-value: 1.230e-10, standard error: 0.014). b. Sequence match of Adenine (A) relative to the map resolution. For initial assignment, the equation of the regression line is y = −0.169x + 1.045 (Pearson correlation coefficient: −0.684, p-value: 1.307e-11, standard error: 0.022). For seq match, the equation of the regression line is y = −0.143x + 1.040 (Pearson correlation coefficient: −0.591, p-value: 1.127e-7, standard error: 0.024). c. Sequence match of Uracil/Thymine (U/T). For initial assignment, the equation of the regression line is y = −0.137x + 1.018 (Pearson correlation coefficient: −0.671, p-value: 3.771e-10, standard error: 0.019). For seq match, the equation of the regression line is y = −0.132x + 1.042 (Pearson correlation coefficient: −0.680, p-value: 1.881e-10, standard error: 0.018). d. Sequence match of Cytosine (C). For initial assignment, the equation of the regression line is y = −0.072x + 0.873 (Pearson correlation coefficient: −0.482, p-value: 3.141e-8, standard error: 0.016). For seq match, the equation of the regression line is y = −0.074x + 0.911 (Pearson correlation coefficient: −0.452, p-value: 1.095e-4, standard error: 0.018). e. Sequence match of Guanine (G). For initial assignment, the equation of the regression line is y = −0.114x + 0.997 (Pearson correlation coefficient: −0.578, p-value: 2.381e-7, standard error: 0.020). For seq match, the equation of the regression line is y = −0.100x + 1.012 (Pearson correlation coefficient: −0.595, p-value: 9.017e-8, standard error: 0.017).
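
The sequence match computation described in this caption can be sketched as follows. This is a hedged illustration, not the authors' implementation. In particular, the caption does not define how atoms are paired, so the sketch assumes that the average distance is taken over atoms with the same name in both nucleotides. The function names, data layout and coordinates are hypothetical.

```python
import math


def mean_atom_distance(ref_atoms, model_atoms):
    """Average distance over atom names present in both nucleotides (assumed pairing)."""
    shared = ref_atoms.keys() & model_atoms.keys()
    if not shared:
        return math.inf
    return sum(math.dist(ref_atoms[a], model_atoms[a]) for a in shared) / len(shared)


def sequence_match(reference, model, cutoff=5.0):
    """Fraction of matched reference nucleotides whose base equals the model base."""
    matched = identical = 0
    for ref_base, ref_atoms in reference:
        # Closest model nucleotide by average atom distance.
        best = min(model, key=lambda m: mean_atom_distance(ref_atoms, m[1]), default=None)
        if best is None or mean_atom_distance(ref_atoms, best[1]) >= cutoff:
            continue  # no corresponding nucleotide within the cutoff
        matched += 1
        identical += ref_base == best[0]
    return identical / matched if matched else 0.0


# Made-up toy nucleotides: (base, {atom name: coordinates in Å}).
reference = [
    ("A", {"P": (0.0, 0.0, 0.0), "C4'": (1.0, 0.0, 0.0)}),
    ("G", {"P": (6.0, 0.0, 0.0)}),
    ("U", {"P": (50.0, 0.0, 0.0)}),
]
model = [
    ("A", {"P": (0.5, 0.0, 0.0), "C4'": (1.5, 0.0, 0.0)}),
    ("C", {"P": (6.0, 1.0, 0.0)}),
]

match = sequence_match(reference, model)
print(match)
```

In the toy data, the third reference nucleotide has no model nucleotide within 5 Å and is left out of the denominator, which is what distinguishes sequence match from a recall computed over all reference nucleotides.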

Extended Data Fig. 8 The number of atom clashes before and after the structure refinement.

An atom clash is defined as a pair of heavy atoms closer than 3.0 Å. The line shown is y = x.
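
A brute-force count under this distance criterion might look like the sketch below. It is only an illustration: the function name and data layout are hypothetical, and skipping pairs within the same residue is an assumption added here, since the caption does not say how covalently bonded neighbors are handled.

```python
import math
from itertools import combinations


def count_clashes(atoms, cutoff=3.0):
    """Count heavy-atom pairs from different residues closer than `cutoff` Å.

    `atoms` is a list of (residue_id, (x, y, z)) tuples. Ignoring pairs inside
    one residue is an assumption of this sketch, not a rule from the caption.
    """
    clashes = 0
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(atoms, 2):
        if res_a != res_b and math.dist(xyz_a, xyz_b) < cutoff:
            clashes += 1
    return clashes


# Made-up toy coordinates in Å.
atoms = [
    (1, (0.0, 0.0, 0.0)),
    (1, (1.5, 0.0, 0.0)),
    (2, (2.5, 0.0, 0.0)),
    (3, (10.0, 0.0, 0.0)),
]

n_clashes = count_clashes(atoms)
print(n_clashes)
```

The pairwise loop scales quadratically with the number of atoms, so for ribosome-sized models a spatial index such as a cell list or k-d tree would be the usual choice.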

Extended Data Fig. 9 Examples of modeled atomic structure by CryoREAD for experimental maps without full atomic structures.

Detailed evaluation results are shown in Supplementary Table 3. From left to right, the three columns show (1) the EM map and its corresponding structure; (2) only the RNA structures in the map, where, in addition to the RNA structure modeled by the authors, we also show homologous structures of the missing RNAs in the map, which we identified by BLAST¹; and (3) the atomic structure model by CryoREAD. a. The initial Shwachman-Bodian-Diamond syndrome (SBDS) protein closed state of the nascent 60 S ribosomal subunit (EMD-3145, PDB 5AN9, Resolution: 3.3 Å; protein lengths: 1905 aa; RNA length: 1162 nt): Backbone recall: 0.888; Sequence match: 0.696. Identified homologous structure (PDB 5XXB, RNA length: 3352 nt, Sequence Identity: 84.3%, RMSD: 1.2 Å): Backbone recall: 0.832. b. The SBDS open state of the nascent 60 S ribosomal subunit (EMD-3146, PDB 5ANB, Resolution: 4.1 Å; protein lengths: 3025 aa; RNA length: 1162 nt): Backbone recall: 0.871; Sequence match: 0.544. Identified homologous RNA structure (PDB 5XXB, RNA length: 3352 nt, Sequence Identity: 84.3%, RMSD: 1.3 Å): Backbone recall: 0.821. c. The ELF1 accommodated state of the nascent 60 S ribosomal subunit (EMD-3147, PDB 5ANC, Resolution: 4.2 Å; protein lengths: 2801 aa; RNA length: 1162 nt): Backbone recall: 0.881; Sequence match: 0.535. Identified homologous structure (PDB 5XXB, RNA length: 3352 nt, Sequence Identity: 84.3%, RMSD: 1.3 Å): Backbone recall: 0.804. d. TnaC-stalled ribosome complex with the titin I27 domain folding close to the ribosomal exit tunnel (EMD-0322, PDB 6I0Y, Resolution: 3.2 Å; protein lengths: 3552 aa; RNA length: 3049 nt): Backbone recall: 0.901; Sequence match: 0.632. Identified homologous RNA structure (PDB 7D80, RNA length: 4761 nt, Sequence Identity: 100%, RMSD: 1.2 Å): Backbone recall: 0.834. e. RNC-SRP-SR complex early state (EMD-8000, PDB 5GAD, Resolution: 3.7 Å; protein lengths: 4087 aa; RNA length: 3049 nt): Backbone recall: 0.914; Sequence match: 0.655. Identified homologous structure (PDB 7D80, RNA length: 4761 nt, Sequence Identity: 100%, RMSD: 1.0 Å): Backbone recall: 0.847. f. Structure of the 40 S ABCE1 post-splitting complex (EMD-4071, PDB 5LL6, Resolution: 3.9 Å; protein lengths: 3429 aa; RNA length: 1325 nt): Backbone recall: 0.856; Sequence match: 0.586. Identified homologous structure (PDB 7OSM, RNA length: 1740 nt, Sequence Identity: 99%, RMSD: 1.0 Å): Backbone recall: 0.784.

In Extended Data Fig. 9, we show cases where only a part of the structures in an EM map was modeled by the authors. They are maps from three different sets of EM maps of ribosomal subunits. The first set (panels a-c) includes three different states of eIF6 release from the nascent 60 S ribosomal subunit² of Dictyostelium discoideum, where only part of the 26 S ribosomal unit was modeled by the authors. The second set (panels d-e) presents two different forms of the 70 S ribosomal subunit³ of Escherichia coli, where the authors modeled only the 50 S ribosomal subunit and left the 30 S ribosomal subunit unmodeled. The third example (panel f) is the 40 S ribosomal subunit of Saccharomyces cerevisiae, where only part of the 18 S ribosomal RNA was modeled. We filled the missing RNA structure in the maps with homologous RNA structures found by BLAST¹ against the PDB. Sequence identities of the identified RNAs were 84.3% to 100%. CryoREAD models for the missing RNA structures had backbone recall of 0.784 to 0.847 when the homologous structures were considered as reference. Backbone recall of CryoREAD models for RNAs with the authors' model ranged from 0.856 to 0.914.

Extended Data Fig. 10 Structure model evaluation on the 68 experimental EM maps with Phenix.

a, The number of nucleotides modeled by Phenix map_to_model and CryoREAD. For Phenix, two models were generated: one from map regions that are predicted to include nucleic acid atoms (Phenix (Mask), blue) and one built from the entire map (Phenix, orange). b, Comparison of backbone atom/sequence recall of Phenix (Mask) and Phenix. c, Backbone atom recalls of Phenix (Mask), Phenix and CryoREAD relative to map resolution. For CryoREAD, the equation of the regression line is y = −0.042x + 1.012 (Pearson correlation coefficient: −0.320, p-value: 0.008, standard error: 0.015). For Phenix (Mask), the equation of the regression line is y = −0.091x + 0.877 (Pearson correlation coefficient: −0.445, p-value: 1.456e-4, standard error: 0.023). For Phenix, the equation of the regression line is y = −0.108x + 0.839 (Pearson correlation coefficient: −0.492, p-value: 2.006e-5, standard error: 0.023). d, Sequence recalls of Phenix (Mask), Phenix and CryoREAD relative to map resolution. For CryoREAD, the equation of the regression line is y = −0.117x + 0.961 (Pearson correlation coefficient: −0.632, p-value: 7.280e-9, standard error: 0.017). For Phenix (Mask), the equation of the regression line is y = −0.064x + 0.444 (Pearson correlation coefficient: −0.550, p-value: 1.190e-6, standard error: 0.012). For Phenix, the equation of the regression line is y = −0.063x + 0.408 (Pearson correlation coefficient: −0.525, p-value: 4.254e-6, standard error: 0.013).
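
To read the backbone recall regression lines of panel c side by side, one can simply evaluate them at a chosen resolution. The sketch below does this with the slopes and intercepts quoted in the caption. The dictionary and function names are hypothetical, and the values are points on the fitted lines, not measured recalls for any particular map.

```python
# (slope, intercept) of the backbone atom recall lines quoted in panel c.
BACKBONE_RECALL_FITS = {
    "CryoREAD": (-0.042, 1.012),
    "Phenix (Mask)": (-0.091, 0.877),
    "Phenix": (-0.108, 0.839),
}


def fitted_recall(method, resolution):
    """Value of the reported regression line at a given map resolution (Å)."""
    slope, intercept = BACKBONE_RECALL_FITS[method]
    return slope * resolution + intercept


# Fitted values at 4.0 Å, inside the 2.0 to 5.0 Å range of the test set.
for method in BACKBONE_RECALL_FITS:
    print(method, round(fitted_recall(method, 4.0), 3))

# Gap between the CryoREAD and Phenix lines at both ends of the range.
gap_at_2 = fitted_recall("CryoREAD", 2.0) - fitted_recall("Phenix", 2.0)
gap_at_5 = fitted_recall("CryoREAD", 5.0) - fitted_recall("Phenix", 5.0)
print(round(gap_at_2, 3), round(gap_at_5, 3))
```

Because the CryoREAD line has the shallowest slope, the gap between the fitted lines grows toward coarser resolution. Evaluating the lines outside 2.0 to 5.0 Å would be an extrapolation beyond the data described here.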

Supplementary information

Supplementary Information

Supplementary Figs. 1–4 and legends for Supplementary Tables 1–7.

Reporting Summary

Peer Review File

Supplementary Tables 1–7.

Source data

Source Data

Source data for Figs. 2–5 and Extended Data Figs. 3–7, 9 and 10.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Wang, X., Terashi, G. & Kihara, D. CryoREAD: de novo structure modeling for nucleic acids in cryo-EM maps using deep learning. Nat Methods 20, 1739–1747 (2023). https://doi.org/10.1038/s41592-023-02032-5

Received: 28 November 2022
Accepted: 24 August 2023
Published: 02 October 2023
Issue Date: November 2023
DOI: https://doi.org/10.1038/s41592-023-02032-5
