DeepMainmast: integrated protocol of protein structure modeling for cryo-EM with deep learning and structure prediction

Terashi, Genki; Wang, Xiao; Prasad, Devashish; Nakamura, Tsukasa; Kihara, Daisuke

doi:10.1038/s41592-023-02099-0

Article
Published: 08 December 2023

Nature Methods volume 21, pages 122–131 (2024)

Abstract

Three-dimensional structure modeling from maps is an indispensable step for studying proteins and their complexes with cryogenic electron microscopy. Although the resolution of determined cryogenic electron microscopy maps has generally improved, there are still many cases where tracing protein main chains is difficult, even in maps determined at a near-atomic resolution. Here we developed a protein structure modeling method, DeepMainmast, which employs deep learning to capture the local map features of amino acids and atoms to assist main-chain tracing. Moreover, we integrated AlphaFold2 with the de novo density tracing protocol to combine their complementary strengths and achieved even higher accuracy than each method alone. Additionally, the protocol is able to accurately assign the chain identity to the structure models of homo-multimers, which is not a trivial task for existing methods.

**Fig. 1: Overview of the DeepMainmast protocol.**

**Fig. 2: Single-chain modeling results on the 29 EM map dataset.**

**Fig. 3: Single-chain modeling results on the 178 experimental maps.**

**Fig. 4: Modeling examples of single-chain targets.**

**Fig. 5: Modeling results of 20 multi-chain protein complex targets.**

Data availability

Source data are made available in Supplementary Tables. The lists of PDB and EMDB entries used in the benchmark datasets are available in Supplementary Tables 1_Dataset, 4_Single_Model_Acc, 5_178targets and 6_MultiChain_results. The lists of the training and testing sets are available in Supplementary Tables 8_MultiChain_results and 9_Test_set. Source data are provided with this paper.

Code availability

The source code of DeepMainmast is available at github.com/kiharalab/DeepMainMast. The webserver is available at em.kiharalab.org/algorithm/DeepMainMast. DeepMainmast can also be run in a Google Colab notebook, without installation on a local machine, at github.com/kiharalab/DeepMainMast/blob/main/DeepMainMast.ipynb. Capsules are available on CodeOcean at codeocean.com/capsule/9358532. VESPER, used in the pipeline, is available at github.com/kiharalab/VESPER. DAQ is available at github.com/kiharalab/DAQ.

Acknowledgements

This work was partly supported by the National Institutes of Health (R01GM133840 and 3R01GM133840-02S1) and the National Science Foundation (CMMI1825941, MCB1925643, IIS2211598, DMS2151678, DBI2146026 and DBI2003635).

Author information

Authors and Affiliations

Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
Genki Terashi, Tsukasa Nakamura & Daisuke Kihara
Department of Computer Science, Purdue University, West Lafayette, IN, USA
Xiao Wang, Devashish Prasad & Daisuke Kihara

Contributions

D.K. conceived the study. G.T. designed and implemented DeepMainmast and the overall modeling protocol. X.W. coded and trained the deep neural network and computed probability values of structure features for cryo-EM maps. G.T. and X.W. constructed the datasets. G.T. and X.W. performed the computation, and G.T., D.K. and X.W. analyzed the data. D.P. prepared the code for release and developed the CodeOcean pages and the Google Colab notebooks. T.N. analyzed the influence of local map resolution on modeling accuracy. G.T. and X.W. drafted the manuscript and D.K. edited it. All the authors read and approved the manuscript. We thank L. Chang and A. Perez for helping us run CryoFold.

Corresponding author

Correspondence to Daisuke Kihara.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Matthew Baker, Po-Lin Chiu, and Arek Kulczyk for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Arunima Singh, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Three types of constraints in the step of combining Cα fragments with the CP solver.

The illustration shows two Cα fragments that are colored green and blue. Numbered circles represent amino acid residues, and the number is the residue number in the sequence. Red arrows represent the distance between the two Cα atoms of the amino acid residues. The left column represents states that are not allowed under the constraints, while the right column represents states that are allowed under the constraints.

Extended Data Fig. 2 Modeling accuracy of DeepMainmast(base) and CryoFold for the 29-map dataset.

This figure corresponds to Fig. 2c to g, where DeepMainmast(base) was compared with MAINMAST, DeepTracer, Buccaneer, and Phenix. We show the comparison against CryoFold separately here because CryoFold did not run for 11 targets due to a failure in the energy minimization step in MELD. a, the Cα coverage of the protein models. The results by DeepMainmast(base) (the y-axis) were compared with CryoFold (the x-axis). b, the amino acid matching accuracy. c, TM-Score. d, the length of aligned regions between the model and the native structure by TM-align. e, Cα RMSD of protein models.

Extended Data Fig. 3 Analysis of incorrect amino acid type assignment in single-chain modeling results of DeepMainmast(base).

These plots are related to main Fig. 2d. In Fig. 2d, we plotted the amino acid matching accuracy, which is the fraction of correctly modeled amino acids in each target. By our definition, an amino acid is modeled incorrectly for one of two reasons: either the Cα position itself is not within 3 Å of the correct position, or the wrong amino acid type was assigned to a correctly identified Cα position. Here, we examined these two sources of amino acid matching errors. Thus, for each target, 1 – amino acid matching accuracy (AA Match) = (Fraction of cases with an incorrect Cα position) + (Fraction of cases of an incorrect AA type assignment). a, the fraction of cases of incorrect AA type was plotted relative to the AA match accuracy. The line shows y = −x + 1.0, the maximum possible value for the incorrect AA type relative to the AA match. b, the y-axis shows (Incorrect AA Type)/(1 – AA match). The average of this value for all the targets is 0.35. For relatively easy targets where AA Match > 0.95, the fraction of Incorrect AA Type was 0.11, indicating that DeepMainmast did not make many incorrect AA type assignments and that most of the errors come from incorrect Cα position detection. For targets where AA match < 0.95, the fraction of incorrect AA type was 0.44.
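
The error decomposition in this legend can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not the authors' evaluation code: the per-residue record layout (a pair of booleans), the function name `aa_match_breakdown` and the toy input are all assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple


def aa_match_breakdown(residues: Iterable[Tuple[bool, bool]]) -> Dict[str, float]:
    """Split amino acid matching errors into the two causes named in the legend.

    Each item is (ca_within_3A, aa_type_correct) for one residue.
    The record layout and all names here are hypothetical.
    """
    records = list(residues)
    n = len(records)
    if n == 0:
        raise ValueError("no residues given")

    # Correctly modeled: Cα placed within 3 Å AND correct amino acid type.
    matched = sum(1 for ca_ok, aa_ok in records if ca_ok and aa_ok)
    # Error cause 1: the Cα position itself is wrong.
    bad_ca = sum(1 for ca_ok, _ in records if not ca_ok)
    # Error cause 2: Cα position is right but the assigned type is wrong.
    bad_type = sum(1 for ca_ok, aa_ok in records if ca_ok and not aa_ok)

    errors = bad_ca + bad_type
    return {
        "aa_match": matched / n,
        "incorrect_ca": bad_ca / n,
        "incorrect_aa_type": bad_type / n,
        # Quantity on the y-axis of panel b: (Incorrect AA Type)/(1 - AA match).
        "type_share_of_errors": bad_type / errors if errors else 0.0,
    }


# Toy input (made up): 10 residues, 2 misplaced Cα atoms, 1 wrong type.
toy = [(True, True)] * 7 + [(False, False)] * 2 + [(True, False)]
result = aa_match_breakdown(toy)
```

By construction, `1 - aa_match` equals `incorrect_ca + incorrect_aa_type`, which is the identity stated in the legend.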

Extended Data Fig. 4 Modeling accuracy of full-atom models for the 29-map dataset.

DeepMainmast(base) produced one Cα model for each target. For the Cα model, we ran Rosetta-CM, which fills missing regions (if any) and relaxes the structure, producing 5 models. From these, we selected the model for comparison using the combination of the DOT score and the DAQ score, the same protocol as used in the final model selection step in the DeepMainmast pipeline (Fig. 1). As for full-atom models for MAINMAST, following the MAINMAST protocol, MDFF was used to generate 500 full-atom models from one Cα model, among which the top-scoring full-atom model was selected. a, the Cα coverage. b, the amino acid matching accuracy. c, TM-Score. d, the length of aligned regions between the model and the native structure by TM-align. These regions were used to compute the RMSD in panel e. e, Cα RMSD of protein models.

Extended Data Fig. 5 Modeling accuracy relative to local resolution and local structures.

a and b, We used 33 targets from the dataset of 178 experimental maps. Details of the local resolution computation are provided in Supplementary Table 10_Local_Resolution. We analyzed the models generated by DeepMainmast(base) and DeepMainmast without applying the last full-atom model construction step. a, the accuracy of Cα atom positions of DeepMainmast(base) models (blue) and DeepMainmast models (orange). The bars and values represent the fraction of Cα atoms in models that are correctly positioned within 3 Å. The black line indicates the number of Cα atoms at the local resolution. b, the accuracy of amino acid type assignment by the DeepMainmast(base) (blue) and DeepMainmast (orange) protocols. The accuracy is defined as the fraction of Cα atoms in a model that are placed within 3 Å of the correct position and have the correct amino acid type assignment. c, the accuracy of Cα positions based on the secondary structure types. Orange and blue dots represent the results of DeepMainmast(base) and DeepMainmast for 33 targets, respectively. The secondary structures were computed by DSSP. The secondary structure types of G, H, and I are assigned as ‘Helix’. The secondary structure types of E and B are assigned as ‘Strand’. The secondary structure types of S, T, and B are assigned as ‘Loop’. The regions within the loop with a relative accessible surface area (ASA) greater than 10% were classified as ‘Flex’ (flexible). The values of minima, maxima, mean, median, bounds of box and whiskers of different categories are, in order: DeepMainmast(base), Helix(0.07, 1.00, 0.93, 1.00, 0.97/1.00, 0.94/1.00), Strand(0.25, 1.00, 0.90, 1.00, 0.96/1.00, 0.90/1.00), Loop(0.24, 0.98, 0.82, 0.91, 0.83/0.95, 0.76/0.98), and Flex(0.24, 0.98, 0.82, 0.91, 0.82/0.95, 0.71/0.98). DeepMainmast, Helix(0.03, 1.00, 0.94, 1.00, 1.00/1.00, 1.00/1.00), Strand(0.27, 1.00, 0.91, 1.00, 1.00/1.00, 1.00/1.00), Loop(0.21, 1.00, 0.86, 0.93, 0.86/0.97, 0.73/1.00), and Flex(0.19, 1.00, 0.91, 0.92, 0.86/0.96, 0.72/1.00).
d, the accuracy of amino acid type assignment based on the secondary structure types. The values of minima, maxima, mean, median, bounds of box, and whiskers of different categories are, in order: DeepMainmast(base), Helix(0.44, 1.00, 0.95, 1.00, 0.94/1.00, 0.86/1.00), Strand(0.40, 1.00, 0.89, 0.99, 0.86/1.00, 0.71/1.00), Loop(0.31, 1.00, 0.89, 0.93, 0.83/0.99, 0.70/1.00), and Flex(0.36, 1.00, 0.89, 0.93, 0.82/1.00, 0.70/1.00). DeepMainmast, Helix(0.17, 1.00, 0.95, 1.00, 1.00/1.00, 1.00/1.00), Strand(0.60, 1.00, 0.96, 1.00, 0.96/1.00, 0.94/1.00), Loop(0.45, 1.00, 0.92, 0.96, 0.89/1.00, 0.77/1.00), and Flex(0.44, 1.00, 0.91, 0.95, 0.88/1.00, 0.77/1.00). c and d, The bold numbers shown are the average values across all the targets. In these box plots, the center line, the bottom, and the top of a box show the median, first quartile, and third quartile values, respectively. The whiskers extend to 1.5 times the distance between the upper and lower quartiles. Details are provided in Supplementary Table 11_SSanalysis.
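
The summary values listed in panels c and d (minimum, maximum, mean, median, box bounds and whisker bounds) follow the usual box-plot recipe, which might be computed as in the sketch below. Two conventions here are assumptions rather than details taken from the legend: quartiles use linear interpolation between sorted values, and each whisker ends at the most extreme data point lying within 1.5 times the interquartile range of the box. The function names and the sample accuracies are made up.

```python
from statistics import mean, median
from typing import Dict, List, Sequence


def quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Quantile by linear interpolation between neighbouring sorted values."""
    pos = (len(sorted_vals) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def box_stats(values: Sequence[float]) -> Dict[str, object]:
    """Summary statistics behind one box of a box plot."""
    if not values:
        raise ValueError("no values given")
    vals: List[float] = sorted(values)
    q1, q3 = quantile(vals, 0.25), quantile(vals, 0.75)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Data points inside the fences; the whiskers end at their extremes.
    inside = [v for v in vals if lo_fence <= v <= hi_fence]
    return {
        "min": vals[0],
        "max": vals[-1],
        "mean": mean(vals),
        "median": median(vals),
        "q1": q1,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in vals if v < lo_fence or v > hi_fence],
    }


# Hypothetical per-target accuracies for one category.
stats = box_stats([0.2, 0.8, 0.9, 0.95, 1.0, 1.0, 1.0])
```

With skewed accuracy data like this, the third quartile and the upper whisker can both sit at 1.00, which is the pattern seen in several of the listed categories.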

Extended Data Fig. 6 TM-Score distribution of models generated at major steps in the DeepMainmast protocol.

For the three target proteins shown in Fig. 4d, e, and f, all models generated at the three major steps were evaluated in terms of TM-score. a. Models for PDB 3J9S chain A (Fig. 4d). b. Models for PDB 3J9C chain A (Fig. 4e). c. Models for PDB 5V6P chain A (Fig. 4f). In each panel, blue, orange, and green box plots show the TM-Score distribution of the models generated by the ‘Assembling Cα Fragments’, ‘Combining Models’, and ‘Building Full-Atom Models & Refinement’ steps in Fig. 1, respectively. In these three steps, 54, 4, and 20 models were generated, respectively. Red circles represent the models generated by the DeepMainmast(base) protocol. Black circles represent the models additionally generated by DeepMainmast. In the box plots, the middle line in a box corresponds to the median, and the top and bottom ends of a box represent quartiles. The upper and lower whiskers represent 1.5 × the interquartile range. Black diamonds represent outliers beyond the whiskers. Details are provided in Supplementary Table 12_ModelingStep.

Extended Data Fig. 7 Modeling results of 20 multi-chain protein complex targets.

a, the Cα coverage. b, the amino acid matching accuracy. c, TM-Score. d, the sequence identity at aligned positions. TM-Score and the sequence identities were computed by MMalign.

Extended Data Fig. 8 The network architecture of the deep learning method for local structure detection.

The network architecture of Emap2sf (Emap to structural features), which is used to detect amino acid types and atom types at each grid point in an input EM density map. a. the network architecture. The entire network is a 3D U-shaped convolutional network (UNet) with full-scale skip connections and deep supervision. The numbers indicate the channel size of the corresponding layers. N is 20 for the amino acid type detection UNet and 6 for the atom type detection UNet. b. the encoder block (Enc in panel a) and the decoder block (Dec in panel a). Conv3D, a 3-dimensional (3D) convolutional layer with a filter size of 3 × 3 × 3, stride 1 and padding 1. BatchNorm, a normalization layer that uses the statistics of a batch to normalize the input data. ReLU, rectified linear unit, a commonly used activation layer.
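
One consequence of the Conv3D settings in this legend (filter size 3, stride 1, padding 1) is that each such layer leaves the grid dimensions of its input unchanged. The short check below applies the standard convolution output-size formula for dilation 1; the helper name and the 32-voxel box size are assumptions chosen for illustration, since the legend does not state the input box size.

```python
def conv3d_output_size(n: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    """Output length along one axis of a convolution (dilation 1).

    Standard formula: floor((n + 2 * padding - kernel) / stride) + 1.
    Defaults mirror the Conv3D settings quoted in the legend.
    """
    if n + 2 * padding < kernel:
        raise ValueError("input is smaller than the kernel")
    return (n + 2 * padding - kernel) // stride + 1


# Hypothetical input box of 32 x 32 x 32 voxels.
box = (32, 32, 32)
out = tuple(conv3d_output_size(n) for n in box)  # same shape as the input

# For contrast: without padding the box shrinks by 2 voxels per axis.
unpadded = conv3d_output_size(32, padding=0)
```

Because these layers preserve the grid size, any change of resolution between encoder and decoder levels has to come from separate down-sampling and up-sampling operations, which the legend does not describe.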

Extended Data Fig. 9 Chain ID assignment in the DeepMainmast protocol.

a. Example of chain ID assignment for EMD-5925. The deposited model (PDB ID 3J6J) is a homo-octamer. All models are colored by chain ID. The magnified images highlight a region where different chains interact. The left column shows the deposited model (PDB 3J6J) of EMD-5925. The middle column shows the model generated by the DeepMainmast(base) protocol prior to the chain ID assignment step. The right column shows the DeepMainmast(base) model after the chain ID assignment step is completed. As shown, chains are correctly connected and identified. b. Illustration of the chain ID assignment for a homo-dimer target. In this example, two five-residue-long models with different chain IDs (green: chain A, and blue: chain B) are shown. Numbered circles represent amino acid residues and the number is the residue number in the sequence. For the chain ID assignment, DeepMainmast maximizes the objective function (Eq. 11), which consists of a DAQ score term and a penalty term. The penalty term is intended to make the different chains of a homo-oligomer adopt similar structures. In the left and right columns, we illustrate the computation of the penalty term e() before and after the chain ID assignment, respectively. Solid arrows between two residues indicate the distance between residues i and j in the chain A and chain B models (d_A(i,j) and d_B(i,j)). If |d_A(i,j) - d_B(i,j)| > 3.0 Å, e() = 1. Since the models of chains A and B have different structures, all penalties e() are 1 before the chain ID assignment. After the chain ID assignment, all penalties e() are reduced to zero.
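
The penalty indicator e() described in panel b can be illustrated with a small sketch. It covers only the stated rule (e = 1 when the same residue pair is more than 3.0 Å further apart in one chain than in the other); how Eq. 11 weights this term against the DAQ score is not given in this legend and is not modeled. Summing over all residue pairs shared by the two chains, the function names and the toy coordinates are assumptions made for the example.

```python
from itertools import combinations
from math import dist
from typing import Dict, Tuple

Coord = Tuple[float, float, float]


def pair_penalty(d_a: float, d_b: float, cutoff: float = 3.0) -> int:
    """e() for one residue pair: 1 if the two distances differ by more than the cutoff."""
    return 1 if abs(d_a - d_b) > cutoff else 0


def total_penalty(chain_a: Dict[int, Coord], chain_b: Dict[int, Coord]) -> int:
    """Sum e() over residue pairs (i, j) present in both chain models.

    Keys are residue numbers, values are Cα coordinates in Å.
    Summing over all shared pairs is an assumption of this sketch.
    """
    shared = sorted(set(chain_a) & set(chain_b))
    return sum(
        pair_penalty(dist(chain_a[i], chain_a[j]), dist(chain_b[i], chain_b[j]))
        for i, j in combinations(shared, 2)
    )


# Toy five-residue chain A: Cα atoms on a line, 3.8 Å apart.
chain_a = {i: (3.8 * (i - 1), 0.0, 0.0) for i in range(1, 6)}
# Chain B with the same internal geometry, translated in space.
chain_b_same = {i: (x + 10.0, y + 5.0, z) for i, (x, y, z) in chain_a.items()}
# Chain B where residue 5 was taken from a distant fragment.
chain_b_mixed = dict(chain_a)
chain_b_mixed[5] = (100.0, 0.0, 0.0)

penalty_same = total_penalty(chain_a, chain_b_same)    # identical geometry
penalty_mixed = total_penalty(chain_a, chain_b_mixed)  # pairs involving residue 5 disagree
```

Because the indicator compares intra-chain distances only, it is unaffected by where each chain sits in the map, which is what makes it usable for comparing copies of a homo-oligomer.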

Extended Data Fig. 10 The computational time of DeepMainmast.

a. Computational time of the DeepMainmast protocol on the dataset of 178 single-chain targets. b. Computational time of the DeepMainmast protocol on the dataset of 20 multi-chain targets. For this experiment, we used one GPU card (Nvidia GeForce GTX 1080Ti, 12GB memory) and four threads on one CPU (Intel Xeon CPU E5-1650 v4). The plots show the computational time in three colors (blue, orange, and green), corresponding to (1) the steps up to ‘Assembling Cα Fragments’, (2) the steps up to ‘Combining Models’, and (3) the steps up to ‘Building Full-Atom Models & Refinement’ in Fig. 1, respectively. The green solid lines represent the regression lines of the total computational time. The GPU handles the deep learning process, while the CPU performs the other steps. As shown in the figure, the ‘Combining Models’ step, which uses CPUs, takes most of the time. Its running time is generally proportional to the length of the protein but depends on the target protein structure. The required time is strongly influenced by the difficulty of modeling; that is, the CP solver needs more time for maps that require exploring a larger number of fragment combinations. On average, a single protein of up to ~500 residues can be modeled within a few hours. To speed up the process, we provide a multi-thread version of the code in the GitHub repository, which can use multiple CPUs simultaneously. We also provide a fast version, which uses only a limited number of parameter combinations and does not perform full-atom building and structure refinement. Details are provided in Supplementary Tables 13_CompTime_single and 14_CompTime_multi.

Supplementary information

Supplementary Information

Supplementary Table 7 and legends for Supplementary Tables 1–6 and 8–14 (also source data tables).

Reporting Summary

Peer Review File

Source data

Source Data Figs. 2–5 and Source Data Extended Data Figs. 2–7 and 10

The source data of Fig. 2a is in the ‘Table S2_AtomAcc’ tab in the Supplementary Tables. The source data of Fig. 2b is in the ‘Table S3_DAQ(AA)’ tab. The source data of Fig. 2c–g is in the ‘Table S4_Single_Model_Acc’ tab. The source data of Fig. 3 is in the ‘Table S5_178targets’ tab. The source data of Fig. 5a–c is in the ‘Table S6_MultiChain_results’ tab. The source data of Extended Data Figs. 2, 3 and 4 are in the ‘Table S4_Single_Model_Acc’ tab. The source data of Extended Data Fig. 5a,b is in the ‘Table S10_Local_Resolution’ tab. The source data of Extended Data Fig. 5c,d is in the ‘Table S11_SSanalysis’ tab. The source data of Extended Data Fig. 6 is in the ‘Table S12_ModelingStep’ tab. The source data of Extended Data Fig. 7 is in the ‘Table S6_MultiChain_results’ tab. The source data of Extended Data Fig. 10a is in the ‘Table S13_CompTime_single’ tab. The source data of Extended Data Fig. 10b is in the ‘Table S14_CompTime_multi’ tab.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Terashi, G., Wang, X., Prasad, D. et al. DeepMainmast: integrated protocol of protein structure modeling for cryo-EM with deep learning and structure prediction. Nat Methods 21, 122–131 (2024). https://doi.org/10.1038/s41592-023-02099-0

Received: 15 March 2023
Accepted: 22 October 2023
Published: 08 December 2023
Issue Date: January 2024
DOI: https://doi.org/10.1038/s41592-023-02099-0
