Cryo-electron microscopy (cryo-EM) single-particle analysis has proven powerful in determining the structures of rigid macromolecules. However, many imaged protein complexes exhibit conformational and compositional heterogeneity that poses a major challenge to existing three-dimensional reconstruction methods. Here, we present cryoDRGN, an algorithm that leverages the representation power of deep neural networks to directly reconstruct continuous distributions of 3D density maps and map per-particle heterogeneity of single-particle cryo-EM datasets. Using cryoDRGN, we uncovered residual heterogeneity in high-resolution datasets of the 80S ribosome and the RAG complex, revealed a new structural state of the assembling 50S ribosome, and visualized large-scale continuous motions of a spliceosome complex. CryoDRGN contains interactive tools to visualize a dataset’s distribution of per-particle variability, generate density maps for exploratory analysis, extract particle subsets for use with other tools and generate trajectories to visualize molecular motions. CryoDRGN is open-source software freely available at http://cryodrgn.csail.mit.edu.
Trained cryoDRGN models and generated volumes were deposited in Zenodo at https://doi.org/10.5281/zenodo.4355284 (ref. 52). Input files for training (excluding particle stacks) were deposited in Zenodo at https://doi.org/10.5281/zenodo.4412072 and are also available at https://www.github.com/zhonge/cryodrgn_empiar (ref. 53). We used the following publicly available datasets: EMPIAR-10049 (cryo-EM structures of a synaptic RAG1–RAG2 complex), EMPIAR-10028 (cryo-EM structure of a P. falciparum 80S ribosome bound to the anti-protozoan drug emetine), EMPIAR-10076 (modular assembly of the large bacterial ribosome) and EMPIAR-10180 (structure of a pre-catalytic spliceosome). The simulated heterogeneous datasets were deposited in Zenodo at https://doi.org/10.5281/zenodo.4355284 (ref. 52).
Nogales, E. The development of cryo-EM into a mainstream structural biology technique. Nat. Methods 13, 24–27 (2015).
Cheng, Y. Single-particle cryo-EM—how did it get here and where will it go. Science 361, 876–880 (2018).
Bammes, B. E., Rochat, R. H., Jakana, J., Chen, D.-H. & Chiu, W. Direct electron detection yields cryo-EM reconstructions at resolutions beyond 3/4 Nyquist frequency. J. Struct. Biol. 177, 589–601 (2012).
Suloway, C. et al. Automated molecular microscopy: the new Leginon system. J. Struct. Biol. 151, 41–60 (2005).
Li, X. et al. Electron counting and beam-induced motion correction enable near-atomic-resolution single-particle cryo-EM. Nat. Methods 10, 584–590 (2013).
Zhang, K. Gctf: real-time CTF determination and correction. J. Struct. Biol. 193, 1–12 (2016).
Brubaker, M. A., Punjani, A. & Fleet, D. J. Building proteins in a day: efficient 3D molecular reconstruction. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 3099–3108 (CVPR, 2015).
Scheres, S. H. W. A Bayesian view on cryo-EM structure determination. J. Mol. Biol. 415, 406–418 (2012).
Bepler, T. et al. Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs. Nat. Methods 16, 1153–1160 (2019).
Ahmed, T., Yin, Z. & Bhushan, S. Cryo-EM structure of the large subunit of the spinach chloroplast ribosome. Sci. Rep. 6, 35793 (2016).
Wrapp, D. et al. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science 367, 1260–1263 (2020).
Davis, J. H. et al. Modular assembly of the bacterial large ribosomal subunit. Cell 167, 1610–1622 (2016).
Haselbach, D. et al. Structure and conformational dynamics of the human spliceosomal Bact complex. Cell 172, 454–464 (2018).
Sigworth, F. J. Principles of cryo-EM single-particle image processing. Microscopy 65, 57–67 (2016).
Scheres, S. H. W. RELION: implementation of a Bayesian approach to cryo-EM structure determination. J. Struct. Biol. 180, 519–530 (2012).
Punjani, A., Rubinstein, J. L., Fleet, D. J. & Brubaker, M. cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination. Nat. Methods 14, 290–296 (2017).
Lyumkis, D., Brilot, A. F., Theobald, D. L. & Grigorieff, N. Likelihood-based classification of cryo-EM images using FREALIGN. J. Struct. Biol. 183, 377–388 (2013).
Grant, T., Rohou, A. & Grigorieff, N. cisTEM, user-friendly software for single-particle image processing. eLife 7, e14874 (2018).
Nakane, T., Kimanius, D., Lindahl, E. & Scheres, S. H. Characterisation of molecular motions in cryo-EM single-particle data by multi-body refinement in RELION. eLife 7, e36861 (2018).
Liu, W. & Frank, J. Estimation of variance distribution in three-dimensional reconstruction. I. Theory. J. Opt. Soc. Am. A 12, 2615–2627 (1995).
Penczek, P. A., Kimmel, M. & Spahn, C. M. T. Identifying conformational states of macromolecules by eigen-analysis of resampled cryo-EM images. Structure 19, 1582–1590 (2011).
Tagare, H. D., Kucukelbir, A., Sigworth, F. J., Wang, H. & Rao, M. Directly reconstructing principal components of heterogeneous particles from cryo-EM images. J. Struct. Biol. 191, 245–262 (2015).
Punjani, A. & Fleet, D. J. 3D Variability Analysis: directly resolving continuous flexibility and discrete heterogeneity from single particle cryo-EM images. Preprint at bioRxiv https://doi.org/10.1101/2020.04.08.032466 (2020).
Dashti, A. et al. Trajectories of the ribosome as a Brownian nanomachine. Proc. Natl Acad. Sci. USA 111, 17492–17497 (2014).
Frank, J. & Ourmazd, A. Continuous changes in structure mapped by manifold embedding of single-particle data in cryo-EM. Methods 100, 61–67 (2016).
Moscovich, A., Halevi, A., Andén, J. & Singer, A. Cryo-EM reconstruction of continuous heterogeneity by Laplacian spectral volumes. Inverse Probl. 36, 024003 (2020).
Lederman, R. R. & Singer, A. Continuously heterogeneous hyper-objects in cryo-EM and 3-D movies of many temporal dimensions. Preprint at https://arxiv.org/abs/1704.02899 (2017).
Hornik, K., Stinchcombe, M. B. & White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 2, 359–366 (1989).
Bracewell, R. N. Strip integration in radio astronomy. Aust. J. Phys. 9, 198–217 (1956).
Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. In International Conference on Learning Representations (ICLR, 2014).
Ru, H. et al. Molecular mechanism of V(D)J recombination from synaptic RAG1–RAG2 complex structures. Cell 163, 1138–1152 (2015).
Wong, W. et al. Cryo-EM structure of the Plasmodium falciparum 80S ribosome bound to the anti-protozoan drug emetine. eLife 3, e01963 (2014).
Ru, H., Zhang, P. & Wu, H. Structural gymnastics of RAG-mediated DNA cleavage in V(D)J recombination. Curr. Opin. Struct. Biol. 53, 178–186 (2018).
Sun, M. et al. Dynamical features of the Plasmodium falciparum ribosome during translation. Nucleic Acids Res. 43, 10515–10524 (2015).
McInnes, L., Healy, J. & Melville, J. UMAP: Uniform Manifold Approximation and Projection for dimension reduction. Preprint at https://arxiv.org/abs/1802.03426 (2018).
Plaschka, C., Lin, P.-C. & Nagai, K. Structure of a pre-catalytic spliceosome. Nature 546, 617–621 (2017).
Cormen, T. H., Leiserson, C. E., Rivest, R. L. & Stein, C. Introduction to Algorithms. 595–601 (MIT Press and McGraw-Hill, 2009).
Zhang, C., Bengio, S., Hardt, M., Recht, B. & Vinyals, O. Understanding deep learning requires rethinking generalization. In International Conference on Learning Representations (ICLR, 2017).
Zivanov, J., Nakane, T. & Scheres, S. H. W. Estimation of high-order aberrations and anisotropic magnification from cryo-EM data sets in RELION-3.1. IUCrJ 7, 253–267 (2020).
Punjani, A., Zhang, H. & Fleet, D. J. Non-uniform refinement: adaptive regularization improves single particle cryo-EM reconstruction. Nat. Methods 17, 1214–1221 (2020).
Zhong, E. D., Bepler, T., Davis, J. H. & Berger, B. Reconstructing continuous distributions of 3D protein structure from cryo-EM images. In International Conference on Learning Representations (ICLR, 2020).
Bepler, T., Zhong, E., Kelley, K., Brignole, E. & Berger, B. Explicitly disentangling image content from translation and rotation with spatial-VAE. In Advances in Neural Information Processing Systems (NeurIPS, 2019).
Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems (NIPS, 2017).
Rezende, D. J., Mohamed, S. & Wierstra, D. Stochastic backpropagation and approximate inference in deep generative models. In International Conference on Machine Learning (ICML, 2014).
The PyMOL Molecular Graphics System, version 2.3 (Schrödinger, 2019).
Tang, G. et al. EMAN2: an extensible image processing suite for electron microscopy. J. Struct. Biol. 157, 38–46 (2007).
Pettersen, E. F. et al. UCSF Chimera—a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).
Iudin, A., Korir, P. K., Salavert-Torres, J., Kleywegt, G. J. & Patwardhan, A. EMPIAR: a public archive for raw electron microscopy image data. Nat. Methods 13, 387–388 (2016).
Rosenthal, P. B. & Henderson, R. Optimal determination of particle orientation, absolute hand, and contrast loss in single-particle electron cryomicroscopy. J. Mol. Biol. 333, 721–745 (2003).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Goddard, T. D. et al. UCSF ChimeraX: meeting modern challenges in visualization and analysis. Protein Sci. 27, 14–25 (2018).
Zhong, E. D. Data for "CryoDRGN: Reconstruction of heterogeneous cryo-EM structures using neural networks". Zenodo https://doi.org/10.5281/zenodo.4355284 (2021).
Zhong, E. D. zhonge/cryodrgn_empiar: initial release. Zenodo https://doi.org/10.5281/zenodo.4412072 (2021).
Zhong, E. D. zhonge/cryodrgn: version 0.3.0. Zenodo https://doi.org/10.5281/zenodo.4355743 (2020).
We thank A. Lerer, R. Lederman, B. Demeo, A. Narayan, K. Kelley, B. Sauer, P. Sharp, S. Rodriques and D. Haselbach for helpful discussions and feedback. We are grateful to the MIT-IBM Satori team for GPU computing resources and support. This work was funded by the National Science Foundation Graduate Research Fellowship Program to E.D.Z., NIH grant R01-GM081871 to B.B., NIH grant R00-AG050749 to J.H.D., NVIDIA-GPU grant to J.H.D. and a grant from the MIT J-Clinic for Machine Learning and Health to J.H.D. and B.B.
The authors declare no competing interests.
Peer review information Arunima Singh was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended Data Fig. 1 Per-image FSC curves between ground-truth maps and density maps from cryoDRGN trained on simulated heterogeneous datasets.
For each dataset, we compute 100 ‘per-image FSC curves’ between generated and ground-truth density maps (Methods). Images are sampled at equally spaced percentiles along the reaction coordinate for the Uniform, Cooperative, and Noncontiguous datasets. For the Compositional dataset, the per-image FSC for 20, 30, and 50 randomly sampled images of the 30S, 50S, and 70S ribosome, respectively, are shown. No mask is used in computing the FSC.
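The Fourier shell correlation (FSC) used in this comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function name `fsc`, the cubic box, and the one-voxel-wide integer shells are choices made here for the example. It computes one unmasked curve between a single generated volume and a single ground-truth volume; a per-image analysis would repeat the call for each sampled image.

```python
import numpy as np


def fsc(vol_a, vol_b):
    """Unmasked Fourier shell correlation between two cubic volumes.

    Hypothetical helper for illustration. Returns one value per
    integer-radius shell, from shell 0 (DC) up to, but excluding, Nyquist.
    """
    assert vol_a.shape == vol_b.shape and vol_a.ndim == 3
    n = vol_a.shape[0]
    assert vol_a.shape == (n, n, n), "sketch assumes a cubic box"

    # Centre the zero frequency so radii can be measured from the middle.
    fa = np.fft.fftshift(np.fft.fftn(vol_a))
    fb = np.fft.fftshift(np.fft.fftn(vol_b))

    # Integer shell index of every Fourier voxel.
    axis = np.arange(n) - n // 2
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    shell = np.rint(np.sqrt(x**2 + y**2 + z**2)).astype(int)

    n_shells = n // 2
    keep = shell < n_shells  # drop the corners beyond Nyquist
    idx = shell[keep]

    # Per-shell cross term and per-shell power of each volume.
    cross = np.bincount(idx, weights=(fa * np.conj(fb)).real[keep], minlength=n_shells)
    power_a = np.bincount(idx, weights=(np.abs(fa) ** 2)[keep], minlength=n_shells)
    power_b = np.bincount(idx, weights=(np.abs(fb) ** 2)[keep], minlength=n_shells)
    return cross / np.sqrt(power_a * power_b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = rng.standard_normal((32, 32, 32))
    generated = truth + 0.5 * rng.standard_normal((32, 32, 32))
    print(fsc(generated, truth))
```

A shell with zero power in either volume would produce a division by zero; real density maps with noise rarely hit this case, but production code would guard against it.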
Extended Data Fig. 2 RAG complex density maps reconstructed by cryoDRGN and by heterogeneous refinement in cryoSPARC.
a, Front (top) and back (bottom) view of the six cryoDRGN density maps of the RAG complex from Fig. 4b. b, Density maps from 3D classification in cryoSPARC using the cryoDRGN density maps in (a) as initial models. Gold-standard FSC resolution and number of particles used in reconstruction are noted. c, Two side views of the density maps from 3D classification in (b), focusing on the RSS and NBD.
a, UMAP visualization of latent space encodings of EMPIAR-10028 particles with 50 sampled points shown in black. Sampled points are ordered according to distances in latent space (Methods). Visual inspection of the 50 volumes generated at the depicted points reveals 3 volumes with the 40S in a rotated state (purple) and 6 volumes with portions of the 40S head region missing (pink). b, Density map of the 80S ribosome with the missing head group reconstructed by cryoDRGN (pink) compared with the density maps from Fig. 4c showing the canonical (blue) and 40S-rotated (purple) forms of the 80S ribosome. The density maps are generated from points 32, 4, and 1 in panel a, from left to right.
a, PCA and UMAP visualization of the cryoDRGN latent space representation of Pf80S particle images with 4,889 particles separated along PC1, selected with k-means clustering, colored in purple (Methods). b, Density map from cryoSPARC homogeneous refinement (purple) using the 4,889 particles selected in (a). The density map is also shown superimposed with the cryoDRGN unrotated state (blue) and annotated as in Fig. 4c. c, Gold standard FSC (GSFSC) curve between independent half-maps of the cryoSPARC refinement of the Pf80S rotated state and map-to-map FSC between the cryoDRGN and cryoSPARC density map of the Pf80S rotated state. Dotted lines indicate 0.5 and 0.143 cutoffs.
a, UMAP visualization of the 10-D latent encodings from cryoDRGN as in Fig. 5b, colored by cluster after fitting a 5-component Gaussian mixture model. The cluster that was removed from subsequent analysis is colored orange. b, UMAP visualization of (a), colored by the magnitude of the latent encodings, ||z||. c, Nine randomly sampled particle images from EMPIAR-10076 with latent encoding magnitude ||z|| > 10 as predicted from cryoDRGN training in (a,b). Each image is 419.2 Å along each side. d, Table summarizing dataset filtering. e,f, 2D classification and ab initio reconstruction of the 34,868 removed particles. g,h, 2D classification and ab initio reconstruction of the 97,031 kept particles.
a, Density maps of the LSU minor assembly states reconstructed by cryoDRGN. Each cryoDRGN structure is generated at the mean of the latent encodings of particles with the corresponding class assignment from Davis et al. (ref. 12). b, Map-to-map FSC curves between the generated cryoDRGN density maps and the published density maps from Davis et al. (ref. 12). Published resolutions for assembly states B–E ranged from ~4 to 5 Å. Dotted lines indicate 0.5 and 0.143 cutoffs. c,d, Reproduction of the cryoDRGN latent space shown in Fig. 5g, colored by minor assembly state (c), or viewed in separate panels (d).
a, Density map from cryoSPARC homogeneous refinement of the 1,113 particles selected from the cryoDRGN latent representation that constitute class C4 (right), compared with the density map generated by cryoDRGN (left) from Fig. 5i. rRNA helix 68 is circled in red. b, Gold standard FSC (GSFSC) curve between independent half-maps of the cryoSPARC reconstruction and map-to-map FSC between the cryoDRGN and cryoSPARC maps shown in (a). Dotted lines indicate 0.5 and 0.143 cutoffs.
Extended Data Fig. 8 Reproducibility of cryoDRGN’s latent space representation of the assembling ribosome.
a, UMAP visualization of the latent encodings from replicate runs of cryoDRGN trained on the filtered particles of EMPIAR-10076. Particle embeddings are colored by major assembly state assigned from 3D classification in Davis et al. (ref. 12). b, UMAP visualization of (a), colored by cluster after fitting a 5-component Gaussian mixture model on the UMAP embeddings. c,d, Consistency of the GMM labeling between replicates reported as the percentage of particles with identical labels (c) and the confusion matrix of GMM cluster assignments (d).
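The two consistency measures named in this caption, the percentage of particles with identical labels and the confusion matrix of cluster assignments, can be sketched as below. This is an illustrative example only: the function name `label_agreement` is hypothetical, and the sketch assumes the cluster indices of the two replicates have already been matched to each other, since mixture-model labels are otherwise arbitrary. How the labels were matched in the paper is not stated here.

```python
import numpy as np


def label_agreement(labels_a, labels_b, n_clusters):
    """Compare per-particle cluster labels from two replicate runs.

    Hypothetical helper. Assumes labels are integers in [0, n_clusters)
    and that cluster indices are already aligned between the replicates.
    Returns (percent_identical, confusion), where confusion[i, j] counts
    particles labelled i in replicate A and j in replicate B.
    """
    labels_a = np.asarray(labels_a, dtype=int)
    labels_b = np.asarray(labels_b, dtype=int)
    assert labels_a.shape == labels_b.shape

    confusion = np.zeros((n_clusters, n_clusters), dtype=int)
    # np.add.at accumulates correctly when the same (i, j) pair repeats.
    np.add.at(confusion, (labels_a, labels_b), 1)

    percent_identical = 100.0 * float(np.mean(labels_a == labels_b))
    return percent_identical, confusion


if __name__ == "__main__":
    rep1 = [0, 0, 1, 1, 2]
    rep2 = [0, 1, 1, 1, 2]
    pct, cm = label_agreement(rep1, rep2, n_clusters=3)
    print(pct)
    print(cm)
```

The diagonal of the confusion matrix holds the particles that received the same label in both runs, so its sum divided by the number of particles gives the same percentage.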
Extended Data Fig. 9 Comparison of multi-body refinement and cryoDRGN of the pre-catalytic spliceosome.
a, Visualization of a rigid-body trajectory from multi-body refinement of the pre-catalytic spliceosome. Snapshots are extracted from the trajectory along PC1 of rigid-body orientations, showing a large-scale motion of the SF3b subcomplex. The masks that define the rigid-body decomposition of the complex are shown on the right. The circle highlights a helix that breaks at the boundary between bodies where the rigid-body assumption no longer holds. Adapted from Video 3 of Nakane et al. (ref. 19) and density maps and masks deposited in EMPIAR-10180. b, Alternate view of cryoDRGN’s PC1 traversal in Fig. 6. CryoDRGN learns the same overall motion of the SF3b subcomplex; however, its neural network representation lacks the helix-breaking artifact.
a, Density map of the consensus reconstruction and 2D projections of the top three 3DVA variability components (that is, eigen-volumes) that form a linear basis describing structural heterogeneity of the pre-catalytic spliceosome. b, 3DVA latent encodings of particles from the filtered EMPIAR-10180 dataset. c, Comparison of 3DVA component 1 latent encodings and PC1 of the cryoDRGN 10D latent encodings from Fig. 6c. Correlation indicates Spearman correlation. d, 3DVA component 1 trajectory at the depicted points in (b). e, Alternate view of the density maps from the cryoDRGN PC1 trajectory in Fig. 6d.
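The Spearman correlation reported in panel c is the Pearson correlation of the ranks of the two variables. A minimal sketch follows, assuming continuous values with no ties; the function name `spearman` is chosen here for illustration and is not taken from the cryoDRGN or cryoSPARC code.

```python
import numpy as np


def spearman(x, y):
    """Spearman rank correlation of two 1-D arrays.

    Hypothetical helper. Assumes no tied values, which is reasonable
    for continuous latent coordinates; ties would need average ranks.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    assert x.shape == y.shape and x.ndim == 1

    # argsort of argsort converts values to 0-based ranks.
    rank_x = np.argsort(np.argsort(x)).astype(float)
    rank_y = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rank_x, rank_y)[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal(1000)
    b = a + 0.3 * rng.standard_normal(1000)
    print(spearman(a, b))
```

Because the sign of a principal component or variability component is arbitrary, the sign of such a correlation carries no meaning on its own; only its magnitude indicates how closely the two coordinates order the particles.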
Supplementary Table 1 and Fig. 1
CryoDRGN reconstruction of the RAG1–RAG2 complex. Structures (left) and their corresponding location in the latent space (right). The series of structures shown are generated at the k-means cluster centers of the latent encodings with k = 20, followed by a trajectory generated with cryoDRGN’s graph-traversal algorithm.
CryoDRGN reconstruction of the Pf80S ribosome. Structures from two distinct views (top) and their corresponding location in the latent space (bottom). The series of structures shown are generated at the k-means cluster centers of the latent encodings with k = 20, followed by a trajectory generated with cryoDRGN’s graph-traversal algorithm.
CryoDRGN reconstruction of the bacterial LSU undergoing assembly. Structures (left) and their corresponding location in latent space (right), which is colored by minor assembly state as in Fig. 5h. Trajectories are generated with cryoDRGN’s graph-traversal algorithm through the three parallel assembly pathways assigned in Davis et al. (ref. 12).
CryoDRGN reconstruction of the pre-catalytic spliceosome. Front and back view of structures (left) and their corresponding locations in latent space (right). Trajectories show density maps generated from PC1 and PC2 of the latent encodings and from cryoDRGN’s graph-traversal algorithm.
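A trajectory along a principal component of the latent encodings can be sketched as follows. This is an illustration under assumptions, not cryoDRGN's own code: the function name `pc_traversal`, the 5th to 95th percentile range, and the number of points are all choices made for this example. The function returns latent coordinates only; each row would then be passed to the trained decoder to generate a density map, a step not shown here.

```python
import numpy as np


def pc_traversal(z, component=0, n_points=10, lo=5.0, hi=95.0):
    """Latent-space points evenly spaced along one principal component.

    Hypothetical helper. z is an (N, D) array of per-particle latent
    encodings. The other components are held at the dataset mean.
    """
    z = np.asarray(z, dtype=float)
    mean = z.mean(axis=0)

    # Rows of vt are the principal axes of the centred encodings.
    _, _, vt = np.linalg.svd(z - mean, full_matrices=False)
    axis = vt[component]

    # Project every particle onto the chosen axis and take the
    # requested percentile range of the observed scores.
    scores = (z - mean) @ axis
    start, stop = np.percentile(scores, [lo, hi])
    steps = np.linspace(start, stop, n_points)

    # Map the 1-D steps back into the full latent space.
    return mean + steps[:, None] * axis


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    encodings = rng.standard_normal((500, 10))
    trajectory = pc_traversal(encodings, component=0, n_points=10)
    print(trajectory.shape)
```

Restricting the traversal to a percentile range of the observed scores keeps the sampled points inside the region of latent space that is actually populated by particles.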
Supplementary Table 1. Summary of dataset statistics, training hyperparameters, and runtimes for cryoDRGN heterogeneous reconstruction experiments. The neural network architecture is denoted as d×l, where d indicates the number of nodes per layer and l is the number of hidden layers. The architecture corresponds to both the encoder (E) and decoder (D) MLPs unless otherwise specified. Total training times were recorded from training on a single Nvidia Tesla V100 GPU with 32 GB of memory on either an Intel Xeon Gold 6130 CPU (2.10 GHz, 791 GB of RAM) or an IBM Power9 node with 1.2 TB of RAM. The reported training times may be overestimated as the presence of any concurrently running programs was not controlled for. (*) The training time of the third replicate of EMPIAR-10076 is substantially faster as using |z| = 8 better satisfies tensor shape constraints for Nvidia Tensor Core hardware acceleration.
About this article
Cite this article
Zhong, E.D., Bepler, T., Berger, B. et al. CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks. Nat Methods 18, 176–185 (2021). https://doi.org/10.1038/s41592-020-01049-4