Single-particle cryogenic electron microscopy (cryo-EM) has emerged as a powerful technique to visualize the structural landscape sampled by a protein complex. However, algorithmic and computational bottlenecks in analyzing heterogeneous cryo-EM datasets have prevented the full realization of this potential. CryoDRGN is a machine learning system for heterogeneous cryo-EM reconstruction of proteins and protein complexes from single-particle cryo-EM data. Central to this approach is a deep generative model for heterogeneous cryo-EM density maps, which we empirically find is effective in modeling both discrete and continuous forms of structural variability. Once trained, cryoDRGN is capable of generating an arbitrary number of 3D density maps, and thus interpreting the resulting ensemble is a challenge. Here, we showcase interactive and automated processing approaches for analyzing cryoDRGN results. Specifically, we detail a step-by-step protocol for the analysis of an existing assembling 50S ribosome dataset, including preparation of inputs, network training and visualization of the resulting ensemble of density maps. Additionally, we describe and implement methods to comprehensively analyze and interpret the distribution of volumes with the assistance of an associated atomic model. This protocol is appropriate for structural biologists familiar with processing single-particle cryo-EM datasets and with moderate experience navigating Python and Jupyter notebooks. It requires 3–4 days to complete. CryoDRGN is open source software that is freely available.
All final and intermediate results presented in this protocol are available at https://doi.org/10.5281/zenodo.5164127.
The software and scripts used in these analyses are available at https://github.com/zhonge/cryodrgn (version 0.3.5) and https://github.com/lkinman/occupancy-analysis (version 0.1.2), as described in Materials. Updates to cryoDRGN will be posted at cryodrgn.csail.mit.edu. All code is available through the open source GPL-3.0 License.
Lyumkis, D. Challenges and opportunities in cryo-EM single-particle analysis. J. Biol. Chem. 294, 5181–5197 (2019).
Wu, M. & Lander, G. C. Present and emerging methodologies in cryo-EM single-particle analysis. Biophys. J. 119, 1281–1289 (2020).
Serna, M. Hands on methods for high resolution cryo-electron microscopy structures of heterogeneous macromolecular complexes. Front. Mol. Biosci. 6, 33 (2019).
Dashti, A. et al. Retrieving functional pathways of biomolecules from single-particle snapshots. Nat. Commun. 11, 4734 (2020).
Dashti, A. et al. Trajectories of the ribosome as a Brownian nanomachine. Proc. Natl Acad. Sci. USA 111, 17492–17497 (2014).
Haselbach, D. et al. Long-range allosteric regulation of the human 26S proteasome by 20S proteasome-targeting cancer drugs. Nat. Commun. 8, 15578 (2017).
Gui, M. et al. Structures of radial spokes and associated complexes important for ciliary motility. Nat. Struct. Mol. Biol. 28, 29–37 (2021).
Zhong, E. D., Bepler, T., Berger, B. & Davis, J. H. CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks. Nat. Methods 18, 176–185 (2021).
Punjani, A. & Fleet, D. J. 3D variability analysis: resolving continuous flexibility and discrete heterogeneity from single particle cryo-EM. J. Struct. Biol. 213, 107702 (2021).
Zivanov, J. et al. New tools for automated high-resolution cryo-EM structure determination in RELION-3. eLife https://doi.org/10.7554/eLife.42166 (2018).
Grant, T., Rohou, A. & Grigorieff, N. cisTEM, user-friendly software for single-particle image processing. eLife https://doi.org/10.7554/eLife.35383 (2018).
Punjani, A., Rubinstein, J. L., Fleet, D. J. & Brubaker, M. A. cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination. Nat. Methods 14, 290–296 (2017).
Nakane, T., Kimanius, D., Lindahl, E. & Scheres, S. H. Characterisation of molecular motions in cryo-EM single-particle data by multi-body refinement in RELION. eLife https://doi.org/10.7554/eLife.36861 (2018).
Kingma, D. & Welling, M. Auto-encoding variational Bayes. 2nd International Conference on Learning Representations (2013).
Zhong, E. D., Bepler, T., Davis, J. H. & Berger, B. Reconstructing continuous distributions of 3D protein structure from cryo-EM images. Eighth International Conference on Learning Representations (2020).
Davis, J. H. et al. Modular assembly of the bacterial large ribosomal subunit. Cell 167, 1610–1622.e1615 (2016).
Rabuck-Gibbons, J. N., Lyumkis, D. & Williamson, J. R. Quantitative mining of compositional heterogeneity in cryo-EM datasets of ribosome assembly intermediates. Structure https://doi.org/10.1016/j.str.2021.12.005 (2022).
von Loeffelholz, O. et al. Focused classification and refinement in high-resolution cryo-EM structural analysis of ribosome complexes. Curr. Opin. Struct. Biol. 46, 140–148 (2017).
Zhong, E. D., Lerer, A., Davis, J. H. & Berger, B. CryoDRGN2: Ab initio neural reconstruction of 3D protein structures from real cryo-EM images. IEEE/CVF International Conference on Computer Vision (2021).
Punjani, A. & Fleet, D. J. 3D flexible refinement: structure and motion of flexible proteins from cryo-EM. Preprint at bioRxiv https://doi.org/10.1101/2021.04.22.440893 (2021).
Ludtke, S. & Chen, M. Deep learning based mixed-dimensional GMM for characterizing variability in CryoEM. Nat. Methods 18, 930–936 (2021).
Zhong, E. D., Lerer, A., Davis, J. H. & Berger, B. Exploring generative atomic models in cryo-EM reconstruction. Preprint at arXiv https://arxiv.org/abs/2107.01331v1 (2021).
Rosenbaum, D. et al. Inferring a continuous distribution of atom coordinates from cryo-EM images using VAEs. Preprint at arXiv https://arxiv.org/abs/2106.14108v1 (2021).
Sekne, Z., Ghanim, G. E., van Roon, A. M. & Nguyen, T. H. D. Structural basis of human telomerase recruitment by TPP1-POT1. Science 375, 1173–1176 (2022).
Chaaban, S. & Carter, A. P. Structure of dynein-dynactin on microtubules shows tandem recruitment of cargo adaptors. Preprint at bioRxiv https://doi.org/10.1101/2022.03.17.482250 (2022).
Schoppe, J. et al. Flexible open conformation of the AP-3 complex explains its role in cargo recruitment at the Golgi. J. Biol. Chem. 297, 101334 (2021).
Punjani, A., Zhang, H. & Fleet, D. J. Non-uniform refinement: adaptive regularization improves single-particle cryo-EM reconstruction. Nat. Methods 17, 1214–1221 (2020).
Zivanov, J., Nakane, T. & Scheres, S. H. W. A Bayesian approach to beam-induced motion correction in cryo-EM single-particle analysis. IUCrJ 6, 5–17 (2019).
Scheres, S. H. RELION: implementation of a Bayesian approach to cryo-EM structure determination. J. Struct. Biol. 180, 519–530 (2012).
Cheng, Y., Grigorieff, N., Penczek, P. A. & Walz, T. A primer to single-particle cryo-electron microscopy. Cell 161, 438–449 (2015).
McInnes, L., Healy, J., Saul, N. & Grossberger, L. UMAP: Uniform Manifold Approximation and Projection. J. Open Source Softw. 3, 861 (2018).
Davis, J. H. & Williamson, J. R. Structure and dynamics of bacterial ribosome biogenesis. Philos. Trans. R. Soc. B https://doi.org/10.1098/rstb.2016.0181 (2017).
Trabuco, L. G., Villa, E., Schreiner, E., Harrison, C. B. & Schulten, K. Molecular dynamics flexible fitting: a practical guide to combine cryo-electron microscopy and X-ray crystallography. Methods 49, 174–180 (2009).
Goddard, T. D. et al. UCSF ChimeraX: meeting modern challenges in visualization and analysis. Protein Sci. 27, 14–25 (2018).
We thank the MIT-IBM Satori team for GPU computing resources and support. This work was funded by the NSF GRFP Fellowship to E.D.Z., NIH grant R35-GM141861 to B.B., NSF-CAREER-2046778 and NIH grant R01-GM144542 to J.H.D. and a grant from the MIT J-Clinic for Machine Learning and Health to J.H.D. and B.B. Research in the Davis lab is supported by the Alfred P. Sloan Foundation, the James H. Ferry Fund and the Whitehead Family.
The authors declare no competing interests.
Peer review information
Nature Protocols thanks Daniel Edelberg, Dong Si and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Key references using this protocol
Zhong, E. et al. Nat. Methods 18, 176–185 (2021): https://doi.org/10.1038/s41592-020-01049-4
Gui, M. et al. Nat. Struct. Mol. Biol. 28, 29–37 (2021): https://doi.org/10.1038/s41594-020-00530-0
Schoppe, J. et al. J. Biol. Chem. 297, 101334 (2021): https://doi.org/10.1016/j.jbc.2021.101334
Key data used in this protocol
Davis, J. H. et al. Cell 167, 1610–1622.e1615 (2016): https://doi.org/10.1016/j.cell.2016.11.020
Comparison of 10,000 cryoDRGN-parsed particles back-projected at D = 128 px (left) with the unsharpened map from cryoSPARC’s homogeneous refinement (right).
Extended Data Fig. 2 Assessing convergence of representative cryoDRGN density maps during network training.
a, Particle sets of interest A–J identified in epoch 49 by the ‘UMAP local maximum’ method are mapped to prior epochs’ UMAP embeddings. The on-data median latent value of each particle set is embedded into UMAP space and annotated for each epoch. Note that each annotated point maps to the same high-occupancy region of UMAP space following convergence. b, Corresponding volumes generated from each on-data median latent value at five epoch intervals as shown in a. Note that the volumes’ gross morphology stabilizes by epochs 14–19, though some additional details in maxima I and J require 24–29 epochs of training. c, FSC plots correlating each local maximum volume at epoch j and at epoch j − 5.
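The epoch-to-epoch comparison in panel c can be outlined with a generic Fourier shell correlation (FSC). This is a minimal sketch, not cryoDRGN's own implementation: the function name `fsc` and the synthetic cubic volumes are illustrative assumptions, and in practice the two inputs would be the density maps generated at the two epochs being compared.

```python
import numpy as np

def fsc(vol1, vol2):
    """Fourier shell correlation between two cubic volumes of equal size.

    Returns one correlation value per integer-radius shell, from the
    zero-frequency term up to (but not including) the Nyquist shell.
    """
    d = vol1.shape[0]
    f1 = np.fft.fftshift(np.fft.fftn(vol1))
    f2 = np.fft.fftshift(np.fft.fftn(vol2))

    # Integer shell index of every Fourier voxel (origin at d // 2 after fftshift).
    coords = np.arange(d) - d // 2
    x, y, z = np.meshgrid(coords, coords, coords, indexing="ij")
    shell = np.rint(np.sqrt(x**2 + y**2 + z**2)).astype(int).ravel()

    n_shells = d // 2
    cross = np.bincount(shell, weights=(f1 * np.conj(f2)).real.ravel())[:n_shells]
    power1 = np.bincount(shell, weights=(np.abs(f1) ** 2).ravel())[:n_shells]
    power2 = np.bincount(shell, weights=(np.abs(f2) ** 2).ravel())[:n_shells]
    return cross / np.sqrt(power1 * power2)

# Synthetic stand-ins for a volume at epoch j and at epoch j - 5 (assumption).
rng = np.random.default_rng(0)
vol_j = rng.normal(size=(16, 16, 16))
curve_same = fsc(vol_j, vol_j)    # identical maps correlate perfectly in every shell
curve_flip = fsc(vol_j, -vol_j)   # inverted contrast anti-correlates in every shell
```

A curve that stays near 1 out to high spatial frequency between successive checkpoints indicates that the corresponding volume has stopped changing with further training.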
a, Representative particles filtered by ind_keep.star, selected for further training, and corresponding 2D classification using default cryoSPARC parameters. b, Representative particles filtered by ind_bad.star excluded from further training, and corresponding 2D classification using default cryoSPARC parameters. c, Three-way Venn diagram of ‘junk’ particles identified by one of the following methods: two classes from k = 6 Gaussian mixture model latent-space classification (red, 35,421 particles); nine classes from k = 20 k-means latent-space classification (green, 29,080 particles); or latent encoding magnitude (z-norm) exceeding 0.5 standard deviations larger than the mean (blue, 30,879 particles). d, Corresponding cryoSPARC 2D-classification results using ‘junk’ particles identified through the GMM (top), k-means (middle) or z-norm (bottom) filtering approaches. e,f, UMAP embedding (e) or PCA projections of latent space (f) highlighting location of junk particles identified by GMM (red), k-means (green) or z-norm (blue) methods.
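The z-norm criterion in panel c reduces to a single threshold on the magnitude of each particle's latent encoding. The following is a minimal sketch under stated assumptions: the function name `znorm_outliers`, the toy latent array and the variable names `ind_bad` and `ind_keep` are illustrative, and the protocol's own filtering notebook may differ in detail.

```python
import numpy as np

def znorm_outliers(z, n_std=0.5):
    """Indices of particles whose latent magnitude exceeds mean + n_std * std."""
    norms = np.linalg.norm(z, axis=1)              # one magnitude per particle
    cutoff = norms.mean() + n_std * norms.std()
    return np.flatnonzero(norms > cutoff)

# Toy latent encodings (assumption): 100 particles in 4 dimensions,
# with the first five scaled up to mimic 'junk' particles far from the origin.
z = np.ones((100, 4))
z[:5] *= 10.0

ind_bad = znorm_outliers(z)                        # candidates to exclude
ind_keep = np.setdiff1d(np.arange(len(z)), ind_bad)
```

Because the cutoff is defined relative to the dataset's own mean and standard deviation, the number of flagged particles depends on how heavy the tail of the magnitude distribution is, so inspecting the flagged set by 2D classification, as in panel d, remains a useful check.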
a, Representative plot of average total loss at each epoch. b, Median per-particle movement through latent space, characterized by vectors connecting each particle’s latent embedding in successive epochs. Resulting vector dot products (left), magnitude (center) and cosine distance (right) are shown. c, Identification of representative latent embeddings via the ‘UMAP local maxima method’. The UMAP embedding of epoch 99 is binned into a 2D histogram, smoothed, annotated with local maxima and overlaid with the maxima. The on-data median UMAP location of each maximum and its neighboring eight bins is shown. Label order corresponds to decreasing particle count in each local maximum. d,e, Map–map correlation (d) and FSC (e) at Nyquist frequency calculated between representative volumes generated as defined in c at five epoch intervals. Epochs for which the encoder network has not converged are noted with dotted lines.
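The 'UMAP local maxima' procedure in panel c (histogram, smooth, find maxima, take the on-data median over each maximum and its eight neighboring bins) can be sketched as follows. This is an illustration under assumptions, not the protocol's code: the caption does not state the smoothing kernel, so a [1, 2, 1] / 4 separable kernel is assumed, maxima are ordered by smoothed bin height as a proxy for particle count, and all function names, the bin count and the toy embedding are hypothetical. The histogram extent is assumed to cover every point.

```python
import numpy as np

def smooth(hist):
    """Smooth a 2D histogram with a separable [1, 2, 1] / 4 kernel (zero-padded)."""
    k = np.array([1.0, 2.0, 1.0]) / 4.0
    n_x, n_y = hist.shape
    p = np.pad(hist.astype(float), 1, mode="constant")
    rows = sum(k[i] * p[i:i + n_x, :] for i in range(3))
    return sum(k[j] * rows[:, j:j + n_y] for j in range(3))

def local_maxima(hist):
    """Bins strictly higher than all eight neighbors, tallest first."""
    n_x, n_y = hist.shape
    p = np.pad(hist, 1, mode="constant", constant_values=-np.inf)
    is_max = np.ones(hist.shape, dtype=bool)
    for di in range(3):
        for dj in range(3):
            if (di, dj) != (1, 1):
                is_max &= hist > p[di:di + n_x, dj:dj + n_y]
    peaks = np.argwhere(is_max)
    order = np.argsort(-hist[peaks[:, 0], peaks[:, 1]])
    return peaks[order]

def on_data_medians(emb, bins=7, extent=((0.0, 7.0), (0.0, 7.0))):
    """Index of the particle nearest the median UMAP coordinate of each
    maximum's 3 x 3 bin neighborhood."""
    hist, x_edges, y_edges = np.histogram2d(emb[:, 0], emb[:, 1], bins=bins, range=extent)
    bi = np.clip(np.searchsorted(x_edges, emb[:, 0], side="right") - 1, 0, bins - 1)
    bj = np.clip(np.searchsorted(y_edges, emb[:, 1], side="right") - 1, 0, bins - 1)
    picks = []
    for i, j in local_maxima(smooth(hist)):
        members = np.flatnonzero((np.abs(bi - i) <= 1) & (np.abs(bj - j) <= 1))
        median = np.median(emb[members], axis=0)
        nearest = members[np.argmin(np.linalg.norm(emb[members] - median, axis=1))]
        picks.append(int(nearest))
    return picks

# Toy 2D embedding (assumption): a larger and a smaller cluster.
emb = np.array([[1.5, 1.5]] * 20 + [[5.5, 5.5]] * 10)
picks = on_data_medians(emb)
```

Choosing the particle nearest the median, rather than the median coordinate itself, keeps each representative 'on-data', so its latent encoding can be passed to the decoder directly.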
Extended Data Fig. 5 Assessing convergence of representative cryoDRGN density maps during high-resolution training.
a, Particle sets A–J identified by the ‘UMAP local maximum’ method (Box 1) mapped to prior epochs as illustrated in Extended Data Fig. 2. b, Corresponding volumes generated from labeled positions in a. Note that the volumes’ gross morphology stabilizes by epochs 19–29, though maximum I stabilizes as a 70S ribosome around epoch 39. c, FSC plots between volumes from each local maximum offset by five epochs of training, as in Extended Data Fig. 2. The map-to-map FSC stabilizes by epoch 39.
a, The UMAP representation of the latent space resulting from 50 epochs of high-resolution training, colored by indicated imaging parameters. b, Angular and translational pose distributions. c, PCA of the latent space, colored by the 20 k-means cluster centers automatically generated by cryodrgn analyze. Numbered black dots indicate the locations in latent space of each k-means cluster center volume.
UMAP representation of the latent space resulting from 50 epochs of high-resolution training, with contours colored with darker blues as particle density increases. Sampled points correspond to the centers of 500 k-means clusters and are indicated with white circles.
Extended Data Fig. 8 Confusion matrix of published class labels and classes assigned by subunit occupancy analysis.
k-Means 500 cluster center maps were assigned to 15 classes by subunit occupancy analysis. Particles within a given k-means 500 cluster are assigned to the same subunit occupancy class as the center map. Published particle labels were drawn from ref. 16, and the fractional correspondence is plotted as a heatmap. Note that published classes A and F corresponded to 70S and 30S particles, respectively.
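The label propagation and fractional correspondence described here can be sketched in a few lines. This is a minimal illustration under assumptions: the caption does not say how fractions are normalized, so each row (published class) is normalized to sum to 1, and the function name, argument names and toy labels are hypothetical.

```python
import numpy as np

def fractional_confusion(published, kmeans_label, cluster_to_class, n_published, n_classes):
    """Row-normalized confusion matrix of published class vs occupancy class.

    published        : published class index of each particle
    kmeans_label     : k-means cluster index of each particle
    cluster_to_class : occupancy class assigned to each cluster's center map
    """
    # Every particle inherits the occupancy class of its cluster's center map.
    assigned = cluster_to_class[kmeans_label]
    counts = np.zeros((n_published, n_classes))
    np.add.at(counts, (published, assigned), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(row_sums, 1)      # guard against empty published classes

# Toy example (assumption): 6 particles, 3 clusters, 2 classes on each side.
published = np.array([0, 0, 1, 1, 1, 1])
kmeans_label = np.array([0, 1, 2, 2, 2, 0])
cluster_to_class = np.array([0, 1, 1])
frac = fractional_confusion(published, kmeans_label, cluster_to_class, 2, 2)
```

The resulting matrix can be passed to any heatmap routine; rows with a single dominant entry indicate published classes that map cleanly onto one subunit occupancy class.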
Extended Data Fig. 9 Graph traversal through latent space for the B→D1→D2→D3→D4→E3→E5 assembly pathway.
Centroid volumes from the subunit occupancy classes were aligned and compared with the assembly intermediate structures identified in ref. 16 to determine approximate equivalences between published classes and subunit occupancy classes. The volumes corresponding to intermediates B, D1, D2, D3, D4, E3 and E5 were provided to cryodrgn graph_traversal as anchor points; the resulting path through latent space is shown. Non-anchor points are indicated with white circles, whereas anchor points and their corresponding class ID are shown with colored circles. Volumes resulting from the complete graph traversal are shown in Supplementary Video 3.
Particles (1,149) in the C4 class were identified by subunit occupancy analysis and are highlighted in orange.
Supplementary Protocols 1–6 and Supplementary Tables 1 and 2.
PC1 trajectory from high-resolution training. Density maps sampled along PC1 were automatically generated by the cryodrgn analyze command. Volumes are displayed at the same isosurface level, and generated at values spanning the 5th to 95th percentile along the PC1 axis.
PC2 trajectory from high-resolution training. Density maps sampled along PC2 were automatically generated by the cryodrgn analyze command. Volumes are displayed at the same isosurface level, and generated at values spanning the 5th to 95th percentile along the PC2 axis.
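The latent coordinates behind such a principal-component trajectory can be sketched as below, taking the 5th to 95th range as percentiles of the per-particle PC coordinates. This is an illustration rather than the code run by cryodrgn analyze: the function name `pc_trajectory`, the number of sampled points and the toy latent array are assumptions.

```python
import numpy as np

def pc_trajectory(z, pc_index=0, n_points=10, lo=5, hi=95):
    """Latent points evenly spaced in percentile along one principal component."""
    mean = z.mean(axis=0)
    # Rows of vt are the principal axes of the centered latent encodings.
    _, _, vt = np.linalg.svd(z - mean, full_matrices=False)
    axis = vt[pc_index]
    proj = (z - mean) @ axis                          # per-particle PC coordinate
    values = np.percentile(proj, np.linspace(lo, hi, n_points))
    return mean + values[:, None] * axis              # back to latent space

# Toy latent space (assumption): all variance lies along the first dimension.
t = np.arange(101.0)
z = np.stack([t, np.zeros_like(t)], axis=1)
traj = pc_trajectory(z, pc_index=0)
```

Each row of `traj` is a latent vector that could be decoded into one density map; trimming the extremes to the 5th and 95th percentiles avoids sampling sparsely populated regions where the decoder is poorly constrained. The sign of a principal axis is arbitrary, so the trajectory may run in either direction.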
Graph traversal showing the B→D1→D2→D3→D4→E3→E5 assembly pathway. Graph traversal pathway was generated using the cryodrgn graph_traversal command as described in the protocol. The path taken by the traversal through latent space is shown in Extended Data Fig. 9. All volumes are displayed at the same isosurface level.
Cite this article
Kinman, L.F., Powell, B.M., Zhong, E.D. et al. Uncovering structural ensembles from single-particle cryo-EM data using cryoDRGN. Nat Protoc 18, 319–339 (2023). https://doi.org/10.1038/s41596-022-00763-x