Learning structural heterogeneity from cryo-electron sub-tomograms with tomoDRGN

Abstract

Cryo-electron tomography (cryo-ET) enables observation of macromolecular complexes in their native, spatially contextualized cellular environment. Cryo-ET processing software for visualizing such complexes at nanometer resolution via iterative alignment and averaging is well developed but relies upon assumptions of structural homogeneity among the complexes of interest. Recently developed tools allow for some assessment of structural diversity but have limited capacity to represent highly heterogeneous structures, including those undergoing continuous conformational changes. Here we extend the highly expressive cryoDRGN (Deep Reconstructing Generative Networks) deep learning architecture, originally created for single-particle cryo-electron microscopy analysis, to cryo-ET. Our new tool, tomoDRGN, learns a continuous low-dimensional representation of structural heterogeneity in cryo-ET datasets while also learning to reconstruct heterogeneous structural ensembles supported by the underlying data. Using simulated and experimental data, we describe and benchmark architectural choices within tomoDRGN that are uniquely necessitated and enabled by cryo-ET. We additionally illustrate tomoDRGN’s efficacy in analyzing diverse datasets, using it to reveal high-level organization of human immunodeficiency virus (HIV) capsid complexes assembled in virus-like particles and to resolve extensive structural heterogeneity among ribosomes imaged in situ.

Fig. 1: A neural network architecture to analyze structurally heterogeneous particles imaged by cryo-ET.
Fig. 2: TomoDRGN recovers compositional and conformational heterogeneity in simulated datasets.
Fig. 3: TomoDRGN finds residual heterogeneity within primarily homogeneous purified particles.
Fig. 4: TomoDRGN resolves high-resolution features from sub-tomograms collected in situ.
Fig. 5: TomoDRGN uncovers structural heterogeneity in ribosomes imaged in situ.
Fig. 6: TomoDRGN captures intermolecular heterogeneity in situ.

Data availability

Extracted particle sub-tomograms from reprocessing of EMPIAR-10499 have been deposited under EMPIAR-11843. Requisite EMDB volumes and PDB models to generate synthetic data using cryoSRPNT as described in Methods are deposited at https://zenodo.org/doi/10.5281/zenodo.10076628. The trained models, latent embeddings and particle classifications used to analyze all datasets presented have been deposited at https://zenodo.org/doi/10.5281/zenodo.10076628 for simulated datasets and at https://zenodo.org/doi/10.5281/zenodo.10093310 for experimental datasets. Maps corresponding to C1 holoferritin and C1 apoferritin from EMPIAR-10491 generated in M have been deposited under EMD-43285 and EMD-43286. The map of the SecDF-associated 70S ribosome from EMPIAR-10499 generated in RELION has been deposited under EMD-43287. Source data are provided with this paper.

Code availability

TomoDRGN source code, installation instructions and example usage are available at https://github.com/bpowell122/tomodrgn. Version 0.2.2 was used in this study. Scripts used to generate simulated data are available at https://github.com/bpowell122/cryoSRPNT. Version 0.1.0 was used in this study.

References

  1. Bai, X. C., McMullan, G. & Scheres, S. H. How cryo-EM is revolutionizing structural biology. Trends Biochem. Sci. 40, 49–57 (2015).

  2. Murata, K. & Wolf, M. Cryo-electron microscopy for structural analysis of dynamic biological macromolecules. Biochim. Biophys. Acta Gen. Subj. 1862, 324–334 (2018).

  3. Cheng, Y., Grigorieff, N., Penczek, P. A. & Walz, T. A primer to single-particle cryo-electron microscopy. Cell 161, 438–449 (2015).

  4. Zhong, E. D., Bepler, T., Berger, B. & Davis, J. H. CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks. Nat. Methods 18, 176–185 (2021).

  5. Punjani, A. & Fleet, D. J. 3D variability analysis: resolving continuous flexibility and discrete heterogeneity from single particle cryo-EM. J. Struct. Biol. 213, 107702 (2021).

  6. Chen, M. & Ludtke, S. J. Deep learning-based mixed-dimensional Gaussian mixture model for characterizing variability in cryo-EM. Nat. Methods 18, 930–936 (2021).

  7. Dashti, A. et al. Retrieving functional pathways of biomolecules from single-particle snapshots. Nat. Commun. 11, 4734 (2020).

  8. Kinman, L. F., Powell, B. M., Zhong, E. D., Berger, B. & Davis, J. H. Uncovering structural ensembles from single-particle cryo-EM data using cryoDRGN. Nat. Protoc. 18, 319–339 (2023).

  9. Sun, J., Kinman, L. F., Jahagirdar, D., Ortega, J. & Davis, J. H. KsgA facilitates ribosomal small subunit maturation by proofreading a key structural lesion. Nat. Struct. Mol. Biol. 30, 1468–1480 (2023).

  10. Asano, S., Engel, B. D. & Baumeister, W. In situ cryo-electron tomography: a post-reductionist approach to structural biology. J. Mol. Biol. 428, 332–343 (2016).

  11. Lovatt, M., Leistner, C. & Frank, R. A. W. Bridging length scales from molecules to the whole organism by cryoCLEM and cryoET. Faraday Discuss. 240, 114–126 (2022).

  12. Xue, L. et al. Visualizing translation dynamics at atomic detail inside a bacterial cell. Nature 610, 205–211 (2022).

  13. Gemmer, M. et al. Visualization of translation and protein biogenesis at the ER membrane. Nature 614, 160–167 (2023).

  14. Hoffmann, P. C. et al. Structures of the eukaryotic ribosome and its translational states in situ. Nat. Commun. 13, 7435 (2022).

  15. Zhang, P. Advances in cryo-electron tomography and subtomogram averaging and classification. Curr. Opin. Struct. Biol. 58, 249–258 (2019).

  16. Castano-Diez, D. & Zanetti, G. In situ structure determination by subtomogram averaging. Curr. Opin. Struct. Biol. 58, 68–75 (2019).

  17. Bharat, T. A. & Scheres, S. H. Resolving macromolecular structures from electron cryo-tomography data using subtomogram averaging in RELION. Nat. Protoc. 11, 2054–2065 (2016).

  18. Pyle, E. & Zanetti, G. Current data processing strategies for cryo-electron tomography and subtomogram averaging. Biochem. J. 478, 1827–1845 (2021).

  19. Castano-Diez, D., Kudryashev, M., Arheit, M. & Stahlberg, H. Dynamo: a flexible, user-friendly development tool for subtomogram averaging of cryo-EM data in high-performance computing environments. J. Struct. Biol. 178, 139–151 (2012).

  20. Hrabe, T. et al. PyTom: a Python-based toolbox for localization of macromolecules in cryo-electron tomograms and subtomogram analysis. J. Struct. Biol. 178, 177–188 (2012).

  21. Nickell, S. et al. TOM software toolbox: acquisition and analysis for electron tomography. J. Struct. Biol. 149, 227–234 (2005).

  22. Scheres, S. H. W., Melero, R., Valle, M. & Carazo, J. M. Averaging of electron subtomograms and random conical tilt reconstructions through likelihood optimization. Structure 17, 1563–1572 (2009).

  23. Winkler, H. et al. Tomographic subvolume alignment and subvolume classification applied to myosin V and SIV envelope spikes. J. Struct. Biol. 165, 64–77 (2009).

  24. Bartesaghi, A. et al. Classification and 3D averaging with missing wedge correction in biological electron tomography. J. Struct. Biol. 162, 436–450 (2008).

  25. Walz, J. et al. Electron tomography of single ice-embedded macromolecules: three-dimensional alignment and classification. J. Struct. Biol. 120, 387–395 (1997).

  26. Zivanov, J. et al. A Bayesian approach to single-particle electron cryo-tomography in RELION-4.0. eLife 11, e83724 (2022).

  27. Tegunov, D., Xue, L., Dienemann, C., Cramer, P. & Mahamid, J. Multi-particle cryo-EM refinement with M visualizes ribosome–antibiotic complex at 3.5 Å in cells. Nat. Methods 18, 186–193 (2021).

  28. Chen, M. et al. A complete data processing workflow for cryo-ET and subtomogram averaging. Nat. Methods 16, 1161–1168 (2019).

  29. Himes, B. A. & Zhang, P. emClarity: software for high-resolution cryo-electron tomography and subtomogram averaging. Nat. Methods 15, 955–961 (2018).

  30. Jiang, W. et al. A transformation clustering algorithm and its application in polyribosomes structural profiling. Nucleic Acids Res. 50, 9001–9011 (2022).

  31. Cheng, J., Wu, C., Li, J., Yang, Q. & Zhang, X. Visualizing translating dynamics in situ at high spatial and temporal resolution in eukaryotic cells. Preprint at bioRxiv https://doi.org/10.1101/2023.07.12.548775 (2023).

  32. Fedry, J. et al. Visualization of translation reorganization upon persistent collision stress in mammalian cells. Preprint at bioRxiv https://doi.org/10.1101/2023.03.23.533914 (2023).

  33. Harastani, M., Eltsov, M., Leforestier, A. & Jonic, S. TomoFlow: analysis of continuous conformational variability of macromolecules in cryogenic subtomograms based on 3D dense optical flow. J. Mol. Biol. 434, 167381 (2022).

  34. Harastani, M., Eltsov, M., Leforestier, A. & Jonic, S. HEMNMA-3D: cryo electron tomography method based on normal mode analysis to study continuous conformational variability of macromolecular complexes. Front. Mol. Biosci. 8, 663121 (2021).

  35. Stolken, M. et al. Maximum likelihood based classification of electron tomographic data. J. Struct. Biol. 173, 77–85 (2011).

  36. Bartesaghi, A., Lecumberry, F., Sapiro, G. & Subramaniam, S. Protein secondary structure determination by constrained single-particle cryo-electron tomography. Structure 20, 2003–2013 (2012).

  37. Balyschew, N. et al. Streamlined structure determination by cryo-electron tomography and subtomogram averaging using TomoBEAR. Nat. Commun. 14, 6543 (2023).

  38. Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. Preprint at arxiv.org/abs/1312.6114 (2013).

  39. Zhong, E. D., Bepler, T., Davis, J. H. & Berger, B. Reconstructing continuous distributions of 3D protein structure from cryo-EM images. Preprint at arxiv.org/abs/1909.05215 (2019).

  40. Bepler, T., Zhong, E., Kelley, K., Brignole, E. & Berger, B. Explicitly disentangling image content from translation and rotation with spatial-VAE. In Advances in Neural Information Processing Systems (NeurIPS, 2019).

  41. Higgins, I. et al. β-VAE: learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations (ICLR, 2016).

  42. Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 37, 38–44 (2018).

  43. Grant, T. & Grigorieff, N. Measuring the optimal exposure for single particle cryo-EM using a 2.6 Å reconstruction of rotavirus VP6. eLife 4, e06980 (2015).

  44. Bharat, T. A. M., Russo, C. J., Lowe, J., Passmore, L. A. & Scheres, S. H. W. Advances in single-particle electron cryomicroscopy structure determination applied to sub-tomogram averaging. Structure 23, 1743–1753 (2015).

  45. Hayward, S. B. & Glaeser, R. M. Radiation damage of purple membrane at low temperature. Ultramicroscopy 4, 201–210 (1979).

  46. Glaeser, R. M. Prospects for extending the resolution limit of the electron microscope. J. Microsc. 117, 77–91 (1979).

  47. Baxter, W. T., Grassucci, R. A., Gao, H. & Frank, J. Determination of signal-to-noise ratios and spectral SNRs in cryo-EM low-dose imaging of molecules. J. Struct. Biol. 166, 126–132 (2009).

  48. Davis, J. H. et al. Modular assembly of the bacterial large ribosomal subunit. Cell 167, 1610–1622 (2016).

  49. Davis, J. H. & Williamson, J. R. Structure and dynamics of bacterial ribosome biogenesis. Philos. Trans. R. Soc. Lond. B Biol. Sci. 372, 20160181 (2017).

  50. Guo, H. & Rubinstein, J. L. Structure of ATP synthase under strain during catalysis. Nat. Commun. 13, 2232 (2022).

  51. Schur, F. K. et al. An atomic model of HIV-1 capsid–SP1 reveals structures regulating assembly and maturation. Science 353, 506–508 (2016).

  52. Mendonca, L. et al. CryoET structures of immature HIV Gag reveal six-helix bundle. Commun. Biol. 4, 481 (2021).

  53. Stojkovic, V. et al. Assessment of the nucleotide modifications in the high-resolution cryo-electron microscopy structure of the Escherichia coli 50S subunit. Nucleic Acids Res. 48, 2723–2732 (2020).

  54. Fromm, S. A. et al. The translating bacterial ribosome at 1.55 Å resolution generated by cryo-EM imaging services. Nat. Commun. 14, 1095 (2023).

  55. Chen, S. S., Sperling, E., Silverman, J. M., Davis, J. H. & Williamson, J. R. Measuring the dynamics of E. coli ribosome biogenesis using pulse-labeling and quantitative mass spectrometry. Mol. Biosyst. 8, 3325–3334 (2012).

  56. Turk, M. & Baumeister, W. The promise and the challenges of cryo-electron tomography. FEBS Lett. 594, 3243–3261 (2020).

  57. Saito, K. et al. Ribosome collisions induce mRNA cleavage and ribosome rescue in bacteria. Nature 603, 503–508 (2022).

  58. Rangan, R. et al. Deep reconstructing generative networks for visualizing dynamic biomolecules inside cells. Preprint at bioRxiv https://doi.org/10.1101/2023.08.18.553799 (2023).

  59. Vasyliuk, D. et al. Conformational landscape of the yeast SAGA complex as revealed by cryo-EM. Sci. Rep. 12, 12306 (2022).

  60. Sekne, Z., Ghanim, G. E., van Roon, A. M. & Nguyen, T. H. D. Structural basis of human telomerase recruitment by TPP1–POT1. Science 375, 1173–1176 (2022).

  61. Rice, G. et al. TomoTwin: generalized 3D localization of macromolecules in cryo-electron tomograms with structural data mining. Nat. Methods 20, 871–880 (2023).

  62. Tancik, M. et al. Fourier features let networks learn high frequency functions in low dimensional domains. In Advances in Neural Information Processing Systems 7537–7547 (NeurIPS, 2020).

  63. Bracewell, R. N. Strip integration in radio astronomy. Aust. J. Phys. 9, 198–217 (1956).

  64. Moebel, E. et al. Deep learning improves macromolecule identification in 3D cellular cryo-electron tomograms. Nat. Methods 18, 1386–1394 (2021).

  65. Luo, Z., Ni, F., Wang, Q. & Ma, J. OPUS-DSD: deep structural disentanglement for cryo-EM single-particle analysis. Nat. Methods 20, 1729–1738 (2023).

  66. Tegunov, D. & Cramer, P. Real-time cryo-electron microscopy data preprocessing with Warp. Nat. Methods 16, 1146–1152 (2019).

  67. Zheng, S. et al. AreTomo: an integrated software package for automated marker-free, motion-corrected cryo-electron tomographic alignment and reconstruction. J. Struct. Biol. X 6, 100068 (2022).

  68. Burt, A., Gaifas, L., Dendooven, T. & Gutsche, I. A flexible framework for multi-particle refinement in cryo-electron tomography. PLoS Biol. 19, e3001319 (2021).

  69. Hubert, L. & Arabie, P. Comparing partitions. J. Classif. 2, 193–218 (1985).

  70. Afonine, P. V. et al. New tools for the analysis and validation of cryo-EM maps and atomic models. Acta Crystallogr. D Struct. Biol. 74, 814–840 (2018).

  71. Pettersen, E. F. et al. UCSF ChimeraX: structure visualization for researchers, educators, and developers. Protein Sci. 30, 70–82 (2021).

  72. Goddard, T. D. et al. UCSF ChimeraX: meeting modern challenges in visualization and analysis. Protein Sci. 27, 14–25 (2018).

  73. Petrov, A. S. et al. Secondary structures of rRNAs from all three domains of life. PLoS ONE 9, e88222 (2014).

Acknowledgements

We thank L. Kinman and E. Zhong for helpful discussion and the MIT-IBM Satori team and the MIT SuperCloud and Lincoln Laboratory Supercomputing Center for HPC computing resources and support. This work was supported by NIH grants R01-GM144542 (J.H.D.) and 5T32-GM007287 (B.M.P.), NSF-CAREER grant 2046778 (J.H.D.) and awards from the Sloan Foundation (J.H.D.) and the MIT Jameel Clinic (J.H.D.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the paper.

Author information

Contributions

B.M.P. and J.H.D. conceived the work. B.M.P. implemented the tomoDRGN method. B.M.P. and J.H.D. designed experiments. B.M.P. performed and analyzed experiments. B.M.P. and J.H.D. wrote the paper.

Corresponding authors

Correspondence to Barrett M. Powell or Joseph H. Davis.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Arunima Singh, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Efficient model training on a weighted subset of pixels improves reconstruction quality and compute performance.

(a) Graphical overview of the dose filtering scheme (applied upstream of the decoder) and dose and tilt weighting scheme (applied during reconstruction error calculation) for a single representative tilt image. Filtering: the fixed optimal exposure curve is used to determine which spatial frequencies will be considered as a function of dose; the decoder processes only Fourier lattice coordinates within this mask (green lattice circle). Weighting: the squared error of the reconstructed Fourier slice is weighted per-frequency by the exposure-dependent amplitude attenuation curve and per-slice by the cosine of the corresponding stage tilt angle, before backpropagation of the mean squared error (red arrows). (b) Relative weight of each tilt image assigned to a particle’s reconstruction error during model training as a function of spatial frequencies (x-axis), and tilt and dose, which are colored yellow to blue from low-to-high dose and tilt angle, assuming a dose symmetric tilt scheme (Hagen, Wan et al. 2017). Note that dose-filtering is applied upstream of the illustrated reconstruction weights. (c) Map-map FSC of simulated class E large ribosomal subunit volumes (Davis, Tan et al. 2016) compared to tomoDRGN homogeneous network reconstructions in the presence or absence of the weighting or masking schemes at varying box and pixel sizes. (d) Spatial frequencies corresponding to FSC = 0.5 map-map correlation with the ground truth volume plotted against wall time for model training. (e) Final tomoDRGN reconstructed volumes (left and center) and ground truth volumes (right) in the presence or absence of the weighting or masking schemes at box and pixel sizes assessed in panels (c) and (d).
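
The filtering and weighting scheme in panel (a) can be sketched in a few lines. This is a minimal illustration rather than tomoDRGN's actual code: the critical-exposure constants and the 2.51284 optimal-exposure factor are assumed here to be the 300 kV empirical fit of Grant and Grigorieff (2015), and the function names, the flat per-pixel lists and the normalization by pixel count are hypothetical choices made for readability.

```python
import math

# Assumed constants: empirical critical-exposure fit at 300 kV
# (Grant & Grigorieff 2015); not taken from the tomoDRGN source.
A, B, C = 0.245, -1.665, 2.81
OPTIMAL_FACTOR = 2.51284  # optimal exposure ~ 2.51284 x critical exposure


def critical_exposure(freq):
    """Critical exposure (e-/A^2) at spatial frequency freq (1/A, must be > 0)."""
    return A * freq ** B + C


def passes_dose_filter(freq, cumulative_dose):
    """Filtering: keep a frequency only while the dose is below its optimal exposure."""
    return cumulative_dose <= OPTIMAL_FACTOR * critical_exposure(freq)


def pixel_weight(freq, cumulative_dose, tilt_deg):
    """Weighting: exposure-dependent attenuation times cosine of the stage tilt."""
    attenuation = math.exp(-cumulative_dose / (2.0 * critical_exposure(freq)))
    return attenuation * math.cos(math.radians(tilt_deg))


def weighted_mse(pred, target, freqs, cumulative_dose, tilt_deg):
    """Mean weighted squared error over the pixels that pass the dose filter.

    pred and target hold one Fourier coefficient per pixel (real or complex);
    freqs holds the spatial frequency of each pixel.
    """
    total, count = 0.0, 0
    for p, t, f in zip(pred, target, freqs):
        if passes_dose_filter(f, cumulative_dose):
            total += pixel_weight(f, cumulative_dose, tilt_deg) * abs(p - t) ** 2
            count += 1
    return total / count if count else 0.0
```

Under these assumptions, a tilt image collected late in the series at a high tilt angle contributes fewer high-frequency pixels to the decoder and a smaller share of the loss, which is the qualitative behavior plotted in panel (b).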

Extended Data Fig. 2 Random selection of tilts per epoch allows flexible and robust model training for datasets with non-uniform numbers of tilt-images per particle.

(a) Graphical summary of a dataset with non-uniform numbers of tilt images per particle. Here, the minimum number of tilt images for any particle is 3. (b) Corresponding tomoDRGN network architecture for random sampling and ordering of 3 tilt images per particle. (c) Mean per-class volumetric correlation coefficient for identical tomoDRGN models trained on 41 sequentially sampled tilts (top) or 41 randomly sampled tilts (bottom). At 5-epoch intervals, 25 random volumes were generated from each class for correlation coefficient calculation to ground truth ribosome assembly intermediate volumes (classes B-E). Error bars denote standard error of the mean CC. (d) Nine tomoDRGN models with identical architectures were trained with the indicated number of tilts sampled per particle (total available tilts = 41). PCA (left) and UMAP (right) dimensionality reduction of each final epoch’s latent embeddings. Once trained, up to 10 randomly sampled and permuted tilt images for one representative particle from each volume class were embedded using the corresponding pretrained tomoDRGN model and are superimposed as colored points. Note increased dispersion of colored points as number of tilts sampled during training decreased. (e) For each ribosomal large subunit class (B-E), 25 particles were randomly selected and up to 10 subsets of their tilt images were randomly sampled and permuted as in (d). In the heatmap, row indices refer to models trained in (d) using different numbers of sampled tilts (1-41), and columns denote epochs of training with that model. For each particle, each tilt subset was evaluated with the corresponding tomoDRGN model and the ratio of standard deviations of each particle’s 10 latent embeddings to all particles’ latent embeddings was calculated. The mean ratio across all particles, which measures the dispersion of encoder embeddings, is plotted per ribosomal LSU class. Here, lower dispersion indicates better performance. (f) Particles and tilt subsets were selected as in (e). At each indicated epoch of training, the corresponding tomoDRGN model was used to generate volumes for each particle’s tilt subsets. For each such volume, the correlation coefficient was calculated between that volume and the corresponding ground truth volume. The mean across all particles at each epoch for each model is shown as a heatmap per ribosomal LSU class. Here, higher CC indicates improved performance.
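
The per-epoch random selection and ordering of tilt images described in panels (a) and (b) can be illustrated as follows. This is a sketch under stated assumptions, not the tomoDRGN data loader: the function names and the dictionary layout mapping particle identifiers to tilt indices are hypothetical, and the number of tilts sampled is assumed to be capped at the minimum available for any particle, as in panel (a).

```python
import random


def sample_tilts(tilt_indices, n_sample, rng):
    """Draw n_sample distinct tilt indices in random order for one particle."""
    tilt_indices = list(tilt_indices)
    if n_sample > len(tilt_indices):
        raise ValueError("cannot sample more tilts than the particle has")
    # random.Random.sample draws without replacement and returns the
    # selection in random order, so sampling and permutation happen together.
    return rng.sample(tilt_indices, n_sample)


def sample_epoch(particle_tilts, n_sample, seed=None):
    """Resample tilts for every particle; call once per training epoch."""
    rng = random.Random(seed)
    return {
        particle_id: sample_tilts(tilts, n_sample, rng)
        for particle_id, tilts in particle_tilts.items()
    }
```

For a dataset in which particles have 41, 10 and 3 tilt images, calling `sample_epoch(particle_tilts, 3)` at each epoch yields a fixed-size input for every particle while exposing the network to different tilt subsets and orderings over the course of training.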

Extended Data Fig. 3 TomoDRGN and MAVEn identify structural variations within HIV Gag lattice.

(a) Mask used for MAVEn-based occupancy analysis of NC layer density (gray, translucent). PDB: 5L93 is shown for reference, with CA-NTD colored salmon, CA-CTD colored green, and CA-SP1 helix colored purple. (b) Histogram and kernel density estimate of NC layer occupancy across 500 volumes sampled from the trained tomoDRGN model, excluding junk particles (see Fig. 3g). (c) Representative volumes sampling along the NC occupancy histogram, colored as indicated in (b). Volumes are rendered at constant isosurface and same pose as in (a).

Extended Data Fig. 4 TomoDRGN identifies non-ribosomal particles picked from EMPIAR-10499 tomograms.

(a) Latent UMAP and corresponding sampled volumes from tomoDRGN heterogeneous network training from Fig. 5a. Eight representative non-ribosomal particles identified through manual inspection of k = 100 k-means clustering of latent space are rendered at a constant isosurface and pose. (b) Two tomograms are shown in slice view using Cube (https://github.com/dtegunov/cube) with locations of particles labeled as non-ribosomal annotated within each tomogram. (c) RELION3-based multiclass (k = 5) ab initio sub-tomogram volume generation using particles annotated as non-ribosomal via tomoDRGN (n = 1,310).

Extended Data Fig. 5 TomoDRGN visualizes structurally heterogeneous disomes.

(a) An EMPIAR-10499 tomogram reconstructed with tomoDRGN intermolecular volumes. Volumes were generated for each ribosome using the trained intermolecular tomoDRGN model, colored as in Fig. 6a, and positioned correspondingly in the source tomogram. Transparent ribosomes correspond to free 50S and 70S ribosomes as annotated in Fig. 6a. (b) The same tomogram as in panel (a) reconstructed with tomoDRGN intramolecular volumes. Volumes were generated for each ribosome using the trained intramolecular tomoDRGN model (Fig. 5d). Pairs of volumes that were colored as disomes or trisomes and that exhibited mutually overlapping main and adjacent monosomes when mapped back to the tomogram in panel (a) were combined in ChimeraX (n = 21 disomes). Disomes are colored by manual classification into three classes with representative volumes indicated with asterisks and shown in panels (c-e). (c) A representative tightly packed disome exhibiting continuous mRNA density between the two monosomes (n = 7 disomes). Density of each monosome fit by the indicated atomic model, excluding tRNA, mRNA, and elongation factors, has been removed using ChimeraX’s zone functionality (Inset). (d) A representative loosely packed disome exhibiting continuous mRNA density between the two monosomes (n = 9 disomes). Inset as in panel (c). (e) A representative ribosome pair with no apparent structural contact between the two monosomes (n = 5 disomes).

Extended Data Fig. 6 Comparison of tomoDRGN-generated volumes to traditional sub-tomogram averaged volumes.

Comparison of volumes generated by a full tomoDRGN network (row 1), an isolated decoder neural network (row 2), or traditional sub-tomogram averaging (row 3). A full tomoDRGN network was trained on the heterogeneous ribosomal particle stack (row 1, n = 20,981, see Figs. 5d and 6a) and representative volumes are depicted. Separate tomoDRGN homogeneous decoder networks were trained on one of three homogeneous substacks corresponding to (a) 70S particles (n = 20,129); (b) 50S particles (n = 852); or (c) SecDF-positive ribosomes (n = 380). Traditional STA was also performed on each of these three particle stacks.

Extended Data Fig. 7 CryoDRGN fails to consistently encode structural heterogeneity using a simulated tilt series dataset.

(a) Schematic of two cryoDRGN network architectures that were tested, and the tomoDRGN architecture used in Fig. 2c–e. Each model was trained using the same simulated dataset of ribosome large subunit assembly classes B-E (Davis, Tan et al. 2016) consisting of 41 tilt images for each of 5,000 particles for each of the four assembly states and thus the dataset was treated by cryoDRGN as n = 820,000 images (see Methods). (b) UMAP of final epoch latent embeddings of each particle image, with kernel density estimates independently estimated and plotted for each of the four ground truth assembly states. (c) UMAP of final epoch latent embedding with k = 4 k-means latent classification of the resulting latent space. KDEs were independently estimated and plotted for each of the four k-means classes. The predicted labels are annotated by both the k-means class index (0-3) and corresponding ground truth class label (B-E) of the central particle within each k-means class. (d) Confusion matrix of ground truth class labels versus k = 4 k-means latent classification. (e) Volumes sampled at the k = 4 k-means cluster centers illustrated in (c). Volumes are annotated by the k-means class index and ground truth class label and colored by the ground truth class label. (f) Violin plot of consistency of k = 4 k-means clustering of each model by Adjusted Rand Index (Hubert and Arabie 1985) (n = 100 randomly seeded initializations, higher values correspond to greater fidelity to ground truth classification).
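
The Adjusted Rand Index used in panel (f) scores agreement between two labelings independently of how the cluster indices are named. The sketch below follows the standard Hubert and Arabie pair-counting formula; it is an illustrative standalone implementation, not the code used in the study, and the function name is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have equal length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: the labelings trivially agree

    # Count item pairs that share a cell of the contingency table,
    # a ground truth class, or a predicted cluster.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    maximum = 0.5 * (sum_true + sum_pred)
    if maximum == expected:
        return 1.0  # both labelings are trivial partitions
    return (sum_cells - expected) / (maximum - expected)
```

A score of 1 means the k-means classes reproduce the ground truth classes exactly (up to relabeling), values near 0 indicate chance-level agreement, and negative values indicate agreement worse than chance.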

Extended Data Fig. 8 CryoDRGN learns errant structural heterogeneity in an exemplar tomographic dataset.

Two cryoDRGN models (a, b) were trained on the unfiltered particle stack of Mycoplasma pneumoniae ribosomes from Fig. 5a (n = 22,291 particles, treated as n = 913,931 images). The latent space is shown as a KDE plot following UMAP dimensionality reduction, with k = 20 k-means class center particles annotated (left) and corresponding volumes visualized (right). Note that many putative 70S particles lack density in the particle core. A reference 70S volume sampled from tomoDRGN’s model in Fig. 5a is shown in the same pose for comparison.

Extended Data Fig. 9 CryoDRGN’s learned latent space embeddings exhibit undesirable correlations with tilt image index.

(a) Two cryoDRGN models were tested on the unfiltered particle stack of Mycoplasma pneumoniae ribosomes from Fig. 5a. The latent space is shown as a KDE plot following UMAP dimensionality reduction. The latent embeddings were binned by the tilt image index, and the median value across each bin is annotated. (b) KDEs from panel (a) replotted after binning by tilt image index quartiles. (c) KDEs from panel (a) with annotated positions corresponding to three representative particles evaluated using their 5th, 15th, 25th, or 35th tilt images. (d) Volumes generated from cryoDRGN using the latent embeddings highlighted in panel (c).

Extended Data Fig. 10 Assessment of tomoDRGN sensitivity to pose accuracy.

(a) The unfiltered stack of EMPIAR-10499 ribosomes in situ from Fig. 5a was used to train a series of tomoDRGN decoder-only models with increasing levels of random perturbations from STA-derived, ‘ground truth’ rotation and translation poses (see Methods). The resulting map-map FSC curves against the STA ribosomal reconstruction are shown. (b) Final tomoDRGN decoder-only reconstructed volumes corresponding to the FSC curves shown in (a). Volumes are lowpass filtered to the resolution where their map-map FSC to the STA ribosomal reconstruction crossed 0.5. (c, d, e) UMAP of first 128 principal components of volume ensembles consisting of volumes generated for every particle, using tomoDRGN models trained on EMPIAR-10499 unfiltered ribosome stacks with indicated levels of pose perturbation. Particles annotated as 70S, 50S, and NR are colored as in Fig. 5c, with representative volumes of each class shown below. Note that NR particles are expected to be structurally diverse.
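
One way to generate the kind of random pose perturbation described in panel (a) is sketched below. The exact perturbation protocol is given in Methods and is not reproduced here, so every detail of this example is an assumption: the Gaussian noise model, the rotation about a uniformly random axis, the parameter names and the units are hypothetical and chosen only to illustrate the idea.

```python
import math
import random


def rotation_about_axis(axis, angle_rad):
    """Rotation matrix about a unit axis (Rodrigues' formula)."""
    x, y, z = axis
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    t = 1.0 - c
    return [
        [c + x * x * t, x * y * t - z * s, x * z * t + y * s],
        [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
        [z * x * t - y * s, z * y * t + x * s, c + z * z * t],
    ]


def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def random_unit_vector(rng):
    """Uniformly distributed direction obtained by normalizing a Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]


def perturb_pose(rotation, translation, rot_sigma_deg, trans_sigma, rng):
    """Apply a small random rotation and shift to a reference pose.

    Assumed noise model: rotation angle ~ N(0, rot_sigma_deg) about a random
    axis, and each translation component shifted by N(0, trans_sigma).
    """
    angle = math.radians(rng.gauss(0.0, rot_sigma_deg))
    delta = rotation_about_axis(random_unit_vector(rng), angle)
    new_rotation = matmul3(delta, rotation)
    new_translation = [t + rng.gauss(0.0, trans_sigma) for t in translation]
    return new_rotation, new_translation
```

Increasing `rot_sigma_deg` and `trans_sigma` in such a scheme degrades the poses progressively, which is the kind of series against which the FSC curves in panel (a) would be compared.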

Source data

Supplementary information

Supplementary Information

Supplementary Figs. 1 and 2 and Tables 1–4.

Reporting Summary

Peer Review File

Supplementary Video 1

Continuous conformational heterogeneity recapitulated by tomoDRGN-generated volumes. Volumes corresponding to each purple circle in Fig. 2g were generated using the tomoDRGN model trained on yeast ATP synthase simulated data and are visualized sequentially down the pink-to-purple gradient from Fig. 2g in ChimeraX.

Supplementary Video 2

Structural heterogeneity in the large ribosomal subunit. Volumes were generated using the tomoDRGN model in Fig. 5d from sampling via k = 100 k-means clustering of latent space. Density for the 30S subunit was removed using the volume zone tool in ChimeraX, guided by atomic model PDB 7PHB, to reveal distinct conformational and compositional states of the LSU. Note conformational and compositional heterogeneity in tRNA- and elongation factor-binding sites, which are found along the midline of the particle.

Supplementary Video 3

Membrane-associated ribosomes exhibit diverse membrane-contact angles. Volumes were generated for all particles used to train the model in Fig. 6e. The tertile of volumes with the highest SecDF occupancy is displayed, ordered by increasing occupancy (n = 162). Note substantial dynamics in the orientation of the membrane relative to the associated ribosome.

Supplementary Video 4

TomoDRGN-annotated disomes from tomogram 256. TomoDRGN-annotated disomes are as described and colored in Extended Data Fig. 5b. Disomes are aligned on the 3′ monosome.

Supplementary Data 1

Supporting data for Supplementary Fig. 1.

Supplementary Data 2

Supporting data for Supplementary Fig. 2.

Source data

Source Data Fig. 2

Numerical source data for Fig. 2.

Source Data Fig. 3

Numerical source data for Fig. 3.

Source Data Fig. 4

Numerical source data for Fig. 4.

Source Data Fig. 5

Numerical source data for Fig. 5.

Source Data Fig. 6

Numerical source data for Fig. 6.

Source Data Extended Data Fig. 1

Numerical source data for Extended Data Fig. 1.

Source Data Extended Data Fig. 2

Numerical source data for Extended Data Fig. 2.

Source Data Extended Data Fig. 3

Numerical source data for Extended Data Fig. 3.

Source Data Extended Data Fig. 4

Numerical source data for Extended Data Fig. 4.

Source Data Extended Data Fig. 7

Numerical source data for Extended Data Fig. 7.

Source Data Extended Data Fig. 8

Numerical source data for Extended Data Fig. 8.

Source Data Extended Data Fig. 9

Numerical source data for Extended Data Fig. 9.

Source Data Extended Data Fig. 10

Numerical source data for Extended Data Fig. 10.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Powell, B.M., Davis, J.H. Learning structural heterogeneity from cryo-electron sub-tomograms with tomoDRGN. Nat Methods (2024). https://doi.org/10.1038/s41592-024-02210-z

  • DOI: https://doi.org/10.1038/s41592-024-02210-z
