Learning structural heterogeneity from cryo-electron sub-tomograms with tomoDRGN

Powell, Barrett M.; Davis, Joseph H.

doi:10.1038/s41592-024-02210-z

Article
Published: 08 March 2024

Nature Methods (2024)

Abstract

Cryo-electron tomography (cryo-ET) enables observation of macromolecular complexes in their native, spatially contextualized cellular environment. Cryo-ET processing software to visualize such complexes at nanometer resolution via iterative alignment and averaging is well developed but relies upon assumptions of structural homogeneity among the complexes of interest. Recently developed tools allow for some assessment of structural diversity but have limited capacity to represent highly heterogeneous structures, including those undergoing continuous conformational changes. Here we extend the highly expressive cryoDRGN (Deep Reconstructing Generative Networks) deep learning architecture, originally created for single-particle cryo-electron microscopy analysis, to cryo-ET. Our new tool, tomoDRGN, learns a continuous low-dimensional representation of structural heterogeneity in cryo-ET datasets while also learning to reconstruct heterogeneous structural ensembles supported by the underlying data. Using simulated and experimental data, we describe and benchmark architectural choices within tomoDRGN that are uniquely necessitated and enabled by cryo-ET. We additionally illustrate tomoDRGN’s efficacy in analyzing diverse datasets, using it to reveal high-level organization of human immunodeficiency virus (HIV) capsid complexes assembled in virus-like particles and to resolve extensive structural heterogeneity among ribosomes imaged in situ.
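
The abstract describes learning a continuous low-dimensional latent representation of each particle. Like cryoDRGN, tomoDRGN does this within a variational autoencoder (VAE; Kingma & Welling in the reference list). The NumPy sketch below is a generic illustration of two VAE ingredients, the reparameterization trick and the KL penalty toward a standard-normal prior. It is not tomoDRGN's code; the function names and the optional `beta` weight (after the β-VAE of Higgins et al.) are assumptions for illustration.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (the reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dims, averaged over the batch."""
    kl = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=-1)
    return float(np.mean(kl))

def vae_loss(recon_error, mu, logvar, beta=1.0):
    """Reconstruction error plus a beta-weighted KL term (beta = 1 is the standard VAE)."""
    return recon_error + beta * kl_to_standard_normal(mu, logvar)
```

In a full model, `mu` and `logvar` would come from an encoder network and `recon_error` from comparing decoded Fourier slices with the observed tilt images.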

**Fig. 1: A neural network architecture to analyze structurally heterogeneous particles imaged by cryo-ET.**

**Fig. 2: TomoDRGN recovers compositional and conformational heterogeneity in simulated datasets.**

**Fig. 3: TomoDRGN finds residual heterogeneity within primarily homogeneous purified particles.**

**Fig. 4: TomoDRGN resolves high-resolution features from sub-tomograms collected in situ.**

**Fig. 5: TomoDRGN uncovers structural heterogeneity in ribosomes imaged in situ.**

**Fig. 6: TomoDRGN captures intermolecular heterogeneity in situ.**

Data availability

Extracted particle sub-tomograms from reprocessing of EMPIAR-10499 have been deposited under EMPIAR-11843. Requisite EMDB volumes and PDB models to generate synthetic data using cryoSRPNT as described in Methods are deposited at https://zenodo.org/doi/10.5281/zenodo.10076628. The trained models, latent embeddings and particle classifications used to analyze all datasets presented have been deposited at https://zenodo.org/doi/10.5281/zenodo.10076628 for simulated datasets and at https://zenodo.org/doi/10.5281/zenodo.10093310 for experimental datasets. Maps corresponding to C₁ holoferritin and C₁ apoferritin from EMPIAR-10491 generated in M have been deposited under EMD-43285 and EMD-43286. The map of the SecDF-associated 70S ribosome from EMPIAR-10499 generated in RELION has been deposited under EMD-43287. Source data are provided with this paper.

Code availability

TomoDRGN source code, installation instructions and example usage are available at https://github.com/bpowell122/tomodrgn. Version 0.2.2 was used in this study. Scripts used to generate simulated data are available at https://github.com/bpowell122/cryoSRPNT. Version 0.1.0 was used in this study.

References

Bai, X. C., McMullan, G. & Scheres, S. H. How cryo-EM is revolutionizing structural biology. Trends Biochem. Sci. 40, 49–57 (2015).
Murata, K. & Wolf, M. Cryo-electron microscopy for structural analysis of dynamic biological macromolecules. Biochim. Biophys. Acta Gen. Subj. 1862, 324–334 (2018).
Cheng, Y., Grigorieff, N., Penczek, P. A. & Walz, T. A primer to single-particle cryo-electron microscopy. Cell 161, 438–449 (2015).
Zhong, E. D., Bepler, T., Berger, B. & Davis, J. H. CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks. Nat. Methods 18, 176–185 (2021).
Punjani, A. & Fleet, D. J. 3D variability analysis: resolving continuous flexibility and discrete heterogeneity from single particle cryo-EM. J. Struct. Biol. 213, 107702 (2021).
Chen, M. & Ludtke, S. J. Deep learning-based mixed-dimensional Gaussian mixture model for characterizing variability in cryo-EM. Nat. Methods 18, 930–936 (2021).
Dashti, A. et al. Retrieving functional pathways of biomolecules from single-particle snapshots. Nat. Commun. 11, 4734 (2020).
Kinman, L. F., Powell, B. M., Zhong, E. D., Berger, B. & Davis, J. H. Uncovering structural ensembles from single-particle cryo-EM data using cryoDRGN. Nat. Protoc. 18, 319–339 (2023).
Sun, J., Kinman, L. F., Jahagirdar, D., Ortega, J. & Davis, J. H. KsgA facilitates ribosomal small subunit maturation by proofreading a key structural lesion. Nat. Struct. Mol. Biol. 30, 1468–1480 (2023).
Asano, S., Engel, B. D. & Baumeister, W. In situ cryo-electron tomography: a post-reductionist approach to structural biology. J. Mol. Biol. 428, 332–343 (2016).
Lovatt, M., Leistner, C. & Frank, R. A. W. Bridging length scales from molecules to the whole organism by cryoCLEM and cryoET. Faraday Discuss. 240, 114–126 (2022).
Xue, L. et al. Visualizing translation dynamics at atomic detail inside a bacterial cell. Nature 610, 205–211 (2022).
Gemmer, M. et al. Visualization of translation and protein biogenesis at the ER membrane. Nature 614, 160–167 (2023).
Hoffmann, P. C. et al. Structures of the eukaryotic ribosome and its translational states in situ. Nat. Commun. 13, 7435 (2022).
Zhang, P. Advances in cryo-electron tomography and subtomogram averaging and classification. Curr. Opin. Struct. Biol. 58, 249–258 (2019).
Castano-Diez, D. & Zanetti, G. In situ structure determination by subtomogram averaging. Curr. Opin. Struct. Biol. 58, 68–75 (2019).
Bharat, T. A. & Scheres, S. H. Resolving macromolecular structures from electron cryo-tomography data using subtomogram averaging in RELION. Nat. Protoc. 11, 2054–2065 (2016).
Pyle, E. & Zanetti, G. Current data processing strategies for cryo-electron tomography and subtomogram averaging. Biochem. J. 478, 1827–1845 (2021).
Castano-Diez, D., Kudryashev, M., Arheit, M. & Stahlberg, H. Dynamo: a flexible, user-friendly development tool for subtomogram averaging of cryo-EM data in high-performance computing environments. J. Struct. Biol. 178, 139–151 (2012).
Hrabe, T. et al. PyTom: a Python-based toolbox for localization of macromolecules in cryo-electron tomograms and subtomogram analysis. J. Struct. Biol. 178, 177–188 (2012).
Nickell, S. et al. TOM software toolbox: acquisition and analysis for electron tomography. J. Struct. Biol. 149, 227–234 (2005).
Scheres, S. H. W., Melero, R., Valle, M. & Carazo, J. M. Averaging of electron subtomograms and random conical tilt reconstructions through likelihood optimization. Structure 17, 1563–1572 (2009).
Winkler, H. et al. Tomographic subvolume alignment and subvolume classification applied to myosin V and SIV envelope spikes. J. Struct. Biol. 165, 64–77 (2009).
Bartesaghi, A. et al. Classification and 3D averaging with missing wedge correction in biological electron tomography. J. Struct. Biol. 162, 436–450 (2008).
Walz, J. et al. Electron tomography of single ice-embedded macromolecules: three-dimensional alignment and classification. J. Struct. Biol. 120, 387–395 (1997).
Zivanov, J. et al. A Bayesian approach to single-particle electron cryo-tomography in RELION-4.0. eLife 11, e83724 (2022).
Tegunov, D., Xue, L., Dienemann, C., Cramer, P. & Mahamid, J. Multi-particle cryo-EM refinement with M visualizes ribosome–antibiotic complex at 3.5 Å in cells. Nat. Methods 18, 186–193 (2021).
Chen, M. et al. A complete data processing workflow for cryo-ET and subtomogram averaging. Nat. Methods 16, 1161–1168 (2019).
Himes, B. A. & Zhang, P. emClarity: software for high-resolution cryo-electron tomography and subtomogram averaging. Nat. Methods 15, 955–961 (2018).
Jiang, W. et al. A transformation clustering algorithm and its application in polyribosomes structural profiling. Nucleic Acids Res. 50, 9001–9011 (2022).
Cheng, J., Wu, C., Li, J., Yang, Q. & Zhang, X. Visualizing translating dynamics in situ at high spatial and temporal resolution in eukaryotic cells. Preprint at bioRxiv https://doi.org/10.1101/2023.07.12.548775 (2023).
Fedry, J. et al. Visualization of translation reorganization upon persistent collision stress in mammalian cells. Preprint at bioRxiv https://doi.org/10.1101/2023.03.23.533914 (2023).
Harastani, M., Eltsov, M., Leforestier, A. & Jonic, S. TomoFlow: analysis of continuous conformational variability of macromolecules in cryogenic subtomograms based on 3D dense optical flow. J. Mol. Biol. 434, 167381 (2022).
Harastani, M., Eltsov, M., Leforestier, A. & Jonic, S. HEMNMA-3D: cryo electron tomography method based on normal mode analysis to study continuous conformational variability of macromolecular complexes. Front. Mol. Biosci. 8, 663121 (2021).
Stolken, M. et al. Maximum likelihood based classification of electron tomographic data. J. Struct. Biol. 173, 77–85 (2011).
Bartesaghi, A., Lecumberry, F., Sapiro, G. & Subramaniam, S. Protein secondary structure determination by constrained single-particle cryo-electron tomography. Structure 20, 2003–2013 (2012).
Balyschew, N. et al. Streamlined structure determination by cryo-electron tomography and subtomogram averaging using TomoBEAR. Nat. Commun. 14, 6543 (2023).
Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. Preprint at arxiv.org/abs/1312.6114 (2013).
Zhong, E. D., Bepler, T., Davis, J. H. & Berger, B. Reconstructing continuous distributions of 3D protein structure from cryo-EM images. Preprint at arxiv.org/abs/1909.05215 (2019).
Bepler, T., Zhong, E., Kelley, K., Brignole, E. & Berger, B. Explicitly disentangling image content from translation and rotation with spatial-VAE. In Advances in Neural Information Processing Systems (NeurIPS, 2019).
Higgins, I. et al. β-VAE: learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations (ICLR, 2016).
Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 37, 38–44 (2018).
Grant, T. & Grigorieff, N. Measuring the optimal exposure for single particle cryo-EM using a 2.6 Å reconstruction of rotavirus VP6. eLife 4, e06980 (2015).
Bharat, T. A. M., Russo, C. J., Lowe, J., Passmore, L. A. & Scheres, S. H. W. Advances in single-particle electron cryomicroscopy structure determination applied to sub-tomogram averaging. Structure 23, 1743–1753 (2015).
Hayward, S. B. & Glaeser, R. M. Radiation damage of purple membrane at low temperature. Ultramicroscopy 4, 201–210 (1979).
Glaeser, R. M. Prospects for extending the resolution limit of the electron microscope. J. Microsc. 117, 77–91 (1979).
Baxter, W. T., Grassucci, R. A., Gao, H. & Frank, J. Determination of signal-to-noise ratios and spectral SNRs in cryo-EM low-dose imaging of molecules. J. Struct. Biol. 166, 126–132 (2009).
Davis, J. H. et al. Modular assembly of the bacterial large ribosomal subunit. Cell 167, 1610–1622 (2016).
Davis, J. H. & Williamson, J. R. Structure and dynamics of bacterial ribosome biogenesis. Philos. Trans. R. Soc. Lond. B Biol. Sci. 372, 20160181 (2017).
Guo, H. & Rubinstein, J. L. Structure of ATP synthase under strain during catalysis. Nat. Commun. 13, 2232 (2022).
Schur, F. K. et al. An atomic model of HIV-1 capsid–SP1 reveals structures regulating assembly and maturation. Science 353, 506–508 (2016).
Mendonca, L. et al. CryoET structures of immature HIV Gag reveal six-helix bundle. Commun. Biol. 4, 481 (2021).
Stojkovic, V. et al. Assessment of the nucleotide modifications in the high-resolution cryo-electron microscopy structure of the Escherichia coli 50S subunit. Nucleic Acids Res. 48, 2723–2732 (2020).
Fromm, S. A. et al. The translating bacterial ribosome at 1.55 Å resolution generated by cryo-EM imaging services. Nat. Commun. 14, 1095 (2023).
Chen, S. S., Sperling, E., Silverman, J. M., Davis, J. H. & Williamson, J. R. Measuring the dynamics of E. coli ribosome biogenesis using pulse-labeling and quantitative mass spectrometry. Mol. Biosyst. 8, 3325–3334 (2012).
Turk, M. & Baumeister, W. The promise and the challenges of cryo-electron tomography. FEBS Lett. 594, 3243–3261 (2020).
Saito, K. et al. Ribosome collisions induce mRNA cleavage and ribosome rescue in bacteria. Nature 603, 503–508 (2022).
Rangan, R. et al. Deep reconstructing generative networks for visualizing dynamic biomolecules inside cells. Preprint at bioRxiv https://doi.org/10.1101/2023.08.18.553799 (2023).
Vasyliuk, D. et al. Conformational landscape of the yeast SAGA complex as revealed by cryo-EM. Sci. Rep. 12, 12306 (2022).
Sekne, Z., Ghanim, G. E., van Roon, A. M. & Nguyen, T. H. D. Structural basis of human telomerase recruitment by TPP1–POT1. Science 375, 1173–1176 (2022).
Rice, G. et al. TomoTwin: generalized 3D localization of macromolecules in cryo-electron tomograms with structural data mining. Nat. Methods 20, 871–880 (2023).
Tancik, M. et al. Fourier features let networks learn high frequency functions in low dimensional domains. In Advances in Neural Information Processing Systems 7537–7547 (NeurIPS, 2020).
Bracewell, R. N. Strip integration in radio astronomy. Aust. J. Phys. 9, 198–217 (1956).
Moebel, E. et al. Deep learning improves macromolecule identification in 3D cellular cryo-electron tomograms. Nat. Methods 18, 1386–1394 (2021).
Luo, Z., Ni, F., Wang, Q. & Ma, J. OPUS-DSD: deep structural disentanglement for cryo-EM single-particle analysis. Nat. Methods 20, 1729–1738 (2023).
Tegunov, D. & Cramer, P. Real-time cryo-electron microscopy data preprocessing with Warp. Nat. Methods 16, 1146–1152 (2019).
Zheng, S. et al. AreTomo: an integrated software package for automated marker-free, motion-corrected cryo-electron tomographic alignment and reconstruction. J. Struct. Biol. X 6, 100068 (2022).
Burt, A., Gaifas, L., Dendooven, T. & Gutsche, I. A flexible framework for multi-particle refinement in cryo-electron tomography. PLoS Biol. 19, e3001319 (2021).
Hubert, L. & Arabie, P. Comparing partitions. J. Classif. 2, 193–218 (1985).
Afonine, P. V. et al. New tools for the analysis and validation of cryo-EM maps and atomic models. Acta Crystallogr. D Struct. Biol. 74, 814–840 (2018).
Pettersen, E. F. et al. UCSF ChimeraX: structure visualization for researchers, educators, and developers. Protein Sci. 30, 70–82 (2021).
Goddard, T. D. et al. UCSF ChimeraX: meeting modern challenges in visualization and analysis. Protein Sci. 27, 14–25 (2018).
Petrov, A. S. et al. Secondary structures of rRNAs from all three domains of life. PLoS ONE 9, e88222 (2014).

Acknowledgements

We thank L. Kinman and E. Zhong for helpful discussion and the MIT-IBM Satori team and the MIT SuperCloud and Lincoln Laboratory Supercomputing Center for HPC computing resources and support. This work was supported by NIH grants R01-GM144542 (J.H.D.) and 5T32-GM007287 (B.M.P.), NSF-CAREER grant 2046778 (J.H.D.) and awards from the Sloan Foundation (J.H.D.) and the MIT Jameel Clinic (J.H.D.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the paper.

Author information

Authors and Affiliations

Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
Barrett M. Powell & Joseph H. Davis
Program in Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
Joseph H. Davis

Contributions

B.M.P. and J.H.D. conceived the work. B.M.P. implemented the tomoDRGN method. B.M.P. and J.H.D. designed experiments. B.M.P. performed and analyzed experiments. B.M.P. and J.H.D. wrote the paper.

Corresponding authors

Correspondence to Barrett M. Powell or Joseph H. Davis.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Arunima Singh, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Efficient model training on a weighted subset of pixels improves reconstruction quality and compute performance.

(a) Graphical overview of the dose filtering scheme (applied upstream of the decoder) and the dose and tilt weighting scheme (applied during reconstruction error calculation) for a single representative tilt image. Filtering: the fixed optimal exposure curve is used to determine which spatial frequencies will be considered as a function of dose; the decoder processes only Fourier lattice coordinates within this mask (green lattice circle). Weighting: the squared error of the reconstructed Fourier slice is weighted per frequency by the exposure-dependent amplitude attenuation curve and per slice by the cosine of the corresponding stage tilt angle, before backpropagation of the mean squared error (red arrows). (b) Relative weight assigned to each tilt image's contribution to a particle’s reconstruction error during model training, plotted as a function of spatial frequency (x-axis) and of tilt and dose (colored yellow to blue from low to high dose and tilt angle), assuming a dose-symmetric tilt scheme (Hagen, Wan et al. 2017). Note that dose filtering is applied upstream of the illustrated reconstruction weights. (c) Map-map FSC of simulated class E large ribosomal subunit volumes (Davis, Tan et al. 2016) compared to tomoDRGN homogeneous network reconstructions in the presence or absence of the weighting or masking schemes at varying box and pixel sizes. (d) Spatial frequency at which the map-map FSC with the ground truth volume falls to 0.5, plotted against wall time for model training. (e) Final tomoDRGN reconstructed volumes (left and center) and ground truth volumes (right) in the presence or absence of the weighting or masking schemes at the box and pixel sizes assessed in panels (c) and (d).
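
The filtering and weighting scheme in panel (a) can be sketched per tilt image as below. This is a minimal NumPy sketch assuming the critical-exposure fit of Grant & Grigorieff (2015, in the reference list), Ne(k) = 0.245 k^-1.665 + 2.81 e⁻/Å² with optimal exposure 2.51284 Ne(k), and an amplitude attenuation of exp(-N / 2Ne(k)) at cumulative dose N. The helper names are hypothetical, and tomoDRGN's implementation may differ in detail (for example, in how weights are normalized).

```python
import numpy as np

# Critical-exposure fit of Grant & Grigorieff (2015): Ne(k) = A * k**B + C,
# with k in 1/Angstrom and Ne in e-/Angstrom^2. Assumed constants for this sketch.
A, B, C = 0.245, -1.665, 2.81
OPTIMAL_EXPOSURE_FACTOR = 2.51284  # optimal exposure = factor * critical exposure

def critical_exposure(freq):
    # Clamp k away from zero so the power law does not divide by zero at the origin.
    k = np.maximum(np.asarray(freq, dtype=float), 1e-6)
    return A * k**B + C

def dose_mask(freq, cumulative_dose):
    """Filtering: keep frequencies whose optimal exposure exceeds the accumulated dose."""
    return cumulative_dose <= OPTIMAL_EXPOSURE_FACTOR * critical_exposure(freq)

def dose_tilt_weights(freq, cumulative_dose, tilt_deg):
    """Weighting: exposure-dependent amplitude attenuation times cos(stage tilt)."""
    attenuation = np.exp(-cumulative_dose / (2.0 * critical_exposure(freq)))
    return attenuation * np.cos(np.deg2rad(tilt_deg))

def weighted_mse(pred, target, weights, mask):
    """Mean of the weighted squared error over the masked lattice points only."""
    sq_err = weights * np.abs(pred - target) ** 2
    return float(sq_err[mask].mean())
```

With these assumed constants, at a cumulative dose of 20 e⁻/Å² a frequency of 0.05 Å⁻¹ is retained while 0.3 Å⁻¹ is masked out, and at zero dose a 60° tilt receives half the weight of the untilted image.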

Source data
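
The map-map FSC reported in panels (c) and (d), and its 0.5 crossing, can be computed as in this minimal NumPy sketch. It assumes two unmasked cubic volumes on the same grid and integer-rounded spherical shells; practical FSC calculations usually also apply masks, which are omitted here, and the function names are illustrative.

```python
import numpy as np

def fourier_shell_correlation(vol1, vol2, voxel_size):
    """Map-map FSC between two cubic volumes of equal shape.

    Returns spatial frequencies (1/Angstrom, below Nyquist) and the FSC per shell.
    """
    n = vol1.shape[0]
    f1 = np.fft.fftshift(np.fft.fftn(vol1))
    f2 = np.fft.fftshift(np.fft.fftn(vol2))
    # Integer shell index = rounded distance of each voxel from the centered origin.
    coords = np.arange(n) - n // 2
    zz, yy, xx = np.meshgrid(coords, coords, coords, indexing="ij")
    shell = np.rint(np.sqrt(xx**2 + yy**2 + zz**2)).astype(int).ravel()
    num = np.bincount(shell, weights=(f1 * np.conj(f2)).real.ravel())
    den1 = np.bincount(shell, weights=(np.abs(f1) ** 2).ravel())
    den2 = np.bincount(shell, weights=(np.abs(f2) ** 2).ravel())
    n_shells = n // 2
    fsc = num[:n_shells] / np.sqrt(den1[:n_shells] * den2[:n_shells])
    freqs = np.arange(n_shells) / (n * voxel_size)
    return freqs, fsc

def frequency_at_threshold(freqs, fsc, threshold=0.5):
    """First frequency at which the FSC falls below the threshold (None if it never does)."""
    below = np.nonzero(fsc < threshold)[0]
    return float(freqs[below[0]]) if below.size else None
```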

Extended Data Fig. 2 Random selection of tilts per epoch allows flexible and robust model training for datasets with non-uniform numbers of tilt-images per particle.

(a) Graphical summary of a dataset with non-uniform numbers of tilt images per particle. Here, the minimum number of tilt images for any particle is 3. (b) Corresponding tomoDRGN network architecture for random sampling and ordering of 3 tilt images per particle. (c) Mean per-class volumetric correlation coefficient (CC) for identical tomoDRGN models trained on 41 sequentially sampled tilts (top) or 41 randomly sampled tilts (bottom). At 5-epoch intervals, 25 random volumes were generated from each class, and their correlation coefficients to the ground truth ribosome assembly intermediate volumes (classes B-E) were calculated. Error bars denote the standard error of the mean CC. (d) Nine tomoDRGN models with identical architectures were trained with the indicated number of tilts sampled per particle (total available tilts = 41). PCA (left) and UMAP (right) dimensionality reduction of each final epoch’s latent embeddings. Once trained, up to 10 randomly sampled and permuted tilt images for one representative particle from each volume class were embedded using the corresponding pretrained tomoDRGN model and are superimposed as colored points. Note the increased dispersion of colored points as the number of tilts sampled during training decreases. (e) For each ribosomal large subunit class (B-E), 25 particles were randomly selected and up to 10 subsets of their tilt images were randomly sampled and permuted as in (d). In the heatmap, row indices refer to models trained in (d) using different numbers of sampled tilts (1-41), and columns denote epochs of training with that model. For each particle, each tilt subset was evaluated with the corresponding tomoDRGN model, and the ratio of the standard deviation of each particle’s 10 latent embeddings to that of all particles’ latent embeddings was calculated. The mean ratio across all particles, which measures the dispersion of encoder embeddings, is plotted per ribosomal LSU class. Here, lower dispersion indicates better performance.
(f) Particles and tilt subsets were selected as in (e). At each indicated epoch of training, the corresponding tomoDRGN model was used to generate volumes for each particle’s tilt subsets. For each such volume, the correlation coefficient was calculated between that volume and the corresponding ground truth volume. The mean across all particles at each epoch for each model is shown as a heatmap per ribosomal LSU class. Here, higher CC indicates improved performance.
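
The per-epoch tilt sampling in panels (a, b) and the dispersion metric in panel (e) can be sketched as follows. This sketch assumes that, by default, each particle contributes as many tilts as the particle with the fewest (3 in panel a), drawn without replacement and in random order. `sample_tilt_indices` and `dispersion_ratio` are hypothetical names, and combining per-dimension standard deviations by averaging is an assumption, since the caption does not specify it.

```python
import numpy as np

def sample_tilt_indices(n_tilts_per_particle, n_sample=None, rng=None):
    """Draw n_sample distinct tilt indices per particle, in random order (re-run each epoch).

    By default n_sample is the minimum tilt count across the dataset, so every
    particle contributes the same number of tilt images.
    """
    rng = np.random.default_rng() if rng is None else rng
    if n_sample is None:
        n_sample = min(n_tilts_per_particle)
    return [rng.choice(n, size=n_sample, replace=False) for n in n_tilts_per_particle]

def dispersion_ratio(per_particle_embeddings, all_embeddings):
    """Mean over particles of (spread of one particle's embeddings) / (spread of all embeddings).

    Spread is the per-dimension standard deviation, averaged over latent dimensions
    (an assumption for this sketch).
    """
    global_std = np.std(all_embeddings, axis=0)
    ratios = [np.mean(np.std(emb, axis=0) / global_std) for emb in per_particle_embeddings]
    return float(np.mean(ratios))
```

A ratio near 0 means a particle embeds to nearly the same latent point whichever tilt subset is used, which is the desired behavior.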

Source data

Extended Data Fig. 3 TomoDRGN and MAVEn identify structural variations within HIV Gag lattice.

(a) Mask used for MAVEn-based occupancy analysis of NC layer density (gray, translucent). PDB: 5L93 is shown for reference, with CA-NTD colored salmon, CA-CTD colored green, and CA-SP1 helix colored purple. (b) Histogram and kernel density estimate of NC layer occupancy across 500 volumes sampled from the trained tomoDRGN model, excluding junk particles (see Fig. 3g). (c) Representative volumes sampled along the NC occupancy histogram, colored as indicated in (b). Volumes are rendered at a constant isosurface and in the same pose as in (a).
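
Panels (b) and (c) summarize occupancy of the masked NC-layer region across 500 sampled volumes. The sketch below uses a deliberately simple occupancy proxy: the fraction of above-threshold reference voxels inside the mask that remain above threshold in each sampled volume. This is a hypothetical simplification for illustration, not MAVEn's exact definition, and `threshold` stands in for a fixed isosurface level.

```python
import numpy as np

def masked_occupancy(volume, mask, reference, threshold):
    """Hypothetical occupancy proxy: fraction of the reference's above-threshold
    voxels inside the mask that are also above threshold in `volume`."""
    ref_voxels = mask & (reference > threshold)
    if not ref_voxels.any():
        raise ValueError("reference has no density above threshold inside the mask")
    occupied = np.count_nonzero(volume[ref_voxels] > threshold)
    return occupied / np.count_nonzero(ref_voxels)

def occupancy_distribution(volumes, mask, reference, threshold):
    """Occupancy proxy for every volume in an ensemble (e.g. volumes sampled from a model)."""
    return np.array([masked_occupancy(v, mask, reference, threshold) for v in volumes])
```

The resulting array can then be summarized with `np.histogram` or a kernel density estimate, as in panel (b).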

Source data

Extended Data Fig. 4 TomoDRGN identifies non-ribosomal particles picked from EMPIAR-10499 tomograms.

(a) Latent UMAP and corresponding sampled volumes from tomoDRGN heterogeneous network training from Fig. 5a. Eight representative non-ribosomal particles identified through manual inspection of k = 100 k-means clustering of latent space are rendered at a constant isosurface and pose. (b) Two tomograms are shown in slice view using Cube (https://github.com/dtegunov/cube) with locations of particles labeled as non-ribosomal annotated within each tomogram. (c) RELION3-based multiclass (k = 5) ab initio sub-tomogram volume generation using particles annotated as non-ribosomal via tomoDRGN (n = 1,310).

Source data

Extended Data Fig. 5 TomoDRGN visualizes structurally heterogeneous disomes.

(a) An EMPIAR-10499 tomogram reconstructed with tomoDRGN intermolecular volumes. Volumes were generated for each ribosome using the trained intermolecular tomoDRGN model, colored as in Fig. 6a, and positioned correspondingly in the source tomogram. Transparent ribosomes correspond to free 50S and 70S ribosomes as annotated in Fig. 6a. (b) The same tomogram as in panel (a) reconstructed with tomoDRGN intramolecular volumes. Volumes were generated for each ribosome using the trained intramolecular tomoDRGN model (Fig. 5d). Pairs of volumes that were colored as disomes or trisomes and that exhibited mutually overlapping main and adjacent monosomes when mapped back to the tomogram in panel (a) were combined in ChimeraX (n = 21 disomes). Disomes are colored by manual classification into three classes with representative volumes indicated with asterisks and shown in panels (c-e). (c) A representative tightly packed disome exhibiting continuous mRNA density between the two monosomes (n = 7 disomes). Density of each monosome fit by the indicated atomic model, excluding tRNA, mRNA, and elongation factors, has been removed using ChimeraX’s zone functionality (Inset). (d) A representative loosely packed disome exhibiting continuous mRNA density between the two monosomes (n = 9 disomes). Inset as in panel (c). (e) A representative ribosome pair with no apparent structural contact between the two monosomes (n = 5 disomes).

Extended Data Fig. 6 Comparison of tomoDRGN-generated volumes to traditional sub-tomogram averaged volumes.

Comparison of volumes generated by a full tomoDRGN network (row 1), an isolated decoder neural network (row 2), or traditional sub-tomogram averaging (STA; row 3). A full tomoDRGN network was trained on the heterogeneous ribosomal particle stack (row 1, n = 20,981, see Figs. 5d and 6a) and representative volumes are depicted. Separate homogeneous tomoDRGN decoder networks were each trained on one of three homogeneous substacks corresponding to (a) 70S particles (n = 20,129); (b) 50S particles (n = 852); or (c) SecDF-positive ribosomes (n = 380). Traditional STA was also performed on each of these three particle stacks.

Extended Data Fig. 7 CryoDRGN fails to consistently encode structural heterogeneity using a simulated tilt series dataset.

(a) Schematic of two cryoDRGN network architectures that were tested, and the tomoDRGN architecture used in Fig. 2c–e. Each model was trained using the same simulated dataset of ribosome large subunit assembly classes B-E (Davis, Tan et al. 2016) consisting of 41 tilt images for each of 5,000 particles for each of the four assembly states and thus the dataset was treated by cryoDRGN as n = 820,000 images (see Methods). (b) UMAP of final epoch latent embeddings of each particle image, with kernel density estimates independently estimated and plotted for each of the four ground truth assembly states. (c) UMAP of final epoch latent embedding with k = 4 k-means latent classification of the resulting latent space. KDEs were independently estimated and plotted for each of the four k-means classes. The predicted labels are annotated by both the k-means class index (0-3) and corresponding ground truth class label (B-E) of the central particle within each k-means class. (d) Confusion matrix of ground truth class labels versus k = 4 k-means latent classification. (e) Volumes sampled at the k = 4 k-means cluster centers illustrated in (c). Volumes are annotated by the k-means class index and ground truth class label and colored by the ground truth class label. (f) Violin plot of consistency of k = 4 k-means clustering of each model by Adjusted Rand Index (Hubert and Arabie 1985) (n = 100 randomly seeded initializations, higher values correspond to greater fidelity to ground truth classification).
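
Panel (f) scores k-means labels against the ground truth with the Adjusted Rand Index (ARI) of Hubert and Arabie (1985). A minimal NumPy sketch computed from the contingency table follows; scikit-learn's `sklearn.metrics.adjusted_rand_score` computes the same quantity. The ARI is 1 for identical partitions (up to relabeling) and near 0 for chance-level agreement.

```python
import numpy as np

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index (Hubert & Arabie, 1985) computed from the contingency table."""
    _, t = np.unique(labels_true, return_inverse=True)
    _, p = np.unique(labels_pred, return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(table, (t, p), 1)  # count items in each (true, predicted) pair

    def pairs(x):
        return x * (x - 1) / 2.0  # number of unordered pairs, C(x, 2)

    sum_cells = pairs(table).sum()
    sum_rows = pairs(table.sum(axis=1)).sum()
    sum_cols = pairs(table.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / pairs(table.sum())
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:  # degenerate case, e.g. both partitions are one cluster
        return 1.0
    return float((sum_cells - expected) / (max_index - expected))
```

Repeating k-means with 100 random seeds and scoring each run with this function yields a distribution like the one shown as a violin plot in panel (f).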

Source data

Extended Data Fig. 8 CryoDRGN learns errant structural heterogeneity in an exemplar tomographic dataset.

Two cryoDRGN models (a, b) were trained on the unfiltered particle stack of Mycoplasma pneumoniae ribosomes from Fig. 5a (n = 22,291 particles, treated as n = 913,931 images). The latent space is shown as a KDE plot following UMAP dimensionality reduction, with k = 20 k-means class center particles annotated (left) and corresponding volumes visualized (right). Note that many putative 70S particles lack density in the particle core. A reference 70S volume sampled from tomoDRGN’s model in Fig. 5a is shown in the same pose for comparison.

Source data

Extended Data Fig. 9 CryoDRGN’s learned latent space embeddings exhibit undesirable correlations with tilt image index.

(a) Two cryoDRGN models were tested on the unfiltered particle stack of Mycoplasma pneumoniae ribosomes from Fig. 5a. The latent space is shown as a KDE plot following UMAP dimensionality reduction. The latent embeddings were binned by tilt image index, and the median value of each bin is annotated. (b) KDEs from panel (a) replotted after binning by tilt image index quartiles. (c) KDEs from panel (a) with annotated positions corresponding to three representative particles evaluated using their 5th, 15th, 25th, or 35th tilt images. (d) Volumes generated by cryoDRGN using the latent embeddings highlighted in panel (c).
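
The diagnostic in panels (a) and (b) groups latent embeddings by the tilt index of the image they came from. If the embeddings were independent of imaging geometry, the per-index medians would coincide; systematic drift with tilt index is the undesirable correlation described above. A minimal sketch with hypothetical helper names, assuming 2-D (UMAP) embeddings and integer tilt indices:

```python
import numpy as np

def median_embedding_by_tilt(embeddings, tilt_index):
    """Median embedding for each tilt image index (one point per index)."""
    return {int(t): np.median(embeddings[tilt_index == t], axis=0) for t in np.unique(tilt_index)}

def tilt_index_quartile(tilt_index):
    """Assign each embedding to a tilt-index quartile, labeled 0-3."""
    edges = np.quantile(tilt_index, [0.25, 0.5, 0.75])
    return np.searchsorted(edges, tilt_index, side="right")
```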

Source data

Extended Data Fig. 10 Assessment of tomoDRGN sensitivity to pose accuracy.

(a) The unfiltered stack of EMPIAR-10499 ribosomes in situ from Fig. 5a was used to train a series of tomoDRGN decoder-only models with increasing levels of random perturbations from STA-derived, ‘ground truth’ rotation and translation poses (see Methods). The resulting map-map FSC curves against the STA ribosomal reconstruction are shown. (b) Final tomoDRGN decoder-only reconstructed volumes corresponding to the FSC curves shown in (a). Volumes are lowpass filtered to the resolution where their map-map FSC to the STA ribosomal reconstruction crossed 0.5. (c, d, e) UMAP of the first 128 principal components of volume ensembles consisting of volumes generated for every particle, using tomoDRGN models trained on EMPIAR-10499 unfiltered ribosome stacks with the indicated levels of pose perturbation. Particles annotated as 70S, 50S, and non-ribosomal (NR) are colored as in Fig. 5c, with representative volumes of each class shown below. Note that NR particles are expected to be structurally diverse.
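
Panel (a) trains on poses perturbed away from their STA-derived values. The perturbation model is described in Methods, which is not reproduced here, so the sketch below assumes a hypothetical scheme: rotate each pose by a Gaussian-distributed angle about a uniformly random axis (built with Rodrigues' formula) and add Gaussian noise to the translation.

```python
import numpy as np

def axis_angle_to_matrix(axis, angle_rad):
    """Rodrigues' formula: rotation by angle_rad about a unit-length axis."""
    x, y, z = axis
    K = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])  # cross-product matrix
    return np.eye(3) + np.sin(angle_rad) * K + (1.0 - np.cos(angle_rad)) * (K @ K)

def perturb_pose(rot, trans, sigma_rot_deg, sigma_trans, rng):
    """Apply a random small rotation and a Gaussian translation offset to one pose.

    Hypothetical perturbation model: uniformly random axis, Gaussian-distributed angle.
    """
    axis = rng.standard_normal(3)
    axis /= np.linalg.norm(axis)  # normalized Gaussian vector is uniform on the sphere
    angle = np.deg2rad(rng.normal(0.0, sigma_rot_deg))
    new_rot = axis_angle_to_matrix(axis, angle) @ rot
    new_trans = trans + rng.normal(0.0, sigma_trans, size=np.shape(trans))
    return new_rot, new_trans
```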

Source data
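The map-map FSC in panel a compares each decoder-only reconstruction with the STA map shell by shell in Fourier space. Panel b then lowpass filters each volume at the 0.5 crossing. The NumPy sketch below shows a standard FSC calculation and a linearly interpolated 0.5 crossing. It assumes two aligned cubic maps of equal box size and a pixel size `apix` in Å. The function names are hypothetical. tomoDRGN's own FSC code may differ in masking, shell binning or interpolation.

```python
import numpy as np


def fsc(vol1, vol2):
    """Fourier shell correlation of two aligned cubic maps with box size D.

    Returns (freq, curve): shell frequency in cycles/voxel and FSC per shell.
    """
    D = vol1.shape[0]
    f1 = np.fft.fftshift(np.fft.fftn(vol1))
    f2 = np.fft.fftshift(np.fft.fftn(vol2))
    # Integer shell radius of every Fourier voxel, measured from the DC term at D // 2.
    c = np.arange(D) - D // 2
    kx, ky, kz = np.meshgrid(c, c, c, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    curve = np.zeros(D // 2)
    for s in range(D // 2):
        a, b = f1[shell == s], f2[shell == s]
        # Normalized cross-correlation of the two maps' coefficients in this shell.
        num = np.real(np.sum(a * np.conj(b)))
        den = np.sqrt(np.sum(np.abs(a) ** 2) * np.sum(np.abs(b) ** 2))
        curve[s] = num / den if den > 0 else 0.0
    return np.arange(D // 2) / D, curve


def crossing_resolution(freq, curve, apix, threshold=0.5):
    """Resolution (Å) where the FSC first drops below `threshold`, by linear interpolation."""
    below = np.nonzero(curve < threshold)[0]
    if below.size == 0:
        return apix / freq[-1]  # never drops below within the measured shells
    i = below[0]
    if i == 0:
        return np.inf
    # Interpolate the crossing frequency between shells i - 1 and i.
    f = freq[i - 1] + (threshold - curve[i - 1]) * (freq[i] - freq[i - 1]) / (curve[i] - curve[i - 1])
    return apix / f
```

Pose errors blur high-frequency information first, so increasing perturbation should shift the crossing to coarser resolution.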

Supplementary information

Supplementary Information

Supplementary Figs. 1 and 2 and Tables 1–4.

Reporting Summary

Peer Review File

Supplementary Video 1

Continuous conformational heterogeneity recapitulated by tomoDRGN-generated volumes. Volumes corresponding to each purple circle in Fig. 2g were generated using the tomoDRGN model trained on simulated yeast ATP synthase data. They are visualized in ChimeraX sequentially along the pink-to-purple gradient of Fig. 2g.

Supplementary Video 2

Structural heterogeneity in the large ribosomal subunit. Volumes were generated using the tomoDRGN model in Fig. 5d at latent positions sampled by k-means clustering (k = 100) of the latent space. Density for the 30S subunit was removed using the volume zone tool in ChimeraX, guided by atomic model PDB 7PHB, to reveal distinct conformational and compositional states of the LSU. Note the conformational and compositional heterogeneity in the tRNA- and elongation factor-binding sites, which lie along the midline of the particle.

Supplementary Video 3

Membrane-associated ribosomes exhibit diverse membrane-contact angles. Volumes were generated for all particles used to train the model in Fig. 6e. The tertile of volumes with the highest SecDF occupancy is displayed, ordered by increasing occupancy (n = 162). Note the substantial dynamics in the orientation of the membrane relative to the associated ribosome.

Supplementary Video 4

TomoDRGN-annotated disomes from tomogram 256. Disomes are annotated and colored as described in Extended Data Fig. 5b and are aligned on the 3′ monosome.

Supplementary Data 1

Supporting data for Supplementary Fig. 1.

Supplementary Data 2

Supporting data for Supplementary Fig. 2.

Source data

Source Data Fig. 2

Numerical source data for Fig. 2.

Source Data Fig. 3

Numerical source data for Fig. 3.

Source Data Fig. 4

Numerical source data for Fig. 4.

Source Data Fig. 5

Numerical source data for Fig. 5.

Source Data Fig. 6

Numerical source data for Fig. 6.

Source Data Extended Data Fig. 1

Numerical source data for Extended Data Fig. 1.

Source Data Extended Data Fig. 2

Numerical source data for Extended Data Fig. 2.

Source Data Extended Data Fig. 3

Numerical source data for Extended Data Fig. 3.

Source Data Extended Data Fig. 4

Numerical source data for Extended Data Fig. 4.

Source Data Extended Data Fig. 7

Numerical source data for Extended Data Fig. 7.

Source Data Extended Data Fig. 8

Numerical source data for Extended Data Fig. 8.

Source Data Extended Data Fig. 9

Numerical source data for Extended Data Fig. 9.

Source Data Extended Data Fig. 10

Numerical source data for Extended Data Fig. 10.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Powell, B.M., Davis, J.H. Learning structural heterogeneity from cryo-electron sub-tomograms with tomoDRGN. Nat Methods (2024). https://doi.org/10.1038/s41592-024-02210-z


Received: 31 May 2023
Accepted: 13 February 2024
Published: 08 March 2024
DOI: https://doi.org/10.1038/s41592-024-02210-z
