Abstract

Single-particle electron cryomicroscopy (cryo-EM) involves estimating a set of parameters for each particle image and reconstructing a 3D density map; robust algorithms with accurate parameter estimation are essential for high resolution and automation. We introduce a particle-filter algorithm for cryo-EM, which provides high-dimensional parameter estimation through a posterior probability density function (PDF) of the parameters given the model and the experimental image. The framework uses a set of random support points to represent this PDF and assigns weighting coefficients not only among the parameters of each particle but also among different particles. We implemented the algorithm in a new program named THUNDER, which features self-adaptive parameter adjustment, tolerance to bad particles, and per-particle defocus refinement. We tested the algorithm on cryo-EM datasets for the cyclic-nucleotide-gated (CNG) channel, the proteasome, β-galactosidase, and an influenza hemagglutinin (HA) trimer, and observed substantial improvement in resolution.
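
The idea of representing a posterior PDF by weighted random support points can be illustrated with a minimal Python sketch. It is not THUNDER's implementation: the single scalar parameter, the uniform prior, the Gaussian toy likelihood, and all function names are assumptions made for illustration, and systematic resampling is only one of several possible resampling schemes.

```python
import numpy as np


def weighted_support_points(log_likelihood, low, high, n_points=2000, seed=0):
    """Draw support points from a uniform prior and weight them by likelihood."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(low, high, n_points)
    log_w = log_likelihood(points)
    weights = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    weights /= weights.sum()               # normalised weights approximate the posterior
    return points, weights


def effective_sample_size(weights):
    """Number of support points that effectively carry the posterior mass."""
    return 1.0 / np.sum(weights ** 2)


def systematic_resample(points, weights, seed=0):
    """Replace weighted points by equally weighted ones (systematic resampling)."""
    rng = np.random.default_rng(seed)
    n = len(points)
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0  # guard against floating-point round-off
    return points[np.searchsorted(cumulative, positions)]


# Toy likelihood (an assumption): one parameter, peak at 1.3, width 0.2.
def toy_log_likelihood(x):
    return -0.5 * ((x - 1.3) / 0.2) ** 2


points, weights = weighted_support_points(toy_log_likelihood, -5.0, 5.0)
posterior_mean = np.sum(weights * points)
resampled = systematic_resample(points, weights)
print(posterior_mean, resampled.mean(), effective_sample_size(weights))
```

In this toy setting, a sharply peaked likelihood concentrates the weights on a few support points, whereas a likelihood with several strong peaks spreads them over several clusters.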

Data availability

The entry codes of the datasets from EMPIAR are EMPIAR-10025, EMPIAR-10061, and EMPIAR-10097. The entry codes of the density maps from the EMDB are EMD-6656, EMD-2984, EMD-6287, and EMD-8731. The entry codes of the structure models from the Protein Data Bank are PDB 1PMA, PDB 3WHE, PDB 5H3O, and PDB 5A1A. The calculated density maps that support the findings of this study are available as Supplementary Data and from the corresponding author upon reasonable request.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Merk, A. et al. Breaking cryo-EM resolution barriers to facilitate drug discovery. Cell 165, 1698–1707 (2016).
  2. Cheng, Y. Single-particle cryo-EM at crystallographic resolution. Cell 161, 450–457 (2015).
  3. Cheng, Y., Grigorieff, N., Penczek, P. A. & Walz, T. A primer to single-particle cryo-electron microscopy. Cell 161, 438–449 (2015).
  4. Sigworth, F. J. A maximum-likelihood approach to single-particle image refinement. J. Struct. Biol. 122, 328–339 (1998).
  5. Scheres, S. H. W. RELION: implementation of a Bayesian approach to cryo-EM structure determination. J. Struct. Biol. 180, 519–530 (2012).
  6. Scheres, S. H. A Bayesian view on cryo-EM structure determination. J. Mol. Biol. 415, 406–418 (2012).
  7. Scheres, S. H. Classification of structural heterogeneity by maximum-likelihood methods. Methods Enzymol. 482, 295–320 (2010).
  8. Henderson, R. Avoiding the pitfalls of single particle cryo-electron microscopy: Einstein from noise. Proc. Natl Acad. Sci. USA 110, 18037–18041 (2013).
  9. Rosenthal, P. B. & Rubinstein, J. L. Validating maps from single particle electron cryomicroscopy. Curr. Opin. Struct. Biol. 34, 135–144 (2015).
  10. Henderson, R. et al. Tilt-pair analysis of images from a range of different specimens in single-particle electron cryomicroscopy. J. Mol. Biol. 413, 1028–1046 (2011).
  11. Stewart, A. & Grigorieff, N. Noise bias in the refinement of structures derived from single particles. Ultramicroscopy 102, 67–84 (2004).
  12. Grigorieff, N. FREALIGN: high-resolution refinement of single particle structures. J. Struct. Biol. 157, 117–125 (2007).
  13. Punjani, A., Rubinstein, J. L., Fleet, D. J. & Brubaker, M. A. cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination. Nat. Methods 14, 290–296 (2017).
  14. Arulampalam, M. S., Maskell, S., Gordon, N. & Clapp, T. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process. 50, 174–188 (2002).
  15. Douc, R. & Cappe, O. Comparison of resampling schemes for particle filtering. in Proc. Int. Symp. Image Signal Process Anal. 64–69 (IEEE, Piscataway, NJ, USA, 2005).
  16. Gustafsson, F. et al. Particle filters for positioning, navigation, and tracking. IEEE Trans. Signal Process. 50, 425–437 (2002).
  17. Li, M. et al. Structure of a eukaryotic cyclic-nucleotide-gated channel. Nature 542, 60–65 (2017).
  18. Mindell, J. A. & Grigorieff, N. Accurate determination of local defocus and specimen tilt in electron microscopy. J. Struct. Biol. 142, 334–347 (2003).
  19. Gong, X. et al. Structural insights into the Niemann-Pick C1 (NPC1)-mediated cholesterol transfer and Ebola infection. Cell 165, 1467–1478 (2016).
  20. Campbell, M. G., Veesler, D., Cheng, A., Potter, C. S. & Carragher, B. 2.8 Å resolution reconstruction of the Thermoplasma acidophilum 20S proteasome using cryo-electron microscopy. eLife 4, e06380 (2015).
  21. Bartesaghi, A. et al. 2.2 Å resolution cryo-EM structure of β-galactosidase in complex with a cell-permeant inhibitor. Science 348, 1147–1151 (2015).
  22. Iudin, A., Korir, P. K., Salavert-Torres, J., Kleywegt, G. J. & Patwardhan, A. EMPIAR: a public archive for raw electron microscopy image data. Nat. Methods 13, 387–388 (2016).
  23. Noble, A. J. et al. Routine single particle cryoEM sample and grid characterization by tomography. eLife 7, e34257 (2018).
  24. Zheng, S. Q. et al. MotionCor2: anisotropic correction of beam-induced motion for improved cryo-electron microscopy. Nat. Methods 14, 331–332 (2017).
  25. Brilot, A. F. et al. Beam-induced motion of vitrified specimen on holey carbon film. J. Struct. Biol. 177, 630–637 (2012).
  26. Zhang, K. Gctf: real-time CTF determination and correction. J. Struct. Biol. 193, 1–12 (2016).
  27. Tan, Y. Z. et al. Addressing preferred specimen orientation in single-particle cryo-EM through tilting. Nat. Methods 14, 793–796 (2017).
  28. Kay, S. M. Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory (Prentice Hall, Upper Saddle River, NJ, USA, 1993).
  29. Weinstein, E. & Weiss, A. J. A general class of lower bounds in parameter estimation. IEEE Trans. Inf. Theory 34, 338–342 (1988).
  30. Xu, W., Baggeroer, A. B. & Richmond, C. D. Bayesian bounds for matched-field parameter estimation. IEEE Trans. Signal Process. 52, 3293–3305 (2004).
  31. Scheres, S. H. Beam-induced motion correction for sub-megadalton cryo-EM particles. eLife 3, e03665 (2014).

Acknowledgements

This work was supported by funds from the National Key Research and Development Program (2016YFA0501102 and 2016YFA0501902 to X.L.), National Natural Science Foundation of China (31722015 and 31570730 to X.L., and 61672312 to G.Y.), Advanced Innovation Center for Structural Biology (to X.L., Y.S., and G.Y.), Tsinghua-Peking Joint Center for Life Sciences (to X.L.), One-Thousand Talent Program through the State Council of China (to X.L. and Y.S.), and Intel Parallel Computing Center project (to X.L.). We thank Y. Cheng (University of California San Francisco) for providing the T20S proteasome sample, and X. Zhou and C. Lin (Tsinghua University) for collecting the T20S proteasome data on a CCD camera. We thank X. Lin (Intel) for help in optimizing the C++ code of THUNDER. We acknowledge the National Supercomputing Center in Wuxi and the Tsinghua University Branch of the China National Center for Protein Sciences Beijing for providing facility support in computation.

Author information

Author notes

  1. These authors contributed equally: Mingxu Hu, Hongkun Yu, Kai Gu.

Affiliations

  1. MOE Key Laboratory of Protein Science, School of Life Sciences, Tsinghua University, Beijing, China

    Mingxu Hu, Zhao Wang, Huabin Ruan & Xueming Li

  2. Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing, China

    Mingxu Hu, Huabin Ruan & Xueming Li

  3. National Supercomputing Center in Wuxi, Wuxi, China

    Mingxu Hu, Hongkun Yu, Zhao Wang, Kunpeng Wang, Siyuan Ren, Bing Li, Lin Gan, Shizhen Xu & Guangwen Yang

  4. Department of Computer Science and Technology, Tsinghua University, Beijing, China

    Hongkun Yu, Zhao Wang, Kunpeng Wang, Siyuan Ren, Bing Li, Lin Gan, Shizhen Xu & Guangwen Yang

  5. Department of Electronic Engineering, Tsinghua University, Beijing, China

    Kai Gu & Yuan Shen

  6. Tsinghua-Peking Joint Center for Life Sciences, Beijing, China

    Xueming Li

Contributions

X.L., Y.S., and G.Y. conceived the project; Y.S., M.H., X.L., H.Y., and K.G. designed the algorithm; M.H. and H.Y. implemented the algorithm; M.H. and H.Y. designed and wrote the major part of the program with full functions of 3D reconstruction; K.W., S.R., B.L., L.G., and S.X. wrote part of the program; H.R. optimized and accelerated the CPU version; Z.W. wrote the GPU version; M.H. and X.L. performed the tests; X.L. wrote the manuscript; and all authors revised the manuscript.

Competing interests

The authors declare no competing interests.

Corresponding authors

Correspondence to Guangwen Yang or Yuan Shen or Xueming Li.

Integrated supplementary information

  1. Supplementary Figure 1 3D classification of the CNG dataset.

    a–d) Four classes calculated from 211,826 particles after one round of filtering by 2D classification in our previous work (ref. 17). The side view (left) and top view (right) are shown. The model with C4 symmetry (PDB entry code: 5H3O) was docked into the maps. The parts of the model not covered by density, indicated by orange arrows, mark the missing part of the subunit in the transmembrane region of the CNG channel.

  2. Supplementary Figure 2 3D classifications by THUNDER.

    a) Density maps calculated from the dirty dataset of the proteasome. b) Density maps calculated from the dirty dataset of the β-galactosidase. Four classes were calculated with ellipsoid initial models. No symmetry was applied.

  3. Supplementary Figure 3 FSC curves of CNG reconstructions with various numbers of support points after the scanning phase (the first phase of the iteration).

    The tests were performed in a) the rotation subspace and b) the x-y translation subspace. The nearly identical FSC curves obtained with various numbers of support points imply that a small number of support points can give accurate parameter estimation.

  4. Supplementary Figure 4 Support points on different likelihood functions.

    a) Support points distributed on a single strong peak of the likelihood function (LF) calculated from a “good” particle image. b) Support points distributed on multiple strong peaks of the LF calculated from a “bad” particle image.

  5. Supplementary Figure 5 Representative side-chain maps of the β-galactosidase reconstruction with various options.

    Some aromatic rings in β-galactosidase showed clearer holes in the center than those in the published map (EMDB entry code: EMD-2984) (Supplementary Fig. 5).

  6. Supplementary Figure 6 Comparison of the resolution and estimated tilt angle from different programs for the 40°-tilted dataset of the influenza HA trimer (ref. 27).

    a) FSC curves corresponding to different methods of local defocus determination. Two tests were performed to evaluate the defocus refinement by THUNDER. In the first, CTFTilt (ref. 18) was used to obtain the initial defocus estimate, and THUNDER with defocus refinement was then used to reprocess the dataset. The final resolution of the reconstruction was 3.69 Å, with most side chains visible (Fig. 5b). The defocus refinement contributed a 0.9 Å resolution improvement (see the red dotted and solid lines). The second test used the defocus values (estimated by GCTF, ref. 26) from the downloaded parameter file (EMPIAR entry code: 10097) and gave 3.81 Å resolution before the defocus refinement. This result is better than the published one but slightly worse than the one above, indicating that the GCTF defocus values are less accurate than those from the defocus refinement by THUNDER. Unexpectedly, the defocus refinement starting from the GCTF results did not noticeably improve the resolution. Because the defocus refinement is only a local refinement, a possible reason is that the initial defocus error of some particles is beyond its working range. b) Histogram of the tilt angles of micrographs. To further validate the estimated defocus, we compiled statistics of the tilt angles of micrographs, measured by plane fitting of the per-particle defocus values from the same micrograph. The histograms corresponding to CTFTilt and to the defocus refinement of THUNDER are nearly identical and close to the expected 40° tilt. The one from GCTF (from the downloaded parameter file) has a different angle distribution from the other three, which might imply errors in the measured defocus.

  7. Supplementary Figure 7 The defocus refinement at different resolution ranges.

    a) 3D reconstruction of the Thermoplasma acidophilum 20S proteasome calculated by THUNDER, and b) corresponding FSC curves before and after the defocus refinement. c, d and e) Histograms of the defocus statistics before and after the defocus refinement for the proteasome CCD dataset, the CNG dataset and the high-resolution proteasome dataset from EMPIAR, respectively. To test the performance of the defocus refinement of THUNDER on a low-quality, low-resolution dataset, we collected a proteasome dataset on a Tecnai F20 200 kV microscope with a Gatan US4000 CCD camera. This dataset contains 42 untilted micrographs and 44 tilted micrographs (20° tilt). The initial defocus was measured by CTFFind3 for untilted micrographs and by CTFTilt for tilted micrographs (the per-particle defocus was estimated from the tilt angle and tilt axis measured by CTFTilt). Because per-particle defocus values measured directly from the entire micrograph might be very coarse, a more significant improvement after defocus refinement was expected. The final reconstruction at 8.1 Å resolution was calculated from 21,404 particles. Comparison of the FSC curves before and after defocus refinement shows only a small improvement. A further test was then designed by adding a defocus shift of ~500 Å to the refined defocus of each particle and then examining whether the defocus refinement by THUNDER could recover it. c) Histograms of the defocus distribution. The black curve is the distribution of the initial defocus, the blue curve is that after a shift of ~500 Å, and the red curve is that after the defocus refinement. The vertical dashed lines indicate the means of the defocus distributions. The defocus is partially recovered after defocus refinement. We also performed the same tests for d) the CNG dataset and e) the high-resolution proteasome dataset from EMPIAR. In these cases, the defocus is recovered nearly perfectly. These results indicate that the defocus refinement by THUNDER can also work for data collected with a CCD camera. However, the performance of the defocus refinement is limited by the low sensitivity of the 3D reconstruction to defocus errors at the ~9 Å resolution level.

  8. Supplementary Figure 8 3D reconstruction of three datasets.

    The a) CNG density map, b) proteasome density map and c) β-galactosidase density map are shown. On the left are the whole density maps, with the resolution labeled at the bottom. On the right are representative secondary structures segmented from the maps deposited in the EMDB (top) and from the maps on the far left solved by THUNDER (bottom).

  9. Supplementary Figure 9 Resolution comparisons of 3D reconstructions calculated by different programs.

    a) FSC curves of the clean dataset of the proteasome. b) FSC curves of the dirty dataset of the proteasome. Three options of THUNDER were tested: both the grading and the defocus refinement enabled (red solid lines), only the grading enabled (red dashed lines), and both the grading and the defocus refinement disabled (red dotted lines). The corresponding results were compared with RELION 2.1 (blue dotted lines) and cryoSPARC 0.6.5 (turquoise dotted lines). The clean dataset has fewer bad particles than the dirty one. Therefore, the resolutions with and without the particle grading are very close, and also similar to the results from RELION and cryoSPARC. The dirty dataset should contain more bad particles and, as expected, the improvement from the particle grading is more obvious than with the clean dataset. Meanwhile, the improvements relative to RELION and cryoSPARC are also larger than with the clean dataset, which further demonstrates the advantage of the bad-particle tolerance. For the image processing, the default settings of the three programs were used, and D7 symmetry was applied during 3D reconstruction. THUNDER used an ellipsoid map as the initial model, and RELION used a cylinder of similar size to the proteasome as the initial model. For cryoSPARC, all particles in each dataset were first classified into one 3D class to generate one initial model, and then “Homogeneous Refinement” was performed with this initial model.

  10. Supplementary Figure 10 Scalability of the parallel computing of THUNDER.

    Three dirty datasets were used to measure the computing time of a) 3D refinement and b) 3D classification of the CNG dataset, c) 3D refinement and d) 3D classification of the proteasome dataset, and e) 3D refinement and f) 3D classification of the β-galactosidase dataset. Each job was submitted to 8, 16, 32 and 64 computing nodes, respectively. The computing time versus the number of computing nodes shows a nearly ideal linear trend (red curves). Each computing node has two E5-2680 v3 CPUs and 128 GB of DDR4 RAM. The current version of THUNDER needs to load the entire dataset into RAM to reduce the pressure on storage I/O, which requires that the total RAM available across all computing nodes be larger than the size of the dataset. Owing to this limitation, the β-galactosidase job on 8 computing nodes failed and hence was not tested. The data-loading strategy will be changed in the next version of THUNDER to avoid this problem.
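
The tilt-angle check described for Supplementary Fig. 6b, plane fitting of the per-particle defocus values from one micrograph, can be sketched as follows. This is a hedged illustration, not the authors' script: the function name, the synthetic data, and the assumption that particle coordinates have already been converted to the same length unit as the defocus (Å) belong to this sketch only, and defocus sign conventions are ignored.

```python
import numpy as np


def tilt_from_defocus_plane(x, y, defocus):
    """Fit defocus = a*x + b*y + c by least squares; return (tilt in degrees, c)."""
    design = np.column_stack([x, y, np.ones_like(x)])
    (a, b, c), *_ = np.linalg.lstsq(design, defocus, rcond=None)
    # The slope of the fitted plane is the tangent of the tilt angle.
    tilt_deg = np.degrees(np.arctan(np.hypot(a, b)))
    return tilt_deg, c


# Synthetic micrograph (assumption): 300 particles on a 4,000 Å field,
# a 40-degree tilt, and 100 Å of Gaussian noise on each defocus value.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 4000.0, 300)
y = rng.uniform(0.0, 4000.0, 300)
slope = np.tan(np.radians(40.0))
direction = np.radians(30.0)  # in-plane direction of steepest defocus change
defocus = (20000.0
           + slope * (np.cos(direction) * x + np.sin(direction) * y)
           + rng.normal(0.0, 100.0, 300))

tilt, offset = tilt_from_defocus_plane(x, y, defocus)
print(f"fitted tilt: {tilt:.2f} degrees, defocus at origin: {offset:.0f} Å")
```

Repeating the fit for every micrograph and collecting the fitted angles would give a histogram of the kind described in the caption.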

Supplementary information

  1. Supplementary Text and Figures

    Supplementary Figures 1–10, Supplementary Table 1 and Supplementary Notes 1 and 2

  2. Reporting Summary

  3. Supplementary Protocol

    Beginner’s guide to THUNDER

  4. Supplementary Data

    Final cropped maps calculated from the dirty datasets for CNG, proteasome and β-galactosidase, and from the HA trimer dataset (EMPIAR-10097). The readme file gives detailed information about the maps.

  5. Supplementary Software

    THUNDER CPU and GPU versions

About this article

DOI

https://doi.org/10.1038/s41592-018-0223-8