Estimating conformational landscapes from Cryo-EM particles by 3D Zernike polynomials

Herreros, D.; Lederman, R. R.; Krieger, J. M.; Jiménez-Moreno, A.; Martínez, M.; Myška, D.; Strelak, D.; Filipovic, J.; Sorzano, C. O. S.; Carazo, J. M.

doi:10.1038/s41467-023-35791-y

Article
Open access
Published: 11 January 2023

Nature Communications volume 14, Article number: 154 (2023)

Abstract

The new developments in Cryo-EM Single Particle Analysis are helping us to understand how the macromolecular structure and function meet to drive biological processes. By capturing many states at the particle level, it is possible to address how macromolecules explore different conformations, information that is classically extracted through 3D classification. However, the limitations of classical approaches prevent us from fully understanding the complete conformational landscape due to the reduced number of discrete states accurately reconstructed. To characterize the whole structural spectrum of a macromolecule, we propose an extension of our Zernike3D approach, able to extract per-image continuous flexibility information directly from a particle dataset. Also, our method can be seamlessly applied to images, maps or atomic models, opening integrative possibilities. Furthermore, we introduce the ZART reconstruction algorithm, which considers the Zernike3D deformation fields to revert particle conformational changes during the reconstruction process, thus minimizing the blurring induced by molecular motions.

Introduction

Cryo-electron microscopy (Cryo-EM) single particle analysis (SPA)¹ has proven to be a powerful technique to understand the structure of macromolecules. By capturing individual images of the specimen in different poses, it is not only possible to reconstruct the average macromolecular conformation of the specimen under study, but it also brings to light the challenging problem of identifying several conformational states from the acquired dataset.

Generally, compositional heterogeneity, as well as flexibility, have been addressed through 3D classification². This approach allows reconstructing a given number of different states from the particle images based on the assumption that there is a defined number of discrete conformational states being explored by the specimen. This methodology has been very successful in the study of many systems, being recently expanded to increase the number of states being resolved³.

However, the explicit modeling assumption of the existence of discrete motions has obvious limitations in most experimental cases, depending on the actual biological system under study. Clearly, removing this constraint is methodologically very challenging, although the pay-offs are clear, both in terms of obtaining richer conformational landscapes than currently done, and in providing improved algorithmic stability and objectivity, removing many assumptions and trial and error tests.

Limitations faced with discrete flexibility can only be solved at the image processing level by a paradigm change that introduces methods able to handle continuous flexibility, that is, able to extract macromolecular conformational information at the individual particle level to obtain a sufficiently rich and populated landscape of molecular states. Several approaches have been previously proposed to address continuous flexibility, each from a different perspective⁴,⁵,⁶,⁷,⁸,⁹.

In this work, we extend our recent Zernike3D algorithm¹⁰ (specifically designed to deal with continuous heterogeneity) to accomplish precisely this task starting from Cryo-EM images, with some unique properties: (1) the possibility to work with images, maps, and atomic models in the same space, (2) a clear mathematical design that intrinsically helps to avoid over-deformations along projection directions, and (3) a reconstruction algorithm (that we name ZART - Zernike3D-based Algebraic Reconstruction Technique) that takes into account individual particle conformational information, reverts the structural changes, and obtains a new map in which flexible regions have intrinsically increased resolution. Note that property (1) indicates that one can work at the level of structural models, avoiding multiple fitting steps; property (2) drastically reduces flexibility estimation errors that would be very difficult to handle in other mathematical frameworks; and property (3) makes it possible to explore states with a small number of classes while still reconstructing maps from large datasets, and at improved resolution, since motion blurring is substantially reduced.

We note that the full derivation of ZART is rather technical, so we present in this work its main properties in the context of continuous flexibility, while the derivation of the algorithm in itself is presented as a separate technical work.

Results

Conformational landscape of EMPIAR-10028 dataset

The following experiment is aimed at assessing the capacity of the Zernike3D algorithm to identify conformational variability on real Cryo-EM data. To that end, we analyzed the EMPIAR-10028 dataset¹¹ corresponding to the P. falciparum 80S ribosome bound to emetine. This dataset has been extensively studied by other methods⁵,⁶, becoming a popular validation dataset for continuous heterogeneity algorithms.

In this work we have reprocessed that dataset inside Scipion¹², leading to a total of 50,000 particles. The workflow followed included several cleaning steps to reduce as much as possible the number of unwanted particles, followed by some consensus protocols to compare the parameters estimated by different algorithms (angular assignation, shifts, Contrast Transfer Function…) and keep only the particles consistently estimated.

The previous particles were subjected to the Zernike3D analysis, translating them to a set of Zernike3D coefficients. The maximum basis degrees were set to N = 3 and L = 2 for the estimation of the deformation fields. In addition, the particles were downsampled to a box size of 125 voxels to increase the performance of the algorithm. Apart from the Zernike3D analysis, the particles were not subjected to other heterogeneity workflows such as classical 3D classification.

The resulting UMAP (Uniform Manifold Approximation and Projection)¹³ representation of the Zernike3D coefficient space is shown in Fig. 1. As can be seen, the Zernike3D coefficient space leads to an informative representation of the heterogeneity present in the dataset. Two states are clearly differentiated, representing the two rotation states of the small subunit of the ribosome, together with some other more localized movements. The colormap used to represent the embedding describes the amount of deformation associated with each deformation field: purple colors correspond to small deformations and are usually associated with conformations similar to the reference map, while yellow colors are associated with bigger changes. The possibility of coloring the coefficient space adds another dimension of information that helps in the analysis of the heterogeneity of the dataset.

**Fig. 1: EMPIAR-10028 Zernike3D conformational landscape.**

There are two different possibilities to recover conformational changes from the previous embedding: (1) applying a deformation field to the reference or (2) exploring, by refinement and reconstruction, different areas of the conformational space. Option 1 represents an almost instant and interactive exploration of the conformational space, in which just by placing the cursor on any point of the representation we obtain a Zernike3D synthesis of a map, while Option 2 goes back to the original images and aims at exploring whether there are residual errors not accounted for by Zernike3D. In all cases tested so far, the differences between the two options are minimal, as shown in Fig. 2a, b. However, the application of the deformation field leads to a higher resolution representation of the conformational change (equal to the resolution of the reference map), while the refinement resolution is intrinsically limited by the number of particles selected from the space.

**Fig. 2: Example of Pf80S Zernike3D states.**

In addition, the Zernike3D coefficients extracted from the conformational space can be applied simultaneously to the reference map and to a structural model traced (or aligned) from the reference. This allows obtaining a flexible fitting of the atomic positions that matches the conformation of any particle in the dataset. An example of the application of the Zernike3D coefficients to the ribosome atomic structure can be found in Fig. 2c. However, it is worth mentioning that the Zernike3D coefficients are computed exclusively based on geometrical considerations, so the approximated structural models might need to be refined to correct for stereochemistry artifacts. Indeed, it should always be considered that the estimation of the deformation fields describing a given transition only depends on the rigid alignment of the reference towards the conformation represented by a given particle. Therefore, the estimated deformation field does not consider any stereochemical constraints, which should be imposed afterwards to avoid atomic clashes or reduce Ramachandran outliers, among others.

An example of the simultaneous exploration of the coefficient space performed with the reference map and its structural model is provided in Supplementary Movie 1 (we are aware that this and subsequent videos are only graphical means to make more obvious conformational changes, and that they are not to be considered as suggesting molecular trajectories at all). The different states were obtained by grouping the coefficients with KMeans into 5 clusters. Then, the cluster representatives were used to generate the deformed maps/structures, which were afterward morphed with ChimeraX software¹⁴.

The next step in the analysis of the dataset was to use the estimated deformation fields and the particles to reconstruct a higher-resolution map by correcting the conformational changes of each image with ZART. The comparison between the map reconstructed with CryoSparc¹⁵ and the one obtained with the ZART reconstruction algorithm is shown in Fig. 3a. The comparison shows a clear improvement at the level of both maps (a) and slices (b), in the moving and still areas of the molecule. To make a more quantitative comparison of the maps, we computed the local resolution histograms of both reconstructions, which are compared in Fig. 3b. In agreement with the visual inspection of the maps, the resolution histograms confirm the improvement in local resolution: the average resolution of the ZART map improves by 1.01 Å with respect to the mean resolution of the CryoSparc map.

**Fig. 3: Analysis of EMPIAR-10028 ZART reconstruction.**

Conformational landscape of EMPIAR-10180 dataset

The EMPIAR-10180¹⁶ dataset has become another standard dataset to test continuous heterogeneity algorithms due to the large degree of flexibility information it contains. The dataset corresponds to a pre-catalytic spliceosome exhibiting an extensive heterogeneity already observed by classical methods such as 3D classification.

Since the Zernike3D algorithm focuses on the analysis of continuous heterogeneity rather than compositional heterogeneity, the dataset was preprocessed inside Scipion¹² to clean as much as possible the original deposited particles. The original dataset is composed of around 320k particles, which were reduced to around 180k after the cleaning steps.

The cleaned particles were afterward subjected to the Zernike3D analysis to extract the different conformational changes suffered by the pre-catalytic spliceosome. As we did in the previous experiment, we set the maximum basis degrees to N = 3 and L = 2, and particles were binned to a box size of 128 pixels. The resulting Zernike3D coefficient space is represented in Fig. 4a. The Zernike3D space obtained is similar to the continuous heterogeneity region described by other software like CryoDrgn (Fig. 6 of their manuscript). However, the representation of the conformational changes followed in the Zernike3D approach provides a more versatile manner to assess structural variability.

**Fig. 4: Analysis of the EMPIAR-10180 Zernike3D conformational landscape.**

An example of the versatility of the Zernike3D results is shown in Fig. 4b, c, and Supplementary Movie 2. The maps and structures shown in both Figures were obtained by clustering the Zernike3D space with KMeans into 5 different regions. Then, the representative Zernike3D coefficients of each cluster were extracted to represent the different conformational changes.

Similarly to other algorithms, the conformational changes can be represented at the level of Cryo-EM maps, although the Zernike3D representation will keep the same resolution as the reference map used for the analysis. In addition, the Zernike3D deformation fields can also be applied directly to an atomic structure traced or fitted to the reference map. In this way, it is also possible to compare the different conformations at an atomic level.

An example of the comparison between two of the previous structures is provided in Fig. 5. Thanks to the Zernike3D approach, it is possible to analyze both the local and the global motion of the atoms present in the structure, which provides a more accurate and informative representation of the conformational changes undergone by the spliceosome.

**Fig. 5: Example of recovered Zernike3D spliceosome states at atomic level.**

SARS-CoV-2 spike one RBD up conformational landscape

We next applied the Zernike3D algorithm to a set of particles acquired from the SARS-CoV-2 spike. In our previous work¹⁷, we followed a discrete classification approach followed by a PCA (Principal Component Analysis)¹⁸ to study the presence of flexibility in these images, revealing two different open conformations of one of the Receptor Binding Domains (RBDs). The conformations represent small motions around an open RBD state.

The analysis of this dataset is useful to assess the ability of the Zernike3D algorithm to detect small motions from the noisy Cryo-EM images. Thus, we estimated the deformation fields for each particle starting from one of the conformations reported in ref. ¹⁷. The parameters set for this execution were the same as those used in the previous experiments: N = 3 and L = 2, yielding a total of 39 components per coefficient set, with the particles downsampled to a box size of 125 voxels. The UMAP representation of this space is shown in Fig. 6a. The resulting space displays several interesting regions to be analyzed, and it is much richer than the space explored by discrete classification.

**Fig. 6: SARS-CoV-2 Zernike3D conformational landscapes.**

In addition, we can integrate the results of the previous discrete classification analysis, resulting in two main classes, with our continuous flexibility approach, by projecting all this information into the same Zernike3D space (in practice, in the reduced representation of the conformational landscape), effectively combining maps and images. The combined space is shown in Fig. 6b. The new representation simplifies the analysis of the embedding, aiding in the identification of the possible conformational changes of the spike by comparing the continuous states to the information of the discrete classification. Clearly, there is much more flexibility than the one originally accounted for by the discrete classification.

An exploration of the conformational space shown in Fig. 6a is provided in Supplementary Movie 3. The different states presented in the video were obtained by applying a set of 20 Zernike3D coefficients to the reference map and its traced structure, followed by morphing in ChimeraX. The representatives were obtained by clustering the space with KMeans.

The embedding shows an interesting region (composed of a low number of particles) along the direction defined by the white dots representing each classified map. The analysis of this region reveals a conformational change moving in the opposite direction to the one defined by the two discrete classes, which was not previously identified. Supplementary Movie 4 shows the whole motion of the 1Up RBD defined by the main transition identified in the coefficient space. This result shows the importance of analyzing the heterogeneity on a per-particle basis, as discrete classification might not have the ability to resolve low-represented states.

The next step in the analysis of the dataset was to use the estimated deformation fields and the particles to reconstruct a higher-resolution map by “undoing” the conformational changes of each image. The motion-corrected map reconstructed with ZART is provided in Fig. 7a. As expected, the information available in the deformation fields leads to a better resolvability of the moving areas of the spike (the RBDs and N-terminal domains (NTDs) for this specific case), increasing the local resolution of these regions. Fig. 7b shows a comparison of the local resolution histograms associated with the maps shown in Fig. 7a. The correction of the per-particle conformational changes leads to a significant increase of the local resolution in the case of ZART, thanks to the reduction of the motion-induced blurring present in the CryoSparc reconstruction.

**Fig. 7: Analysis of SARS-CoV-2 ZART reconstruction.**

Discussion

The analysis of continuous heterogeneity is widely considered to be a significant breakthrough in the Cryo-EM field, and it is progressively becoming more popular, as shown by the several new software developments that extract this information from the acquired particle images.

In this regard, we have introduced an extension of the Zernike3D algorithm during this work, which has proven to be a versatile tool to study the continuous motion of macromolecules at the level of maps, structures, and particle images. The extension focuses on the extraction of per-particle conformations, leading to a much more detailed description of the conformational landscape of a molecule compared to classical 3D classification approaches. Furthermore, the versatility of the Zernike3D basis unites maps, particles, and structures into a common framework, opening new possibilities to perform combined heterogeneity analysis with all data available.

Moreover, we have shown that the coefficients in the resulting space can be applied simultaneously to Cryo-EM maps and atomic coordinates to approximate a new conformational state. The approximation of the conformational changes at the atomic level represents another step in the connection of the Cryo-EM landscapes with molecular dynamics. In the future, this connection may allow obtaining real energetic landscapes directly from experimental data.

In addition, we have developed the ZART reconstruction algorithm, which considers deformation fields during the reconstruction to “undo” conformational changes. In this way, it is possible to model the blurring artifacts induced by molecular motions and increase the local resolution of the reconstructed volumes.

Methods

This section is organized starting with general presentations of the Zernike3D basis and its use for the case of particles exhibiting continuous flexibility (first two subsections), and then dedicating several subsections to useful properties of our proposed method, see also Supplementary Methods.

We also provide some metrics regarding the performance of the Zernike3D algorithm in Table 1.

Table 1 Execution times for the Zernike3D algorithm

Zernike3D basis definition

We use the Zernike3D to estimate the deformation field associated with a given conformational transition, as we previously described in our work¹⁰.

The Zernike3D basis is an infinite-dimensional function space defined over the unit ball. Thus, it is convenient to express it as the combination of a radial and an angular component. For this basis, we have chosen the normalized and generalized definition of the Zernike polynomials as the radial component:

$$\bar{R}_{l,n}^{p}(x)=\sqrt{2}\,\sqrt{2n+l+\frac{p}{2}+1}\;R_{l,n}^{p}(x)$$

(1)

$p$ being a parameter associated with the inner product and dimensionality of the polynomials. For example, in a 3D scenario, the appropriate value of $p$ should be $1$.

The previously mentioned angular component is defined in terms of the real spherical harmonics:

$$y_{l}^{m}(\theta,\phi)=(-1)^{m}\sqrt{\frac{2l+1}{4\pi}\,\frac{(l-|m|)!}{(l+|m|)!}}\;P_{l}^{|m|}(\cos\theta)\begin{cases}1 & \text{if } m=0\\ \sqrt{2}\cos(m\phi) & \text{if } m>0\\ \sqrt{2}\sin(|m|\phi) & \text{if } m<0\end{cases}$$

(2)

By combining the previous two components, we obtain the final definition of the Zernike3D basis:

$$Z_{l,n,m}(r)=\bar{R}_{l,n}^{1}(r)\,y_{l}^{m}(\theta,\phi)$$

(3)
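The angular component of Eq. 2 can be checked numerically. The following is a minimal sketch in pure Python, not the implementation used in this work: the function names are hypothetical, and it assumes that $P_{l}^{|m|}$ denotes the associated Legendre function without the Condon-Shortley phase, so that the $(-1)^{m}$ factor of Eq. 2 is applied explicitly. The radial polynomials are left out because their explicit form is not given in this section.

```python
import math

def assoc_legendre(l, m, x):
    """Associated Legendre function P_l^m(x), 0 <= m <= l, without the
    Condon-Shortley phase (an assumption about the convention of Eq. 2)."""
    pmm = 1.0
    if m > 0:
        s = math.sqrt(max(0.0, 1.0 - x * x))
        for k in range(1, m + 1):
            pmm *= (2 * k - 1) * s          # P_m^m = (2m-1)!! (1-x^2)^(m/2)
    if l == m:
        return pmm
    pm1 = x * (2 * m + 1) * pmm             # P_{m+1}^m
    for ll in range(m + 2, l + 1):          # upward recurrence in the degree
        pmm, pm1 = pm1, ((2 * ll - 1) * x * pm1 - (ll + m - 1) * pmm) / (ll - m)
    return pm1

def real_sph_harm(l, m, theta, phi):
    """Real spherical harmonic y_l^m(theta, phi) following Eq. 2."""
    am = abs(m)
    norm = math.sqrt((2 * l + 1) / (4 * math.pi)
                     * math.factorial(l - am) / math.factorial(l + am))
    value = (-1) ** am * norm * assoc_legendre(l, am, math.cos(theta))
    if m > 0:
        value *= math.sqrt(2) * math.cos(m * phi)
    elif m < 0:
        value *= math.sqrt(2) * math.sin(am * phi)
    return value

def inner_product(la, ma, lb, mb, n_theta=400, n_phi=16):
    """Midpoint-rule approximation of the integral of y_a * y_b over the sphere."""
    d_theta, d_phi = math.pi / n_theta, 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            total += (real_sph_harm(la, ma, theta, phi)
                      * real_sph_harm(lb, mb, theta, phi)
                      * math.sin(theta))
    return total * d_theta * d_phi
```

Under these assumptions, `inner_product(2, 1, 2, 1)` should be close to one and `inner_product(1, -1, 1, 1)` close to zero, which is the orthonormality expected from the angular part of the basis.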

Estimating deformation fields from particles

As we explained in the previous section, the Zernike3D basis was initially formulated to be applied to 3D spaces. Therefore, it is quite direct to estimate conformational transitions from maps or atomic structures, as they live in a three-dimensional space. However, this is not the case for Cryo-EM particles, as we are intrinsically losing information along the projection direction during the acquisition process in the reduction from the three-dimensional space where the Coulomb potential of the specimen is defined to the two-dimensional space of the projection images being acquired in the microscope. In other words, conformational changes along the projection direction cannot be extracted from an individual image, since an infinite number of them would be compatible with the image information.

The algorithm we present in this work starts by computing a reference map/model (in practice, and continuing the presentation for the case of maps, it is common to either use an average map or one of the discrete classes). This map will be the origin (reference) to obtain the deformation fields from the parameters of the Zernike3D basis. The approach is summarized in Fig. 8, and it is a common procedure in optimization. In brief, it is an iterative procedure in which deformation fields are applied to the reference map and the resulting projection images are compared with the experimental ones until convergence.

**Fig. 8: Zernike3D workflow at particle level.**

Following the aforementioned method, finding the deformation field to describe the state represented by a given particle can be expressed as:

$$\max_{g_{L}}\;\rho\left(I,\,C\,P_{\theta}\big(V(r+g_{L}(r))\big)\right)$$

(4)

$\rho$ being Pearson’s correlation coefficient, $I$ an experimental Cryo-EM particle, $C$ the CTF estimated for that particle, $P_{\theta}$ the projection operator along the 3D direction and in-plane shift defined by the parameters $\theta$, $V$ the reference volume to which the Zernike3D deformation field is applied, and $g_{L}$ the displacement undergone by each voxel due to the deformation field. The vector $g_{L}$ depends on each Zernike3D component, and it is expressed as:
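As a small illustration of the similarity measure used in Eq. 4, the sketch below computes Pearson's correlation coefficient between two images stored as flat lists of pixel values. The function name and the toy pixel values are hypothetical; in the real problem the second argument would be the CTF-affected projection of the deformed reference.

```python
import math

def pearson(image_a, image_b):
    """Pearson's correlation coefficient between two equally sized images,
    each given as a flat sequence of pixel values."""
    n = len(image_a)
    mean_a = sum(image_a) / n
    mean_b = sum(image_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(image_a, image_b))
    var_a = sum((a - mean_a) ** 2 for a in image_a)
    var_b = sum((b - mean_b) ** 2 for b in image_b)
    return cov / math.sqrt(var_a * var_b)

# Toy "experimental" image and a simulated projection that differs from it
# only by a global intensity scale and offset.
experimental = [0.1, 0.4, 0.9, 0.3, 0.0, 0.2]
simulated = [2.0 * v + 3.0 for v in experimental]
score = pearson(experimental, simulated)
```

A practical consequence of this choice is that the score is insensitive to a global gain or offset between the experimental image and the simulated projection, so only the spatial pattern of the density drives the search.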

$$g_{L}(r)=\sum_{l=0}^{L}\sum_{n=0}^{N}\sum_{m=-l}^{l}\begin{pmatrix}\alpha_{l,n,m}^{x}\\ \alpha_{l,n,m}^{y}\\ \alpha_{l,n,m}^{z}\end{pmatrix}Z_{l,n,m}(r)$$

(5)

where the $\alpha_{l,n,m}$’s are the Zernike3D coefficients. These coefficients determine the contribution of each component of the basis to the deformation field.

The parameters $N$ and $L$ determine the maximum degrees of the Zernike polynomials and spherical harmonics. Therefore, they will determine the accuracy of the deformation fields: higher values will result in sharper and more accurate deformation fields, at the expense of increased execution times. By default, the two previous parameters are set to $N=3$ and $L=2$, which should be enough to avoid overfitting and get meaningful and accurate deformation fields in most cases. Nevertheless, the parameters can be manually set by the user in case higher accuracy is desired.
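To make the size of the search space concrete, the sketch below enumerates the basis indices for given maximum degrees. It assumes the usual Zernike selection rule, which is not stated explicitly in this section: only the terms with $l \le n$ and $n-l$ even are kept. This assumption is the one that reproduces the 39 components per coefficient set quoted in the Results for $N=3$ and $L=2$. The function name is hypothetical.

```python
def zernike3d_indices(max_l, max_n):
    """List the (l, n, m) triplets with l <= max_l, l <= n <= max_n,
    n - l even and -l <= m <= l (assumed selection rule)."""
    return [(l, n, m)
            for l in range(max_l + 1)
            for n in range(l, max_n + 1, 2)
            for m in range(-l, l + 1)]

basis = zernike3d_indices(max_l=2, max_n=3)
n_basis = len(basis)          # number of scalar basis functions
n_coefficients = 3 * n_basis  # one coefficient per x, y, z direction
```

Since every basis function carries three coefficients (one per spatial direction), raising $N$ or $L$ increases the number of unknowns per particle quickly, which is consistent with the trade-off between accuracy and execution time described above.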

The maximization of Eq. 4 is achieved through Powell’s conjugate direction method, starting from an initial guess of $\alpha_{l,n,m}=0$ for all indices $l,n,m$ and directions $x,y,z$ (that is, no deformation). Thanks to the optimization method and the procedure described in Fig. 8, it is possible to find the different component contributions $\alpha_{l,n,m}$ such that the deformation field to be applied to the reference map $V$ leads to a conformational state compatible with the particle $I$.

In order to avoid possible overfitting during the Powell search of the Zernike3D coefficients, an extra regularization term is added to Eq. 4:

$$\max_{g_{L}}\;\rho\left(I,\,C\,P_{\theta}\big(V(r+g_{L}(r))\big)\right)-\lambda_{1}\int |g_{L}(r)|^{2}\,dr$$

(6)

The additional regularization term accounts for the total deformation applied to the reference map by the estimated deformation field; since Eq. 6 is a maximization, it enters as a penalty. Depending on the value of $\lambda_{1}$, the optimization search will be allowed to explore optima leading to a larger or smaller deformation, so it is recommended to set it to a low value to avoid overfitting without compromising the search. The user can choose the value of $\lambda_{1}$ to be applied to a specific dataset. We recommend selecting a value in the range $[0.001,\,0.01]$ to avoid undesired results during the optimization process.
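The regularized search can be sketched on a one-dimensional toy problem with `scipy.optimize.minimize` and `method="Powell"`. Everything specific here is an assumption made for illustration: the Gaussian "reference", the two hypothetical basis functions (a constant and a linear term), the mean squared displacement used as a discrete stand-in for the integral, the noise-free "particle", and the convention that the penalty lowers the maximized score. Projection and CTF are omitted.

```python
import numpy as np
from scipy.optimize import minimize

x = np.linspace(-1.0, 1.0, 201)  # 1D sampling grid standing in for voxels

def reference(coords):
    """Toy reference density: a Gaussian blob centred at the origin."""
    return np.exp(-coords ** 2 / (2 * 0.2 ** 2))

def displacement(alpha, coords):
    """Toy deformation field built from two hypothetical basis functions."""
    return alpha[0] + alpha[1] * coords

def pearson(a, b):
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def negative_score(alpha, image, lam):
    """Negative of (correlation - penalty), so that minimizing it maximizes
    the regularized correlation."""
    g = displacement(alpha, x)
    rho = pearson(image, reference(x + g))
    return -(rho - lam * np.mean(g ** 2))

# Noise-free toy particle generated with a known deformation.
true_alpha = np.array([0.15, 0.10])
image = reference(x + displacement(true_alpha, x))

alpha0 = np.zeros(2)  # start from "no deformation"
result = minimize(negative_score, alpha0, args=(image, 1e-3), method="Powell")

rho_start = pearson(image, reference(x))
rho_end = pearson(image, reference(x + displacement(result.x, x)))
```

Powell's method only needs evaluations of the objective, not its gradient, which makes it convenient when the score involves interpolation and projection steps. In this toy setting, `rho_end` should be clearly higher than `rho_start`.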

It is worth mentioning that the Zernike3D algorithm does not require a minimum number of particles to be executed, as the deformation fields are estimated for every particle. Therefore, it is possible to process datasets coming from a consensus or other cleaning methods¹⁹, whose parameters are more accurately estimated but have fewer particles overall.

Deformation field consistency along the projection direction

In this work, we estimate 3D deformation fields from 2D images. However, the information limitations introduced above make this procedure conceptually challenging. Indeed, if we compare the information stored in a projection and a map, it would be possible to check that we have identical information as long as we restrain the comparison to the projection plane where the image exists. In spite of that, the image has an intrinsic loss of information in the projection direction, as we are collapsing the map information stored along this direction.

Following the previous reasoning, the deformation field is well-defined across the projection plane, but it is ill-defined along the projection direction. This implies that there are infinitely many ways of deforming a map along the projection direction of a particle so that the projection of the deformed volume is still consistent with that particle. Therefore, the Powell optimization proposed previously might find different solutions for each particle along the projection direction. Moreover, this inconsistency might make the global optimization process more prone to getting trapped in local minima, resulting in wrong estimations of the conformational landscape. An example of the undesired effect generated by not considering the missing information along the projection direction is provided in Supplementary Fig. 3.

Ideally, the best solution to the previous problem would be to drive the optima search so that no particle is deformed along its own projection direction. For this reason, a sensible choice would be to completely restrict the deformation along the projection direction.

One possible way to achieve this is to include different regularization terms restricting excessive deformations. Nevertheless, it would be challenging to find the weights needed for each regularization term, especially along the projection direction, a situation that introduces a new parameter quite difficult to estimate in the process.

The Zernike3D method overcomes that obstacle by taking advantage of the properties of the basis to altogether remove any deformation along the projection direction defined by a particle, either during the optimization or after it. As we showed in ref. ¹⁰, the Zernike3D basis is closed under rotations. Thus, it is possible to rotate the Zernike3D coefficients towards a different reference frame as follows:

$$A\,g_{L}(A^{-1}r)=\sum_{n=0}^{N}\sum_{l=0}^{L}\sum_{m=-l}^{l}A\begin{pmatrix}\alpha_{l,n,m}^{x}\\ \alpha_{l,n,m}^{y}\\ \alpha_{l,n,m}^{z}\end{pmatrix}\tilde{Z}_{l,n,m}(A^{-1}r)$$

(7)

$A$ being a rotation matrix. Therefore, it is possible to rotate the Zernike3D coefficients according to the angular information of the particle. For example, we can rotate the coefficients so that the Z direction of the new frame is effectively aligned with the projection direction of a particle. Then, we can cancel the rotated coefficients associated with the Z-axis in this new frame, as those only contribute to the deformation field along the projection direction.

However, it is essential to note that the previous property only holds in a continuous space: once the space is discretized and sampled into voxels, the basis is no longer exactly closed under rotations. Thus, the rotated coefficients cannot be applied directly to the volume, as their reference frame differs from that of the reference map. Instead, we can unrotate the modified Zernike3D coefficients so that their reference frame matches that of the reference map again. By combining all the previous steps, it is possible to fully remove the deformation along the projection direction, making all the deformation fields consistent and avoiding solutions with unrealistic deformations. The whole procedure is summarized in Fig. 9.
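The rotate, cancel, and unrotate steps can be illustrated on the coefficient vectors alone. The sketch below is a simplification under stated assumptions: it acts only on the $(x, y, z)$ part of each coefficient, since the same $3\times 3$ operation is applied to every coefficient vector and the mixing of basis functions introduced by the rotation is undone by the inverse rotation. Function names are hypothetical. The net effect is to replace each $\alpha_{l,n,m}$ by $\alpha_{l,n,m}-(\alpha_{l,n,m}\cdot d)\,d$, with $d$ the unit projection direction.

```python
import math

def rotation_to_z(direction):
    """Rotation matrix A (list of rows) that maps `direction` onto the Z axis."""
    norm = math.sqrt(sum(c * c for c in direction))
    z = [c / norm for c in direction]
    # Helper axis that is guaranteed not to be parallel to z.
    helper = [1.0, 0.0, 0.0] if abs(z[0]) < 0.9 else [0.0, 1.0, 0.0]
    dot = sum(h * c for h, c in zip(helper, z))
    x = [h - dot * c for h, c in zip(helper, z)]
    nx = math.sqrt(sum(c * c for c in x))
    x = [c / nx for c in x]
    y = [z[1] * x[2] - z[2] * x[1],
         z[2] * x[0] - z[0] * x[2],
         z[0] * x[1] - z[1] * x[0]]
    return [x, y, z]

def matvec(matrix, vector):
    return [sum(matrix[i][j] * vector[j] for j in range(3)) for i in range(3)]

def transpose(matrix):
    return [[matrix[j][i] for j in range(3)] for i in range(3)]

def remove_projection_component(alphas, direction):
    """Rotate each coefficient vector so that Z is the projection direction,
    cancel the Z component, and rotate back to the reference frame."""
    a = rotation_to_z(direction)
    a_inv = transpose(a)  # the inverse of a rotation is its transpose
    cleaned = []
    for alpha in alphas:
        rotated = matvec(a, alpha)
        rotated[2] = 0.0
        cleaned.append(matvec(a_inv, rotated))
    return cleaned

projection_direction = [1.0, 2.0, 2.0]
alphas = [[0.3, -0.1, 0.5], [-0.2, 0.4, 0.1]]
cleaned = remove_projection_component(alphas, projection_direction)
```

After this operation, every coefficient vector is orthogonal to the projection direction, while its in-plane part is left untouched.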

**Fig. 9: Projection direction correction workflow.**

Note that we also use a regularization term in our optimization approach, as previously indicated, but it is global and does not differentiate among projection directions. In our experience, the previously stated “consistency principle”, introduced by a mathematically clear handling of the lack of information along individual projection directions, is quite an important factor in our quest for estimating deformation fields, and it represents a clear advantage of the Zernike3D approach.

Zernike3D-based ART reconstruction algorithm

In general, 3D reconstruction algorithms start from the principle that we have a set of projections coming from a homogeneous set of particles. However, this assumption no longer holds for those macromolecules exhibiting large degrees of freedom. Therefore, it is not a surprise that molecular motions are a well-known source of blurring artifacts arising when reconstructing a Cryo-EM map from a set of Cryo-EM images. As a consequence, correcting the motions present in a particle image will be expected to boost the resolution and resolvability of blurry areas in Cryo-EM maps.

The per-particle estimation of the deformation fields by the Zernike3D basis can be effectively applied to correct molecular motions, aiding the reconstruction process with flexible information. To that end, we developed an ART-based (Algebraic Reconstruction Technique) reconstruction algorithm that uses the Zernike3D deformation fields to improve the final quality of motion-related blurry areas.

A detailed description of ART and its application in Cryo-EM can be found in ref. ²⁰. Here it suffices to say that ART finds the map whose projections are compatible with the experimental data through an iterative process of the form:

$$V^{(k+1)}(\boldsymbol{r})=V^{(k)}(\boldsymbol{r})+\lambda P_{H}^{\ast}\left(P_{H}V^{(k)}(\boldsymbol{r})-I_{k}(\boldsymbol{s})\right)$$

(8)

$V$ being the reconstructed volume, $\lambda$ the ART relaxation factor, ${P}_{H}$ the projection operator and ${P}_{H}^{*}$ its adjoint operator, ${I}_{k}$ the experimental image used at the $k$-th iteration, $r$ a 3D coordinate, and $s$ a 2D coordinate. The previous equation refers to the update to be applied to the reconstruction associated with a single image, although the algorithm will iterate over the whole particle dataset applying the previous correction to achieve the final reconstruction.
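As an illustration of the iterative update in Eq. 8, the following is a minimal sketch on a toy problem, not the implementation used in this work. It assumes axis-aligned projections of a small cubic grid, so that the projection operator is a sum along one axis and its adjoint is a replication along that axis. The residual is taken as experimental minus projected image, so that a positive relaxation factor reduces the mismatch; this sign convention and all function names are assumptions of the sketch.

```python
import numpy as np

def project(volume, axis):
    """Toy projection operator P_H: integrate the volume along one grid axis."""
    return volume.sum(axis=axis)

def backproject(image, axis, size):
    """Adjoint P_H^*: smear the 2D image back along the projection axis."""
    return np.repeat(np.expand_dims(image, axis), size, axis=axis)

def art_reconstruct(images, axes, size, relaxation=0.1, n_sweeps=20):
    """Sweep over the whole image set, applying one ART correction per image."""
    volume = np.zeros((size, size, size))
    for _ in range(n_sweeps):
        for image, axis in zip(images, axes):
            # Mismatch between the experimental image and the current projection.
            residual = image - project(volume, axis)
            # Relaxed correction, spread back into 3D by the adjoint operator.
            volume += relaxation * backproject(residual, axis, size)
    return volume

rng = np.random.default_rng(0)
truth = rng.random((8, 8, 8))
axes = [0, 1, 2]
images = [project(truth, a) for a in axes]
recon = art_reconstruct(images, axes, size=8)
```

With only three views the reconstruction is far from unique; the sketch only shows that the iteration drives the projections of the reconstructed volume toward the input images.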

One advantage of ART over other reconstruction methods is that it can be easily modified to include new information to be taken into account during the iterative reconstruction process. Thus, it is possible to modify the previous equation by adding the deformation field previously estimated:

$$V^{(k+1)}(\boldsymbol{r})=V^{(k)}(\boldsymbol{r})+\lambda P_{H}^{\ast}\left(P_{H}V^{(k)}(\boldsymbol{r}+g_{L}(\boldsymbol{r}))-I_{k}(\boldsymbol{s})\right)$$

(9)

${g}_{L}\left(r\right)$ being the displacement at a given 3D position computed through Eq. 5.

By introducing the displacements ${g}_{L}$ into the ART algorithm, we are improving the correction value that will be applied to $V$ at each iteration, as the difference between the theoretical and experimental images is taken based on the conformational change present in the particle. Thus, the reconstruction process can generate more meaningful solutions for areas subjected to significant motions.
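The deformation-aware update of Eq. 9 can be sketched on a toy 2D "volume" projected to a 1D "image". The sketch assumes a displacement that acts only along x, linear interpolation to sample $V(r+g_L(r))$, and a backprojection that applies the transpose of the interpolation so the correction lands in the reference frame. Whether the actual ZART code handles the backprojection this way is not stated here, and all names (`warp_weights`, `warp`, `warp_adjoint`, `zart_update`) are hypothetical.

```python
import numpy as np

def warp_weights(shape, disp_x):
    """Linear-interpolation indices and weights for sampling V at x + g(z, x).
    disp_x must have the full shape (nz, nx)."""
    nz, nx = shape
    pos = np.arange(nx)[None, :] + disp_x
    valid = (pos >= 0) & (pos <= nx - 1)        # samples outside the grid count as zero
    i0 = np.clip(np.floor(pos).astype(int), 0, nx - 2)
    frac = pos - i0
    return i0, np.where(valid, 1.0 - frac, 0.0), np.where(valid, frac, 0.0)

def warp(volume, i0, w0, w1):
    """Deformed volume V(r + g(r))."""
    rows = np.arange(volume.shape[0])[:, None]
    return w0 * volume[rows, i0] + w1 * volume[rows, i0 + 1]

def warp_adjoint(field, i0, w0, w1):
    """Transpose of `warp`: scatter values back to the reference grid."""
    out = np.zeros_like(field)
    rows = np.broadcast_to(np.arange(field.shape[0])[:, None], field.shape)
    np.add.at(out, (rows, i0), w0 * field)
    np.add.at(out, (rows, i0 + 1), w1 * field)
    return out

def zart_update(volume, image, disp_x, relaxation):
    """One relaxed correction using the particle's own deformation field."""
    i0, w0, w1 = warp_weights(volume.shape, disp_x)
    # Compare theoretical and experimental images in the particle's conformation.
    residual = image - warp(volume, i0, w0, w1).sum(axis=0)
    # Backproject along z, then map the correction to the reference frame.
    correction = np.repeat(residual[None, :], volume.shape[0], axis=0)
    return volume + relaxation * warp_adjoint(correction, i0, w0, w1)

rng = np.random.default_rng(1)
nz, nx = 6, 10
truth = rng.random((nz, nx))
disp_x = np.tile(np.linspace(-1.5, 1.5, nz)[:, None], (1, nx))  # one shift per row
i0, w0, w1 = warp_weights(truth.shape, disp_x)
image = warp(truth, i0, w0, w1).sum(axis=0)   # simulated image of the deformed state

volume = np.zeros_like(truth)
for _ in range(10):
    volume = zart_update(volume, image, disp_x, relaxation=0.1)
```

In a full reconstruction each particle would contribute its own image and its own displacement field, with the update applied in turn over the whole dataset.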

Moving Zernike3D coefficients through different resolutions

One of the main issues arising when working with Cryo-EM particles is their low signal-to-noise ratio. Although averaging a very large number of images overcomes that problem, the procedure also mixes several conformations at the same time. Therefore, the estimation of continuous flexibility is usually done directly on particle images, even if conditions are far from ideal.

To estimate motions more efficiently, it is common to filter the particles at a given resolution to increase the signal-to-noise ratio. Similarly, it is possible to downsample the images after the filtering process to improve the performance of the estimations.

However, the Zernike3D coefficients $\alpha_{l,n,m}$ depend strongly on the size of the volume under study. This implies that the coefficients computed from a downsampled map cannot be directly applied to the original volume and vice versa.

Luckily, it is possible to move a set of Zernike3D coefficients to a different resolution in a way similar to the procedure described in previous sections. By downsampling a map, we scale its space by a given factor $k$. Therefore, the relation between two vectors with the same direction in the previous two spaces is:

$${r}_{o}=k{r}_{d}$$

(10)

${r}_{o}$ and ${r}_{d}$ being the vectors associated with the original and downsampled spaces, respectively.

We can express the earlier two vectors based on the components of the Zernike3D basis as follows:

$$\begin{array}{c}\sum\limits_{n=0}^{N}\sum\limits_{l=0}^{L}\sum\limits_{m=-l}^{l}\alpha_{l,n,m}^{o}\tilde{Z}_{l,n,m}(r_{o})=k\sum\limits_{n=0}^{N}\sum\limits_{l=0}^{L}\sum\limits_{m=-l}^{l}\alpha_{l,n,m}^{d}\tilde{Z}_{l,n,m}(r_{d})\\ =\sum\limits_{n=0}^{N}\sum\limits_{l=0}^{L}\sum\limits_{m=-l}^{l}k\alpha_{l,n,m}^{d}\tilde{Z}_{l,n,m}(k^{-1}r_{o})\end{array}$$

(11)

Thanks to Eq. 11, it is possible to show that the scaling relation between the vectors ${r}_{o}$ and ${r}_{d}$ is shared by the corresponding Zernike3D coefficients:

$$\alpha_{l,n,m}^{o}=k\alpha_{l,n,m}^{d}$$

(12)

This leads to a very convenient and straightforward conversion to use coefficients estimated on low-resolution images in the original high-resolution maps.
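The conversion in Eq. 12 can be checked numerically with a small sketch. For brevity it uses a 1D Legendre basis evaluated on coordinates normalized by the grid radius as a stand-in for the Zernike3D basis, which is an assumption of the sketch, as are the function names and the example radii.

```python
import numpy as np
from numpy.polynomial import legendre

def displacement(coeffs, r, radius):
    """Evaluate sum_j coeffs[j] * B_j(r / radius); B_j are Legendre polynomials
    standing in for the Zernike3D basis functions."""
    return legendre.legval(r / radius, coeffs)

def rescale_coefficients(coeffs_down, k):
    """Eq. 12: move coefficients from the downsampled to the original space."""
    return k * np.asarray(coeffs_down)

k = 4.0                               # downsampling factor
radius_d, radius_o = 16.0, 64.0       # radii of the downsampled and original grids
rng = np.random.default_rng(2)
coeffs_d = rng.normal(size=5)
coeffs_o = rescale_coefficients(coeffs_d, k)

r_d = np.linspace(-radius_d, radius_d, 9)
r_o = k * r_d                         # Eq. 10
g_d = displacement(coeffs_d, r_d, radius_d)
g_o = displacement(coeffs_o, r_o, radius_o)
```

Because the basis is evaluated on normalized coordinates, the basis values coincide at corresponding points of the two grids, and the displacement in the original space is simply $k$ times the displacement in the downsampled space.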

Merging embeddings of different nature

Our previous work¹⁰ showed that the Zernike3D basis could effectively study the continuous heterogeneity of Cryo-EM maps and atomic structures converted to electron densities.

Similarly, we have proven in the previous sections the applicability of this same tool to a set of Cryo-EM particles. In all cases, the estimation of the deformation fields represented by the Zernike3D coefficients is comparable, meaning that we are translating the information of the three main Cryo-EM entities (maps, atomic structures, and particles) to a common framework or space defined by the coefficients $\alpha_{l,n,m}$.

Translating maps, structural models, and particles to a common framework opens interesting possibilities and advantages, such as studying and comparing discrete and continuous heterogeneity or addressing how well simulated and experimental data correlate.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The datasets analyzed with the Zernike3D algorithm and ZART are publicly available in EMPIAR under the entries: 10028 [https://doi.org/10.6019/EMPIAR-10028], 10514 [https://doi.org/10.6019/EMPIAR-10514], 10516 [https://doi.org/10.6019/EMPIAR-10516], and 10180 [https://doi.org/10.6019/EMPIAR-10180]. The phantom dataset processed in the Supplementary Material is available in GitHub in the repository Zernike3D_Phantom_Data [https://zenodo.org/badge/latestdoi/541505536]²¹.

Code availability

The Zernike3D algorithm has been implemented in Xmipp²² and it is available through Scipion¹² under the plugins “scipion-em-xmipp” and “scipion-em-flexutils”. The protocols corresponding to the algorithms described in this manuscript are “flexutils - angular align - Zernike3D” and “flexutils - reconstruct ZART”.

References

1. Carroni, M. & Saibil, H. R. Cryo electron microscopy to determine the structure of macromolecular complexes. Methods 95, 78–85 (2016).
2. Serna, M. Hands on methods for high resolution cryo-electron microscopy structures of heterogeneous macromolecular complexes. Front. Mol. Biosci. 6, 33 (2019).
3. Gomez-Blanco, J., Kaur, S., Strauss, M. & Vargas, J. Hierarchical autoclassification of cryo-EM samples and macromolecular energy landscape determination. Comput. Methods Prog. Biomed. 216, 106673 (2022).
4. Jin, Q. et al. Iterative elastic 3D-to-2D alignment method using normal modes for studying structural dynamics of large macromolecular complexes. Structure 22, 496–506 (2014).
5. Zhong, E. D., Bepler, T., Berger, B. & Davis, J. H. CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks. Nat. Methods 18, 176–185 (2021).
6. Ludtke, S. J. & Muyuan, C. Deep learning-based mixed-dimensional Gaussian mixture model for characterizing variability in cryo-EM. Nat. Methods 18, 930–936 (2021).
7. Frank, J. & Abbas, O. Continuous changes in structure mapped by manifold embedding of single-particle data in cryo-EM. Methods 100, 61–67 (2016).
8. Punjani, A. & Fleet, D. J. 3D flexible refinement: structure and motion of flexible proteins from Cryo-EM. bioRxiv, https://www.biorxiv.org/content/10.1101/2021.04.22.440893v1 (2021).
9. Lederman, R. R., Anden, J. & Singer, A. Hyper-molecules: on the representation and recovery of dynamical structures for applications in flexible macro-molecules in cryo-EM. arXiv, https://arxiv.org/abs/1907.01589 (2020).
10. Herreros, D. et al. Approximating deformation fields for the analysis of continuous heterogeneity of biological macromolecules by 3D Zernike polynomials. IUCrJ 8, 992–1005 (2021).
11. Wong, W. et al. Cryo-EM structure of the 80S ribosome bound to the anti-protozoan drug emetine. eLife 3, e03080 (2014).
12. de la Rosa-Trevín, J. M. et al. Scipion: a software framework toward integration, reproducibility and validation in 3D electron microscopy. J. Struct. Biol. 195, 93–99 (2016).
13. McInnes, L., Healy, J., Saul, N. & Großberger, L. UMAP: uniform manifold approximation and projection. J. Open Source Softw. 3, 861 (2018).
14. Pettersen, E. F. et al. UCSF ChimeraX: structure visualization for researchers, educators, and developers. Protein Sci. 30, 70–82 (2021).
15. Punjani, A., Rubinstein, J. L., Fleet, D. J. & Brubaker, M. A. cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination. Nat. Methods 14, 290–296 (2017).
16. Plaschka, C., Lin, P. C. & Nagai, K. Structure of a pre-catalytic spliceosome. Nature 564, 617–621 (2017).
17. Melero, R. et al. Continuous flexibility analysis of SARS-CoV-2 spike prefusion structures. IUCrJ 7, 1059–1069 (2020).
18. Jolliffe, I. & Cadima, J. Principal component analysis: a review and recent developments. Philos. Trans. A Math. Phys. Eng. Sci. 374, 20150202 (2016).
19. Sorzano, C. O. S. et al. On bias, variance, overfitting, gold standard and consensus in single-particle analysis by cryo-electron microscopy. Acta Crystallogr. Sect. D. 78, 410–423 (2022).
20. Sorzano, C. O. S. et al. A survey of the use of iterative reconstruction algorithms in electron microscopy. BioMed. Res. Int. 2017, 1–17 (2017).
21. Herreros, D. Estimating conformational landscapes from Cryo-EM particles by 3D Zernike polynomials. https://doi.org/10.5281/zenodo.7334391 (2022).
22. de la Rosa-Trevín, J. M. et al. Xmipp 3.0: an improved software suite for image processing in electron microscopy. J. Struct. Biol. 184, 321–328 (2013).
23. Heymann, J. B. Guidelines for using Bsoft for high resolution reconstruction and validation of biomolecular structures from electron micrographs. Protein Sci. 27, 159–171 (2018).

Acknowledgements

Funding is acknowledged from the Ministry of Science and Innovation through grants: Grant PID2019-104757RB-I00 funded by MCIN/AEI/10.13039/501100011033/ and “ERDF A way of making Europe”, by the European Union; the ‘Comunidad Autonoma de Madrid’ through grant S2017/BMD-3817; and the European Union (EU) and Horizon 2020 through grants EnLaCES (H2020-MSCA-IF-2020, Proposal: 101024130 to J.M.K.), HighResCells (ERC-2018-SyG, Proposal: 810057) and iNEXT-Discovery (Proposal: 871037). This work has also been supported by the NIH/NIGMS (No. 1R01GM136780-01 to R.R.L.) and AFOSR FA9550-21-1-0317.

Author information

These authors jointly supervised this work: C.O.S. Sorzano, and J.M. Carazo.

Authors and Affiliations

Centro Nacional de Biotecnologia-CSIC, C/Darwin, 3, 28049, Cantoblanco, Madrid, Spain
D. Herreros, J. M. Krieger, A. Jiménez-Moreno, M. Martínez, D. Strelak, C. O. S. Sorzano & J. M. Carazo
The Department of Statistics and Data Science, Yale University, New Haven, CT, USA
R. R. Lederman
Institute of Computer Science, Masaryk University, Botanická 68a, 60200, Brno, Czech Republic
D. Myška & J. Filipovic
Faculty of Informatics, Masaryk University, Botanická 68a, 60200, Brno, Czech Republic
D. Strelak

Contributions

D.H. developed and tested the Zernike3D and ZART algorithms presented in the manuscript. R.R.L. helped with the understanding and demonstration of the mathematical properties of the Zernike3D basis. J.M.K. and M.M. helped with the understanding of the results and the preparation of the data. A.J.M. helped with the implementation of the Zernike3D algorithm in Scipion. D.M., D.S., and J.F. helped with the optimization of the codes. C.O.S.S. and J.M.C. jointly supervised this work.

Corresponding author

Correspondence to D. Herreros.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Yoel Shkolnisky and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Description of Additional Supplementary Files

Supplementary Movie 1

Supplementary Movie 2

Supplementary Movie 3

Supplementary Movie 4

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Herreros, D., Lederman, R.R., Krieger, J.M. et al. Estimating conformational landscapes from Cryo-EM particles by 3D Zernike polynomials. Nat Commun 14, 154 (2023). https://doi.org/10.1038/s41467-023-35791-y

Received: 01 June 2022
Accepted: 29 December 2022
Published: 11 January 2023
DOI: https://doi.org/10.1038/s41467-023-35791-y
