2D hybrid analysis: Approach for building three-dimensional atomic model by electron microscopy image matching

In this study, we develop an approach termed “2D hybrid analysis” for building atomic models by image matching from electron microscopy (EM) images of biological molecules. The key advantage is that it is applicable to flexible molecules, which are difficult to analyze by 3DEM approaches. In the proposed approach, first, many atomic models with different conformations are built by computer simulation. Then, simulated EM images are built from each atomic model. Finally, the simulated images are compared with the experimental EM image. Two kinds of models are used as simulated EM images: the negative stain model and the simple projection model. Although the former is more realistic, the latter is adopted to perform faster computations. The use of the negative stain model enables decomposition of the averaged EM images into multiple projection images, each of which originated from a different conformation or orientation. We apply this approach to the EM images of integrin to obtain the distribution of the conformations, from which the pathway of the conformational change of the protein is deduced.


Results
Comparison of two simulated models of EM images. In the 2D hybrid analysis, two kinds of simulated models of EM images (the simple projection model and the negative stain model) were built from each atomic model to select the best-fitting atomic model. The negative stain model was more realistic; thus, we assumed that the atomic model whose negative stain model was most similar to an EM image was the best-fitting model. However, because building the negative stain model was time consuming, the simple projection model was also used to narrow down the candidates.
To develop a strategy for the selection of the best-fitting model, we compared the two kinds of simulated models. For this purpose, we built both these models from the (nondeformed) X-ray crystal structure in all possible orientations and calculated the scores for the EM images of clasped integrins in Ca 2+ solution 7 , which had conformations similar to the X-ray crystal structure.
As described in Methods, each atomic model was given a variety of orientations and projected onto the xy plane. The orientation of the atomic model was described using three direction vectors e 1 , e 2 , and e 3 , where e 3 determined the projection direction and e 1 and e 2 determined the rotation of the projection in the xy plane. The position of the projection was described by the vector s. A total of 2,562 different directions were used for e 3 , and for each of these directions, the optimum rotation (e 1 and e 2 ) and optimum position (s) in the xy plane, which gave the maximum Sc 1 score, were determined using the simple projection model. Then, negative stain models in the optimum orientations and positions were built and the Sc 2 scores were calculated. Because the stain thickness h was not provided experimentally, multiple negative stain models with different thicknesses were built to obtain the optimum value. Furthermore, two different models were considered in terms of contact with the supporting film: the top contact model and the bottom contact model. The former contacted the supporting film at the top (maximum point along the z-axis), and the latter contacted the supporting film at the bottom (minimum point along the z-axis) (see Fig. 2C,D). In this way, the optimum parameter sets (e 1 , e 2 , s, h, and the top/bottom contact model) and the scores (Sc 1 and Sc 2 ) were determined for each e 3 . Finally, the optimum e 3 was determined in such a way that it gave the negative stain model with the highest Sc 2 score for the atomic model.
We had 20 averaged EM images of integrins in Ca 2+ solution (Supplementary Fig. S1). By visual comparison, we judged that 9 out of these 20 EM images were reproduced well by the negative stain model of the X-ray crystal structure in the optimum position and orientation (Supplementary Fig. S1). One of them is shown in Fig. 2A,B. In Fig. 2E, the Sc 2 score for the EM image is plotted against the stain thickness h with the other parameters (e 1 , e 2 , e 3 , and s) set to the optimum values. A series of negative stain models with different thicknesses and contact models are shown in Supplementary Fig. S2. When the thickness was small, the scores of the top contact models (Fig. 2C) were much higher than those of the bottom contact models (Fig. 2D). The maximum score was observed at h ~ 70 Å in the top contact model. As the thickness increased further, the scores of the two models approached each other. In this example, the top of the atomic model had a relatively large contact area (S/S max = 0.52), whereas the bottom did not (S/S max = 0.08), where the contact area was defined as the area of the minimum convex polygon inside which all the contacting points lay (for example, when the atomic model touched the supporting film at three points, the contact area was the area of the triangle formed by those three points). This was also the case for the remaining eight well-reproduced EM images; that is, the highest scores were observed for the contact models with relatively large contact areas. Supplementary Fig. S3 shows the distribution of the contact areas of the atomic models in the optimum orientations for the 9 well-reproduced EM images in comparison to that of the X-ray crystal structure in 2,562 different directions of e 3 . Clearly, these atomic models had relatively large contact areas. Based on this observation, we assumed that the molecules contacted the supporting film over a relatively large contact area.
This helped to reduce the number of computations by limiting the projection directions e 3 in the subsequent calculations.
The reason why the two different contact models had nearly the same scores at large h, as shown in Fig. 2E, is explained as follows. When h was large enough, the model was completely buried in the simulated negative stain, and it made no difference whether the top or the bottom of the model contacted the supporting film.
In Supplementary Fig. S4, the contour maps of Sc 1 (simple projection models) and Sc 2 (negative stain models) plotted with respect to the projection direction e 3 are compared for two EM images. This figure shows that the global maxima of the two simulated models were not always observed in the same direction. However, even when the two maxima were not in the same direction, the global maximum of Sc 2 was observed near one of the local maxima of Sc 1 . Thus, we could determine the global maximum of Sc 2 by searching the directions around the local and global maxima of Sc 1 . In this way, we could reduce the number of calculations for Sc 2 , which required a much longer time than did those for Sc 1 .
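The two-stage search described in the preceding paragraph can be sketched as follows. This is only an illustration under stated assumptions, not the implementation used in this study: `sc1` is assumed to hold one Sc 1 score per projection direction, `neighbors` is a hypothetical adjacency list of neighboring directions on the sphere, and `sc2_fn` is a hypothetical callable standing in for the costly negative stain scoring.

```python
def local_maxima(sc1, neighbors):
    """Indices of directions whose Sc1 is not exceeded by any neighbor."""
    return [i for i, s in enumerate(sc1)
            if all(s >= sc1[j] for j in neighbors[i])]


def best_direction(sc1, neighbors, sc2_fn):
    """Evaluate the expensive Sc2 only at Sc1 maxima and their neighbors."""
    candidates = set()
    for i in local_maxima(sc1, neighbors):
        candidates.add(i)
        candidates.update(neighbors[i])
    # Sc2 is computed for the candidate directions only (assumed costly).
    scored = {i: sc2_fn(i) for i in sorted(candidates)}
    best = max(scored, key=scored.get)
    return best, scored[best]
```

In this sketch the search radius around each Sc 1 maximum is a single ring of neighbors; a wider neighborhood would trade more Sc 2 evaluations for a lower risk of missing the global Sc 2 maximum.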
Based on the above comparison of the two simulated models, we developed a strategy for the selection of the best-fitting atomic model, as explained below.
1. All the orientations of each atomic model were searched using the simple projection model.
[…] atomic models with different conformations in order to determine the best-fitting atomic model.
Scientific Reports | 7: 377 | DOI:10.1038/s41598-017-00337-y

Selection of best-fitting models for EM images of integrins in Ca 2+ solution.
From the numerous deformed atomic models built by iterative normal mode analysis and small deformations (see Methods), the best-fitting atomic model was selected for each EM image of integrin in Ca 2+ solution. The covariance matrix adaptation evolution strategy (CMA-ES) was used for finding the model with the maximum Sc 2 score 8 . As shown in Table 1, the maximum scores (Sc 2 max ) were generally high, suggesting that the models reproduced the EM images well. As expected from the successful reproduction of the EM images from the X-ray crystal structure, many best-fitting models were not too different from the X-ray crystal structure, with the root-mean-square deviation (RMSD) being less than 10 Å. Corresponding to the small RMSD, the increments of the scores from those of the X-ray crystal structure (Sc 2 0 ) were not very large (the average increments were ~4%). However, there were cases in which the increments were more than 10% (written as bold numerals in Table 1). In such cases, the RMSDs were relatively large and the X-ray crystal structure was often fitted to the EM images in incorrect ways ( Supplementary Fig. S1), indicating that the fitting was sensitive to the conformational changes in the atomic model.
To examine how fitting was dependent on the conformation, we built negative stain models for a range of atomic models r 12 (n 1 , n 2 ), which were built by deforming the X-ray crystal structure along the two lowest-frequency normal modes, and calculated the Sc 2 scores; these scores are shown in Fig. 3 using contour maps.
[Figure 3 caption, partially recovered: … Supplementary Fig. S1) plotted as a function of index numbers n 1 and n 2 for deformed atomic models r 12 (n 1 , n 2 ). The origin (0, 0) corresponds to the X-ray crystal structure. The contour lines are drawn at an interval of 0.01, starting from the maximum scores. The peaks are indicated by crosses.]
For about half of the EM images, we obtained contour maps with a single peak surrounded by crowded contour lines (see Fig. 3A,B), suggesting that the score decreased rapidly as the conformation deviated from the peak. Figure 3A shows the contour map for the EM image reproduced quite well by the X-ray crystal structure, and Fig. 3B shows the contour map for the image that was not reproduced well by the crystal structure. Clearly, the peak was closer to the origin in Fig. 3A than in Fig. 3B, where the origin corresponded to the X-ray crystal structure. Thus, this result showed that it was important to use an appropriate atomic model for achieving good fitting. In other words, this result showed that it was possible to identify a unique atomic model by the proposed 2D hybrid analysis approach.
Multiple peaks seemed to be present in other contour maps (Fig. 3C), indicating that the multiple conformations fitted well. The EM images studied here were averaged images, and in principle, the averaging should have been performed using the raw images of molecules with the same conformation and orientation. However, this is actually a difficult task, as described in Introduction.
This contour map suggested that the raw images of the molecules with relatively large differences in conformations or orientations were averaged. For verification, we combined the negative stain models $\rho_2^{p}(i, j)$ of the peak conformations (see Methods) by using the equation
$$\rho_2^{\mathrm{multi}}(i, j) = \sum_p c_p\, \rho_2^{p}(i, j),$$
where p is the index number for the peak conformation and the coefficients c p (∑ p c p = 1) are determined such that the correlation (Sc multi 2 ) between the EM image I(i, j) and $\rho_2^{\mathrm{multi}}(i, j)$ is maximized. It should be noted that the peak conformations were selected from not only r 12 (n 1 , n 2 ) but also the entire range of conformations. It should also be noted that we used only those peak conformations whose Sc 2 values were relatively large (> Sc … − …). The results are summarized in Table 2. There were several cases in which relatively large increments of the score (∆Sc 2 ) resulting from the combinations of the negative stain models were observed (written as bold numerals in the table). In such cases, many peak conformations were observed, although many of them made only a slight contribution (small c p values) as indicated by the numerals in parentheses. Actually, each averaged EM image was reproduced relatively well by far fewer negative stain models, as shown in Supplementary Fig. S5, where the highest Sc 2 score is plotted when a limited number n c (n c = 1, 2, 3, 4, 5) of the negative stain models of the peak conformations were combined. In Table 2, the smallest number of the negative stain models needed to achieve an Sc 2 score larger than 99% of Sc multi 2 is listed as n c 99 for each EM image. This number correlated well with ∆Sc 2 . Figure 4 demonstrates how the combination of the negative stain models reproduced an EM image well. The combined negative stain model appeared more similar to the EM image than did any single negative stain model of the peak conformations, suggesting that the EM image was indeed the averaged image of the molecules with relatively large differences in conformations or orientations. In Fig. 5A, the combined negative stain models for all EM images of integrins in Ca 2+ solution are shown. These models reproduced the EM images well.
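The combination of negative stain models with weights c p can be illustrated by a small numerical sketch. The text does not state how the coefficients were optimized, so the following is an assumption: the weights are obtained by ordinary least squares on mean-subtracted images (any positive multiple of the least-squares solution maximizes the zero-mean correlation) and are then rescaled so that they sum to 1. Nonnegativity of the weights is not enforced here, and the function names are hypothetical.

```python
import numpy as np


def zncc(a, b):
    """Zero-mean normalized cross-correlation of two images."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))


def combine_models(image, models):
    """Weights c_p (summing to 1) and the resulting correlation.

    Assumption: unconstrained least squares; the sum of the raw
    weights is assumed to be positive so that rescaling is valid.
    """
    b = image.ravel() - image.mean()
    A = np.stack([m.ravel() - m.mean() for m in models], axis=1)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    c = w / w.sum()
    combined = sum(cp * m for cp, m in zip(c, models))
    return c, zncc(image, combined)
```

For a synthetic image built as 0.7 A + 0.3 B from two model images A and B, this sketch recovers weights close to (0.7, 0.3), and the combined score is at least as high as the score of either model alone.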
[Table 2 note: The number of peaks whose coefficients (c p ) were larger than 0.01 is given in parentheses.]
It should be noted that our result was different from the so-called "Einstein-from-noise" 9, 10 , which describes how any image can be reproduced by averaging many noise images. This phenomenon occurs because noise images are uncorrelated to each other. Thus, the more noise images we use, the better the averaged images we get.
On the other hand, the negative stain models of the peak conformations were strongly correlated to each other, because they were similar to the targeted EM image. Furthermore, we needed to combine only a few images at most to reproduce the averaged EM images well, and the further increment of the images made little improvement ( Supplementary Fig. S5).

Fitting to EM images of integrins in Mn 2+ solution.
We performed the same analyses of the EM images of clasped integrins in Mn 2+ solution 7 . As is obvious from the EM images (Fig. 5B), the molecules in this condition tended to have extended conformations, which diverged considerably from the X-ray crystal structure. The Sc 2 max and Sc multi 2 (especially the former) values for the EM images of integrins in Mn 2+ solution were smaller than those in Ca 2+ solution (Supplementary Table S1). This may partly be due to the fact that the conformations were rather different from the X-ray crystal structure and the fact that it was difficult to accurately simulate the conformational changes using the X-ray crystal structure as the initial model. However, larger increments of the score (∆Sc 2 ) resulting from the combinations, a higher number of peak conformations, and higher n c 99 indicated that these EM images had more mixtures; this explains why the values of Sc 2 max (the maximum score of the negative stain model built from a single conformation and orientation) for the EM images of integrins in Mn 2+ solution were noticeably smaller than those in Ca 2+ solution. The EM images with more mixtures suggested that the integrins in Mn 2+ solution had greater variations of conformations than did those in Ca 2+ solution, which we focus on in the following section.
Analysis of obtained conformations. Thus far, we analyzed the EM images of integrins in Ca 2+ and Mn 2+ solutions and obtained the conformations to reproduce the EM images. We obtained 656 peak conformations from the 20 EM images in Ca 2+ solution and 796 peak conformations from the same number of EM images in Mn 2+ solution. However, these peak conformations should not be treated equally. That is, each EM image was reproduced by combining the negative stain models with the weight c p , and each EM image was originally obtained by averaging n r raw images. Thus, we assumed that each peak conformation was observed c p n r times. Under this assumption, we performed the following statistical analyses. The three characteristic angles (see Fig. 6A and Methods) of the conformations were calculated and plotted in Fig. 6B,C. The average angles (mean ± standard deviation) for extending, swing-out, and twisting deformations were, respectively, 18° ± 13°, 54° ± 0.5°, and −1° ± 5° in Ca 2+ solution and 115° ± 40°, 74° ± 12°, and −9° ± 8° in Mn 2+ solution. It should be noted that the small variations of the swing-out angles in Ca 2+ solution were due to the fact that the swing-out deformations were restricted when the extending angles were small, because the swing-out motions were observed to be energetically unfavorable when the conformations were bent (see Methods). We allowed the atomic models to have swing-out deformations only when the extending angles were over approximately 50°. Interestingly, when the extending angles were high (~150°), the variation of the swing-out angles was small with a relatively high mean value (~80°) (Fig. 6C). On the other hand, in the halfway extended conformations with extending angles ranging from 80° to 120°, the variations of the swing-out angles were large. This suggests that the swing-out deformations occurred when the molecules extended halfway.
From this observation, we deduced the pathway of the integrin from the closed to the fully extended conformation, which is illustrated in Fig. 6D.
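The weighting assumption used in this analysis (each peak conformation counted c p n r times) amounts to computing weighted statistics. A minimal sketch is given below; the function names are hypothetical, and the use of the population (not sample) standard deviation is an assumption, because the text does not specify which was used.

```python
import math


def occurrence_weights(coefficients, n_raw):
    """Assumed number of observations, c_p * n_r, for each peak conformation."""
    return [c * n_raw for c in coefficients]


def weighted_mean_std(values, weights):
    """Weighted mean and (population) standard deviation of an angle."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(var)
```

For example, two peak conformations with extending angles of 10° and 20° and coefficients 0.25 and 0.75 in an average of 100 raw images receive weights of 25 and 75, giving a weighted mean of 17.5°.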

Discussion
We have developed an approach to build atomic models that reproduce the EM images of proteins. Our approach can be applied when the X-ray crystal structure or the modeled structure of the protein is available. In our approach, first, many atomic models with different conformations are built by deforming the X-ray crystal structure or the modeled structure by using a computational method. In this study, we used the technique of normal mode analysis of the elastic network model (ENM). However, other computational methods can also be employed. The use of finer simulations such as all-atom molecular dynamics simulations will increase the reliability of the results. Then, each atomic model is projected in a variety of directions to produce the simulated EM images, which are compared with the experimental EM images to select the best-fitting atomic model. Two kinds of models are used as the simulated EM images: the negative stain model and the simple projection model. The former model is more realistic but building it requires a longer computational time. Thus, the latter model is used to narrow down the candidate atomic models in a shorter computational time.
The use of the negative stain model enabled us to analyze the EM images in detail. In particular, our analyses showed that the averaged EM images were reproduced well by combining the negative stain models of atomic models with rather different conformations. This indicates that our proposed approach can detect the mixture of two or more conformations in the averaged EM images. However, this does not mean that it is possible to classify the raw EM images used for making the averaged EM image into different classes by using the negative stain models, because these raw EM images are similar to each other and contain considerable noise. Even if the correlations with the negative stain models are calculated for each raw EM image, it will be difficult to detect the differences because of the large contribution from the noise. Thus, the mixture can be detected only after averaging has improved the signal-to-noise ratio.
As demonstrated in Fig. 6, a major advantage of our approach is that it easily provides information about the distribution of the molecular conformations. For obtaining the same kind of information using 3DEM maps, many maps need to be reconstructed using an enormous number of EM images and considerable computational and human efforts. Furthermore, only snapshots of conformational changes can be visualized by 3DEM maps; that is, the obtained 3D structures are discrete. Thus, we need to infer the sequential motion of the target proteins based on this discrete structural ensemble. Our 2D hybrid analysis will aid in bridging these conformations. The large-scale dynamic motions revealed by the 2D hybrid analysis will provide new insight into the target proteins, which cannot be obtained by any other methods.

Methods
Expression and purification of integrins. Soluble integrin heterodimers were constructed using a previously described strategy 11 . Briefly, expression constructs for the α subunits contained the extracellular portion of the α-chain (residues 1-960 for αV), which was followed by a 30-residue ACID-Cys peptide. Constructs for the β subunits contained the extracellular portion of each β-chain (residues 1-691 for β3), which was followed by a tobacco etch virus (TEV) protease recognition sequence, a 30-residue BASE-Cys peptide, and a hexahistidine tag. When combined, the C-terminal ACID-Cys and BASE-Cys segments form an inter-subunit disulfide-bridged α-helical coiled coil (called a "clasp") that can be released by treatment with TEV protease 7 . Combinations of α and β constructs were co-transfected into CHO lec 3.2.8.1 cells to establish stable cell lines. Recombinant integrins were purified from the culture supernatants by immunoaffinity chromatography using anti-coiled-coil antibody 2H11 12 , which was followed by gel filtration on a Superdex 200 HR column (1.6 × 60 cm, Pharmacia) equilibrated with 20 mM Tris, 150 mM NaCl, pH 7.5 (TBS) containing 1 mM CaCl 2 , 1 mM MgCl 2 . The peak fraction was concentrated to ~1 mg/ml and stored at −80 °C until used.
EM and image processing. Approximately 10 μg of each purified integrin was subjected to an additional gel filtration process on a Superdex 200 HR column equilibrated with 50 mM Tris, 150 mM NaCl, pH 7.5, containing 5 mM CaCl 2 or 1 mM MnCl 2 . The samples after the gel filtration were immediately adsorbed to glow-discharged carbon-coated copper grids. Samples were negatively stained with 2.5% (w/v) uranyl acetate and examined under an electron microscope (H9500SD, Hitachi, Japan) operated at 200 kV and a nominal magnification of ×80,000. Images were recorded on a 2,048 × 2,048 CCD camera (TVIPS, Gauting, Germany). Single-particle EM analysis, including particle selection and 2D classification and averaging, was performed using the EMAN suite 13 and IMAGIC program 14 . Particles were selected from individual frames (with an effective pixel size of 0.21 nm) by using the Boxer program in the EMAN suite. The particle images were rotationally and translationally aligned by a multireference alignment procedure and subjected to multivariate statistical analysis by specifying 20 classes using the IMAGIC program.

Construction of ENM.
We used the ENM 15-17 to obtain the dynamic structural information of the molecule, i.e., the normal modes. The ENM is composed of points with masses and springs that connect neighboring points. In this study, each amino acid residue was represented by a single point, which was located at the position of the Cα atom and whose mass was the same as the total mass of the residue. The initial conformation of the ENM was built from the X-ray crystal structure of the integrin αVβ3 (PDB ID: 3IJE). The transmembrane region was removed from the model. The representative points of two amino acid residues were connected by a spring with the same spring constant when one of the following two conditions was satisfied 16 . (1) The minimum interatomic distance between the two amino acid residues was smaller than the threshold value d c , which was set to 3.3 Å. (2) The two amino acid residues were on the same chain, and their separation along the sequence was smaller than or equal to 3; that is, if the residue number of one of the amino acid residues was m, that of the other was m ± 1, m ± 2, or m ± 3.
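The two spring conditions above can be sketched in code. This is an illustrative sketch only: the input layout (one coordinate array per residue plus a chain label per residue) and the function name are assumptions, and residues are assumed to be listed in sequence order so that the index difference equals the difference in residue number.

```python
import numpy as np


def enm_springs(residue_atoms, chain_ids, d_c=3.3, max_sep=3):
    """Pairs of residues connected by a spring.

    residue_atoms: list of (n_atoms, 3) arrays, one per residue (assumed layout).
    chain_ids: chain label of each residue, in the same order.
    """
    springs = []
    n = len(residue_atoms)
    for a in range(n):
        for b in range(a + 1, n):
            # Condition (2): same chain and sequence separation <= max_sep.
            near_in_sequence = chain_ids[a] == chain_ids[b] and (b - a) <= max_sep
            # Condition (1): minimum interatomic distance below d_c.
            diff = residue_atoms[a][:, None, :] - residue_atoms[b][None, :, :]
            close_in_space = np.sqrt((diff ** 2).sum(axis=-1)).min() < d_c
            if near_in_sequence or close_in_space:
                springs.append((a, b))
    return springs
```

The double loop is quadratic in the number of residues, which is acceptable for a sketch; a spatial grid or k-d tree would be the usual choice for large complexes.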

Deformation of ENM.
We built many different atomic models by deforming the X-ray crystal structure along the lowest-frequency normal modes. The atomic model r k , which is the 3N-dimensional vector describing the positions of the N representative points, deformed along the k th lowest-frequency normal mode of the X-ray crystal structure r 0 is described as
$$\mathbf{r}_k = \mathbf{r}_0 + a_k \mathbf{u}_k, \qquad (1)$$
where u k is the k th lowest-frequency normal mode vector of the X-ray crystal structure and a k is the magnitude of the deformation. In the case of integrin αVβ3, large-scale conformational changes were previously observed by EM 7 . These were the extending deformations and the swing-out deformations of the hybrid domain. In the X-ray crystal structure (i.e., the initial ENM), the extending motion was observed in the 1st lowest-frequency normal mode, and the swing-out motion was observed in the 4th lowest-frequency normal mode. To enhance the deformations, the springs that could have been restraining these motions were removed from the ENM. They corresponded to the nonbonded interactions between specific domains 18 . In addition, the 2nd and 3rd lowest-frequency normal modes, which involved the twisting motions, were used for deformation, because in some of the EM images analyzed here, two legs of the integrin were crossing, which could have been the result of the twisting motions.
For building models with large deformation, it is inappropriate to use Equation (1), because linear movements of atoms often destroy the structure when a k is large. Instead, we applied the normal mode analysis and the small deformation in an iterative manner 3,18,19 to the X-ray crystal structure, i.e.,
$$\mathbf{r}_k(n) = \mathbf{r}_k(n-1) + a_k^{\,n-1}\,\mathbf{u}_k^{\,n-1},$$
where r k (n) is the atomic model deformed iteratively n times along the k th lowest-frequency normal mode, with r k (0) = r 0 , and $\mathbf{u}_k^{\,n}$ is the normal mode vector for r k (n) ($|\mathbf{u}_k^{\,n}| = 1$). In each iteration, the model was deformed so that the RMSD of r k (n) from r k (n − 1) was 1 Å, i.e., $a_k^{0} = a_k^{1} = \dots = a_k^{\,n-1} = \sqrt{N}$. Using this iterative approach, we constructed a deformed atomic model library as follows. First, the X-ray crystal structure was deformed iteratively along the 1st lowest-frequency normal mode, which showed the extending motion, and a series of deformed atomic models r 1 (n 1 ) (n 1 = 0, ±1, ±2, ±3, …) were built. Next, each atomic model r 1 (n 1 ) was deformed iteratively along the 2nd and 3rd lowest-frequency normal modes, and series of atomic models r 12 (n 1 , n 2 ) and r 13 (n 1 , n 3 ) (n 2 , n 3 = 0, ±1, ±2, ±3, …) were built, where r 12 (n 1 , 0) = r 13 (n 1 , 0) = r 1 (n 1 ). It should be noted that in the entire range of n 1 , $\mathbf{u}_k^{\,n_1} \cdot \mathbf{u}_k^{\,n_1+1} \approx 1$ was satisfied for both k = 2 and k = 3; that is, abrupt changes in these normal modes were not observed.
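The iterative deformation can be sketched as a short loop. In this sketch, `mode_fn` is a hypothetical callable that returns the chosen normal mode vector for the current model (in practice it would rebuild the ENM and diagonalize its Hessian); the sign alignment with the previous mode is an added assumption, needed because the sign of an eigenvector is arbitrary.

```python
import numpy as np


def deform_iteratively(r0, mode_fn, n_steps, step_rmsd=1.0):
    """Deform a 3N-vector r0 in steps of fixed RMSD along a recomputed mode."""
    n_points = r0.size // 3
    r = np.asarray(r0, dtype=float).copy()
    models = [r.copy()]
    previous = None
    for _ in range(n_steps):
        u = np.asarray(mode_fn(r), dtype=float)
        u = u / np.linalg.norm(u)            # unit normal mode vector
        if previous is not None and np.dot(u, previous) < 0:
            u = -u                           # keep a consistent direction
        # A step of sqrt(N) along a unit vector gives an RMSD of 1 (in Å).
        r = r + step_rmsd * np.sqrt(n_points) * u
        models.append(r.copy())
        previous = u
    return models
```

The dot product between consecutive mode vectors computed inside the loop is the same quantity that the text monitors to confirm that the modes do not change abruptly.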
Then, we built atomic models r 123 (n 1 , n 2 , n 3 ) by combining the deformations along the three lowest-frequency normal modes as
$$\boldsymbol{\phi}_{123}(n_1, n_2, n_3) = \boldsymbol{\phi}_{1}(n_1) + \bigl(\boldsymbol{\phi}_{12}(n_1, n_2) - \boldsymbol{\phi}_{1}(n_1)\bigr) + \bigl(\boldsymbol{\phi}_{13}(n_1, n_3) - \boldsymbol{\phi}_{1}(n_1)\bigr),$$
where ϕ 1 (n 1 ), ϕ 12 (n 1 , n 2 ), ϕ 13 (n 1 , n 3 ), and ϕ 123 (n 1 , n 2 , n 3 ) are the atomic models r 1 (n 1 ), r 12 (n 1 , n 2 ), r 13 (n 1 , n 3 ), and r 123 (n 1 , n 2 , n 3 ) described using internal coordinates (bond lengths, bond angles, and dihedral angles), respectively. Internal coordinates were used in order to prevent destruction of the structures of the atomic model. Finally, r 123 (n 1 , n 2 , n 3 ) was deformed iteratively along the normal mode with swing-out motion of the hybrid domain, and a series of atomic models r 1234 (n 1 , n 2 , n 3 , n 4 ) (n 4 = 1, 2, 3, …) were built, where r 1234 (n 1 , n 2 , n 3 , 0) = r 123 (n 1 , n 2 , n 3 ). It should be noted that the swing-out motion was not always observed in the 4th lowest-frequency normal mode. In each iteration, we identified the swing-out mode by observing the movement of the hybrid domain. It should also be noted that the normal mode frequencies of the swing-out modes were high when the conformations were close to the X-ray crystal structure (Supplementary Fig. S6), indicating that the motions were energetically unfavorable. Thus, the swing-out deformations were applied only when the conformations were somewhat extended (n 1 > 25). In this way, a deformed atomic model library containing more than 150,000 deformed atomic models of integrins was constructed.
Fitting of atomic models to EM images using simple projection models. From the numerous deformed atomic models, we selected the model that best reproduced the EM image. For this selection, we built simulated models of EM images from each atomic model. In this study, we built two kinds of models: the simple projection model and the negative stain model. Although the latter is more realistic, building it requires a much longer computational time. Thus, the former model was used to narrow down the number of candidates, and then, the latter model was used for the final selection. Below, we first describe the former model.
An experimental EM image is described as I(i, j), which is the intensity at pixel (i, j) ($i = 1, 2, 3, \dots, i_{\max}$; $j = 1, 2, 3, \dots, j_{\max}$). The simple projection model is similarly described as ρ 1 (i, j) and computed in the following way. We first replaced each representative point of the atomic model with a uniform-density sphere with a radius of 3 Å to build a sphere model. The grid points within the spheres were projected onto the xy plane. ρ 1 (i, j) was defined as the number of points projected into a pixel (i, j), which was described by p(i − 1) ≤ x < pi and p(j − 1) ≤ y < pj; here, p is the pixel size determined experimentally.
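The construction of the simple projection model can be sketched as follows. Several details are assumptions made for illustration: the grid spacing (1 Å) is not given in the text, sphere centers are snapped to the nearest grid point, grid points shared by overlapping spheres are counted once, and pixel indices are zero-based (index floor(x/p) corresponds to pixel i in the text with i shifted by one).

```python
import numpy as np


def simple_projection(centers, pixel, shape, radius=3.0, grid=1.0):
    """Count grid points inside the sphere model that fall into each pixel."""
    k = int(np.ceil(radius / grid))
    ax = np.arange(-k, k + 1)
    # Integer grid offsets lying inside one sphere of the given radius.
    off = np.stack(np.meshgrid(ax, ax, ax, indexing="ij"), axis=-1).reshape(-1, 3)
    off = off[(off ** 2).sum(axis=1) * grid ** 2 <= radius ** 2]
    base = np.rint(np.asarray(centers, dtype=float) / grid).astype(int)
    # Union of all spheres; duplicates from overlaps are removed.
    pts = np.unique((base[:, None, :] + off[None, :, :]).reshape(-1, 3), axis=0)
    ix = np.floor(pts[:, 0] * grid / pixel).astype(int)
    iy = np.floor(pts[:, 1] * grid / pixel).astype(int)
    inside = (ix >= 0) & (ix < shape[0]) & (iy >= 0) & (iy < shape[1])
    image = np.zeros(shape)
    np.add.at(image, (ix[inside], iy[inside]), 1.0)  # projection along z
    return image
```

With these settings a single isolated sphere contributes 123 grid points, which gives a quick sanity check on the total of the projected image.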
For comparison between I(i, j) and ρ 1 (i, j), we first replaced I(i, j) with I 1 (i, j) (=I(i, j) − 〈I(i, j)〉), where 〈…〉 denotes the average, to remove the background intensity. If I 1 (i, j) was negative, we set it as zero. Then, to quantify the similarity between I 1 (i, j) and ρ 1 (i, j), we defined the score by using the normalized cross-correlation (NCC) as
$$Sc_1 = \frac{\sum_{i,j} I_1(i, j)\,\rho_1(i, j)}{\sqrt{\sum_{i,j} I_1(i, j)^2 \,\sum_{i,j} \rho_1(i, j)^2}}.$$
Maximizing this score is equivalent to minimizing the difference between the two images, ∑ (I 1 (i, j) − cρ 1 (i, j)) 2 , where c is a constant.
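The score for the simple projection model can be computed in a few lines. The sketch below follows the description in the text (mean subtraction, clipping of negative values, and NCC without further mean subtraction); the function name and the handling of an all-zero image are assumptions.

```python
import numpy as np


def sc1(image, projection):
    """NCC between the background-removed EM image and a simple projection."""
    # Remove the background: subtract the mean and set negative values to zero.
    i1 = np.clip(image - image.mean(), 0.0, None)
    den = np.sqrt((i1 ** 2).sum() * (projection ** 2).sum())
    if den == 0:
        return 0.0  # assumed convention for empty images
    return float((i1 * projection).sum() / den)
```

Because the score is normalized, multiplying the projection by a positive constant leaves it unchanged, which is why the constant c in the least-squares form does not need to be determined explicitly.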
To maximize the score, we applied rotational and translational manipulations to each representative point r a (a = 1, 2, 3, …, N) of the atomic model as follows:
$$\mathbf{r}_a' = R\,\mathbf{r}_a + \mathbf{s},$$
where R is the rotation matrix and s is the translational vector. We assumed $\mathbf{s} = (p k_x,\ p k_y,\ 0)$ with $k_x, k_y = 0, \pm 1, \pm 2, \pm 3, \dots$
To sample the entire range of orientations of the atomic model as evenly as possible, we prepared more than 230,000 rotation matrices in advance as follows. The rotation matrix R is described as (e 1 , e 2 , e 3 ), where e 1 , e 2 , and e 3 are unit column vectors and they satisfy e 1 × e 2 = e 3 . We first selected 2,562 different directions for e 3 . These directions were obtained as position vectors of the vertices of the icosahedron-based geodesic sphere 20 , whose center was at the origin. The angle between neighboring vectors was about 4°. Then, vectors e 1 orthogonal to each e 3 were computed at 4° intervals. Finally, e 2 was obtained as e 3 × e 1 .
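The construction of the rotation matrices for one projection direction can be sketched as follows. The choice of the starting vector for e 1 is an assumption (any vector orthogonal to e 3 would serve), and the function name is hypothetical. With a 4° interval, each direction yields 90 matrices, so 2,562 directions give 230,580 matrices, consistent with the count stated in the text.

```python
import numpy as np


def rotations_for_direction(e3, step_deg=4.0):
    """Rotation matrices (e1, e2, e3) for in-plane rotations about e3."""
    e3 = np.asarray(e3, dtype=float)
    e3 = e3 / np.linalg.norm(e3)
    # Seed vector that is not parallel to e3 (assumed, arbitrary choice).
    seed = np.array([1.0, 0.0, 0.0]) if abs(e3[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    a = np.cross(e3, seed)
    a = a / np.linalg.norm(a)
    b = np.cross(e3, a)                      # a and b span the plane normal to e3
    matrices = []
    for angle in np.deg2rad(np.arange(0.0, 360.0, step_deg)):
        e1 = np.cos(angle) * a + np.sin(angle) * b
        e2 = np.cross(e3, e1)                # guarantees e1 x e2 = e3
        matrices.append(np.column_stack([e1, e2, e3]))
    return matrices
```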
Contact area. The EM images analyzed in this study were obtained by the negative staining method, and the molecules were supposed to contact the supporting film stably. To measure how stably they contacted the film, we defined the contact area as follows. We assumed that the supporting film was on the xy plane and that the top (representative point with the maximum z-coordinate) or bottom (that with the minimum z-coordinate) of the atomic model was on the film. We regarded representative points within 10 Å from the xy plane as contacting the plane. We defined the contact area S as the area of the minimum convex polygon that included all the contacting points projected onto the xy plane. The contact area S was dependent on the orientation, and the largest one was defined as S max for each atomic model. The ratio S/S max was used as the measure in this study.
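The contact area defined above is the area of a two-dimensional convex hull. A minimal sketch is given below, using Andrew's monotone chain algorithm for the hull and the shoelace formula for the area; these algorithmic choices and the function names are ours and are not stated in the text. Points exactly 10 Å from the plane are counted as contacting, which is an assumption.

```python
import numpy as np


def convex_hull_area(points_xy):
    """Area of the convex hull of 2D points (monotone chain + shoelace)."""
    pts = sorted(set(map(tuple, np.round(points_xy, 6).tolist())))
    if len(pts) < 3:
        return 0.0

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(sequence):
        hull = []
        for p in sequence:
            while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
                hull.pop()
            hull.append(p)
        return hull[:-1]

    hull = half_hull(pts) + half_hull(reversed(pts))
    shifted = hull[1:] + hull[:1]
    return 0.5 * abs(sum(x1 * y2 - x2 * y1
                         for (x1, y1), (x2, y2) in zip(hull, shifted)))


def contact_area(coords, top=True, cutoff=10.0):
    """Contact area S for the top or bottom contact model."""
    z = coords[:, 2]
    plane = z.max() if top else z.min()      # film touches the extreme point
    touching = coords[np.abs(z - plane) <= cutoff]
    return convex_hull_area(touching[:, :2])
```

The ratio S/S max then follows by evaluating `contact_area` over all sampled orientations of a model and dividing by the largest value found.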
Negative stain model of EM image. In some cases, atomic models with quite different conformations and orientations gave similar simple projection models (Fig. 1). To differentiate between these atomic models, we built a more realistic projection model, i.e., the negative stain model, which was originally proposed by Burgess et al. 5 . We adopted their approach as follows. First, low-pass filtering (with cut-off frequency ν 1 ) and thresholding were applied to the volume occupied by the sphere model of the atomic model in order to build an excluded volume model. Then, the volume within h Å from the support film was added to this excluded volume. It should be noted here that the atomic model contacted the support film. Again, low-pass filtering (with cut-off frequency ν 2 ) and thresholding were applied to this volume to obtain a new volume, from which the excluded volume of the atomic model was removed to acquire the volume of the simulated negative stain. The cut-off frequencies ν 1 and ν 2 were constants and were optimized so that the EM images of integrins in Ca 2+ were reproduced well on average by the X-ray crystal structure. On the other hand, values of the thickness h were optimized for each atomic model.
The grid points within the negative stain volume were projected onto the xy plane, and the number of points projected into a pixel (i, j) was counted as ρ N (i, j). We assumed that the intensity of the incident electron beam decayed exponentially with an increase in the thickness of the negative stain. Thus, the negative stain model ρ 2 (i, j) was described as exp(−c d ρ N (i, j)), where c d is a coefficient (>0). Because c d ρ N ≪ 1 was expected, ρ 2 (i, j) was approximately equal to 1 − c d ρ N .
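To quantify the similarity between I(i, j) and ρ 2 (i, j), we defined the score by using zero-mean NCC (ZNCC) as
$$Sc_2 = \frac{\sum_{i,j} \bigl(I(i, j) - \langle I \rangle\bigr)\bigl(\rho_2(i, j) - \langle \rho_2 \rangle\bigr)}{\sqrt{\sum_{i,j} \bigl(I(i, j) - \langle I \rangle\bigr)^2 \,\sum_{i,j} \bigl(\rho_2(i, j) - \langle \rho_2 \rangle\bigr)^2}}.$$
Because ZNCC remains unaffected by the addition of a constant and multiplication with a positive constant, ρ 2 (i, j) in the above equation could be replaced with −ρ N (i, j).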
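The invariance of ZNCC that justifies replacing ρ 2 with −ρ N can be checked numerically. In the sketch below, the function names and the example value of the coefficient c d are assumptions chosen only so that c d ρ N ≪ 1 holds.

```python
import numpy as np


def zncc(a, b):
    """Zero-mean normalized cross-correlation of two images."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum()))


def negative_stain_intensity(rho_n, c_d):
    """Transmitted intensity, decaying exponentially with stain thickness."""
    return np.exp(-c_d * rho_n)


def sc2(image, rho_n):
    """Score against the negative stain model, using -rho_N in place of rho_2."""
    return zncc(image, -rho_n)
```

For a linearized model 1 − c d ρ N the two scores agree up to rounding, and for the exponential model they agree to the extent that c d ρ N is small.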
Definition of peak conformations. We defined the peak conformation as the one that had the highest Sc 2 score among the "nearby" conformations. Here, we defined "nearby" conformations as those conformations whose distance from the specific conformation was less than a specific value, which was set as 10 Å in this study. We defined the distance between the i th and j th atomic models as