Resolving Individual-Atom of Protein Complex using Commonly Available 300-kV Cryo-electron Microscopes

Breakthroughs in single-particle cryo-electron microscopy (cryo-EM) technology have made near-atomic resolution structure determination possible. Here, we report a ∼1.35-Å structure of apoferritin reconstructed from images recorded on a Gatan K3 or a Thermo Fisher Falcon 4 detector in a commonly available 300-kV Titan Krios microscope (G3i) equipped with or without a Gatan post-column energy filter. Our results demonstrate that the atomic-resolution structure determination can be achieved by single-particle cryo-EM with a fraction of a day of automated data collection. These structures resolve unambiguously each heavy atom (C, N, O, and S) in the amino acid side chains with an indication of hydrogen atoms’ presence and position, as well as the unambiguous existence of multiple rotameric configurations for some residues. We also develop a statistical and chemical based protocol to assess the positions of the water molecules directly from the cryo-EM map. In addition, we have introduced a B’ factor equivalent to the conventional B factor traditionally used by crystallography to annotate the atomic resolution model for determined structures. Our findings will be of immense interest among protein and medicinal scientists engaging in both basic and translational research.


Introduction
X-ray crystallography and single particle cryo-electron microscopy (cryo-EM) have become widespread, routine tools for characterizing biochemically purified specimens. cryo-EM has resolved over four thousand structures at near atomic resolutions (2-4 Å 1 ). It is rapidly becoming the method of choice for structure determination of membrane proteins, large assemblies, and multi-protein complexes partly because it does not require a crystal and partly because it can work with specimens with heterogeneous composition and/or conformation 2 , This powerful technique is now capable of resolving biological specimens to better than 2 Å resolution 3 and has been used to solve 3.7-Å resolution structures of specimens as small as ~40 kilodaltons 4 . Cryo-EM instrumentation is being installed in many academic and industrial institutions worldwide. In order to make this imaging modality to generate detailed atomic level structure in the classic chemistry context or the investigation space for drug-design development, one really needs to reach a structure resolution at least 1.5 Å or better.
In our study, we achieved cryo-EM maps of apoferritin, reconstructed from the images collected from the commonly available 300-kV Titan Krios microscopes at 1.34 Å using K3 detector and at 1.36 Å using Falcon4 detector, showing the detailed structural information including the wellresolved amino acid side chain atoms with an indication of hydrogen atoms' presence and position, and the existence of multiple rotameric configurations for some residues.
Since the resolution used in cryo-EM is neither a conventional optics criterion nor the detection of diffraction spots in a diffraction pattern of a crystal. Evaluation of the high resolution structure details in the cryo-EM maps and their quantitative assessments are necessary and important for the field. We recently developed a parameter called Q-score 3 to assess the high-resolution cryo-EM maps up to 1.5 Å. In this study, we further updated the empirical formulation of Q-score for its usability for structure data up to 1 Å. We also develop a protocol to assess the position of the water molecules solely from the cryo-EM map as well as to annotate the atomic model to be reflective of the actually observed density which could be resolved differently throughout the map.

Results and Discussion
Two years ago, we obtained a 1.75-Å apoferritin structure 3 using ~70,000 particle images through 10-h of data collection using a Gatan K2 detector in a Titan Krios G3i electron microscope.
Recently the K3 and Falcon4 detectors were installed on two of our Titan Krios G3i electron microscopes at the Stanford-SLAC Cryo-EM Center, providing us with three times the data collection rate. In this study, using the same apoferritin, we first collected a new dataset using the K3 detector to find out what is the highest resolution structure achievable. The dataset was collected in the electron counting mode using "Faster Acquisition'' in EPU, with a throughput of ~520 movie stacks per hour and a data acquisition result of 8,000 movie stacks per 16 h (Table   S1). Using a standard image processing pipeline 5 (Fig. S1a), a density map of apoferritin with a resolution of 1.34-Å was obtained from ~900,000 particles ( Fig. 1a-d). In addition, we also collected a second dataset using a Falcon4 detector in the electron counting mode (mrc format, not EER format) with the throughput of ~500 movie stacks per hour, resulting in ~ 7,700 movie stacks from a 15-h data collection. A 1.36-Å resolution map of apoferritin was obtained from ~ 500,000 particles ( Fig. S1b-e, Table S1). Both maps have comparable resolution based on the Fourier Shell correlation of 0.143 threshold 6 and similar cumulative B-factor as estimated from reconstructions with varying numbers of particles 7 (Fig. S1f-g).
Our maps show the individual atom density at properly chosen contour level, all well-resolved side-chain atom densities, and even some indications of hydrogen densities as exemplified from the 1.34-Å map (Fig. 1e). The presence and direction of the density for hydrogen, even though it is not distinctly resolved, is visually compelling. Furthermore, the difference between the oxygen and the nitrogen in Asn and Gln, where the N has extended density for its H atoms, whereas the O is rounded, allows one to distinguish these two atoms, which informs whether the atom is a hydrogen bond donor or acceptor. The same is the case for the O vs C in Thr, wherein the terminal methyl group is roughly triangular in shape, and the O again is rounder, and allows one to unambiguously differentiate Thr from Val. However, the side chains of some residues cannot be clearly identified due to multiple rotameric conformations (Fig. 1f), which may be caused by their inherent or electron radiation-induced dynamic properties. 4 The Molprobity and PDB reports of our two models are ranked very highly in all the assessment scores on the adherence of models to the chemical properties of proteins (Fig. S2). Generally speaking, it is difficult to assess the fit of a model to a density map by visual display, partly because of the choice of contour and partly because of the variation of resolvability throughout the map ( Fig. 1c-d, Fig. S1e). Though the overall resolutions of maps are reported based on the Fourier shell correlation (Fig. S1f), some side chains and residues are less resolved. We used Q-scores, a recently proposed cryo-EM structure validation method, to measure the resolvability of individual atoms 3 . Q-scores were shown to correlate strongly to the resolution of the map with the best score normalized to 1.0. Fig. 2a shows a per-residue Q-score plot for our two maps to range between 0.85 and 0.88; most residues have average Q-scores at or above the expected level for this resolution 3 . However, a few dips in the plot can be seen, indicating lower average Q-scores at turns or loops between helices. Visual inspection of the residues with lower Q scores confirms that the side chains are less resolved for these residues. Such residues tend to have alternate conformers ( Fig. 1f) and be on the exterior surface of the complex.
Traditionally, B-factor is used to assess the atom position uncertainty in crystallography and is a weighting factor to allow computing a model-based map identical to the experimental map. Note that the crystallographic B-factor is estimated by iterative and simultaneous refinements among diffraction spot indexing, Fourier amplitudes between observed and computed values, and Fourier phase estimation. However, there is currently no routine procedure to derive an equivalent peratom B-factors in cryo-EM modeling. We thus introduce B' factors derived from per-atom Qscores; the calculation involves a single scaling factor; the optimum scaling factor is determined empirically by testing which value makes the resulting model-map matched the cryo-EM map better by FSC (Fig. 2). The B' factor, which will be deposited to the PDB, serves the same purpose as the crystallographic B-factor in such a way that we can compute a model-based map that can match well with the experimental cryo-EM density map ( Fig. 1g-h, Fig. 2

, and Methods).
Resolving water molecules is an important index for assessing the quality of a true atomic resolution map. We assigned water molecules in our maps using a new procedure based on three criteria: a signal to noise threshold to ignore background noise (2-sigma/RMSD above average), the distance between putative water to the closest protein atoms, and the rules that distinguish 5 water and ions as outlined in ref 8 (Fig. 1i-k; Fig. 3a-d). The distributions show that the procedure places water on well-resolved peaks with Q-scores of 0.5 and higher, even though Q-scores were not used in the procedure itself. The radial-distance plot shown in Fig. 1j shows a peak at 2.8 Å between water atoms to nearby O atoms in protein, as expected. In comparison with other apoferritin maps determined at better than 2 Å, more water molecules were found in higherresolution maps: 217 waters in the 1.25 Å map, 170 waters in our 1.34 Å, 165 waters in our 1.36 Å map, and 126 waters in our previous 1.75 Å map (per protomer) ( Fig. 3e-f). Sharper peaks were observed at higher resolution, meaning that higher resolution maps localize water molecules more accurately. Molprobity results showed only a few of the waters placed with our procedure were found to clash (3 in the 1.34 Å map and 8 in the 1.36 Å map) (Fig. S2).
We compared water molecules placed in different cryo-EM maps ( in one map were also found within 1.0 Å in the other maps ( Fig. 3e-f). Based on this comparison, it is encouraging to note that a high percentage of water molecules placed in cryo-EM maps agree, suggesting good reproducibility of water positions across data sets recorded in different electron microscopes and sample preparations. When comparing cryo-EM maps to the X-ray structure, a lower percentage (36%) of water molecules were within the same 1 Å distance. Given the differences in sample preparation (like concentration and buffer) or conditions (crystal packing of X-ray model vs in-solution state in cryo-EM), it is not surprising that there are differences in water positions. Such discrepancy also applies to ion placements. In placed ions, positions compared amongst cryo-EM maps showed lower similarity (34%). This lower similarity could be due to higher sensitivity of ion locations due to different sample conditions. Water and ion identification is still under active research even in atomic resolution X-ray structure 8 , and it is an emerging but potentially important area in cryo-EM map analysis.
Recently, two unpublished preprints report 1.22 and 1.25-Å-resolution apoferritin structures 9,10 using new electron optics (i.e. cold field emission gun, second-generation spherical aberration lens corrector/monochromator), which are aimed to minimize the deleterious effects of the energy spread to reduce the high-resolution signals 11 . In our study, we show two ~1.35-Å structures of the 6 apoferritin without these new hardware upgrades. We further selected four representative residue types for a detailed comparison at the individual atom level among these cryo-EM maps, a 1.01-Å crystal structure and a 1.75 Å cryo-EM map (Fig. 4). As expected, the two cryo-EM maps better than 1.3 Å have slightly higher Q-scores and atomic separation was slightly better. From the overall and detailed evaluation, our 1.34 and 1.36 Å cryo-EM maps show virtually the same features as the 1.22 Å and 1.25 Å cryo-EM maps and the 1.01 Å crystal structure, and all are better than the 1.75 Å map. Our results demonstrate that an atomic resolution structure can be obtained on a commonly available 300 kV microscope using the K3 or Falcon4 detectors with or without an energy filter (Table S1), which thus gives the current majority of users the possibility of performing single-particle cryo-EM analysis at the atomic resolution level.
Finally, we would like to remind that apoferritin, with its high stability, rigidity, and symmetry, can easily be resolved at atomic resolution by either cryo-EM or X-ray crystallography. However, many biological samples are compositionally or conformationally heterogeneous, as well as often difficult to prepare them vitrified with the current cryo-freeze plunge method 12 . These technical hurdles can hinder solving their structures at high resolution. Achieving atomic resolution structures is not yet a routine task, but with the further development of cryo-specimen preparation, hardware, and software, it should be possible to apply this approach to an even broader spectrum of macromolecules in the context of chemistry to understand the mechanism and/or to apply it in the drug design pipeline.

Cryo-EM sample vitrification and data acquisition
Three microliters of the apoferritin samples with 1.5 mg/ml or 0.2 mg/ml were applied onto glow-

Single-particle image processing and 3D reconstruction
All micrographs were first imported into Relion for image processing. The motion-correction was performed using RELION's own implementation and the contrast transfer function (CTF) was determined using CTFFIND4 13 . For the K3 dataset, 6,951 micrographs were selected with a defocus range from -0.35 --1.3 μm, and "rlnCtfMaxResolution < 4.5". For the Falcon4 dataset, 5,427 micrographs were selected with a defocus range from -0.3 to -1.3 μm, and "rlnCtfMaxResolution < 4". All particles were autopicked using the NeuralNet option in EMAN2 14 . Then, particle coordinates were imported to Relion, where the poor 2D class averages were removed by two rounds of 2D classification. The initial models for both datasets were built in cryoSPARC 15 using the ab-initio reconstruction option with Octahedral symmetry applied. For the K3 dataset, 1,176,336 particles were picked and 902,455 were selected after 2D classification.
For the Falcon4 dataset, 707,350 particles were picked and 500,643 were selected after 2D classification. The 3D refinement was performed using the particle images selected from 2D classification with further "CTF refinement and Bayesian polishing" in Relion, a 1.34-Å map from the K3 dataset or 1.36-Å map from the Falcon4 dataset was obtained (Fig S1). Resolution for the final maps was estimated with the 0.143 criterion of the Fourier shell correlation curve. The figures were prepared using UCSF Chimera 16 . Previously, the width (sigma) of the reference Gaussian was set to 0.6 Å, which resulted in Qscores of ~1 at resolutions of 1.5 Å. With this sigma, Q-scores start to drop at higher resolutions.
Hence, here we adjust sigma to 0.4 Å, so that Q-scores are now highest at a resolution of ~1.2 Å, and a linear correlation can again be seen between Q-scores and resolution (Fig. S3).

B' calculations
We calculate B' factors from atom Q-score using the following formula: This formulation is such that atoms with higher Q-scores produce lower B'-factors, as they are better resolved. We determined the best scaling factor, f, by trying several values (0, 50, 100, 200, 300, 400), and observing which value caused the largest increase in the FSC.

Segmentation-guided Water and Ion Modeling (SWIM)
In this procedure, the cryo-EM map is first segmented using the watershed method, which produces regions corresponding to peaks in the map; the boundaries between regions are the lowest values in the map around these peaks. A threshold of 2-sigma above the mean density value in the map is used, and only voxels with density values above this threshold are included in the segmentation. The resulting regions are then sorted by volume (number of voxels in the region), and considered in decreasing order as follows: 1. For each region, take the point in it with the highest map value as its position (P).

For each nearby atom to P:
a. If the atom is non-polar and non-charged (e.g. carbon atom) and is within 2.6 Å of P, P is ignored and the search continues with the next regions.
b. If the atom is charged (e.g. O in GLU/ASP or N in LYS/ARG/HIS sidechains) and is within a distance of 1.9 to 2.4 to P, it is added to ChargedAtoms list.
c. If the atom is polar (e.g. O or N and any other side chain): i.
If the atom is within 1.9 to 2.5 Å to P, it is added to PolarAtoms list.
ii. If the atom is within 2.5 to 3.3 Å to P, it is added to WaterAtoms list.
d. If the ChargedAtoms list is not empty, P is added as a 2+ ion (e.g. Mg) 9 e. Otherwise, if the PolarAtoms list is not empty, P is added as 1+ ion (e.g. Na) f. Otherwise, if the WaterAToms list is not empty, P is added as a water.
In the above process, regions are considered in order of size, from largest to smallest. Hence, because waters would be placed on a larger region near an O or N atom rather than a smaller one at a similar distance, they would otherwise clash with the water placed on the larger region first.

Declaration of interests
All authors declare no competing interest.

Data deposition
Cryo-EM maps of the apoferritin structure in this study with their associated atomic models have been deposited in the wwPDB OneDep System under EMD accession code XXX and PDB ID code XXX.