High-resolution in situ structure determination by cryo-electron tomography and subtomogram averaging using emClarity

Ni, Tao; Frosio, Thomas; Mendonça, Luiza; Sheng, Yuewen; Clare, Daniel; Himes, Benjamin A.; Zhang, Peijun

doi:10.1038/s41596-021-00648-5


Protocol
Published: 12 January 2022


Nature Protocols volume 17, pages 421–444 (2022)


Abstract

Cryo-electron tomography and subtomogram averaging (STA) have developed rapidly in recent years. Together they provide structures of macromolecular complexes in situ and in cellular context at or below subnanometer resolution and have led to unprecedented insights into the inner workings of molecular machines in their native environment, as well as their functionally relevant conformations and spatial distribution within biological cells or tissues. Given the tremendous potential of cryo-electron tomography STA in in situ structural cell biology, we previously developed emClarity, a graphics processing unit-accelerated image-processing software that offers STA and classification of macromolecular complexes at high resolution. However, the workflow remains challenging, especially for newcomers to the field. In this protocol, we describe a detailed workflow, processing and parameters associated with each step, from initial tomography tilt-series data to the final 3D density map, with several features unique to emClarity. We use four different samples, including human immunodeficiency virus type 1 Gag assemblies, ribosome and apoferritin, to illustrate the procedure and results of STA and classification. Following the processing steps described in this protocol, along with a comprehensive tutorial and guidelines for troubleshooting and parameter optimization, one can obtain density maps up to 2.8 Å resolution from six tilt series by cryo-electron tomography STA.


Introduction

Cryo-electron tomography (cryoET) has gained increasing importance in the study of molecular architectures of viruses, bacteria and cellular components in situ^1,2,3. It can provide 3D reconstructions of pleomorphic objects such as organelles or cells in their close-to-native states, providing unique opportunities to capture the intermediate biological events in the cellular context. More importantly, the spatial relationship among macromolecules within a cellular tomogram can be determined⁴. In cryoET, a series of images from the same region of the specimen are recorded as the sample is tilted to various angles with respect to the incident electron beam. The images are subsequently aligned and reconstructed to generate a 3D tomogram. When there are many repeating objects, such as macromolecular complexes, in the tomogram, these objects can be aligned and averaged to improve the signal-to-noise ratio (SNR)⁵, a process referred to as cryoET subtomogram averaging (STA).

Compared with cryoEM single-particle analysis (SPA), STA generally results in lower resolution. However, STA can resolve macromolecular structures in situ, unpurified and in the cellular context, and can provide the spatial relationship between molecules, which is important for interpreting their biological functions. Despite the generally lower resolution, several studies have yielded high-resolution density maps resolving secondary structural elements, including coat protein complex I (ref. ⁶), nuclear pore complex^4,7, polysomes⁸, chemotaxis signaling arrays⁹, retrovirus assemblies^{10,11,12,13,14}, bacterial surface layer¹⁵ and ribosomes¹⁶.

There are multiple additional challenges in STA compared with SPA^1,17,18. First, due to the physical limits of the goniometer as well as increasing sample thickness upon tilting, tilt series are typically limited to tilt angles between −60° and 60°. The densities in a tomogram reconstructed from these tilt series therefore suffer distortions, referred to as the missing-wedge effect. This distortion substantially affects the precision of subtomogram alignment and classification and must be considered for high-resolution STA. Second, biological samples are sensitive to radiation damage, and the electron exposure applied to each tilted image is usually limited. As a result, the SNR of a tilted image is much worse than that of images in SPA. Third, specimens for cryoET are usually thick, and the effective thickness of the sample increases as the sample is tilted. The defocus gradient due to the thickness of the sample and the sample tilt also needs to be considered¹⁹. Finally, as many biological objects adopt multiple conformations or compositions, 3D classification is required to distinguish these variants. While STA has, in principle, an advantage in 3D classification over SPA since each particle exists as a unique 3D reconstruction, thus allowing for direct analysis of the 3D variance, the low SNR and missing-wedge effect often pose significant challenges²⁰.

To deal with these challenges, a number of software packages have been developed for STA thus far, including PEET (ref. ²¹), EMAN2 (refs. ^22,23,24), RELION (refs. ^25,26), Dynamo (ref. ²⁷), Jsubtomo (ref. ²⁸), PyTom/AV3 (refs. ^29,30), Warp/M (ref. ¹⁶), Protomo/i3 (ref. ³¹) and emClarity (ref. ³²) (see review by Zhang¹ for a comparison). We implemented several key features in emClarity. First, an algorithm was implemented to estimate the defocus and astigmatism for each tilted image within the tilt series, to calculate the contrast transfer function (CTF). The effect of CTF modulation of images is then corrected for during tomogram reconstruction, accounting for the depth of field³². Second, for accurate weighting during alignment, reconstruction and classification, emClarity computes 3D sampling functions (3DSF). The 3DSF of each subtomogram, which accounts for the missing wedge information, is updated during each step of processing and used as a weight. Third, to address sample heterogeneity, emClarity implements a multiscale 3DSF-weighted, principal component analysis (PCA)-based classification method, which allows the user to emphasize specific features of different length scales. Fourth, local specimen motion and deformation place a major restriction on the quality of STA reconstructions. emClarity implements tomogram constrained projection refinement (tomoCPR) to refine local shifts, rotations and magnification changes in the sample by using subtomograms as fiducial markers. This improves the tilt-series alignment, particularly for in situ cryoET datasets recorded from cryo-focused ion beam milled lamellae, where gold bead fiducials cannot be used because they would be removed during the milling process.

Several high-resolution cryoEM maps have been successfully obtained by various research groups using emClarity¹, including severe acute respiratory syndrome coronavirus 2 postfusion spikes³³, in situ structure of Parkinson’s disease-linked leucine-rich repeat kinase 2 (ref. ³⁴), cellular reovirus assembly intermediates³⁵, Zika virus capsid protein³⁶, nodaviral replication protein A crown complex³⁷, native Leptospira spirochete flagellar filaments³⁸ and bacterial chemotaxis signaling arrays³⁹.

The new version of emClarity (V1.5.3.10) has some major differences from the original publication (V1.0) (ref. ³²). These include the following:

Per-tilt CTF refinement using embedded CTFFIND4 (ref. ⁴⁰)
Handedness check during CTF estimation
Calculation of per-particle 3DSF
3DSF calculation has been improved
Switch to MATLAB 2019a
Peak masks to limit translational search in alignment: the peak mask can be used to remove the cross-correlation peaks from a given distance of the particle origin, i.e., it defines the maximum translation allowed
Reconstruction using the raw projection images using cisTEM

Here, we describe a detailed workflow and processing steps using the new version of emClarity. The protocol has been tested by several novice users, and the common issues that might arise during the procedure are detailed in Troubleshooting.

Overview of emClarity pipeline

emClarity streamlines all steps in the pipeline (Fig. 1). emClarity can align the raw tilt series automatically using its ‘autoAlign’ program. It can also import aligned tilt series from external software packages, as long as the file formats and naming conventions follow the requirements (Step 1). It then generates aligned tilt series and estimates the CTF of each tilt series (Steps 2–4). Users define the boundary of subregion(s) in the tomogram for later reconstruction (Steps 5–7). The particles are then picked using template matching (Steps 8–12). emClarity manages the subtomogram-associated metadata in a MATLAB database and updates the metadata after each processing step throughout the pipeline (Step 13). The CTF-corrected tomograms are then generated at the requested binning (Step 14), and STA and alignment can be performed iteratively at each binning (Steps 15–18). Two optional steps can follow: tomoCPR (Steps 19–20), which refines the tilt-series alignment, and subtomogram classification (Steps 22–30). During the iterative alignment and averaging cycles, the data are kept in two fully separate half-sets following the ‘gold-standard’ refinement procedure⁴¹. The half-sets are used to calculate an optimal filter for weighting the reconstructions, while reducing the risk of overfitting⁴². A final map can be generated by combining the two half-sets, with an additional B-factor sharpening optionally applied (Step 31). A new feature is additionally implemented in emClarity, such that the raw projection images, instead of subtomograms, can also be used for the final reconstruction using cisTEM. Table 1 lists cryoET data collection and processing details. emClarity processing run time for the main steps is illustrated in Table 2, along with the specific graphics processing unit (GPU) cards used for processing.

**Fig. 1: emClarity processing workflow.**

Table 1 CryoET data collection and processing details


Table 2 emClarity processing run time (five tilt series)


Prerequisite for using the protocol

This protocol is broadly applicable to cryoET STA projects, but is focused on providing details needed for high-resolution refinement. emClarity uses GPU acceleration and parallelization tools to cope with large datasets. Since emClarity does not have a graphical user interface, users are expected to have basic knowledge of working with the command line on Unix/Linux-based systems. It is beneficial to have good knowledge of fiducial-based alignment as implemented in Etomo⁴³. Familiarity with MATLAB scripting can be helpful, but is not required. Basic knowledge of PCA and commonly used clustering methods (such as k-means clustering) is useful when carrying out emClarity subtomogram classification. Users can also refer to the associated emClarity tutorial (Supplementary Information 1 and https://github.com/ffyr2w/emClarity-tutorial) for an in-depth understanding of the algorithms behind each step, as well as a detailed step-by-step process using a ribosome dataset (EMPIAR-10304).

Limitations

Because emClarity uses a template-based particle picking method, it requires users to have a template for the object of interest. One should pay close attention to the template search and be cautious about template bias. We recommend using a low-pass filtered template to minimize template bias. emClarity implements template matching with either non-CTF-corrected or CTF-corrected tomograms, and comparison or combination of these two results can be informative for some challenging datasets. Small objects (<0.5 MDa), such as severe acute respiratory syndrome coronavirus 2 spikes in a cellular tomography dataset, can be identified through template search, albeit with false positives. In this case, existing prior information (such as particle position and orientation relative to the membrane) can be used to exclude these false positives. The number of desired particles during template search can be either determined automatically within emClarity or set manually by the user. When templates are not available, one can use other software packages, such as Dynamo²⁷ and PEET²¹, to generate an initial template. It is also possible to import particles (coordinates and angles) picked or refined in other software into emClarity (Fig. 1, green dot). Although emClarity can refine tilt-series alignment by tomoCPR, we recommend aligning the initial tilt series to a satisfactory level using emClarity autoAlign or other packages like Etomo⁴³ or AreTomo (https://msg.ucsf.edu/software). In some cases, results of geometry refinement by tomoCPR might be inadequate.

Materials

Equipment and setup

A computer or a computing cluster with NVIDIA GPU cards with at least 12 GB memory, CUDA version 7.5 or greater (version 9 or newer preferred). An emClarity binary (version 1.5.3.10) and installation procedure are available and detailed in emClarity wiki (https://github.com/bHimes/emClarity/wiki).

Input data

Data: raw tilt series

Raw image movies need to be motion-corrected, but without exposure weighting, which is handled internally by emClarity. Motion-corrected images in a tilt series should be ordered in the sequence of tilt angle, from −60° to 60°, for example. Tilt series can be aligned using external software packages like Etomo and imported to emClarity. Users can also import the raw tilt series and use emClarity to align it automatically. Details of required files and formats are listed in Step 1 in the Procedure.

Data: metadata

Microscope imaging conditions: voltage, pixel size, defocus range, amplitude contrast and Cs
Data collection scheme (the order and exposure dose of image acquisition in a tilt series)

emClarity currently uses parameter files to manage inputs, usually named to reflect their function and cycle, such as param_ctf.m for CTF estimation and param1.m for cycle 1 alignment, averaging and classification. The parameters required for each individual step are listed and explained in detail in the tutorial (Supplementary Information 1). A parameter file together with run commands for the processing of the human immunodeficiency virus type 1 (HIV-1) Gag dataset in this protocol is shown in Supplementary Information 2, and a template is supplied with the emClarity installation.

Procedure

Critical

This protocol presents a stepwise working procedure for STA and classification using emClarity. Users run all the commands through a terminal shell inside the project directory. The entire iterative alignment, averaging and classification procedure can run to the end automatically through a runscript, as long as the parameter files are set properly for each cycle. Users should modify and optimize the key parameters relevant to their projects. In the following processing steps, Steps 1–31, we provide the individual run commands with specific parameters and discuss the results, as well as troubleshoot potential issues. Novice users are recommended to follow the exact steps and check the outputs for each step and compare with the results described here. Users can refer to a more comprehensive tutorial (Supplementary Information 1) (https://github.com/ffyr2w/emClarity-tutorial), which contains a detailed explanation of all parameters and basic algorithm for each processing step in emClarity.

Preparation: arrangement of input files and directories

Timing ~30 min when using autoAlign

Critical

Tilt series can be aligned automatically inside of emClarity, or externally using software like Etomo. In this protocol, some datasets were aligned using Etomo and imported to emClarity, and some were automatically aligned using the ‘emClarity autoAlign’ program. The ‘autoAlign’ function requires motion-corrected image stacks, tilt angle file and tilt axis rotation angle, and it prepares all the necessary files in fixedStacks/. Please refer to the tutorial in Supplementary Information 1 for the parameters. If users align the tilt series using external software like Etomo, please prepare the necessary files as indicated in Step 1.

1
Make a project directory. Within the project directory, make a new directory called fixedStacks/. It is essential to strictly follow the naming conventions. Copy the following files into it.
- <prefix>.fixed: the raw tilt series corresponding to <prefix>.st
- <prefix>.xf: the transformation file generated from tiltalign in Etomo
- <prefix>.tlt: the refined tilt-angle file
- (optional) <prefix>.local: the local alignment transformation file corresponding to <prefix>local.xf from tiltalign in Etomo
- (optional) <prefix>.erase: coordinates of the fiducial beads to erase, corresponding to <prefix>_erase.fid in Etomo
- (optional) <prefix>.order: refined tilt angles listed in the order of image acquisition. For example, if data collection starts from 0° and alternates between positive and negative values as follows: 3°, −3°, 6°, −6°, …, 60°, −60°, then the order file contains a single column listing these angles as 0, 3, −3, 6, −6 … 60, −60. We recommend generating the order file if the data acquisition scheme cannot be represented by the exposure-weighting parameters (see Step 3)
If there are black images at high angles in the tilt series, we recommend removing these dark images during tilt-series alignment and making sure the corresponding .xf and .tlt files are also updated. It is recommended to process the raw tilt series with IMOD ccderaser to remove hot and dead pixels.
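The naming convention above lends itself to an automated check before emClarity is run. The following is a minimal sketch, not part of emClarity: the function name check_fixed_stacks is hypothetical, and it only tests for the two required companion files (.xf and .tlt) of each <prefix>.fixed, leaving the optional files aside.

```bash
#!/bin/bash
# Hypothetical helper (not an emClarity command): report tilt series in a
# fixedStacks-style directory that lack a required companion file.
check_fixed_stacks() {   # arg: path to the fixedStacks/ directory
    local dir="$1" stack prefix ext missing=0
    for stack in "$dir"/*.fixed; do
        [ -e "$stack" ] || continue          # directory holds no .fixed files
        prefix="${stack%.fixed}"
        for ext in xf tlt; do                # required next to <prefix>.fixed
            if [ ! -f "$prefix.$ext" ]; then
                echo "missing: ${prefix##*/}.$ext"
                missing=1
            fi
        done
    done
    return "$missing"                        # 0 if complete, 1 otherwise
}
```

Calling check_fixed_stacks fixedStacks from the project directory prints one line per missing file and returns a non-zero status if anything is absent.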
2
Set up an appropriate working environment for emClarity (e.g., module load emClarity/1.5.3.11). Run emClarity using the provided command list (Supplementary Information 2). Users can run through the script entirely or run individual commands separately as described below. If you have an existing IMOD or UCSF Chimera installation in the environment, make sure there is no conflict. All the emClarity-related logs are saved in logFile/emClarity.logfile.

Defocus estimate

Timing ~25 min

3

Estimate the defocus of the tilt series. In this step, the raw tilt series will be transformed into aligned tilt series using the per-tilt transformation file; the gold fiducials will be removed; and the aligned tilt series will be used for per-tilt defocus and astigmatism estimation. The parameter file should contain the necessary imaging parameters. Copy a template parameter file to the project directory and rename it param_ctf.m.

*System parameters:*
nGPUs=4	%% number of visible GPUs
nCpuCores=12	%% maximum number of processes to run in parallel
*Microscope settings:*
PIXEL_SIZE=1.179e-10	%% pixel size of raw tilt series, in meters
SuperResolution=0	%% whether raw tilt-series pixel size corresponds to super-resolution image pixel size
Cs=2.7e-3	%% Spherical aberration of the microscope, in meters
VOLTAGE=300e3	%% accelerating voltage of the microscope, in volts
AMPCONT=0.1	%% amplitude contrast
beadDiameter=7e-9	%% fiducial bead diameter, in meters
*Defocus range:*
defEstimate=2.3e-6	%% initial estimate of the defocus, in meters
defWindow=1.5e-6	%% defocus estimate window, in meters
*Exposure-weighting parameters:*
CUM_e_DOSE=123	%% total exposure dose
doseAtMinTilt=3	%% electron dose at minimum tilt
oneOverCosineDose=0	%% whether Saxon scheme is used
startingAngle=0	%% refined data collection starting angle, in degrees
startingDirection=pos	%% data collection direction
doseSymmetricIncrement=1	%% dose symmetric scheme group size

The last three parameters in exposure weighting are used to indicate the order of image acquisition for exposure weighting, which can also be specified by providing a <prefix>.order file in fixedStacks/. If a <prefix>.order is provided in the fixedStacks/, the exposure-weighting parameters will be ignored. For each tilt series, run the following command:

emClarity ctf estimate <param> <prefix>
emClarity ctf estimate param_ctf.m b2tilt20

A new directory aliStacks/ will be generated in the project directory and the aligned tilt series aliStacks/<prefix>_ali1.fixed will be saved. For each tilt series, per-tilt defocus and astigmatism estimation results are saved as fixedStacks/ctf/<prefix>_ali1_ctf.tlt, which contains the tilt geometry information, accumulated exposure dose and per-tilt defocus information. Repeat CTF estimation for all tilt series:

#!/bin/bash
for stack in fixedStacks/*.fixed; do
    prefix=${stack#fixedStacks/}
    emClarity ctf estimate param_ctf.m "${prefix%.fixed}"
done
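For a dose-symmetric scheme like the example in Step 1, the single-column <prefix>.order file can be written with a few lines of shell. This is a sketch under stated assumptions: dose_symmetric_order is a hypothetical helper, it starts at 0° and records the positive tilt first, and it prints nominal integer angles, whereas the protocol specifies refined tilt angles, so the values would need to be replaced by those in <prefix>.tlt.

```bash
#!/bin/bash
# Hypothetical helper (not an emClarity command): print nominal tilt angles
# in acquisition order for a scheme 0, +s, -s, +2s, -2s, ..., +max, -max.
dose_symmetric_order() {   # args: tilt increment, maximum tilt (integer degrees)
    local step="$1" max="$2" a
    echo 0
    a="$step"
    while [ "$a" -le "$max" ]; do
        echo "$a"          # positive tilt is assumed to be acquired first
        echo "-$a"
        a=$(( a + step ))
    done
}
```

For instance, dose_symmetric_order 3 60 > fixedStacks/b2tilt20.order would write 41 lines beginning 0, 3, -3, 6, -6.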

4
Inspect the results of CTF estimation for each tilt series:
- Open the transformed tilt series in aliStacks/<prefix>_ali1.fixed in 3dmod and make sure they are correctly aligned and fiducial beads are removed properly.
- emClarity also prints out the results of a tilt-series handedness check in logFile/emClarity.logfile. The handedness check reports whether the expected defocus gradient matches the measured value. Note, however, that passing this check does not guarantee that the handedness of the final density map is biologically correct.
- Open fixedStacks/ctf/<prefix>_ali1_psRadial_1.pdf and check that the theoretical CTF estimate matches the radial average of the power spectrum of the tilt series.
  
  Troubleshooting
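Before opening each result by hand, it may be convenient to confirm that CTF estimation produced the two output files named in Step 3 for every tilt series. The sketch below assumes the directory layout described above; list_missing_ctf_outputs is a hypothetical helper, not an emClarity command.

```bash
#!/bin/bash
# Hypothetical helper: print the prefix of every tilt series in fixedStacks/
# that lacks its aligned stack or its per-tilt CTF file.
list_missing_ctf_outputs() {   # arg: project directory
    local project="$1" stack prefix
    for stack in "$project"/fixedStacks/*.fixed; do
        [ -e "$stack" ] || continue
        prefix="${stack##*/}"        # strip the directory part
        prefix="${prefix%.fixed}"    # strip the extension
        [ -f "$project/aliStacks/${prefix}_ali1.fixed" ] &&
            [ -f "$project/fixedStacks/ctf/${prefix}_ali1_ctf.tlt" ] ||
            echo "$prefix"
    done
}
```

An empty output from list_missing_ctf_outputs . suggests that every tilt series is ready for inspection.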

Define subregion boundaries

Timing ~10 min

5
In many cases, the regions of interest are in some local areas (subregions) in the whole tomogram. The boundary of a subregion is defined in a binned tomogram with the entire field of view. Copy the recScript2.sh from emClarity installation directory to the project directory. Run the recScript2.sh script; a binned tomogram for each tilt series will be generated in the bin10/ directory:

./recScript2.sh -1
6
Define the subregion boundaries in the bin10 tomogram by defining six points (x_min, x_max, y_min, y_max, z_min and z_max) to enclose the subregion. Inside the bin10/ directory, run:

3dmod <prefix>_bin10.rec

If you have three subregions in one tomogram, you will need to define 6 × 3 = 18 points. Save the model (File → Save model) with the same name as the tomogram but with the .mod extension in the bin10/ directory. One should generate one *.mod file per tilt series. Keep the model boundary at least a few pixels away from the edge of the binned reconstruction, and make sure that subregions in a tomogram do not overlap. Subregions can be as big as the whole tomogram as long as the GPU cards have enough global memory. In practice, splitting the tomogram into two subregions is supported for GPUs with ≥12 GB of memory. In this protocol, we defined each virus-like particle as one subregion so that multiple subregions can be processed in parallel to maximize computational throughput.
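Because every subregion needs exactly six points, the point count of a model is a quick sanity check. The sketch below assumes the model has first been converted to a plain-text point list with one point per line, for instance with IMOD's model2point; count_subregions is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical helper: given a text file with one model point per line,
# print the number of subregions (six boundary points each) or fail.
count_subregions() {   # arg: point-list text file
    local n
    n=$(grep -c '[0-9]' "$1" || true)    # count lines that contain a number
    if [ "${n:-0}" -eq 0 ] || [ $(( n % 6 )) -ne 0 ]; then
        echo "error: ${n:-0} points is not a positive multiple of 6" >&2
        return 1
    fi
    echo $(( n / 6 ))
}
```

A model with 18 points is reported as 3 subregions, whereas a model with a stray or missing point is rejected.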
7
Convert the <prefix>_bin10.mod file to an emClarity format. This generates a recon/ directory, within which <prefix>_recon.coords defines the boundary information of each subregion of every tomogram. In the project directory, run:

./recScript2.sh <prefix>

To convert all the subregions of each tomogram, run:

#!/bin/bash
for stack in bin10/*.mod; do
    prefix=${stack#bin10/}
    ./recScript2.sh "${prefix%_bin10.mod}"
done

Pick particles

Timing ~1.5 h

Critical

emClarity uses a template-based particle picking method. A template is required (Step 8) and template search for each subregion is performed at designated binning (Steps 9 and 10). Check the template search result (Step 11).

8
Prepare the template for particle picking. The template used by emClarity needs to have the same pixel size as that of the raw tilt series (PIXEL_SIZE parameter). One may need to rescale the template from a source map to match the pixel size.

emClarity rescale <input> <output> <inputPixel> <outputPixel> cpu/GPU
emClarity rescale EMD-8403.mrc emd_8403rescale.mrc 3.62 1.179 cpu
9
Generate CTF-corrected tomograms for template search. This step generates the binned tilt series and CTF-corrected (i.e., CTF multiplied) tomograms for each subregion and saves them as cache/<prefix>_<sub-region>_binX.rec.
Parameters:

Tmp_samplingRate=8	%% binning factor for tomogram for template search

emClarity ctf 3d param_ts.m templateSearch

10

Run a template search for each subregion from each tomogram. One needs to decide the binning of the tomogram for template search. Depending on the subtomogram size, we typically recommend running template search with tomograms at a final pixel size of ~8–10 Å/pixel. Ali_mRadius specifies the radii of the alignment mask. Test different Ali_mRadius and particleRadius values to optimize particle picking, especially for subtomograms arranged in a lattice-like assembly. For the HIV Gag assembly, we set Ali_mRadius to the size of seven Gag hexamers and particleRadius to the size of one hexamer, so that the cross-correlation is calculated with a large molecular mass, while the individual hexamer positions can still be picked. For the ribosome or apoferritin dataset, Ali_mRadius and particleRadius can be very close. Tmp_angleSearch defines the range and step of the out-of-plane and in-plane angular search as [θ_out, Δ_out, θ_in, Δ_in] in degrees. For example, [180,9,35,7] specifies a ±180° out-of-plane search with a 9° step, and a ±35° in-plane search with a 7° step. For subtomograms with cyclic symmetry, the in-plane search range can be limited to ±180/<symmetry>. Copy a template parameter file, rename it param_ts.m and update the following parameters. The microscope parameters should remain the same as in ctf estimate.

*Parameters:*
Tmp_samplingRate=8	%% binning factor for tomogram for template search
particleRadius=[66,66,56]	%% X,Y,Z particle radius in Å. Cross-correlation peak radius to remove from consideration after a particle in the current peak is selected
Ali_mRadius=[116,116,72]	%% radius of alignment mask in Å
Tmp_angleSearch= [180,9,35,7]	%% in degrees
Tmp_threshold=1000	%% estimated number of particles
symmetry=C6	%% particle symmetry

In the project directory, run:

emClarity templateSearch <param> <prefix> <sub-region> <template> <symmetry> <GPU_id>
emClarity templateSearch param_ts.m b2tilt20 1 emd_8403rescale.mrc C6 1

A new directory called convmap_wedge_Type2_binX/ contains the cross-correlation (CC) convolution map <prefix>_<sub-region>_binX_convmap.mrc and the model <prefix>_<sub-region>_binX.mod, corresponding to the coordinates of picked particles. The resulting <prefix>_<sub-region>_binX.csv file contains the unbinned coordinate and orientation information on all picked particles. Please refer to the emClarity wiki for the convention and format of this file. A representative tomogram (bin8) and convolution map are shown in Fig. 2.
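The two rules of thumb in this step (a binned pixel size of ~8–10 Å and an in-plane range of ±180/<symmetry>) are simple arithmetic, sketched here with hypothetical helper functions that are not part of emClarity.

```bash
#!/bin/bash
# Hypothetical helpers for choosing template-search parameters.

# Pixel size (Å) of a tomogram after binning.
binned_pixel_size() {    # args: unbinned pixel size in Å, binning factor
    awk -v p="$1" -v b="$2" 'BEGIN { printf "%.3f\n", p * b }'
}

# Half-width (degrees) of the in-plane search that covers one asymmetric
# unit of a particle with Cn symmetry, i.e. 180/n.
inplane_half_range() {   # arg: order n of the cyclic symmetry
    awk -v n="$1" 'BEGIN { printf "%g\n", 180 / n }'
}
```

With the Gag values used in this protocol, binned_pixel_size 1.179 8 gives 9.432 (Å per pixel) and inplane_half_range 6 gives 30 (degrees); the example parameter file uses a slightly wider ±35° in-plane range.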

11
Clean the false-positive points using 3dmod. In the convmap_wedge_Type2_binX/ directory, run:

3dmod <prefix>_<sub-region>_binX_convmap.mrc <prefix>_<sub-region>_binX.mod

It is also useful to overlay the raw tomograms with convmap and model:

3dmod ../cache/<prefix>_<sub-region>_binX.rec <prefix>_<sub-region>_binX.mod

Check the summed CC peaks in <prefix>_<sub-region>_binX_convmap.mrc to see whether they correspond to the desired subtomogram positions. Remove the false-positive points, which are common in regions with strong features such as ice contamination, carbon edges and gold bead residues. Save the remaining points using the same model file name. Before averaging and alignment, one should ensure that the picked particles are mostly correct. It might not be necessary to remove all the false-positive points, as 3D classification can usually remove them.
12
Rename the convmap_wedge_Type2_binX/ to convmap/, as emClarity will look into the convmap/ directory for subtomogram information in the next step.

Initialize the project

Timing ~1 min

Critical

As mentioned above, emClarity stores all the project information in a MATLAB database. The database records information on the tilt series and subtomograms including: subregion boundary (recon/<prefix>_recon.coords), per-tilt CTF estimate (fixedStacks/ctf/<prefix>_ali1_ctf.tlt) and information on each subtomogram (convmap/). These metadata will be used and updated throughout the emClarity data processing pipeline. Backup metadata will be saved as cycleXXX_<project>_backup.mat before a new cycle starts. Users can open the database in MATLAB to check the database structure.

13

Generate an emClarity database <project>.mat. Copy param_ctf.m to param0.m and update the following parameters:

*Parameters:*
subTomoMeta=gag	%% project name
Tmp_samplingRate=8	%% binning of the tomograms for template matching
fscGoldSplitOnTomos=1	%% whether or not the particles from the same subregions should be kept in the same half-set or distributed randomly

Run the following command, which generates the metadata file gag.mat:

emClarity init <param>
emClarity init param0.m

Note: fscGoldSplitOnTomos is typically set to 0 (randomly splitting subtomograms from each subregion into ODD and EVEN datasets). However, if the particles within the alignment mask overlap substantially with their neighboring particles, as in the Gag lattice, we use ‘1’ to split subregions, rather than subtomograms, between the ODD and EVEN datasets to avoid inflating the Fourier shell correlation (FSC). For a small dataset with a limited number of tilt series, we recommend defining more than two subregions for each tilt series.
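The difference between the two settings can be illustrated with a toy assignment. This sketch is only a conceptual illustration and does not reproduce emClarity's internal logic: assign_half_sets is hypothetical, it reads lines of the form "<sub-region> <particle id>" from standard input, and it uses simple alternation as a stand-in for the random split.

```bash
#!/bin/bash
# Hypothetical illustration of the two half-set splitting strategies.
# Mode 0: neighbouring particles go to different half-sets.
# Mode 1: all particles of a subregion stay in the same half-set.
assign_half_sets() {   # arg: 0 = split per particle, 1 = split per subregion
    awk -v by_region="$1" '
        {
            # number the subregions in order of first appearance
            if (!($1 in region_idx)) region_idx[$1] = ++n_regions
            key = by_region ? region_idx[$1] : NR
            print $1, $2, (key % 2 ? "ODD" : "EVE")
        }'
}
```

With mode 1, overlapping neighbors in a lattice never end up in opposite half-sets, which is the property the note above relies on.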

Reconstruct the tomograms for alignment and averaging

Timing ~5 min

14
Reconstruct the subregions for all the tilt series. This step generates the binned tilt series and CTF-corrected (actually CTF multiplied) subregion tomograms, which are saved in the cache/ directory and are then used for subtomogram extraction, averaging and alignment.
Parameters:

subTomoMeta=gag
PIXEL_SIZE=1.179e-10
Ali_samplingRate=6	%% binning of the tomograms for alignment

To generate a tomogram at a binning factor of 6, run:

emClarity ctf 3d <param>
emClarity ctf 3d param0.m

CTF-corrected tomograms cache/<prefix>_<sub-region>_binX.rec will be generated and one can check the tomogram with 3dmod in IMOD.

STA and alignment

Timing variable, depending on subtomogram number, size and binning

Critical

STA and alignment are performed iteratively using tomograms at a progressively reduced bin (e.g., from bin6 to bin1). The binned tomograms can enhance the SNR and help subtomogram alignment, at the cost of losing high-resolution information. emClarity does not update alignment parameters automatically and allows users to set the tomogram binning factor (Ali_samplingRate), angular search range and step (Raw_angleSearch) for each cycle and judge whether the refinement has converged. Each cycle starts by generating an average for each half map (Step 15), which is then used as reference for alignment (Step 16). For each binning, it is generally recommended to run several cycles (Step 17). Similar to a template search, for samples with lattice-like structure, it is generally helpful to include several repetitive units (such as Gag hexamers) during the averaging and alignment.
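The cycle structure described above can be laid out as a dry-run list of commands. This is a sketch only: print_cycle_commands is a hypothetical helper, it assumes one parameter file per cycle named paramN.m (following the param0.m and param1.m naming used in this protocol), and it prints the avg and alignRaw commands of Steps 15 and 16 rather than executing them. The binning factor and angular search for each cycle would still be set by the user in the corresponding parameter file.

```bash
#!/bin/bash
# Hypothetical dry run: print, without executing, the averaging and
# alignment commands for a range of cycles.
print_cycle_commands() {   # args: first cycle, last cycle
    local c="$1" last="$2"
    while [ "$c" -le "$last" ]; do
        echo "emClarity avg param${c}.m ${c} RawAlignment"   # references for this cycle
        echo "emClarity alignRaw param${c}.m ${c}"           # align against them
        c=$(( c + 1 ))
    done
}
```

Reviewing the output of print_cycle_commands 0 3 before writing a runscript is one way to check that a parameter file exists for every planned cycle.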

15

emClarity does not extract the subtomograms onto disk by default; instead, the subtomograms will be extracted on the fly when needed, which can save large amounts of disk space for crowded samples.

*Parameters:*
subTomoMeta=gag
PIXEL_SIZE=1.179e-10	%% pixel size in meters
Ali_mRadius=[116,116,72]	%% in Å, enclosing seven hexamers
Ali_mCenter=[0,0,0]	%% in Å
particleMass= 1	%% in Megadalton
Ali_mType=sphere	%% alignment mask type: sphere, cylinder, rectangle
particleRadius=[66,66,56]	%% corresponding to central hexamer size
Raw_className=0	%% class 0
Fsc_bfactor=10	%% b-factor applied to half maps
Ali_samplingRate=6	%% binning factor
symmetry=C6	%% symmetry

Run the following command:

emClarity avg <param.m> <cycle_nb> RawAlignment
emClarity avg param0.m 0 RawAlignment

This generates two half maps in the project directory: cycleXXX_<project>_class0_REF_EVE/ODD.mrc. The dimensions of the maps are calculated from Ali_mRadius with additional padding. Open these two maps in UCSF Chimera, 3dmod or any other software able to read MRC files to check whether they match expectations. The corresponding (conical) FSC is available in FSC/cycleXXX_<project>_Raw-1-fsc_GLD.pdf, in which the dashed lines are the conical FSCs and the solid line is the overall FSC. Note that a molecular mask (FSC/cycleXXX_<project>_Raw-1-shapeMask_*mrc) is applied during FSC calculation. The total sampling functions for both half maps, cycleXXX_<project>_class0_REF_EVE/ODD_Wgt.mrc, should be isotropic if the particles do not have preferred orientations in the tomograms; together with the conical FSCs, they indicate whether the subtomograms adopt a preferred orientation. One can open the sampling function in 3dmod and look through the x–z plane to see whether the amplitude weight is isotropic.

16
After the reference is generated with avg, emClarity can use it to align the particles. Like Tmp_angleSearch in the template search, Raw_angleSearch in the alignment step is defined as [θ_out, Δ_out, θ_in, Δ_in]. Since most of the particles are picked correctly for the Gag dataset (Step 9), the angular search ranges and step sizes for alignment are quite small.
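To make the four-element search vector concrete, the sketch below names its fields and lists the in-plane angles it would sample. It ASSUMES that the in-plane search covers a symmetric grid from -θ_in to +θ_in in steps of Δ_in; this section does not state emClarity's exact sampling, and the out-of-plane grid is not modeled. The class and function names are hypothetical.

```python
from typing import NamedTuple


class AngleSearch(NamedTuple):
    theta_out: float  # out-of-plane search range, degrees
    delta_out: float  # out-of-plane step, degrees
    theta_in: float   # in-plane search range, degrees
    delta_in: float   # in-plane step, degrees


def in_plane_angles(search):
    """In-plane angles sampled, ASSUMING a symmetric -theta_in..+theta_in grid."""
    if search.theta_in == 0 or search.delta_in == 0:
        return [0]  # in-plane search switched off
    n_steps = int(search.theta_in // search.delta_in)
    return [k * search.delta_in for k in range(-n_steps, n_steps + 1)]


if __name__ == "__main__":
    # In-plane-only search with a 20 degree range and 5 degree step.
    print(in_plane_angles(AngleSearch(0, 0, 20, 5)))
```

Under this assumption, a vector of [0,0,20,5] corresponds to nine in-plane orientations per particle and no out-of-plane search.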

Parameters (all other parameters are identical to those used for avg):

Raw_angleSearch=[0,0,20,5]; %% angular search, in degrees.

emClarity alignRaw <param> <cycle_nb>
emClarity alignRaw param0.m 0

The changes of rotation and translation for every subtomogram in each subregion are saved in alignResume/cycleXXX_<project>/<prefix>_<sub-region>.txt. The number of lines in each file corresponds to the number of particles aligned in the current cycle. After all the subtomograms are processed, the metadata <project>.mat will be updated.
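Because each line in these files corresponds to one aligned particle, counting lines gives a quick per-subregion tally. The helper below is a minimal sketch; its name is hypothetical, and it relies only on the one-line-per-particle property stated above, not on the column layout of the files.

```python
from pathlib import Path


def particles_aligned(align_dir):
    """Map each subregion .txt file name to its number of lines."""
    counts = {}
    for txt in sorted(Path(align_dir).glob("*.txt")):
        with txt.open() as handle:
            counts[txt.name] = sum(1 for _ in handle)
    return counts


if __name__ == "__main__":
    # Directory name follows the alignResume/cycleXXX_<project>/ pattern;
    # "cycle000_gag" is an assumed example for cycle 0 of the gag project.
    directory = Path("alignResume/cycle000_gag")
    if directory.is_dir():
        for name, count in particles_aligned(directory).items():
            print(f"{name}: {count} particles")
```

A subregion whose count is far below the number of picked particles is worth inspecting before continuing to the next cycle.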

17
Copy param0.m to param1.m and param2.m, update Raw_angleSearch in these parameter files and repeat STA and alignment for a few cycles (Steps 15 and 16). To speed up alignment, we usually alternate the in-plane and out-of-plane angular searches and perform a few cycles at each binning until the changes in rotation and shift drop to around zero. Within the same binning, one can repeat the same angular searches or gradually move to finer angular searches. For the Gag dataset, two more cycles (cycles 1 and 2) were run at bin6. Refer to Supplementary Information 2 for the list of commands and parameters at each cycle.
Parameters:

Raw_angleSearch=[16,4,0,0];
%% in param1.m
Raw_angleSearch=[0,0,9,3];
%% in param2.m

emClarity avg param1.m 1 RawAlignment
emClarity alignRaw param1.m 1
emClarity avg param2.m 2 RawAlignment
emClarity alignRaw param2.m 2
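Deriving each new parameter file from the previous one can be scripted. The sketch below rewrites a single name=value line in the text of a parameter file, assuming the files consist of one assignment per line as shown in this protocol. set_parameter is a hypothetical helper, not an emClarity function, and it drops any trailing %% comment on the line it rewrites.

```python
import re


def set_parameter(param_text, name, value):
    """Return param_text with the first 'name=...' line replaced by name=value.

    Raises KeyError if the parameter is not present.
    """
    pattern = re.compile(rf"^{re.escape(name)}\s*=.*$", flags=re.MULTILINE)
    new_text, count = pattern.subn(lambda _m: f"{name}={value}", param_text, count=1)
    if count == 0:
        raise KeyError(name)
    return new_text


if __name__ == "__main__":
    param0 = (
        "subTomoMeta=gag\n"
        "Ali_samplingRate=6\n"
        "Raw_angleSearch=[0,0,20,5]; %% angular search, in degrees.\n"
    )
    # Alternate out-of-plane and in-plane searches, as in cycles 1 and 2.
    param1 = set_parameter(param0, "Raw_angleSearch", "[16,4,0,0]")
    param2 = set_parameter(param0, "Raw_angleSearch", "[0,0,9,3]")
    print(param1)
    print(param2)
```

Writing param1 and param2 to param1.m and param2.m then gives the inputs for the four commands above.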
18
Remove duplicated particles after alignment.

emClarity removeDuplicates param2.m 2

After these averaging and alignment cycles, one can run a tilt-series refinement by tomoCPR (Steps 19 and 20, optional) and/or generate new tomograms and continue averaging and alignment (Step 21).

(Optional) Tilt-series refinement by tomoCPR

Timing variable, depending on subtomogram number, size and binning

Critical

Tilt series can be optionally refined by tomoCPR. STA provides accurate estimates of both particle positions and high SNR reconstructions, making them excellent fiducial markers. It is thus possible to leverage this information for improving the alignment of a tilt series. In this protocol, we run tomoCPR for each binning.

19
When using tomoCPR to refine the tilt-series geometry, the subtomograms are mapped back into the raw tomograms to generate a synthetic tomogram containing an estimate of the background noise plus the higher-SNR particles, which is then projected into each view. A tile is cut out around each projected particle, convolved with the local CTF and aligned to the corresponding particle in the raw data, giving the particle position in the tilt series. These locally refined particle positions are then used as new fiducial markers in tiltalign to refine the tilt-series alignment. Run the following command:

emClarity tomoCPR <param> <cycle_nb>
emClarity tomoCPR param2.m 2

A temporary directory mapBack<n>/ is generated in cache/ and will be moved to project directory only after all the tilt series are successfully processed. <n> indicates the current tomoCPR number. The overall and local transformation files will be written as mapBack<n>/<prefix>_ali<n>_ctf.tltxf and mapBack<n>/<prefix>_ali<n>_ctf.local for each tilt series. The mapBack<n>/ directory should not be deleted since the local transformation file mapBack<n>/<prefix>_ali<n>_ctf.local will be used to generate new tomograms, although any of the image files can be deleted to save disk space. The metadata <project>.mat will be updated to record the current round of tomoCPR.
20
Update the aligned tilt series and geometry file. Copy param2.m to param3.m.
Parameters:

Ali_samplingRate=5;
%% tomogram binning

emClarity ctf update <param>
emClarity ctf update param3.m

A new geometry file fixedStacks/ctf/<prefix>_ali<n+1>_ctf.tlt and newly aligned tilt series aliStacks/<prefix>_ali<n+1>.fixed will be created, which will be used to generate new tomograms. One can check whether the newly transformed tilt series look well aligned and do not deviate substantially from original aligned stacks.
21
Generate the new tomogram at next binning (bin5). Run the following command:

emClarity ctf 3d <param>
emClarity ctf 3d param3.m

This essentially repeats Step 14 at a new binning and is followed by the STA and alignment cycles (Steps 15 and 16), removal of duplicate subtomograms (Step 18) and tomoCPR (Steps 19 and 20). The cycle then continues as the binning is reduced.

For the Gag dataset, we run three cycles of averaging and alignment using 6×, 5× and 4× binned subtomograms before 3D classification. Update the Ali_samplingRate and Raw_angleSearch in the parameter files at each cycle. Refer to the command list in Supplementary Information 2.
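The overall loop over binnings can be written out as an ordered command list. The sketch below is illustrative only: it assumes that paramN.m is the parameter file for cycle N, that every binning uses the same number of cycles, and that the binning itself is set inside each file through Ali_samplingRate. The actual number of cycles per binning varies (see Supplementary Information 2), and plan_commands is a hypothetical helper.

```python
def plan_commands(binnings, cycles_per_bin, run_tomocpr=True):
    """List (binning, command) pairs for iterative STA at each binning."""
    plan = []
    cycle = 0
    for position, binning in enumerate(binnings):
        # Reconstruct tomograms at this binning.
        plan.append((binning, f"emClarity ctf 3d param{cycle}.m"))
        for _ in range(cycles_per_bin):
            plan.append((binning, f"emClarity avg param{cycle}.m {cycle} RawAlignment"))
            plan.append((binning, f"emClarity alignRaw param{cycle}.m {cycle}"))
            cycle += 1
        last = cycle - 1
        plan.append((binning, f"emClarity removeDuplicates param{last}.m {last}"))
        more_binnings = position < len(binnings) - 1
        if run_tomocpr and more_binnings:
            plan.append((binning, f"emClarity tomoCPR param{last}.m {last}"))
            # The updated stacks are written using the next cycle's parameters.
            plan.append((binnings[position + 1], f"emClarity ctf update param{cycle}.m"))
    return plan


if __name__ == "__main__":
    for binning, command in plan_commands([6, 5, 4], cycles_per_bin=3):
        print(f"bin{binning}: {command}")
```

With three cycles at bin6, the first part of the plan reproduces the sequence used here: averaging and alignment with param0.m to param2.m, removeDuplicates and tomoCPR on cycle 2, then ctf update and ctf 3d with param3.m.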

(Optional) Subtomogram classification

Timing ~40 min, depending on subtomogram number, size and binning

Critical

Subtomogram classification (Steps 22–29) is optional in the emClarity pipeline. In this protocol, we perform one cycle of 3D classification with bin4 subtomograms after two rounds of tomoCPR and six cycles of STA and alignment (Steps 14–21). emClarity uses a PCA-based classification method, with subtomograms band-pass filtered at various resolutions defined by the user. It first computes an average map from all the subtomograms (Step 22). emClarity then analyzes the heterogeneity of the dataset by comparing individual subtomograms with the current average map (the reference). Briefly, difference maps are calculated between each particle and the reference for each resolution band that the user defines. These maps are then analyzed by PCA, using singular value decomposition, which reveals the major directions of variance (eigenimages) (Step 23). Users then select the eigenimages corresponding to the major directions of variance (Step 24), and emClarity projects the whole dataset along each of these eigenvectors. The projected data, which are now denoised and much smaller in size, are then clustered (by default with the k-means clustering algorithm; Step 25). Class averages are then generated for each cluster as a montage (Step 26), and particles from undesired classes can optionally be removed from further analysis (these could be subtomograms that are ‘noise’ or conformations that are not of interest to the user) (Steps 27 and 28).

In principle, one can do classification at any binning and at any cycle. In practice, it is beneficial to have several rounds of alignment before classification and use an intermediate binning factor for a better SNR in tomograms (such as bin4, bin3). It is generally not recommended to conduct classification at bin1 if it was already done at higher binning.
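The classification scheme described above (difference maps, PCA by singular value decomposition, projection onto selected eigenimages, k-means clustering) can be illustrated with a generic NumPy sketch. This is not emClarity's implementation: it omits band-pass filtering, masking and missing-wedge handling, it centers the difference vectors as a conventional PCA choice, and all function names and the synthetic data are hypothetical.

```python
import numpy as np


def pca_coefficients(particles, reference, n_eigs):
    """Project particle-minus-reference differences onto principal axes."""
    diffs = particles.reshape(len(particles), -1) - reference.ravel()
    diffs = diffs - diffs.mean(axis=0)  # center before PCA
    # Rows of vt are the eigenimages (principal directions of variance).
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    eigenimages = vt[:n_eigs]
    return diffs @ eigenimages.T, eigenimages


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one integer label per point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centers[j] = points[labels == j].mean(axis=0)
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    reference = np.zeros((8, 8, 8))
    feature = np.zeros((8, 8, 8))
    feature[2:4, 2:4, 2:4] = 1.0  # density present in only half the particles
    truth = np.repeat([0, 1], 20)
    particles = reference + truth[:, None, None, None] * feature
    particles = particles + 0.05 * rng.standard_normal(particles.shape)
    coeffs, _ = pca_coefficients(particles, reference, n_eigs=3)
    print(kmeans(coeffs, k=2))
```

Clustering the low-dimensional coefficients rather than the full volumes is what makes the projected data "denoised and much smaller in size" in the sense used above.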

22

Generate an average map for classification. Copy param7.m to param8.m and update flgClassify=1 to turn on classification flag in the parameter file. Besides the parameters inherited from previous alignment cycles, other parameters specific to classification include:

*Parameters:*
Ali_mRadius=[116,116,72]	%% in Å, enclosing seven hexamers
Ali_mCenter=[0,0,0]	%% in Å
Ali_mType=sphere
Ali_samplingRate=4	%% binning factor for averaging
Raw_classes_odd=[0;1.*ones(2,1)]	%% C1 symmetry for half map 1
Raw_classes_eve=[0;1.*ones(2,1)]	%% C1 symmetry for half map 2
Cls_mRadius=[92,92,76]	%% classification mask radius
Cls_mCenter=[0,0,0]
Cls_mType=sphere	%% classification mask type
Cls_samplingRate=4	%% binning factor for classification
flgClassify=1	%% classification flag

emClarity avg param8.m 8 RawAlignment

This will generate two half maps: cycleXXX_<project>_class0_Raw_EVE.mrc and cycleXXX_<project>_class0_Raw_ODD.mrc.

23
Compute the difference map for each particle, with different band-pass filters. We set three band-pass filters at 10, 20 and 40 Å. The band-pass filters are selected according to the object one wishes to classify and are typically below the maximum resolution of the current iteration. Most of the variance is explained within the first 20–30 eigenimages, and Pca_maxEigs limits the number of eigenimages to save.
Parameters:

pcaScaleSpace=[10,20,40]
%% one can select as many band-pass filters as needed, though three is typically sufficient
Pca_maxEigs=25
%% maximum number of eigenimages to save

Run the following command:

emClarity pca <param> <cycle_nb> <subset>
emClarity pca param8.m 8 0

This generates variance maps for each resolution band as cycleXXX_<project>_varianceMap25-STD-*.mrc and principal eigenimages as cycleXXX_<project>_eigenImage25-STD-*.mrc. To aid analysis, it is usually easier to look at cycleXXX_<project>_eigenImage25-SUM-STD-mont_*.mrc, which adds a common reference to the eigenimages.
24
Select the main eigenimages by inspecting each cycleXXX_<project>_eigenImage25-SUM-STD-mont_*.mrc in 3dmod and save the eigenimage numbers in Pca_coeffs. The eigenimages are numbered from 1 to <Pca_maxEigs>, counting from bottom left to top right by rows. For the Gag dataset, eigenimages with hexagonal lattice features can be selected, and eigenimages that display the missing-wedge effect are usually discarded. Each resolution band requires the same number of eigenimages to be selected; bands with too few suitable eigenimages can be padded with zeros. Set Pca_coeffs=[zeros(1,12);7:18;7:18] in param8.m.
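The zero-padding rule can be sketched as follows. The helpers are hypothetical; they pad each band's selection with trailing zeros to a common length and print the result as an explicit MATLAB-style matrix. Placing the zeros at the end of a row is an assumption made for this sketch.

```python
def build_pca_coeffs(selections):
    """Pad per-band eigenimage selections with zeros to a common length."""
    width = max(len(band) for band in selections)
    if width == 0:
        raise ValueError("at least one band needs a selected eigenimage")
    return [list(band) + [0] * (width - len(band)) for band in selections]


def to_matlab_matrix(rows):
    """Format rows as a MATLAB-style matrix literal, e.g. [0,0;7,8]."""
    return "[" + ";".join(",".join(str(v) for v in row) for row in rows) + "]"


if __name__ == "__main__":
    # No eigenimages kept in the first band; 7 to 18 in the other two.
    rows = build_pca_coeffs([[], range(7, 19), range(7, 19)])
    print("Pca_coeffs=" + to_matlab_matrix(rows))
```

For this selection the printed matrix has the same values as [zeros(1,12);7:18;7:18], written out element by element.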
25
Cluster the PCA results according to the selected eigenimages; this step groups the subtomograms into the numbers of classes given in Pca_clusters. Several numbers of classes can be tried in a single run.
Parameters:

Pca_clusters=[9 12 16]
%% different number of clusters

emClarity cluster <param> <cycle_nb>
emClarity cluster param8.m 8

This will use the Pca_coeffs and perform k-means clustering with 9, 12 and 16 target classes. The metadata will be updated and a text file <project>_cycleXXX_ClassIDX.txt listing the number of particles in each class will be generated.
26
Generate the class averages as a 3D montage. For the Gag dataset, we generated nine classes; the class averages are numbered from 1 to <Cls_className>, counting from bottom left to top right by rows (Fig. 3). Set Cls_classes_odd=[1:9;1.*ones(1,9)], in which the first row specifies the class IDs and the second row specifies the cyclic symmetry.
Parameters:

Cls_className=9
%% name of classes
Cls_classes_odd=[1:9;1.*ones(1,9)]
%% C1 symmetry for half map 1
Cls_classes_eve=[1:9;1.*ones(1,9)]
%% C1 symmetry for half map 2
symmetry=C1

Fig. 3: 3D classification and sampling function.
a,b, A montage of nine 3D classes in x–y (a) and x–z slices (b). c,d, 3DSFs of the corresponding classes in x–y (c) and x–z slices (d). Note 3DSF confirms that the classification is not biased by particle orientations, as different classes have similar sampling functions and are nearly isotropic in different orientations.

emClarity avg <param> <cycle_nb> Cluster_cls
emClarity avg param8.m 8 Cluster_cls

Troubleshooting
27
Inspect the class averages in 3dmod or UCSF Chimera.

3dmod cycle008_gag_class9_Cls_EVE.mrc

We classified the particles into nine classes (Fig. 3a,b). Seven of the nine classes (classes 1–7) show a clear hexagonal Gag lattice and were merged for further processing. It is generally informative to look at the sampling functions cycle008_gag_class9_Cls_EVE/ODD.Wgt to check whether the resulting classes have an isotropic sampling function and proper coverage of the defocus range (Fig. 3). Depending on the selection of eigenimages, the missing-wedge effect may dominate the classification, resulting in stretched structures. Create a new model point for each class to remove and save the model file as, for example, cycle008_remove.mod.
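When picking classes in a montage, it helps to translate between a tile's position and its class number. The sketch below implements the bottom-left-to-top-right, row-by-row numbering used in this protocol. The number of columns in the montage must be supplied by the user; the 3 × 3 layout in the example is an assumption for nine classes, and the function names are hypothetical.

```python
def index_to_tile(index, n_cols):
    """Return (row_from_bottom, col_from_left) for a 1-based montage index."""
    if index < 1 or n_cols < 1:
        raise ValueError("index and n_cols must be at least 1")
    return divmod(index - 1, n_cols)


def tile_to_index(row_from_bottom, col_from_left, n_cols):
    """Return the 1-based montage index of a tile."""
    if not 0 <= col_from_left < n_cols or row_from_bottom < 0:
        raise ValueError("tile lies outside the montage")
    return row_from_bottom * n_cols + col_from_left + 1


if __name__ == "__main__":
    # Assumed 3 x 3 montage of nine classes.
    for class_id in (1, 8, 9):
        row, col = index_to_tile(class_id, n_cols=3)
        print(f"class {class_id}: row {row} from bottom, column {col} from left")
```

In such a layout, classes 8 and 9 would be the middle and right tiles of the top row.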
28
Remove particles from the selected classes. STD refers to both the even and odd dataset.

emClarity geometry <param> <cycle_nb> Cluster_cls RemoveClasses <remove.mod> STD
emClarity geometry param8.m 8 Cluster_cls RemoveClasses cycle008_remove.mod STD

Subtomograms in the selected classes will be ignored in further analysis. The file cycle008_ClassMods_STD.txt records the classes and the number of subtomograms that have been removed. These numbers should correspond exactly to the class populations from the clustering (Step 25) listed in <project>_cycleXXX_ClassIDX.txt. If they do not, stop and make sure you followed the instructions from Step 24.
29
Skip the alignment for the current cycle, which prepares the metadata for the next cycle.

emClarity skip <param> <cycle_nb>
emClarity skip param8.m 8
30
Continue alignment and averaging cycles and tomoCPR (optional) as in Steps 15–21. Turn off the classification flag in these parameter files by setting flgClassify=0 and update Ali_samplingRate and Raw_angleSearch for each cycle. For the Gag project, we ran several cycles of alignment with each binned tomogram and ran tomoCPR at the end of alignment at each binning factor (bin3, bin2 and bin1). Refer to the command list (Supplementary Information 2) for a summary of all the cycles for the Gag project.

Final reconstruction

Timing ~2.5 h

31
For the final reconstruction, the two half datasets are combined. Recent versions of emClarity offer two options, using either the 3D subtomograms or their corresponding original 2D projections. To reconstruct from subtomograms, two half maps are reconstructed using avg as in Step 15, and the conical FSCs are calculated, as well as the transformation between the two maps. The subtomograms from the second group are re-extracted and aligned to the first group using this transformation. A final combined map is then generated by averaging all aligned subtomograms from both half sets; it is filtered using the calculated FSC and sharpened with various b-factors.
Parameters:
Fsc_bfactor=[10,25,75,100,250]

emClarity avg param19.m 19 RawAlignment
emClarity avg param19.m 19 FinalAlignment

This generates the final reconstruction map cycleXXX_<project>_class0_final_<b-factor>.mrc. If one wants to use external software (e.g., RELION⁴⁴, cisTEM⁴⁵, Bsoft⁴⁶) to apply different b-factors, masks or FSC weighting, one can take the raw half maps in the final cycle without FSC weighting FSC/cycleXXX_<project>_Raw-*Ali.mrc.
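To give a feel for what the different b-factors do, the sketch below evaluates the amplitude gain at a given resolution under a common convention, in which a sharpening b-factor of magnitude B (in Å²) multiplies Fourier amplitudes at spatial frequency 1/d by exp(B/(4d²)). This convention is an assumption made here for illustration; emClarity's own filter and sign convention are not described in this section and may differ in detail.

```python
import math


def sharpening_gain(b_factor, resolution):
    """Amplitude gain exp(B / (4 d^2)) at resolution d (Å) for sharpening B (Å^2)."""
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    return math.exp(b_factor / (4.0 * resolution ** 2))


if __name__ == "__main__":
    # b-factors from the parameter list above, evaluated at 5 Å and 10 Å.
    for b_factor in (10, 25, 75, 100, 250):
        gain_5 = sharpening_gain(b_factor, 5.0)
        gain_10 = sharpening_gain(b_factor, 10.0)
        print(f"B={b_factor}: gain {gain_5:.2f} at 5 Å, {gain_10:.2f} at 10 Å")
```

Larger b-factors boost high-resolution terms far more than low-resolution ones, which is why it is useful to compare several sharpened maps before choosing one for interpretation.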

Alternatively, the final reconstruction can also be calculated from the 2D particles using cisTEM, as implemented in the updated version of emClarity. In this case, emClarity reprojects the 3D coordinates of the particles. A cisTEM STAR file is created, containing parameters such as, for each particle and for each view of the tilt series, its x and y position, rotation, defocus, and pre- and post-exposure. cisTEM will then calculate an initial reconstruction using its reconstruct3d program, then refine it using refine3d (note that the angles are not refined) and then finally calculates the final reconstruction with reconstruct3d using this refinement. For this protocol, we set maximum exposure to 60 electrons to include only the images within this exposure and generated the final map as gag60e_refFilt_refined.mrc. The particleRadius is set to be equivalent to Ali_mRadius to reconstruct the final density map with the same area as alignment.

emClarity reconstruct <param> <cycle_nb> <prefix> <symmetry> <max_exposure>
emClarity reconstruct param18recon.m 18 gag60e C6 60

Troubleshooting

Troubleshooting advice can be found in Table 3.

Table 3 Troubleshooting table


Timing

The run time for each emClarity processing step is listed in Table 2. Note that these data processing times are for the Gag T8I dataset and vary with the size of the dataset, particle size, number of cycles, GPU model and other factors.

Steps 1–2, arrangement of input files and directories: ~30 min when using autoAlign
Steps 3–4, defocus estimate: ~25 min
Steps 5–7, define subregion boundaries: ~10 min
Steps 8–12, pick particles: ~1.5 h
Step 13, initialize the project: ~1 min
Step 14, reconstruct the tomograms for alignment and averaging: ~5 min, depending on the tomogram binning
Steps 15–18, STA and alignment: variable, depending on dataset size, particle size and binning
Steps 19–21, tilt-series refinement by tomoCPR: variable, depending on dataset size, particle size and binning and other factors
Steps 22–30, subtomogram classification: ~40 min, depending on dataset size, particle size, binning and other factors
Step 31, final reconstruction: ~2.5 h, depending on dataset size, particle size and binning

Anticipated results

We illustrate the protocol using four datasets: a wild-type Gag dataset (a subset of 5 tilt series) and a ribosome dataset (a subset of 12 tilt series) from EMPIAR (EMPIAR-10164 and EMPIAR-10304, respectively), a Gag T8I assembly dataset (5 tilt series) from a previous study⁴⁷ and a new apoferritin dataset (6 tilt series) collected in-house (Table 1).

HIV-1 Gag T8I spherical assemblies

A challenging non-single-particle dataset of HIV-1 Gag T8I immature spherical assemblies with overlapping densities, but no icosahedral symmetry, is illustrated in detail in this protocol. These assemblies were produced in Escherichia coli as part of a study aiming to resolve the extended six-helix bundle of HIV-1 Gag hexamer.

The per-tilt CTF estimation of the tilt series is consistent with the values expected from the experimental settings. After the template search, the convolution map reveals local peaks corresponding to each Gag hexamer. Most of the hexamers in the lattice are picked for further analysis; a small number of particles were found to be false positives (Fig. 2). Subtomograms from each subregion were assigned to the same half dataset to avoid mixing half sets that had overlapping peripheral density (fscGoldSplitOnTomos=1). STA and alignment were conducted using subtomograms binned at different factors (from 6× binned tomograms to 1× binned tomograms). After alignment was completed with each binned tomogram (except bin1), a tomoCPR tilt-series refinement was performed.

Since tomoCPR is an optional step and requires tuning of some parameters, we recommend that users working on a new STA project first run through iterative STA and alignment without tomoCPR.

A 3D classification was performed using bin4, which gave nine classes of images (Fig. 3). The classes display different features as shown in x–y and x–z slices (Fig. 3a,b), along with their corresponding overall 3DSFs in x–y and x–z slices (Fig. 3c,d). Classes 8 and 9 showed no clear Gag lattice (Fig. 3a,b); therefore, objects in these classes were removed from further processing. The sampling functions of the remaining classes reveal no preferential orientation, indicating that the 3D classification is not biased by the particle orientations in the raw tomogram.

Further iterative cycles of STA, alignment and tomoCPR were carried out. The final maps, generated using either subtomograms or 2D images with cisTEM, are shown in Fig. 4 along with their corresponding FSC plots. cisTEM reconstruction and refinement resulted in a higher-resolution density map (4.5 Å) compared with averaging from subtomograms (5.0 Å) (Fig. 4).

**Fig. 4: Subtomogram averages and conical FSC plots of HIV-1 Gag T8I assemblies (five tilt series).**

Wild-type Gag

We also reprocessed five published tilt series of wild-type Gag (EMPIAR-10164; TS_001, 003, 043, 045 and 054), which previously yielded a subtomogram-averaged map at 3.9 Å resolution¹⁹. The alignment procedure for this dataset is similar to that used for the Gag T8I dataset above, but does not include classification (Table 1 and Supplementary Information 2).

Given that the pixel size (1.35 Å) is slightly larger in this dataset, the iterative alignment step used in emClarity starts from bin4 tomograms and three rounds of tomoCPR were conducted at bin4, bin3 and bin2, respectively. The same alignment mask size Ali_mRadius=[116,116,72] encompassing seven hexamers as in the HIV-1 Gag T8I processing was used in the initial averaging/alignment steps. The size was changed to [88,88,72] in the last few iterations at bin1 to further improve the resolution. A final sixfold symmetrized map at a resolution of 3.3 Å was obtained, revealing clear side chains of Gag domains (Fig. 5).

**Fig. 5: STA of WT Gag (five tilt series, EMPIAR-10164).**

Ribosomes

The emClarity processing of the ribosome dataset of isolated single particles (EMPIAR-10304) is included in the software tutorial (https://github.com/ffyr2w/emClarity-tutorial) along with emClarity installation. The tilt series were aligned with the emClarity autoAlign function, and particles were picked through template search with bin6 tomograms. Subtomograms within the same subregion were split into two random halves since there is no overlap among them (fscGoldSplitOnTomos=0). The alignment and averaging were performed iteratively from bin5 to bin1, with one round of tomoCPR before the transition to each lower binning. The classification was performed at bin3 to remove junk particles (Fig. 6). Four resolution bands were used for 3D classification (pcaScaleSpace=[25,50,80,120]), and several different numbers of classes were tried (2, 3, 4, 6, 8, 14, 18), all of which resulted in classes with junk particles (~13.2%) and a small class (7.4%) containing only the large subunits (Fig. 6a). The final reconstruction and refinement with cisTEM resulted in a 7.0 Å resolution map, showing clear secondary structure elements such as RNA grooves and α-helices (Fig. 6b–d).

**Fig. 6: Subtomogram classification and averaging of ribosome (12 tilt series, EMPIAR-10304).**

Apoferritin

The final example is the apoferritin cryoET sample, which was prepared using a graphene-coated EM grid, yielding a mono-dispersed thin layer of apoferritin (Fig. 7a). Tilt series were collected using the parameters presented in Table 1, and the emClarity commands are included in Supplementary Information 2. Six tilt series were aligned with Etomo by patch tracking (no fiducial gold beads) and imported into emClarity. Octahedral symmetry was applied throughout alignment. The final STA map was obtained from <5,000 subtomograms, with 2.86 Å resolution, approaching the Nyquist frequency (2.68 Å) (Fig. 7b–d).

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The Gag dataset (five tilt series) and apoferritin dataset (six tilt series) have been deposited in the EMPIAR database under accession codes EMPIAR-10643 and EMPIAR-10787, respectively. The resulting final reconstructions have been deposited in EMDB under the following accession codes: Gag-T8I, EMD-13390; Gag-WT, EMD-13354; apoferritin, EMD-13271; and ribosome, EMD-13270.

Code availability

The emClarity software is freely available at https://github.com/bHimes/emClarity/wiki. The tutorial documentation is available at https://github.com/ffyr2w/emClarity-tutorial.

References

Zhang, P. Advances in cryo-electron tomography and subtomogram averaging and classification. Curr. Opin. Struct. Biol. 58, 249–258 (2019).
Article CAS PubMed PubMed Central Google Scholar
Kaplan, M. et al. In situ imaging and structure determination of biomolecular complexes using electron cryo-tomography. Methods Mol. Biol. 2215, 83–111 (2021).
Article CAS PubMed Google Scholar
Turk, M. & Baumeister, W. The promise and the challenges of cryo-electron tomography. FEBS Lett. 594, 3243–3261 (2020).
Article CAS PubMed Google Scholar
Mahamid, J. et al. Visualizing the molecular sociology at the HeLa cell nuclear periphery. Science 351, 969–972 (2016).
Article CAS PubMed Google Scholar
Forster, F. & Hegerl, R. Structure determination in situ by averaging of tomograms. Methods Cell Biol. 79, 741–767 (2007).
Article CAS PubMed Google Scholar
Bykov, Y. S. et al. The structure of the COPI coat determined within the cell. eLife https://doi.org/10.7554/eLife.32493 (2017).
Zhang, Y. et al. Molecular architecture of the luminal ring of the Xenopus laevis nuclear pore complex. Cell Res. 30, 532–540 (2020).
Article CAS PubMed PubMed Central Google Scholar
Pfeffer, S. et al. Structure of the native Sec61 protein-conducting channel. Nat. Commun. 6, 8403 (2015).
Article CAS PubMed Google Scholar
Cassidy, C. K. et al. CryoEM and computer simulations reveal a novel kinase conformational switch in bacterial chemotaxis signaling. eLife https://doi.org/10.7554/eLife.08419 (2015).
Dodonova, S. O., Prinz, S., Bilanchone, V., Sandmeyer, S. & Briggs, J. A. G. Structure of the Ty3/Gypsy retrotransposon capsid and the evolution of retroviruses. Proc. Natl Acad. Sci. USA 116, 10048–10057 (2019).
Article CAS PubMed PubMed Central Google Scholar
Mattei, S., Glass, B., Hagen, W. J., Krausslich, H. G. & Briggs, J. A. The structure and flexibility of conical HIV-1 capsids determined within intact virions. Science 354, 1434–1437 (2016).
Article CAS PubMed Google Scholar
Dick, R. A. et al. Structures of immature EIAV Gag lattices reveal a conserved role for IP6 in lentivirus assembly. PLoS Pathog. 16, e1008277 (2020).
Article PubMed PubMed Central Google Scholar
Schur, F. K. et al. An atomic model of HIV-1 capsid-SP1 reveals structures regulating assembly and maturation. Science 353, 506–508 (2016).
Article CAS PubMed Google Scholar
Qu, K. et al. Structure and architecture of immature and mature murine leukemia virus capsids. Proc. Natl Acad. Sci. USA 115, E11751–E11760 (2018).
Article CAS PubMed PubMed Central Google Scholar
von Kugelgen, A. et al. In situ structure of an intact lipopolysaccharide-bound bacterial surface layer. Cell 180, 348–358 e315 (2020).
Article Google Scholar
Tegunov, D., Xue, L., Dienemann, C., Cramer, P. & Mahamid, J. Multi-particle cryo-EM refinement with M visualizes ribosome-antibiotic complex at 3.5 A in cells. Nat. Methods 18, 186–193 (2021).
Article CAS PubMed PubMed Central Google Scholar
Lucic, V., Rigort, A. & Baumeister, W. Cryo-electron tomography: the challenge of doing structural biology in situ. J. Cell Biol. 202, 407–419 (2013).
Article CAS PubMed PubMed Central Google Scholar
Wan, W. & Briggs, J. A. Cryo-electron tomography and subtomogram averaging. Methods Enzymol. 579, 329–367 (2016).
Article CAS PubMed Google Scholar
Turonova, B., Schur, F. K. M., Wan, W. & Briggs, J. A. G. Efficient 3D-CTF correction for cryo-electron tomography using NovaCTF improves subtomogram averaging resolution to 3.4A. J. Struct. Biol. 199, 187–195 (2017).
Article CAS PubMed PubMed Central Google Scholar
Heumann, J. M., Hoenger, A. & Mastronarde, D. N. Clustering and variance maps for cryo-electron tomography using wedge-masked differences. J. Struct. Biol. 175, 288–299 (2011).
Article PubMed PubMed Central Google Scholar
Nicastro, D. et al. The molecular architecture of axonemes revealed by cryoelectron tomography. Science 313, 944–948 (2006).
Article CAS Google Scholar
Chen, M. et al. Convolutional neural networks for automated annotation of cellular cryo-electron tomograms. Nat. Methods 14, 983–985 (2017).
Article CAS PubMed PubMed Central Google Scholar
Galaz-Montoya, J. G., Flanagan, J., Schmid, M. F. & Ludtke, S. J. Single particle tomography in EMAN2. J. Struct. Biol. 190, 279–290 (2015).
Article CAS PubMed PubMed Central Google Scholar
Galaz-Montoya, J. G. et al. Alignment algorithms and per-particle CTF correction for single particle cryo-electron tomography. J. Struct. Biol. 194, 383–394 (2016).
Article CAS PubMed PubMed Central Google Scholar
Bharat, T. A. & Scheres, S. H. Resolving macromolecular structures from electron cryo-tomography data using subtomogram averaging in RELION. Nat. Protoc. 11, 2054–2065 (2016).
Article CAS PubMed PubMed Central Google Scholar
Bharat, T. A. M., Russo, C. J., Lowe, J., Passmore, L. A. & Scheres, S. H. W. Advances in single-particle electron cryomicroscopy structure determination applied to sub-tomogram averaging. Structure 23, 1743–1753 (2015).
Article CAS PubMed PubMed Central Google Scholar
Castano-Diez, D., Kudryashev, M., Arheit, M. & Stahlberg, H. Dynamo: a flexible, user-friendly development tool for subtomogram averaging of cryo-EM data in high-performance computing environments. J. Struct. Biol. 178, 139–151 (2012).
Article PubMed Google Scholar
Maurer, U. E. et al. The structure of herpesvirus fusion glycoprotein B-bilayer complex reveals the protein–membrane and lateral protein–protein interaction. Structure 21, 1396–1405 (2013).
Article CAS PubMed PubMed Central Google Scholar
Forster, F., Pruggnaller, S., Seybert, A. & Frangakis, A. S. Classification of cryo-electron sub-tomograms using constrained correlation. J. Struct. Biol. 161, 276–286 (2008).
Article PubMed Google Scholar
Hrabe, T. et al. PyTom: a python-based toolbox for localization of macromolecules in cryo-electron tomograms and subtomogram analysis. J. Struct. Biol. 178, 177–188 (2012).
Article CAS PubMed Google Scholar
Winkler, H. 3D reconstruction and processing of volumetric data in cryo-electron tomography. J. Struct. Biol. 157, 126–137 (2007).
Article CAS PubMed Google Scholar
Himes, B. A. & Zhang, P. emClarity: software for high-resolution cryo-electron tomography and subtomogram averaging. Nat. Methods 15, 955–961 (2018).
Article CAS PubMed PubMed Central Google Scholar
Liu, C. et al. The architecture of inactivated SARS-CoV-2 with postfusion spikes revealed by cryo-EM and cryo-ET. Structure 28, 1218–1224 e1214 (2020).
Article CAS PubMed PubMed Central Google Scholar
Watanabe, R. et al. The in situ structure of Parkinson’s disease-linked LRRK2. Cell 182, 1508–1518 e1516 (2020).
Article CAS PubMed PubMed Central Google Scholar
Sutton, G. et al. Assembly intermediates of orthoreovirus captured in the cell. Nat. Commun. 11, 4445 (2020).
Article CAS PubMed PubMed Central Google Scholar
Tan, T. Y. et al. Capsid protein structure in Zika virus reveals the flavivirus assembly process. Nat. Commun. 11, 895 (2020).
Article CAS PubMed PubMed Central Google Scholar
Unchwaniwala, N. et al. Subdomain cryo-EM structure of nodaviral replication protein A crown complex provides mechanistic insights into RNA genome replication. Proc. Natl Acad. Sci. USA 117, 18680–18691 (2020).
Article CAS PubMed PubMed Central Google Scholar
Gibson, K. H. et al. An asymmetric sheath controls flagellar supercoiling and motility in the leptospira spirochete. eLife https://doi.org/10.7554/eLife.53672 (2020).
Cassidy, C. K. et al. Structure and dynamics of the E. coli chemotaxis core signaling complex by cryo-electron tomography and molecular simulations. Commun. Biol. 3, 24 (2020).
Article CAS PubMed PubMed Central Google Scholar
Rohou, A. & Grigorieff, N. CTFFIND4: fast and accurate defocus estimation from electron micrographs. J. Struct. Biol. 192, 216–221 (2015).
Scheres, S. H. & Chen, S. Prevention of overfitting in cryo-EM structure determination. Nat. Methods 9, 853–854 (2012).
Rosenthal, P. B. & Henderson, R. Optimal determination of particle orientation, absolute hand, and contrast loss in single-particle electron cryomicroscopy. J. Mol. Biol. 333, 721–745 (2003).
Mastronarde, D. N. & Held, S. R. Automated tilt series alignment and tomographic reconstruction in IMOD. J. Struct. Biol. 197, 102–113 (2017).
Scheres, S. H. RELION: implementation of a Bayesian approach to cryo-EM structure determination. J. Struct. Biol. 180, 519–530 (2012).
Grant, T., Rohou, A. & Grigorieff, N. cisTEM, user-friendly software for single-particle image processing. eLife https://doi.org/10.7554/eLife.35383 (2018).
Heymann, J. B. Guidelines for using Bsoft for high resolution reconstruction and validation of biomolecular structures from electron micrographs. Protein Sci. 27, 159–171 (2018).
Mendonça, L. et al. CryoET structures of immature HIV Gag reveal six-helix bundle. Commun. Biol. 4, 481 (2021).

Acknowledgements

We are grateful to Y. Zhu for discussion and critical reading of the manuscript. We acknowledge Diamond for access to and support of the CryoEM facilities at the UK national Electron Bio-Imaging Centre (eBIC, proposal CM26464), funded by the Wellcome Trust, Medical Research Council (MRC) and Biotechnology and Biological Sciences Research Council (BBSRC). The computational aspects of this research were supported by the Wellcome Trust Core Award grant number 203141/Z/16/Z and the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre (BRC). This work was supported by the National Institutes of Health grant AI150481, the UK Wellcome Trust Investigator Award 206422/Z/17/Z, the UK Biotechnology and Biological Sciences Research Council grant BB/S003339/1, and the European Research Council Advanced Grant (ERC AdG) 101021133.

Author information

These authors contributed equally: Tao Ni, Thomas Frosio, Luiza Mendonça.

Authors and Affiliations

Division of Structural Biology, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
Tao Ni, Thomas Frosio, Luiza Mendonça & Peijun Zhang
Diamond Light Source, Harwell Science and Innovation Campus, Didcot, UK
Thomas Frosio, Yuewen Sheng, Daniel Clare & Peijun Zhang
Howard Hughes Medical Institute, RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, MA, USA
Benjamin A. Himes

Contributions

P.Z. conceived the research and designed the experiments. Y.S. prepared the apoferritin on graphene grids, and D.C. collected data. T.N., L.M. and Y.S. performed tomographic reconstruction, STA and classification. T.F. wrote the emClarity tutorial. B.A.H. and T.F. updated code/binaries with new features in later versions of emClarity. T.N. and P.Z. wrote the manuscript with support from all the authors.

Corresponding author

Correspondence to Peijun Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Protocols thanks Peter J. Peters and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

About this article

Cite this article

Ni, T., Frosio, T., Mendonça, L. et al. High-resolution in situ structure determination by cryo-electron tomography and subtomogram averaging using emClarity. Nat Protoc 17, 421–444 (2022). https://doi.org/10.1038/s41596-021-00648-5

Received: 23 February 2021
Accepted: 08 October 2021
Published: 12 January 2022
Issue Date: February 2022
DOI: https://doi.org/10.1038/s41596-021-00648-5

*Parameters:*
Raw_angleSearch=[16,4,0,0];	%% in param1.m
Raw_angleSearch=[0,0,9,3];	%% in param2.m

*Parameters:*
pcaScaleSpace=[10,20,40]	%% any number of band-pass filters can be selected, though three is typically sufficient
Pca_maxEigs=25	%% maximum number of eigenimages to save

*Parameters:*
Pca_clusters=[9 12 16]	%% different number of clusters

*Parameters:*
Cls_className=9	%% name of classes
Cls_classes_odd=[1:9;1.*ones(1,9)]	%% C1 symmetry for half map 1
Cls_classes_eve=[1:9;1.*ones(1,9)]	%% C1 symmetry for half map 2
symmetry=C1

*Parameters:*
Fsc_bfactor=[10,25,75,100,250]
