Betacoronavirus S proteins are processed into S1 and S2 subunits by host proteases5. Like other class I viral fusion proteins, the two subunits trimerize and fold into a metastable pre-fusion conformation. The S1 subunit is responsible for receptor binding, while the S2 subunit mediates membrane fusion. Coronaviruses typically possess two domains within S1 capable of binding to host receptors: an amino (N)-terminal domain (NTD) and a carboxy (C)-terminal domain (CTD), with the latter recognizing protein receptors for SARS-CoV and MERS-CoV6,7. Although these individual domains have been structurally characterized, the organization of the complete spike has not yet been determined, preventing a mechanistic understanding of S protein function.

Here, we present the structure of the HKU1 S protein ectodomain determined using cryo-electron microscopy (cryo-EM) to 4.0 Å resolution (Fig. 1a and Extended Data Figs 1 and 2 and Extended Data Table 1). The protein construct contains a C-terminal T4 fibritin trimerization motif and a mutated S1/S2 furin-cleavage site (Extended Data Fig. 3). The S1 subunit adopts an extended conformation with short linkers between domains and sub-domains (Fig. 1b). The S1 NTD (amino acids 14–297) has strong structural and sequence homology to the bovine coronavirus (BCoV) S1 NTD (Extended Data Fig. 4), which recognizes acetylated sialic acids on glycosylated cell-surface receptors8. The glycan-binding site in the BCoV S1 NTD is conserved in the HKU1 S1 NTD and is located at the apex of the trimer, oriented towards target cells. Indeed, HKU1 S1 was recently shown to bind O-acetylated sialic acids on host cells, and these glycans were required for efficient infection of primary human airway epithelial cultures9.

Figure 1: Structure of the HKU1 pre-fusion spike ectodomain.

a, A single protomer of the trimeric S protein is shown in cartoon representation coloured as a rainbow from the N to C terminus (blue to red) with the reconstructed EM density of remaining protomers shown in white and grey. b, The S1 subunit is composed of the NTD and CTD as well as two sub-domains (SD-1 and SD-2). The S2 subunit contains the coronavirus fusion machinery and is primarily α-helical. c, Domain architecture of the HKU1 S protein coloured as in a.

The HKU1 S1 CTD (amino acids 325–605) consists of a structurally conserved core connected to a large, variable loop (HKU1 S amino acids 428–587)10 that is partially disordered (Extended Data Figs 5 and 6). The CTD is located at the trimer apex close to the threefold axis, and the core interacts with the other two S1 CTD cores and with one NTD from an adjacent protomer. The domain swapping between protomers results in a woven appearance when viewed looking down towards the viral membrane (Fig. 2a). Structural alignment of the SARS-CoV and MERS-CoV CTD–receptor complexes11,12 with the HKU1 pre-fusion S protein reveals that the protein-receptor-binding surface of the S1 CTD is buried in the HKU1 S protein trimer and is therefore incapable of making equivalent interactions without some initial breathing and transient exposure of these domains (Fig. 2b). Although a protein receptor has not yet been identified for HKU1, antibodies against the CTD, but not those against the NTD, blocked HKU1 infection of cells13. These data suggest that the S1 CTD is the primary HKU1 receptor-binding site13, whereas the NTD mediates initial attachment via glycan binding.

Figure 2: Architecture of the HKU1 S1 subunit.

a, EM density corresponding to each S1 protomer is shown. The putative glycan-binding and protein-receptor-binding sites are indicated with dashed shapes on the NTD and CTD, respectively. b, The HKU1 S1 CTD forms quaternary interactions with an adjacent CTD using a surface similar to that used by SARS CTD to bind its receptor, ACE2 (ref. 11). c, Sub-domain 1 is composed of amino acid residues before and after the S1 CTD. d, Sub-domain 2 is composed of S1 sequence C-terminal to the CTD, a short peptide following the NTD, and the N-terminal strand of S2, which follows the S1/S2 furin-cleavage site.

HKU1 S1 also contains two sub-domains (which we term SD-1 and SD-2) that lack significant homology to previously determined structures (Fig. 2c, d). These sub-domains are primarily composed of S1 amino acid sequences following the CTD. However, stretches of amino acids preceding the CTD as well as S2 residues adjacent to the S1/S2 cleavage site also contribute to the sub-domains. This complex folding of elements dispersed throughout the primary sequence may allow receptor-induced conformational changes in the CTD to be transmitted to other parts of the structure.

In contrast to other viral fusion proteins such as influenza haemagglutinin (HA)14 or HIV-1 envelope (Env)15,16, the HKU1 S1 subunits are rotated about the trimeric threefold axis with respect to the S2 subunits, causing the S1 subunit from one protomer to sit above the S2 subunit of an adjacent protomer (Extended Data Fig. 7). Similar to HA and Env, a region in the HKU1 S1 CTD (amino acids 371–380) caps the S2 central helix, thereby preventing the fusion machinery from springing into action.

Processing of coronavirus S proteins by host proteases plays a critical role in the entry process5. HKU1 S is cleaved by furin into S1 and S2 subunits during protein biosynthesis. Though mutated in the protein construct used here and disordered in the density map, the HKU1 S furin-cleavage site at the S1/S2 junction lies in a loop of SD-2 (Fig. 3 and Extended Data Fig. 6). Furin cleavage would leave a single S2 β-strand participating in the SD-2 β-sheets (Fig. 2d). Coronavirus S proteins also have a secondary cleavage site, termed S2′ (Arg900)5, adjacent to the viral fusion peptide (amino acids 901–918)17 (Fig. 3b and Extended Data Fig. 6). This is similar to the multiple endoproteolytic cleavage events that occur in the fusion proteins of respiratory syncytial virus (RSV) and Ebola virus18,19. Protease cleavage at S2′ likely follows S1/S2 cleavage and may not occur until host-receptor engagement at the plasma membrane or viral endocytosis5.

Figure 3: HKU1 S2 subunit fusion machinery.

a, The HKU1 S2 subunit is coloured like a rainbow from the N-terminal β-strand (blue), which participates in S1 sub-domain 2, to the C terminus (red) before HR2. b, The HKU1 S2 structure contains the fusion peptide (FP) and a heptad repeat (HR1). Protease-recognition sites are indicated within disordered regions of the protein (dashed lines). c, A comparison of coronavirus S2 HR1 in the pre- and post-fusion22 conformations. Five HR1 α-helices are labelled and coloured like a rainbow from blue to red, N to C terminus, respectively. The structures are oriented to position similar portions of the central helix (red).

As in all class I viral fusion proteins, the coronavirus S2 subunit contains the four elements required for membrane fusion: a fusion peptide or loop, two heptad repeats (HR1 and HR2), and a transmembrane domain14,20,21. Refolding of HR1 into a long α-helix thrusts the fusion peptide into the host-cell membrane, and as the two heptad repeats interact to form a coiled-coil, the host and viral membranes are brought together. The fusion peptide, conserved among coronavirus S proteins17 (Extended Data Fig. 6), is located on the exterior of the HKU1 S protein and is adjacent to the putative S2′ cleavage site, which remains uncleaved in our structure. The fusion peptide forms a short helix and a loop, with most of the hydrophobic amino acids buried in an interface with other elements of S2. Unlike influenza HA where the C terminus of the fusion peptide is only 14 amino acids away from the N terminus of HR1, the fusion peptide of HKU1 S is 60 amino acids away from HR1. This span of protein contains four short α-helices and several longer regions lacking regular secondary structure. This intervening sequence is also buried beneath SD-2 and the S2′ cleavage site, suggesting that cleavage may affect the proclivity of S2 for undergoing the transition to the post-fusion conformation.

Coronavirus S protein heptad repeats are unusually large with HR1 encompassing more than 90 amino acids20. In the cryo-EM structure, HR2 is located at the base of the HKU1 S protein near the viral membrane, but is poorly ordered, precluding unambiguous assignment of the residues. However, HR1 is well ordered and arranged along the length of the S2 subunit, forming four short helices and part of the central three-helix bundle. This arrangement of HR1 is similar to that of influenza HA, although in HA the HR1 is organized as two helices connected by a long loop14. Conversion of influenza HA to the post-fusion conformation requires these protein elements to transition into a single long α-helix21. The post-fusion six-helix bundle structures of SARS-CoV and MERS-CoV S2 heptad repeats22,23 reveal that coronavirus S proteins also undergo a similar transition (Fig. 3c). However, the S protein must carry out five such loop-to-helix transitions, highlighting the complexity of S proteins relative to other class I fusion proteins. In addition, the membrane-distal regions of the pre-fusion S2 central three-helix bundle (S2 amino acids 1070–1076), which form the C-terminal portion of HR1, are splayed outwards from the threefold axis (Extended Data Fig. 7). In the available coronavirus post-fusion HR1–HR2 structures, this portion of HR1 forms a tight three-helix bundle22,23. Formation of this three-helix bundle may be prevented by interactions between the C-terminal end of the S2 HR1 and the S1 CTD, and thus disruption of these interactions through receptor-induced conformational changes would provide an additional means by which receptor binding in S1 can initiate S2-mediated membrane fusion. Indeed, protease cleavage and an acidic pH are thought to be insufficient to trigger the transition to the post-fusion conformation without additional destabilization provided by receptor binding24,25,26.

The formation of anti-parallel six-helix bundles composed of HR1 and HR2 in the post-fusion conformation is a unifying feature of class I viral fusion proteins. However, the pre-fusion conformations of this protein family are incredibly diverse in size and topology (Extended Data Fig. 8). The HKU1 S protein structure presented here most closely resembles influenza virus HA and HIV-1 Env (Fig. 4), which also have receptor-binding subunits that cap the central helix of the fusion subunit14,15,27,28. However, some core elements of the fusion machinery are conserved amongst all class I fusion proteins, including paramyxovirus F proteins.

Figure 4: Comparison of structurally related class I viral fusion proteins.

The fusion proteins from coronaviruses, influenza virus and HIV-1 are cleaved into receptor-binding subunits (pink, light green, light blue) and the viral fusion machinery (dark red, dark green, blue)14,15,16,28. Comparison to other class I fusion proteins can be found in Extended Data Fig. 8.

The HCoV-HKU1 S protein trimer in a pre-fusion conformation is, to our knowledge, the largest class I viral fusion glycoprotein structure determined to date (Fig. 4 and Extended Data Figs 8 and 9). Since betacoronavirus S proteins are similar in size and have a conserved domain organization, our findings should be generally applicable to other betacoronaviruses, including SARS-CoV and MERS-CoV (Extended Data Fig. 6). Our studies provide a structural basis for S protein function wherein the pre-fusion S protein is progressively matured and destabilized by receptor binding and protease cleavage. Following dissociation of the S1 subunits, HR1 would transition to a long α-helix, and the fusion peptide would be released from the side of the S2 subunit and inserted into host membranes. The structure and mechanistic insights presented here should enable engineering of pre-fusion stabilized coronavirus S proteins as vaccine immunogens against current and emerging betacoronaviruses, similar to recent efforts for other viral fusion proteins29,30. This work also acts as a springboard for future studies to define mechanisms of antibody recognition and neutralization, which will lead to an improved understanding of coronavirus immunity.


Data reporting

No statistical methods were used to predetermine sample size. The investigators were not blinded to allocation during experiments and outcome assessment.

Protein expression and purification

A mammalian-codon-optimized gene encoding HKU1 S (isolate N5, NCBI accession Q0ZME7) residues 1–1276 with a C-terminal T4 fibritin trimerization domain, an HRV3C cleavage site, and a 6xHis-tag was synthesized and subcloned into the eukaryotic expression vector pVRC8400. The S1/S2 furin-recognition site 752-RRKRR-756 was mutated to GGSGS to generate the uncleaved construct used for cryo-EM studies. Three hours after this plasmid was transfected into FreeStyle 293-F cells (Invitrogen), kifunensine was added to a final concentration of 5 μM. FreeStyle 293-F cells are a high-transfection-efficiency cell line adapted for suspension culture and derived from low-passage clonal cultures; they were not further authenticated after purchase. Cells were not confirmed to be free of mycoplasma, but were only used for protein expression. Cultures were harvested after six days, and protein was purified from the medium using Ni-NTA Superflow resin (Qiagen). The buffer was then exchanged using a HiPrep 26/10 desalting column (GE Healthcare Biosciences) from a high-imidazole elution buffer to a low-pH buffer (20 mM Bis-Tris pH 6.5, 150 mM NaCl). Afterward, endoglycosidase H (EndoH) (10% w/w) and HRV3C protease (1% w/w) were added to the protein and the reaction was incubated overnight at 4 °C. The digested protein was further purified using a Superose 6 16/70 column (GE Healthcare Biosciences).
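The furin-site substitution described above can be illustrated with a short Python sketch. It is only an illustration: the helper name `mutate_site` and the toy sequence are hypothetical, and the real HKU1 S sequence would have to be taken from its database record. Residue numbers are 1-based, as in the text.

```python
def mutate_site(seq: str, start: int, wild_type: str, replacement: str) -> str:
    """Replace `wild_type`, beginning at 1-based residue `start`, with `replacement`.

    Raises ValueError if the sequence does not carry the expected wild-type
    motif at that position, which guards against off-by-one numbering errors.
    """
    i = start - 1  # convert 1-based residue number to 0-based string index
    found = seq[i:i + len(wild_type)]
    if found != wild_type:
        raise ValueError(f"expected {wild_type} at residue {start}, found {found!r}")
    return seq[:i] + replacement + seq[i + len(wild_type):]


# Toy sequence (NOT the real HKU1 S protein): filler residues with the
# furin-recognition motif placed at positions 752-756.
toy = "A" * 751 + "RRKRR" + "A" * 20
mutant = mutate_site(toy, 752, "RRKRR", "GGSGS")
print(mutant[751:756])
```

Checking the wild-type motif before substituting is a design choice made for this sketch; it makes a numbering mismatch fail loudly instead of silently mutating the wrong residues.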

The furin-cleaved HKU1 S construct analysed by negative-stain EM was similar to the one described above except that it encoded residues 1–1249 and contained the wild-type RRKRR furin-recognition site. Expression and purification were also similar, except that a plasmid expressing furin was co-transfected into the FreeStyle 293-F cells to ensure complete processing of the protein.

Sample preparation for negative-stain electron microscopy

HKU1 S proteins were placed directly onto 400-mesh copper grids and then stained with 1% uranyl formate. Tris-buffered saline (TBS) was used as the diluent when dilution was necessary.

Negative-stain electron microscopy data collection

Grids were loaded into a Tecnai T12 Spirit operating at 120 keV and imaged using a Tietz TemCam-F416 CMOS at 52,000 × magnification at ~1.5 μm under focus. Micrographs were collected using Leginon31 and processed within Appion32. Particles were picked using a difference-of-Gaussians approach33 and aligned using reference-free 2D classification employing iterative multi-reference alignment/multivariate statistical analysis (MRA/MSA) using a binning factor of 2 to remove amorphous particles34. Particles in classes that did not represent views of HKU1 S proteins were discarded. ISAC35 was used to generate a template stack from which initial 3D models were generated using the EMAN2 (ref. 36) procedure. 3D models were refined using EMAN1 (ref. 37).

Sample preparation for cryo-electron microscopy

Sample solution (3 μl) was applied to the carbon face of a CF-2/2-4C C-Flat grid (Electron Microscopy Sciences, Protochips) that had been plasma cleaned for five seconds using a mixture of Ar/O2 (Gatan Solarus 950 Plasma system). The grid was then manually blotted and immediately plunged into liquid ethane using a manual freeze plunger.

Cryo-electron microscopy data collection

Movies were collected via the Leginon interface on a FEI Titan Krios operating at 300 keV mounted with a Gatan K2 direct-electron detector31. Each movie was collected in counting mode at 22,500 × nominal magnification resulting in a calibrated pixel size of 1.31 Å/pix at the object level. A dose rate of ~10 e−/((camera pixel) × s) was used; exposure time was 200 ms per frame. The data collection resulted in a total of 1,049 movies containing 50 frames each. Total dose per movie was 57 e−/Å². Data were collected at 1.0 to 3.5 μm under focus.
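The relationship between the dose rate, frame time, frame count and pixel size quoted above can be checked with a few lines of Python. This is a rough sketch that assumes one camera pixel corresponds to one calibrated image pixel (no binning or super-resolution sampling); the function name is illustrative. With the nominal rate of 10 e−/pixel/s the result comes out close to 58 e−/Å², consistent with the reported 57 e−/Å² given that the rate is stated only approximately.

```python
def total_dose_per_area(rate_e_per_pixel_s, frame_time_s, n_frames, pixel_size_angstrom):
    """Accumulated dose in electrons per square angstrom for one movie."""
    exposure_s = frame_time_s * n_frames                   # total exposure time
    electrons_per_pixel = rate_e_per_pixel_s * exposure_s  # dose per camera pixel
    return electrons_per_pixel / pixel_size_angstrom ** 2  # convert pixel area to A^2


# Values from the text: ~10 e-/pixel/s, 200 ms frames, 50 frames, 1.31 A/pixel.
dose = total_dose_per_area(10.0, 0.2, 50, 1.31)
print(f"{dose:.1f} e-/A^2")
```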

Cryo-electron microscopy data processing

Frames in each movie were aligned38, and CTF estimation was carried out using CTFFIND3 (ref. 39). Particles were picked from a subset of the data employing a difference-of-Gaussians approach33 and aligned using reference-free 2D classification employing iterative MRA/MSA using a binning factor of two34. The resulting 2,188 particles were used to generate an initial 25 Å lowpass-filtered 3D reconstruction using EMAN2. SPIDER refproj.spi (ref. 40) with a delta theta angle of 15 degrees was used to generate 83 projection images of the initial 3D reconstruction. These projection images were used as templates for picking particles from the entire cryo data set. Particles from the entire data set were aligned and classified with the same methods used for the subset of particles stated above. After 2D classification, unbinned selected particles were symmetrically refined in RELION version 1.3 (refs 41, 42) against the initial 3D reconstruction filtered to 60 Å resolution. This refinement was followed by particle polishing and refinement of the resulting realigned, B-factor-weighted and signal-integrated particles using RELION version 1.4b1. The resolution of the final map was 4.04 Å at an FSC cutoff of 0.143. A mask was generated in RELION using a threshold that accounted for the entire structure. From this threshold, the mask was further dilated by 3 voxels and a Gaussian fall-off was generated over an additional 6 voxels. The effect of the mask on the FSC was taken into account: phases were randomized in the unfiltered half-set maps beyond the resolution at which the initial FSC fell below 0.8, and a new FSC between these phase-randomized maps was generated and used to correct for mask effects in the final FSC-based resolution estimate. The reported resolution of 4.04 Å is the RELION CorrelationCorrected value.
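The mask correction described above is commonly expressed as FSC_corrected = (FSC_masked − FSC_randomized) / (1 − FSC_randomized), applied to the shells in which phases were randomized. The Python sketch below illustrates that formula and the 0.143 cutoff on made-up numbers. It is not the RELION implementation; the function names, the shell values and the choice of the first randomized shell are all hypothetical.

```python
def correct_fsc(fsc_masked, fsc_randomized, first_randomized_shell):
    """Subtract the mask-induced correlation measured with phase-randomized maps."""
    corrected = []
    for shell, (fm, fr) in enumerate(zip(fsc_masked, fsc_randomized)):
        if shell < first_randomized_shell or fr >= 1.0:
            corrected.append(fm)  # shells left untouched by randomization
        else:
            corrected.append((fm - fr) / (1.0 - fr))
    return corrected


def resolution_at_threshold(fsc, shell_resolutions, threshold=0.143):
    """Resolution of the last shell before the FSC first drops below threshold."""
    best = None
    for value, res in zip(fsc, shell_resolutions):
        if value < threshold:
            break
        best = res
    return best


# Hypothetical toy curves, NOT data from this study.
shell_res = [20.0, 10.0, 6.0, 5.0, 4.5, 4.0, 3.5]       # shell resolutions in angstrom
fsc_masked = [1.0, 0.98, 0.90, 0.60, 0.35, 0.20, 0.10]  # masked half-map FSC
fsc_random = [1.0, 0.98, 0.90, 0.10, 0.08, 0.06, 0.05]  # FSC after phase randomization
corrected = correct_fsc(fsc_masked, fsc_random, first_randomized_shell=3)
print(resolution_at_threshold(corrected, shell_res))
```

In the toy curves the randomized FSC stays well above zero at high resolution, which is the signature of correlation introduced by the mask and not by the structure; the correction removes that contribution before the cutoff is applied.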

The map was B-factor sharpened employing FSC-weighting. The B-factor was estimated in RELION based on the resolution range from 10 Å to 2.62 Å (B-factor = −117 Å²). The detector MTF file was provided to RELION.
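Two numbers in this paragraph can be related to earlier values with simple arithmetic. The 2.62 Å upper limit equals the Nyquist limit for a 1.31 Å pixel, and a negative B-factor boosts high-resolution amplitudes. The sketch below assumes the conventional temperature-factor form exp(−B/(4d²)) for the amplitude scale at resolution d; how this is combined with FSC weighting in practice is not shown, and the function names are illustrative.

```python
import math

PIXEL_SIZE = 1.31   # angstrom per pixel, from the data-collection section
B_FACTOR = -117.0   # square angstrom, as reported


def nyquist_resolution(pixel_size):
    """Finest resolvable spacing for a given sampling: two pixels."""
    return 2.0 * pixel_size


def sharpening_factor(b_factor, resolution):
    """Amplitude scale exp(-B / (4 d^2)) at resolution d (angstrom)."""
    return math.exp(-b_factor / (4.0 * resolution ** 2))


print(round(nyquist_resolution(PIXEL_SIZE), 2))
for d in (10.0, 4.04, 2.62):
    # With a negative B-factor the scale grows as the resolution gets finer.
    print(d, round(sharpening_factor(B_FACTOR, d), 2))
```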

Model building and refinement

An initial model of the S1 NTD was generated using the Modeller43 homology modelling tool in UCSF Chimera44 with the BCoV NTD (PDB 4H14)8 as a template. The NTD homology model was docked into the HKU1 S protein EM density and refined with Rosetta density-guided iterative local refinement45 while imposing C3 symmetry. Rosetta output models were clustered based on pairwise r.m.s.d. using a cluster radius of 2.15 Å. The lowest energy model from the largest cluster was selected for additional refinement. This model and the conserved CTD core from SARS-CoV (PDB 2AJF)11 were used as starting structures for model building and refinement. These starting models and the remaining HKU1 protein sequence were modelled manually using COOT46 and refined using RosettaRelax47. Structures were evaluated using EMRinger48 and Molprobity49. Figures were produced in the PyMol50 or UCSF Chimera44 software packages.
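The selection rule described above (cluster the output models by pairwise r.m.s.d., then take the lowest-energy member of the largest cluster) can be sketched as follows. This is a simplified, hypothetical stand-in and not Rosetta's clustering code: it treats a cluster as one centre model plus every model within the radius of it, and the r.m.s.d. matrix and energies are invented toy values.

```python
def largest_cluster(rmsd, radius):
    """Return indices of the centre with the most neighbours within `radius`,
    together with those neighbours. `rmsd` is a symmetric pairwise matrix."""
    n = len(rmsd)
    best = []
    for centre in range(n):
        members = [j for j in range(n) if rmsd[centre][j] <= radius]
        if len(members) > len(best):
            best = members
    return best


def pick_model(rmsd, energies, radius=2.15):
    """Index of the lowest-energy model inside the largest cluster."""
    cluster = largest_cluster(rmsd, radius)
    return min(cluster, key=lambda i: energies[i])


# Toy data: models 0-2 are mutually close, models 3-4 form a smaller group.
toy_rmsd = [
    [0.0, 1.0, 1.0, 5.0, 5.0],
    [1.0, 0.0, 1.0, 5.0, 5.0],
    [1.0, 1.0, 0.0, 5.0, 5.0],
    [5.0, 5.0, 5.0, 0.0, 1.5],
    [5.0, 5.0, 5.0, 1.5, 0.0],
]
toy_energies = [-100.0, -120.0, -110.0, -150.0, -90.0]
chosen = pick_model(toy_rmsd, toy_energies)
print(chosen)
```

In the toy data model 3 has the lowest energy overall but belongs to the smaller group, so it is not selected; restricting the choice to the largest cluster favours solutions that the refinement reaches repeatedly.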