Cryo-EM structures of HKU2 and SADS-CoV spike glycoproteins and insights into coronavirus evolution

A new porcine coronavirus, SADS-CoV, was recently identified in suckling piglets with severe diarrhea in southern China, and its genome sequence is most similar (~95% identity) to that of the bat α-coronavirus HKU2. This again indicates that bats are the natural reservoir of many coronaviruses with great potential for cross-species transmission to animals and humans through recombination and/or mutation. Here we report the cryo-EM structures of the HKU2 and SADS-CoV spike glycoprotein trimers at 2.38 Å and 2.83 Å resolution, respectively. The HKU2 and SADS-CoV spikes exhibit very high structural similarity, with subtle differences mainly distributed in the NTD and CTD of the S1 subunit, which are responsible for cell attachment and receptor binding. We systematically analyzed and compared the NTD, CTD, SD1 and SD2 domains of the S1 subunit and the S2 subunit of the HKU2 spike with those of α-, β-, γ-, and δ-coronavirus spikes. The results show that the NTD and CTD of HKU2/SADS-CoV are probably the most ancestral in the evolution of the spike. Although the S2 subunit mediating membrane fusion is highly conserved, the connecting region after the fusion peptide in the HKU2/SADS-CoV S2 subunit adopts a conformation distinct from that of other coronaviruses. These results structurally demonstrate a close evolutionary relationship between HKU2/SADS-CoV and β-coronavirus spikes and provide new insights into the evolution and cross-species transmission of coronaviruses.


Introduction
However, the molecular mechanisms underlying the transmission of SADS-CoV from bats to pigs are still unknown and need to be further explored. Recently it was shown that SADS-CoV is able to infect cells from a broad range of species including mouse, chicken, pig, monkey and human, indicating a high potential of SADS-CoV for interspecies transmission 17.

The spike glycoprotein of coronaviruses mediates viral entry by binding the host receptor with the S1 subunit and fusing viral and cellular membranes with the S2 subunit, thereby determining viral host range and tissue tropism 18,19. As a class I viral fusion protein, the spike exists on the envelope of the virion as a homotrimer, and each monomer contains more than 1000 amino acid residues that can be cleaved into S1 and S2 subunits 18. For most coronaviruses, the N-terminal domain (NTD) of the S1 subunit [...] HCoV-OC43 utilizes NTD to recognize glycans 29; and one exception is MHV, which utilizes the NTD to bind mouse receptor CEACAM1a 30. Therefore, the S1 subunit, especially its NTD and CTD, is the most variable region of the spike, and is responsible for the different tropisms of coronaviruses. In comparison, the S2 subunit, containing the fusion peptide (FP) and heptad repeats (HR1 and HR2) for membrane fusion, is more conserved in both sequence and structure 18,19. For SADS-CoV, receptor analysis indicated that none of the known coronavirus protein receptors, including ACE2, DPP4 and APN, are essential for cell entry 7,17. There are also no reports regarding the [...]

The spikes of SADS-CoV (1130 amino acid residues) and HKU2 (1128 amino acid residues) are the shortest among all known coronavirus spike glycoproteins, and their amino acid identities to other known coronavirus spikes are lower than 28%, indicating that the spikes of HKU2 and SADS-CoV are unique 5-10,47. In this study, we report the cryo-EM structures of the SADS-CoV and HKU2 spike trimers at 2.83 Å and 2.38 Å resolution, respectively. The HKU2 spike trimer structure is the first one from a bat coronavirus. We analyzed the HKU2 and SADS-CoV trimer structures and also compared the NTD, CTD, SD1 and SD2 domains of the S1 subunit and the S2 subunit [...]

HKU2 ectodomains (residues 1-1066) and SADS-CoV ectodomains (residues 1-1068) were separately cloned into the pFastBac-Dual vector (Invitrogen) with a C-terminal foldon tag and Strep tag. After expression in Hi5 insect cells and purification to homogeneity, cryo-EM images of these two spike ectodomains were recorded using an FEI Titan Krios microscope operating at 300 kV with a Gatan K2 Summit direct electron detector (Supplementary Fig. 1). About 1,400,000 particles for the HKU2 spike and 900,000 particles for the SADS-CoV spike were subjected to 2D classification, and a total of 421,490 particles of the HKU2 spike and 152,334 particles of the SADS-CoV spike were selected and subjected to 3D refinement with C3 symmetry to generate density maps (Supplementary Fig. 2). The overall density maps were solved to 2.38 Å for the HKU2 spike and 2.83 Å for the SADS-CoV spike (gold-standard Fourier shell correlation = 0.143) (Supplementary Fig. 1 and Supplementary Fig. 2). The atomic-resolution density map enabled us to build nearly all residues of the HKU2 spike ectodomain (residues 17-995) except for a few breaks (residues 129-141 and 204-204), as well as 48 N-linked glycans (Supplementary Fig. 3a and Supplementary Fig. 4a). The final refined model of SADS-CoV [...]
Data collection and refinement statistics for these two structures are listed in [...]

Overall structures of HKU2 and SADS-CoV spikes
The overall structures of the HKU2 and SADS-CoV spikes determined here resemble the previously reported pre-fusion structures of coronavirus spikes. Both spike trimers have a mushroom-like shape (~150 Å in height and ~115 Å in width) (Fig. 1a), consisting of a cap mainly formed by β-sheets of the S1 subunit, a central stalk mainly formed by α-helices of the S2 subunit, and a root formed by twisted β-sheets and loops of the S2 subunit (Fig. 1a). In each trimer there is a C3 axis along the central stalk (Fig. 1a). The amino acid identity between the HKU2 and SADS-CoV spikes is 86%, and these [...] Cα atoms of the trimer. Due to the high structural similarity, we will use the HKU2 structure to present the features of both spikes in the subsequent description, and significant differences between them will be pointed out only when necessary.
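
The resolutions quoted for the two maps (2.38 Å and 2.83 Å) rest on the gold-standard Fourier shell correlation (FSC) = 0.143 criterion. As a minimal sketch of how such a cutoff is read from an FSC curve, the Python function below finds the first shell where the curve drops below the threshold and interpolates linearly between the bracketing shells. The function name, the linear interpolation and the example curve are illustrative assumptions; they are not the processing software or data used in this study.

```python
from typing import Optional, Sequence


def resolution_at_threshold(
    spatial_freqs: Sequence[float],
    fsc_values: Sequence[float],
    threshold: float = 0.143,
) -> Optional[float]:
    """Return the resolution (in Å) where the FSC first drops below `threshold`.

    `spatial_freqs` are shell frequencies in 1/Å, in increasing order.
    Returns None if the curve never crosses the threshold.
    """
    if len(spatial_freqs) != len(fsc_values):
        raise ValueError("frequency and FSC arrays must have equal length")
    for i in range(1, len(fsc_values)):
        prev, curr = fsc_values[i - 1], fsc_values[i]
        if prev >= threshold > curr:
            # Linear interpolation between the two shells bracketing the crossing.
            frac = (prev - threshold) / (prev - curr)
            freq = spatial_freqs[i - 1] + frac * (spatial_freqs[i] - spatial_freqs[i - 1])
            return 1.0 / freq
    return None


# Synthetic FSC curve (made-up values, for illustration only).
freqs = [0.1, 0.2, 0.3, 0.4, 0.5]
fsc = [1.0, 0.9, 0.5, 0.2, 0.086]
print(resolution_at_threshold(freqs, fsc))
```

With the synthetic curve above, the crossing falls halfway between the 0.4 and 0.5 Å⁻¹ shells, so the reported value is the reciprocal of 0.45 Å⁻¹.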

The S1 subunit of the HKU2 spike comprises two major domains, NTD and CTD, which are followed by two subdomains, SD1 and SD2, connecting them to the S2 subunit (Fig. 1b and Fig. 1c). The S1 subunits from three monomers form the cap of the spike, in which the three CTDs in the inner part are at the apex sitting on top of the central stalk and the three NTDs are located outside the CTDs surrounding the central stalk (Fig. 1a). The NTD, CTD, SD1 and SD2 of the S1 subunit are all mainly composed of β strands in terms of secondary structure (Fig. 1c). In contrast, the upstream helix (UH), fusion peptide (FP), connecting region (CR), heptad repeat 1 (HR1) and central helix (CH) of the S2 subunit are mainly composed of helices, whereas the β-hairpin (BH) and subdomain 3 (SD3) at the bottom part of the S2 subunit mainly consist of β strands and loops (Fig. 1c). Moreover, the residues after the SD3, which contain the [...]

The SD1 and SD2 of the S1 subunit and the S2 subunit are highly similar in amino [...] (Fig. 2b). Besides, C234-C244 (numbered in HKU2 and connecting the bottom helix to the bottom sheet) is conserved in both subtype I and subtype II NTDs (Fig. 2b).

The NTDs of β-coronaviruses including BCoV, HCoV-HKU1, HCoV-OC43, MERS-CoV, SARS-CoV and MHV resemble the subtype I, rather than the subtype II, NTD in the topology and distribution of the disulfide bonds (Fig. 2c). These β-coronavirus NTDs have additional loops at the N-terminus, between the β1 and β2 strands, and between the β6 and β7 strands (numbered in the HKU2 structure), forming an extensive ceiling-like structure on top of the galectin-like fold (Fig. 2c). It has been found that the [...]

The CTD of HKU2 has a twisted five-stranded antiparallel β sheet as the core, with connecting loops between the strands (Fig. 3a). It contains four disulfide bonds: C277-C300 and C285-C290 at the N-terminus, C341-C397 at the C-terminus, and the last one, C331-C369, connecting the β2 and β5 strands in the core β sheet (Fig. 3a). Interestingly, the CTD core of HKU2 shows high structural similarity to the conserved CTD core of β-coronaviruses, and the disulfide bonds in the CTD of HKU2, except for C285-C290, are also detected in all β-coronavirus CTDs (Fig. 3b). These CTDs have a core of one twisted β-sheet, and here we name them the one-layer CTD subtype (Fig. 3a and Fig. 3b).
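
Disulfide bonds such as those listed above are commonly recognized in an atomic model by the short distance between the sulfur (SG) atoms of two cysteines, which is close to 2.05 Å for a covalent S-S bond. The sketch below illustrates that geometric check in Python. The function name, the 2.5 Å cutoff and the toy coordinates are assumptions chosen for illustration; the residue numbers are placeholders, and none of the values are taken from the HKU2 or SADS-CoV models.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def find_disulfides(
    sg_atoms: Dict[int, Coord],
    max_distance: float = 2.5,  # assumed cutoff in Å, slightly above a typical S-S bond
) -> List[Tuple[int, int, float]]:
    """Pair cysteines whose SG atoms lie within `max_distance` Å of each other.

    `sg_atoms` maps a cysteine residue number to its SG atom coordinates.
    """
    bonds = []
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(sorted(sg_atoms.items()), 2):
        dist = math.dist(xyz_a, xyz_b)
        if dist <= max_distance:
            bonds.append((res_a, res_b, dist))
    return bonds


# Toy coordinates (invented) for four placeholder cysteines.
toy_sg = {
    1: (0.0, 0.0, 0.0),
    2: (2.04, 0.0, 0.0),
    3: (10.0, 0.0, 0.0),
    4: (10.0, 2.0, 0.3),
}
for res_a, res_b, dist in find_disulfides(toy_sg):
    print(f"C{res_a}-C{res_b}: {dist:.2f} Å")
```

A distance test of this kind only flags candidate pairs; in practice the assignment is confirmed against the density map.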

The β-coronavirus CTDs always have an insertion consisting of loops and/or strands between the β5 and β6 strands of the core (Fig. 3b). SARS-CoV, MERS-CoV, HKU4 and HKU5 have a receptor-binding motif (RBM) in this insertion region responsible for binding their respective protein receptors 45. In the CTD of HKU2, there is only one short loop between the β5 and β6 strands of the core twisted β-sheet (Fig. 3a).

Although HKU2 and SADS-CoV are members of the α-genus, their CTD structures are significantly different from those of the other α-coronaviruses HCoV-NL63, HCoV-229E, PEDV, TGEV and PRCV, which contain two layers of β-sheets (Fig. 3c). We name these CTDs the two-layer CTD subtype. All available two-layer CTD structures can be [...] and SADS-CoV. These two-layer CTDs contain two highly conserved disulfide bonds: [...]

The CTD of the δ-coronavirus PdCoV has a core of two β-sheets, belonging to the two-layer CTD subtype (Fig. 3c). As for the γ-coronavirus IBV, the core of its CTD is also similar to the typical two-layer CTD (Fig. 3c). However, several β strands are [...]

The SD1 and SD2 are two subdomains following the CTD in the S1 subunit, linking the CTD to the S2 subunit. The HKU2 SD1 is a partial β barrel consisting of five β strands and a disulfide bond (C409-C458) connecting its C-terminus to the β1 strand (Fig. 4a). This five-stranded β barrel and the linking disulfide bond are conserved among all four genera of coronavirus (Fig. 4a). The HKU2 SD2 has a structure of two layers of β-sheet with an additional short α-helix over the top sheet (Fig. 4b). The additional α-helix and the top sheet are linked by a disulfide bond (C482-C509), and another disulfide bond (C524-C533) links the C-terminal loop to the bottom sheet (Fig. 4b). The two-layer core structure and the second disulfide bond are conserved among all genera of coronavirus; however, the additional α-helix and the first linking disulfide bond are distinct features of β-coronaviruses plus the α-coronaviruses HKU2 and SADS-CoV (Fig. 4b). This additional helix appears to be an insertion between the primitive β2 and β3 strands of the SD2, and is retained during the evolution of β-coronaviruses.

Quaternary packing of the NTD and CTD in the spike
It has been observed that coronaviruses have two types of quaternary packing mode of the S1 subunits in the trimer: intra-subunit packing and cross-subunit packing 41. This is mainly caused by the different positioning of, and interaction between, the NTD and CTD in the spike monomer. The HKU2 S1 subunit, similar to those in the α-coronaviruses HCoV-NL63, HCoV-229E and PEDV and the δ-coronavirus PdCoV, has an "inward" CTD which contacts the NTD (Fig. 5a). The three structural "NTD-CTD" modules in the cap region of these spikes are composed of the NTD and CTD from the same monomer, forming the intra-subunit packing in the spike trimer (Fig. 5a). The S1 subunits of other coronaviruses in the β- and γ-genera, including MHV, SARS-CoV, MERS-CoV, HCoV-OC43 and IBV, have an "outward" CTD that is far away from the NTD (Fig. 5b). Therefore, the three structural "NTD-CTD" modules in the cap region of these spikes have the NTD from one monomer and the CTD from the adjacent monomer, forming the cross-subunit packing in the spike trimer (Fig. 5b). Interestingly, we found that the "outward" CTDs always have an insertion in the core structure, such as the β-coronavirus CTDs and the γ-coronavirus IBV CTD (Fig. 5b). In contrast, all "inward" CTDs only have the one-layer or two-layer core structure without an obvious inserted region.

Sequence analysis suggested that the S1/S2 protease cleavage site at the boundary between the S1 and S2 subunits is R544-M545 in the HKU2 spike and R546-M547 in the SADS-CoV spike 5,8,47. Compared to the S1 subunit, the topology and structure of S2 [...] (residues 888-929) and a β-sandwich-like SD3 (residues 930-995) (Fig. 1b and Fig. 6a). As in other coronavirus spikes in the prefusion state, the model of HR2 after SD3 was not built in the structure due to poor density. Five disulfide bonds in S2 are detected. Two of them (C590-C612 and C595-C601) stabilize the folded helices of UH, C696-C706 bends the CR, C884-C895 links the CH and the BH, and C934-C943 is within the SD3 (Fig. 6a). The first four disulfide bonds are conserved in all coronaviruses, and the last one [...] Fig. 4). The typical CR in the S2 subunit contains three helices and one short strand, with a disulfide bond bending the first and second helices to form a turn (Fig. 6b). In HKU2, the second helix is replaced by a short strand (713-716) and the third helix is replaced by a loop (721-741); therefore there are two short strands and only one helix in the HKU2 CR (Fig. 6c). The conserved disulfide bond C696-C706 makes the first helix of the CR in the HKU2 spike turn upside down. The S2' cleavage site (between R671 and S672) is then covered by the reversed CR helix and loops, and R671 interacts with E723 in the loop and K697 and K698 in helix 1 (Fig. 6c). In other coronaviruses, taking the MHV S2 as an example, helix 1 does not cover the S2' site (between R869 and S870), and R869 only loosely interacts with T929 in helix 3 (Fig. 6b). After the dissociation of the S1 subunit triggered [...]

HKU2 and SADS-CoV have one NTD in the S1 subunit, and their structures are more similar to the NTD1 than the NTD2 of the α-coronaviruses HCoV-NL63 and PEDV, whereas the only NTD of HCoV-229E is structurally more similar to the NTD2 than the NTD1 (Fig. 2). Therefore, we suggest that α-coronaviruses have two subtypes of NTD. The evolutionary relationship between them is not yet clear. It was once suggested that the presence of two NTDs in HCoV-NL63 is a result of gene duplication 31. However, the sequence identity between these two NTDs is only 15.7% in HCoV-NL63 and 12.9% in PEDV. Considering that HKU2 (SADS-CoV) and HCoV-229E have one NTD belonging to either subtype I or subtype II, a more plausible evolutionary route for the NTD in α-coronaviruses is the recombination of two separate primitive domains into the genome, resulting in the presence of two NTDs in the S1 subunit of α-coronaviruses including HCoV-NL63 and PEDV. Of note, these two NTD [...] (Fig. 2d). In contrast, the δ-coronavirus PdCoV NTD is similar to the HCoV-229E NTD, representing the subtype II, in both architecture and disulfide bond positions (Fig. 2e). A previous study of the IBV spike proposed that α-coronavirus NTDs are probably the most ancestral and that the NTDs of the four genera form an evolutionary spectrum in the order of α-, δ-, γ-, and β-genus 41. Our proposal here is similar to the previous one in that the two NTD subtypes in α-coronaviruses may represent primitive structures that could be the evolutionary ancestors of NTDs.
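
The argument above relies on pairwise sequence identity (for example, 15.7% between the two NTDs of HCoV-NL63). As a minimal sketch of how such a figure can be computed from a pairwise alignment, the Python function below counts identical residues over the columns where neither sequence has a gap. The function name, the choice of denominator and the toy sequences are assumptions for illustration; identity can also be normalized by alignment length or by the shorter sequence, and the convention used for the values reported here is not stated in the text.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment (invented sequences): 4 matches over 5 ungapped columns.
print(percent_identity("MKV-LA", "MRVTLA"))  # 80.0
```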

The HKU2 and SADS-CoV CTD structures have a one-layer core consisting of a [...]

(a) α-coronavirus S1 and δ-coronavirus S1 use the intra-subunit packing pattern. The NTD and CTD from the first monomer are colored blue, those from the second red, and those from the third green. PDB codes: HCoV-229E, 6U7H; PdCoV, 6B7N. (b) β-coronavirus S1 and γ-coronavirus S1 use the cross-subunit packing pattern. The NTD and CTD from the first monomer are colored blue, those from the second red, and those from the third green. The extra loop of IBV and the extra domains of βCoVs are colored yellow and labeled EX. PDB codes: MHV, 3JCL; IBV, 6CV0; SARS, 5XLR.