Structural analysis of the SARS-CoV-2 methyltransferase complex involved in coronaviral RNA cap creation

The COVID-19 pandemic is caused by the SARS-CoV-2 virus, which has several enzymes that could be targeted by antivirals, including a 2'-O RNA methyltransferase (MTase) involved in viral RNA cap formation, an essential process for RNA stability. This MTase is composed of two nonstructural proteins, the nsp16 catalytic subunit and the activating nsp10 protein. We have solved the crystal structure of the nsp10-nsp16 complex bound to the pan-MTase inhibitor sinefungin in the active site. Based on the structural data, we built a model of the MTase in complex with RNA that illustrates the catalytic reaction. A structural comparison with the Zika MTase revealed low conservation of the catalytic site between these two RNA viruses, suggesting that developing inhibitors targeting both viruses will be very difficult. Together, our data provide the information needed for structure-based drug design.


Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused the global pandemic of coronavirus disease (COVID-19), which has so far led to hundreds of thousands of deaths and threatens the lives and livelihoods of millions, if not billions, of people by causing a global economic crisis. [1] Coronaviruses have long been a threat, but recent developments show that they should be classified as extremely dangerous pathogens and that we must develop effective means to suppress and treat the diseases they cause. [2] Currently, there is no approved treatment for diseases caused by coronaviruses, and there is therefore a pressing need for the discovery and development of novel therapeutic agents for COVID-19 and other coronavirus infections. [3] Directly acting antiviral agents have revolutionized the treatment of numerous viral diseases such as hepatitis B and C and AIDS [4,5], and one such compound, remdesivir, very recently received FDA emergency use authorization for the treatment of COVID-19 patients. These therapeutics target a specific viral protein; therefore, a deeper understanding of the function of individual viral proteins is needed to develop future therapies for COVID-19 and other coronavirus infections.
Coronaviruses have the largest genomes of all RNA viruses. In particular, the genome of SARS-CoV-2 has ~29,800 bases and encodes four structural and sixteen nonstructural proteins (nsp1-nsp16) that are essential for the replication of this virus. [6,7] Like all positive-sense single-stranded RNA viruses (+RNA viruses), SARS-CoV-2 takes full advantage of the cellular environment for its replication. [8] Importantly, viral RNA must be protected from cellular innate immunity. One of the most important elements ensuring the integrity of viral RNA is the cap, a specific structure at the 5' end of the RNA molecule that consists of an N7-methylated guanosine triphosphate and a 2'-O-methylribosyladenine (type 1 cap, Figure 1). This arrangement mimics the native mRNA of the host cell, stabilizes the RNA, and ensures its efficient translation. In human cells, however, the cap is installed on newly transcribed mRNA in the nucleus, to which coronaviruses do not have access. Instead, they possess their own cap-synthesizing enzymes, and this process is essential for the survival and further replication of viral RNA in cells. In principle, four enzymatic steps are necessary for installation of a type 1 cap on RNA (either human mRNA or coronavirus RNA). First, the γ-phosphate is removed from the 5'-triphosphate end of the nascent RNA by an RNA 5'-triphosphatase. Second, a guanosine monophosphate (GMP) is attached to the resulting 5'-diphosphate end of the RNA by a guanylyltransferase, using GTP as the source of GMP. Finally, two separate methylation steps take place: N7 methylation of the guanine nucleobase (by an N7 methyltransferase) and 2'-O methylation of the ribose of the following nucleotide (by a 2'-O methyltransferase).
Coronaviruses install the cap sequentially, using several nonstructural proteins (nsps) encoded by their genome; nsp10, nsp13, nsp14 and nsp16 appear to be involved in this process. [9] The primary function of nsp13 is unwinding viral RNA during replication, so it is regarded primarily as a helicase. However, it also has 5'-RNA triphosphatase activity, removing the γ-phosphate from the 5'-triphosphate end of the nascent RNA to give a 5'-diphosphate. [10] There is still no clear evidence that any coronavirus protein possesses the guanylyltransferase activity required for cap formation. [9] Nsp14 and nsp16 methylate the cap at the N7 position of the guanine and at the 2'-hydroxyl group of the following nucleotide, respectively. Both nsp14 and nsp16 are S-adenosylmethionine (SAM)-dependent methyltransferases (MTases) and appear to be essential for the viral life cycle. [7] In particular, nsp16 appears to be a very promising molecular target from the perspective of medicinal chemistry and drug design, as this 2'-O MTase has been shown to be indispensable for replication of coronaviruses in cell cultures. [11,12] The enzymatic activities of both MTases (nsp14 and nsp16) are significantly enhanced by nsp10, a necessary cofactor for their proper function. [12][13][14][15][16] Here we report the crystal structure of SARS-CoV-2 nsp10-nsp16 in complex with sinefungin, a pan-MTase inhibitor originally isolated from Streptomyces griseoleus. [17] The structure reveals an overall fold similar to that of SARS-CoV nsp10-nsp16 and, importantly, reveals atomic details of how sinefungin inhibits the nsp16 MTase. This provides a starting point for specific inhibitor design.

Results
To obtain the nsp10-nsp16 protein complex, we co-expressed the corresponding genes in E. coli. The complex was stable during protein purification, suggesting suitability for structural analysis. The nsp10-nsp16 complex was supplemented with the pan-MTase inhibitor sinefungin and subjected to crystallization trials. We eventually obtained crystals that diffracted to 2.4 Å resolution. The structure was solved by molecular replacement and revealed a mixed α/β fold with sinefungin bound in a central canyon (Figure 2). A central feature of the nsp16 MTase is a strip of parallel and antiparallel β-strands (in order from the nsp10 interface: β4, β3, β2, β6, β7, β9, β8, β1) in the shape of the letter J, which is stabilized from the inside by helices α3 and α4 and from the outside by helices α5-α9. Nsp10 can be divided into two subdomains: a helical α-subdomain composed of helices α1-α4 and α6, and a β-subdomain composed of two antiparallel β-strands (nsp10 β1 and β2), a short helix α5 and several coil regions. A key feature of the nsp10 fold is the presence of two zinc binding sites. One is formed by three cysteine residues (Cys74, Cys77, Cys90) and a histidine residue (His83); it is located between helices α2 and α3 and appears to stabilize them in the observed conformation. The other zinc binding site is formed by four cysteine residues (Cys117, Cys120, Cys128, Cys130) and stabilizes the very C-terminus of the nsp10 protein.
Figure 3: Interface of the nsp10-nsp16 protein complex. A) Surface representation of the nsp10-nsp16 protein complex with the interface colored blue (nsp10) and yellow (nsp16). B) Bottom view of the nsp10 interface; the box provides greater detail. The interface residues of nsp10 are shown as blue sticks. C) Top view of the nsp16 interface, with all interface residues shown as yellow sticks and in greater detail in the box. D) Side view of the nsp10-nsp16 interface: Lys93 (nsp10) interacts with main-chain groups of Ser105 and with further residues via two water molecules (details shown in SI Figure S1E). E) Residue Leu45 of nsp10 immersed in the hydrophobic pocket defined by the interface residues Pro37, Ile40, Val44, Ala45, Thr48, Leu244 and Met247 of nsp16 (shown in a partially transparent yellow interface representation). Waters are not shown.
The nsp10-nsp16 interface covers 1983 Å² and is formed by nsp10 helices α2, α3 and α4 and a coil region connecting helix α1 and strand β1 (residues Asn40 to Thr49), and by the inner side of the nsp16 J-motif, including strand β4 and helices α3, α4 and α10 (Figure 3). Nsp10 and nsp16 interact through a large network of hydrogen bonds, often mediated by water molecules (SI Figure 1), and through hydrophobic interactions. Two nsp10 residues, Val42 and Leu45, are immersed in hydrophobic pockets formed by helices α3, α4 and α10 of nsp16 (Figure 3F). Nsp10 Val42 is anchored in an nsp16 hydrophobic pocket formed by residues Met41, Val44, Ala73, Val78 and Pro80. Similarly, Leu45 is anchored in a deep hydrophobic pocket formed by nsp16 residues Pro37, Ile40, Val44, Thr48, Leu244 and Met247. In addition, the main chain of nsp10 Leu45 participates in two hydrogen bonds: its carbonyl group hydrogen bonds directly with Glu87 of nsp16, while its main-chain amide group is connected to nsp16 Thr48 through a water-mediated hydrogen bond (SI Figure 1). Among other interactions, the positively charged Lys93 of nsp10 forms three hydrogen bonds with two water molecules (waters #54 and #170). These waters bridge Lys93 to nsp16 through further hydrogen bonds with the side chain of Thr106, the main-chain carbonyl group of Ala107 and the main-chain amide group of Ser105 (SI Figure 1). This large and complex interface explains the observed stability of the nsp10-nsp16 complex.
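Interface residues such as those listed above are commonly enumerated with a simple interatomic distance cutoff. The Python sketch below is a minimal illustration that assumes plain coordinate arrays and a hypothetical 4 Å cutoff; it is not necessarily the procedure used to define the interface here, and the 1983 Å² value is a surface-area measure that a contact list like this does not reproduce.

```python
import numpy as np

def interface_residues(coords_a, res_a, coords_b, res_b, cutoff=4.0):
    """Residues of chain A and chain B with at least one atom within
    `cutoff` angstroms of an atom in the partner chain.

    coords_a: (N, 3) atom coordinates of chain A (e.g. nsp10)
    res_a:    N residue labels, one per atom of chain A
    coords_b, res_b: the same for chain B (e.g. nsp16)
    cutoff:   distance threshold in angstroms (4.0 is an assumed default)
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    # Pairwise atom-atom distances, shape (N, M)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    close = dist <= cutoff
    # An atom lies at the interface if it is close to any atom of the partner chain
    iface_a = {res_a[i] for i in np.flatnonzero(close.any(axis=1))}
    iface_b = {res_b[j] for j in np.flatnonzero(close.any(axis=0))}
    return iface_a, iface_b

# Toy example with made-up coordinates (not taken from the deposited structure)
nsp10_atoms = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
nsp16_atoms = [[3.0, 0.0, 0.0], [20.0, 0.0, 0.0]]
print(interface_residues(nsp10_atoms, ["res1", "res2"],
                         nsp16_atoms, ["res101", "res102"]))
# -> ({'res1'}, {'res101'})
```

For a real structure, coordinates and residue labels would be read from the PDB entry with a structure parser (for example Biopython's Bio.PDB module), and water-mediated contacts such as those of Lys93 would need to be handled separately, since a direct chain-to-chain cutoff misses them.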
Electron density for sinefungin was clearly visible after molecular replacement. Sinefungin is bound in the SAM binding pocket, which is located in a canyon within nsp16 (Figure 4). The side adjacent to nsp10 is formed by the ends of the parallel strands β2, β3 and β4 and by helices α4 and α5. The opposite side of the ligand binding canyon is formed by strand β6 and helices α3 and α6. The ligand binding site can also be divided into nucleoside and amino acid (methionine) binding pockets. The nucleoside forms hydrogen bonds with several residues, including Asp99, Asn101 and Asp114, while the amino acid part is recognized by Asn43, Asp130 and Lys170 (Figure 4B). These six residues are absolutely conserved among SARS-CoV-2, SARS-CoV and MERS-CoV (SI Figure 2), highlighting the importance of RNA methylation for coronaviruses. We were also interested in the possible binding mode of the RNA substrate. We analyzed the electrostatic surface potential to reveal a putative RNA binding site, and a positively charged canyon proximal to the SAM binding pocket was readily apparent (Figure 5A). To determine the orientation of RNA in the binding pocket, we took advantage of the existing structure of human mRNA (nucleoside-2'-O-)-methyltransferase crystallized with a short piece of RNA (PDB code: 4N48). [18] This structure can be superposed on our structure (SI Figure 3) accurately enough to elucidate the RNA binding mode. The first nucleoside, N7-methylated guanosine, is bound in the upper part of the RNA binding canyon, while the second nucleoside is positioned in the central part of the canyon such that its ribose ring lies in close proximity to the amino group of sinefungin, which here occupies the position of the methyl group to be transferred. This structurally explains how nsp16 performs 2'-O methylation (Figure 5). This model is only an approximation; a crystal structure of the complex with RNA is needed to obtain atomic details of the methylation reaction.
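Superposing a homologous structure and carrying its bound ligand along is commonly done with a least-squares (Kabsch) fit on matched atoms. The sketch below assumes that corresponding Cα atoms of the 4N48 protein and nsp16 have already been paired; the pairing and the software actually used for the superposition are not specified here. The rotation and translation fitted on the protein atoms are then applied unchanged to the RNA coordinates, placing the RNA in the nsp16 frame.

```python
import numpy as np

def kabsch_superpose(mobile, target):
    """Rotation R and translation t minimizing the RMSD between
    mobile @ R.T + t and target; both are (N, 3) arrays of matched atoms."""
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(target, dtype=float)
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p0).T @ (Q - q0)          # 3x3 cross-covariance matrix
    U, S, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation, not a reflection
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q0 - p0 @ R.T
    return R, t

def rmsd(a, b):
    """Root-mean-square deviation between two (N, 3) coordinate arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))

def place_ligand(mobile_protein, target_protein, mobile_ligand):
    """Fit on matched protein atoms (e.g. 4N48 onto nsp16), then carry the
    ligand coordinates (e.g. the 4N48 RNA) into the target frame."""
    R, t = kabsch_superpose(mobile_protein, target_protein)
    return np.asarray(mobile_ligand, dtype=float) @ R.T + t
```

The `rmsd` helper computes the same kind of quantity as the RMSD quoted in the Discussion for the SARS-CoV and SARS-CoV-2 complexes, although any such value depends on which atoms are matched.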

Discussion
Coronaviruses have the longest genomes among RNA viruses. The size of an RNA genome is limited by the (in)stability of RNA, by the fidelity of the RNA-dependent RNA polymerase (RdRp) and the ability to correct excessive mutations, and by the limited space for nucleic acid within the virion. Every viral enzyme is a potential drug target because RNA viruses do not have the luxury of encoding non-essential accessory proteins. In this study, we have structurally analyzed the SARS-CoV-2 2'-O-ribose methyltransferase, an essential enzyme involved in RNA cap formation, which ensures the stability of viral RNA because non-methylated RNA in the cytoplasm is prone to degradation and cannot be efficiently translated. Our analysis revealed overall fold similarity between the SARS-CoV and SARS-CoV-2 nsp10-nsp16 complexes (RMSD = 0.747 Å) and high conservation of the SAM binding site among coronaviruses; in fact, all the residues involved in hydrogen bonding with the ligand are absolutely conserved among SARS-CoV, SARS-CoV-2 and MERS-CoV. The co-crystal structure we obtained, however, contains not the natural methyl donor, SAM, but the pan-MTase inhibitor sinefungin.
Interestingly, the nsp16 MTase is not active without the accessory protein nsp10. The mechanism of nsp16 activation remains elusive because no structure of an unliganded coronaviral nsp16 protein is available. Structural analysis of the SARS-CoV nsp10 protein in an unliganded form and in complex with nsp16 reveals no significant conformational change in nsp10 upon nsp16 binding. [19] Therefore, it is expected that nsp10 induces a conformational change in the nsp16 MTase that switches nsp16 into a productive enzyme. In principle, the nsp10-nsp16 interface is a drug target. However, given the large area of this interface and its complex network of hydrogen bonds and hydrophobic interactions, it is unlikely that it could be targeted by a small drug-like molecule.
The structure of the nsp10-nsp16 complex reveals several important features that could be exploited in therapeutic design to block installation of the cap on nascent viral RNA. The nsp10-nsp16 complex must be able to bind the previously introduced N7-methylated guanosine cap and recognize at least the first nucleotide of the RNA strand, as well as the methyl donor for the reaction, SAM. Based on the position of sinefungin in our structure, it is apparent where both of these reaction partners bind to nsp16; their two binding sites form well-defined canyons in the structure of nsp16, as seen in Figure 5. The sinefungin binding site must be in direct contact with the 2'-hydroxyl group of the first nucleotide following the N7-methylated guanosine cap. In particular, the chiral amino group at C6' of sinefungin has to point toward the RNA cap binding site. Accordingly, modifications of this part of the molecule guided by rational structure-based design have previously led to highly active inhibitors of various methyltransferases, as shown for a range of sinefungin-related derivatives. [20] Detailed knowledge of the amino acid binding part of the sinefungin binding site is also extremely important for the design of potential SARS-CoV-2 therapeutics, since specific bioisosteres of the amino acid scaffold may play a vital role in developing novel cell-permeable inhibitors of this essential MTase.
Figure 5: RNA recognition. The surface of nsp10-nsp16 is colored according to electrostatic surface potential. Sinefungin is located in the SAM binding pocket. The RNA binding pocket is characterized by a positively charged surface that interacts with the RNA phosphate backbone. The RNA cap is located at the top of the RNA binding pocket, while the active site is located at the interface of the RNA and SAM binding pockets.
We next sought to determine whether an inhibitor of a coronaviral methyltransferase could potentially be active against flaviviral methyltransferases. We performed a structural alignment of the SARS-CoV-2 MTase and the ZIKV MTase (Figure 6). [21] The structural comparison illustrates that developing an MTase inhibitor active against both corona- and flaviviruses is highly unlikely unless it is a promiscuous inhibitor closely resembling the substrate, such as sinefungin. However, two residue pairs are conserved between corona- and flaviviruses: i) Zika Asp131 and SARS-CoV-2 Asp114, which form a hydrogen bond with the adenine base that is essential for its recognition, and ii) Zika Asp146 and SARS-CoV-2 Asp130, which are close to the methylation reaction center and important for catalysis. The comparison also shows a significant lipophilic cavity in close proximity to the adenine nucleobase of sinefungin in both the SARS-CoV-2 and ZIKV MTases. For flavivirus MTases, it has been shown that this part of the enzyme can be effectively targeted by MTase inhibitors without affecting human proteins. Therefore, we believe that this part of the nsp16 protein may play a very important role in the future design of novel COVID-19 therapeutics.
In conclusion, we have determined the crystal structure of the SARS-CoV-2 nsp10-nsp16 complex, the activated 2'-O-methyltransferase that is essential for RNA capping during the viral life cycle. Since this process is essential for viral survival and replication in cells, it could be targeted by novel chemical compounds designed on the basis of this structural information.

Materials and methods
Cloning, protein expression and purification - Artificial codon-optimized genes encoding SARS-CoV-2 nsp10 and nsp16 were commercially synthesized (Invitrogen) and cloned into a pSUMO vector that encodes an N-terminal 8×His-SUMO tag. The proteins were expressed and purified using our standard protocols [22,23]. Briefly, E. coli BL21(DE3) cells were transformed with the expression vector and grown at 37 °C in LB medium supplemented with 25 µM ZnSO4. When the OD600 reached 0.5, protein expression was induced by addition of IPTG to a final concentration of 300 µM, and the protein was expressed overnight at 18 °C. Bacterial cells were harvested by centrifugation, resuspended in lysis buffer (50 mM Tris, pH 8, 300 mM NaCl, 5 mM MgSO4, 20 mM imidazole, 10% glycerol, 3 mM β-mercaptoethanol) and lysed by sonication. The lysate was cleared by centrifugation. Next, the supernatant was incubated with Ni-NTA agarose (Macherey-Nagel), the resin was extensively washed with lysis buffer, and the protein was eluted with lysis buffer supplemented with 300 mM imidazole. The proteins were dialyzed against lysis buffer and digested with Ulp1 protease at 4 °C overnight. The SUMO tag was removed by a second incubation with Ni-NTA agarose. Finally, the proteins were loaded on a HiLoad 16/600 Superdex 200 gel filtration column (GE Healthcare) in SEC buffer (10 mM Tris, pH 7.4, 150 mM NaCl, 5% glycerol, 1 mM TCEP). Purified proteins were concentrated to 7 mg/ml and stored at -80 °C until needed.
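The additive concentrations above (25 µM ZnSO4 and 300 µM IPTG) translate into pipetting volumes through C1V1 = C2V2. In this sketch the stock concentrations and the 1 L culture volume are illustrative assumptions, not values from the protocol.

```python
def stock_volume_ml(final_uM, culture_ml, stock_mM):
    """Volume of stock (mL) giving `final_uM` in a culture of `culture_ml`,
    from C1*V1 = C2*V2; the small volume added by the stock is neglected."""
    stock_uM = stock_mM * 1000.0  # convert mM to uM
    return final_uM * culture_ml / stock_uM

# Assumed stocks: 1 M IPTG and 100 mM ZnSO4, each added to 1 L of LB
print(stock_volume_ml(300, 1000, 1000))  # IPTG  -> 0.3
print(stock_volume_ml(25, 1000, 100))    # ZnSO4 -> 0.25
```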
Crystallization and structure refinement - Crystals grew within five days in sitting drops consisting of 300 nl of protein and 150 nl of well solution (200 mM NaCl, 100 mM MES, pH 6.5, 10% w/v PEG 4000). Upon harvest, the crystals were cryoprotected in well solution supplemented with 20% glycerol and frozen in liquid nitrogen. Diffraction data were collected on a home X-ray source. The crystals diffracted to 2.4 Å and belonged to the trigonal space group P3₁21. Data were integrated and scaled using XDS [24]. The structure was solved by molecular replacement (using the nsp10-nsp16 complex, PDB ID 6W4H, as the search model) and further refined in Phenix [25] and Coot [26] to good R-factors (Rwork = 18.05%, Rfree = 20.86%) and good geometry, as summarized in Table 1.
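Rwork and Rfree are both crystallographic R-factors, R = Σ||Fobs| − k|Fcalc|| / Σ|Fobs|, evaluated over different sets of reflections: Rwork over the reflections used in refinement and Rfree over a small, randomly chosen test set withheld from refinement, so that a large gap between the two signals overfitting. The sketch below uses a single least-squares overall scale factor k fitted on the working set. This is a simplification: refinement programs such as Phenix also model bulk solvent and anisotropic scaling, so this sketch would not reproduce the reported values exactly.

```python
import numpy as np

def r_factor(f_obs, f_calc, k):
    """R = sum(| |Fobs| - k*|Fcalc| |) / sum(|Fobs|) for one reflection set."""
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc, dtype=float))
    return float(np.abs(f_obs - k * f_calc).sum() / f_obs.sum())

def r_work_r_free(f_obs, f_calc, free_flags):
    """Return (Rwork, Rfree); free_flags marks the test-set reflections."""
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc, dtype=float))
    free = np.asarray(free_flags, dtype=bool)
    work = ~free
    # Overall scale fitted on the working set only (least squares of Fobs on Fcalc),
    # so the test reflections play no part in the fit
    k = (f_obs[work] * f_calc[work]).sum() / (f_calc[work] ** 2).sum()
    return (r_factor(f_obs[work], f_calc[work], k),
            r_factor(f_obs[free], f_calc[free], k))
```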

Data deposition
The crystal structure was deposited in the RCSB Protein Data Bank, www.pdb.org, under accession code 6YZ1.

Author Contributions
PK performed all experiments, JS interpreted the diffraction data, RN and EB designed and supervised the project. All authors participated in preparation of the manuscript.