Introduction

A new coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has caused a global pandemic of the disease COVID-19 (refs. 1,2,3,4). A significant research push is now underway to repurpose existing drugs and to design new therapeutic agents targeting various components of the virus5. The viral single-stranded RNA genome is 82% identical to that of the earlier SARS coronavirus (SARS-CoV), and some viral proteins are more than 90% homologous to their SARS-CoV counterparts6. Like many other single-stranded RNA viruses, SARS-CoV-2 employs a chymotrypsin-like protease (3CL main protease, or 3CL Mpro) to produce the non-structural proteins essential for viral replication7,8,9.

3CL Mpro cleaves the two large overlapping polyproteins pp1a and pp1ab at no fewer than 11 conserved sites, including its own N-terminal and C-terminal autoprocessing sites. The enzyme's recognition sequence is Leu-Gln↓Ser-Ala-Gly, where ↓ marks the cleavage site, but it also cleaves sequences that deviate from this consensus (sequence promiscuity). The absolute dependence of the virus on the correct function of this protease, together with the absence of a homologous human protease, makes 3CL Mpro an attractive, albeit difficult, target for the design of specific protease inhibitors10. Unfortunately, no protease inhibitors targeting SARS-CoV 3CL Mpro have been approved by the FDA to date, despite significant research effort during the past fifteen years11,12,13,14,15,16,17.
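As a rough illustration of how the Leu-Gln↓Ser-Ala-Gly consensus defines a cut site, the sketch below scans a one-letter protein sequence for Leu-Gln followed by a small P1′ residue. Restricting P1′ to Ser, Ala or Gly is our simplifying assumption, not a rule stated here; because the enzyme is promiscuous, a pattern like this will both miss real sites and flag spurious ones. The function names are hypothetical.

```python
import re

# Illustrative consensus only: P2 = Leu, P1 = Gln, cut before a small P1'
# residue. Limiting P1' to Ser/Ala/Gly is an assumption of this sketch.
CLEAVAGE_SITE = re.compile(r"(?<=LQ)(?=[SAG])")

def predicted_cleavage_positions(seq: str) -> list[int]:
    """0-based index of the P1' residue at each predicted cut."""
    return [m.start() for m in CLEAVAGE_SITE.finditer(seq.upper())]

def split_at_sites(seq: str) -> list[str]:
    """Split a one-letter protein sequence at every predicted cut."""
    cuts = predicted_cleavage_positions(seq)
    bounds = [0, *cuts, len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]
```

Applied to the NSP4/NSP5 autocleavage site used in the expression construct (SAVLQ↓SGFRK, see Methods), `split_at_sites("SAVLQSGFRK")` returns `['SAVLQ', 'SGFRK']`. Zero-width lookarounds are used so that the cut position itself is reported without consuming any residues.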

The 3CL Mpro structure is composed of three domains18,19. Domains I (residues 8–101) and II (residues 102–184) consist of antiparallel β-barrels and are the catalytic domains. Domain III (residues 201–303) consists of five α-helices and is responsible for enzyme dimerization. Studies of SARS-CoV 3CL Mpro indicate that this helical domain is essential for protease function, because the monomeric enzyme is not catalytically active20,21,22,23,24. 3CL Mpro therefore functions as a dimer held together by intermolecular interactions, mainly between the helical domains (Fig. 1a).
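The domain boundaries above can be kept as a small lookup table when mapping residues discussed later (the catalytic dyad, Asp187) to their domains. This is a minimal sketch; treating residues outside the three listed ranges (1–7, 185–200, 304–306) as unassigned linker or terminal segments is our simplification, and `domain_of` is a hypothetical helper.

```python
from typing import Optional

# Domain boundaries of 3CL Mpro as given in the text (inclusive residue numbers).
# Python ranges exclude the stop value, hence the +1 on each upper bound.
DOMAINS = {
    "I": range(8, 101 + 1),      # antiparallel beta-barrel, catalytic
    "II": range(102, 184 + 1),   # antiparallel beta-barrel, catalytic
    "III": range(201, 303 + 1),  # five alpha-helices, dimerization
}

def domain_of(residue_number: int) -> Optional[str]:
    """Domain label for a residue, or None if it lies outside all three
    listed ranges (e.g. the II-III linker, residues 185-200)."""
    for name, residues in DOMAINS.items():
        if residue_number in residues:
            return name
    return None

for label, resnum in [("His41", 41), ("Cys145", 145), ("Asp187", 187)]:
    print(label, domain_of(resnum))
```

With these ranges His41 falls in domain I and Cys145 in domain II, consistent with an active site flanked by both domains, while Asp187 falls between domains II and III.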

Fig. 1: The three-dimensional structure of 3CL Mpro from SARS-CoV-2.

a One monomer of the dimer is shown as an orange cartoon and the other monomer as a teal surface; the catalytic site cavity is highlighted, with water molecules shown as red spheres. b Close-up view of the catalytic site cavity, with the catalytic residues (Cys145 and His41) in purple, the residues that flank the cavity in green, and water molecules as red spheres.

Among chymotrypsin-like proteases, 3CL Mpro is unusual in having a Cys as its catalytic residue. Unlike other chymotrypsin-like enzymes and many Ser (or Cys) hydrolases, it has a catalytic Cys-His dyad instead of the canonical Ser(Cys)-His-Asp(Glu) triad8. The catalytic residues Cys145 and His41 are buried in an active site cavity located on the surface of the protein. This cavity can accommodate four substrate residues in positions P1′ through P4, and it is flanked by residues from both domains I and II (Fig. 1b).

Here we present atomic details pertinent to the function of SARS-CoV-2 3CL Mpro and to inhibitor binding. To obtain them, we determined a room-temperature (293 K) X-ray structure of the ligand-free enzyme at 2.30 Å resolution, which provides a physiologically relevant template for structure-assisted drug design and molecular simulations.

Results

Atomic details of 3CL Mpro active site at room temperature

We grew large crystals (Supplementary Fig. 1) so that diffraction data could be collected on a home X-ray source, whose lower flux keeps radiation damage to a minimum. In our structure of ligand-free 3CL Mpro, the catalytic Cys145 Sγ is 3.8 Å from His41 Nε2, which appears too long for a hydrogen bond (Fig. 2). This is not surprising given the poor hydrogen-bonding properties of thiols and the experimental pKa values of 8.0 ± 0.3 for Cys145 and 6.3 ± 0.1 for His41 measured previously for SARS-CoV 3CL Mpro, which shares 96% homology with the SARS-CoV-2 enzyme25,26. Thus, at the pH of 7.0 measured in the crystallization drop (see “Crystallization” section in “Methods”), both catalytic residues are expected to be predominantly uncharged, representing the state of the ligand-free enzyme before substrate or inhibitor binding.
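The argument from the pKa values can be made quantitative with the Henderson–Hasselbalch equation. The sketch below uses only the pKa values and the drop pH quoted above and treats Cys145 and His41 as independent titratable groups, which is a simplification (neighbouring charges shift each other's pKa); the uncertainties on the pKa values are not propagated.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of an isolated titratable group
    that carries its dissociable proton at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

DROP_PH = 7.0  # measured pH of the crystallization drop

# For Cys the protonated form (SH) is neutral; for His the protonated
# form (imidazolium) is the charged one.
cys_neutral = fraction_protonated(8.0, DROP_PH)
his_neutral = 1.0 - fraction_protonated(6.3, DROP_PH)

print(f"Cys145 neutral thiol: {cys_neutral:.0%}")
print(f"His41 neutral imidazole: {his_neutral:.0%}")
```

This gives roughly 91% neutral thiol and 83% neutral imidazole, so under this simple model both residues are predominantly uncharged at pH 7.0.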

Fig. 2: The catalytic site of 3CL Mpro from SARS-CoV-2.

Hydrogen bonds are shown as blue dashed lines; the distance between Cys145 and His41 is shown as a black dotted line, and the dashed red line indicates a strong C–H…O bond. The 2Fo – Fc electron density map contoured at the 1.6σ level is shown as a violet mesh. All distances are given in Ångstroms.

In this ligand-free state, the thiol of Cys145 is therefore expected to be protonated and the imidazole of His41 neutral. The catalytic dyad would then be activated by proton transfer from Cys145 to His41, possibly triggered by substrate binding or occurring in a transition state during the attack of the sulfur on the carbonyl carbon atom of the scissile peptide bond. In addition, His41 makes a strong hydrogen bond with a water molecule (suggestively named H2Ocat), which is in turn stabilized by hydrogen bonds of 2.9 and 3.0 Å with the side chains of Asp187 and His164, respectively. The position of Asp187 is further stabilized by a salt bridge with the nearby residue Arg40.
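The distance arguments in this and the preceding paragraph can be summarized with a simple geometric criterion. In the sketch below the 3.5 Å donor–acceptor cutoff is our assumption (a common rule of thumb for N and O atoms; hydrogen bonds involving sulfur tend to be longer), hydrogen-bond angles are ignored, and only the distances quoted in the text are used.

```python
# Donor-acceptor distances in angstroms, as quoted in the text.
DISTANCES = {
    ("Cys145 SG", "His41 NE2"): 3.8,
    ("H2Ocat O", "Asp187 side chain"): 2.9,
    ("H2Ocat O", "His164 side chain"): 3.0,
}

# Assumed cutoff for a plausible hydrogen bond; angles are not checked.
HBOND_CUTOFF = 3.5

def is_hbond_candidate(distance: float, cutoff: float = HBOND_CUTOFF) -> bool:
    """Distance-only test: True if donor and acceptor are close enough."""
    return distance <= cutoff

for (atom_a, atom_b), dist in DISTANCES.items():
    verdict = "possible H-bond" if is_hbond_candidate(dist) else "too long"
    print(f"{atom_a} ... {atom_b}: {dist:.1f} A, {verdict}")
```

A distance-only test is deliberately crude; a fuller analysis would also check donor–hydrogen–acceptor angles, which requires hydrogen positions that a 2.30 Å X-ray map does not show directly.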

H2Ocat sits at the center of a complex network of interactions, mediating polar contacts between the catalytic His41, the conserved His164, and the conserved Asp187 located at the domain II–III junction. We suggest that this water may play the role of the third catalytic residue, completing a non-canonical catalytic triad in 3CL Mpro: during catalysis it could stabilize the positive charge on His41 by mediating its electrostatic interaction with the negatively charged Asp187. We note that this potentially crucial water molecule is absent from some X-ray structures of ligand-free SARS-CoV-2 3CL Mpro obtained at 100 K (e.g., PDB ID 6M03).

Unsurprisingly, many reports have now appeared in which 100 K X-ray structures of ligand-free 3CL Mpro were used for molecular docking simulations of various small molecules, including many therapeutics approved to treat other diseases. Using a least-squares fit in the Coot molecular graphics program27, we superimposed our room-temperature structure of 3CL Mpro on one obtained at 100 K (PDB ID 6Y2E)18. Overall, the structures are similar, with a Cα RMSD of 0.32 Å (Fig. 3a). However, the conformation of residues 192–198 differs between the room-temperature and 100 K structures (Fig. 3b). In the room-temperature structure, the peptide bond of Ala194 is flipped to point into the P5 inhibitor binding pocket, adopting a conformation similar to that seen in 3CL Mpro in complex with inhibitor N3 (PDB ID 6LU7)19. Residues Thr196 and Asp197 also differ significantly in conformation between the two structures: the backbone carbonyl oxygen of Thr196 is displaced by 1.3 Å, the Cγ atoms of Asp197 are separated by 1.9 Å, and the backbone carbonyl oxygen of Asp197 is displaced by 2.6 Å. The conformations observed in the ligand-free enzyme at room temperature may therefore be more relevant for screening possible drug candidates.
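The superposition and the Cα RMSD can be illustrated with the Kabsch algorithm, the standard SVD-based solution to the least-squares rigid-body fit. This is a minimal sketch assuming NumPy and paired (N, 3) coordinate arrays for the same atoms in both models; it is not Coot's own implementation, and the function names are ours. `per_atom_shift` gives the kind of per-atom displacement quoted above, measured after the fit.

```python
import numpy as np

def kabsch(mobile, target):
    """Rotation r and translation t minimizing the RMSD of `mobile` onto `target`.

    Both inputs are (N, 3) arrays whose rows are paired atoms, e.g. the
    C-alpha atoms of the same residues in two structures."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mob_center = mobile.mean(axis=0)
    tgt_center = target.mean(axis=0)
    p = mobile - mob_center
    q = target - tgt_center
    h = p.T @ q                                  # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))       # -1 would mean a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T      # proper rotation (det = +1)
    t = tgt_center - r @ mob_center
    return r, t

def transform(coords, r, t):
    """Apply the rigid-body transform to an (N, 3) coordinate array."""
    return np.asarray(coords, dtype=float) @ r.T + t

def rmsd(a, b) -> float:
    """Root-mean-square deviation between paired (N, 3) coordinate arrays."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def per_atom_shift(a, b):
    """Distance moved by each paired atom, e.g. after superposition."""
    return np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float), axis=1)

# Self-check on synthetic data: a rotated and translated copy must fit exactly.
rng = np.random.default_rng(0)
target = rng.normal(scale=10.0, size=(300, 3))
theta = np.radians(40.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])
mobile = (target - 5.0) @ rot_z.T
r, t = kabsch(mobile, target)
fitted = transform(mobile, r, t)
print(f"RMSD after fit: {rmsd(fitted, target):.3f}")
```

In practice `mobile` and `target` would hold the Cα coordinates of matching residues read from the two models; the synthetic cloud here only checks that an exact rigid-body copy is recovered.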

Fig. 3: Comparison of room-temperature and low-temperature structures.

a A superposition of our room temperature ligand-free structure of 3CL Mpro (magenta) with the ligand-free structure of 3CL Mpro (PDB ID 6Y2E) obtained at 100 K (cyan). b Residues 192–198 in the P5 binding pocket differ in conformation between the room temperature and 100 K structures.

Ligand binding induces active site conformational changes

It is also instructive to compare our room-temperature structure of the protease with the structure of an inhibitor-bound complex. For this comparison we chose the complex with the long peptidomimetic inhibitor N3 (ref. 19), because N3 has substituents spanning all substrate binding subsites, including positions P4 and P5, and thus closely resembles an actual substrate. Figure 4 shows the superposition of the two structures, which reveals significant plasticity of the enzyme in the vicinity of the active site. To accommodate the inhibitor, several secondary-structure elements move by more than 1 Å from their positions in the room-temperature ligand-free structure. Such conformational changes can be characterized as induced fit due to ligand binding.

Fig. 4: Comparison of the active site geometries in the ligand-free and ligand-bound structures.

Superposition of the room-temperature ligand-free structure of SARS-CoV-2 3CL Mpro (green carbon atoms) with the structure of 3CL Mpro in complex with inhibitor N3 (deep purple, PDB ID 6LU7). Upon inhibitor binding, residues Met49, Leu50, and Met165 change their conformations (curved black arrows), and the small helix formed by residues 46–50 and the β-hairpin loop formed by residues 166–170 move apart, causing the loop formed by residues 190–194, which accommodates the inhibitor’s P5 substituent, to shift closer to the β-hairpin loop. All distances are given in Ångstroms.

On ligand binding, the small helix near the P2 group (residues 46–50) and the β-hairpin loop near the P3–P4 substituents (residues 166–170) shift apart by 2.4 Å, whereas the P5 loop spanning residues 190–194 moves closer to the P3–P4 loop. Two methionines, Met49 and Met165, avoid clashing with the inhibitor’s leucine at position P2 by altering their side-chain conformations in the complex. The change in Met49 conformation in turn cascades to the side chains of Ser46 and Leu50. Even more dramatic conformational changes due to inhibitor binding occur at the enzyme’s C-termini. Unexpectedly, the C-terminal tail, consisting of residues Ser301 through Gln306, swings 180° from its position in the room-temperature ligand-free structure and sits above the helical domain in the N3-bound form (Supplementary Fig. 2).

This drastic flip in the C-terminal tail conformation eliminates several hydrogen bonds that form part of the dimer interface in the ligand-free form, which may somewhat destabilize the dimer in the inhibitor-bound form. To assess the flexibility of these regions, we performed a 1 μs molecular dynamics (MD) simulation of ligand-free 3CL Mpro. In this simulation, the same regions, including the P2 helix (residues 45–50), the P5 loop (residues 190–194), and the C-terminal tail, are the most dynamic, showing the largest root-mean-square fluctuations (RMSF) (Supplementary Fig. 3). These regions therefore appear quite malleable and may be able to accommodate various chemical groups at the P2–P5 sites of inhibitors.
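For reference, the RMSF of atom i over T frames is the square root of its time-averaged squared deviation from its mean position, RMSF_i = sqrt((1/T) Σ_t |r_i(t) − ⟨r_i⟩|²). GROMACS provides `gmx rmsf` for this; the NumPy sketch below only illustrates the arithmetic and assumes the trajectory has already been superposed on a reference structure, so that overall rotation and translation do not inflate the fluctuations. The helper names are hypothetical.

```python
import numpy as np

def rmsf(trajectory) -> np.ndarray:
    """Per-atom root-mean-square fluctuation.

    `trajectory` has shape (n_frames, n_atoms, 3) and is assumed to be
    already superposed onto a common reference."""
    traj = np.asarray(trajectory, dtype=float)
    mean_positions = traj.mean(axis=0)                    # (n_atoms, 3)
    sq_dev = ((traj - mean_positions) ** 2).sum(axis=2)   # (n_frames, n_atoms)
    return np.sqrt(sq_dev.mean(axis=0))                   # (n_atoms,)

def most_flexible(values, labels, k: int = 3):
    """Labels of the k entries with the largest RMSF, highest first."""
    order = np.argsort(values)[::-1][:k]
    return [labels[i] for i in order]

# Toy two-frame trajectory: atom A is fixed, B and C oscillate along one axis.
toy = [
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]],
    [[0.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, -2.0, 0.0]],
]
values = rmsf(toy)
print(values)  # -> [0. 1. 2.]
print(most_flexible(values, ["atom A", "atom B", "atom C"], k=2))
```

Ranking per-residue RMSF in this way is how regions such as the P2 helix, the P5 loop, and the C-terminal tail would stand out as the most mobile.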

The conformational flexibility of the enzyme active site, revealed by comparing the room-temperature ligand-free structure reported here with previously reported low-temperature ligand-free and inhibitor-bound structures, leads us to suggest that the room-temperature structure of ligand-free 3CL Mpro may be the more physiologically relevant template for molecular docking studies aimed at estimating drug binding and enabling drug design.

Methods

General information

Protein purification supplies were purchased from GE Healthcare (Piscataway, New Jersey, USA). Crystallization reagents were purchased from Hampton Research (Aliso Viejo, California, USA).

Cloning of the Mpro gene into an MBP self-cleavable fusion

The 3CL Mpro (Nsp5 Mpro) gene from SARS-CoV-2 was cloned as described for SARS-CoV Mpro (PMID: 17189639), except that the upstream fusion partner was MBP instead of the original GST. The gene for SARS-CoV-2 3CL Mpro, optimized for E. coli expression, was synthesized (Supplementary Table 1), cloned directly into the pET15b vector (Bio Basic), and named Mpro-pET15b. The Mpro gene was amplified from Mpro-pET15b using the primers 5′-gggttggaagttttgagcgctgttctgcagtctggtttccgt and 5′-gtgatggtgatgatgcggaccctggaaggtaacaccagagcactga and then treated with T4 DNA polymerase in the presence of dGTP. To prepare the recipient vector, the MBP-TEV-His7 fragment from pMHTDelta238 (PMID: 17543538) was cloned into vector pMCSG81, and the resulting plasmid was named pMCSG81-Delta238. For cloning of Mpro, pMCSG81-Delta238 was amplified with the primers 5′-catcatcaccatcaccattgagatccggctgctand and 5′-caaaacttccaacccggcaccgtcgccgttaat. The PCR product was purified and treated with T4 DNA polymerase in the presence of dCTP. The T4-treated fragments were mixed, transformed into BL21-Gold(DE3) cells (Agilent, Santa Clara, CA), and selected on ampicillin. Plasmid from a single colony was purified, sequenced, and designated pCSGID-Mpro. In this expression system, the construct is flanked at the N-terminus by maltose-binding protein followed by the 3CL Mpro autocleavage site SAVLQ↓SGFRK (the arrow indicates the cleavage site), which corresponds to the cleavage site between NSP4 and NSP5 in the viral polyprotein. At the C-terminus, the construct codes for the human rhinovirus 3C PreScission protease cleavage site (SGVTFQ↓GP) connected to a His6 tag. The authentic N-terminus is generated by 3CL Mpro autoprocessing during expression, whereas the authentic C-terminus is generated by treatment with PreScission protease, similar to the published methodology18.
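The T4 DNA polymerase treatment with a single dNTP reads as a ligation-independent cloning (LIC) scheme, in which the 5′ ends of the insert and vector primers are reverse complements of each other so that the treated fragments can anneal. The short check below uses only the primer sequences listed above (the first vector primer is written out only up to its 5′ portion) and confirms that the first 15 nucleotides of each insert primer match the reverse complement of the corresponding vector primer. The 15-nt length and the pairing are our reading of the primers, not something stated in the text; the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def tails_match(insert_primer: str, vector_primer: str, length: int) -> bool:
    """True if the 5' `length` nt of the insert primer equal the reverse
    complement of the 5' `length` nt of the vector primer."""
    return insert_primer[:length] == reverse_complement(vector_primer[:length])

insert_fwd = "gggttggaagttttgagcgctgttctgcagtctggtttccgt"
insert_rev = "gtgatggtgatgatgcggaccctggaaggtaacaccagagcactga"
vector_fwd_5prime = "catcatcaccatcaccattgag"  # 5' portion of the first vector primer
vector_rev = "caaaacttccaacccggcaccgtcgccgttaat"

print(tails_match(insert_fwd, vector_rev, 15))         # True
print(tails_match(insert_rev, vector_fwd_5prime, 15))  # True
```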

Protein expression and purification

3CL Mpro was expressed in E. coli BL21(DE3) cells grown in Luria-Bertani (LB) medium supplemented with 1 g l−1 glucose and 150 mg l−1 carbenicillin. The cells were grown at 37 °C to an OD600 of 0.8 and induced with 0.2 mM isopropyl β-d-1-thiogalactopyranoside (IPTG). The temperature was then lowered to 18 °C, and 3CL Mpro was overexpressed for 18 h. The harvested cells were resuspended in lysis buffer containing 20 mM TRIS pH = 8, 40 mM imidazole, 150 mM NaCl and 1 mM TCEP. After the cells were lysed by sonication, the insoluble fraction was removed by centrifugation at 30,000 × g for 30 min, and the supernatant was loaded onto a HisTrap FF column. 3CL Mpro was eluted with a linear gradient of buffer containing 20 mM TRIS pH = 8, 500 mM imidazole, 150 mM NaCl and 1 mM TCEP. Fractions containing the protease were pooled, and His6-tagged PreScission protease (Sigma-Aldrich, St. Louis, MO) was added at a 500:1 molar ratio. The mixture was dialyzed against 20 mM TRIS pH = 8, 150 mM NaCl and 1 mM TCEP for 18 h at 4 °C while PreScission cleaved off the C-terminal His6 tag, yielding 3CL Mpro with authentic N- and C-termini. The PreScission-treated 3CL Mpro solution was applied to a HisTrap FF column to remove the PreScission protease, the cleaved C-terminal tag, and any 3CL Mpro with an uncleaved His tag. The authentic 3CL Mpro was collected in the flow-through and concentrated to 4 mg ml−1.
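A molar ratio such as 500:1 has to be converted into a mass of protease to add. The sketch below assumes the ratio means 500 mol of 3CL Mpro per mol of PreScission protease (our reading; the text does not say which way round). The molecular masses are inputs: the roughly 33.8 kDa used for the Mpro monomer is an approximate value we assume, and the 25 kDa for the His-tagged protease is a hypothetical placeholder to be replaced with the supplier's figure.

```python
def protease_mass_mg(substrate_mass_mg: float,
                     substrate_mw_da: float,
                     protease_mw_da: float,
                     substrate_to_protease: float = 500.0) -> float:
    """Mass of protease (mg) giving the requested substrate:protease molar ratio."""
    substrate_mmol = substrate_mass_mg / substrate_mw_da  # mg / (g/mol) = mmol
    protease_mmol = substrate_mmol / substrate_to_protease
    return protease_mmol * protease_mw_da                 # mmol * (g/mol) = mg

# Illustrative inputs only: 10 mg of Mpro, an assumed ~33.8 kDa monomer mass,
# and a hypothetical 25 kDa placeholder for the protease.
needed = protease_mass_mg(10.0, 33_800.0, 25_000.0)
print(f"{needed * 1000:.1f} ug of protease")
```

Working in millimoles keeps the unit bookkeeping simple: milligrams divided by daltons (g/mol) gives millimoles directly.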

Crystallization

The concentrated protein solution (4 mg ml−1) was first sent to the High-Throughput Crystallization Screening Center at the Hauptman–Woodward Medical Research Institute (Buffalo, NY), where 1536 crystallization conditions were screened using 96-well sitting drop plates28. Thin plate-like crystal “flowers” appeared in several conditions within a week, and these conditions were then set up manually in the lab to reproduce crystal growth. The best crystals grew in 0.1 M BIS–TRIS pH = 6.5, 25% PEG3350. Several crystal “flower” aggregates from this condition were used to make microseeds with Hampton Research seed beads. For X-ray crystallography, crystals were grown in 10 μl sitting drops made by mixing the protein sample (4 mg ml−1) and reservoir solution (0.1 M BIS–TRIS pH = 6.5, 20% PEG3350) at a 1:1 ratio and adding 0.2 μl of microseeds (1:100 dilution). The pH of the resulting crystallization drop, measured with a microelectrode, was 7.0. Single plate-like crystals grew within several days (Supplementary Fig. 1a). To collect a room-temperature diffraction dataset, a crystal of 3CL Mpro was mounted using the MiTeGen (Ithaca, NY) room-temperature capillary setup (Supplementary Fig. 1b, c).
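The starting composition of a crystallization drop follows from simple volume-weighted mixing. The sketch assumes the 10 μl drop consisted of 5 μl protein solution plus 5 μl reservoir, with the 0.2 μl of microseeds added on top, and that volumes are additive; the text does not say what the seed stock was suspended in, so here it is assumed to contribute no protein or PEG. Vapor diffusion then concentrates the drop toward the reservoir, so these are only initial values. The `mix` helper is hypothetical.

```python
def mix(components):
    """Concentrations after combining solutions with additive volumes.

    `components` is a list of (volume_ul, {species: concentration}) pairs;
    each species must use the same unit (mg/ml, % w/v, M) everywhere.
    Returns (total_volume, {species: concentration})."""
    total = sum(volume for volume, _ in components)
    amounts = {}
    for volume, concs in components:
        for species, conc in concs.items():
            amounts[species] = amounts.get(species, 0.0) + conc * volume
    return total, {species: amt / total for species, amt in amounts.items()}

protein = (5.0, {"Mpro (mg/ml)": 4.0})
reservoir = (5.0, {"PEG3350 (%)": 20.0, "BIS-TRIS (M)": 0.1})
seeds = (0.2, {})  # assumption: seed stock adds no protein or PEG

volume, drop = mix([protein, reservoir, seeds])
for species, conc in drop.items():
    print(f"{species}: {conc:.3g}")
print(f"total volume: {volume:.1f} ul")
```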

X-ray data collection and structure refinement

Room-temperature X-ray crystallographic data for ligand-free 3CL Mpro were collected on a Rigaku HighFlux HomeLab instrument equipped with a MicroMax-007 HF X-ray generator and Osmic VariMax optics. Diffraction images were recorded with an Eiger 4M hybrid photon-counting detector. Diffraction data were integrated using the CrysAlis Pro software suite (Rigaku Inc., The Woodlands, TX) and then reduced and scaled with the Aimless29 program from the CCP4 suite30. Molecular replacement was performed with Molrep30 from the CCP4 suite, using PDB entry 6M03 as the search model. Refinement was carried out with Phenix.refine from the Phenix31 suite of programs together with the Coot27 molecular graphics program. The geometry of the final structure was carefully checked with MolProbity32; data collection and refinement statistics are given in Supplementary Table 2.

MD simulation

A classical MD simulation of an apo 3CL Mpro dimer, adapted from the deposited structure PDB 6Y84, was prepared, run, and analyzed with GROMACS 2020 (ref. 33). The system was described with the CHARMM36m force field34. The dimer was solvated with the TIP3P water model35 in a rhombic dodecahedron extending 10 Å from the protein to the nearest cell edge, and 8 Na+ ions were added. Periodic boundary conditions were applied, and the system was energy-minimized with the steepest descent algorithm in fewer than 1000 steps. The system was equilibrated to 300 K and 1 bar using the V-rescale thermostat36 and Berendsen barostat37. A 1 µs production run was then performed with the leap-frog integrator, using Nosé–Hoover38,39 and Parrinello–Rahman40 coupling. All bonds involving a hydrogen atom were constrained with the SHAKE algorithm41. Atomic coordinates were saved every 10 ps. RMSF was calculated for protein backbone atoms after the RMSD had converged. The radius of gyration (Rg) remained nearly constant at 26.2 ± 0.15 Å over the 1 µs trajectory, indicating that the protein dimer stayed compact and stable (Supplementary Fig. 3).
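The radius of gyration is the mass-weighted root-mean-square distance of the atoms from their center of mass, Rg = sqrt(Σ m_i |r_i − r_com|² / Σ m_i). GROMACS computes it with `gmx gyrate`; the NumPy sketch below only illustrates the formula, with hypothetical function names, and summarizes a trajectory by the mean and standard deviation over frames (we assume that is what the ± above denotes).

```python
import numpy as np

def radius_of_gyration(coords, masses=None) -> float:
    """Rg of one frame; equal masses are assumed when `masses` is None."""
    x = np.asarray(coords, dtype=float)
    m = np.ones(len(x)) if masses is None else np.asarray(masses, dtype=float)
    com = (m[:, None] * x).sum(axis=0) / m.sum()   # center of mass
    sq = ((x - com) ** 2).sum(axis=1)              # squared distances to COM
    return float(np.sqrt((m * sq).sum() / m.sum()))

def rg_summary(frames, masses=None):
    """Mean and standard deviation of Rg across trajectory frames."""
    values = np.array([radius_of_gyration(f, masses) for f in frames])
    return float(values.mean()), float(values.std())

# Toy trajectory: two equal masses that move apart between frames.
frames = [
    [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],
    [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0]],
]
mean_rg, sd_rg = rg_summary(frames)
print(f"Rg = {mean_rg:.2f} +/- {sd_rg:.2f}")
```

A small spread relative to the mean, as in the 26.2 ± 0.15 Å reported above, is what indicates that the dimer neither unfolds nor dissociates during the run.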

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.