X-Ray Structure and Inhibition of 3C-like Protease from Porcine Epidemic Diarrhea Virus

Porcine epidemic diarrhea virus (PEDV) is a coronavirus that infects pigs and can have mortality rates approaching 100% in piglets, causing serious economic impact. The 3C-like protease (3CLpro) is essential for the coronaviral life cycle and is an appealing target for the development of therapeutics. We report the expression, purification, crystallization and 2.10 Å X-ray structure of 3CLpro from PEDV. Analysis of the PEDV 3CLpro structure and comparison to other coronaviral 3CLpro’s from the same alpha-coronavirus phylogeny show that the overall structures and active site architectures across 3CLpro’s are conserved, with the exception of a loop that comprises the protease S2 pocket. We found a known inhibitor of severe acute respiratory syndrome coronavirus (SARS-CoV) 3CLpro, (R)-16, to have inhibitory activity against PEDV 3CLpro, even though SARS-3CLpro and PEDV 3CLpro share only 45.4% sequence identity. Structural comparison reveals that the majority of residues involved in (R)-16 binding to SARS-3CLpro are conserved in PEDV-3CLpro; however, the sequence variation and positional difference in the loop forming the S2 pocket may account for the large observed difference in IC50 values. This work advances our understanding of the subtle, but important, differences in coronaviral 3CLpro architecture and contributes to the broader structural knowledge of coronaviral 3CLpro’s.

Scientific Reports | 6:25961 | DOI: 10.1038/srep25961

understanding of the subtle differences between phylogenetically related 3CLpro's and provides key insights that can be used to inform and guide the design of inhibitors against PEDV-3CLpro.

Results and Discussion
PEDV-3CLpro crystallized in space group P2₁2₁2₁ as a dimer in the asymmetric unit (Table 1). As reported for both human coronavirus 229E 3CLpro and SARS-3CLpro, PEDV-3CLpro is a homodimer containing three domains in each monomer (Fig. 1) [11,12]. The active site of PEDV-3CLpro, which contains a catalytic dyad formed from residues Cys144 and His41, is located in a cleft between domains I and II (Fig. 1a). Domain III is involved in monomer dimerization, which is ultimately responsible for forming the active protease (Fig. 1a) [12,13]. In the absence of substrate, water and solvent molecules (MPD, DMSO, and IPA) reside in the active site of PEDV-3CLpro, which is solvent exposed on one side (Fig. 1a,b). Interestingly, different solvent molecules are found in the active sites of the two monomers composing the PEDV-3CLpro dimer: a single DMSO molecule resides in the Chain A active site, whereas the Chain B active site houses both isopropanol (IPA) and 2-methyl-2,4-pentanediol (MPD) (Fig. 1b). The presence of different solvent molecules in each of the active sites of the dimer supports the assignment of one biological dimer in the asymmetric unit.
Least-squares (LSQ) superposition of PEDV-3CLpro and the unbound form of human coronavirus 229E 3CLpro (PDB entry 1P9S) [12], which are both from the same alpha-coronavirus phylogenetic group and share 69.3% sequence identity, resulted in an all-atom root-mean-square deviation (RMSD) value of 1.69 Å and a C-alpha RMSD value of 1.28 Å (Fig. 2a). The superposition shows that the overall architectures of the 229E-3CLpro and PEDV-3CLpro active sites in their unbound states are structurally very similar. The catalytic dyad residues, Cys144 and His41 in both PEDV-3CLpro and 229E-3CLpro, are located in almost identical structural space within the active site cavity, which is solvent exposed on one side (Fig. 2b). In the absence of substrate, both water and non-water solvent molecules (dioxane in 229E-3CLpro and MPD, DMSO, and IPA in PEDV-3CLpro) are found in the active site (Fig. 2b).
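For readers less familiar with the metric, the RMSD values quoted here are root-mean-square distances over matched atom pairs after superposition. The following is a minimal Python sketch of that final step only, assuming the two coordinate sets have already been superposed and paired one-to-one. The function name `rmsd` and the toy coordinates are illustrative and are not taken from the deposited structures or from the software used in this work.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) points.

    Assumes the points are already superposed and paired in order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical toy coordinates in Å (not real atom positions):
# every point in b is shifted by 1 Å along z relative to a.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(rmsd(a, b))
```

An all-atom RMSD and a C-alpha RMSD differ only in which atom pairs are passed in; restricting the lists to C-alpha atoms typically gives the smaller value, as seen in the numbers above.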
In order to better understand the features of the PEDV-3CLpro active site that are important in inhibitor and substrate binding, we generated an LSQ superposition of PEDV-3CLpro and an inhibitor-bound form of feline infectious peritonitis virus 3CLpro (FIPV-3CLpro, PDB entry 4ZRO), which belongs to the same alpha-coronavirus lineage as PEDV-3CLpro and has 61.9% sequence identity (Fig. 2c) [14]. LSQ superposition resulted in an all-atom RMSD value of 2.11 Å and a C-alpha RMSD value of 1.69 Å. We found the overall active site architectures of the unbound PEDV-3CLpro and the inhibitor-bound form of FIPV-3CLpro to be remarkably similar, with the catalytic dyad residues (Cys144 and His41 in both FIPV- and PEDV-3CLpro) in nearly identical orientations despite Cys144 of FIPV-3CLpro being covalently modified by the inhibitor, compound 6 (Fig. 2d). Interestingly, in the superimpositions of PEDV-3CLpro with both 229E-3CLpro and FIPV-3CLpro, the loops comprising the protease subsites of the 3CLpro active site are in nearly identical structural locations, with the exception of the loop that comprises the S2 subsite, the S2 loop. The S2 loop forms the outer boundary of the S2 binding pocket and shows positional variability across the X-ray structures of 229E-, FIPV-, and PEDV-3CLpro, which may lead to differences in the size of the S2 subsites across 3CLpro's (Fig. 2b,d).
Our observations of the overall conserved structural features surrounding the PEDV-3CLpro catalytic dyad, together with subtle differences in the overall active site architecture, made us curious as to whether one of the inhibitors we developed for SARS-3CLpro would also inhibit PEDV-3CLpro [15]. SARS-3CLpro belongs to a different phylogenetic lineage than PEDV-3CLpro and shares lower sequence identity (45.4%) with PEDV-3CLpro; however, we reasoned that the similar tertiary structure and conserved active site architecture of 3CLpro's would allow for inhibition by the same molecule. We therefore tested the inhibition of PEDV-3CLpro by (R)-16, which was developed as a non-covalent inhibitor against SARS-3CLpro with potential broad-spectrum activity (Fig. 3a) [15]. We found (R)-16 to inhibit PEDV-3CLpro with an IC50 value of 25.4 ± 1.4 μM; a representative curve for (R)-16 inhibition of PEDV-3CLpro is shown in Fig. 3a, and the data for (R)-16 inhibition of SARS-3CLpro have been previously published [15]. The previously reported IC50 of (R)-16 against SARS-3CLpro is 1.5 ± 0.3 μM [15], which indicates ~17-fold weaker inhibition of PEDV-3CLpro by (R)-16. The inhibition of PEDV-3CLpro by (R)-16, though weak, is significant because it indicates that the development of non-covalent broad-spectrum inhibitors of 3CLpro's may be possible.
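The ~17-fold figure follows directly from the ratio of the two IC50 values. A short Python sketch of that arithmetic is given below. The propagated uncertainty is an addition for illustration only: it assumes the two reported errors are independent and combines them in quadrature, which is not a calculation reported in this work, and the function name `fold_change` is hypothetical.

```python
import math

def fold_change(value, value_err, reference, reference_err):
    """Ratio value/reference with a first-order propagated uncertainty.

    Assumes independent errors (an assumption made for this sketch).
    """
    ratio = value / reference
    rel_err = math.sqrt((value_err / value) ** 2 + (reference_err / reference) ** 2)
    return ratio, ratio * rel_err

# IC50 values in µM as quoted in the text: PEDV-3CLpro vs. SARS-3CLpro.
ratio, ratio_err = fold_change(25.4, 1.4, 1.5, 0.3)
print(f"{ratio:.1f}-fold (± {ratio_err:.1f})")  # prints: 16.9-fold (± 3.5)
```

Under these assumptions most of the uncertainty in the ratio comes from the 20% relative error on the SARS-3CLpro value, so "roughly 17-fold" is an appropriately rounded description.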
To gain structural insights into how (R)-16 may bind to PEDV-3CLpro, a structural alignment of the X-ray structure of the SARS-3CLpro:(R)-16 complex with PEDV-3CLpro was generated (Fig. 3b). LSQ superposition of unbound PEDV-3CLpro and inhibitor-bound SARS-3CLpro (PDB entry 3V3M) [15], where PEDV-3CLpro is from the alpha-coronavirus phylogenetic group and SARS-3CLpro is from the beta-coronavirus phylogenetic subgroup 2b, resulted in an all-atom RMSD value of 5.18 Å and a C-alpha RMSD value of 4.98 Å. The structural alignment shows that, as with 229E-3CLpro and FIPV-3CLpro, the overall active site architectures of SARS-3CLpro and PEDV-3CLpro are largely similar despite their lower sequence identity (Fig. 3b-d), and the residues directly involved in binding to (R)-16 via hydrogen-bonding interactions are all conserved (Gly142 and His162 in PEDV-3CLpro and Gly143 and His163 in SARS-3CLpro; Fig. 3c,d).
The active sites of both SARS-3CLpro and PEDV-3CLpro are solvent exposed on one side with the residues of their catalytic dyads, His41 and Cys144 (for PEDV-3CLpro) or Cys145 (for SARS-3CLpro), located in almost identical structural space (Fig. 3c,d). The superposition shows that (R)-16 similarly occupies the S2-S1′ subpockets in each of the 3CLpro active sites, where the tert-butyl amide resides in the channel leading to the active site, the tert-butylanilido group (P2) sits in the S2 pocket, the 3-pyridyl group (P1) resides in the S1 region, and the tetrahydrofuran (P1′) occupies the S1′ subsite (Fig. 3c,d). As shown in the SARS-3CLpro 3V3M structure, the 3-pyridyl nitrogen of (R)-16 acts as a hydrogen-bond acceptor for SARS-3CLpro His163, with a distance of 2.8 Å between heteroatoms (Fig. 3d). This interaction is likely conserved in the inhibition of PEDV-3CLpro by (R)-16, as His162 of PEDV-3CLpro is in an almost identical structural location to that of SARS-3CLpro His163. Additionally, the bifurcated interaction between the furan ring oxygen and the amide carbonyl oxygen of (R)-16 and the backbone amide NH of SARS-3CLpro Gly143 is mimicked by PEDV-3CLpro Gly142 (Fig. 3d).

Figure 2 (partial caption). PEDV-3CLpro is colored in orange and 229E-3CLpro is colored light pink. (b) Zoom-in of the PEDV-3CLpro:229E-3CLpro superimposition Chain A active site, where catalytic dyad residues and solvent are shown in stick, colored according to atom and to which protein they belong. (c) Superimposition of PEDV-3CLpro and FIPV-3CLpro, where each enzyme is represented as a ribbon. PEDV-3CLpro is colored in orange and FIPV-3CLpro is colored magenta. (d) Zoom-in of the PEDV-3CLpro:FIPV-3CLpro superimposition Chain A active site, where catalytic dyad residues and solvent are shown in stick, colored according to atom and to which protein they belong. The covalent FIPV-3CLpro inhibitor, 6, is represented as ball-and-stick and colored according to atom.
Though (R)-16 apparently binds in the same orientation in the PEDV-3CLpro active site as it does in the SARS-3CLpro active site, and likely utilizes the same hydrogen-bonding interactions, the IC50 of (R)-16 against PEDV-3CLpro is ~17-fold higher than against SARS-3CLpro. We were therefore curious whether the difference in position of the S2 loop between PEDV- and SARS-3CLpro might be important for (R)-16 binding. A structural alignment of the ligand-bound structures of FIPV- and SARS-3CLpro (4ZRO and 3V3M, respectively) shows that compounds 6 and (R)-16 bind to the respective 3CLpro's by positioning their sterically bulky, hydrophobic groups in the S2 subsite (leucine and t-butylanilido, respectively; Fig. 4a). This suggests that hydrophobic interactions between the inhibitor or substrate and residues of the S2 loop are important for their binding.
We then analyzed the reported X-ray structures and sequences for other alpha-coronavirus 3CLpro's in the PDB, including the enzymes from the following coronaviruses: FIPV, transmissible gastroenteritis virus (TGEV), and human coronaviruses NL63 and 229E [12-20]. A sequence comparison of the S2 loops of FIPV-, TGEV-, NL63-, 229E-, PEDV-, and SARS-3CLpro shows that the SARS-3CLpro S2 loop is one residue longer than those from the alpha-coronaviruses and shares no sequence identity with FIPV-, TGEV-, NL63-, 229E- and PEDV-3CLpro (Fig. 4b). Additionally, the S2 loop of SARS-3CLpro contains anionic amino acids, Glu and Asp, at positions 47 and 48, respectively. These variations in the S2 loop between the alpha-coronavirus 3CLpro's and SARS-3CLpro may explain the large observed difference in the IC50 value of (R)-16 against PEDV-3CLpro as compared to SARS-3CLpro (25.4 ± 1.4 μM vs. 1.5 ± 0.3 μM, respectively). Furthermore, the combination of the increase in length and charge properties of the S2 loop of SARS-3CLpro relative to that of the alpha-coronavirus 3CLpro's likely changes the size and shape of the S2 binding pocket and therefore may allow for increased variability at the P2 site of inhibitor and substrate molecules.
To further investigate the variation in the S2 subsite and its relationship to inhibitor and substrate binding, we examined the recognition cleavage sequences in polyprotein 1ab of both PEDV- and SARS-3CLpro (Fig. 5) [21,22]. The S2-subsite residue preference is identical for PEDV- and SARS-3CLpro at nearly all cleavage sites, with the exception of nsp5/6 and nsp12/13. SARS-3CLpro can accommodate a larger P2-phenylalanine, e.g. at the nsp5/6 cleavage site, while PEDV-3CLpro recognizes residues only as large as a P2-leucine at this position in the polyprotein. At the nsp12/13 cleavage site, PEDV polyprotein 1ab has a serine at the P2 site, while SARS polyprotein 1ab has a P2-leucine. A larger size of the S2 pocket in SARS-3CLpro is consistent with the SARS polyprotein 1ab having a phenylalanine residue at the P2 position of the nsp5/6 cleavage site. This larger S2 pocket in SARS-3CLpro may allow for better binding of (R)-16, allowing the bulky t-butylanilido group to take full advantage of hydrophobic interactions within the S2 subsite and position (R)-16 to optimize hydrogen-bonding with SARS-3CLpro Gly143 and His163.

Conclusions
In this study, we have determined the X-ray crystal structure of an unbound form of PEDV-3CLpro to 2.10 Å resolution. We found the structure of PEDV-3CLpro to be similar to the X-ray structure of the unbound human coronavirus 229E 3CLpro, which belongs to the same coronaviral phylogenetic subgroup as PEDV-3CLpro. To investigate the effect of inhibitor binding on the overall architecture of the catalytic site, we generated a superimposition of PEDV-3CLpro with FIPV-3CLpro (PDB entry 4ZRO), which belongs to the same alpha-coronavirus phylogenetic group as PEDV-3CLpro and shares 61.9% sequence identity, in complex with the covalent inhibitor 6. We observed little difference between the active site architectures of unbound PEDV-3CLpro and the inhibitor-bound form of FIPV-3CLpro, except for differences in the position and composition of a loop comprising the S2 subsite, which was also observed in comparison to the 229E-3CLpro structure.
We tested PEDV-3CLpro for inhibition by a known SARS-3CLpro inhibitor, compound (R)-16, and found (R)-16 to be capable of inhibiting PEDV-3CLpro, although the IC50 value was roughly 17-fold higher than the reported IC50 for (R)-16 against SARS-3CLpro. Structural comparison of SARS-3CLpro bound with (R)-16 and PEDV-3CLpro in its unbound form revealed that the residues that directly interact with (R)-16 in SARS-3CLpro are conserved in PEDV-3CLpro. Additional analysis of the S2 loop across alpha-coronavirus 3CLpro's and SARS-3CLpro showed that the sequence of the S2 loop is not conserved across alpha- and beta-coronaviral 3CLpro's and that the SARS-3CLpro S2 loop is one residue longer than that of the alpha-coronaviral 3CLpro's, which likely increases the effective volume of the SARS-3CLpro S2 protease subsite relative to the alpha-coronaviral 3CLpro S2 subsites. These findings provide a potential explanation for the roughly 17-fold weaker inhibition of PEDV-3CLpro by (R)-16 compared to SARS-3CLpro.
This work advances our understanding of the subtle, but important, structural differences between 3C-like proteases from different coronaviral phylogenetic groups and contributes to the broader structural knowledge of coronaviral 3CLpro's. Small structural changes have been shown to be essential in enzymatic catalysis, and an understanding of such differences is also pertinent for the design of both broad-spectrum and selective coronaviral 3CLpro inhibitors for the treatment of coronaviral infection [23].

Methods
Protein expression and purification. The gene encoding the 3CLpro of PEDV (residues 2998-3299 in the Porcine epidemic diarrhea virus polyprotein AHA38151.1) [24] was codon optimized for expression in E. coli and cloned into the pET-11a expression vector with an N-terminal (His)6-tag followed by the nsp4/5 auto-cleavage site by BioBasic Inc. This pET-11a PEDV-3CLpro construct was used because it results in the expression of PEDV-3CLpro without an N-terminal or C-terminal extension.
E. coli BL21(DE3) cells were transformed with the pET-11a PEDV-3CLpro plasmid and then grown at 25 °C for 24 hours in 500 mL Super LB media (3 g potassium phosphate monobasic, 6 g sodium phosphate dibasic, 20 g tryptone, 5 g yeast extract, 5 g sodium in 1 L water, pH 7.2 adjusted with 1 M NaOH) that was also supplemented with 1 mL of 100 mg mL−1 carbenicillin, 25 mL of 8% lactose, 10 mL of 60% glycerol, and 5 mL of 10% glucose per 1 L of expression culture. The cells were harvested by centrifugation (8,400 g for 20 min) to yield 14.5 g L−1 of cells. The cell pellet was then re-suspended in Buffer A (50 mM Tris pH 7.5, 0.2 M ammonium sulfate, 0.05 mM EDTA, 5 mM BME) containing 1 mg mL−1 lysozyme, where 5 mL of Buffer A was used per 1 g of cell pellet. After the cells were homogenized, they were lysed via sonication for a total of 10 minutes with 10 s pulses and 20 s delays at 50% amplitude using a Branson Digital Sonifier.
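As a worked example of the resuspension step, the buffer volume and lysozyme mass scale linearly with the wet pellet mass. The Python sketch below uses only the ratios stated above (14.5 g of cells per litre, 5 mL of Buffer A per gram, 1 mg mL−1 lysozyme); the function name `lysis_buffer_plan` and the assumption that the full 500 mL culture is processed in one batch are illustrative.

```python
def lysis_buffer_plan(pellet_mass_g, buffer_ml_per_g=5.0, lysozyme_mg_per_ml=1.0):
    """Return (Buffer A volume in mL, lysozyme mass in mg) for a cell pellet."""
    buffer_ml = pellet_mass_g * buffer_ml_per_g
    lysozyme_mg = buffer_ml * lysozyme_mg_per_ml
    return buffer_ml, lysozyme_mg

# Expected wet pellet from a 500 mL culture at the reported yield of 14.5 g per litre.
culture_l = 0.5
pellet_g = 14.5 * culture_l
buffer_ml, lysozyme_mg = lysis_buffer_plan(pellet_g)
print(pellet_g, buffer_ml, lysozyme_mg)  # prints: 7.25 36.25 36.25
```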
The cell lysate was clarified by pelleting the cell debris via centrifugation (28,960 g, 4 °C, 20 minutes). The resultant supernatant was loaded onto a 60 mL Phenyl Sepharose 6 Fast Flow HiSub (GE Healthcare) column equilibrated with Buffer A. Protein was eluted with a linear gradient to 100% Buffer B (20 mM Tris pH 7.5, 0.05 mM EDTA, 5 mM BME) over five column volumes (300 mL), collecting 5 mL fractions. Fractions containing PEDV-3CLpro enzymatic activity were pooled and loaded onto a 60 mL DEAE Sepharose Fast Flow (GE Healthcare) column equilibrated with Buffer B. Protein was eluted with a linear gradient to 50% Buffer C (50 mM Tris pH 7.5, 1 M sodium chloride, 0.05 mM EDTA, 5 mM BME, 10% glycerol) over five column volumes (300 mL), collecting 5 mL fractions. Fractions containing pure PEDV-3CLpro, as judged by SDS-PAGE and specific activity, were pooled, dialyzed into storage buffer (25 mM HEPES, pH 7.5, 2.5 mM DTT, 10% glycerol), and concentrated using a spin concentrator (Millipore) with a 10 kDa molecular weight cutoff membrane. The protein was flash-frozen in liquid nitrogen and then stored at −80 °C until further use (Fig. 1a). Prior to crystallization, PEDV-3CLpro was concentrated to 4.0 mg mL−1 using a spin concentrator (10 kDa cutoff membrane, Millipore) and loaded onto a 24 mL Superdex 75 (GE Healthcare) size exclusion chromatography column equilibrated with 25 mM HEPES pH 7.5 and 2.5 mM DTT. Peak fractions containing PEDV-3CLpro, as determined via visualization of SDS-PAGE and confirmed by protein activity assay, were pooled and concentrated to protein concentrations of 1.5, 2.8, and 4.0 mg mL−1 for crystallization.
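To make the Phenyl Sepharose gradient concrete, the nominal buffer composition at the end of any fraction can be estimated from the gradient volume. The Python sketch below assumes an ideal linear gradient running from 0% to 100% Buffer B over the stated 300 mL and ignores system dead volume and mixing delay; the starting point of 0% B and both function names are assumptions made for illustration.

```python
def percent_b_at(volume_ml, gradient_ml=300.0, start_pct_b=0.0, end_pct_b=100.0):
    """Nominal %B after volume_ml of an ideal linear gradient (clamped to its range)."""
    frac = min(max(volume_ml / gradient_ml, 0.0), 1.0)
    return start_pct_b + frac * (end_pct_b - start_pct_b)

def ammonium_sulfate_molar(pct_b, conc_a=0.2, conc_b=0.0):
    """Ammonium sulfate concentration (M) when mixing Buffer A (0.2 M) with Buffer B (none)."""
    return conc_a + (conc_b - conc_a) * pct_b / 100.0

fraction_ml = 5.0
n_fractions = int(300.0 / fraction_ml)  # number of 5 mL fractions across the gradient

# Nominal composition at the end of the first, middle, and last fraction.
for n in (1, n_fractions // 2, n_fractions):
    pct_b = percent_b_at(n * fraction_ml)
    print(n, round(pct_b, 2), round(ammonium_sulfate_molar(pct_b), 3))
```

The same function describes the DEAE step by setting `end_pct_b=50.0`, with sodium chloride from Buffer C taking the place of ammonium sulfate.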
Crystallization. The PEGS II Suite (Qiagen) sparse-matrix screen was used to screen for initial PEDV-3CLpro crystallization conditions. Crystallization trials (150 nL protein and 150 nL of crystallization solution) were set up in a 96-well sitting drop tray (Intelli-Plate 96) using a TTP LabTech Mosquito® liquid robotics system at PEDV-3CLpro concentrations of 1.5, 2.8, and 4.0 mg mL−1 in buffer containing 25 mM HEPES pH 7.5 and 2.5 mM DTT. An initial hit of clusters of needle-like crystals was obtained at 20 °C from the PEGS II Suite condition No. 50, consisting of 10% (w/v) PEG-4000 and 20% (w/v) isopropanol. Crystals were also obtained at each of the other PEDV-3CLpro concentrations. Optimization of this condition was achieved by varying the concentrations of isopropanol and PEG-4000 between 15-30% and 7-22%, respectively, and diffraction-quality crystals grew under a variety of conditions from this round of optimization. The best crystals grew at 20 °C in 25% isopropanol and 22% PEG-4000 at a protein concentration of 1.5 mg mL−1 and appeared after 24-72 hours (Fig. 4). Crystals were transferred using 0.05-0.1 μm nylon loops to small drops containing the crystallization solution plus the cryo-protectant, which was 15% 2-methyl-2,4-pentanediol (MPD). After cryo-protection, the crystals were remounted into the nylon loop and rapidly flash-cooled in liquid nitrogen.
Data collection and structure refinement. X-ray diffraction data were collected on LS-CAT beamline 21-ID-F at the Advanced Photon Source (APS) at Argonne National Laboratory, Argonne, Illinois, USA. Data were indexed, integrated and scaled using HKL-2000 [25], and the resulting structure factor amplitudes were used for molecular replacement (MR). The Phaser-MR (simple interface) module of PHENIX [26] was used to perform MR, and the X-ray structure of the human coronavirus 229E-3CLpro in complex with the peptidomimetic compound EPDTC (PDB entry 2ZU2 [13]), with ligands and waters removed, was used as a search model. Iterative rounds of structural refinement were completed using the refinement module of PHENIX, and manual inspection, rebuilding and the addition of water molecules were accomplished using Coot [27]. The final data collection and refinement statistics are summarized in Table 1.
Inhibitor characterization. The compound (R)-16 was synthesized according to Jacob et al. [15]. Inhibition of PEDV-3CLpro by (R)-16 at a concentration of 100 μM was first tested in an enzymatic assay in the following buffer: 50 mM HEPES, 0.1 mg/mL BSA, 0.01% TritonX-100, 1 mM DTT. The assays were carried out in triplicate using Costar 3694 EIA/RIA 96-well half-area, flat-bottom, black polystyrene plates from Corning Incorporated. 1 μL of 100X inhibitor stock in DMSO was added to 79 μL of enzyme in assay buffer, and the enzyme-inhibitor mixture was incubated for 10 minutes. The reaction was initiated by the addition of 20 μL of 10 μM UIVT3 substrate, a custom-synthesized Förster resonance energy transfer peptide substrate with the following sequence: HilyteFluor™ 488-ESARLQSGLRKAK-QXL520™-NH2, producing final concentrations of 100 nM for the 3CLpro enzyme and 2 μM UIVT3 substrate. The increase in fluorescence intensity of the reaction was then measured over a period of 10 minutes as relative fluorescence units (RFUt). An excitation wavelength of 485 nm (bandwidth of 20 nm) and an emission wavelength of 528 nm (bandwidth of 20 nm) were used to monitor the reactions using a BioTek Synergy H1 multimode microplate reader. The percent inhibition of PEDV-3CLpro was determined using equation (1).

The IC50 value of (R)-16 against PEDV-3CLpro was determined at ambient temperature using 100 μL assays conducted in the following buffer: 50 mM HEPES, 0.1 mg/mL BSA, 0. The increase in fluorescence intensity over time during the reaction was then measured. The average percent inhibition of PEDV-3CLpro was then calculated from triplicate data, and the final averaged data with standard deviation were then plotted as a function of inhibitor concentration. The data were fit to equation (2), where %Imax is the percent maximum inhibition of 3CLpro.

The errors in the IC50 and %Imax values are the errors in the fitted parameters resulting from fitting of the equations to the data.
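Equations (1) and (2) are referenced but not reproduced in this text, so the following Python sketch should be read as an illustration under stated assumptions rather than as the analysis actually performed. It assumes percent inhibition is computed from reaction rates as 100 × (1 − rate with inhibitor / rate without inhibitor), and that the dose-response data follow a one-site model, %I = %Imax / (1 + IC50/[I]), which is consistent with the %Imax parameter described above. The function names, the grid-search fitting strategy, and the synthetic data are all hypothetical.

```python
def percent_inhibition(rate_inhibited, rate_uninhibited):
    """Assumed form: percent inhibition from reaction rates (RFU per unit time)."""
    return 100.0 * (1.0 - rate_inhibited / rate_uninhibited)

def model(conc, ic50, i_max):
    """Assumed one-site dose-response model: %I = %Imax / (1 + IC50 / [I])."""
    return i_max / (1.0 + ic50 / conc)

def fit_ic50(concs, inhibitions, ic50_min=0.01, ic50_max=1000.0, steps=2000):
    """Least-squares fit by log-spaced grid search over IC50.

    For each trial IC50 the model is linear in %Imax, so the best %Imax
    is obtained in closed form before scoring the trial.
    """
    best = None
    for k in range(steps + 1):
        ic50 = ic50_min * (ic50_max / ic50_min) ** (k / steps)
        f = [1.0 / (1.0 + ic50 / c) for c in concs]
        i_max = sum(y * fi for y, fi in zip(inhibitions, f)) / sum(fi * fi for fi in f)
        sse = sum((y - i_max * fi) ** 2 for y, fi in zip(inhibitions, f))
        if best is None or sse < best[0]:
            best = (sse, ic50, i_max)
    return best[1], best[2]

# Synthetic, noise-free data generated from the model itself (not measured values).
concs = [1.0, 3.0, 10.0, 30.0, 100.0, 300.0]  # inhibitor concentrations in µM
data = [model(c, ic50=25.4, i_max=100.0) for c in concs]
ic50_fit, i_max_fit = fit_ic50(concs, data)
print(f"IC50 ~ {ic50_fit:.1f} µM, %Imax ~ {i_max_fit:.1f}")
```

A grid search is used here only to keep the example free of external dependencies. It does not return parameter errors; errors in fitted parameters of the kind reported in this work would normally come from a nonlinear least-squares routine that estimates the parameter covariance.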