Pro108Ser mutation of SARS-CoV-2 3CLpro reduces the enzyme activity and ameliorates the clinical severity of COVID-19

Recently, an international randomized controlled clinical trial showed that patients with SARS-CoV-2 infection treated orally with the 3-chymotrypsin-like protease (3CLpro) inhibitor PF-07321332 within three days of symptom onset had an 89% lower risk of COVID-19-related hospital admission or death from any cause than patients who received placebo. Lending support to this critically important result, we demonstrated in our study that patients infected with a SARS-CoV-2 sub-lineage (B.1.1.284) carrying the Pro108Ser mutation in 3CLpro tended to have a comparatively milder clinical course (i.e., a smaller proportion of patients required oxygen supplementation during the clinical course) than patients infected with the same sub-lineage of virus not carrying the mutation. Characterization of the mutant 3CLpro revealed that the Kcat/Km of the 3CLpro enzyme containing Ser108 was 58% lower than that of Pro108 3CLpro. Hydrogen/deuterium-exchange mass spectrometry (HDX-MS) revealed that the reduced activity was associated with structural perturbation surrounding the substrate-binding region of the enzyme, which is positioned behind and distant from the 108th amino acid residue. Our finding of an attenuated clinical course of COVID-19 in patients infected with SARS-CoV-2 strains with reduced 3CLpro enzymatic activity strongly supports the promising result of the aforementioned clinical trial of the 3CLpro inhibitor.

Phylogenic tree analysis in our study cohort. We investigated whether any of the phylogenic clades containing non-synonymous mutations contributed to a milder clinical course. The overall genetic diversity was relatively low, presumably because effective international border restrictions and successful quarantine efforts were in place. A divergent tree analysis of the whole viral genome sequences and classification at Keio University Hospital (N = 179) according to the internationally recommended nomenclature showed that most patients (172 [96.1%]) had strains derived from Clade 20B (Fig. 1a) 6 . The remaining 5 and 2 patients in our study cohort belonged to Clade 19A and 20C, respectively 7,8 ; these patients were therefore excluded from further study. Patients from Clade 20B were further divided into two subgroups, each defined as containing patients who had strains with no more than 5 nucleotide differences. The first subgroup was designated as Subclade 20B-T (Pangolin lineage B.1.1.284 9 ; N = 87 [50.6%]), which had the basic haplotype of Clade 20B with the addition of 6 single nucleotide mutations: c.4346 U > C, c.9286 C > U, c.10376 C > U, c.14708 C > U, c.28725 C > U and c.29692 C > U (Fig. 1a, yellow). The transmission of this infection in the central downtown area led to this strain spreading to the rest of Japan in June 2020. Of the six mutations, four were non-synonymous: c.4346 U > C (Ser543Pro in papain-like protease [PL pro ]), c.10376 C > U (Pro108Ser in 3CL pro ), c.14708 C > U (Ala423Val in RNA-dependent RNA polymerase [RdRp]), and c.28725 C > U (Pro151Leu in nucleocapsid protein); the remaining two mutations did not affect the amino acid translation of the viral proteins.
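The four amino acid changes listed above can be cross-checked against the standard genetic code. The sketch below is only an illustration and is not part of the original analysis pipeline: for each reported substitution, it lists the codon positions at which the stated single-base change converts a codon for the reference residue into a codon for the mutant residue. The function name `positions_explaining` and the data layout are hypothetical, and no genome coordinates or actual codons from the viral sequence are used.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ..., GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def positions_explaining(ref_aa, alt_aa, ref_base, alt_base):
    """Return the 0-based codon positions at which ref_base -> alt_base
    turns some codon for ref_aa into a codon for alt_aa."""
    found = set()
    for codon, aa in CODON_TABLE.items():
        if aa != ref_aa:
            continue
        for pos, base in enumerate(codon):
            if base != ref_base:
                continue
            mutated = codon[:pos] + alt_base + codon[pos + 1:]
            if CODON_TABLE[mutated] == alt_aa:
                found.add(pos)
    return found


# Substitutions reported in the text (one-letter amino acid codes)
REPORTED = {
    "Ser543Pro (PLpro), U>C": ("S", "P", "U", "C"),
    "Pro108Ser (3CLpro), C>U": ("P", "S", "C", "U"),
    "Ala423Val (RdRp), C>U": ("A", "V", "C", "U"),
    "Pro151Leu (nucleocapsid), C>U": ("P", "L", "C", "U"),
}

for label, change in REPORTED.items():
    print(label, "-> codon position(s):", sorted(positions_explaining(*change)))
```

Each reported change is compatible with exactly one codon position, which is consistent with a single nucleotide substitution underlying each amino acid replacement.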
The second subgroup was designated as Clade 20B-nonT (N = 85 [49.4%]), which showed the haplotype of Clade 20B and was defined by the presence of seven possible mutations separating the lineage from the founding Wuhan haplotype, although each case had fewer than five single nucleotide mutations. Analyses of the cumulative total number and frequency curve showed that the relative fraction of Clade 20B-T (B.1.1.284) increased during the time frame of this study (Fig. 1b, c). Mapping of the suspected geographic locations where the infection of individual patients was likely to have occurred indicated that the patients with Clade 20B-T (B.1.1.284) or Clade 20B-nonT were infected in the Tokyo Metropolitan area and its neighboring prefectures (Fig. 1d). This observation, together with a lack of patients with strains belonging to other clades (except for the 5 and 2 patients with strains belonging to Clade 19A and 20C, respectively) suggested that Clade 20B and its variation Clade 20B-T (B.  Table 1. Age, sex, symptoms at admission, and outcome did not differ significantly between the two main groups. However, the numbers of patients who required oxygen supplementation and methylprednisolone treatment were significantly lower among the patients with Clade 20B-T (B.
Molecular evolutionary characterization of four non-synonymous mutations unique to Clade 20B-T. We used molecular evolutionary analyses to decipher which of the four non-synonymous mutations that characterize the Clade 20B-T haplotype contributed to the milder clinical course. Studies of the conservation of the amino acid residues around the non-synonymous mutations in Clade 20B-T indicated that residues at and around Pro108Ser in the 3CL pro (NSP5) and those at and around Pro151Leu in the nucleocapsid protein were highly conserved among β-coronaviruses (Fig. 2a, b).
By contrast, amino acid residues at and around Ser543Pro in the PL pro (NSP3) and Ala423Val in RNA-dependent RNA polymerase (RdRp, NSP12) were only weakly conserved (Fig. 2a). On the other hand, the serine in the PL pro at residue 543 and the Ala at residue 423 in RdRp were substituted with proline and valine in some β-coronaviruses; both of these mutations were observed      (Fig. 3b, c). Based on the results described above, because 3CL pro has been well characterized as critical for viral replication by biochemical and pharmacological analyses 10,11 , and because the phylogenic tree analysis in Japan showed that the Pro151Leu mutation in nucleocapsid protein occurred earlier than the Pro108Ser mutation (Supplementary Fig. 1), we focused on the function-structure relationship of the Pro108Ser mutant of 3CL pro for further investigation.
Figure 3. Phylogenic tree analysis, temporal trends in Japan registered by GISAID database. (a) Most of the Japanese strains were derived from National Institute of Infectious Diseases (NIID) submitted on 10th January 2021 in GISAID excluding airport quarantine, but were not specified by towns/cities or precise obtaining dates (obtaining month only). Therefore, we designated all the NIID data for the first day of the month (i.e., 2020/4 → 2020/4/1). A magnified view of the dotted square is shown in Supplementary Fig. 1.
P108S 3CL pro reduces the catalytic activity and attenuates the sensitivity to GC376. We prepared recombinant proteins of WT and P108S of SARS-CoV-2 3CL pro to determine their enzymatic activities using a fluorescence-based cleavage assay (Fig. 4a) 11 . The uncropped gel image of Fig. 4 is shown in the Supplementary Information. The enzymatic activity of P108S was significantly suppressed compared with that of the WT (Fig. 4b).
The Km value of P108S (215.7 μmol/l) was higher than that of the WT (110.3 μmol/l), and the activity also decreased by 58%, as determined by a comparison of the Kcat/Km values for the WT and P108S 3CL pro enzymes (Fig. 4c). These results suggest that the P108S mutation interferes with substrate binding by the enzyme. We further examined the sensitivity of the P108S mutant to the competitive 3CL pro inhibitor GC376. Recently, GC376, an inhibitor of feline infectious peritonitis virus (FIPV), has been reported to block SARS-CoV-2 3CL pro activity by binding to the substrate-binding pocket 12,13 . The enzymatic activity of the WT protein was potently inhibited by GC376 (Ki = 1.93 μmol/l). By contrast, the inhibitory effect of GC376 on the P108S mutant was weaker (Ki = 3.74 μmol/l; Fig. 4d).
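The kinetic quantities quoted above can be related through the textbook Michaelis-Menten equation with a competitive inhibitor, v = Vmax·[S] / (Km·(1 + [I]/Ki) + [S]). The following is a minimal sketch under that model and is not the authors' fitting procedure; the function names are illustrative. The Kcat ratio it derives is a back-calculation from the reported Km values and the reported 58% reduction in Kcat/Km, not a value reported in the study.

```python
def apparent_km(km, inhibitor=0.0, ki=None):
    """Apparent Km under competitive inhibition: Km * (1 + [I]/Ki)."""
    if ki is None:
        return km
    return km * (1.0 + inhibitor / ki)


def relative_rate(substrate, km, inhibitor=0.0, ki=None):
    """Initial rate as a fraction of Vmax (Michaelis-Menten)."""
    return substrate / (apparent_km(km, inhibitor, ki) + substrate)


# Values reported in the text (all in umol/l)
KM_WT, KM_P108S = 110.3, 215.7
KI_WT, KI_P108S = 1.93, 3.74
EFFICIENCY_RATIO = 1.0 - 0.58  # (Kcat/Km) of P108S relative to WT

# (Kcat/Km)_mut / (Kcat/Km)_wt = r  =>  Kcat_mut / Kcat_wt = r * Km_mut / Km_wt
implied_kcat_ratio = EFFICIENCY_RATIO * KM_P108S / KM_WT
print(f"implied Kcat(P108S)/Kcat(WT): {implied_kcat_ratio:.3f}")

# Fraction of Vmax at 50 umol/l substrate with 5 umol/l GC376
# (both concentrations appear in the inhibition assay design)
for name, km, ki in (("WT", KM_WT, KI_WT), ("P108S", KM_P108S, KI_P108S)):
    free = relative_rate(50.0, km)
    inhibited = relative_rate(50.0, km, inhibitor=5.0, ki=ki)
    print(f"{name}: v/Vmax = {free:.3f} without and {inhibited:.3f} with inhibitor")
```

Under these assumptions, most of the loss in catalytic efficiency is attributable to the roughly twofold higher Km rather than to a change in Kcat.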
Since previous studies of SARS-CoV 3CL pro have indicated that the dimerization of 3CL pro activates its enzymatic activity 14,15 , we analyzed the dimeric states of SARS-CoV-2 3CL pro WT and P108S using sedimentation velocity analytical ultracentrifugation (SV-AUC). The results indicated comparable concentration dependencies of the weight averaged s-value (Fig. 5), indicating that the values of the monomer-dimer dissociation constants were comparable between WT and P108S mutant proteins within the given concentration ranges. An analysis using circular dichroism (CD) spectroscopy showed no discernible differences in the secondary and tertiary structures between the two proteins ( Supplementary Fig. 2). These results led us to further examine alterations in the microenvironments in and around the substrate-binding site of 3CL pro . HDX-MS enabled the detection of structural perturbations around the substrate-binding pocket including C128-L141 close to P108 and Y161-D176 ( Fig. 6a-d), suggesting that the P108S mutation perturbs the pocket that is behind and distant from the mutation.

Discussion
Patients in this study cohort who were infected with the virus sub-lineage that carried the Pro108Ser of 3CL pro (N = 87) (which is the main protease in the virus that cleaves the viral polyproteins into individual proteins that exert viral functions) 16 tended to have a milder disease course than those infected with a viral sub-lineage that did not carry this mutation of 3CL pro (N = 85). An in vitro enzymatic assay of recombinant Pro108Ser showed that the Kcat/Km value of the mutant 3CL pro containing Ser108 was 58% lower as compared with that of 3CL pro containing Pro108.
HDX-MS revealed that the substrate-binding site is locally impacted by the mutation, leading to a reduced substrate-binding affinity, even though Pro108Ser did not affect the overall structure or the association state, as shown by CD spectroscopy and SV-AUC. Thus, despite the distance between the mutation site and the substrate-binding site, Pro108Ser appears to play a critical role in the reduced enzyme activity and may reduce both the replication potency and pathogenicity of the virus. While the mechanisms by which the proline substitution reduces the 3CL pro enzyme activity remain unknown, the fact that a proline residue both disfavors helix formation and confers local rigidity to the polypeptide chain 17 suggests that the structural perturbation detected by HDX-MS around the substrate-binding region, which is positioned behind and distal to the 108th amino-acid residue of 3CL pro , plays a critical role in the decline of the enzyme function. While the 3CL pro -mutant SARS-CoV-2 disappeared by January 2021 in Japan, we believe that any variants possessing the P108S 3CL pro mutation that might emerge in the future would be less pathogenic, but also less sensitive to the small-molecule compounds under development that target the substrate-binding site of this enzyme. Further investigation based on global surveillance of new variants is needed.
During the COVID-19 pandemic, many countries have seen the spread of SARS-CoV-2 variants belonging to multiple different clades 18,19 , but in Japan in the summer of 2020, variants belonging mainly to Clade 20B accounted for most of the viral spread throughout the country. This study has serendipitously served as an experiment of nature examining the roles of 3CL pro activity in the virus in the presence of minimally divergent spread of different variants, presumably because of the successful quarantine measures in Japan since March 2020 20,21 .
Proteins essential for viral production, such as the main protease or RNA polymerase, have been considered as therapeutic targets for coronaviruses, and various therapeutic agents and vaccines have been developed 22 . Apart from human coronavirus infections, GC376 (a bisulfite prodrug) has been shown to be effective against FIPV, which belongs to the α-coronavirus family. The administration of GC376 is associated with a high rate of disease remission and no significant adverse effects when used against FIPV 23 . GC376 is known to be a broad-spectrum protease inhibitor; it is converted to the peptide aldehyde GC373, which interacts covalently with the catalytic cysteine of coronavirus 3CL pro . In the presence of GC373, the N terminal finger (Ser1, Phe140, and Glu166) of 3CL pro is strongly hydrogen-bonded with the active site (His41, Cys145) of the 3CL pro . Hence, GC373 stabilizes the substrate-binding site (Glu166), inhibiting the dimerization of 3CL pro 24-26 .
PF-07321332, an orally bioavailable derivative of GC373, was recently shown to exhibit antiviral activity both in vivo and in vitro 27 . PF-07321332 inhibited the SARS-CoV-2-induced cytopathic effect in VeroE6 cells enriched with ACE2, and inhibited SARS-CoV-2 replication in A549 cells expressing ACE2 27,28 . PF-07321332 was also shown to be effective in a mouse-adapted SARS-CoV-2 MA10 model 29 . Based on these preclinical data, an international randomized controlled clinical trial named "Evaluation of Protease Inhibition for COVID-19 (EPIC)" was initiated. An interim analysis of the EPIC phase 2/3 trial conducted in high-risk patients showed that the inhibitor was highly effective in reducing the risk of hospitalization or death.
The clinical efficacy data will be submitted to the U.S. FDA for Emergency Use Authorization. Phase 2/3 studies of the efficacy of the enzyme inhibitor in standard-risk patients and as post-exposure prophylaxis are under way. Our documentation of an attenuated clinical course in patients infected with the mutant 3CL pro endorses the notion that 3CL pro inhibitors might be effective in reducing the severity of COVID-19 in humans. Cell-based cytotoxic and replication assays, similar to the ones performed in preclinical studies of PF-07321332,
whole viral genome sequencing. Of these, 50 patients with only partial genome sequences resulting from insufficient PCR amplification were excluded, leaving 179 patients for inclusion in the present analysis (Supplementary Fig. 3). The cases of 32 of these 179 patients had been reported previously 4 . The medical records of all 179 patients were reviewed to obtain data on clinical characteristics and the treatments that were received, and PCR
In some patients with mild or moderate symptoms, the presence of pneumonia could not be determined because they did not undergo chest X-ray or computed tomography examinations. Therefore, we classified disease severity into the following three categories: "mild to moderate" (patients did not require supplementary oxygen); "severe" (patients required oxygen supplementation but not a ventilator); and "critical" (patients who developed sepsis or acute respiratory distress syndrome and required a ventilator [Supplementary

DNA sequencing method. Whole viral genome sequencing, PCR-based amplification and phylogenic tree analysis were performed as previously reported (Takenouchi et al.) 4 . All point mutations, including non-synonymous and synonymous mutations, were annotated with ANNOVAR software and assessed with VarSifter (https://research.nhgri.nih.gov/software/VarSifter/). The multiple amino acid sequence alignments of various β-coronaviruses were compared using Molecular Evolutionary Genetics Analysis software (MEGA, https://www.megasoftware.net/) (Supplementary Table 3). The functional relevance of non-synonymous mutations was predicted with the Protein Variation Effect Analyzer (PROVEAN v1.1.3, http://provean.jcvi.org/seq_submit.php), the calculations of which are not dependent on sequence conservation among animals. Scores under a threshold value of −2.50 were considered deleterious. We also used the software Phylogenetic Assignment of Named Global Outbreak Lineages (Pangolin; https://cov-lineages.org/index.html) to assign viral lineages in an automatic and precise manner 9 .
Cloning and protein preparation of SARS-CoV-2 3CL pro . The SARS-CoV-2 3CL pro DNA fragments encoding the Wuhan strain or the strain containing Pro108Ser in the non-structural protein 5 (NSP5) gene were prepared using a reverse transcription kit (SuperScript III, Thermo Fisher) and were amplified by PCR using primers (forward: 5′-TTT GGA TCC AGT GGT TTT AGA AAA ATG GCA-3′, reverse: 5′-TTT GTC GAC TCA TTG GAA AGT AAC ACC TGA GCA-3′). The fragments were digested with BamHI and SalI and then ligated into pCold GST containing the cleavage site for PreScission Protease (GE Healthcare) at the N-terminal region. The expression vectors for the 3CL pro Wuhan strain type (WT) or the Pro108Ser mutant (P108S) were transformed into BL21 (DE3), and the bacteria were incubated in LB with ampicillin at 37 °C until the OD600 reached 0.8. Protein expression was induced with 1 mmol/l isopropyl-β-thiogalactopyranoside for 16 h at 4 °C. The cell pellets were re-suspended in a buffer containing 20 mmol/l Tris-HCl (pH 7.5), 100 mmol/l NaCl, and 0.1% Tween 20, sonicated twice for 5 min at 4 °C, and centrifuged at 20,000×g for 30 min. The supernatant was incubated with glutathione Sepharose 4B (GE Healthcare) for 2 h at 4 °C. The resin was then washed five times with the same buffer, and the GST tag was cleaved by the addition of PreScission Protease and further incubation for 16 h at 4 °C. Then, the 3CL pro was purified using size-exclusion chromatography (Superdex 200; GE Healthcare).
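As a quick consistency check on the cloning design, the primer sequences quoted above can be scanned for the recognition sites of the two restriction enzymes used (BamHI, GGATCC; SalI, GTCGAC). This is a minimal sketch for illustration only; the helper names are hypothetical and the check says nothing about primer specificity or annealing behavior.

```python
# Primer sequences as quoted in the text, with spaces removed
FORWARD = "TTT GGA TCC AGT GGT TTT AGA AAA ATG GCA".replace(" ", "")
REVERSE = "TTT GTC GAC TCA TTG GAA AGT AAC ACC TGA GCA".replace(" ", "")

# Recognition sequences of the two restriction enzymes named in the text
SITES = {"BamHI": "GGATCC", "SalI": "GTCGAC"}


def find_sites(primer):
    """Map enzyme name -> 0-based offset of its recognition site in the primer."""
    return {name: primer.find(seq) for name, seq in SITES.items() if seq in primer}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


print("forward primer sites:", find_sites(FORWARD))
print("reverse primer sites:", find_sites(REVERSE))

# The three bases following the SalI site in the reverse primer,
# read on the coding strand
print("coding-strand triplet after SalI:", reverse_complement(REVERSE[9:12]))
```

Each primer carries one of the two sites after a three-base 5′ overhang, and the triplet following the SalI site in the reverse primer corresponds to a TGA stop codon on the coding strand.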
Figure 6. HDX-MS results of SARS-CoV-2 3CL pro WT and P108S. (a) Structurally influenced regions accompanied by a single mutation at the 108th amino acid from proline to serine. HDX-MS showed more protected regions (magenta) and more exposed regions (cyan) in the SARS-CoV-2 3CL pro Pro108Ser mutant compared to SARS-CoV-2 3CL pro . Mutation of proline to serine at the 108th amino acid induces structural alteration in the regions from C128 to L141 and from Y161 to D176, where C128-L141 is sandwiched between P108 and Y161-D176, which is located at the substrate-binding region.
Enzyme kinetics analysis using fluorescence resonance energy transfer-based assay. The enzymatic activities of 3CL pro WT and P108S were determined using a fluorescent substrate with the cleavage site of SARS-CoV-2 3CL pro (Dabcyl-KTSAVLQ↓SGFRKME-Edans; GL Biochem). 3CL pro WT or P108S at a final concentration of 5 μmol/l was incubated in a buffer of 20 mmol/l Tris-HCl (pH 7.5), 100 mmol/l NaCl, and 5 mmol/l DTT with the addition of the substrate at a final concentration of 3.125, 6.25, 12.5, 25, 50, 100 or 200 μmol/l at room temperature. The change in fluorescence intensity was monitored with a fluorescence spectrophotometer (Cytation 5; BioTek) at an excitation wavelength of 340 nm and an emission wavelength of 460 nm. The kinetic parameters were determined from the initial rates of substrate cleavage using GraphPad Prism 8 software. For the inhibition assay, the SARS-CoV 3CL pro inhibitor GC376 (Selleck) at a final concentration of 1, 2, 5, 10, or 20 μmol/l was incubated with 5 μmol/l 3CL pro WT or P108S and 12.5, 25 or 50 μmol/l of substrate.
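To illustrate how Km and Vmax can be extracted from initial rates measured over the substrate series described above, the sketch below uses a Hanes-Woolf linearization ([S]/v plotted against [S]). This is a deliberately simple, standard-library-only substitute for the nonlinear regression performed in GraphPad Prism and should not be read as the authors' procedure. The rates are synthetic and noise-free, generated from the WT Km reported in the Results and an arbitrary Vmax of 1.0; they are not measured data.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def hanes_woolf_fit(substrate, rates):
    """Estimate (Vmax, Km) by linear regression of [S]/v on [S].

    [S]/v = [S]/Vmax + Km/Vmax, so slope = 1/Vmax and intercept = Km/Vmax.
    """
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


# Substrate concentrations used in the assay (umol/l)
SUBSTRATE = [3.125, 6.25, 12.5, 25, 50, 100, 200]

# Synthetic, noise-free rates (assumption: Vmax in arbitrary units)
TRUE_VMAX, TRUE_KM = 1.0, 110.3
synthetic_rates = [michaelis_menten(s, TRUE_VMAX, TRUE_KM) for s in SUBSTRATE]

vmax_fit, km_fit = hanes_woolf_fit(SUBSTRATE, synthetic_rates)
print(f"fitted Vmax = {vmax_fit:.4f} (arbitrary units), fitted Km = {km_fit:.2f} umol/l")
```

With real, noisy data a direct nonlinear fit is generally preferable, because linearized forms weight measurement errors unevenly across the substrate range.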
Circular dichroism (CD) spectroscopy. CD spectra were collected in the far-UV (200-260 nm) and the near-UV (250-340 nm) spectral regions. Spectra were recorded with a J-1500 CD spectrometer (JASCO Corporation) in a quartz cuvette (1 mm cell length for far-UV and 10 mm for near-UV) at 20 °C. The protein samples were prepared in 20 mmol/l Tris-HCl buffer solution (pH 7.3) containing 150 mmol/l NaCl at a concentration of 5 µmol/l for the far-UV CD measurements and 20 µmol/l for the near-UV CD measurements. The spectrum of the buffer was measured as a blank and was subtracted from the sample data. Four scans were averaged for each spectral region at a scan rate of 50 nm/min. The data pitch and the bandwidth were 0.5 nm and 1 nm, respectively.

Sedimentation velocity analytical ultracentrifugation (SV-AUC).
The SV-AUC experiments were performed using the Optima AUC (Beckman Coulter) at 20 °C with 1, 2.5, 5, 10, 20, 40, and 80 µmol/l of 3CL pro WT and P108S dissolved in 20 mmol/l Tris-HCl buffer solution (pH 7.3) containing 150 mmol/l NaCl. Next, 390 µL of each sample was loaded into the sample sector of a 12-mm double-sector charcoal-filled Epon centerpiece, and 400 µL of buffer was loaded into the reference sector of each cell. Data collection was performed at 42,000 rpm using a UV detection system. Data were collected every 240 s with a radial increment of 10 µm at 230 nm for 1, 2.5, and 5 µmol/l samples, at 235 nm for 10 µmol/l samples, at 240 nm for 20 µmol/l samples, at 290 nm for 40 µmol/l samples, and at 295 nm for 80 µmol/l samples. The collected data were analyzed using a continuous c(s) distribution model implemented in the program SEDFIT (version 16.2b) with fitting for the frictional ratio, meniscus, time-invariant noise, and radial-invariant noise. The partial specific volumes of both WT 3CL pro and P108S were 0.731 cm³/g, calculated from the amino acid composition of each sample using the program SEDNTERP 1.09. The buffer density and viscosity were calculated using the program SEDNTERP 1.09 as 1.00499 g/cm³ and 1.0214 cP, respectively. The c(s20,w) distribution figures were generated using the program GUSSI (version 1.3.2) 31 . The weight-average sedimentation coefficient of each sample was calculated by integrating the range of sedimentation coefficients where peaks with an obvious concentration dependence were observed. To determine the dissociation constant of the monomer-dimer equilibrium (KD), the concentration dependence of the weight-average sedimentation coefficient was fitted to the monomer-dimer self-association model implemented in the program SEDPHAT (version 15.2b) 32-34 .
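The monomer-dimer equilibrium referred to above links the dissociation constant to the fraction of protein present as monomer at a given loading concentration. The sketch below solves the mass balance for 2M ⇌ D with KD = [M]²/[D] and total concentration expressed in monomer units, C = [M] + 2[D]. It is an illustration of the model only, not the SEDPHAT fit. The KD value used is a hypothetical placeholder, since the fitted KD is not stated in this section.

```python
import math


def free_monomer(total, kd):
    """Free monomer concentration for 2M <-> D.

    Solves 2*[M]^2/KD + [M] - total = 0, where total is in monomer units
    and KD = [M]^2 / [D]. Units of total and kd must match.
    """
    return kd * (math.sqrt(1.0 + 8.0 * total / kd) - 1.0) / 4.0


def monomer_fraction(total, kd):
    """Fraction of protein (by monomer units) that is not dimerized."""
    return free_monomer(total, kd) / total


# Loading concentrations used in the SV-AUC experiments (umol/l)
CONCENTRATIONS = [1, 2.5, 5, 10, 20, 40, 80]

# Hypothetical KD for illustration only (umol/l); not a value from the study
HYPOTHETICAL_KD = 500.0

for c in CONCENTRATIONS:
    frac = monomer_fraction(c, HYPOTHETICAL_KD)
    print(f"{c:>5} umol/l: monomer fraction = {frac:.3f}")
```

The same function can be evaluated at the 4 μmol/l concentration used during deuterium exchange to see which KD values are compatible with a predominantly monomeric sample.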

Hydrogen Deuterium Exchange Mass Spectrometry (HDX-MS).
HDX-MS experiments were conducted using Waters HDX with the LEAP system (Waters). Protein solutions at 80 μmol/l (SARS-CoV-2 3CL pro WT and SARS-CoV-2 3CL pro P108S) were diluted 20-fold with 20 mmol/l Tris-HCl buffer solution (pH 7.3) prepared with D2O containing 150 mmol/l NaCl, and incubated at 20 °C for various hydrogen/deuterium exchange periods (0.5, 1, 10, 60, or 240 min). The concentration of the protein solution during deuterium exchange was 4 μmol/l; based on the KD estimated from SV-AUC, each protein was considered to be present as more than 98% monomer. The exchange reaction was quenched by dropping the pH to 2.4 while mixing with an equal volume of 4 mol/l guanidinium chloride and 0.5 mol/l tris(2-carboxyethyl)phosphine hydrochloride (TCEP) at pH 2.2. One hundred picomoles of quenched samples were immediately injected, desalted, and separated online using a Waters UPLC system based on the nanoACQUITY platform. The online digestion was performed over 5 min in water containing 0.05% formic acid at 4 °C at a flow rate of 100 μL/min. The digested peptides were trapped on an ACQUITY UPLC BEH C18 1.7 μm peptide trap (Waters) maintained at 0 °C and desalted with water and 0.1% formic acid. Flow was diverted using a switching valve, and the trapped peptide fragments were eluted at 40 μL/min onto a 1 × 100 mm column (C18 1.7 μm; ACQUITY UPLC BEH, Waters) held at 0 °C, with a 12-min linear acetonitrile gradient (8%-40%) containing 0.1% formic acid. The eluate was directed into a mass spectrometer (Synapt HD, Waters) with electrospray ionization and lock mass correction (using Glu-fibrinogen peptide B). Mass spectra were acquired over the m/z range of 100-2000 and processed using MassLynx (Waters). Pepsin fragments were identified using a combination of exact mass and MS/MS, aided by ProteinLynx Global SERVER (PLGS, Waters). Peptide deuterium levels were determined using DynamX 3.0 (Waters).
The relative deuterium uptake percentage was calculated for each peptide by dividing the mean of the deuterium uptake, m, by the number of backbone amide hydrogens. In comparing the HDX results between two samples, the mass difference of hydrogen/deuterium exchange for each peptide at each exposure time point (∆HX) was calculated as the difference between the mean deuterium uptakes of the two samples A and B:
∆HX = m_A − m_B
For the statistical analysis of significant differences, a volcano plot, which is a scatter plot of ∆HX versus a probability value (p value) determined using the Welch t-test, was used 35 . The significance limits for the vertical ∆HX value were calculated as follows. A pooled sample standard deviation (s_p) for 610 standard deviations was calculated from ∆HX. A propagated standard error of the mean (SEM_∆HX) was calculated from s_p. A significance limit for the ∆HX values can be calculated using the following equation:
|∆HX| limit = k × SEM_∆HX
We set k = 4.60 using a Student's t-distribution value for a two-tailed test with four degrees of freedom at a significance level (α) of 0.01 (99% confidence level). For the horizontal p value, the significance limits were defined at α = 0.01 (99% confidence level).
Statistical analysis. The main parameter of the clinical study was the grade of disease severity. Categorical variables were compared between the two groups using the Fisher exact test. A Student t-test was used to compare normally distributed quantitative variables between the two groups. An exact logistic regression analysis was used to examine the odds ratio for requiring supplemental oxygen. The following covariates were considered for inclusion in the multivariate model: age group (< 65 years or ≧ 65 years), sex (male or female), Charlson Comorbidity Index group (0 or ≧ 1), temporal trend (March to October 2020 or November 2020 to January 2021) and infection group (Clade 20B-T or 20B-nonT).
Statistical analyses were performed using R statistical software (version 3.6.2), and all the statistical tests were two-sided. A p value of < 0.05 was considered significant.
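The factor k = 4.60 used for the ∆HX significance limit in the HDX-MS analysis can be reproduced from the Student t-distribution (two-tailed, α = 0.01, four degrees of freedom). The sketch below does this with the standard library only, by numerically integrating the t density and bisecting for the quantile; it is an independent illustration rather than the authors' R code. The SEM, ∆HX and p values in the usage lines are hypothetical placeholders, and the helper names are illustrative.

```python
import math


def t_pdf(t, dof):
    """Probability density of the Student t-distribution."""
    coef = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return coef * (1.0 + t * t / dof) ** (-(dof + 1) / 2)


def t_cdf(t, dof, steps=2000):
    """CDF for t >= 0, using Simpson's rule on [0, t] plus the lower half (0.5)."""
    h = t / steps
    total = t_pdf(0.0, dof) + t_pdf(t, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    return 0.5 + total * h / 3.0


def t_critical(alpha, dof):
    """Two-tailed critical value: the t with CDF(t) = 1 - alpha/2, by bisection."""
    target = 1.0 - alpha / 2.0
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2.0
        if t_cdf(mid, dof) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def is_significant(delta_hx, p_value, sem, k, alpha=0.01):
    """Volcano-plot criterion: |dHX| above k*SEM and p below alpha."""
    return abs(delta_hx) > k * sem and p_value < alpha


k = t_critical(alpha=0.01, dof=4)
print(f"k = {k:.2f}")

# Hypothetical peptide values (Da), for illustration only
HYPOTHETICAL_SEM = 0.05
print(is_significant(delta_hx=-0.40, p_value=0.002, sem=HYPOTHETICAL_SEM, k=k))
print(is_significant(delta_hx=0.10, p_value=0.002, sem=HYPOTHETICAL_SEM, k=k))
```

Requiring both conditions means that a peptide is flagged only when the uptake difference is large relative to the pooled measurement error and also statistically distinguishable at the chosen α.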

Data availability
The authors declare that the data supporting the findings of this study are available within the article, Supplemental Data, and Supplemental Method files. Source data are provided with this paper. We downloaded the full nucleotide sequences of the SARS-CoV-2 genomes from the GISAID database (https://www.gisaid.org/). A table of the contributors is available (acknowledgment table). We have uploaded the full nucleotide sequences of our cohort to the GISAID database.