A structural and mechanistic study of π-clamp-mediated cysteine perfluoroarylation

Natural enzymes use local environments to tune the reactivity of amino acid side chains. In searching for small peptides with similar properties, we discovered a four-residue π-clamp motif (Phe-Cys-Pro-Phe) for regio- and chemoselective arylation of cysteine in ribosomally produced proteins. Here we report mutational, computational, and structural findings directed toward elucidating the molecular factors that drive π-clamp-mediated arylation. We show the significance of a trans prolyl amide bond for π-clamp reactivity. The enthalpy of activation (ΔH‡) for π-clamp cysteine arylation is significantly lower than that for a non-π-clamp cysteine. Solid-state NMR chemical shifts indicate that the prolyl amide bond in the π-clamp motif adopts cis and trans conformations in a 1:1 ratio, whereas in the reaction product Pro3 is exclusively trans. In two structural models of the perfluoroarylated product, distinct interactions at 4.7 Å are observed between the Phe1 side chain and the perfluoroaryl moiety of the electrophile. Further, solution 19F NMR and isothermal titration calorimetry measurements suggest interactions between hydrophobic side chains in a π-clamp mutant and the perfluoroaryl probe. These studies led us to design a π-clamp mutant with an 85-fold rate enhancement. These findings will guide us toward the discovery of small reactive peptides that facilitate abiotic chemistry in water.


a. Chemicals
Decafluorobiphenyl was purchased from Oakwood Chemicals (West Columbia, SC). Tris (2- …

Table S1. Yields for peptide substrates were determined by integrating total ion current (TIC) spectra.
First, the peak areas of all relevant peptidic species on the chromatogram were integrated using the Agilent MassHunter software package. When no side product was generated, the conversion of the limiting reagent equals the yield of the product. Conversion was calculated by integrating the TIC of the limiting peptide species within the linear dynamic range of the LC-MS instrument, and the yield was then calculated as:

$$\%\,\text{yield} = \%\,\text{conversion} = \left(1 - \frac{S_t}{S_0}\right) \times 100\%$$

where S_t is the peak area of the limiting reagent at time t and S_0 is the peak area of the limiting reagent at time 0.
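The yield formula above reduces to a single ratio of TIC peak areas. A minimal sketch follows; the function name and the example peak areas are hypothetical, and the calculation is only meaningful under the stated condition that no side product forms.

```python
def percent_yield(area_t: float, area_0: float) -> float:
    """Percent yield = percent conversion = (1 - S_t / S_0) * 100.

    Only valid when no side product forms, so that consumption of the
    limiting peptide equals product formation.
    """
    if area_0 <= 0:
        raise ValueError("S_0 must be a positive TIC peak area")
    return (1.0 - area_t / area_0) * 100.0


# Hypothetical TIC peak areas of the limiting peptide at t = 0 and at time t.
s_0 = 2.0e7
s_t = 5.0e6
print(f"yield = {percent_yield(s_t, s_0):.1f}%")  # yield = 75.0%
```

Both peak areas must come from the same ion species and lie within the instrument's linear range; otherwise the ratio does not track concentration.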

Kinetics Study
The reactions were carried out in 200 mM phosphate with 20 mM TCEP at 37 °C unless otherwise noted. To measure the second-order rate constants, the reaction mixture was prepared on ice and divided into several 10-µL aliquots, which were immediately placed in a 37 °C water bath unless otherwise noted. For reactions monitored for more than 1 hour, all aliquots were instead incubated in a PCR machine set at 37 °C to prevent solvent evaporation. At different time points, reactions were quenched by adding 100 µL of 50% water/50% acetonitrile/0.5% TFA and then subjected to LC-MS analysis. The initial concentrations of probe and substrate were known. The second-order rate constants were determined by fitting the following kinetics equation:

The error of each rate constant was obtained from the linear fit of the corresponding kinetics curve.
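The exact kinetics equation is not reproduced here. As a hedged illustration, the sketch below assumes the standard integrated second-order rate law for A + B → P with unequal, known initial concentrations, linearized so that k follows from the slope of a straight-line fit, consistent with the linear fitting described above. The function name, the choice of the peptide as limiting reagent A, and the use of fractional conversion x from the TIC analysis are all assumptions.

```python
import math


def fit_second_order_k(times_s, conversions, a0_m, b0_m):
    """Fit k (M^-1 s^-1) for A + B -> P from the conversion x of limiting reagent A.

    Assumes the integrated rate law for unequal initial concentrations,
        ln([B]_t [A]_0 / ([A]_t [B]_0)) = ([B]_0 - [A]_0) k t,
    with [A]_t = [A]_0 (1 - x) and [B]_t = [B]_0 - [A]_0 x.
    Returns (k, standard error of k) from an ordinary least-squares line.
    """
    if b0_m <= a0_m:
        raise ValueError("this form assumes [B]0 > [A]0 (A is limiting)")
    ys = []
    for x in conversions:
        a_t = a0_m * (1.0 - x)
        b_t = b0_m - a0_m * x
        ys.append(math.log((b_t * a0_m) / (a_t * b0_m)))
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_y = sum(ys) / n
    s_tt = sum((t - mean_t) ** 2 for t in times_s)
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_s, ys))
    slope = s_ty / s_tt
    intercept = mean_y - slope * mean_t
    # Standard error of the slope from the residuals (needs at least 3 points).
    sse = sum((y - intercept - slope * t) ** 2 for t, y in zip(times_s, ys))
    se_slope = math.sqrt(sse / (n - 2) / s_tt) if n > 2 else float("nan")
    delta = b0_m - a0_m
    return slope / delta, se_slope / delta
```

The slope divided by ([B]0 - [A]0) gives k, and the same division scales the slope's standard error, which mirrors taking the rate-constant error from the linear fit.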

Determination of the standard enthalpy/entropy of activation
The second-order rate constant (k) for the reaction between the π-clamp peptide and probe 2 was measured at different temperatures (T), and ln(k/T) was plotted against 1/T. The standard enthalpy of activation (ΔH‡) and the standard entropy of activation (ΔS‡) were calculated (Table S5) by fitting ln(k/T) against 1/T with the linearized Eyring equation:

$$\ln\frac{k}{T} = -\frac{\Delta H^{\ddagger}}{R}\cdot\frac{1}{T} + \ln\frac{\kappa k_B}{h} + \frac{\Delta S^{\ddagger}}{R}$$

where κ is the transmission coefficient (κ = 1), k_B is the Boltzmann constant, h is Planck's constant, T is the absolute temperature and R is the gas constant. The errors for ΔH‡ and ΔS‡ were obtained from the linear fit. ΔG‡ was calculated as ΔH‡ - TΔS‡, and its error was obtained by error propagation.
The Eyring plots are summarized in Fig. S37.
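The Eyring analysis is a straight-line fit: the slope of ln(k/T) versus 1/T gives -ΔH‡/R and the intercept gives ln(κk_B/h) + ΔS‡/R. A minimal sketch follows; the function names are hypothetical, and the ΔG‡ error propagation shown treats the ΔH‡ and ΔS‡ errors as uncorrelated, which is an assumption (slope and intercept of an Eyring fit are in practice strongly correlated).

```python
import math

K_B = 1.380649e-23         # Boltzmann constant, J K^-1
H_PLANCK = 6.62607015e-34  # Planck constant, J s
R_GAS = 8.314462618        # gas constant, J mol^-1 K^-1


def eyring_fit(temps_k, rate_constants, kappa=1.0):
    """Fit ln(k/T) vs 1/T; return dH (J/mol), dS (J/mol/K) and their standard errors."""
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(k / t) for k, t in zip(rate_constants, temps_k)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    # Residual variance and standard errors of slope and intercept (needs n > 2).
    s2 = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    se_slope = math.sqrt(s2 / sxx)
    se_intercept = math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    d_h = -slope * R_GAS
    d_s = (intercept - math.log(kappa * K_B / H_PLANCK)) * R_GAS
    return d_h, d_s, se_slope * R_GAS, se_intercept * R_GAS


def gibbs_activation(d_h, d_s, se_h, se_s, temp_k):
    """dG = dH - T*dS, with simple uncorrelated error propagation."""
    d_g = d_h - temp_k * d_s
    se_g = math.sqrt(se_h ** 2 + (temp_k * se_s) ** 2)
    return d_g, se_g
```

For a second-order k in M^-1 s^-1, the resulting ΔS‡ implicitly refers to a 1 M standard state; results in kJ/mol follow by dividing ΔH‡ and ΔG‡ by 1000.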

Computational Studies
Density functional theory (DFT) was used to calculate the free energy of the reaction between the perfluoroaryl probe and the π-clamp peptide, starting from representative π-clamp structures observed in previous molecular dynamics (MD) sampling. All DFT computations were carried out with the Q-Chem 4.1 software package 3. To reduce the computational cost, the calculations included only the four-residue π-clamp sequence (FCPF; the rest of peptide 1A was omitted) and the perfluoroaromatic probe. The free energy change (ΔG) was calculated as:

$$\Delta G = E_{\text{product}} + E_{\text{HF}} - E_{\text{peptide}} - E_{\text{perfluoroaromatics}}$$

where E_product is the energy of the arylated product, E_HF is the energy of hydrogen fluoride, E_peptide is the energy of the cysteine peptide, and E_perfluoroaromatics is the energy of the perfluoroaryl probe.
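The reaction energy is a simple difference of total energies for the SNAr step that releases HF. The sketch below mirrors that equation; the function name is hypothetical, and it assumes the inputs are total energies in hartree (the unit quantum chemistry codes such as Q-Chem report) converted to kcal/mol with the standard factor.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474  # 1 hartree expressed in kcal/mol


def reaction_energy(e_product, e_hf, e_peptide, e_perfluoroaromatic):
    """dG = E_product + E_HF - E_peptide - E_perfluoroaromatics.

    Inputs are total energies in hartree; the result is returned in kcal/mol.
    """
    delta_hartree = (e_product + e_hf) - (e_peptide + e_perfluoroaromatic)
    return delta_hartree * HARTREE_TO_KCAL_PER_MOL
```

The same function applies unchanged to gas-phase and PCM energies, as long as all four terms come from the same level of theory and solvation model.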
To calculate the free energy of peptides with 5,5-dimethylproline (5,5-dmP) in DFT, we extracted 4 snapshots from MD simulations with different starting structures of cis-proline peptides and manually added two methyl groups to form 5,5-dmP. Similarly, for starting structures of peptides with α-methylproline (αMePro), we extracted 3 snapshots from MD simulations of trans-proline peptides and manually added a methyl group to form αMePro. For the product's starting structure, we manually connected the perfluoroaryl group to the peptide cysteine.
In each case, four gas-phase geometry optimizations were performed on structures sampled from the MD trajectory, using the B3LYP exchange-correlation functional 4 with the 6-31G* basis set 5. To account for potential π-π interactions, Grimme's DFT-D3 empirical dispersion correction 6 was also included in the optimizations. Once a potential energy minimum was located, the energy was refined by a single-point calculation with the more accurate combination of the rPW86 exchange functional 7, the PBE local correlation functional 8 and the VV10 non-local correlation functional 9, which handles the long-range dispersion critical to π-π interactions. For these calculations, the larger 6-31G** basis set 5 and a large non-local integration grid 10 were also employed. Binding energies were then calculated both in the gas phase and in water, the latter approximated by the polarizable continuum model (PCM) 11 with 302 PCM grid points and a dielectric constant of 78.39. The calculated free energies are summarized in Table S3.

Cysteine pKa measurement
Absorbance at 240 nm was measured in solutions containing 0.05 mM peptide and 20 mM buffer.
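The text gives only the measurement conditions, not the fitting model. Assuming, as in the common UV assay, that the thiolate absorbs more strongly at 240 nm than the protonated thiol, the pKa can be extracted by fitting absorbance versus pH to a single-site Henderson-Hasselbalch titration curve. The sketch below is a hypothetical implementation: a grid search over pKa where, for each trial value, the two plateau absorbances follow from linear least squares.

```python
def fit_thiol_pka(ph_values, absorbances, pka_min=4.0, pka_max=12.0, step=0.001):
    """Grid-search fit of A(pH) = A_SH + (A_S - A_SH) / (1 + 10**(pKa - pH)).

    For each trial pKa the model is linear in the two plateau absorbances,
    so A_SH and A_S come from an ordinary least-squares line against the
    thiolate fraction. Returns (pKa, A_SH, A_S) with the smallest residual.
    """
    best = None
    n_steps = int(round((pka_max - pka_min) / step))
    n = len(ph_values)
    mean_a = sum(absorbances) / n
    for i in range(n_steps + 1):
        pka = pka_min + i * step
        # Fraction of cysteine present as thiolate at each pH.
        frac = [1.0 / (1.0 + 10.0 ** (pka - ph)) for ph in ph_values]
        mean_f = sum(frac) / n
        s_ff = sum((f - mean_f) ** 2 for f in frac)
        if s_ff == 0.0:
            continue
        amp = sum((f - mean_f) * (a - mean_a) for f, a in zip(frac, absorbances)) / s_ff
        a_sh = mean_a - amp * mean_f
        sse = sum((a - a_sh - amp * f) ** 2 for f, a in zip(frac, absorbances))
        if best is None or sse < best[0]:
            best = (sse, pka, a_sh, a_sh + amp)
    if best is None:
        raise ValueError("pH values do not span a usable range")
    return best[1], best[2], best[3]
```

Fitting the plateaus rather than fixing them avoids needing measurements at fully protonated and fully deprotonated extremes, at the cost of requiring pH points on both sides of the pKa.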

a. Sample preparation
Peptide 5 was synthesized by solid-phase peptide synthesis (SPPS) as described previously. To prepare peptide 6, peptide 5 (0.09 mmol) was dissolved in 6 mL of acetonitrile, followed by the addition of probe 2' (0.2 mmol) and triethylamine (1 mmol). The mixture was stirred at room temperature for 2 h and filtered. The precipitate containing peptide 6 was collected and redissolved in 90% A, 10% B and water, 1% acetic acid, 1% acetonitrile to exchange the counterion. The peptides were eluted with 50% water, 50% acetonitrile, 0.1% acetic acid and lyophilized.
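The amounts above fix the reaction stoichiometry. The short calculation below, with hypothetical names, shows that probe 2' is used at about 2.2 equivalents and triethylamine at about 11 equivalents relative to peptide 5, at a peptide concentration of 15 mM.

```python
def equivalents(n_reagent_mmol, n_limiting_mmol):
    """Molar equivalents of a reagent relative to the limiting peptide."""
    return n_reagent_mmol / n_limiting_mmol


PEPTIDE_5_MMOL = 0.09
PROBE_2P_MMOL = 0.2
TEA_MMOL = 1.0
VOLUME_ML = 6.0

print(f"probe 2': {equivalents(PROBE_2P_MMOL, PEPTIDE_5_MMOL):.2f} equiv")  # 2.22 equiv
print(f"triethylamine: {equivalents(TEA_MMOL, PEPTIDE_5_MMOL):.1f} equiv")  # 11.1 equiv
# mmol / mL = mol / L, so multiply by 1000 for mM.
print(f"[peptide 5]: {PEPTIDE_5_MMOL / VOLUME_ML * 1000:.0f} mM")  # 15 mM
```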
Before the solid-state NMR experiments, peptides 6, 7 and 8 were mixed with a few drops of buffer containing 5 mM phosphate (pH 8.0) to form semi-dry samples. Peptide 5 was measured in the dry state because of the unfavorable dynamics of this highly soluble peptide in the hydrated state.

b. Solid-state NMR experiments
Solid-state NMR experiments were carried out on a 400 MHz (9.4 T) spectrometer using a 4 mm ¹H/¹⁹F/¹³C probe and on an 800 MHz (18.8 T) spectrometer using a 3.2 mm ¹H/¹³C/¹⁵N probe. Typical radiofrequency (rf) field strengths were 71-100 kHz for ¹H, 71 kHz for ¹⁹F, 50-71 kHz for ¹³C … kHz MAS using a REDOR period of 1.1 ms for polarization transfer 12. These spectra were measured on the 800 MHz spectrometer to obtain high resolution. ¹³C-¹⁹F REDOR experiments for measuring ¹³C-¹⁹F distances between the fluorinated tag and the ¹³C-labeled peptide were carried out at 243 K under 8 kHz MAS on the 400 MHz spectrometer, with REDOR mixing times ranging from 1 ms to 8 ms. ¹⁹F 90°-180°-90° composite pulses were used to compensate for B₁ field inhomogeneity, while a ¹³C-selective Gaussian pulse of two rotor periods (250 µs) was used for ¹³C refocusing. ¹³C and ¹⁵N chemical shifts of the FCPF segment of the peptide were compiled and used as input to the TALOS-N program 13 to predict the backbone (φ, ψ) torsion angles for each conformation of the PFA-bound π-clamp peptide 6. Because the predicted torsion angles did not vary significantly between the different forms, the (φ, ψ) angles were averaged and used as a starting point for further refinement of the side-chain torsion angles.

c. Torsion angle prediction, REDOR simulations and structural modeling
The ¹³C-¹⁹F REDOR S/S₀ dephasing curves provide information on the distances between ¹³C- and ¹⁹F-labeled sites and thus restrain the position of the perfluoroaromatic tag with respect to the peptide. The PFA tag has eight ¹⁹F atoms in total, which makes de novo determination of ¹³C-¹⁹F distances through 9-spin simulations prohibitively time-consuming. The experimental data points were therefore first compared to two-spin ¹³C-¹⁹F REDOR simulations. A set of simulated REDOR curves was generated using the dipolar coupling between one ¹³C and one ¹⁹F atom as input, corresponding to distances from 1 to 12 Å in 0.1 Å increments (111 curves in total). For each experimental REDOR data set, the RMSD to every simulated curve was calculated, and the curve with the minimum RMSD gives the best-fit two-spin ¹³C-¹⁹F distance (Fig. S51).
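The two-spin distance grid search can be sketched as follows. This is not the authors' simulation: it swaps in the ideal analytical REDOR powder average, S/S₀ = ⟨cos(2√2·D·t·sin2β·sinα)⟩ with D the ¹³C-¹⁹F dipolar coupling in Hz (ideal delta pulses, no relaxation, no pulse imperfections), evaluated on a coarse midpoint powder grid. All function names and the grid size are assumptions; the 1-12 Å range in 0.1 Å steps follows the text.

```python
import math

MU0_OVER_4PI = 1e-7       # vacuum permeability / 4π, T m A^-1
HBAR = 1.054571817e-34    # reduced Planck constant, J s
GAMMA_13C = 67.2828e6     # 13C gyromagnetic ratio, rad s^-1 T^-1
GAMMA_19F = 251.815e6     # 19F gyromagnetic ratio, rad s^-1 T^-1


def dipolar_coupling_hz(r_angstrom):
    """13C-19F dipolar coupling constant D (Hz) at internuclear distance r (Å)."""
    r_m = r_angstrom * 1e-10
    return MU0_OVER_4PI * GAMMA_13C * GAMMA_19F * HBAR / (2 * math.pi * r_m ** 3)


def _powder_points(n=40):
    """Midpoint grid over beta (sin-weighted) and alpha; one octant suffices since cos() is even."""
    pts = []
    for i in range(n):
        beta = (i + 0.5) * (math.pi / 2) / n
        for j in range(n):
            alpha = (j + 0.5) * (math.pi / 2) / n
            pts.append((math.sin(beta), math.sin(2 * beta) * math.sin(alpha)))
    total_w = sum(w for w, _ in pts)
    return [(w / total_w, g) for w, g in pts]


POWDER = _powder_points()


def redor_s_over_s0(d_hz, t_mix_s):
    """Ideal two-spin REDOR S/S0 averaged over a powder of crystallite orientations."""
    lam = d_hz * t_mix_s
    return sum(w * math.cos(2 * math.sqrt(2) * lam * g) for w, g in POWDER)


def best_fit_distance(mix_times_s, s_over_s0_exp):
    """Return the grid distance (Å) whose simulated curve has the minimum RMSD."""
    distances = [round(1.0 + 0.1 * i, 1) for i in range(111)]  # 1.0-12.0 Å, 111 curves

    def rmsd(r):
        d = dipolar_coupling_hz(r)
        sq = [(redor_s_over_s0(d, t) - s) ** 2 for t, s in zip(mix_times_s, s_over_s0_exp)]
        return math.sqrt(sum(sq) / len(sq))

    return min(distances, key=rmsd)
```

Because a single ¹³C usually sees several ¹⁹F atoms on the tag, a two-spin fit like this gives only an effective distance, which is why the multi-spin simulations that follow are needed.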
We next ran 5-spin SIMPSON simulations to better capture the complexity of the experimental spin system, and adopted a model-dependent approach to determine the side-chain conformations that agree best with the measured REDOR data. With the backbone fixed to the chemical-shift-constrained (φ, ψ) angles, we iteratively rotated the relevant side-chain dihedral angles and extracted the corresponding ¹³C-¹⁹F distances and atom positions. Here, ψ and χ1 of Phe1, χ1, χ2 and χ3 of Cys2, and χ1 of Phe4 were varied. The Phe1 ψ angle was varied because Phe1 is the N-terminal residue of the peptide, so its ψ is not predicted by TALOS-N. Cysteine normally has no χ2 or χ3 angle; these arise here because the PFA tag is attached to the cysteine. Because some ¹³C signals overlap in the fully labeled peptide, peptides 7 and 8 were synthesized, one with only Phe1 uniformly labeled and the other with Pro3 and Phe4 uniformly labeled. Each of the six varied angles was set to 60°, 180° or -60° based on the most common rotamers 14, yielding 3⁶ = 729 dihedral angle combinations. For each combination, the extracted ¹³C-¹⁹F distances and relative ¹³C-¹⁹F dipolar orientation angles were used in 5-spin REDOR simulations with the SIMPSON program 15. The five spins in each simulation are the ¹³C site of interest and the four nearest-neighbor ¹⁹F atoms on the PFA tag. The resulting REDOR dephasing curves were compared with the experimental REDOR data, and the RMSD between the simulated and measured S/S₀ was calculated for each carbon and summed to obtain a total RMSD score for each angle combination.

Table S5. Summary of thermodynamic parameters (310 K) for the cysteine arylation reaction.

7-Cys
116 ± 10   -48 ± 10   68 ± 14   26.8
X-C-P-X (X = PyrenylAla)

* ΔH‡ and ΔS‡ were calculated by fitting ln(k/T) against 1/T with the Eyring equation, and the errors were obtained from the linear fit. ΔG‡ was calculated as ΔH‡ - TΔS‡.

Methyl group was manually added to form … The perfluoroaryl group was manually added, followed by geometry optimization to obtain the lowest-energy structures.
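The six-angle rotamer scan described in section c is an exhaustive search over 3⁶ = 729 combinations, keeping the one with the lowest summed RMSD. The sketch below shows only that enumeration and selection. The `score_fn` callback is hypothetical: in the described workflow it would build the side-chain geometry, extract ¹³C-¹⁹F distances and orientations, run the 5-spin SIMPSON simulations for each labeled carbon and sum the RMSDs, none of which is implemented here.

```python
from itertools import product

ROTAMERS = (60.0, 180.0, -60.0)  # staggered rotamer values, degrees
ANGLE_NAMES = (
    "Phe1_psi", "Phe1_chi1",
    "Cys2_chi1", "Cys2_chi2", "Cys2_chi3",
    "Phe4_chi1",
)


def scan_rotamers(score_fn):
    """Exhaustively score all 3**6 = 729 rotamer combinations.

    score_fn is a hypothetical callback: given {angle_name: degrees}, it should
    return the summed RMSD between simulated and measured S/S0 over all labeled
    carbons (lower is better).
    Returns (best_score, best_angles, n_evaluated).
    """
    best_score, best_angles, n_evaluated = float("inf"), None, 0
    for combo in product(ROTAMERS, repeat=len(ANGLE_NAMES)):
        angles = dict(zip(ANGLE_NAMES, combo))
        score = score_fn(angles)
        n_evaluated += 1
        if score < best_score:
            best_score, best_angles = score, angles
    return best_score, best_angles, n_evaluated
```

With only 729 combinations, brute force is cheap relative to the spin simulations themselves, and it guarantees the global minimum on this discrete rotamer grid.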