Revisiting sulfur H-bonds in proteins: The example of peroxiredoxin AhpE

In many established methods, identification of hydrogen bonds (H-bonds) is primarily based on pairwise comparison of distances between atoms. These methods often give rise to systematic errors when sulfur is involved. A more accurate method is the non-covalent interaction index, which determines the strength of the H-bonds based on the associated electron density and its gradient. We applied the NCI index on the active site of a single-cysteine peroxiredoxin. We found a different sulfur hydrogen-bonding network to that typically found by established methods, and we propose a more accurate equation for determining sulfur H-bonds based on geometrical criteria. This new algorithm will be implemented in the next release of the widely-used CHARMM program (version 41b), and will be particularly useful for analyzing water molecule-mediated H-bonds involving different atom types. Furthermore, based on the identification of the weakest sulfur-water H-bond, the location of hydrogen peroxide for the nucleophilic attack by the cysteine sulfur can be predicted. In general, current methods to determine H-bonds will need to be reevaluated, thereby leading to better understanding of the catalytic mechanisms in which sulfur chemistry is involved.


Re-refinement of the models for AhpE improves protein stability during MD simulations
To study the interactions of the sulfur as a hydrogen bond donor or acceptor atom, we used the enzyme Alkyl hydroperoxide reductase E (AhpE) from Mycobacterium tuberculosis as a case study, since it is a one-cysteine (Cys45) peroxiredoxin (Prx) and the protein is relatively moderate in size (153 amino acids). AhpE is part of the defense mechanism of this pathogen through its ability to detoxify peroxides through sulfur oxygen chemistry on its cysteine. The accompanying reaction mechanism has been well characterized, thus allowing the validation of our computational results. The active site of AhpE is of particular interest, since it is one of the few in its family that has no resolving cysteine. Furthermore, the Cys pK a (5.2) suggests that the majority of the cysteine population should be present in the thiolate form 1 .
Both the reduced (PDB: 1xxu) and the sulfenylated (PDB: 1xvw) form have been crystalized 2 . AhpE is a monomer in solution, but the crystal structure shows 2 dimers per asymmetric unit, which are denoted chains A and B. We decided to use both of these chains in our analysis, because the cysteine environment in both chains is different. In chain A, the side chain of the conserved arginine residue (Arg116) is located away from the cysteine and directed outwards to the flexible β9-α5 loop, while in chain B it is turned inwards to the active site cysteine (Fig. S1). For other peroxiredoxins, the inwards orientation, as observed in chain B, has been described as the active form, and the activity of the enzyme has been coupled to the movement of Arg116 3,4 . However, after inspection of the electron density maps of the structures available in the PDB, we observed that there are several questionable sites, justifying a re-refinement of the protein structures.
We started by looking at the updated AhpE structures found in the PDB REDO database 5 , since they were found to be significantly better than the ones in the PDB. After several cycles of manual re-modelling and REFMAC 6 re-refinement, the new structures of the reduced and sulfenylated AhpE were deposited to the PDB under accession codes 4x0x and 4x1u, respectively.
The R-factors decreased with 3%, so our new models of AhpE are an improved representation of the experimental data (Table S1). Based on the electron density maps and taking into account the chemical environment, ten highly improbable amino acid rotamers per chain were optimized, and several side chains were flipped (Table S2). Five molecules of the cryo-protectant glycerol and several missing water molecules (158 in case of 1xxu, and 54 in case of 1xvw) were added, and they all contributed to the improvement of the model and the decrease of the R-factors. To check whether the fitting of the protein chains to the density map also would improve the R-factor, we performed another REFMAC refinement on the re-refined structure in the absence of the small molecules. It turned out that the new waters only account for ~ 0.5 % decrease in the R-factors. So the majority of the improvement is due to the re-refined protein chains.
Important changes include Gln46, the amino acid adjacent to the peroxidatic cysteine. Flipping the side chain of Gln46 allows it to be involved in H-bonding interactions with the side chains of the conserved Trp80, Ser84 and Asp50, which may contribute to the stability of the active site (Fig. S2c). The new interaction to Trp80 is of special importance, since this residue is supposed to be predominantly causing the decrease of fluorescence signal upon sulfenylation of AhpE 1 , which effect is used to characterize the kinetic behaviour of the enzyme 7 . According to the crystal structures, the main change in the environment of Trp80 upon oxidation of AhpE is the change in oxidation state of Cys45. As the Cys45 side chain lays only 5 Å away from the benzene part of the indole ring, it can influence the electron density of this Trp80.

Gauche minus is the active conformation of the peroxidatic cysteine
In the active site of chain A of the re-refined reduced AhpE structure(4x0x), the electron density map implies a different conformation of the peroxidatic Cys from the one in the PDB. In 1xxu, Cys45 A was modeled in gauche plus conformation 2 , with χ 1 of +61°; however, both the 2F o -F c and the F o -F c density maps showed that the gauche minus conformation is more probable (Fig.   S2a). At an rmsd level of 3, a small negative peak on the F o -F c difference density map was observed at the original position of the Cys45 side chain, while a positive peak on the map suggested a possible alternative location of the side chain. Therefore, Cys45 A was modeled both as adopting only the gauche minus conformation, and as a 50-50% mixture of the gauche minus and plus conformers, and both structures were subjected to 10 REFMAC refinement cycles. The refined density map supports the gauche minus form as the most probable one.
As an alternative, Cys45 was also modeled and refined in its oxidized form as a sulfenic acid, but this results in a negative peak in the difference density map. Apart from that a positive peak in the difference density map was observed 2.6 Å away from the side chain of Cys45 A (Fig. S2a), similarly to its non-crystallographic symmetry mate, the Cys45 D . Therefore, these densities were both modeled as water molecules. After this modification, the side chain of the peroxidatic cysteine pointed towards the surface of the protein with the χ 1 torsion angle close to -60°, similarly to the three other protein chains in the asymmetric unit. Based on the analysis of about 80 homologue peroxiredoxin structures of the PDB, it can be concluded that the gauche minus conformation of the peroxidatic cysteine dominates. This conformation was detected in more than 80 % of the peroxiredoxins, including peroxiredoxins in complex with hydrogen peroxide 8 , which suggests the gauche minus conformation as being the active conformation of the peroxidatic cysteine.

Figure S1. Structural overlay of chains A and B of AhpE.
The inset shows the conformational differences between the side chains of Arg116 in the two chains. Figure S2 .The conformation of the AhpE active site Cys45 and Gln46 residues after re-refinement and after MD equilibration. The original PDB entry is shown in yellow carbon atoms, while the re-refined structure is shown in green carbon atoms. (a) The original structure represents the peroxidatic Cys45 of chain A in gauche plus conformation, while the electron-density maps imply a gauche minus conformation with Cys45 pointing outwards the binding cleft. (b) The differences in the conformation of Cys45 from the input structures are retained in the MD equilibrium structures. (c) The fitting of Gln46 in the density map at 1.9 Å resolution does not improve by flipping the side chain of Gln46, although it provides a chemically more reasonable interactions with Trp80, Ser84 and Asp50. (d) During the MD simulation the active site remains more stable in the re-refined structure than in the original one.

Re-refinement method
Coordinates and structure factor amplitudes were retrieved from the PDB server. The mmcif file containing structure factors was converted to mtz format using the mmcif2mtz tool using the CCP4 suite 9 . Electron density maps were generated with the help of the Electron Density Server hosted at Uppsala University (http://eds.bmc.uu.se/eds). Maps were visualized by the program COOT 10 Table S1. everything below 85º to a scale of 0, while for angles from 95º to 180º it is set to 1 (Scale H-A-X ). Values between 85º and 95º are scaled from 0 to 1. When X is a hydrogen atom, the scales are adjusted to a 10º lower range. The cutoff for E HBo to count a H-bond is 6.25 kJ/mol, implying that the calculated energy needs to be at least 25% of the optimum value for a H-bond.

The Non-Covalent Index (NCI) approach
The NCI approach allows identification of interactions in real space, based on the peaks that appear when plotting the reduced density gradient (s) versus density (ρ). The reduced density gradient is given by , since the use of exponential functions to build the promolecular density allows first and second derivatives to be obtained analytically 12 . To obtain the NCI plots, a cubic grid of side length 20 Å with the peroxydatic sulfur at the cube center ( Fig. S3) in which the promolecular density (ρ) and the reduced density gradient (s) was computed.
The strength of these interactions is derived from the density value at these low-gradient peaks. When more density accumulates between two different atoms, the stronger the interaction will be, which results in an increase of the density value of the low-gradient spike. Consequently, dispersion interactions, like van der Waals, appear at very low-density values (ρ < 0.01 a.u.), whereas stronger interactions, like H-bonds and steric hindrance, appear at higher density values (ρ > 0.01 a.u.). In this way, the nature of the atoms involved is taken into account, while in most of the conventional methods the distance of the H-bond is indicative of its strength: shorter H-bonds means stronger interaction. To distinguish between attractive and repulsive interactions, the NCI method uses the sign of the second eigenvalue of the electron-density Hessian matrix (λ 2 ): λ 2 < 0 means bonding interactions, while λ 2 > 0 defines non-bonding ones. λ 2 is derived from the Laplacian (∇ 2 ρ), a widely used tool to distinguish between the different types of strong interactions 13 , when expressed as the three components along the three principal axes of an interaction (equation 3).
with λι (i=1, 2, and 3) being the three eigenvalues of the electron-density Hessian matrix.
The combination of the regions of low density and low reduced gradient (equation 2) with the sign of λ 2 (equation 3) gives a unique pattern for the different type of non-covalent interactions. Therefore, the reduced gradient density (s) is plotted against the product of the sign(λ 2 ) and the electron density function (ρ) to generate a bi-dimensional NCI plot (Fig. S3a). This plot can be divided into three regions for different types of NCI; H-bonds are found in the region of -0.06 a.u. < sign(λ 2 )ρ < -0.01 a.u., van der Waals are in the region of -0.01 a.u. < sign(λ 2 )ρ < 0.01 a.u. and steric hindrance is found in the region of 0.01 a.u. < sign(λ 2 )ρ < 0.06 a.u.
Finally, the low-gradient iso-surfaces are plotted in real space (3D) and the isosurfaces are colored according to their relative strength using the value of sign(λ 2 )ρ. A RGB-color scale (red-green-blue) is normally used, where red isosurfaces represent repulsive interactions, green very weak van der Waals interactions and blue attractive interactions (Fig. S3b).

Computations using the Promolecular Density
In order to assess how the use of promolecular densities affects the NCI results, we have performed NCI calculations for model arginine···cysteine complexes with the peroxidatic Cys as thiolate and thiol forms. The starting geometries for Arg···CysSand Arg···CysSH were taken from the equilibrated structure of chain B of the re-refined AhPE. The geometries of both complexes were fully optimized and characterized at the M06-2X/6-311+G** level of theory. 14 All the optimizations were performed with the universal solvation model 15 using a dielectric constant of 20. The use of an implicit solvation model is the most common technique of mimicking the environment at the proteins surface. 16 Then, the NCI calculations were carried out using the self-consistent DFT and the promolecular densities and the results are shown in Figure S5. For both complexes, the NCI results at the self-consistent and promolecular level are quantitatively equivalent. The low-gradient peaks associated to H-bonds remain at very similar density values, as can be inferred from the strength collected in Table S3. The largest shift towards smaller density values were observed for the peaks corresponding to nonbonding interactions upon density relaxation, introducing less repulsion and greater stability. Therefore, the use of promolecular densities becomes a reliable approach to establish the H-bond network of the large AhpE.     Table S5. Difference between the H···A distance and the R min distances of the heavy atoms involved in the H-bond. R min = ½(R min,D + R min,A ) with R min,A is the minimum of the Lennard-Jones potential between two atoms A, as implemented in CHARMM.