Dimeric interactions and complex formation using direct coevolutionary couplings

We develop a procedure to characterize the association of protein structures into homodimers using coevolutionary couplings extracted from Direct Coupling Analysis (DCA) in combination with Structure Based Models (SBM). Identification of dimerization contacts using DCA is more challenging than identification of intradomain contacts, since direct couplings from the interface are mixed with those from monomeric contacts. Therefore, a systematic way to extract dimerization signals has been elusive. We provide evidence that the prediction of homodimeric complexes is possible with high accuracy for all the cases we studied that have rich sequence information. For the most accurate conformations of the structurally diverse dimeric complexes studied, the mean and interfacial RMSDs are 1.95 Å and 1.44 Å, respectively. This methodology is also able to identify distinct dimerization conformations, as in the case of the family of response regulators, which dimerize upon activation. The identification of dimeric complexes can provide molecular insights into the construction of large oligomeric complexes and be useful in the study of aggregation-related diseases such as Alzheimer’s or Parkinson’s.

Furthermore, owing to the simplicity of SBMs, modifications and the exploration of conformational changes driven by different interactions in the system are straightforward to implement. Therefore, SBMs are suitable for studying conformational changes related to folding, functional mechanisms and binding in proteins.
The Hamiltonian of an SBM can be represented as the sum of two contributions, $V_B(r_{ij})$ and $V_{NB}(r_{ij})$, which stand for the potentials between covalently bound and non-covalently interacting atoms, respectively. The potential that describes covalent interactions can be further decomposed into terms accounting for each internal degree of freedom of the molecule (bond lengths, bond angles and dihedrals), each defined relative to $N$, the native reference state used to generate the Hamiltonian.
The first term accounts for the energy changes due to variations in the bond length between two atoms. Similarly, the second and third terms consider the energy changes due to variations of bond angles and dihedrals relative to their equilibrium positions, $\theta_{ijk}^{N}$ and $\Phi_{ijkl}^{N}$, respectively. The constants $k_b$, $k_a$ and $k_d$ penalize deviations of bonds, angles and dihedrals from the native structure, biasing the system towards the native conformation.
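For concreteness, one functional form that is widely used in the SBM literature and is consistent with the description above is sketched below. The display equations of the original text were not preserved, so the harmonic form of the bond and angle terms, the cosine form of the dihedral term, the numerical prefactors and the symbol $r_{ij}^{N}$ for the native bond length are assumptions made for illustration; only $k_b$, $k_a$, $k_d$, $\theta_{ijk}^{N}$ and $\Phi_{ijkl}^{N}$ are fixed by the text.

```latex
\begin{align*}
H &= \sum_{\text{bonded}} V_{B}(r_{ij}) \;+\; \sum_{\text{non-bonded}} V_{NB}(r_{ij}), \\[4pt]
V_{B} &= \sum_{\text{bonds}} \frac{k_b}{2}\left(r_{ij} - r_{ij}^{N}\right)^{2}
      \;+\; \sum_{\text{angles}} \frac{k_a}{2}\left(\theta_{ijk} - \theta_{ijk}^{N}\right)^{2} \\
      &\quad + \sum_{\text{dihedrals}} k_d \left\{ \left[1 - \cos\!\left(\Phi_{ijkl} - \Phi_{ijkl}^{N}\right)\right]
      + \tfrac{1}{2}\left[1 - \cos 3\!\left(\Phi_{ijkl} - \Phi_{ijkl}^{N}\right)\right] \right\}.
\end{align*}
```

In this assumed form every term vanishes when the corresponding coordinate takes its native value, so the bonded energy is minimal in the reference structure.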
The potential that describes the non-covalent interactions in an SBM Hamiltonian depends on the type of pair, which is selected through an indicator $C_{ij}$: $C_{ij} = 1$ for pairs in contact in the contact map of a reference structure and $C_{ij} = 0$ for pairs that do not make contact. In the latter case, $V_{NB}(r_{ij})$ reduces to a simple repulsive interaction, $V^{R}_{ij}(r_{ij})$, that accounts for the excluded volume of the atoms. This repulsive term is parameterized by $\varepsilon$, a reduced unit of energy ($\varepsilon = k_B T$), and by $d$, the atomic radius. Historically, SBM Hamiltonians have employed a Lennard-Jones (LJ) type potential to stabilize interactions between atoms that are in contact in the reference structure. In coarse-grained (C$_\alpha$) models, the shape of this LJ potential is completely defined by the distance between the bead pair, $r_{ij}$.

A recently developed and powerful method used in this work allows the form of the non-covalent interaction potential to be controlled by replacing the LJ potential with a similar Gaussian potential whose parameters can be modified independently (ref. 4). This Gaussian potential is built from the repulsive term $V^{R}_{ij}(r_{ij})$ and a Gaussian function $G_{ij}(r_{ij})$. The product $V^{R}_{ij}(r_{ij})\,G_{ij}(r_{ij})$ ensures that the minimum of the combined potential is at the native distance. The shape parameters $A_{ij}$ and $w_{ij}$ control the amplitude (well depth, or the interaction force factor) and the decay (well width) of the Gaussian SBM potential, respectively.
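The role of the product term can be checked numerically with a minimal sketch. It assumes a repulsion of the form $V^{R} = \varepsilon (d/r)^{12}$, a Gaussian $G = -\exp[-(r - r^{N})^{2}/(2w^{2})]$ and a combined contact potential $V^{R} + A\,G + V^{R}G$, a form used in the SBM literature. The original equations were not preserved, so these expressions, the exponent 12, the default parameter values and all function names are illustrative assumptions rather than the exact definitions used in this work.

```python
import math


def repulsion(r, eps=1.0, d=4.0):
    """Excluded-volume term V_R = eps * (d / r)**12 (assumed form; d in angstroms)."""
    return eps * (d / r) ** 12


def gaussian_well(r, r_native, w):
    """Negative Gaussian G centred at the native distance, with width w."""
    return -math.exp(-((r - r_native) ** 2) / (2.0 * w ** 2))


def gaussian_contact(r, r_native, A=1.0, w=0.5, eps=1.0, d=4.0):
    """Assumed combined contact potential: V_R + A*G + V_R*G.

    At r = r_native, G = -1, so the product V_R*G cancels V_R and the
    potential equals -A exactly at the native distance.
    """
    v_r = repulsion(r, eps, d)
    g = gaussian_well(r, r_native, w)
    return v_r + A * g + v_r * g


def non_bonded(r, in_contact, r_native=None, **params):
    """Select the pair potential with the contact indicator C_ij (in_contact)."""
    if in_contact:
        return gaussian_contact(r, r_native, **params)
    # C_ij = 0: only excluded volume acts on the pair.
    shared = {k: v for k, v in params.items() if k in ("eps", "d")}
    return repulsion(r, **shared)


if __name__ == "__main__":
    r_native = 8.0  # hypothetical native contact distance, in angstroms
    for r in (7.5, 7.9, 8.0, 8.1, 8.5):
        print(f"r = {r:4.1f}  V = {gaussian_contact(r, r_native):+.4f}")
```

Under these assumptions the well depth is set by A alone and the width by w alone, which is what allows the two parameters to be tuned independently; widening w or increasing `r_native` does not change the energy at the minimum.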

Supplementary Figures
Supplementary Figure S1. RMSD progression of the SBM+DCA methodology for all the proteins studied. The different gray tones indicate different stages where the contact distance parameters are gradually being decreased from 50 Å to 8 Å and the decay w of the Gaussian curves is also reduced from 4 to 0.5. The RMSD is computed using the dimeric structures with the PDB accession code shown in each graph. The iRMSD curves have a very similar behavior with slightly lower values in general.

Supplementary Figure S2. Contact maps of different dimeric systems and their predicted contact maps.
The upper triangular map shows the native monomeric contacts (brown) along with the native dimeric contacts (orange). The circular symbols represent the top couplings estimated using DCA, after applying the solvent accessibility criterion and removing contacts close to the monomeric map. The lower triangular map shows the best complex prediction. Monomeric contacts are shown in blue and the resulting dimeric contacts in green. A comparison between the native dimeric contacts (orange), the predicted dimeric contacts (green) and the DCA couplings shows that only a few coevolutionary contacts are needed to recapitulate the remaining contacts, which seem to form as a consequence of bringing the coupled pairs together. See Table S4 for all the comparisons.

Supplementary Tables
Supplementary Table S1. Predictive performance of the SBM+DCA methodology for the dimers studied in this work. The comparison with native structures is done using the complex RMSD and iRMSD. The average RMSD/iRMSD was computed using the frames of the last stage of the methodology, comprising 2000 structures for each system.

Supplementary Movies
Supplementary Movie S1. The movie shows the output frames of a simulation using the SBM+DCA protocol for the protein tRNA methyltransferase. At the beginning the monomers are unbound, but as the simulation progresses dimer formation occurs until reaching an RMSD of 1.5 Å with respect to the experimental structure (PDB 1UAL).