Dual binding in cohesin-dockerin complexes: the energy landscape and the role of short, terminal segments of the dockerin module

The assembly of the polysaccharide degradating cellulosome machinery is mediated by tight binding between cohesin and dockerin domains. We have used an empirical model known as FoldX as well as molecular mechanics methods to determine the free energy of binding between a cohesin and a dockerin from Clostridium thermocellum in two possible modes that differ by an approximately 180° rotation. Our studies suggest that the full-length wild-type complex exhibits dual binding at room temperature, i.e., the two modes of binding have comparable probabilities at equilibrium. The ability to bind in the two modes persists at elevated temperatures. However, single-point mutations or truncations of terminal segments in the dockerin result in shifting the equilibrium towards one of the binding modes. Our molecular dynamics simulations of mechanical stretching of the full-length wild-type cohesin-dockerin complex indicate that each mode of binding leads to two kinds of stretching pathways, which may be mistakenly taken as evidence of dual binding.

. Schematic representation of cellulosome organization in C. thermocellum. The cellulosomal structural and enzymatic subunits comprise modular components. The cellulosomal enzymes each carry one or more catalytic modules and a single dockerin module. The scaffoldin is multifunctional, whereby the nine cohesin modules (enumerated) integrate nine dockerin-bearing enzymes into the complex, the carbohydrate-binding module (CBM) binds the complex to the cellulosic substrate, and the C-terminal dockerin module is involved in anchoring the cellulosome complex to the bacterial cell surface. The modules are separated from each other by defined linker segments.
phenomenon does exist, but in each mode there are two kinds of the stretching trajectories. The two effects are mixed in the SMFS experiment since it is done without any control of the starting situation.
Thus we consider the existing evidence for the dual binding mode to be inconclusive and requiring further studies. In this paper, we provide another theoretical approach to the binding problem by analyzing the energy landscapes involved by using an empirical model known as FoldX 41,42 . In particular, we investigate the influence The three α-helices forming the Doc domain are labeled as 1 α ′, 2 α ′ and α ′ 3 . The Doc-bound Ca 2+ ions are shown as gray spheres. The Doc residues making contacts with the Coh in the binding mode I include R23, L22, N44, S45, T46 and R53. They are shown in the van der Waals representation. The Doc residues making contacts with the Coh in mode II include R57, L56, N10, S11, T12 and R19. The locations of R57, L56, N10, S11, T12 and R19 in mode II are analogous, respectively, to the positions of R23, L22, N44, S45, T46 and R53 in mode I. Panel (C) shows structures of the Coh-Doc complex given by the PDB:1OHZ (Coh in blue, Doc in red) and PDB:2CCL (Coh in cyan, Doc in orange). The Coh structures are superimposed. The spheres represent the dockerin-bound Ca 2+ ions. The Doc helices are not aligned, indicating the existence of the two different binding modes. Panel D shows the PDB structure 1OHZ. Coh and Doc are shown in blue and red, respectively. The black line shows the derived axis of the symmetry, denoted here as the Z-axis. The sense of the rotation is indicated at the top.
Recent experiments on the ScaA Doc module from R. flavefaciens show that physical interactions between an N-terminal Trp and a C-terminal Pro in Doc confer stability on it 43 . However, neither PDB:1OHZ nor PDB:2CCL, encompasses the corresponding residues at the N-and C-terminus of the C. thermocellum Doc. Both of these Coh-Doc structures lack the the N-and C-terminal tails in Doc. We modeled these terminal segments and, interestingly, discovered that they indeed have a substantial influence on the energy landscapes of the Coh-Doc complex. In the presence of the full-length N-and C-terminal tails in the WT Doc, the free energy of binding in modes I and II are found to be comparable, which is consistent with the fact that-despite widespread efforts-no crystal structure of the full-length C. thermocellum Coh-Doc complex has been ever resolved. In the absence of the terminal tails in the WT Doc, on the other hand, mode I is found to have a substantially lower free energy than mode II, which is consistent with the Coh-Doc structure PDB:1OHZ. The two-site mutation-independent of the presence or absence of the terminal tails in Doc-is found to shift the equilibrium significantly towards mode II, as exemplified by the crystal structure PDB:2CCL.
In addition to the free energy calculations, we also simulate an AFM-like stretching of the tail-extended Coh-Doc complex using a structure-based coarse-grained model [44][45][46][47][48][49] . Our particular implementation of the structure-based model is related conceptually to several other approaches [50][51][52][53] . It has been selected optimally out of 62 variants of structure-based models considered in ref. 54 by making comparisons to experimental data on stretching. For a given binding mode, we still observe two types of force-extension patterns which reflect different scenarios of structural rearrangements during the stretching. Interestingly, the two terminal tails in Doc make the Coh-Doc complex more stable mechanically. In particular, the presence of the tails in Doc affects the probabilities of observing the different types of the force-extension patterns.

Methods
Structural models of Coh and Doc. The Docs present in the PDB:1OHZ and PDB:2CCL correspond to the C-terminal docking domain of endo-1,4-β-xylanase Y from C. thermocellum. According to the Uniprot database, entry P51584, this Doc contains 69 amino acid residues. In keeping with the convention used in the entries PDB:1OHZ and PDB:2CCL, these residues are numbered from −5 to 64. However, the Doc structures given in PDB:1OHZ and PDB:2CCL have different sequential lengths: PDB:1OHZ contains the Doc residues with numbers from 1 to 56 whereas PDB:2CCL-residues from -2 to 59. When analyzing the Docs without the terminal tails, we truncate the Doc structure given in PDB:2CCL to retain only the residues with the sequential numbers ranging from 1 to 56. Otherwise, the residues at sites -5, -4, -3 are P, P, and V respectively; at sites 60, 61, 62, 63, and 64-D, K, F, P, and V respectively. Also the sequences of Cohs given in the two PDB entries have different lengths: PDB:1OHZ contains the Coh residues numbered from 5 to 144 whereas PDB:2CCL comprises the Coh residues from 5 to 153. We thus discard the Coh residues 145 through 153 from PDB:2CCL (when dealing with or without the tails in Doc) so that both Coh structures have identical sequences in our analysis.
We consider eight Coh-Doc complexes that are defined in Table 1. Here, we introduce the following notation: The capital letter C denotes the structure of Coh with the amino acid sequence comprising residue numbers from 5 to 144. The small letter d denotes the truncated Doc with the sequence comprising residue numbers 1 through 56. The asterisk denotes the Doc sequence with the two single-point mutations, i.e., S45A and T46A. The subscripts I and II indicate that the protein structures taken as an initial state for energy calculations are PDB:1OHZ and PDB:2CCL, respectively. This convention implies, for example, that the arrangement denoted by C I d I corresponds to the Coh-Doc corresponding to PDB:1OHZ, where the WT Doc makes a complex with Coh in mode I. Similarly, in the arrangement C II d I the Coh of 2CCL is bound to the Doc of PDB:1OHZ.
We also study systems in which the Doc of PDB:1OHZ undergoes the two-site mutation to make it sequentially identical to the Doc of PDB:2CCL. Such Doc is denoted in Table 1  Since neither PDB:1OHZ nor PDB:2CCL contains the N-and C-terminal tails of the Docs, we model these terminal segments in two ways: by using the Swiss Model 55 and iTASER 56 . Because of the uncertainty involved in protein structure modeling, in addition to using the two methods, we also consider three different PDB entries as template structures. They are listed in Table 2. The twelve resulting models of the Coh-Doc complex with the tails are summarized in Table 2. We use the convention in which the capital letter D denotes the full-length Doc which comprises the amino acid residues with the sequential numbers from −5 to 64. As before, the asterisk denotes the Doc sequence with the two single-point mutations.
As we explain in the subsequent paragraph, it is possible to determine a symmetry axis, Z, such that a rotation of Doc around Z transforms the Coh-Doc structure between the two binding modes. The geometry involved is illustrated in panel D of Fig. 2. Here, ϕ is the rotation angle around the Z-axis and the coordinates of Coh-Doc at Z = 0 and ϕ = 0 correspond to PDB:1OHZ. Similarly Z 0 ≈ and ϕ π ≈ correspond to PDB:2CCL. We use the convention in which positive values of Z correspond to shifting the two molecules closer together and negative values-to shifting them further away. The rotations and forward shifts are not implemented when prohibited by steric constraints.
The secondary structure of Doc consist of three helices: α ′ 1 constituted by the residues with numbers from 11 to 23, 2 α ′ formed by the residues ranging from 28 to 36, and α ′ 3 comprising residues from 45 to 56. Here, in keeping with the convention used in refs 40,57 the primed and unprimed symbols indicate the secondary structures belonging to Doc and Coh, respectively. In each of the binding modes only the first and third of these helices couple to Coh, which implies that the symmetry axis can be determined based only on 1 α ′ and 1 α ′. To determine the symmetry axis we use the following procedure: (i) We superimpose the Cohs of PDB:1OHZ and PDB:2CCL. (ii) We introduce a set of vectors formed by pairs of equivalent α-C atoms in helices α ′ 1 and 3 α ′ as described in PDB:1OHZ and PDB:2CCL. (iii) We rotate the coordinate system in such a way that the Z-components of these vectors are brought closer to zero. (iv) We iterate step (iii) to bring the Z-components of these vectors to zero. The Z-axis of the final, rotated coordinate system gives the symmetry axis. We find that switching from mode I to mode II requires a rotation around the Z-axis by ϕ = 174°. Rotating by 360° brings the system back to mode I. Free energy calculations using FoldX. The purpose of our calculations is to determine the free energy of the Coh-Doc system as a function of parameters Z and ϕ, which describe the location of Doc relative to Coh, as depicted in panel D of Fig. 2. One way to determine the energy landscape associated with the protein complex would be by using all-atom simulations and averaging over classes of conformations. However, this procedure would be way too costly numerically when repeating it for various values of Z and ϕ, even when adopting an implicit solvent approach. Instead, we use an empirical force field known as FoldX 41,42 which has been designed primarily for predicting free energy differences between a WT protein and its mutant. Here, however, the distinct structures correspond to different positions of Doc relative to Coh. We predict the structures and their free energies as a function of Z and ϕ. The prediction also applies to the Coh-Doc complexes with the mutated (S45A and T46A) or reverse-mutated (A45S and A46T) Doc. We consider the systems both with and without the tails in the  Table 2. The Coh-Doc complexes comprising the full-length Doc. As in Table 1, C I and C II denote the Coh structures derived from PDB:1OHZ and PDB:2CCL, respectively. D I and D II denote the full-length Doc structures associated with PDB:1OHZ and PDB:2CCL, respectively. They both comprise residues with sequential numbers from -5 to 64. The asterisk denotes the two single-point mutations (S45A and T46A) in Doc. The PDB-unavailable tail parts of the structures are derived either by using the Swiss Model or iTaser server, as listed in the fourth column. The fifth column indicates the PDB structure code of the templates used.
The sixth column provides the index, k, associated with the complex. The seventh column lists the values of the minimal free energy corresponding to mode I of binding whereas the eighth column-to mode II.
Scientific REPORTS | (2018) 8:5051 | DOI:10.1038/s41598-018-23380-9 Doc. The effects of the dockerin-bound calcium ions are included in the calculations. FoldX employs an energy function that consists of ten terms that are listed in Supplementary Information (SI). We perform the free energy calculations for two temperatures: T = 298 K, which is the FoldX default parameter, and T = 308 K, i.e., the maximum temperature for which the FoldX energy function has been parametrized. The free energy is optimized with respect to the side-chain conformations while the backbone atoms are kept at fixed positions. When not considering the sequence tails in Doc, the initial structures for the free energy minimization are as in the PDB structure files, as listed in Table 1. Otherwise, the structures are predicted by iTaser or the Swiss Model, as specified in Table 2. In all cases, the dockerin-bound calcium ions are included in the structural models and taken into account in the FoldX energy calculations 58 .
The resulting ΔG, minimized with respect to the side-chain orientation, is the free energy subject to the constraints on the positions of the backbone atoms of Coh and Doc. Therefore, ΔG depends not only on the coordinates Z and ϕ but also on the structural model taken as input for the calculations. In particular, selecting the lowest free energy is subject to a substantial uncertainty and, perhaps more importantly, does not provide a proper estimate of the binding strenght because it does not involve any information about the lateral extension of the minimal basins.
In order to estimate the free energy of binding in a more robust way, we perform a second stage of calculations in which ΔG determined for various complexes corresponding to the same sequence (mutated or not) serve as an input. In the case of Docs with tails, we also include the various of determining the structure. Specifically, we define the free energy of Coh-Doc binding in mode I as Here, the sum over index k corresponds to averaging over the input structures listed in Tables 1 or 2. The values of k are written in the last columns of these tables. Accordingly, computed for the input structure with index k. The unit step function, θ, is defined as follows: Thus the configurations (Z, ϕ) corresponding to the binding mode I are those with /2 /2 π ϕ π − < < and free energies ΔG k smaller than a cut-off value E c . As we show in the Results section, E I does not depend on the choice of the cut-off as long as E c is larger than about −25 kcal/mol. By analogy, the free energy of Coh-Doc binding in mode II is The configurations (Z, ϕ) corresponding to the binding mode II are those with π ϕ π < < /2 3 /2 and energies ΔG k smaller than a cut-off value E c .
If the equilibrium probability of finding the Coh-Doc complex in mode I is denoted by p I and in mode II by p II then The dual binding occures if the probabilities p I and p II are of the same order of magnitude.
Free energy of binding by using all-atom simulations. The employment of the empirical models described above allows for the elucidation of the free energy landscape through the usage of the Z and ϕ variables. It also allows for a considerable probing of the side-chain conformations. However, the drawback involved is the lack of the flexibility of the backbone. It is thus worthwhile to compare the binding energies to those obtained through all-atom molecular dynamics (MD) simulations.
To this end, we used the Amber 14 package 59 with the AMBER force field 99SB 60 and the TIP3P model for the molecules of water 61 . The Newton equations of motion were integrated by using the leapfrog algorithm with the time step of 2 fs. All bonds with the hydrogen atoms were constrained through the SHAKE algorithm 62 . The T was maintained around 300 K by adding terms corresponding to the Langevin dynamics 63 with the collision frequency of 2 ps −1 . A 10 Å cut-off was applied to all non-bonded interactions. The Particle Mesh Ewald method 64 was used to treat the long-range electrostatics.
The Coh-Doc complex (shown in Fig. S1 in SI) was solvated in a truncated octahedron box of water that is large enough to avoid interactions between the protein complex and its images in the adjacent cells. The system was neutralized by adding Na + ions. The energy of the system was then minimized in three stages: the first minimization was done with all atoms of the protein complex being restrained to let the water molecules move to empty places around the proteins; in the second minimization stage, only the backbone heavy atoms are restrained; and in the last stage, no restraints were applied. Each minimization stage involved 2500 steepest descent steps that removed steric clashes and then 2500 conjugate gradient steps to achieve quick convergence. After the energy minimization, the system was heated up from 0 to 300 K in 2.5 × 10 4 steps with weak restrains applied to the atoms of the Coh-Doc complex. This was followed by 2.5 × 10 4 relaxation steps at T = 300 K (with the restrains kept in place), which brought the system to the density of 1 g/cm 3 , and then by 2.5 × 10 5 steps of equilibration without any restrains. Both the energy minimization and the heating-up simulations were performed at constant volume. The subsequent relaxation and equilibration simulations were performed at constant pressure p = 1 atm. In the production runs, pressure was also maintained at p = 1 atm. The coordinates of all atoms of the Coh-Doc complex were recorded every 10 ps. Five independent trajectories of 200 ns were generated for each of the systems under study. For the recorded conformations of the complex, the root-mean-square distance (RMSD) from the native structure was computed by considering only the backbone atoms. An example of the dependence of RMSD on time is shown in Fig. S2 in SI. Based on the RMSD plots we selected, for each of the trajectories independently, a time interval in which RMSD attained a steady state and fluctuations in RMSD did not exceed 2 Å. The Coh-Doc conformations recorded in these time intervals were taken as input for implicit-solvent free-energy calculations, which were performed within the framework of the Molecular Mechanics/Poisson Boltzmann Surface Area (MM/PBSA) method.
The MM/PBSA method is based on calculating the free energy difference, ΔG bind , between the bound and unbound states Here, E ele Δ and E vdW Δ denote the electrostatic and van der Waals contributions. These energy contributions are the same as in the AMBER force field used in the MD simulations. ΔG PB and ΔG SUR are the polar and non-polar solvation energy terms. The entropic contribution, TΔS, is estimated by the normal mode approximation method using the mmpbsa_py_nabnmode program implemented in the AMBER package. The solvation energy was calculated by pbsa, which is also included in the AMBER package. The polar term, G PB Δ , was obtained by solving linearized Poisson-Boltzmann equation numerically. The non-polar term is defined by where SASA is the solvent-accessible surface area that was calculated by the LCPO method 65 . The regression coeficients α and β are set to 0.005 kcal mol −1 Å −2 and 0, respectively. For a given MD trajectory, ΔG bind is averaged over the selected time interval. The resulting values of time-averaged ΔG bind are summarized in Tables S1-S5 in SI.

Structure-based coarse-grained simulations.
We use a structure-based coarse-grained model [44][45][46][47][48][49] in which amino acid residues are represented by single beads centered on their α-C atoms. The beads are tethered together into chains by strong harmonic potentials with the spring constant where ε is the depth of the potential well associated with the native contacts, which serves as the basic energy scale in our model. We assume ε = ± (110 30) pN Å, as determined by benchmarking against experimental results for 38 proteins 45 . The native contacts are identified using an overlap criterion 49 applied to the coordinates of all heavy atoms in the native structure. Here, the van der Waals radii of the heavy atoms are taken from ref. 66 The effective spheres associated with the atoms, when checking for the overlaps, have radii which are 1.24 times larger 67 (this factor corresponds to the point of inflection in the Lennard-Jones potential). In addition, the amino acid pairs that are very close sequentially, (i, i + 1) and (i, i + 2), are excluded from the contact map. Examples of contact maps for Doc and Coh-Doc are shown, respectively, in Figs S3 and S4 in SI.
The interactions within the native contacts are described by the Lennard-Jones potential Here, r ij denotes the distance between residue beads i and j. The parameters σ ij are chosen so that each contact in the native structure is stabilized at the minimum of the Lennard-Jones potential. The contacts between the proteins are treated in the same manner as the contacts within the proteins as both sets are dominated by hydrogen bonds. The contacts between the dockerin and the Ca 2+ ions are determined on the basis of the overlap criterion 49 with the van der Waals radius of Ca 2+ being 1.53 Å, as in the AMBER force-field 68 . The interactions between the residues that do not form native contacts are purely repulsive and given by the truncated and shifted Lennard-Jones potential corresponding to σ = r / 2 ij 0 6 with = r 4 0 Å. The energy function comprises also harmonic terms that favor the native values of local chiralities in each amino acid chain 69 .
The solvent is implicit and the system evolves in time according to the Langevin dynamics. The overall force acting on a particular bead i is a sum of three terms: (i) the direct force F i that derives from all the potential energy terms, (ii) the damping force that is proportional to the velocity of the bead, and (iii) the random force, i Γ , that represents thermal noise. The corresponding equations of motion, , are solved by the fifth-order predictor-corrector algorithm with the time step of 0.005 τ. Here, γ is the damping coefficient, and all beads are assumed to have the same mass m. The dispersion of the thermal noise is given by γk T 2 B , where k B is the Boltzmann constant. The damping coefficient is set to m 2 / γ τ = . This value corresponds to the overdamped case-practically Brownian dynamics-and the characteristic time scale, τ, is of the order of 1 ns, as argued in refs 70,71 .
Thermal stability of dockerin. We use the structure-based coarse-grained simulations to investigate the thermal stability of the Docs with and without the terminal tails. To this end, we compute the probability of finding the Doc in the native state, P 0 , as a function of T. The definition of P 0 involves counting the conformations in which all native contacts are present. The native state probability P 0 is thus different from the average fraction of native contacts, Q, which is often used to characterize the deviation from the native state.
To compute P 0 (T) and Q(T), we perform MD simulations for T ranging from . Each MD simulation is preceded by a 10 4 τ equilibration run and gives τ × 3 10 5 of dynamics. At any given T, we run 101 Scientific REPORTS | (2018) 8:5051 | DOI:10.1038/s41598-018-23380-9 independent trajectories. P 0 is then determined as an average over the simulation time and over the trajectories. We perform the simulations for the Docs with the terminal tails (index k = 1) and without the terminal tails (index k = 9, 11 and 12).
Another way to assess thermal stability of proteins is to simulate their unfolding at elevated temperatures. The unfolding simulations start at the native state and finish when all nonlocal contacts get broken, which defines the unfolding time t unf . Specifically, the nonlocality refers to the sequential distance | − | > i j 4. At any given T between ε . k 0 9 / B and k 2 1 / B ε .
, we run 301 independent trajectories of 10 5 τ each and, thus, obtain 301 values of t unf . As the characteristic unfolding time, t u , we take the median of the distribution of unfolding times t unf . We perform the simulations for the Doc with the tails (k = 1) and without the tails (k = 9,11 and 12).
Stretching simulations of the Coh-Doc complex. We use the structure-based coarse-grained simulations to investigate also the mechanical stability of the Coh-Doc complex. Stretching of the Coh-Doc complex is implemented by attaching two harmonic springs to the N-terminal amino acids of Coh and Doc. (Other ways of pulling are discuss in ref. 40 ). The N-terminus of Doc is denoted as N' and the C-terminus as C' . One of the springs is fixed in space and the other one is moved at a constant speed, v p , so that the distance it travels in time t is d v t p = . The force constant of the pulling springs is taken as , which corresponds to about 1 pN/nm and is close to the elasticity of typical AFM cantilevers 45 . All pulling simulations are performed for T k , which is near-optimal in folding kinetics and is of the order of room temperature.
In our simulations, the response force F acting on the pulling spring is measured and averaged over time periods that correspond to the spring displacements of 0.5 Å 45 . The F-d curves may come with several peaks, and the height of the largest of them is denoted by Fmax. As in ref. 40 we perform simulations for the pulling speed nm/ms which is close to the experimental speeds. All of the pulling simulations start from the native state. In the course of the simulations, the breaking and re-formation of native contacts is followed in time. The native contact between residues i and j is considered broken if the inter-residue distance r ij exceeds a cutoff length of 1 5 ij σ .
. Due to thermal fluctuations, the broken contacts may get re-established. To characterize the unfolding and dissociation patterns, we record the spring displacements at which the native contacts break for the last time.
Data availability. All data generated or analysed during this study are included in this published article (and its Supplementary Information files). Figure 3 illustrates the FoldXderived free-energy landscapes for systems C I d I (panels A, B and C) and C d I I ⁎ (panels D, E and F). The panels A and D are color-maps showing ΔG k for k = 1 and k = 5, respectively, as a function of Z and ϕ. (Here, ΔG is minimized with respect to the side-chain orientations for the input structures with indices k = 1 and k = 5). The value of ΔG is indicated by colors according to the scale bar on the right-hand side. The regions with smallest/largest values of ΔG are marked in blue/red. We note that there are two pronounced local minima in ΔG. The first one is located around ϕ = 0 (binding mode I), whereas second one is around ϕ = π (binding mode II). More precisely, for system C I d I , the two minima are located at ϕ = 3° and ϕ = 173°. They have different depths, −38.2 and −28.7 kcal/mol, and the parameter region corresponding to mode I has a larger area in the (Z, ϕ) plane. For system C d I I ⁎ , the two minima are located at ϕ = 3° and ϕ = 173°. Their depths are −35.2 and −28.1 kcal/mol. Panels B and E in Fig. 3 show the Z-dependence of ΔG for two fixed values of ϕ, i.e., for ϕ = 3°, corresponding to mode I (upper subpanels), and for ϕ = 173° (panel B) or ϕ = 172° (panel D), corresponding to mode II (lower subpanels). The data points are fitted to parabolas around the minima to help estimate their depth. Panels C and F in Fig. 3 show the ϕ-dependence at Z equal to the value at the deepest minimum, Z min . The solid lines are curves that are fitted to the data points.

Energy landscapes of Coh-Doc interactions. Docs without the tails.
The values of ΔG min , corresponding to the two binding modes for the eight complexes under study, are listed in Table 1, columns number five and six. For a given binding mode (I or II) and a given sequence (WT or two-site mutant), the values of ΔG min depend on k because the positions of the backbone atoms are not identical in the structural models that are taken as input for the FoldX calculations. For the systems comprising the WT Doc, for which k 1, ,4 = … , the values of ΔG min are observed to be scattered between −38.2 and −35.1 kcal/mol in mode I and between −35.5 and −28.7 kcal/mol in mode II. For the systems comprising the mutated Doc, with = … k 5, ,8, they are seen to be between −36.0 and −31.0 kcal/mol in mode I and between −34.0 and −28.5 kcal/ mol in mode II.
The probability that Doc binds Coh in mode I or II is not determined only by the corresponding values of ΔG min but also by the widths of the two valleys in the Coh-Doc energy landscape. For this reason we consider the binding free energies F I and F II as well as the binding probabilities p I and p II as defined by Eqs (1-3) with the condition that p I + p II = 1. The binding free energies F I and F II for the WT sequence are given by Eqs (1 and 2) with k 1, ,4 = … . By analogy, the binding free energies F I and F II for the two-site mutant are given by Eqs (1 and 2) with = … k 5, ,8. Figure 4A shows F I and F II as a function of the energy cut-off E c for the WT case. We observe that F I and F II saturate above a threshold of about −25 kcal/mol, and we take the saturation values as estimates of the binding energies (from now on F I and F II denote the saturation values).
For . Thus mode I is still more preferred than mode II, but the equilibrium shifts towards mode II. It should be noted, however, that FoldX does not strictly endorse mode II for the mutated sequence that is evidenced by PDB:2CCL.
The MM/PBSA calculations yield the following results: G 36 3 2 9 bind Δ =− . ± . kcal/mol for the binding of the WT Doc to the Coh in mode I, i.e., for the Coh-Doc system with index k = 1, and Δ =− . ± . G 34 5 2 9 bind kcal/ mol for the binding of the mutated Doc the Coh in mode II, i.e., for the Coh-Doc system with index k = 8. These binding energies are comparable within the statistical error. They are also consistent with the binding free energies obtained from the FoldX calculations.
Docs with the tails. Figure 5 illustrates the FoldX-derived free-energy landscapes for systems C II D II (k = 10; panels A, B and C) and C D II II ⁎ (k = 16; panels D, E and F). Overall, they resemble the energy landscapes obtained for systems C I D I and ⁎ C d I I , in which the Docs lack the terminal tails (compare with Fig. 3). The noticeable differences are slight shifts in the minimum-energy angles ϕ. In the case of system C II D II , ΔG min favors binding mode II by 2 kcal/mol. In the case of system ⁎ C D II II , however, ΔG min for mode I is smaller by 0.5 kcal/mol than ΔG min for mode II. The values of ΔG min corresponding to modes I and II for k between 9 and 20 are listed in Table 2.
The binding free energies for the WT system (k 9, ,14 = … ) are F I = −29.2 kcal/mol and F II = −30.2 kcal/mol, as can be seen in Fig. 4C. Thus mode II has a somewhat lower free energy than mode I: F I -F II , is only of order of 0.5 kcal/mol. This implies that p p / 04 I II ≈ . , which suggests that the WT full-length Doc should bind its partner in modes I and II with comparable probabilities. This result rationalizes the fact that no crystal structure of the full-length Coh-Doc WT complex has ever been solved. When the sequence is truncated, mode I is preferred, and that is why it was crystallized as PDB:1OHZ. It should be noted that none of our predicted models indicates any presence of flexible or disordered parts in the tails. Such a flexibility would provide the usual mechanism of not having a well defined structure.
For the system with the double-site mutation k ( 15, ,20) = … , we get F I = −29.6 kcal/mol and F II = −32.4 kcal/ mol, as can be seen in Fig. 4D We also use the MM/PBSA method to compute the binding free energy between the Coh and the Doc with the tails. For the system in which the Coh binds in mode I to the WT Doc, we obtain Δ =− . ± . G 39 4 7 2 bind kcal/  Table 1 for index k = 1, and for system ⁎ C d I I (panels D-F) corresponding to index k = 5 in Table 1 mol. For the system in which the Coh binds in mode II to the mutated Doc, the MM/PBSA method yields Δ =− . ± . G 43 6 6 0 bind kcal/mol. The difference in the binding free energies is only about 4 kcal/mol and, thus, is comparable to the statistical error on ΔG bind . Therefore, one can not state with confidence which of the two systems is bound more tightly. However, the difference between F II for the full-length Coh-Doc* complex and F I for the full-length Coh-Doc complex, as obtained from the FoldX calculations, is about 3 kcal/mol, which compares well with the results obtained in the framework of the MM/PBSA method.
For the WT Coh-Doc complex in the binding mode I and II, the MM/PBSA method yields G 39 4 7 2 bind Δ =− . ± . kcal/mol and Δ =− . ± . G 45 5 7 3 bind kcal/mol, respectively. The difference in these two energies is smaller than the statistical error of the MM/PBSA calculations, which supports the hypothesis of dual binding. We also note that truncation of the terminal tails from the Doc leads to an increase inΔG bind from −39.4 ± 7.2 kcal/mol to −36.3 ± 2.9 kcal/mol for the WT system in mode I, and from −43.6 ± 6.0 kcal/mol to −34.5 ± 2.9 kcal/mol for the mutated system in mode II. These results indicate that the presence of the tails in the Doc enhances the Coh-Doc binding affinity.
The effects of heating. We have used FoldX to perform analogous free-energy calculations for T = 308 K, which is 10 K higher than the room temperature considered so far. The results of these calculations are shown in Fig. S5 in SI. For the Coh in complex with the Doc without the tails, we obtain the following free-energy values: F I = −30.8 kcal/mol and F II = −26.4 kcal/mol for the WT system (Fig. S5A) and F I = −28.6 kcal/mol and F II = −26 kcal/mol for the mutated system (Fig. S5B). Therefore, mode I is seen to be dominating in both cases. For the Coh in complex with the Doc containing the tails, we obtain the following results: F I = −20.8 kcal/mol and F II = −21.9 kcal/mol for the WT system (Fig. S5C) and F I = −22.8 kcal/mol and F II = −23.7 kcal/mol for the mutated one (Fig. S5D). These results show that mode II is somewhat more favorable than mode I but the free-energy difference between the two modes is only of the order of 1 kcal/mol. We thus conclude that the dual binding persists at T = 308 K.
The results of coarse-grained simulations shown in Fig. S6 indicate that the WT Doc with the tails is thermally more stable than without the tails. For instance, the characteristic time of thermal unfolding, t u , at a given temperature, is longer for the Doc with the tails than for the Doc without the tails. We observe this dependence both for the WT Doc and for the Doc with the two-site mutation. The corresponding simulation data are shown in the upper and lower panels of Fig. S6. In addition, the characteristic temperature at which the average fraction of native contacts, Q, takes the value Q 1 2 = is higher for the Doc with the tails than for the Doc without the tails (data not shown). However, another characteristic temperature, T 0 , at which P 0 crosses 1 2 does not distinguish between the systems outside of the error bars (data not shown). Nevertheless, these observations, taken together, are consistent with recent fluorescence experiments on the ScaA dockerin from R. flavefaciens indicating that interactions between the N-and C-terminal tails in Doc have a significant influence on thermal stability 43 .
The differences in the thermal stability can be explained, to some extent, in terms of the contact map shown in Fig. S3. The Doc in PDB:1OHZ has 137 native contacts, according to the overlap criterion that underlies the coarse-grained simulations. By building the terminal tails into the dockerin structure, the number of native contacts is increased by 20 to 23, depending on the procedure used to generate the full-length structure (20 contacts for iTaser with the 1OHZ template, 23 contacts for iTaser with the 4DH2 template, and 22 for Swiss modeller with the 4DH2 template). Similarly, the mutated Doc in PDB:2CCL has 134 native contacts and including the tails adds between 29 and 30 new contacts (30 contacts for iTaser with the 2CCL template, 29 contacts for iTaser with the 4DH2 template, and 30 for Swiss modeller with the 4DH2 template). In both cases, the increased number of the native contacts results in the enhanced thermal stability. These new contacts are not related to the presence of the Ca 2+ ions.
Stretching of the Coh-Doc complex with tails. We follow the procedure and notation as described in ref. 40 except that now we consider Docs with the tails. We illustrate our findings only for C I D I (k = 9) and ⁎ C D II II (k = 16). Examples of the force-displacement (F-d) curves are shown in Fig. 6, the left and right panels respectively.
The figures also show displacements at which particular types of contacts are breaking. For instance, β β − 1 2 ,9 indicates contacts between 1 β -2 β and β 1 -9 β in Coh, whereas α ′ − ′ N 1,2 means contacts between the N-terminal region on Doc and the first two helices in Doc. The former contacts break first in ⁎ C D II II and the latter in C I D I . The interesting observation is that for both binding modes, there are short and long trajectories. For C I D I , about 66% of the trajectories (100 trajectories were considered) are of the long type and 34% of the short type. The sequence of the unravelling events in the short trajectory is like in the short trajectory for PDB:1OHZ. In the long trajectory, it is similar to the sequence in the dominant trajectory for PDB:1OHZ (the middle panel in Fig. 4 in ref. 40 ).
For C D II II ⁎ most of the trajectories are short. Only 3% were long, but for the corresponding system without the tails, we have not observed any long trajectories. Both kinds of the trajectories start with unravelling of β β − 1 2 ,9 , Figure 5. Analogous to Fig. 3 but for system C II D II (panels A-C) corresponding to index k = 10 in Table 2 whereas for PDB:2CCL, i.e. without the tails, they do with unravelling of α ′ − ′ N 1 (as shown in ref. 40 ). Whenever a trajectory for the system with the tails has a corresponding (long or short) trajectory for the system without the tails, the force peaks are found to be identical within the thermal noise of of 0.1 ε/Å. However the locations of the peaks and of the points of dissociation are often shifted. For instance, long trajectories in C I D I result in dissociation around 215 Å but in C I d I -around 180 Å.

Conclusions
Our computational studies show that the full-length WT Coh-Doc complex exhibits dual binding at room temperature, and the ability to bind in the two modes persists at temperatures elevated by 10 K. At the same time, our MD simulations indicate that each mode of binding leads to two kinds of dissociation pathways in AFM-like stretching trajectories. In our opinion, experimental tests of dual binding require other kinds of spectroscopy experiments. In addition, it would be interesting to find other examples of protein complexes with dual binding. Our MD simulations show also that the full-length Doc is thermally more stable than the truncated Doc missing the terminal tails. This result is fully consistent with recent fluorescence experiments 43 . Taken together, our computational studies provide a detailed analysis of the Coh-Doc energy landscape and of the role of the short, terminal segments in the Doc module.