Spatio-temporal coordination among functional residues in protein

The microscopic basis of communication among the functional sites in bio-macromolecules is a fundamental challenge in uncovering their functions. We study the communication through temporal cross-correlation among the binding sites. We illustrate via Molecular Dynamics simulations the properties of the temporal cross-correlation between the dihedrals of a small protein, ubiquitin which participates in protein degradation in eukaryotes. We show that the dihedral angles of the residues possess non-trivial temporal cross-correlations with asymmetry with respect to exchange of the dihedrals, having peaks at low frequencies with time scales in nano-seconds and an algebraic tail with a universal exponent for large frequencies. We show the existence of path for temporally correlated degrees of freedom among the functional residues. We explain the qualitative features of the cross-correlations through a general mathematical model. The generality of our analysis suggests that temporal cross-correlation functions may provide convenient theoretical framework to understand bio-molecular functions on microscopic basis.

physical mechanisms leading to the non-linear couplings is not understood, primarily due to lack of experimental probes. Moreover, time is introduced in the formalism on an ad-hoc basis 17 .
Alternatively, temporal cross-correlations of fluctuations of two physical quantities A(t) and B(t′ ) at times t and t′ with respect to their mean values, also known as two-point correlations functions 18,19 , are used to describe time scales of correlated stochastic processes 20 . The equal time correlation function, t = t′ , is the statistical Pearson Correlation. Temporal cross-correlation functions can be thought of generalization of Pearson correlation in time domain. Two-point correlation functions do not contain information on non-linear coupling. One major advantage of two-point correlation functions is that they are experimentally accessible by scattering techniques. Interestingly, temporal cross-correlations between fluorescence intensities show asymmetry with inversion in time which has been used to study co-localization of proteins 21 . This observation seems to suggest that temporally causal connections can be extracted from the two-point correlation functions. The power of two-point cross-correlation functions has not been exploited for in-depth understanding of bio-molecular phenomena. Time dependent dihedral cross-correlation functions (TDCF) have been employed for correlation between protein residues only up to a few hundred pico-seconds (ps) 22 , far too low compared to the bio-molecular binding time scales to have functional relevance. We have established in an earlier work 23 from much longer simulations that TDCF can relate large scale changes in a protein upon ligand binding.
With this backdrop we examine here the TDCF by long computer simulations and mathematical modeling to understand functional co-ordination among residues. We show that the TDCF can explain the causal connection between the functional residues of Ub in the time scale of tens of nano-seconds. We explain the qualitative features of TDCF using simple mathematical model. Interesting aspect of our result is that TDCFs contain temporal information which may be useful to understand biological processes which are orders of magnitude slower than atomic motions without invoking the non-linear effects.

Molecular Dynamics and TDCF
We perform 1.05 microseconds (μ s) long all-atom MD simulations 24 (see Methods, Note 1) for Ub with initial input from crystal structure 2 in explicit water. We analyze data using the portion of the simulated trajectory where the root mean squared deviation (RMSD) of the backbone atoms is saturated (excluding initial 50 ns, see Supplementary Fig. S2). We calculate the dihedral angles for backbone (ϕ, ψ) and side-chain (χ 1 ) of the residues of the protein. We plot the dihedrals χ 1 for residue pairs Isoleucine: I13 and Phenylalanine: F45 denoted by χ 1 I13 and χ 1 F45 respectively as functions of time in Fig. 1(a). I13 and F45 have backbone distance (d α−α ), given by that of their C α atoms as large as 1.5 nm. Despite that both χ 1 I13 and χ 1 F45 exhibit correlated behavior: The increase in (c) Convergence of TDCFs for three different t = 500 ns (green), 950 ns (blue) and 1.05 μ s (red); one is coupled to increase in the other till very long time. Similarly, for Histidine: H68 and I44 with d α−α ~ 0.5 nm, the plots of χ 1 H68 of H68 and χ 1 I44 of I44 as functions of time ( Fig. 1(b)), reveal anti-correlated behavior even at long times. Now we proceed to quantify temporal correlations between these time series. We extract the TDCF between dihedral θ of residue i and θ′ of residue j in time interval Δt from equilibrated trajectory (see Supplementary Fig. S3). TDCF is denoted as , for a time interval Δt = t 2 − t 1 (see Methods, Eqs 1 and 2, Note 2). We compute TDCF from MD trajectory for three different sets of maximum time up to t = 500 ns, 950 ns and 1.05 μ s of the simulated trajectory. Since the correlation functions are computed from the dihedral values at two time intervals over trajectory, averaging at larger time interval gets better with longer time trajectory. We show (2) 1 1 ( Fig. 1(c)) and (2) 1 1 ( Fig. 1(d)) for three different cases. Both Fig. 1(c) and (d) show that data for trajectory up to t = 500 ns have differences with respect to larger time trajectories. However, data with trajectory up to t = 950 ns and 1.05 μ s are comparable, indicating saturation in the temporal behavior of the TDCFs.
We further report our analysis based on all the data for the longest trajectory, t = 1.05 μ s. Figure 2(a) and (b) bring out further non-trivial aspect of the TDCFs, exhibited by several dihedral pairs, despite large separation between the residues. We show TDCFs for both forward and reverse direction, obtained by interchanging i and j and θ and θ′ for the longest trajectory.
at Δt = 0. This decays with increasing time interval.
at Δt = 0 following which it decays to zero for large Δt. We also observe in Fig. 2 is different in forward and reverse directions, the decay time scales being different indicating asymmetry in TDCFs.
, the correlation function (for additional cases see Supplementary Fig. S4). For small s, F I13,F45 (χ 1 χ 1 ; s) has a maximum where there is a statistical correlation ( Fig. 2(a)), while F H68,I44 (χ 1 χ 1 ; s) < 0 and having a minimum in case of statistical anti-correlation ( Fig. 2(b)). The asymmetry in F i,j (θθ′ ; s) under interchanges of i and j and θ and θ′ is evident from Fig. 2 in Fig. 2(e) and (f) respectively. We observe that a strong correlation exists between peak values of TDCFs and θθ′ . However, the timescales are not correlated to θθ′ . This is not surprising, for statistical correlation coefficients do not contain temporal information.

Functional relevance
Our analysis yields a detailed map of correlated residue pairs R i and R j , as shown in Fig. 3(a). For a particular residue pair R i and R j we compute θθ ∆ i j for every possible pairs of degrees of freedom (dof), like ϕϕ ϕψ ϕχ ψϕ ψψ ψχ is maximum is considered to determine the direction of correlation in time domain, namely, if any perturbation at R i affects R j at a later time or vice versa. We generate a 76 × 76 matrix by noting θθ for all of the 76 residues of ubiquitin. By applying the condition of directionality we obtain the upper triangular matrix showing detailed TDCF map.
This map can be used to understand correlated path among the residues. Let us consider the terminal residue G76 which binds to AMP during Ub activation in ubiquitination. The dihedral ϕ of G76, ϕ G76 is correlated to R74 by ϕ R74 , ψ R74 and χ 1R74 both in forward and reverse direction. However, among all these correlated dihedrals ϕϕ is the largest which we take as an indication that G76 is downstream correlated to R74 via dihedral ϕ of both the residues. Similarly G76 is downstream correlated to other set of residues, like L73, L67, Glutamine: Q62, Tyrosine: Y59, L56, R54, Aspartate: D52, K48, F45, L43, Q41, Q40, D39, Proline: P38, K33, I30, K27, V26, Glutamate: E24, I23, D21, T12, T7 and V5. Among all the downstream correlated residues to G76 we find that R74 is having the shortest d α−α , which is the mean distance between C α atoms over the entire trajectory. Similarly, the closest downstream correlated residue to R74 is R72. In this way we construct the path of downstream correlated dihedrals of different residues, ϕ ϕ ϕ ψ ϕ , as shown in a snapshot of ubiquitin obtained from simulation ( Fig. 3(b)). Among these temporally correlated residues G76, R74 and R72 belong to the C terminal loop region. The residues V70, L69, H68 and L67 belong to β 5, while V5 belongs to β 1 in β strands of the crystal structure 2 . Crystal structure of ubiquitin activation enzyme-E1 loaded with Ub molecules indicates 4,5 that the hydrophobic surface patch of Ub including L8, I44, V70 and C terminal tail of Ub (G76, R74, R72) interact with the activation enzyme. The temporally correlated path with G76 contains many of the residues, like R74, R72 and V70. Besides, the slowest time scale in this path is that between G76 and R74, around 125 ns. This time scale is comparable to the rotational time scale of the enzyme which is about 90 ns obtained using the Stokes-Einstein 25 equation. Thus the path obtained using TDCF analysis is functionally relevant.
In order to get mechanistic view of long distance correlations, we calculate variance of the distances between residues belonging to the temporally correlated path. We compute var(d α−α ) which represents variance of d α−α . Similarly for backbone-side chain distances, we calculate var(d α−β ), where d α−β denotes distance between C α and C β atoms of the residue pair. For side chain dihedrals, we compute var(d β−β ), the variance of distance between C β atoms of the correlated pairs. We plot θθ | ′ | F ( ) i j max , versus these variances in Fig. 3(c). We observe that the large correlation amplitudes are clustered near smaller values of the variance. This indicates that dihedral dynamical correlations are destroyed by large fluctuations.
We compute transfer entropy from mutual information 26 between maximally correlated pairs of fluctuating degrees of freedom which constitute the functionally relevant path. We assign the directionality of entropy transfer between two residues by the larger magnitude of the transfer entropy for the correlated pairs in forward and reverse directions. For instance, in case of θθ ∆ , ϕ G76 and ϕ R74 are the maximally correlated dof both in forward and reverse directions (see Methods, Note 3). We find that mutual transfer entropy − 0.17 for ϕ R74 to ϕ G76 and that for the reverse direction is 0.55, indicating that the transfer of information takes place from ϕ G76 to (2) 1 1 for similar residues. The symbols have the same meaning in (e) and (f). ϕ R74 similar to experimental observations. We construct in similar way the direction of entropy transfer and time scales of the correlated dof over the path, as given in Supplementary Table S2. It is clear from the table that the directionality of path is not maintained between R74 and R72 where the entropy transfer takes place from R72 to R74. Moreover, the time scale of optimum mutual information is in sub-ns range, orders of magnitude shorter than biologically relevant time scales. Thus the TDCF describes the functionally relevant path in more reliably.

Mathematical Model
We model the qualitative behaviors of the TDCF in terms of equations of motion (see Methods, Eqs 3 and 4, Note 4 and for the details of calculations see Supplementary Note. S1) of two dihedrals θ i (t) and θ j (t) which are coupled to each other with strengths α′ and β′ respectively. Let the characteristic frequencies associated with them be ω i and ω j . They perform motions in a solvent experiencing drags proportional to Γ θ −  i i and Γ θ −  j j respectively 18 . We calculate the Laplace transformed correlation function from the equations of motion, θ θ 〈 〉 s s ( ( ) ( )) i j in frequency, s by averaging over initial conditions on the variables. We find that in s → 0 limit, θ θ θθ being the statistical correlation coefficient. There is thus maximum in the low s limit if the TDCF shows statistical correlation, , while a minimum for statistically anti-correlated TDCF with θθ′ . These are qualitatively similar to low s behaviour of the simulated TDCFs. We get an algebraic tail for large s, where the exponent is universal and independent of the parameters in the model. This universality is revealed by the simulated TDCF, albeit with exponent 1.0. The difference in the exponent may be due to simplicity of the model equations of motion where all effects are neglected except solvent drag and mutual coupling. Moreover, we find that so far as as seen in simulations. Direct probe of dynamical correlation among the dihedrals is difficult due to limitation of probes. However, our analysis suggests an indirect way of probing the dynamical correlations. Our analysis shows that the residues, like R74, R72 and V70 lie in dynamically correlated path with G76 where ubiquitination initiates. We expect these residues to play an important role in the process which can be tested experimentally. R72 is experimentally known to give specificity to the ubiquitin activation enzyme-E1 binding 4,5 . The role of the other residues needs to be looked into.
To conclude we show with long molecular simulations and mathematical modeling that TDCFs explain the causal connection between binding sites in Ub in the biologically relevant temporal regime. More importantly, our studies indicate that non-linearties are not the primary deciding factor for causal connections between functional sites. Although the simulations are illustrated for ubiquitin activation enzyme-E1 binding to Ub, the generality of our mathematical analysis shows that qualitative features of TDCF can be extended to any microscopic degrees of freedom. On a wider perspective, two point cross-correlation functions between relevant microscopic variables may provide a correct description of bio-molecular function and the related kinetics in terms of underlying microscopic dynamics without invoking the nonlinear effects.

Methods
Note 1: Details of MD Simulation. We perform MD simulation using NAMD at 310 K and 1 atm pressure, following standard protocols for NPT ensemble. We use TIP3P water model, periodic boundary condition and CHARMM27 27 force field with 1 femto-second time step. Electronutrality is maintained by adding proper number of mono-valent ions Na + and Cl − . Long ranged electrostatic interaction is included by PME 28 method. Energy minimization was done for first 10,000 steps and simulation was performed for 1.05 μs. Equilibration is ensured by RMSD plot over entire simulation time.