Protein tertiary structure and the myoglobin phase diagram

We develop an effective theory approach to investigate the phase properties of globular proteins. Instead of interactions between individual atoms or localized interaction centers, the approach builds directly on the tertiary structure of a protein. As an example we construct the phase diagram of (apo)myoglobin with temperature (T) and acidity (pH) as the thermodynamical variables. We describe how myoglobin unfolds from the native folded state to a random coil when temperature and acidity increase. We confirm the presence of two molten globule folding intermediates, and we predict an abrupt transition between the two when acidity changes. When temperature further increases we find that the abrupt transition line between the two molten globule states terminates at a tricritical point, where the helical structures fade away. Our results also suggest that the ligand entry and exit is driven by large scale collective motions that destabilize the myoglobin F-helix.


A. The Frenet Equation
The geometry of a class C 3 differentiable curve x(s) in R 3 is governed by the Frenet equation, described widely in elementary courses of differential geometry [1]. We parametrize the curve with its proper length s ∈ [0, L] where L is the length of the curve in R 3 . We introduce the unit length tangent vector the unit length bi-normal vector b = x s × x ss ||x s × x ss || (2) and the unit length normal vector, The three vectors (n, b, t) define the orthonormal, right-handed Frenet frames. We can introduce this framing at every point along the curve, whenever x s × x ss = 0 (4) The Frenet equation transports the frames along the curve as follows, Here is the curvature and τ (s) = (x s × x ss ) · x sss ||x s × x ss || 2 is the torsion. Both κ(s) and τ (s) are extrinsic geometric quantities i.e. they depend only on the shape of the curve in R 3 . Conversely, if we know the curvature and torsion we can construct the curve, by first solving for t(s) from the Frenet equation followed by integration of (1). The solution is unique, modulo a global translation and rotation.

B. Frame rotation
We start with the observation that the normal and bi-normal vectors do not appear in (1). As a consequence a rotation around t(s), has no effect on the curve. For the Frenet equation this rotation gives The form of (9) suggests to combine the two κ dependent contributions into a single complex quantity [2][3][4], We may then introduce the following notations/conventions when representing curvature and torsion in arbitrary frame, Here d is a parameter that we introduce for future convenience; for the Frenet equations we may set d = 2. With these variables, (9) admits the manifestly frame covariant form: with e ± = e 1 ± ie 2 ⇒ e ± → e ±iη e ± and we remind that t is frame invariant.

C. The Kirchhoff elastic rod and its generalizations
The curvature and torsion are the only quantities available to construct energy functions for filamentous, inextensible elastic rods. According to Kirchhoff the energy is [5] where α and β are some parameters. The case β = 0 corresponds to Euler's elastica; in a biological context this defines the worm like chain (WLC) model that is commonly used to describe long and flexible linear (bio)polymers [2] The energy function (13) describes the bending and twisting of a thin rod in the limit of very small curvature and torsion. But this energy function is not capable of describing phenomena such a supercoiling, nor structures such as helix-loop-helix that are common in case of proteins. For this we need to include higher order, non-linear contributions to (13).
To do this systematically, we need a guiding principle: Note that even though framing is a necessary intermediate step to construct the curve from the knowledge of its curvature and torsion, the shape of a curve can not depend on the way how it is framed. Indeed, the Frenet equations can be presented in the frame covariant form (12). Thus, the energy function should similarly admit a frame covariant form, one that is the same independently of the framing when expressed in the frame covariant variables (φ, A) in (11). An example of a frame covariant energy function is [2][3][4], The first two terms have the functional form of the Hamiltonian that appears in the Abelian Higgs model. They remain manifestly intact under a frame rotation (11).
The third term, with parameter a, is the one dimensional Chern-Simons term. It breaks chirality which ensures that the curves are chiral, either right-handed or left-handed depending on the sign of parameter a. Note that under a frame rotation this terms transforms by a derivative; see (11). Thus it remains invariant when there are no end point frame rotations.
The fourth term in (14) is called the Proca mass in the context of the Abelian Higgs model. It is not covariant under a frame rotation but we included it for completeness since it yields the second term in (13), in Frenet frames.
In term of the geometric curvature and torsion, the energy density of (14) translates to We introduce the Hasimoto variable [3,4,6], to combine the curvature and torsion into a single frame invariant complex quantity In terms of (16), we find that (15) includes the following, The energy (15) is a combination of H −2 , H 1 and H 3 , except for its last term, the Proca mass. From the perspective of the NLS hierarchy, the momentum H 2 should also be included so that at the end we have the energy density The standard NLS equation is the paradigm equation that supports solitons [7,8]; depending on the sign of λ the soliton is either dark (λ > 0) or bright (λ < 0). In particular, the torsion independent contribution supports the double well topological soliton: When m 2 is positive and when κ can take both positive and negative values, the equation of motion is solved by The energy function (19) is quadratic in the torsion. Thus we can eliminate τ using its equation of motion, and we obtain the following equation of motion for curvature, where This shares the same large-κ asymptotics, with the potential in (20). With properly chosen parameters, we expect that (23), (24) continue to support topological solitons, but we do not know their explicit profile, in terms of elementary functions.
The curve is the constructed as follows: Once we have the soliton of (23), we evaluate τ (s) from (22). We substitute the ensuing (κ, τ ) profiles in the Frenet equation (5) and solve for t(s). We then integrate (1) to obtain the curve x(s) that corresponds to the soliton. A generic soliton curve looks like a helix-loop-helix motif (more generally a regular secondary structure -a loop -a regular secondary structure), familiar from crystallographic protein structures.

II. POLYGONS AND GENERALIZED KIRCHHOFF ENERGIES
and the unit normal vector The orthonormal triplet (n i , b i , t i ) defines a discrete version of the Frenet frames (1)-(3) at each position r i along the chain.
In lieu of the curvature and torsion, we have their discrete analogues, the bond angles and torsion angles. When we know the vertices we also know the Frenet frames and we can compute these angles: The bond angles are and the torsion angles are Conversely, when the values of the bond and torsion angles are all known, we can use the discrete version of the Frenet equation (5) to compute the frame at position i + i from the frame at position i. Once all the frames have been constructed, the entire string is given by discrete version of (1), In the case of a protein, it is sufficient to take |r i+1 − r i | = 3.8Å; this is the average distance between neighboring Cα atoms. The bond oscillations are very fast, and over time intervals in the scale of microsecond the average values can be used.
In constructing the chain, without any loss of generality we may choose r 0 = 0, make t 0 to point into the direction of the positive z-axis, and let t 1 lie on the y-z plane.

B. frame rotations
The vectors n i and b i do not appear in (31). Thus, as in the case of continuum curves, a discrete chain remains intact under frame rotations of the (n i , b i ) zweibein around t i . This local SO(2) rotation acts on the frames as follows [9]  Here ∆ i is the rotation angle at vertex i and T 3 is one of the SO(3) generators Using these matrices we can write the effect of frame rotation on the bond and torsion angles as follows Since the t i remain intact under (32), the gauge transformation of (θ i , ψ i ) has no effect on the geometry of the discrete string.
A priori, the fundamental range of the bond angle is θ i ∈ [0, π] while for the torsion angle the range is ψ i ∈ [−π, π). Thus we identify (θ i , ψ i ) as the canonical latitude and longitude angles of a two-sphere S 2 . For practical purposes we find it useful to extend the range of θ i into negative values θ i ∈ [−π, π] mod(2π). We compensate for this two-fold covering of S 2 by a Z 2 symmetry which takes the following form: This is a special case of (33), (34), with ∆ k = π for k ≥ i + 1 ∆ k = 0 for k < i + 1

C. Generalized discrete Kirchhoff energy and solitons
The energy function used in the article is obtained by a direct naive discretization of (19), and by replacing curvature and torsion by the discrete bond and torsion angles [2,4,9] . In particular, we use Thus, which is the (θ, ψ) contribution to the energy function in Eqn. (4) of the article.
The conventional discrete NLS equation is known to support solitons [10]. Thus we expect that (36) supports soliton solutions as well: We follow (22) to eliminate the torsion angle, For bond angles we then have We set θ 0 = θ N +1 = 0, and V [θ] is given by (24). To solve this numerically, we use the iterative equation [2,11] where {θ (n) i } i∈N is the n th iteration of an initial configuration {θ (0) i } i∈N and is some sufficiently small but otherwise arbitrary numerical constant. We choose = 0.01, in our simulations. The fixed point of (39) is independent of the value of , and clearly a solution of (38).
Once the fixed point is found, the corresponding torsion angles are obtained from (37).
The frames are then constructed from (30), and the entire chain is constructed using (31).
We do not know of an analytical expression of the soliton solution to the equation (38).
But an excellent approximative solution can be obtained by discretizing the topological soliton (21) [2]: Here (γ 1 , γ 2 , µ 1 , µ 2 , s) are parameters. The µ 1 and µ 2 specify the asymptotic θ i -values of the soliton. Thus, these parameters are entirely determined by the character of the regular, constant bond and torsion angle structures that are adjacent to the soliton. In particular, these parameters are not specific to the soliton per se, but to the adjoining regular structures.
The parameter s defines the location of the soliton along the string. This leaves us with only two loop specific parameter, the γ 1 and γ 2 . These parameters quantify the length of the bond angle profile that describes the soliton. On the regions adjacent to a soliton, we have constant values of (θ i , ψ i ). In the case of a protein, these are the regions that correspond to the standard regular secondary structures.
For example, the standard right-handed α-helix is obtained by setting α − helix : and for the standard β-strand β − strand :  Figure 1 we show the (θ i , π i ) spectrum both for 1ABS, and for the multisoliton we have constructed; the Cα RMS distance between the two is around 0.8Å. In Table I   of computers is insufficient for any kind of serious all-atom folding simulations. In these models the folded configuration is presumed to be known; the individual atomic coordinates of the folded protein chain appear as an input. A simple energy function is then introduced, tailored to ensure that the known folded configuration is a minimum energy ground state; the energy could be as simple as a square well potential which is centered at the native conformation.
Since the positions of all the relevant atoms appear as parameters in these models, they contain more parameters than unknown and thus no predictions can be made. Only a description is possible. From the point of view of a system of equations, these models are over-determined. In any predictive energy function the number of adjustable parameters must remain smaller than the number of independent atomic coordinates.