Atomistic modeling of liquid-liquid phase equilibrium explains dependence of critical temperature on γ-crystallin sequence

Liquid-liquid phase separation of protein solutions has regained heightened attention for its biological importance and pathogenic relevance. Coarse-grained models are limited when explaining residue-level effects on phase equilibrium. Here we report phase diagrams for γ-crystallins using atomistic modeling. The calculations were made possible by combining our FMAP method for computing chemical potentials and Brownian dynamics simulations for configurational sampling of dense protein solutions, yielding the binodal and critic temperature (Tc). We obtain a higher Tc for a known high-Tc γ-crystallin, γF, than for a low-Tc paralog, γB. The difference in Tc is corroborated by a gap in second virial coefficient. Decomposition of inter-protein interactions reveals one amino-acid substitution between γB and γF, from Ser to Trp at position 130, as the major contributor to the difference in Tc. This type of analysis enables us to link phase equilibrium to amino-acid sequence and to design mutations for altering phase equilibrium.

potential was calculated by the FMAP method 13 .Finally, the most realistic but also most expensive approach is one where both protein and solvent molecules are represented at the AA level and allowed to be flexible.We have reported the binodal of a tetrapeptide using this approach 12 .AA explicit-solvent simulations have also been used to calculate potentials of mean between pairs of amino acids or for the dissociation of a short peptide from a tetrameric assembly to provide mechanistic insight into the effects of salt and bsheet clusters on phase separation [14][15][16] .
Our goal here is to model the sequence dependence of the binodals for the phase separation of structured proteins.For quantitative predictions of sequence dependence, an AA representation of the proteins is required.However, it is also clear that a full AA S3 model with molecular flexibility is not feasible.We thus chose the next best level, i.e., an AA representation but with a rigid treatment of the proteins along with implicit solvent.

Interaction energy function.
The next issue to be determined is the energy function.
Energy functions for implicit-solvent models are generally written as the sum of molecular mechanics (MM) terms and solvation ("solv") terms: The MM terms typically include van der Waals interactions and Coulomb interactions, whereas the solvation terms include nonpolar solvation and polar solvation.For example, in the MM/GBSA model 17 , the polar solvation term is determined by the generalized Born model, whereas the nonpolar solvation term is the product of the solvent accessible surface area and a constant (equivalent to the protein-solvent interfacial tension).
Even with the protein treated as rigid and solvent treated implicitly, calculating the chemical potential for determining the binodal for an AA representation of the protein is still very expensive.This calculation was made feasible by the FMAPµ method 13 .The key idea behind this method is to express interaction energy terms as correlation functions, which are then evaluated via fast Fourier transform (FFT).The FFT route places certain restrictions on the forms of the interaction energy terms.Partly on account of this restriction, we combined the van der Waals interaction and nonpolar solvation terms and assumed a scaled Lennard-Jones potential for their combined contribution (Eqs [4] and [6]).Likewise, we combined the Coulomb interaction term and polar solvation into a scaled Debye-Hückel potential (Eqs [4] and [7]).
In comparison, the more sophisticated Rosetta energy function 18 comprises weighted terms including a Lennard-Jones potential, a desolvation term, an electrostatic term using a distance-dependent dielectric constant, an orientation-dependent hydrogen bonding term, and statistical potentials for backbone and sidechain torsional preferences.

Factors that affect FMAP-based calculation results. The components in our
procedure for calculating the binodal, in particular the FMAPµ method, have been carefully validated to ensure numerical accuracy 19 .Still, other factors can affect the calculation results and their accuracy.One is the parameterization of the interaction energy function and another is the configurational sampling used for chemical potential calculation.
Regarding parameterization, two notable parameters are the scaling constants,  & and  ' , for the nonpolar attraction and electrostatic terms (see Eq [4]).In a 2019 paper 20 , we carried out this parameterization using experimental results for the second virial coefficients of proteins.Here we have also explored the effects of these two scaling constants (Supplementary Fig. 7).The default values for  & and  ' were set to 0.16 and 1.6, respectively.The resulting second virial coefficient for gB agrees well with experimental data (Fig. 4A inset).However, there are also other combinations of  & and  ' , by either giving a higher weight to the nonpolar attraction but with a commensurate reduction in the weight for the electrostatic term or vice versa, that produce a similar agreement (Supplementary Fig. 7a).For all these combinations of  & and  ' , a higher Tc is predicted for gF than for gB (Supplementary Fig. 7b).
As for configurational sampling, we have taken great pains to achieve convergence.
As stated in the main text, each  () calculation is based on 1.57464 ´ 10 14 interaction energies.So within the limitations of the model (i.e., rigid proteins and implicit solvent), the calculated results are accurate and the conclusions drawn are robust.

Possible treatment of protein flexibility in binodal calculations on all-atom models.
We are not aware of any binodal calculations based on AA flexible protein models with implicit solvent (hatched box in Supplementary Fig. 2).Full treatment of flexibility seems extremely challenging at this time, but it might be feasible to include a limited extent of flexibility.One possibility is to allow the protein molecules to sample a S5 preselected library of structures.This possibility was explored in Monte Carlo simulations of many-protein systems 21 .As an initial step toward sampling multiple conformations in chemical potential calculations for dense protein solutions, we tested conformational sampling in calculations of the second virial coefficient.The preselected library consisted of all the high-resolution (2.3 Å or better) X-ray structures in the Protein Data Bank (PDB) for a given g-crystallin.For example, there are four such structures available for rat gE.As shown in Fig. 7, the  ' / "* values calculated from the individual PDB structures range from -1.8 to -2.4.This range demonstrates that input structures have nonnegligible effects on the calculated B2 results.A simple way to remove the uncertainty in B2 arising from the choice of input structure is to assume that the protein samples all these structures with equal probability.The resulting B2, by averaging values from the 4 ´ 4 = 16 binary combinations of structures, is -2.0 ± 0.2 (Supplementary Fig. 11).
The structures in the preselected library could have both global changes and local changes (e.g., sidechain rotation), but the number of structures is limited by the amount of computational time that can be afforded.In the foregoing B2 calculations, the computational time is proportional to the number of structure pairs included.For proteins like g-crystallins that do not show any sign of large conformational changes (e.g., as evidenced by close similarities between multiple X-ray structures), it may be more important to sample sidechain rotations that are induced by intermolecular interactions in a dense protein solution.We thus tested for the effects of sidechain repacking using RosettaDock 22 , on the 1,000 lowest interaction energy poses from FMAPB2 calculations.
We obtained the Rosetta interaction energies either after a simple energy minimization (Supplementary Fig. 10a) or after subsequent sidechain repacking (Supplementary Fig. 10b).As expected, the gF poses have lower FMAP interaction energies than the gB poses, as illustrated by an energy gap of 0.20 kcal/mol at CDF = 100 (Supplementary Fig. 10c).
Interestingly, the gap widens to 0.59 kcal/mol according to the Rosetta energy function,

S6
and widens even further to 0.92 kcal/mol upon sidechain repacking (Supplementary Fig. 10d).These results demonstrate two points.First, a more sophisticated energy function like Rosetta may more accurately capture the gap in Tc between gF and gB.Second, sidechain repacking could contribute to the gap in Tc.