Replica exchange molecular dynamics simulations reveal self-association sites in M-crystallin caused by mutations provide insights of cataract

Crystallins are ubiquitous, however, prevalence is seen in eye lens. Eye lens crystallins are long-lived and structural intactness is required for maintaining lens transparency and protein solubility. Mutations in crystallins often lead to cataract. In this study, we performed mutations at specific sites of M-crystallin, a close homologue of eye lens crystallin and studied by using replica exchange molecular dynamics simulation with generalized Born implicit solvent model. Mutations were made on the Ca2+ binding residues (K34D and S77D) and in the hydrophobic core (W45R) which is known to cause congenital cataract in homologous γD-crystallin. The chosen mutations caused large motion of the N-terminal Greek key, concomitantly broke the interlocking Greek keys interactions and perturbed the compact core resulting in several folded and partially unfolded states. Partially unfolded states exposed large hydrophobic patches that could act as precursors for self-aggregation. Accumulation of such aggregates is the potential cause of cataract in homologous eye lens crystallins.

Replica exchange molecular dynamics simulation. We performed REMD simulations of M-crystallin-WT and its three mutants (M-crystallin-SM, M-crystallin-DM and M-crystallin-TM). REMD is an enhanced simulation technique as it lowers the free energy barrier at higher temperatures by sampling high energy conformations which otherwise might not be accessible at a moderate temperature. In the REMD simulation, simultaneously multiple parallel runs were started at predefined temperatures which were derived from an exponential distribution. The exchange rate were maintained constant across all the replicas by keeping relatively more temperature gap at higher temperatures and lesser gap at lower temperatures. Likewise we considered sixteen replica temperatures spanning from 280 to 340 K which are 281.85, 285. 39 20 . Besides, there are number of other REMD studies which use similar temperature range [30][31][32] . The target temperature for our analysis is 300 K as it closely matches with the physiological temperature. The temperatures range for REMD simulation were derived from an exponential distribution in such a way that the temperature of interest (300 K) lies in the intermediate region so that there will be an equal probability of exchange between the neighboring replicas on either side of the target temperature. The replica exchange probability was 60% for all the REMD simulations. The conformational coordinates were exchanged between the neighboring replicas following the Metropolis algorithm 33 . This algorithm was evaluated at every 2 ps in order to facilitate the exchange of coordinates. When the Metropolis algorithm was satisfied, the exchange attempt was considered as successful and the coordinates between the neighbouring replicas were exchanged. The velocity of each atom was then rescaled to the changed target replica temperature and in this way REMD simulations were carried out.
Simulation details. The REMD simulations were run using AMBER 14 molecular modeling package 34 and AMBER FF99SB force field 35,36 . We also performed classical molecular dynamics (MD) simulations on M-crystallin wild type and its mutants at 339.88 K temperature which is the highest temperature of REMD simulation employing FF14SB force field 37 in TIP3P 38 explicit solvent. This is one of the recent force fields where www.nature.com/scientificreports/ backbone and side chain modifications are introduced. This force field shows overall improvement in ϕ and ψ sampling and secondary structural content 37 . The details of the explicit MD simulation method is given in the supplementary information (SI). We performed at least 120-145 ns of simulation for each of the protein. Generalized Born Onufriev, Bashford and Case implicit solvent model (GB-OBC) with igb = 5 option was used to treat solvent 39 . The advantage of using implicit solvent is that it enhances conformational sampling faster by reducing solvent viscosity 22,24,[40][41][42] . Certain GB implicit solvent model although associated with limitation 23 , combined use of GB implicit solvent model with REMD simulations provides results in good agreement with experimental findings [18][19][20]43 . The GB solvation model includes non-polar contribution of solvation energy implicitly 39 . The total energy of the solvated molecule in GB-OBC model is given as E vac + ΔG solv where E vac is the energy of the molecule in vacuum and ΔG solv is the free energy of transferring the molecule from vacuum to solvent. ΔG solv is composed of electrostatic (ΔG el ) and non-polar part (ΔG surf ). ΔG surf is proportional to total solvent accessible surface area of the molecule with proportionality constant derived from experimental solvation energies of small non-polar molecules. In GB-OBC model the effective radius of the atoms are rescaled in such a way that for deeply buried atoms it is large while for the expose atoms it is small. Thus, it minimizes the biases towards folded structure. All the simulations were done under NVT ensembles 44 . The force-field parameters, topology and coordinates were generated using tleap program of AMBER 34 . The side-chains of the amino acid residues which are polar and charged were adjusted to physiological pH. Energy minimization of the starting structures were carried out for 2000 cycles of which steepest decent energy minimization was performed for the first 1000 cycles, followed by conjugate gradient minimization for next 1000 cycles to minimize any atomic overlap. The possibility of unwanted rotation at high temperature was desisted by generating chirality constraints on the minimized structure. SHAKE algorithm was used to constrain bonds stretching freedom which involves hydrogen atoms 45 .
The non-bonded van der Waals potential cut-off was kept at 16 Å. Langevin thermostat was used to maintain the replica temperature by weak coupling with a collision frequency of 1 ps −1 . Equilibration molecular dynamic run was performed on the system for 200 ps. During this the temperature of each replica was increased gradually from 0 to the corresponding target temperature. Following equilibration, REMD simulation was started for the sixteen systems with 2 fs as integration time step. The exchange attempts between the neighboring replicas were made at every 2 ps. The REMD simulation were performed using Verlet integration algorithm. Multisander program of AMBER molecular modelling package was used to run the simulations 34 . A time step of 2 ps was used to write the coordinate and output files. The trajectory corresponding to each replica temperature was filtered using cpptraj program and backbone Rg and C α RMSD analyses were determined for all (SI Fig. S1) 34 . The trajectory corresponding to the physiological temperature of 300 K was used for the data analysis. The REMD simulations were performed in the high performance computing facility instituted at Tata Institute of Fundamental Research, Hyderabad and the National PARAM Supercomputing Facility (NPSF) of C-DAC.
Convergence. REMD simulations were performed for M-crystallin-WT, M-crystallin-SM, M-crystallin-DM and M-crystallin-TM for a total 0.8 µs. Backbone Rg and C α RMSD were monitored for all trajectories corresponding to each replica temperature which show a good convergence (SI Fig. S1). Additionally, we performed convergence measurements following Sawle and Ghosh 46 where number of clusters and cluster entropy were estimated as a function of time to ensure adequate conformational sampling. The cluster entropy was determined using − ∑P j log (P j ), where P j is the probability of jth observed cluster. The C α RMSD plots of all four simulations at 300 K show a steady value after 70 ns onward till the end (200 ns). Therefore, 70-200 ns stretch from each simulation is considered as equilibrated region for further analysis (SI Fig. S2, left panel). The stretch was then divided into 20 ns overlapping time segments for which number of clusters and the distribution of cluster entropy were determined (SI Fig. S2, right panel). For clustering the structures, GROMOS clustering algorithm 47 with a cut-off of 5 Å was used. This algorithm counts the number of neighbours under a specified cut-off and takes the structure with largest number of neighbours as cluster centre and all its neighbours as the cluster members and eliminates these structures from the pool of structures. The same procedure is repeated till the remaining structures in the pool are assigned to a cluster. The number of clusters as a function of time showed a marginal increase for M-crystallin-TM while for the rest of the simulations, it was almost constant. Further, the distribution of cluster entropy was steady for all the REMD simulations indicating convergence. Hence, the trajectories corresponding to 70-200 ns at 300 K were used for data analysis.
Analyses. For the analysis, we monitored several conformational parameters such as radius of gyration (Rg) of backbone atoms, root mean square deviation (RMSD) of C α atoms with respect to X-ray structure of M-crystallin (3HZ2), root mean square fluctuation of C α atoms (RMSF), network cluster layout, native contact analysis, surface charge potential determined using APBS tool of PyMol, principal component analysis, absolute entropy calculated for all the simulations following Schlitter's method 48 which make use the covariance matrix of C α atomic fluctuations, hydrophobic solvent accessible surface area per residue (rhSASA), ionic interaction and hydrogen bond following the criteria defined by Visual Molecular Dynamics (VMD) 49 . Native contacts were determined by an in-house program where C α -C α atoms in X-ray structure of M-crystallin separated at least by 3 residues (non-local contact) come closer than 6.5 Å were considered as a reference native contacts and how many of these contacts were present in a given conformation represents the number of native contacts. The percentage of native contacts of a given cluster was computed by taking average over all cluster member's percentage. Hydrophobic solvent accessible surface area per residue was calculated following rolling ball algorithm 50 where solvent of 1.4 Å radius was used to probe the expose surface of a hydrophobic residue which was determined using VMD. The continuous hydrophobic patch area was calculated by selecting the connected hydrophobic residues in van der Waals representation and determining their hydrophobic solvent accessible surface area together following the same rolling ball algorithm. The ionic interactions were calculated by deter- www.nature.com/scientificreports/ mining all the possible distances between the positive (NE, NH1, NH2) and negative charged (OD1 and OD2) sidechain atoms of the interacting residues and an ionic interaction was considered to be present if any one of the six distances was less than 4 Å at a given time. The standard errors in ionic and hydrogen bonding interactions were determined by dividing the equilibrium segments into intervals of 10 ns from which mean and standard deviation were computed. Trajectories visualization were done using VMD software. Data analyses were done using VMD tcl script, Matlab 51 and graphs were prepared using Xmgrace 52 .
Network cluster layout. The network cluster layout is used to visualize the major conformational clusters and the connectivity between them. The ensemble of conformations resulted from the equilibrated region of each simulation were grouped into distinct clusters based on their conformational similarity using network clustering method. The layout was generated from the pairwise C α RMSD matrix of the ensemble of conformations. A pairwise cut-off value was used to establish connections between the conformations. A network cluster layout was built from nodes and links. Each node represents a conformation and the link connecting the nodes were built based on the chosen pairwise cut-off. Choosing the right cut-off is crucial because taking a larger cut-off can result a single cluster while taking a smaller cut-off can lead to fragmented clusters. So, we chose a cut-off close to the mean of pairwise C α RMSD distribution as described by Ahlstrom et al. 53 . Profuse-force-directed algorithm 54 implemented in Cytoscape version 3.5.1 55 was used for its construction and visual representation. Nodes were assigned with similar repulsive charges while links were treated as springs of attractive forces having same spring constants. The entire system was considered as pseudo-physical system and was subjected to minimization during which more links of attractive forces can overcome the repulsive charges of nodes resulting in a well disperse network layout. In the resulted network the highly connected nodes form a single cluster while sparsely connected nodes stay apart. Average linkage algorithm implemented in MATLAB was used to generate the centroid structure of a cluster 56 .

Results and discussion
Choice of specific mutations in M-crystallin. Crystallins are well known for their unusual stability.
Stability comes from specific arrangement of the double Greek key motifs. βγ-crystallins from microbial origin undergo further stabilization upon binding to Ca 2+ while eye lens crystallins are stable without Ca 2+ . Microbial βγ-crystallins possess canonical motifs N/D-N/D-X 1 -X 2 -S/T-S to coordinate with Ca 2+ . In M-crystallin the canonical motifs are 32-NDKISS-37 and 75-DNSISS-80 in the N-and C-terminal Greek keys respectively (Fig. 1A). The X 1 position of canonical motifs (K34 and S77) provide direct coordination sites for Ca 2+ through its mainchain carbonyl. In order to understand the effect of mutations at X 1 position in M-crystallin, we carried out K34D and S77D mutations (Fig. 1B). Several βγ-crystallins possess buried Trp and/or Tyr residues in the core surrounded by aromatic and hydrophobic residues as in M-crystallin (SI Fig. S3) which are conserved in many crystallins. Studies reveal that the Trp corner and Tyr corner are important for the βγ-crystallin domain stability 9 . In human γC and γD crystallins, there are four Trp residues, W42 and W68 in the N-terminal domain surrounding Y62 corner and W131 and W157 in the C-terminal domain surrounding Y151 corner are buried in the core (Fig. 1C). Further, they are surrounded by several other aromatic and hydrophobic residues in the core to provide enormous stability to the eye lens crystallins 14,57 . W42R mutation in γD crystallin causes congenital cataract 27 . Therefore, we mutate the conserved W45R in M-crystallin (W45 here corresponds W42 in γD Crystallin) to unravel the effect of mutation on βγ-crystallin stability and aggregation propensity (Fig. 1B).

Conformational flexibility in the mutants of M-crystallin.
Root mean square fluctuation measures the average fluctuations of each C α atoms. Thus, it indicates flexibility of a given residue and identifies the highly flexible regions. The average RMSF for M-crystallin-WT is 1.4 Å for most of the residues except for the N-and C-terminal ends, loop1 between β 2 and β 4 strands and loop2 between α 1 and β 8 (Fig. 1D). The RMSF values are mapped onto the starting structure of M-crystallin-WT and represented as sausage plot which clearly indicates that loop1 and loop2 are highly flexible (Fig. 1E). In M-crystallin-SM the average RMSF value is 1.9 Å which is marginally higher than that of M-crystallin-WT. Sausage plot identifies similar regions as highly flexible region in M-crytallin-SM and follows almost similar trend as M-crytallin-WT. The only difference observed in Greek key (small dashed) and C-terminal Greek key (bigger dashed). The signature sequence of N-terminal Greek key is shown in blue lined boxes and C-terminal Greek key in red lined boxes. Amino acid code in blue: negatively charged residues; magenta: positively charged residues; red: hydrophobic residues; green: polar residues. The starting structures M-crystallin-WT and mutants used in the REMD simulations and explicit solvent MD simulations (B). Mutations are made in the hydrophobic core (W45R) and in the Ca 2+ binding sites K34D and S77D are shown in purple stick representation. X-ray structure of γD-crystallin from human eye-lens (1HK0) displaying N-and C-terminal domains, inter-domain interface, and arrows pointing the inactive Ca 2+ binding sites, two Tyr corners and four Trp residues (C). RMSF of C α atoms in all four simulations (D). The relative flexibility are mapped onto the starting structure of each simulation and represented as sausage plots with B-factor colour codes (E www.nature.com/scientificreports/ M-crytallin-SM compared to M-crystallin-WT is that the N-terminal end is highly flexible (Fig. 1D,E). M-crystallin-DM shows relatively higher fluctuations compared to M-crystallin-WT and M-crystallin-SM with an average value of ~ 2.4 Å. For loop1 and loop2 the fluctuation is even higher (Fig. 1D,E). The M-crystallin-TM shows significantly large fluctuation with an average value of ~ 4.2 Å across the polypeptide chain except for the β 1 , β 5 , β 6 and β 8 strands. Here, the N-terminal end and the loop1 show dramatic increase in RMSF (Fig. 1D,E). Thus, overall higher flexibility is observed for loop1 and loop2 of all simulations which is remarkably large in M-crystallin-TM. RMSD, Rg and RMSF are also determined for these proteins from explicit solvent MD simulations performed at 339.88 K temperature (SI Fig. S4). Explicit solvent simulations are known to provide accurate results when performed for a longer time of microsecond to millisecond. In the present study simulations are carried out for about 120-145 ns which is considerably less in time scale required for equilibration in explicit solvent simulations. In spite of short simulations, we obtained variation in RMSF especially in the loop region which is comparable to GB-OBC implicit solvent model where M-crystallin-TM shows higher flexibility, followed by M-crystallin-SM, M-crystallin-DM and M-crystallin-WT which shows the lowest (SI Fig. S4). Similar findings are observed from eigenvector analysis which also shows highest magnitude of concerted motion (11.1) for M-crystallin-TM while lowest (2.8) for M-crystallin-WT as described in SI (Fig. S5). The absolute entropy calculated using Schlitter method 48  Conformational clusters in mutants and wild type M-crystallin. The C α RMSD distribution determined for all the simulations illustrate that single narrow distribution for wild type while wider distributions are observed for mutants (details in SI, Fig. S6). Free energy landscape shows more numbers of distinctive minima for mutants than the M-crystallin-WT (details in SI, Fig. S7). In the network analysis, we observed single cluster accounting 100% in M-crystallin-WT. The centroid structure of the cluster is similar to the X-ray structure of M-crystallin suggesting sampling of βγ-crystallin fold (Fig. 2). In M-crystallin-SM where W45 residue is mutated to R45, the number of clusters observed are four. Out of which cluster 1 is the major cluster with 92% of population, cluster 2 is about 6% while the rest two are minor clusters having 1.4 and 0.6% of population (Fig. 2). Cluster 1 has βγ-crystallin-like fold whereas cluster 2 has deformed loops but has βγ-crystallin-like topology. The two minor clusters sample partially unfolded conformations where the central compact core is lost however βγ-crystallin-like topology is retained. In M-crystallin-DM, the network layout also shows four conformational clusters. Of which, cluster 1 is the major cluster having 93% while cluster 2 is only 4% and the rest two minor clusters are 2% and 1% of population, respectively (Fig. 2). Native contact analysis segregates conformational clusters into folded and unfolded states. Native contact percentage is used to identify whether a given cluster is folded like a βγ-crystallin or unfolded. If the percentage of native contact of a given cluster is more than 60%, it is considered as folded or else it is unfolded and are labelled as F and U respectively as shown Fig. 3A. The single cluster of M-crystallin-WT possesses 76.5 ± 4.3% of native contacts suggesting all the conformations are well folded and have βγ-crystallinlike fold (Fig. 3A) (Fig. 3A). Native contact analysis reveals that the observed unfolded states possess ~ 40-50% of native contacts and thus are not completely unfolded like random coil but are partially unfolded states.
Structural insights into the folded and partially unfolded states. The X-ray structure of M-crystallin has βγ-crystallin fold which has two Greek key motifs formed by two β-sheets. In this arrangement, three (β 5 , β 6 and β 8 ) of the four β-strands of C-terminal Greek key form one β-sheet and the remaining one, β 7 -strand pairs with three β-strands (β 1 , β 2 and β 4 ) of N-terminal Greek key forming the second β-sheet which are tightly linked by several interactions (Fig. 3B and SI Fig. S3). β 3 -strand is not found in M-crystallin instead a long loop1 is present in the N-terminal Greek key. There is a second loop2 in the C-terminal Greek key which extend and include unwound α-helix and β 7 -strand in the unfolded states (Fig. 3B). There are several interlocking Greek key interactions holding loop1 and β 8 -strand, and loop1 and loop2 similarly holding β 7 and β 4 -strands together (Fig. 3B). Besides, it has several intra-β-sheet interactions in addition to the strong hydrophobic core holding the two opposite β-sheets (SI Fig. S3 and Fig. 3B). The interlocking Greek key interactions play key role for the    www.nature.com/scientificreports/ stability of βγ-crystallin fold as unlocking these interactions can unwind the interlocked Greek keys. Although, there are several unfolded states observed in the mutants broadly they can be divided into two main types. In one type, the two interlocked Greek keys are widely separated by losing all interlocking interactions. It has only β 1 , β 4 -strands and loop1 from N-terminal Greek key and β 5 , β 6 -strands and loop2 from C-terminal Greek key (Fig. 3B, bottom left). In the second type of unfolded state, the central hydrophobic compact core is lost however it is held by loop1 and β 8 -strand, β 4   www.nature.com/scientificreports/ determined. However, we did not find any correlation between unfolded cluster and reduced number of hydrogen bond because certain clusters which were unfolded also have more number of hydrogen bonds than the folded one. Such situation can arise only when there are more number of non-native hydrogen bonds than the native one. Further, we determined hydrogen bonds and ionic interactions contributed by the residues undergone mutations (Table S1). Out of seven native interactions only three are observed in M-crystallin-SM, one is observed in M-crystallin-DM and none of the native interactions are observed in M-crystallin-TM suggesting non-native interactions also contribute in unfolding and drive towards partially unfolded states.
Partially unfolded states display large hydrophobic solvent accessible surface area. We calculated hydrophobic solvent accessible surface area per residue in five known single domain βγ-crystallin proteins (M-crystallin, Hahellin, Flavollin, Clostrillin and Ci-crystallin) from protein data bank (Fig. 4). In spite of sequence difference, the common patterns emerged from these βγ-crystallins are that the completely buried hydrophobic residues are located at the junction of the two Greek key motifs (Fig. 4). Secondly, the hydrophobic residues located in the N-terminal Greek key motif are relatively more exposed than the C-terminal Greek key motif. These findings clearly indicates that there is a characteristic pattern of hydrophobic burial in βγ-crystallin fold which is probably necessary for the βγ-crystallin domain stability. Alteration in the characteristic pattern of rhSASA might cause unfolding of the βγ-crystallin domain. Subsequently, the rhSASA is estimated for folded (F) and unfolded (U) conformational states of a given simulation which are grouped so based on their native contact percentage (Fig. 5A). In M-crystallin-SM, the rhSASA of the residues V68, A71, I73 and F81 are relatively more exposed in unfolded state compared to its folded counterpart (Fig. 5A). These residues are located in the loop2 region of C-terminal Greek key. Similarly, in M-crystallin-DM hydrophobic exposure is observed in the similar region corresponding to the residues L60, V68, A71, I73, I78 and F81. In addition, there are two more regions corresponding to the residues I35, I38, V40 and F47 located at the junction of N-and C-terminal Greek keys and residues A3 and V5 of N-terminal Greek key are showing higher values of rhSASA in the partially unfolded states compared to that of folded state. Similarly, in M-crystallin-TM similar regions as observed in M-crystallin-DM are exposed (Fig. 5). These residues showing higher values of rhSASA in the unfolded state are mapped onto the centroid structure of folded and unfolded clusters as shown in Fig. 5B. There is a remarkable difference in the hydrophobic surface exposure of a folded and unfolded conformations. The folded conformations from wild type, mutants and X-ray structure show, a significant burial of hydrophobic residues while on the partially unfolded states there is conspicuous exposure of hydrophobic residues which form continuous hydrophobic patches of the size ~ 500 to 700 Å 2 (Fig. 5C). These sites can act as attachment sites for self-aggregation into higher molecular weight aggregates and thus can lead to cataract in homologous eye-lens crystallins. In a recent study, γS-crystallin-G18V mutation is reported to cause cataract 58 . In that study, ANS fluorescence assay and NMR method are used to probe hydrophobic exposure of γS-crystallin-G18V mutant which shows non-specific ANS binding indicating significant hydrophobic exposure.

Causes of conformational change in the mutants of M-crystallin.
We observed a modest change in RMSF of M-crystallin-SM compared to M-crystallin-WT while M-crystallin-DM shows relatively higher fluctuations and M-crystallin-TM shows remarkably higher fluctuations ( Fig. 1D and SI Fig. S4). In terms of hydrophobic surface area exposure M-crystallin-DM and M-crystallin-TM show relatively larger area than M-crystallin-SM. In M-crystallin-SM unfolding is in the central core region due to W45R mutation which makes the structure floppy. However, it does not unfold completely separating the two Greek keys apart probably due to formation of an alternate weak hydrophobic cluster which causes least exposure of hydrophobic residues. The formation hydrophobic cluster is evident from Fig. 5B and C where centroid structure of cluster 1 as well as cluster 3 of M-crystallin-SM display lesser hydrophobic surface area while majority of the hydrophobic residues are still buried in the loose core. A close look at the centroid structures of M-crystallin-SM shows βγ-crystallin-like topology for all its clusters (Fig. 2).  . The light blue shaded boxes indicate the residues with higher hydrophobic exposure in unfolded states. These residues are mapped on the centroid structure (cluster number indicated below the centroid structure) in blue van der Waals spheres, the rest hydrophobic residues in cyan and residues other than the hydrophobic residues are in green color (B). Two color codes are used to indicate hydrophobes (purple) and nonhydrophobes (green) to indicate the hydrophobic patches (C www.nature.com/scientificreports/ hydrophobic surface area exposure of the mutants. The destabilizing effect of K34D and S77D mutation can easily be verified experimentally in the future. In Hahellin presence of Asp residues at the similar position in the Ca 2+ binding sites destabilize the entire protein 20 . That is why in the absence of Ca 2+ Hahellin adopts intrinsically disordered states 20 . Further, addition of two positively charged Arg residues (S41R and S80R) at both the Ca 2+ binding sites reverts the intrinsically disordered states of Hahellin to a well-folded ordered state 16 .

M-crystallin wild type is well folded even in the absence of Ca 2+ .
In M-crystallin both Ca 2+ bound form (holoform) and unbound form (apoform) are well folded as revealed by NMR study 8 . However, they are not equally stable. Thermodynamic study of apoform and holoform of M-crystallin reveals that holoform is more stable than apoform. The unfolding temperature midpoint (T m ) for apoform is 55 °C while for holoform it is 71 °C. The enthalpy changes at the respective T m are 69 and 102.2 kcal mol −1 for apo and holoform respectively 8 . In our REMD simulation, M-crystallin-WT sampled homogenous single conformational form indicating stability of apoform as seen in the NMR study 8 . Similar stability of apoform is also observed in tunicate βγ-crystallin 62 , protein S 63 where removal of Ca 2+ has no effect in CD signature and fluorescence spectra, indicating it is a common phenomenon in the βγ-crystallins which possess high intrinsic stability. The eye-len γB-crystallin too shows a weak affinity for Ca 2+ but are highly stable 10 . The Ca 2+ binding affinity studied from isothermal titration calorimetry (ITC) reveals that it is low for M-crystallin with a dissociation constant (K d ) of about 80 µM, similar to the affinity seen in the eye-lens βγ-crystallins (70-100 µM) 8 . These Ca 2+ binding affinities are much lower than the Ca 2+ binding affinity seen in microbial βγ-crystallins which are in the range of 10-30 µM 10 . Since, M-crystallin, protein S, eye lens crystallins, etc. possess weaker affinity for Ca 2+ absence of Ca 2+ does not make a large difference in the structural stability of βγ-crystallin fold.
Aromatic residues play a key role in βγ-crystallin stability. Trp and Tyr corners play important role in providing stability to the βγ-crystallin fold. Generally, the corner residues Trp and/or Tyr are located at the beginning or end of an anti-parallel β-strand which is hydrogen bonded in case of Tyr and stabilizes the β-barrel structure of the Greek key motif 14,64 . In M-crystallin, W45 forms the Trp corner and Y65 forms the Tyr corner, and are located in the hydrophobic core surrounded by several aromatic (6-Y, 2-W and 4-F) and hydrophobic residues (SI Fig. S3) 8 . In the homologous γC (14-Y, 4-W and 4-F) and γD-crystallins (14-Y, 4-W and 6-F) there are a total of 22 and 24 aromatic residues respectively. The Tyr corners in the N-and C-terminal domains of γC and γD-crystallins are Y63 and Y151 (Fig. 1C) 14 . The four Trps residues in human γC and γD-crystallins (positions 42, 68, 131 and 157) are buried within the hydrophobic core in the interior of the protein, leading to a compact of structure. W42R mutation in human γD-crystallin is associated with congenital cataracts 27 . X-ray structure of W42R reveals that minimal changes are observed in the structure w.r.t. wild type γD-crystallin. The W42R mutant of γD-crystallin produces a small population of partially unfolded states which are in chemical exchange with folded state as revealed by NMR 27,28 . In line with this, the conserved W45R mutation in M-crystallin generates folded and partially unfolded states in our simulation. It is presumed to be driven by formation of altered hydrophobic core triggered by W45R mutation. In another multiscale atomistic simulation of γD-crystallin having W42R mutation, adopted a distinct conformation in solution where Greek key domains are more or less intact while a large perturbation is observed in the inter-domain interface region. This mutant shows a large hydrophobic exposure and can acts as primary sites for aggregation 65 . In another simulation study, W42 is mutated to polar residues such as Lys and Arg which denatures the Greek key domain as solvent enter into the core and causes hydrophobic exposure of the core residues 66 . In all these studies W42 mutation leading to perturbation of hydrophobic core is accompanied by hydrophobic exposure which is a common event irrespective of different independent studies.
Indirect relation between highly flexible region of M-crystallin and the region of highest hydrophobic exposure. The mutants of M-crystallin show high flexibility for the N-terminal motif as evident from RMSF and eigenvector analysis. However, this highly flexible region does not correlate directly with the highest hydrophobic exposure. Most of the hydrophobic residues at the N-terminal Greek key do not show any difference in rhSASA between folded and unfolded conformations (Fig. 5) while significant difference is seen for the C-terminal Greek key as most of the hydrophobic residues in this region are exposed in partially unfolded states while they are buried in the folded state. The flexibility in the N-terminal is enhanced by breakage of several interactions connecting loop1 and β 8 -strand and loop1 and loop2 which in turn cause exposure of the C-terminal Greek key hydrophobic residues and can drive intermolecular aggregation. Recently, NMR study and dynamics light scattering (DLS) data reveal formation of oligomer in M-crystallin as a result of lowering the temperature which is accompanied by large dynamics in the Ca 2+ binding sites 15 . There is an intricate relation between unfolding and aggregation 5,67,68 . V75D, W42R mutations in γD-crystallin 28,65,66,69 , G75V mutation in γS-crystallin 69 , S228P in βB1-crystallin 70 are associated with partial unfolding leading to hydrophobic exposure. Our study affirm that partially unfolded species are formed due to mutations which can serve as precursors for aggregation and can lead to cataract.

Conclusion
In our REMD simulations of M-crystallin wild type and its three mutants, the dynamic nature of loop1 of N-terminal Greek key is clearly evident. In M-crystallin-WT no dramatic conformational change is observed because of optimal packing of hydrophobic residues in the core, and presence of several interlocking Greek key interactions. On the other hand in all mutants of M-crystallin, formation of heterogeneous mixture of folded and partially unfolded states are observed. The partially unfolded states are mainly of two types. In one type, the two Greek key motifs are apart without any significant interactions between the two Greek keys while in second type, www.nature.com/scientificreports/ the central hydrophobic core is largely perturbed and the compact core is lost but still held by several interactions from both the Greek keys especially between loop1 and loop2 and between loop1 and β 8 -strand. In either types of unfolded states, the buried hydrophobic residues are exposed giving rise to large hydrophobic patches. These hydrophobic patches are mainly contributed by the hydrophobic residues located at the junction of both Greek keys and at the C-terminal Greek key motif. These hydrophobic patches can provide attachment sites for association into higher order molecular aggregates. Thus, hydrophobic patches on the partially unfolded crystallin are the main determinant of βγ-crystallin aggregation. Interactions restricting the loop dynamics and promoting the strength of hydrophobic core can reduce the hydrophobic exposure and thus can prevent the aggregation of crystallin which is the prime cause of cataract.