Ligand-induced conformational transitions in the activation mechanism of SHP2 tyrosine phosphatase

SHP2 is a protein tyrosine phosphatase with a key role in multiple signaling pathways, including the RAS-MAPK cascade. Germline mutations in PTPN11, the gene encoding SHP2, occur in 50% of individuals affected by Noonan syndrome, whereas somatic mutations in this gene cause more than 30% of cases of juvenile myelomonocytic leukemia (JMML), and are variably found in other pediatric hematologic malignancies and tumors. Inhibition of the wild-type protein has been recently demonstrated as a novel effective therapeutic strategy for many forms of cancer. Under basal conditions, wild-type SHP2 is inactive because its N-terminal Src homology 2 (N-SH2) domain blocks the active site of the protein tyrosine phosphatase (PTP) domain. Previous work established that binding of a phosphopeptide ligand to the N-SH2 domain causes SHP2 activation by favoring dissociation of the N-SH2 and PTP domains, however the conformational transitions of N-SH2 controlling ligand affinity and PTP dissociation are not well understood. Based on molecular simulations and free-energy calculations, we revealed a 20Å-spanning allosteric network restraining the N-SH2 domain to two distinct states, a SHP2-activating and a stabilizing state. We find that ligand binding is necessary but not sufficient for SHP2 activation, as only ligands selecting for the activating conformation of N-SH2, depending on ligand sequence and binding mode, are predicted to be effective activators. The proposed model of SHP2 activation is further validated by rationalizing modified basal activity and responsiveness to ligand stimulation of N-SH2 variations at residue Tyr42. This study provides mechanistic and energetic insight into SHP2 activation and may open novel routes for SHP2 regulation.


INTRODUCTION
Reversible tyrosine phosphorylation is a post-translational modification reciprocally controlled by protein-tyrosine kinases (PTKs) and phosphatases (PTPs), which plays a key role in regulating cell proliferation, differentiation, migration, and apoptosis as well as cellcell communication. Abnormal control of tyrosyl phosphorylation can result in various human diseases, including cancer. 1,2 Among PTPs, the SH2 domain-containing phosphatase SHP2 is a regulator of signaling downstream of several cell surface receptors, functioning as positive or negative modulator in multiple signaling pathways. 3 Notably, SHP2 is required for full activation of the RAS/MAPK signaling cascade, and dominantly acting mutations of PTPN11, the gene encoding SHP2, cause developmental disorders (i.e., Noonan syndrome 4,5,6,7 and Noonan syndrome with multiple lentigines 6,8 ). Somatic mutations in PTPN11 contribute to childhood malignancies, among which juvenile myelomonocytic leukemia (JMML) represents the archetypal disorder resulting from RAS signaling upregulation. 5,9,10,11,12 Recent studies suggested SHP2 inhibition as a promising strategy for treating a large class of receptor tyrosine kinase-driven cancers 13 and for combating resistance to targeted anticancer therapies. 14,15,16 In addition, SHP2 plays a role in Helicobacter pylori-induced gastric cancer mediated by activation by the bacterial protein CagA, 17 and SHP2 is responsible for the suppression of T-cell activation by programmed cell death-1 (PD-1), a receptor hijacked by tumor cells for evading the immune response. 18 All these recent discoveries indicate SHP2 as an attractive target in future anti-cancer therapies. 19,20,21 The structure of SHP2 includes two tandemly-arranged Src homology 2 domains, called N-SH2 and C-SH2, followed by the catalytic PTP domain, and a C-terminal tail with a still poorly characterized function. The SH2 domains are recognition elements that allow SHP2 to bind to signaling partners containing a phosphorylated tyrosine (pY). 22 Crystal structures have revealed an allosteric regulation of SHP2 activity. 23 Namely, under basal conditions, SHP2 is in an autoinhibited state where the blocking loop of the N-SH2 domain occludes the catalytic site of the PTP domain. Association of SHP2 to its binding partners via the SH2 domains favors the release of the autoinhibitory N-SH2-PTP interactions, rendering the catalytic site accessible to substrates. 23,24,25 Whereas this mode of regulation is widely accepted, the molecular mechanisms underlying SHP2 activation, binding partner recognition, and allostery of the N-SH2 domain are not well understood.
Atomic structures are available for the autoinhibited, closed state of SHP2, 23,26 and for the isolated N-SH2 domain, either in binding with a phosphopeptide or in its apo form. 24 In addition, a recent structure of the "open" state (obtained for the basally active, leukemiaassociated E76K mutant) revealed an alternative relative arrangement of N-SH2 and PTP domains, with the active site within the PTP domain exposed to the solvent. 25 According to these structures, N-SH2 undergoes a conformational transition between the inactive and active states, leading to a loss of complementarity between the N-SH2 blocking loop and the PTP active site. Previous studies hypothesized that the loop connecting -helix B and the strand G of N-SH2 plays an important role in activation. 27 Two alternative hypotheses have been put forward for the molecular events leading to functional activation of SHP2. First, a "conformational selection" mechanism was proposed based on the observations that (i) the crystal structure of the autoinhibited protein shows a completely inaccessible active site, and (ii) even under basal conditions in the absence of SH2-binding phosphopeptide ligands, SHP2 exhibits some activity, indicating a transient population of open, active conformations. According to this hypothesis, phosphopeptide binding to the SH2 domains simply stabilizes the fraction of active SHP2, only after dissociation of N-SH2 from the PTP domain has already taken place. 7,17,28 Second, according to an "induced fit" model, peptide binding might take place in the inactive state of SHP2, triggering the concomitant opening of the protein. 29 However, since the N-SH2 binding site is closed in the inactive state, phosphopeptide binding to the inactive state would require a conformational transition of the N-SH2 domain with respect to the crystal structure.
A better understanding of SHP2 activation and N-SH2 allostery is the key to clarify how SHP2 function is regulated and how pathogenic mutations cause protein dysfunction.
Guided by extensive molecular dynamics (MD) simulations, we present an atomistic model of SHP2 activation based on a concerted conformational rearrangement of the N-SH2 domain upon phosphopeptide binding. According the model, N-SH2 predominantly adopts two distinct conformations, denoted α-and β-state, where only the α-state is activating, whereas the β-state stabilizes the N-SH2-PTP interface. Phosphopeptides may bind to the inactive state, subsequently promoting the release of N-SH2 domain from active site; however, the mere phosphopeptide binding is not sufficient for SHP2 activation, as only sequences selecting for the α-state are effective activators. Moreover, the model rationalizes the effect of certain pathogenetic mutations in terms of altered equilibria between the activating and inhibiting N-SH2 conformations.

RESULTS AND DISCUSSION
Designing an atomistic model of the activation mechanism of SHP2 mediated by the ligand-induced conformational changes requires understanding of the conformational states of the N-SH2 domain. The N-SH2 domain has the two-fold role of blocking the active site of the catalytic domain and recognizing the phosphorylated sequence, the latter triggering the structural rearrangements that precede SHP2 opening and activation.
Review of N-SH2 structure and phosphopeptide-N-SH2 interactions. Figure 1 presents two representative structures of a phosphopeptide bound to the N-SH2 domain: a crystal structure 24 of the N-SH2 domain in complex with the SPGEpYVNIEFGS peptide (panel A), and a conformation extracted from the MD simulation with the SLNpYIDLDLVK peptide (panel B). Like other SH2 domains, N-SH2 consists of a central antiparallel -sheet, composed of three -strands denoted B, C and D, flanked by two helices denoted A and B. The peptide binds in an extended conformation to the cleft that is perpendicular to the plane of the -sheet. The phosphotyrosine binding site is delimited by (i) the A helix, (ii) the BC loop, hereafter referred as pY loop, connecting two strands of the central -sheet, respectively B and C, and (iii) the side chains on the adjacent face of the -sheet. 24 The key residues involved in the interaction with phosphotyrosine are Arg 32 , Thr 42 , and Lys 55 , lining the central -sheet, and residues Ser 34 , Lys 35 and Ser 36 , belonging to the pY loop. 24 Residues Ser 34 , Thr 42 and Ser 36 form hydrogen bonds with the phosphate group, whereas Arg 32 and Lys 55 form salt bridges. Lys 35 contributes to stabilizing the phosphate group in the binding site by electrostatic interactions. Regarding the peptide sequence, the specificity of peptide-N-SH2 interactions is determined by the peptide residues following immediately after the phosphotyrosine, and interacting with the residues of the N-SH2 domain, placed just on the other side of the central -sheet. 24 Generally, the residues at position +1 and +3 relative to the phosphotyrosine are hydrophobic side chains, pointing towards the N-SH2 domain, while residues at position +2 and +4 are solvent-exposed. Residues at position +2 and +4 interact with the residues belonging to the BG loop, whose sequence represents the principal structural determinant of specificity for the N-SH2 domain.
The residue at position +5 is generally inserted into the so-called +5 site, a pocket formed together by the BG and the EF loops. However, also residues further upstream from the phosphotyrosine may interact with N-SH2; for instance, the lysine at positions +7 in the example of Figure 1B favorably interacts with the acidic side chains of residues Asp 64 and Glu 69 , belonging to the EF loop.
Phosphopeptides retain the native binding mode throughout the simulations.
MD simulations were initially performed on the isolated N-SH2 domain in solution, complexed with a set of phosphopeptides. We considered up to 12 peptides, differing either in length or in sequence and chosen for their high experimental binding affinity or chosen to test the influence of sequence variability ( Figure S1). During the simulations, all peptides remained in the binding cleft over the entire 1 μs of simulation, in a conformation very similar to the initial binding mode observed in experimental structures. In Figure S1, the RMS deviation of the most representative MD conformation from the corresponding reference crystal structure is reported for each Cα atom of the phosphopeptide. The leastsquares fit was performed considering only the domain backbone. Evidently, for all structures the deviation from the experimental binding mode remained below 2-3 Å for positions -1 to +6 relative to the phosphotyrosine, with the exception of GDKQVEpYLDLDLD, where the deviation is below 4 Å. Only the peptide termini exhibit larger RMSD values, indicating weaker structural determinants for binding. Overall, these findings demonstrate that, during the whole simulation, the peptides are tightly bound to the N-SH2 domain.
The N-SH2 domain adopts at least two distinct conformations.
Whereas the 12 different phosphopeptides maintain their usual binding mode during the simulations, the N-SH2 domain can undergo several conformational transitions. Starting from the crystal structure, in which the pY loop is tightly wrapped around the phosphotyrosine (see Fig. 1A), an opening of the pY loop and the consequent rearrangement of the N-SH2 domain into another distinct structure was observed in several simulations. To characterize the conformations adopted by the N-SH2 domain, principal component analysis (PCA) was applied to the cumulative trajectory of all 12 simulations. Hence, the PCA eigenvectors do not necessarily represent the conformational transitions undergone by the N-SH2 domain during an individual simulation; instead, they describe the overall conformational space accessible to the N-SH2 domain.
PCA suggested that the N-SH2 domain adopts two main conformational states, hereafter called  and . These two states were resolved by the first PCA vector, which was representative of almost 40% of the overall domain fluctuations. The correlated structural rearrangements detected by the first PCA vector are visualized in Figure 2A by means of the extreme projections onto the vector (see also Fig. 3A), and the related structural rearrangements are quantified in Fig. 2E-G. Namely, the  state shown in Fig. 2A (transparent) is characterized by (i) a closed pY loop (Lys 35 Cα-Thr 42 Cα distance ~9 Å), (ii) an increased distance between the ends of two -strands, C and D, leading to breaking of three inter-strand hydrogen bonds and spreading of the central -sheet into a Y-shaped structure (Gly 39 C-Asn 58 N distance ~12 Å), and (iii) the closed +5 site with a narrow, less accessible cleft (Tyr 66 Cα-Leu 88 Cα distance ~7 Å; see also Figure 3A, left). The  state shown in Fig. 2A Figure 3A, right).
A correlation analysis showed that the conformation adopted by the pY loop is strictly coupled to the spread of the central -sheet ( Figure 2B, correlation R = -0.82). Structurally, this coupling is imposed by a competition between (a) inter-strand hydrogen bonds (one between Gly 39 -Asn 58  The correlations between the spread of the central -sheet and pY loop (Fig. 2C, R = -0.73) as well as the between the pY loop and the opening/closure +5 site (Fig. 2D, R = 0.64) are slightly weaker yet clearly detectable. The latter correlation represents a potential mechanism for modulating the conformational rearrangement of the N-SH2 domain, through the recognition of a particular peptide sequence at the +5 site, as analyzed in the following in detail.
The conformational changes of pY loop and of the +5 site are highly concerted.
Our simulations suggest a coupling between the pY loop and the +5 site, revealing an allosteric network for controlling the state of the pY loop through binding of a specific amino acid sequences at the +5 site (Fig. 2D). To analyze the coupling during simulations with different peptides, we split the PCA vector into two subvectors that describe the motion either of the pY loop or of the +5 site. In principle, projections on these two subvectors could characterize four conformational states given by the two possible states of the pY loop times two possible states of the +5 site (four shaded areas in each panels of Figure  As reported in Figure 3B/C, a correlation between the pY site and the +5 site is observed in all simulations, where most peptides strictly select for either the  or for the  state ( Fig.  3B/C, blue/green areas). In other words, when the pY site is closed, the +5 site is mostly closed ( state); when the pY site is open, the +5 site is always open ( state). Consequently, a conformational change at the pY site is accompanied by concerted conformational change at the +5 site, spanning ~20 Å across the N-SH2 domain. These results support the presence of an allosteric mechanism that couples two binding sites with different physicochemical properties: the pY site, characterized by a high affinity for phosphotyrosine but lacking specificity, and the +5 site, whose role is to induce specificity in the binding of different amino acid sequences.

The phosphopeptide binding mode and sequence play important roles in selecting the N-SH2 domain conformation.
The equilibrium MD simulations showed that each peptide may induce the N-SH2 domain to populate mostly one among the two conformations, and the simulations revealed a correlation between the pY and +5 sites. However, it is unlikely that each individual simulation may explore all possible binding modes of a phosphopeptide within 1 s of simulation time. Indeed, the interconversion between binding modes would first involve a weakening of peptide-N-SH2 interactions, before the required side chain and backbone reorientations may occur. Consequently, the equilibrium MD simulations are insufficient for unambiguously deciding whether a specific peptide sequence selects for the  or the  state.
However, the simulations did reveal how specific binding modes may lock N-SH2 in the  or  state. Figure 3C compares the simulation of the SLNpYIDLDLVK peptide with simulations of three peptide analogues with modifications at positions +6, +7 and +8, thereby adopting different polarities at the capped carboxy terminus. These subtle differences in the sequence affected the orientation of the residue in position +5. In case of SLNpYIDLDLVK and SLNpYIDLDLVND, the leucine side chain at position +5 was exposed to the solvent, thereby allowing closure of the +5 site and consequently the population of the  state. In contrast, in case of SLNpYIDLDLV and SLNpYIDLDLVKD, the leucine side chain points towards the binding cleft, keeping the +5 site open and the N-SH2 domain in the  state.
To investigate in more detail the role of the residue at the +5 site in imposing the N-SH2 conformation, we carried out a large set of additional free energy calculations. To this end, we computed the change of preference of the ligand/N-SH2 complex for the α or the  state upon introducing a mutation in the ligand, according the thermodynamic cycle reported in Figure S2. Figure S3 presents the computed ΔΔG values for the ligands SLNpYIDLDL+5VK and SPGEpYVNIEF+5GS for all possible mutations (except proline) at position +5 and, as a control, at position +6. Positive ΔΔG indicates an augmented preference for the α state, and a negative ΔΔG an augmented preference for the  state. Evidently, mutations at the +5 position modulate the α-versus- preference by up to 14 kJ/mol, whereas mutations at the +6 position have a much smaller effect. Generally, bulky hydrophobic aminoacids at +5 such as Phe, Leu, Ile seem to stabilize the  state ( Figure S4), whereas substitutions with bulky polar amino acids such as Asp, Glu, Arg, Asn, or Gln seem to favor the α state ( Figure S5), although exceptions exist. These trends in stabilizing α versus , as derived form the free energy calculations, are well confirmed by additional free simulations ( Figure S4 and S5).
Taken together, these data further corroborate that the N-SH2 domain may adopt two alternative conformations, constituted by a coupling between (i) the pY loop, which binds the phosphotyrosine, and (ii) the +5 site, which recognizes the peptide sequence downstream of the phosphotyrosine. However, the preference for one of the two alternative conformations,  or , depends not only on the particular peptide sequence but also on the binding mode adopted by the ligand. In particular, both the polarity and the spatial orientation of peptide residue at the +5 site emerge as key determinants for the N-SH2 conformation.
Free-energy calculations confirm that the ligand residue at the +5 site is a key determinant for the preference N-SH2 for the  or  state.
To overcome sampling problems of unbiased MD simulations and to rationalize the preference for the  or  states in energetic terms, we computed the free energy profile of the α-β conformational transition. Here, we used the distance between the central -strands as the reaction coordinate because (i) this distance is strongly correlated with the α-β transition (see Figs. 2B, C, E) (ii) pulling along a center-of-mass distance may be less prone to sampling problems as compared to pulling along a collective PCA vector.
The conformational equilibrium between the  and  states is determined by a balance of different energetic contributions: (i) the intermolecular interactions between the ligand and the N-SH2 domain, and (ii) the intramolecular interactions between the residues of N-SH2, as rationalized for three typical binding modes:  In absence of a ligand ( Figure 4A), the  state represents the free-energy minimum, presumably favored by the formation of inter-strand H-bonds between the closed central sheet. In turn, increased free energy is required for spreading the -strands into a Y-shaped structure, so that adopting the  state is unfavorable.
 In presence of a truncated ligand, such as SPGEpYVNI ( Figure 4B), which contains a phosphotyrosine but lacks the residues flanking the +5 site, the α state is the free-energy minimum. In this case, the pY loop-phosphate interactions prevail over inter-strand Hbonds of the central -sheets, leading to closure of the pY loop and opening of the -sheets.
 With the ligand SPGEpYVNIEFGS, the  state is the free-energy minimum ( Figure  4C). Here, a bulky phenylalanine at position +5 occupies the cleft of the +5 site, thereby keeping the +5 site open. Such conformation stabilized the central -sheet, aiding the interstrand H-bonds of the -sheet to prevail over the pY loop-phosphate H-bonds. Consequently, -sheets close and pY-loop opens.
The free energy profiles suggest that a subtle balance between (i) ligand/N-SH2 interactions and (ii) intradomain interactions within N-SH2 determine the conformation of N-SH2. The ligand residue at the +5 position plays a key role in imposing the α versus the  state. However, because the side chain at ligand position +5 may adopt different orientations relative to the +5 site (towards the cleft or solvent-exposed), not only the type of side chain, but also the binding mode of the overall ligand may influence the N-SH2 conformation.
The N-SH2 domain populates the  state in the autoinhibited state of SHP2.
One of the main features that distinguishes the  state from , namely the spread of the central -strands, can be also revealed by comparing the crystallographic structures of the autoinhibited conformation of SHP2 with those of the N-SH2 domain bound to a phosphopeptide. Table S1 reports the distance between the two central -strands, C and D, for each crystal structure. Evidently, in the autoinhibited conformation of SHP2, the distance between the -strands is small (3.7 to 4.0 Å) indicative of closed -sheets. 23,26 Hence, N-SH2 bound to PTP adopts a conformation similar to the  state observed in simulations. In contrast, when N-SH2 is complexed with a phosphopeptide, this distance is greater (~7-8 Å) indicative of open -sheets, a peculiarity of the  state. 24 Whereas the central -strands of N-SH2 have clearly taken the -state in crystal structures of autoinhibited SHP2 (last paragraph), the structures of the pY loop and of the +5 site have been crystallographically less well defined. The pY loop was typically modeled with a partly open conformation, corresponding to an intermediate conformation between the α-and βstates described here (2SHP, 4DGP). 23,26 However, the electron densities at the pY loop were much lower compared to the density of the nearby β-strands (e.g. 2SHP, 4DGP, 4GWF), 23,26 indicating increased disorder of the pY loop. Further, the partially closed pY might have been stabilized by crystal contacts (e.g. 2SHP, 4DGP) 23,26 and by a buffer phosphate bound to the phosphotyrosine binding site, as modeled by the 5EHP structure. 30 Indeed, during the MD simulations with CHARMM36m or different Amber force fields, the pY loop of autoinhibited SHP2 reproducibly opened up, taking the open pY loop conformation of the βstate. Likewise, the electron density of the BG loop, which forms the pocket of the +5 site together with the EF loop, was poorly defined in several SHP2 crystals and, consequently, only partly modeled against the data by several authors (5EHP, 4GWF). 30 Again, during free simulations of apo SHP2 in solution, we repeatedly observed a complete opening of the cleft at the +5 site, consistent with the  state. Taken together, crystallographic data unambiguously showed that that central -strands of N-SH2 take the -state in the autoinhibited state of SHP2. For the pY and +5 site, crystallographic data is more ambiguous and, hence, simulations presented here were needed to reveal that also the pY and +5 site take the -state in autoinhibited SHP2.
The  state promotes SHP2 opening and activation.
Based on the observation that the  state is found in association with the autoinhibited SHP2, whereas in crystal structures, the isolated, peptide-bound N-SH2 structure resembles the α state, we hypothesized that the  state represents an activating state. According to this model, binding of an activating phosphopeptide would trigger the first steps of SHP2 activation by switching the N-SH2 domain towards the α conformation and, subsequently, weakening the interactions between the blocking loop and the catalytic site.
To test this model, we computed free energy profiles of SHP2 activation, with the N-SH2 domain restrained either to the  or to the  state ( Figure 5). As a measure for SHP2 activation, the distance between the blocking loop and the catalytic PTP loop was chosen as reaction coordinate. Upon pulling the simulation system along this coordinate, the N-SH2 domain moved from its position in the autoinhibited state ( Figure 5A) to a different position on the PTP surface ( Figure 5B), typically by sliding over the PTP surface. The final position of N-SH2 differed among independent simulation runs, indicative of a large accessible conformational space of activated SHP2. Three independent free energy profiles for the activation of SHP2 were computed for each setup with N-SH2 either in the  or in the  state, and using the CHARMM36m force field ( Figure 5C/D). Evidently, with the N-SH2 domain assuming the  state, a consistently high free energy penalty is associated with SHP2 activation, indicating that this conformation favors a stable autoinhibited state ( Figure 5D). In contrast, upon restraining N-SH2 in the  state, two of the three independent simulations documented structural rearrangement towards the active state with only a small barrier; in these cases, the active state represented the free-energy minimum ( Figure 5C).
To exclude that the choice of the force field influences these qualitative differences, we further computed three independent profiles for each, -restrained and -restrained N-SH2, using the AMBER99SB*-ILDNP force field ( Figure S6). Overall, because Amber force fields with the default TIP3P water model may overstabilize protein-protein contacts, 31 we found increased free energies for activation compared to CHARMM36m simulations. However, consistent with the results from the CHARMM36m force field, we obtained greatly increased free energy for activation in the  state as compared to the α state, corroborating that the α state destabilizes the N-SH2/PTP interface, thereby promoting activation.
Amino acid substitutions at codon 42: basal activity and response to ligand stimulation are rationalized with altered conformational selection between the α and β state.
Many pathogenic mutations of SHP2 cluster at the N-SH2/PTP interface, where they may destabilize the N-SH2/PTP interactions, causing constitutive activation of SHP2. 29 However, the molecular effects of certain pathogenic mutations are more subtle and cannot be explained by mere steric effects on N-SH2/PTP complex stability. A typical example is the Noonan Syndrome (NS)-causing Thr42Ala substitution that replaces a conserved threonine in the central -sheet. Because Thr 42 forms an H-bond with the phosphotyrosine in wild-type (wt) N-SH2, one might expect that Thr 42 contributes to the stability of the N-SH2/phosphopeptide complex. However the substitution with alanine leads to an increase in phosphopeptide binding affinity, 6 as documented by the dramatically enhanced catalytic activity of the SHP2 A42 mutant following stimulation with a bisphosphoryl tyrosine-based activation motif (BTAM) peptide. 6,28 Notably, among all possible mutants arising from single base changes in codon 42, all but two SHP2 mutants exhibit a basal catalytic activity comparable to wild-type SHP2. The exceptions are SHP2 P42 that showed a threefold increase, and SHP2 I42 that showed a 50% increase. 6 Upon BTAM peptide stimulation, mutants not associated with NS were either responsive (SHP2 S42 , SHP2 R42 ), with a variable increase of the catalytic activity respect to the basal condition, or unresponsive (SHP2 P42 , SHP2 I42 , SHP2 K42 ). Relative to the wild-type protein, SHP2 S42 exhibits a 50% higher stimulated activity, whereas in SHP2 R42 the stimulated activity was significantly lower. 6 Surprisingly, despite the different range of catalytic activities exhibited by these mutants, the binding affinities were only moderately affected, with the exception of the above-mentioned SHP2 A42 . 6 These modifications of basal activity and responsiveness to peptide stimulation cannot be easily explained by perturbed interaction between residue 42 and the phosphotyrosine. Previous simulations, which showed reduced fluctuations of the pY loop in the Thr42Ala mutant did not reveal a molecular mechanism of the mutation effects. 6 Therefore, we hypothesized that the overall change of the catalytic activity might derive from a combination of two mechanisms: (i) the perturbation of phosphotyrosine interaction upon amino acid substitution, which may affect both the  and the  state, (ii) the shift of the equilibrium between the activating  state and the stabilizing  state, which may arise from the replacement of the conserved threonine at the central -sheet.
We computed the free energy profile for the opening of the central -sheet, as required for the  to α transition, for wild-type N-SH2 and for various mutants of the domain in absence of phosphopeptide ( Figure 6A/B). Remarkably, the Thr42Ile mutant greatly stabilized an open -sheet conformation corresponding to the activating α state, which is compatible with the experimentally observed increased basal activity of SHP2 I42 ( Figure 6B, green solid line). 6 All other simulated mutants had a smaller effect on the conformation of the central sheet, in line with a basal activity of those mutants similar to the wild type. These findings support the hypothesis that mutations may modify the basal activity of SHP2 by shifting the conformational equilibrium of apo N-SH2. In addition, the simulations corroborate our hypothesis that the α state, as stabilized by the Thr42Ile mutant, is indeed an activating state.
Next, we evaluated whether mutations at codon 42 may modulate the binding affinity of a peptide to N-SH2 either in the activation α state or in the stabilizing β state. Because it is difficult to sample a complete ligand binding process or the α-β transition in the presence of a peptide, we used alchemical free-energy calculations and thermodynamic cycles. Accordingly, we here computed the free energies for alchemically transforming Thr 42 to amino acids Ser, Ala, Ile, Lys, or Arg, either in presence (denoted ΔGP-P*[bou]) or absence (ΔGP-P*[unb]) of a bound reference peptide SLNpYIDLDLVK. Using a thermodynamic cycle, the difference between these two ΔG-values, denoted ΔΔG, yields the change in peptide binding affinity upon introducing the mutation. Critically, because these calculations were carried out with N-SH2 restrained either in the α or in the β state, the computed ΔΔG(α) and ΔΔG(β) values show whether a Thr 42 mutation (de)stabilizes ligand binding to the α state, to the β state, or to both.
We found that the Thr42Ile, Thr42Lys, Thr42Arg substitutions greatly affect the affinity of the peptide to the activating α state, as evident from large positive ΔΔG values in a range of tens of kilojoule per mole ( Figure 6C, red bars). However, the same mutations only moderately affect the peptide affinity to the  state, as shown by ΔΔG the range of only a few kilojoules per mole ( Figure 6C, blue bars). Hence, upon introducing these amino acid changes, the peptide may still bind to SHP2, but N-SH2 gets locked in the  state thereby stabilizing the autoinhibited state of SHP2. In turn, the activating α state may be greatly destabilized, thereby rendering the mutations Thr42Ile and Thr42Lys completely unresponsive to ligand stimulation (ΔΔG(α) values of +43kJ/mol or +21kJ/mol, Fig. 6C), in qualitative agreement with the experimental findings. 6 The substitution Thr42Arg with a ΔΔG(α) value of +13kJ/mol only moderately destabilizes the α state, thereby rendering the mutant less responsive than the wild type, qualitatively in line with experimental data. 6 Structurally, the perturbations of the affinity to the α and  states in Thr42Ile, Thr42Lys and Thr42Arg SHP2 mutants may be rationalized considering that any substitution to a larger side chain impedes the formation of the directed hydrogen bonds involved in the closed pY loop. In contrast, in the  state, the substitution with isoleucine does not significantly affect the salt bridges with domain residues Arg 32 and Lys 55 , and the steric hindrance of arginine and lysine is balanced by the new electrostatic interaction with the phosphate group.
The ΔΔG values for the Thr42Ser and Thr42Ala substitutions are qualitatively different compared to the Thr42Ile, Thr42Lys and Thr42Arg (Fig. 6C). For Thr42Ser and Thr42Ala, the respective ΔΔG(α) and ΔΔG(β) are similar, suggesting that ligand binding does not lock N-SH2 in the stabilizing  state but instead ligand binding still allows population of the activating α state. Consequently, Thr42Ser and Thr42Ala are responsive to ligand stimulation, again in qualitative agreement with the experiment. 6 Taken together, both the basal activity of mutants at codon 42 as well as their response to ligand stimulation correlate with modified α-β equilibria revealed by the free-energy calculations, corroboration that α is an activating state.

The increased peptide affinity of Thr42Ala might be associated with a modified α-β equilibrium.
In line with the available experimental observations, Thr42Ile, Thr42Lys and Thr42Arg have only a small effect on the overall ligand affinity, as evident from ΔΔG(β) values close to zero; instead these mutations merely shift the equilibrium of the peptide-bound N-SH2 towards the  state (see above). Rationalizing experimental ligand affinities of Thr42Ala and Thr42Ser is, however, more difficult in the light of the ΔΔG values. Namely, for the Thr42Ala N-SH2, we found slightly positive ΔΔG values for both the α and the β state, indicating a reduced ligand affinity ( Figure 6C). For Thr42Ser, we found slightly negative ΔΔG values, indicating an increased ligand affinity. These ΔΔG values are structurally expected because Thr42Ala leads to a loss of the H-bond of residue 42 with the phosphotyrosine, whereas this H-bond is maintained in Thr42Ser. However, at first sight, these values seem to contradict surface plasmon resonance experiments that reported an increased ligand affinity to Thr42Ala and an unmodified affinity to Thr42Ser. 6 We hypothesize that the structurally unexpected, but experimentally found, increased affinity to Thr42Ala may be rationalized by a modified α-β equilibrium of apo N-SH2. The free energy profiles for the opening of the central -sheet suggest that a substitution of Thr42 with an aliphatic residue, such as alanine or isoleucine, renders the N-SH2 domain more prone than the wild type to spread the -strands into a Y-shaped structure and, hence, to adopt the  state ( Figure 6B, compare cyan/green lines with red line). We expect that such increased population of the  state would facilitate peptide binding because, only in the  state, the pY loop is tightly wrapped around the phosphotyrosine, thereby forming strong polar interactions with the peptide. To validate this model of Thr42Ala affinity in a quantitative manner, calculations of the absolute peptide binding affinity, possibly with N-SH2 restrained to the α or to the β state, as well as structural studies will be required in the future. However, considering that modulations of conformational equilibria are widely adopted by enzymes to control ligand affinity, 32 a modified α-β equilibrium is a plausible mechanism of the increased peptide affinity of Thr42Ala.

CONCLUSIONS
We presented an atomistic mechanism for activation of SHP2, mediated by ligand-induced conformational changes of the N-SH2 domain. Extensive simulations showed that N-SH2 in solution adopts two distinct conformations, denoted α and β. The α and β states exhibit different structures of the central β-sheet and of two ligand interaction sites, which are responsible respectively for the binding of the phosphotyrosine (pY site) and for recognizing peptide residues downstream of the phosphotyrosine (+5 site).
The simulations revealed an allosteric interaction between the pY and +5 sites, mediated by the conformation of the central β-sheet: when the pY site is closed, the +5 site is mostly closed, with the central β-sheet arranging in a Y-shaped conformation; when the pY site is open, the +5 site is always open, with the central β-sheet retaining parallel β-strands.
In absence of a ligand, the N-SH2 domain remains in the β state, both in its isolated form and in the autoinhibited structure of SHP2. Peptide binding may either lock the N-SH2 domain in the β state or trigger the transition to the α state, depending on the sequence and the binding mode of the peptide around the +5 site. Hence, the allosteric network provides a means for controlling the conformations of both the pY site and of the central β-sheet by specific peptide sequences and peptide binding modes at the +5 site.
A ligand-induced transition from the β to the α state has two consequences; (i) the phosphotyrosine is bound more tightly because only in the α state the pY loop is sufficiently flexible in order to fully close onto the phosphate group, thereby forming stronger polar interactions. This mechanism provides a means for controlling ligand affinity via the conformation at the +5 site; (ii) in SHP2, with N-SH2 in the α state with a Y-shaped central β-sheet, interactions between the N-SH2 domain and the catalytic PTP domain are weakened, thereby facilitating the dissociation of N-SH2 and hence the activation of SHP2. According to this model, peptide binding to N-SH2 is necessary but not sufficient for SHP2 activation as only those peptides that select for the α state are activating (Figure 7).
The model of SHP2 activation was further validated by rationalizing the effects of amino acid substitutions of Thr 42 in terms of altered equilibria between the α and the β state. We found that Thr 42 mutations may shift the α-β equilibrium of N-SH2 both in the apo form and in contact with ligands, thereby modulating the basal activity (Thr42Ile) or the responsiveness to ligand stimulation, in qualitative agreement with experimental data. We further hypothesized that an altered α-β equilibrium may underlie the structurally unexpected increased peptide affinity of the Thr42Ala mutant.
Understanding of the activation mechanism of SHP2 also forms the basis to a mechanistic understanding of SHP2 inhibition by small molecules or peptide mimics and, consequently, such understanding may guide the development of new therapeutic options. Specifically, our finding that the selection of particular binding modes leads to the activation, rather than mere binding of the effector, opens novel strategies for SHP2 regulation. As such, we hope that the allosteric activation mechanism of SHP2 proposed here will support ongoing efforts against genetic disorders involving SHP2. 33,34,35,36

MD simulations of the N-SH2 domain complexed with different phosphopeptides
Molecular dynamics simulations were performed on the isolated N-SH2 domain in solution, complexed with a set of phosphopeptides, comprising SLNpYIDLDLVK, IEEpYTEMMPAA, QVEpYLDLDLD, SVLpYTAVQPNE, AALNpYAQLMFP, RLNpYAQLWHR, SPGEpYVNIEFGS, VLpYMQPLNGRK and their analogs. The initial atomic coordinates were derived from crystallographic structures (Table S2). For some peptides (e.g., SPGEpYVNIEFGS) the sequence in the simulation exactly matched the original sequence, and the crystal structure has been used without any modification. For other peptides (e.g., SLNpYIDLDLVK), the crystal structure with the highest peptide sequence similarity was chosen, and the structure was edited by means of Molecular Operative Environment (MOE), 37 followed by a conformational analysis and a local energy minimization with side chain repacking, yielding a reasonable binding pose for all peptides (Table S2). The termini of peptides were capped by acetyl and amide groups. N-SH2 domains in simulation comprised the residues from position 3 to 103. Each protein molecule was put at the center of a dodecahedron box, large enough to contain the domain and at least 0.9 nm of solvent on all sides. The protein was solvated with explicit TIP3P water molecules. 38 All MD simulations were performed with the GROMACS software package, 39 using AMBER99SB force field, 40 augmented with parm99 data set for phosphotyrosine. 41 Long range electrostatic interactions were calculated with the particle-mesh Ewald (PME) approach. 42 A cutoff of 1.2 nm was applied to the direct-space Coulomb and Lennard-Jones interactions. Bond lengths and angles of water molecules were constrained with the SETTLE algorithm, 43 and all other bonds were constrained with LINCS. 44 The pressure was set to 1 bar using the weak-coupling barostat. 45 The temperature was fixed at 300 K using velocity rescaling with a stochastic term. 46 For all systems, the solvent was relaxed by energy minimization followed by 100 ps of MD at 300 K, while restraining protein atomic positions with a harmonic potential. The systems were then minimized without restraints and their temperature brought to 300 K in 10 ns, in a stepwise manner. Finally, productive runs of 1 s were performed.

Principal component analysis (PCA) of the N-SH2 domain dynamics
PCA was performed on a cumulative trajectory of all simulations of the N-SH2 domain, after removing ligand coordinates. The structures were superimposed by a least-squares fit on the backbone considering only the residues with small root mean squared fluctuation, representing the relatively rigid core of the domain. The core was defined by the sequence ranges Phe 7 -Pro 33 , Asp 40 -Arg 47 , Ala 50 -Asn 58 , Asp 61 -Leu 65 , Phe 71 -Tyr 81 , Leu 88 -Glu 90 , Val 95 -Pro 101 . This procedure was chosen to avoid artifacts owing to larger fluctuations of loops and termini. The covariance matrix was built using backbone atoms, excluding the flexible termini and considering only residues from Trp 6 to Pro 101 .

MD simulations of the activation of SHP2, with N-SH2 restrained into two distinct states.
The initial coordinates of SHP2 were taken from its autoinhibited conformation (2SHP). 23 The protein was positioned at the center of a dodecahedral box, large enough to contain the protein and at least 0.9 nm of solvent on all sides, and solvated with ~23,000 explicit water molecules. The structure was minimized and brought to 300 K using the procedure described above. A simulation of 1 s was performed to ensure a fully equilibrated autoinhibited structure in solution. Then an energy minimization was performed. Next, the N-SH2 domain of SHP2 was restrained in one of two distinct states (denoted below as α-and β-state) using a harmonic restraint on a single principal component of C atoms from residue Trp 6 to Pro 101 (k = 1000 kJ/mol nm 2 ). All other degrees of freedom were unrestrained. This procedure allowed good conformational sampling while maintaining the N-SH2 domain in a particular conformational substate. The restraining potential was introduced gradually, while the temperature was increased from 50 to 300 K in 5 ns and then maintained constant for another 5 ns.
Next, N-SH2 was pulled away from PTP using 600 ns pulling simulations 47,48 coupled with simulated tempering 49 at constant volume. For simulated tempering, temperature steps of 5 K between 300 and 400 K were applied. Temperature changes were controlled according to the Metropolis algorithm 50 to obtain canonical ensembles at all temperatures. The initial weights were chosen following Park and Pande. 51 Temperature transitions were attempted every 1 ps, and the weights were updated throughout the whole simulation according to the Wang-Landau adaptive weighting scheme. 52 The pull force was set to 10000 kJ/mol nm 2 , whereas the reference position was changed with a velocity of 2.5x10 -6 nm/ps. Two sets of simulations were performed using AMBER99SB*-ILDNP 53 or CHARMM36m, 54 respectively. First, with the AMBER force field, the reaction coordinate was defined as the distance between the Asp 61 C  (N-SH2 blocking loop) and the Ala 461 C  (catalytic PTP loop). The distance was pulled from 0.55 nm to 1.97 nm. Second, because the Amber force field seemed to overstabilize protein-protein contacts, the calculations were repeated using CHARMM36m, treating hydrogen atoms as virtual sites, allowing a time step of 4 fs. Here, the reaction coordinate was defined as the distance between the backbone centers of mass of residues 60-62 and residues 460-462, and the distance range spanned from 0.73 nm to 2.15 nm. To obtain statistically independent pathways for SHP2 opening, three pulling simulations were carried out for each force field, each restrained either in the α-or in the β-state (12 simulations in total).

Free energy profiles of the SHP2 opening
The free energy profiles describing the opening of SHP2 were obtained using umbrella sampling and weighted histogram analysis method (WHAM). 55,56 For each profile, 72 windows were used. The umbrella force constant was set to 4000 kJ/mol nm 2 , and a sampling of 10 ns was performed for each window. The initial configurations, comprising the velocities, were extracted from the steered MD trajectory, by means of a cluster analysis, performed on the configurations belonging to the ensemble at 300 K. The configurations, collected and grouped in sub-ensemble according the projection along the reaction coordinate, were clustered using the GROMOS method. 57 The cutoff was chosen on the basis of the root mean square deviation distribution in each sub-ensemble, picking the value corresponding to the first relative maximum in abscissa. Umbrella sampling simulations were performed at constant pressure, using the Parrinello-Rahman barostat. 58 PMFs were computed with N-SH2 either restrained in the α-or in the β-state, and using either the AMBER99SB*-ILDNP or the CHARMM36m force field. Three statistically independent PMFs were computed for each force field/N-SH2 state combination (12 PMFs in total), where the initial frames for independent PMF calculations were taken from different, independent pulling simulations. This protocol excludes undesirable hidden correlations among independent PMF estimates.

Simulations of the Thr 42 mutants of the N-SH2 domain and opening of the -strand.
MD simulations were carried out on the N-SH2 domain in its apo form, considering five possible mutants that have been studied experimentally, 6 arising from a single-base change at codon 42. These mutations involved the substitution of the Thr 42 residue with Ala, Ser, Arg, Lys or Ile. Starting from the wild-type structure, mutations were introduced with the Swiss-PdbViewer software package 4.1 59 and the rotamers were chosen based on (i) favorable contacts with the protein and (ii) potential energies. The equilibration procedure was the same as for wild-type N-SH2 domain. Production runs were performed for 500 ns. For pulling simulations, the structures were further equilibrated applying a harmonic potential and then a holonomic constraint at the hydrogen bond between Gly 39 and Asn 58 , forming a strand, while two simulated tempering simulations of 10 ns each were performed. Then, pulling simulations were carried out for gradually opening the -strand over 250 ns. Again, simulated tempering 49 was used to enhance sampling, using a temperature range from 300 to 380 K in steps of 10 K. The pull force was set to 1000 kJ/mol nm 2 , whereas the reference Gly 39 C-Asn 58 N distance between the -strands was changed with a velocity of 5x10 -6 nm/ps. The free energy profiles were obtained using umbrella sampling and WHAM. 55,56 For each profile, 54 windows were used, the umbrella force constant was set to 1000 kJ/mol nm 2 , and a sampling of 500 ns was performed for each window. Statistical errors of the PMFs were estimated by bootstrapping complete histograms, yielding a maximum error of less than 2 kJ/mol. 56

Effects of Thr 42 substitutions on the propensity of N-SH2 for two conformations.
To test whether substitution of Thr 42 modulates the propensity of N-SH2 for the α-and βstates described below, free-energy calculations were carried out along an alchemical reaction coordinate . Thr 42 in wild type (=0 state) was alchemically mutated to Ala, Ser, Arg, Lys and Ile, respectively (=1 state), while restraining N-SH2 in the α-or in the β-states with a restraint on a principal component (force constant 1000 kJ/mol nm 2 ). 60 The AMBER99SB force field was used and the hybrid structure and parameter set were generated using PMX. 61 Each system was initially equilibrated, and slowly transformed from state =0 to state =1 within 10 ns. Then, in order to calculate the free energy differences, equilibrium simulations for every mutation were performed for 10 ns, for the states =0 and =1, separately. From every equilibrium trajectory, 200 fast nonequilibrium runs of 100 ps each were spawned morphing the system from =0 to =1, and other 200 from =1 to =0. A softcore potential was used for Lennard-Jones and Coulomb interactions. 62 The free energy difference was computed using Crooks Fluctuation Theorem. 61,63,64 The statistical error was estimated using bootstrapping. These free-energy calculations were performed for the N-SH2 domain in its apo form or bound to the phosphopeptide SLNpYIDLDLVK respectively.

Effects of peptide mutations on the affinity to the α-and β-states of N-SH2.
To test if certain peptide mutations select for a conformational state of N-SH2 (denoted αor β-state), simulations were carried out on the N-SH2 domain in solution bound to SLNpYIDLDLVK or SPGEpYVNIEFGS, while restraining N-SH2 in the α-or β-state with a harmonic restraint on a principal component (k = 1000 kJ/mol nm 2 ). Every substitution, with the exception of proline, was applied on these two phosphopeptides at the sequence position +5 and +6 relative to phosphotyrosine. Free-energy calculations were performed using the procedure reported above, starting a set of five uncorrelated initial configurations obtained from independent equilibration simulations.  Conformational transition from  (opaque) to  (transparent), representing the two main conformational states adopted by the N-SH2 domain, here visualized as the extreme projections onto the first PCA vector. The residues used to quantify the β-sheet spread, the pY loop opening, and the +5 site opening are highlighted in red, green, and blue, respectively. (B-D) Correlation between the β-sheet spread, pY loop opening, and +5 site opening, as taken from microsecond simulations of N-SH2 bound to 12 different peptides. The distances were defined as described in the main text. (E-G) Correlation between the projection η1 onto the first PCA vector and β-sheet spread, pY loop opening, and +5 site opening (see axis labels).   (B) Open and active structure obtained by pulling simulation coupled with simulated tempering, using the color code of panel A. (C) Free energy profiles of SHP2 opening with the N-SH2 restrained to the  state (left) or to the  state (right). Profiles from three independent simulations using the CHARMM36m force field (grey lines) and their average (green/blue) are presented. The distance between the blocking loop and the catalytic PTP loop was taken as reaction coordinate. Figure 6. (A) Cartoon representation of N-SH2, wild type and Thr 42 mutants (A42, S42, I42, K42, R42), sorted according the size of the sidechain (Ala < Ser < Thr < Ile < Lys < Arg). (B) Free energy profile for the opening of the central -sheet in wild type and T42 mutants, as a function of the N-C distance between the residues Gly 39 and Asn 58 . Statistical errors were <2 kJ/mol (not shown for clarity). (C) G of binding of the T42 mutants respect to the wild-type form, calculated for the N-SH2 domain in  or in  state.