Insights into the mechanism of the PIK3CA E545K activating mutation using MD simulations

Phosphoinositide 3-kinase alpha (PI3Kα) is involved in fundamental cellular processes including cell proliferation and differentiation and is frequently mutated in human malignancies. One of the most common mutations is E545K, which results in an amino acid substitution of opposite charge. It has been recently proposed that in this oncogenic charge-reversal mutation, the interactions between the protein catalytic and regulatory subunits are abrogated, resulting in loss of regulation and constitutive PI3Kα activity, which can lead to oncogenesis. To assess the mechanism of the PI3Kα E545K activating mutation, extensive Molecular Dynamics simulations were performed to examine conformational changes differing between the wild type (WT) and mutant proteins as they occur in microsecond simulations. In the E545K mutant PI3Kα, we observe a spontaneous detachment of the nSH2 PI3Kα domain (regulatory subunit, p85α) from the helical domain (catalytic subunit, p110α) causing significant loss of communication between the regulatory and catalytic subunits. We examine the allosteric network of the two proteins and show that a cluster of residues around the mutation is important for delivering communication signals between the catalytic and regulatory subunits. Our results demonstrate the dynamical and structural effects induced by the p110α E545K mutation in atomic level detail and indicate a possible mechanism for the loss of regulation that E545K confers on PI3Kα.


Supporting Information
Insights into the mechanism of the PIK3CA E545K activating mutation using MD simulations

Specific contacts between E545 -nSH2 and role of the charge reversal
The specific contacts of E545 with the nSH2 domain are:  Table S4).
The charge reversal plays the following role in the sequence of events present in the detachment: 1) Residues 545K and K379 interact through side chains with van der Waals and hydrogen bond interactions. 2) Gradually they come apart (due to charge repelling) and 545K forms a new salt bridge with D549, while K379 forms a salt bridge with D421 (nSH2).

3) Then 545K interacts with D421 and K379 with K548. 4) Then a hydrogen bond is formed between K379 with the backbone O of L380.
When the side chain hydrogen bond of K379 and 545K is lost, the backbone hydrogen bond between 545K and L380 is gradually lost. The role of the charge reversal is therefore that the neighboring hydrogen bonds are weakened in mutant.

Model Construction
The atomistic model of the human WT PI3Kα was created based on the recently published crystal structure (PDB ID: 4OVU) 1 . The few missing residues were added and refined through homology and loop modelling using Prime from the Schrödinger suite 2 . The missing residues were: a) in the catalytic domain, p110α: 0-5, 309-318, 410-415, 515-518, 1053-1068 and b) in the regulatory domain, p85α: 322-326, 513-516. A schematic representation of PI3Kα heterodimer with the p110α catalytic and p85α regulatory subunits and all functional domains is depicted in Fig. 1a,b. The E545K mutant was built with a single substitution of residue Glu 545 to Lys. Notice, that after introducing the mutation and after minimization (see below), the position of E/K545 backbone nitrogen atom in the helical (p110α) domain is almost the same allowing hydrogen bond interactions with the neighboring L380 backbone oxygen of nSH2 (p85α) (Fig. 1c).

MD simulations
All simulations were performed with the GROMACS 5.0.4 software 3 , using the AMBER99SB-ILDN all-atom force field 4 and the TIP3P water model 5 . As box shape, the rhombic dodecahedron of GROMACS was used, a special case of a triclinic box. This shape was chosen because it truncates the corners of the cubic box and minimizes the water needed for the simulation. The number of the water molecules in the mutant simulations were 110982, while in the WT 110996. The long-range electrostatic interactions were treated using the fast particle-mesh Ewald summation method. The temperature during simulations was kept constant at 310 K using the Nose-Hoover thermostat with a time constant of 1 ps. The pressure was isotropically maintained at 1 atm using the Parrinello-Rahman coupling with a time constant of 5 ps and compressibility of 4.5e-5 bar −1 . A time step of 2 fs was used with all bond lengths constrained using the LINCS algorithm. The non-bonded potential energy functions (electrostatic and van der Waals) were smoothly decaying between cut-off distances 0.8 -1.0 nm. Prior to MD simulations, both structures were relaxed by 10,000 steps of energy minimization using the steepest descent algorithm, followed by positional restraint equilibration first in the NVT and then in the NPT ensemble for 200 ps each. Finally, unbiased MD simulations were carried out with the atomic coordinates of the systems saved every 2 ps (Table S1).

Distance fluctuation (DF) maps
To assess the intrinsic flexibility and the apparent plasticity of the protein, each MD trajectory was projected on the first 10 principal components and maps of distance fluctuations were constructed following the same procedure as in Ref 6 . The distance fluctuation (DF) is defined as the time-dependent mean square fluctuation of the distance r ij between atom i and j, which in our case they correspond to Cα atoms of the residues: Where brackets indicate time-average over the trajectory.
Analyses of residues root mean square fluctuation (RMSF), principle component analysis (PCA) and distance fluctuations (DFs) were applied to the last 200 ns of each trajectory, in which the structures were considered to be in equilibrium based on their root mean square deviation (RMSD) from the initial crystal structure ( Supplementary Fig. S1). The trajectory was analyzed using GROMACS tools v5.0.4 3 and VMD 7 .

Introduction
Dynamical network analysis is a computational method that allows exploring putative allosteric communication pathways. 8 With this method, time-dependent positional correlations between residues are calculated. Based on these residue-correlated motions a network representation of the protein is constructed. Through the dynamical network analysis, the intramolecular interactions in a protein can be collectively represented in the form of a protein structure network, where the residues (or sets of atoms) are the nodes of the network, which are connected to each other by links (edges) that depend on the node interaction strength.
The method outputs shortest paths between residues (nodes) as the most dominant mode of their communication. Edges with the greatest number of shortest paths that cross that edge are named "critical edges" and the nodes connected by these edges are established as "critical nodes" for allosteric signal transduction.
In our study, the dynamical network analysis was carried out using a similar protocol to Ref 8 , which has been previously applied to study a tRNA-protein, as well as several proteins and protein complexes [8][9][10][11][12][13][14][15] . To create a network model for our protein, the nodes and edges need to be created. As mentioned before, in dynamical network analysis, a node represents some set of atoms, e.g. an amino acid within a protein; here, as nodes we have used the Cα atoms of each PI3Kα residue. To define which pairs of nodes will be connected through edges, we have connected nodes, whose residues are within a cutoff distance of 4.5 Å of another residue for at least 75% of the MD simulation. Namely, if residues were within 4.5 Å of another residue for 75% of the trajectory, a correlation value (which will be explained later) was assigned to their edge; otherwise, the correlation value was set to zero. This contact-map filtering prevents the introduction of edges between residues that are physically far apart.

Correlations, edge weights, critical edges, critical nodes
The pairwise correlations are defined as C ij , where ! is the position of the i th node.
The pairwise correlations, C ij , of the motion between nodes i and j, determine the information transfer between these nodes. In this way, the motion of node i can be used to predict the direction of motion of node j, since they are correlated.
Edge weights, w ij , are derived from the probability of information transfer across an edge: Thus, the dynamical networks are weighted networks in which the weight (w ij , equation (2)) of an edge between nodes i and j is the probability of information transfer across that edge as measured by the correlation values (C ij ) between the two nodes.
In this model, we assume that the most likely biologically relevant pathways are those that minimize the network distance. Finding an optimal path is equivalent to minimizing the distance travelled between nodes in the network. The length of such a path, Dij, between distant nodes i and j is the sum of the edge weights between the consecutive nodes (k,l) along the path: The shortest distance, Dij 0 , between all pairs of nodes in the network is the most dominant mode of communication between the nodes, calculated by using the Floyd-Warshall algorithm.
Edges with the greatest number of shortest paths that cross that edge are named "critical edges" and the nodes connected by these edges are established as "critical nodes" for allosteric signal transduction.

Community Analysis
The physical network of nodes and edges can be clustered into substructures or communities of nodes that are more densely interconnected to each other than to other nodes in the network. Nodes (residues) that show a high correlation based on their w ij (see equation (2) in SI for its definition), can be grouped in protein regions; these regions are termed communities. The community structure is identified by using the Girvan-Newman algorithm, which uses a top-down approach to iteratively remove the edge with the highest betweenness and recalculate the betweenness of all remaining edges until none of the edges remains. The betweenness of an edge, defined as the number of shortest paths that pass through the edge in the network, is used to measure the importance of the edge for communication within the network.
According to our simulations, these communities largely reflect the functional domains of PI3KA. Allosteric signalling between communities occurs through particular node pairs that are important for the community crosstalk. These "critical nodes" transfer a relatively large degree of information across their edge. As critical node pairs carry out intercommunity crosstalk, they may provide favorable sites of signal disruption.
To identify residues that are important in transmitting the signal from one region of a protein to another, one should define the path through which the signal is transmitted. According to del Sol et al. (2006), residues that are key in preserving short paths in network communication between the source of the allosteric signal and the end-point of the allosteric signal, are crucial for the transmission of the signal. 17 The shortest (optimal) path between two nodes in a network is determined by the Floyd-Warshall algorithm. 18,19 The shortest paths between pairs of nodes belonging to two different communities are calculated and analyzed for communication across communities in the network.    Figure S7. Time series of distances between Cα atoms of key residues around the PIP 2 substrate binding pocket for all simulated systems. All distances were calculated with respect to Cα of R461 (p85α), which is assumed to bind PIP 2 i) K941(activation loop) residue also binding PIP 2 ii, iii) H936(activation loop) and H917(catalytic loop) that could act as a catalytic base and iv) I800, which is part of the ATP pocket.

Note: Some of the graphs axes have been manually changed to include a period to indicate
the decimal place instead of a comma. Great Britain and the United States are two of the few places in the world that use a period to indicate the decimal place, while many other countries, including Greece, use a comma instead. In our original submission, the decimal place in some plots was indicated with a comma. To follow the submission guidelines, since Scientific Reports uses UK English spelling, we have corrected the format in the graphs and used a period to indicate the decimal place. To do so, we used Adobe Photoshop CS6 software.

Hydrogen Bond and Salt Bridge criteria
VMD was used to calculated hydrogen bonds and salt bridges, and their frequency (percentage of occupancy). A hydrogen bond is defined so that the distance between the donor and acceptor atoms is less than 3.5 Å and the angle between donorhydrogen atom-acceptor is less than 30 degrees. A salt bridge is considered to be formed if the distance between any of the oxygen atoms of acidic residues and the nitrogen atoms of basic residues are within 4.0 Å, which is defined as the cut-off distance. The frequency of interaction is counted as a percentage of occurrence over all the frames of the trajectory. This frequency may be greater than 100%, because a given residue-pair may contain more than one hydrogen bond, each of which is counted separately.