Introduction

Human EGFR is one of the most studied members of the receptor tyrosine kinase (RTK) family owing to its vital role in the signal transduction pathways that regulate key cellular functions and its importance as a drug target1. EGFR is a multi-domain protein consisting of an extracellular domain, a single transmembrane domain and an intracellular tyrosine kinase (TK) domain. As shown in Fig. 1, the EGFR kinase domain consists of an N-terminal lobe (N-lobe), a C-terminal lobe (C-lobe) and a hinge region connecting the two lobes. Residue T790 is in the hinge region, whereas residue G719 is in the P-loop region that forms part of the ATP-binding pocket. The ATP-binding pocket consists of the hinge region, P-loop, C helix and activation loop. The threonine residue at position 790 is known as the gatekeeper, which controls the access of inhibitors to a deep hydrophobic pocket in the ATP-binding site. Activation of the receptor by growth factors or other cognate ligands induces receptor dimerization and the autophosphorylation of key tyrosine residues within the carboxyl-terminal portion of the receptor. These phosphorylated tyrosine residues serve as docking sites for various signal transducers, which initiate multiple signaling pathways, including those resulting in cancer phenotypes2. The aberrant activation of EGFR has been implicated in several key aspects of human neoplasia, including the increased proliferation, survival and invasiveness of cancer cells. Recent studies have reported an association between mutations in the TK domain of EGFR and non-small-cell lung cancer (NSCLC)3,4. Cells bearing mutant EGFR proteins show oncogenic properties but typically also exhibit greater sensitivity to inhibitors than cells bearing the wild-type (WT) EGFR protein.

Figure 1

Ribbon representation of the crystal structure of the EGFR kinase domain bound to gefitinib, rendered with PyMOL.

Gefitinib is shown as sticks colored by atom type (C in green, O in red and N in blue). Structural elements: N-lobe (grey, red and cyan), C-lobe (white), hinge region (residues 788–797, violet), P-loop (residues 712–731, red), C helix (residues 752–767, green) and activation loop (residues 855–877, blue).

Gefitinib, the most common TK inhibitor (TKI), blocks signal transduction pathways implicated in cancers5. NSCLC patients who initially respond to TKIs eventually acquire drug resistance through the emergence of the secondary mutation T790M4,6. Mutation of the gatekeeper threonine at position 790 was first thought to reduce the affinity of the protein for the drug by creating steric hindrance in the binding site6. However, Yun et al. (2008)7 showed that both the single T790M mutant and the double mutant L858R/T790M retain the same low-nanomolar affinity for gefitinib as the L858R mutant. By contrast, the T790M mutation confers a higher affinity for ATP than that of the L858R mutant, such that the combined double mutant L858R/T790M is an activated enzyme that is resistant to ATP-competitive TKIs8.

A recent report by Yoshikawa et al. (2013) demonstrated the acquired resistance of the double mutant G719S/T790M (DM) to gefitinib9. The G719S mutation occurs within the phosphate-binding loop (P-loop) and is not observed frequently10. The structure of the EGFR DM (G719S/T790M) was solved and deposited in the PDB9. Although the biological effects of the important EGFR mutations at the molecular level are clear, a mechanistic explanation linking the mutations to changes in the explicit dynamic properties of the protein is still lacking. Thanks to advances in force fields11 and the use of specialized computer architectures12 or enhanced sampling methods13, all-atom molecular dynamics (MD) simulations can now be used to portray accurately the complex conformational transitions involved in drug resistance14. Therefore, to elucidate the structural and dynamic consequences of the DM for the catalytic domain of EGFR and its affinity for gefitinib, we performed MD simulations (50 ns) of WT EGFR and three oncogenic mutants (G719S, T790M and DM) in complex with gefitinib.

Results

To examine the structural basis of the acquired drug resistance, we analyzed the structural dynamics and energetic effects of the single (G719S, T790M) and double (G719S/T790M) EGFR mutations.

Molecular modeling

The 3D structures of the EGFR mutant models G719S and T790M were predicted computationally using the RosettaBackrub server. Among the top ten models built by the server, the best model was identified using the confidence score of the structure modeling, which estimates the quality of the predicted models (Supplementary Table 1). The modeled structures were further validated using the SAVES server (Supplementary Table 2). The validation statistics showed good stereochemical quality, with more than 96% of residues in the core region. The final protein models were subjected to MD simulation (50 ns) in GROMACS to energy-minimize and stabilize the protein.

Flexibility and compactness in WT and mutant EGFRs

The time evolution of the root-mean-square deviation (RMSD) of the protein backbone atoms was analyzed for the simulations of WT and mutant EGFR (Supplementary Figure 1). In each case, the energy-minimized starting structure was taken as the reference. To assess the convergence and consistency of each system at an all-atom level of detail, two independent 50 ns MD simulations were run for the WT and for the mutant structures G719S, T790M and DM. No significant drift was observed between the trajectories obtained from the repeated simulations of each EGFR structure. The structures from the two runs aligned with a backbone RMSD below 3.5 Å (Supplementary Figure 1).

After ~10,000 ps, mutant T790M showed a different deviation pattern that persisted until the end of the simulation, with a backbone RMSD of ~0.29 to 0.45 nm, whereas mutants G719S and DM (Supplementary Figure 1) remained close to the WT protein toward the end of the simulation period. This magnitude of fluctuation, together with the small difference in the average RMSD values, indicates that the simulations produced stable trajectories and thus provide a suitable basis for further analysis.
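The RMSD values discussed above can be illustrated with a minimal sketch. It assumes that each frame has already been least-squares fitted to the reference structure (the fitting step is omitted here), and the function names and toy coordinates are hypothetical, not taken from the study.

```python
import math

def rmsd(frame, reference):
    """RMSD between two equally sized lists of (x, y, z) coordinates.

    Assumes the frame is already superimposed on the reference;
    no fitting is performed here.
    """
    if len(frame) != len(reference) or not frame:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(frame, reference):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(frame))

def nm_to_angstrom(value_nm):
    """Convert nanometres to angstroms (1 nm = 10 Å)."""
    return value_nm * 10.0

# Toy coordinates in nm, for illustration only (not data from the study).
reference = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0)]
frame = [(0.0, 0.03, 0.0), (0.1, 0.0, 0.04), (0.2, 0.0, 0.0)]

value_nm = rmsd(frame, reference)
print(f"RMSD = {value_nm:.4f} nm = {nm_to_angstrom(value_nm):.3f} Å")
```

Because the text reports values in both nm and Å, the unit helper makes the comparison explicit: the 0.29 to 0.45 nm range quoted for T790M corresponds to 2.9 to 4.5 Å.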

Comparison of RMSF values between WT and DM

The root-mean-square fluctuation (RMSF) values of the Cα atoms of each residue were computed for WT and DM to compare the residue-wise mobility of the two proteins (Fig. 2). We observed that the DM tended to fluctuate less than either single point mutant or the WT. The DM showed decreased mobility, specifically in alpha helix 2 and around the central part of the protein. The functional importance of this region was explored by docking gefitinib to EGFR, which demonstrated its active participation in protein-inhibitor complex formation. Thus, a decrease in the mobility of this region might be responsible for the altered functional activity of the mutant protein. Notably, mutation increased rigidity and reduced overall flexibility, effects that might alter the binding properties of the protein.
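As a minimal sketch of the per-residue quantity described above, the RMSF of each atom can be computed as the root of its mean squared displacement from its own time-averaged position. The trajectory layout (a list of frames, each a list of Cα coordinates) and the toy values are assumptions made for illustration; the frames are taken to be already fitted to a common reference.

```python
import math

def rmsf(trajectory):
    """Per-atom RMSF for a trajectory given as frames[atoms[(x, y, z)]]."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for a in range(n_atoms):
        # Time-averaged position of atom a.
        mean = [sum(frame[a][k] for frame in trajectory) / n_frames for k in range(3)]
        # Mean squared displacement from that average position.
        msd = sum(
            sum((frame[a][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result

# Toy two-frame, two-atom trajectory in nm (illustrative, not study data):
# the first atom never moves, the second oscillates along x.
toy_trajectory = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0)],
]
print(rmsf(toy_trajectory))
```

A lower value for a residue means that it stays closer to its average position, which is the sense in which the DM is described as more rigid.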

Figure 2

RMSF of the Cα atoms for each residue of WT, G719S, T790M and DM over the 50 ns of the trajectory.

Color scheme: WT: Black, G719S: Green, T790M: Red, DM: Violet.

To analyze the effect of mutation on the collective movement of the protein, we performed a fluctuation analysis on the average 50,000 ps structures of the WT and all the mutants and characterized them with respect to the P-loop (residues 712–731) and hinge (residues 788–797) regions. In the P-loop region of EGFR, reduced flexibility was observed for both the DM and G719S mutants, whereas T790M exhibited higher fluctuation (Fig. 3), consistent with the reduced P-loop flexibility caused by the G719S mutation within this loop, as proposed by Yoshikawa et al. (2013). Similarly, both the DM and G719S mutants exhibited lower fluctuation than T790M in the hinge region (Fig. 4). Thus, a reduction in the mobility of this region might be responsible for the observed alteration in the functional activity of the mutant protein.
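The regional comparison described above amounts to restricting a per-residue fluctuation profile to the residue ranges given in the text. A minimal sketch follows, assuming the profile is available as a mapping from residue number to RMSF; the helper name and the flat example profiles are hypothetical and carry no data from the study.

```python
# Residue ranges as defined in the text.
P_LOOP = range(712, 732)   # residues 712-731
HINGE = range(788, 798)    # residues 788-797

def mean_rmsf(profile, residues):
    """Average RMSF over a region; profile maps residue number -> RMSF (nm)."""
    values = [profile[r] for r in residues if r in profile]
    if not values:
        raise ValueError("no residues of this region are present in the profile")
    return sum(values) / len(values)

# Hypothetical flat profiles, for illustration only.
profile_a = {r: 0.10 for r in range(700, 800)}
profile_b = {r: 0.20 for r in range(700, 800)}

for name, region in (("P-loop", P_LOOP), ("hinge", HINGE)):
    print(name, mean_rmsf(profile_a, region), mean_rmsf(profile_b, region))
```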

Figure 3

Comparison of the RMS fluctuation of the P-loop of WT, G719S, T790M and DM over the 50 ns of the trajectory.

Color scheme: WT: Black, G719S: Green, T790M: Red, DM: Violet.

Figure 4

Comparison of the RMS fluctuation of the hinge region of WT, G719S, T790M and DM over the 50 ns of the trajectory.

Color scheme: WT: Black, G719S: Green, T790M: Red, DM: Violet.

Impact of mutations on secondary structural elements and the binding pocket

The time-dependent distance between the centers of mass of the P-loop and the activation loop was calculated to detect the relative movement induced by the mutations (Fig. 5). Our analysis revealed that the T790M mutation substantially increased the distance between the P-loop and the activation loop, whereas the G719S mutation substantially shortened it (Supplementary Figure 2). This result indicates that the secondary T790M mutation retains the active conformation of the binding site, as shown by Yoshikawa et al. (2013). To further characterize the effect of the T790M mutation on the conformational distribution of structural elements, we calculated the time-dependent distance between the centers of mass of gefitinib and the active-site residue M793 for the WT and mutant proteins. The docking modes of gefitinib with WT and mutant EGFR were in agreement with the recently reported low-resolution structure of the complex determined by X-ray scattering analysis (9), in which gefitinib forms hydrogen bonds with the active-site residue M793 (Supplementary Table 3). As shown in the histogram in Fig. 6, the T790M mutant exhibited a higher average distance between M793 and the drug, whereas the G719S mutation caused gefitinib to move closer to the binding site. In the DM, the distance between the drug and M793 was shorter than in the WT because of the secondary mutation (Supplementary Figure 3). This result suggests that the T790M secondary mutation effectively restores the nucleotide-binding property of the G719S mutant, as observed for the L858R mutant7.
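The distance measure used in this section can be sketched as follows: compute the mass-weighted mean position of each atom group in a frame, then take the Euclidean distance between the two points. This is a minimal illustration under the assumption that coordinates and atomic masses are available as plain lists; the function names and toy values are hypothetical.

```python
import math

def center_of_mass(coords, masses):
    """Mass-weighted mean position of a group of atoms."""
    total = sum(masses)
    return tuple(
        sum(m * c[k] for c, m in zip(coords, masses)) / total for k in range(3)
    )

def com_distance(group_a, masses_a, group_b, masses_b):
    """Distance between the centers of mass of two atom groups in one frame."""
    return math.dist(
        center_of_mass(group_a, masses_a), center_of_mass(group_b, masses_b)
    )

# Toy groups (arbitrary units), for illustration only.
loop_coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
loop_masses = [1.0, 1.0]
ligand_coords = [(1.0, 3.0, 4.0)]
ligand_masses = [12.0]

print(com_distance(loop_coords, loop_masses, ligand_coords, ligand_masses))
```

Repeating the calculation for every saved frame gives the time series from which histograms such as those in Figs. 5 and 6 can be built.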

Figure 5

Histogram plot showing the distance between the mass centers of the structural elements for the P-loop and activation loop of WT, G719S, T790M and DM at various time intervals for the 50 ns of the trajectory.

Figure 6

Histogram plot showing the time-dependent distance between the mass centers of gefitinib and residue M793 for G719S, T790M and DM at different time intervals over the 50 ns of the trajectory.

To identify the specific structural change in the binding pocket that resulted in the observed ligand movements, we calculated the time-dependent distance between the EGFR pharmacophore residues in the hydrophobic region (L718 and G796). Our analysis indicated that the T790M mutation shortened the distance between residues L718 and G796 (Fig. 7). By contrast, the G719S mutation increased this distance. For the DM, the distance was the lowest among the WT and all the mutants. The decreased distance between L718 and G796 led to a smaller slot in the hydrophobic region, which in turn facilitated the exclusion of gefitinib from the binding pocket. We also compared the gefitinib-binding modes of the WT and DM structures. The main hydrogen bond between EGFR (M793) and gefitinib is common to the WT and mutant models; however, the aniline ring of gefitinib was shifted upward in the DM compared with WT EGFR. This shift is presumably an adaptation by gefitinib to the structural changes caused by the EGFR DM (Supplementary Figure 4).

Figure 7

Time-dependent distance between the mass centers of residues L718 and G796 for WT, G719S, T790M and DM over the 50 ns of the trajectory.

Color scheme: WT: Black, G719S: Green, T790M: Red, DM: Violet.

The dictionary of secondary structure of proteins (DSSP) program was applied to the EGFR WT and mutant models, and the resulting secondary-structure fluctuations are illustrated in Supplementary Figure 5 (A–D). In the G719S mutant, minimal changes were observed in the coil region close to the site of the point mutation. Specifically, residues 710 to 720 changed conformation from coil to bend and, toward the C-lobe, conformational changes from turns to coils began to dominate. In the case of the DM, most of the alterations affected the secondary structure of the hinge region, the activation loop and the C-lobe, with helical elements replaced by turns and bends during the course of the simulation. Throughout the simulation, the native structure retained higher percentages of its native secondary structural elements than the mutant structures did.

Principal component analysis

PCA was performed on all four trajectories of the EGFR WT and mutant forms to monitor the overall collective motions of the protein. Covariance matrices were built over the Cα atoms of the protein for each trajectory and used to capture the degree of co-linearity in the atomic positions of each pair of atoms across the 324 residues of the gefitinib-bound EGFR structure. The eigenvalues obtained through the diagonalization of the covariance matrix indicate the atomic contribution to the motion. Similarly, the eigenvectors describe the collective motions accomplished by the particles (van der Spoel et al., 2005). A total of 580 eigenvectors were generated for the entire trajectory, of which the first five accounted for more than 90% of the overall system motion in the native trajectory. For the double mutant, the top seven eigenvectors accounted for 85% of the overall motion. Within the top eigenvectors, the first two accounted for a significant amount of the overall motion in each case. The projection onto the first two principal components displays the motion of the native and mutant forms in phase space. Here, the overall flexibility was calculated as the trace of the diagonalized covariance matrix. The trace values for the WT, G719S, T790M and DM structures of EGFR were 26.234 nm², 19.671 nm², 32.789 nm² and 12.018 nm², respectively (Fig. 8A–D). Among these, T790M showed the highest value, suggesting an overall increase in flexibility relative to the native model, whereas the DM exhibited the lowest value, confirming the decrease in flexibility in the collective motion of the protein. From these projections, it was observed that the clusters of the DM were well defined and more stable than those of the other protein models. The DM form covered a smaller region of conformational space than the WT and other mutant forms, as shown in Fig. 8.
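Two of the quantities in this paragraph lend themselves to a short worked example: the number of leading eigenvectors needed to capture a given fraction of the total motion, and the comparison of covariance traces between systems. The sketch below uses the trace values quoted in the text; the eigenvalue list is a made-up toy spectrum (the actual eigenvalues are not given), and the function name is hypothetical.

```python
def components_needed(eigenvalues, threshold):
    """Smallest number of leading eigenvalues whose sum reaches
    `threshold` (a fraction between 0 and 1) of the trace."""
    ordered = sorted(eigenvalues, reverse=True)
    trace = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / trace >= threshold:
            return count
    return len(ordered)

# Toy eigenvalue spectrum in nm^2 (illustrative only, not study data).
toy_eigenvalues = [5.0, 3.0, 1.0, 0.5, 0.5]
print(components_needed(toy_eigenvalues, 0.85))

# Trace values reported in the text (nm^2), expressed relative to the WT.
TRACES = {"WT": 26.234, "G719S": 19.671, "T790M": 32.789, "DM": 12.018}
relative_to_wt = {name: value / TRACES["WT"] for name, value in TRACES.items()}
for name, ratio in relative_to_wt.items():
    print(f"{name}: {ratio:.2f}")
```

On the reported values, the DM trace is roughly 46% of the WT trace and the T790M trace about 25% larger than the WT trace, which is the numerical basis for the flexibility ranking given above.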

Figure 8

2D projection of EGFR WT and the mutant models over the first two principal components.

(A), WT; (B), G719S; (C), T790M; (D), DM. Color scheme WT: Black, G719S: Green, T790M: Red, DM: Violet.

Discussion

The major drawback of TKI therapy is the development of secondary resistance caused by the acquisition of new mutations, as best exemplified by imatinib-resistant mutations in BCR-ABL-positive CML15. To our knowledge, this mechanism of drug resistance, i.e., resistance conferred by a mutation that increases the affinity for a competing physiologic substrate, has not been previously reported within a clinical context. Interestingly, a distinct but related effect has recently been described in a mutant of the mitotic kinesin KSP, in which drug resistance is conferred by an allosteric mechanism involving an enhanced affinity for ATP16.

In light of the present study, we can rationalize and quantify the epistatic effect arising from the occurrence of secondary mutations in EGFR. The altered drug resistance mechanism of the EGFR double mutant is due to a change in the active-site conformation. As proposed in a previous study17, the total stabilization of the active state by the DM is greater than would be expected from a simple combination of the stabilization due to the two single mutations. In the EGFR double mutant, the gatekeeper residue T790M is situated at the top of the hydrophobic spine and stabilizes the active conformation. In agreement with a recent study9, the gatekeeper T790M mutation does not appear to act via steric hindrance with inhibitors but rather by stabilizing the active conformation. In this case, the methionine participates in the hydrophobic core surrounding the active site. These results agree with the enhanced stabilization of the catalytic site observed when comparing the collective motions of the WT and mutant kinase domains18.

According to our analysis, the combination of mutations (G719S/T790M DM) in EGFR both exerts the rigidifying effects of the two single mutations and stabilizes the correct helical structure of the αC-helix. In particular, the T790M mutation decreased the size of the hydrophobic slot formed by L718 and G796 in the ATP-binding pocket (Fig. 7), suggesting that the design of inhibitors of the T790M mutant should avoid targeting this region. We found that the effect of the DM is not a simple sum of the individual mutations; rather, the secondary T790M mutation reversed the effect of G719S on the distance between the P-loop and the activation loop. These EGFR mutants should therefore be considered valuable tools for evaluating the activity of novel, potentially more potent, ATP-competitive inhibitors for NSCLC patients.

Methods

Structural modeling and docking study

For the WT and DM, we retrieved the available structures from the PDB (WT: PDB ID 3VJO, chain A, 2.64 Å; DM: PDB ID 3UG2, chain A, 2.50 Å). To model the G719S and T790M mutant proteins, we used the RosettaBackrub web server, which is based on the Rosetta 3.1 ab initio modeling technique19. The RosettaBackrub server provides an easily accessible interface to Rosetta predictions and implements three applications that use the backrub method for flexible protein-backbone modeling and design. The models with high scores and good topologies were selected as candidate structures.

The AutoDock 4.2 suite was used as the molecular docking tool to perform the docking simulations. A Lamarckian genetic algorithm (GA) was used as the search method20. The Lamarckian GA parameters used in the study were as follows: number of runs, 30; population size, 150; maximum number of evaluations, 25,000,000; number of generations, 27,000; rate of gene mutation, 0.02; and rate of crossover, 0.8. Docking was performed using grid sizes of 60, 60 and 60 points along the X, Y and Z axes, with 0.375 Å spacing. The RMS cluster tolerance was set to 2.0 Å.
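For orientation, the docking settings listed above can be collected in one place and the size of the search box derived from them. The sketch below maps the stated values onto AutoDock 4 docking-parameter keywords; this mapping is an illustrative assumption, and a real parameter file needs further entries (receptor maps, ligand file, grid center and so on) that the text does not specify. The box extent also assumes that the grid sizes are the per-axis point counts multiplied by the spacing.

```python
# Values stated in the Methods text, keyed by AutoDock 4 DPF keywords
# (the keyword mapping is an assumption made for illustration).
GA_PARAMETERS = {
    "ga_run": 30,
    "ga_pop_size": 150,
    "ga_num_evals": 25_000_000,
    "ga_num_generations": 27_000,
    "ga_mutation_rate": 0.02,
    "ga_crossover_rate": 0.8,
    "rmstol": 2.0,
}

GRID_POINTS = (60, 60, 60)   # points along X, Y and Z
GRID_SPACING = 0.375         # angstroms between grid points

def render(parameters):
    """Format parameters as 'keyword value' lines."""
    return "\n".join(f"{key} {value}" for key, value in parameters.items())

def box_extent(points, spacing):
    """Edge lengths of the search box in angstroms (points x spacing)."""
    return tuple(n * spacing for n in points)

print(render(GA_PARAMETERS))
print(box_extent(GRID_POINTS, GRID_SPACING))
```

With these settings the search box measures 22.5 Å along each axis.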

Molecular dynamics simulation

Classical molecular dynamics simulations of the EGFR receptor in a ligand-bound state (with gefitinib) were performed using the GROMACS 4.5 package21. The GROMOS43A1 force field22 was adopted to analyze the ligand-bound dynamics; the ligand force field parameters were provided by the PRODRG program23. The protein-ligand complex was solvated in a triclinic water box under periodic boundary conditions, with a minimum distance of 1.0 nm from the protein to the box faces, and the system was neutralized by adding two Cl⁻ ions to the solvent. The final systems consisted of approximately 25,000 atoms. Following steepest-descent minimization, the systems were equilibrated in the canonical ensemble (NVT conditions for 500 ps at 300 K) and, subsequently, in the isothermal–isobaric ensemble (NPT conditions for 500 ps) with position restraints applied to the protein. Lastly, all restraints were removed and 50 ns molecular dynamics runs were performed twice under NPT conditions at 300 K. To maintain constant pressure (1 atm) with isotropic coordinate scaling, the Parrinello-Rahman barostat24 was used with a relaxation time of 2.0 ps. Van der Waals interactions were modeled using 6–12 Lennard-Jones potentials with a 1.4 nm cut-off. The long-range electrostatic interactions were calculated using the particle mesh Ewald (PME) method, with a cut-off of 0.9 nm for the real-space term. Covalent bonds were constrained using the LINCS algorithm. The time step was 2 fs and the coordinates were saved every 2 ps for analysis, which was performed using standard GROMACS tools.
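The production-run settings above imply a fixed number of integration steps and saved frames, which the following sketch works out. It also maps the stated values onto GROMACS .mdp option names; that mapping is an illustrative assumption rather than the authors' input file, and settings the text does not specify (thermostat, output options, integrator) are deliberately left out. The choice of `all-bonds` for the constraints is likewise an assumption based on the phrase "covalent bonds were constrained".

```python
# Values stated in the Methods text.
TIME_STEP_FS = 2
RUN_LENGTH_NS = 50
SAVE_INTERVAL_PS = 2
ATM_IN_BAR = 1.01325  # 1 atm in bar; GROMACS expresses pressure in bar

def n_steps(run_ns, dt_fs):
    """Number of integration steps (1 ns = 1,000,000 fs)."""
    return run_ns * 1_000_000 // dt_fs

def steps_between_frames(save_ps, dt_fs):
    """Integration steps between two saved frames (1 ps = 1,000 fs)."""
    return save_ps * 1_000 // dt_fs

def n_saved_frames(run_ns, save_ps):
    """Saved frames, counting the starting frame at t = 0."""
    return run_ns * 1_000 // save_ps + 1

# Assumed mapping of the stated values onto .mdp option names.
MDP_OPTIONS = {
    "dt": TIME_STEP_FS / 1000,                      # ps
    "nsteps": n_steps(RUN_LENGTH_NS, TIME_STEP_FS),
    "coulombtype": "PME",
    "rcoulomb": 0.9,                                # nm
    "rvdw": 1.4,                                    # nm
    "pcoupl": "Parrinello-Rahman",
    "pcoupltype": "isotropic",
    "tau_p": 2.0,                                   # ps
    "ref_p": ATM_IN_BAR,                            # bar
    "ref_t": 300,                                   # K
    "constraints": "all-bonds",                     # assumption, see lead-in
    "constraint_algorithm": "lincs",
}

print(n_steps(RUN_LENGTH_NS, TIME_STEP_FS))
print(steps_between_frames(SAVE_INTERVAL_PS, TIME_STEP_FS))
print(n_saved_frames(RUN_LENGTH_NS, SAVE_INTERVAL_PS))
print("\n".join(f"{key} = {value}" for key, value in MDP_OPTIONS.items()))
```

Each 50 ns run therefore comprises 25,000,000 steps, with a frame written every 1,000 steps, giving 25,001 frames per trajectory for analysis.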

Principal component analysis

The MD trajectories were used to identify the motions of the native and mutant EGFR models. We used principal component analysis (PCA) to extract the principal modes involved in the motion of the protein molecule25. A covariance matrix was assembled using a simple linear transformation in Cartesian coordinate space. A vectorial depiction of each component of the motion indicates its direction. For this, a set of eigenvectors was derived through the diagonalization of the covariance matrix. Each eigenvector has a corresponding eigenvalue that describes the energetic contribution of that component to the motion26. The protein regions responsible for the most significant collective motions can be identified through PCA. The built-in GROMACS tools g_covar and g_anaeig were used to perform the PCA.
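The first step of the procedure, building the covariance matrix of positional fluctuations, can be sketched in a few lines. This is a minimal illustration rather than a reimplementation of g_covar: it assumes frames that are already fitted to a reference, applies no mass weighting, and normalizes by the number of frames. The diagonalization step is omitted because it requires a linear-algebra library; the trace, however, can be read directly from the matrix and equals the sum of the eigenvalues.

```python
def covariance_matrix(samples):
    """Covariance of positional fluctuations.

    `samples` is a list of frames, each a flat list of Cartesian
    coordinates (x1, y1, z1, x2, ...). Normalized by the number of frames.
    """
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for s in samples:
        dev = [s[i] - mean[i] for i in range(dim)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += dev[i] * dev[j] / n
    return cov

def trace(matrix):
    """Sum of the diagonal elements; equal to the sum of the eigenvalues."""
    return sum(matrix[i][i] for i in range(len(matrix)))

# Toy two-frame, two-coordinate example (illustrative only): both
# coordinates move together, so the motion lies along a single direction.
toy_samples = [[0.0, 0.0], [2.0, 2.0]]
toy_cov = covariance_matrix(toy_samples)
print(toy_cov)
print(trace(toy_cov))
```

In the toy example the matrix has eigenvalues 2 and 0, whose sum matches the trace: all of the motion is captured by one collective mode, which is the situation PCA is designed to reveal.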

Molecular imaging & MD analysis

All protein structure visualizations were produced using PyMOL (DeLano 2002). The trajectories were analyzed using the built-in tools of the GROMACS distribution. Secondary structure analysis was performed using the DSSP program27. All plots were generated using the Xmgrace program.