Prediction of Deleterious Non-synonymous SNPs of Human STK11 Gene by Combining Algorithms, Molecular Docking, and Molecular Dynamics Simulation

Serine-threonine kinase11 (STK11) is a tumor suppressor gene which plays a key role in regulating cell growth and apoptosis. It is widely known as a multitasking kinase and engaged in cell polarity, cell cycle arrest, chromatin remodeling, energy metabolism, and Wnt signaling. The substitutions of single amino acids in highly conserved regions of the STK11 protein are associated with Peutz–Jeghers syndrome (PJS), which is an autosomal dominant inherited disorder. The abnormal function of the STK11 protein is still not well understood. In this study, we classified disease susceptible single nucleotide polymorphisms (SNPs) in STK11 by using different computational algorithms. We identified the deleterious nsSNPs, constructed mutant protein structures, and evaluated the impact of mutation by employing molecular docking and molecular dynamics analysis. Our results show that W239R and W308C variants are likely to be highly deleterious mutations found in the catalytic kinase domain, which may destabilize structure and disrupt the activation of the STK11 protein as well as reduce its catalytic efficiency. The W239R mutant is likely to have a greater impact on destabilizing the protein structure compared to the W308C mutant. In conclusion, these mutants can help to further realize the large pool of disease susceptibilities linked with catalytic kinase domain activation of STK11 and assist to develop an effective drug for associated diseases.

cells through phosphorylation, and activation of AMPK and its related kinases. STK11 forms a heterotrimeric complex in vivo with STRADα and MO25α. Both STRADα and MO25α have been involved in relocalization of STK11 from the nucleus to cytoplasm 16 . Cytoplasmic localization is critical for the growth suppressive function of STK11 15 .
Germline mutations in the STK11 gene are responsible for Peutz-Jeghers syndrome (PJS), which is an inherited autosomal dominant disorder and is distinguished by gastrointestinal hamartomatous polyps and mucocutaneous pigmentations. The occurrence of PJS is estimated between 1 in 8300 to 1 in 280000 individuals 17 . The most common malignancy associated with PJS is colorectal cancer, followed by breast, small bowel, gastric, and pancreatic cancers 18,19 . The cumulative lifetime risk of developing cancers in individuals with PJS at ages 20, 30, 40, 50, 60 and 70 years are 2%, 5%, 17%, 31%, 60% and 85%, respectively 17 . Recently, studies have shown that about 57-88% of PJS cases concurrently occur with one or more mutations in STK11 protein, including point mutations and large genomic deletions, duplications, or insertions [20][21][22][23] .
Over the past few years, several in silico approaches have been developed particularly for screening functional SNPs and detecting the effect of deleterious nsSNPs in the candidate protein. Many tools can also predict the structural changes based on single amino acid substitution in the protein 24,25 . Based on in silico algorithms, the functional SNPs in BRAF (B-Raf proto-oncogene), BRCA1(Breast cancer type 1 susceptibility protein) 26 , and ATM (Ataxia-Telangiectasia Mutated) genes 27 have been classified successfully from a wide range of disease susceptible SNPs based on their functional and structural consequences. Recent computational approaches are focused on cancer studies by involving either in the prediction of the most damaging SNPs from large databases or in population-based data analysis [28][29][30] .
In this study, detailed investigations have been carried out to postulate 63 nsSNPs in the STK11 protein and to evaluate their deleterious or pathogenic effects on the protein. Employing different prediction algorithms, we classify high-risk nsSNPs and identify their structural and functional impact on the STK11 protein. Moreover, molecular dynamics (MD) and docking calculations are performed to better understand the impact of mutation on protein structure at the secondary and tertiary level.

Results
The complete workflow, tools, and databases applied to identify the damaging SNPs in human STK11 and their structural/functional consequences due to mutation, are summarized in Fig. 1.
Snp annotation. The polymorphism information of the STK11 gene was collected from the NCBI dbSNP database, which contains a total of 2283 SNPs for STK11 protein. Out of 2283 SNPs, 1535 are in intron region, 308 are nsSNPs (missense), 257 are coding synonymous, 146 are in 5′ UTR region, and 81 are in 3′ UTR region. It can be noticed from Fig. 2a that most of the SNPs are found in the intron region (67.24%), followed by missense (13.31%), coding synonymous (11.26%), 5′UTR (6.40%), and 3′UTR (3.55%) SNPs, respectively. The interest of the current study is subject to nsSNPs, as they alter the encoded amino acid. Only nsSNPs of STK11 were considered for further analysis. Determination of functional Snps in coding regions. The various computational prediction tools that were used in this study, are illustrated in Fig. 2b to identify significant nsSNPs in STK11. In the SIFT algorithm, 63 nsSNPs are found as deleterious out of 304 missense SNPs that may have a functional effect on the protein. The consequences of SIFT were further evaluated by investigating the impact of nsSNPs in the structure and function of the protein, using PolyPhen2, I-Mutant3.0, and PROVEAN algorithms. In PolyPhen2, 51 nsSNPs are predicted as deleterious. I-Mutant3.0 algorithm predicted 52 nsSNPs that could alter the protein stability due to mutation. PROVEAN predicted 48 nsSNPs as deleterious that could have a functional effect on the protein (Supplementary Table S1). To confirm these results, we further investigated nsSNPs through the in-silico prediction pipeline: P-Mut, SNAP2, PON-P, and Mutation Assessor. We noticed 37 nsSNPs to be pathological in P-Mut, 51 nsSNPs as affected in SNAP2, 37 nsSNPs as pathological in PON-P, and 57 nsSNPs to be predicted as disease-associated in Mutation Assessor algorithms (Supplementary Table S2). In this study, a total of eight different computational algorithms were used for the identification of high-risk nsSNPs. By combining the results of all the algorithms, three nsSNPs (W239C, W239R, and W308C) are found to be highly deleterious based on their compared prediction scores. As W239C mutant was reported previously at the 239 th position 31 , in this study W239R and W308C mutants were selected for further analysis.

Identification of domains in STK11.
InterPro tool was used to locate domain regions in STK11 and to identify the location of nsSNPs in different domains. This tool provides a functional analysis of proteins by classifying them into families. It also predicts the presence of domains and active sites. It has been reported that three domains: such as the N-terminal domain , catalytic kinase domain , and C-terminal domain (310-433) are found in STK11. The two nsSNPs (W239R and W308C) that we have selected are located in the catalytic kinase domain (Supplementary Fig. S1(b)).

Conservation profile of nsSNPs and evolutionary relationship analysis of STK11 protein. ConSurf
web browser was used to measure the intensity of evolutionary conservation at each residue position in STK11 protein. By using the Bayesian method, the ConSurf server recognizes putative functional and structural amino acids and identifies their evolutionary conservation profile 32 . The ConSurf analysis predicted that both mutants, W239 and W308, are buried and conserved residues, i.e. structural residues (Fig. 3). W239 and W308 residues are involved in the formation of helix-8 and helix-12. Substitution in any of these residues can lead to a decrease in the stability of helix-8 and helix-12 33 . Moreover, this finding is also supported by multiple sequence alignment (MSA) analysis. We performed MSA among the nine different species through use of the MEGA 6 program. It was revealed that conserved regions are homologous in nine species, represented as asterisk signs ("*") in Fig. S1(b).
The maximal conserved residues are found in the kinase domain, which are essential for stabilizing the secondary structure and activating the STK11 protein. Moreover, phylogenetic analysis was performed using the MEGA 6 package, for understanding the evolutionary relationship among the nine different species. The phylogenetic tree reveals that human (Homo sapiens) and chimpanzee (Pan troglodytes) are close neighbors ( Fig. S1(a)). Dilmeç et al. reported that 99% sequence homology is found between human and chimpanzee STK11 proteins 34 . According to the phylogenetic tree analysis, STK11 protein is more conserved in primates, including humans.

Identification of functional SNPs in the UTR region by UTRscan. Gene expression is inhibited by
the SNPs in the 3′ UTR region because of faulty rRNA translation or by affecting RNA half-life 35 . The UTRscan server identifies patterns for regulatory region motifs in the UTR database 36 . This server predicts two UTRsite motifs in the STK11 protein. Three total matches were found for two motifs, as shown in Supplementary Table S3. Here, the W308C mutant was found in upstream open reading frames (uORF) of the 5′UTR region. Mutation at the 308 th position in the 5′UTR region of STK11 protein is associated with PJS 37 .
Structural analysis of native and mutant models. The native structure and two high-risk deleterious variants were modeled by Swiss-Model 38 . Model quality was also validated by generation of a Ramachandran plot using the RAMPAGE server. The Ramachandran plot for both native and mutant models showed 278 residues (94.6%) in the favored region, 11 residues (3.7%) in the allowed region, and only 5 residues (1.7%) in the outer region (Supplementary Figure S2 and Table S4). www.nature.com/scientificreports www.nature.com/scientificreports/ Analysis of structural effects of high-risk nsSNPs in STK11. The impact of amino acid substitutions on the domain structure of STK11 were investigated using the Project Hope server. The W239R variant results in an arginine residue in place of tryptophan at the 239 th position located in the kinase domain. This domain is important for protein binding and replacement of the buried neutral tryptophan residue with positively charged arginine that may cause an empty space in the core structure of the domain and can lead to protein folding problems. A graphical view of the variant is shown in Fig. 4a. A similar destabilizing condition is formed by the W308C mutant. Replacing a buried structural tryptophan with a smaller cysteine residue can create an empty space in the core structure of the kinase domain that may affect the signal transduction between the domains, as shown in Fig. 4a. Mutation in the kinase domain is hypothesized to be responsible for disrupting the binding activity of STK11 because both N and C-terminal lobes of the STK11 kinase domain interact with the C-terminal lobe of STRADα 33 .

Docking Analysis.
Protein-protein docking analysis demonstrated that the mutant STK11 structures bind to the STRADα-MO25α complex in a slightly deviated orientation compared to the native STK11 structure. The W239R mutant shows higher deviation compared with the W308C mutant Fig. 4b. The C-terminal flanking tail, β2-β3 loop, β3-β4 loop, and β7-β8 loop of the W239R mutant showed significant deviation in orientation. Molecular docking of ATP with native and mutant modeled structures also showed a difference in binding affinity. The binding affinity of ATP with native STK11 is −7.7 kcal/mol, while for mutant W239R and W308C are −7.1 kcal/mol and −7.2 kcal/mol respectively. ATP binds at the same binding pocket when compared in native and mutant proteins; however, from the analysis of the binding pose of ATP, a significant deviation in terminal phosphate of ATP is observed between native and mutant protein complexes Fig. 5. Interaction analysis of ATP with native and mutant protein showed a reduction in the number of hydrogen bonds and attractive electrostatic charge interactions of ATP with residues in mutant proteins (Table 1). Many residues such as L55, S59, Y60, E130, and S193 have interactions with ATP in native STK11 but were absent in mutant proteins.

Molecular dynamics simulation of native and mutant STK11 proteins. 150 ns MD simulations
were performed to study the deviation of native and mutant proteins in physiological environments. Native STK11 and mutant W308C have similar root mean squared deviation (RMSD) values with significant deviation observed for the W239R mutant, as shown in Fig. 6a. Mutant W239R presented an increasing trend in RMSD value throughout 36.9 ns (RMSD ~3.611 Å) and 140 ns (RMSD ~3.76 Å) of MD simulation. Average RMSD values of native, W239R, and W308C mutants are 2.67 Å, 2.80 Å and 2.47 Å, respectively. In addition, RMSF (root-mean square fluctuation) value analysis also shows residues in C terminal region have a significant difference in fluctuation between native and mutant structures after 150 ns MD simulation ( Fig. 6b) with RMSF values reaching as much as 7 Å for mutant W239R. From the RMSF plot, it can be observed that residues in 204-228 positions constitute a comparatively flexible region than other residues in the W239R mutant. On the other hand, the highest residual fluctuation can be observed at positions 327 (10.21 Å) and 328 (10.77 Å) in mutant W308C when compared to the native structure (Fig. 6b). Furthermore, it can be noticed that in the mutant W308C, fluctuation of residues (322-334) in C-terminal flanking tail is greater, with values ranging from 2.13 to 10.71 Å when compared to that in the native type (from 1.25-3.80 Å). Deviation in the position and conformation of the bound ATP molecule can be observed when compared between the native and mutant protein complexes (Fig. 5) with a significant deviation observed between the native and mutant W308C. www.nature.com/scientificreports www.nature.com/scientificreports/ From Rg (radius of gyration) analysis of native and mutant structures (Fig. 6c), it can be observed that the W239R mutant revealed a higher average Rg value (20.73 Å) over the simulation time scale when compared to those of native STK11 (20.56 Å) and mutant W308C (20.38 Å). As a result, the flexibility of mutant W239R may be increased. On the other hand, mutant W308C seemed to deviate its Rg value after 60 ns, which could be the reason for its partial protein folding.
Solvent accessible surface area (SASA) analysis indicated that mutant W239R has a higher average SASA value (15498.11 Å2) than mutant W308C (15121.88 Å2) and native STK11 (15129.68 Å 2 ), as shown in Fig. 6d. Since a higher SASA value denotes protein expansion, it can be suggested that mutant W308C and native STK11 are more stable than the mutant W239R. The reason for a greater change observed in the SASA value of W239R compared to native STK11 could be the effect of amino acid substitution by altering the size of the protein surface and other characteristics 39,40 .
The total number of H-bonds within the proteins were also calculated during the MD simulation as depicted in Fig. 6e. From the analysis, it can be noted that the native structure forms a greater number of H-bonds with an average of ∼212, while W239R and W308C mutants exhibit a fewer number of H-bonds with an average of for each mutant of ∼209. Since the number of H-bonds was less in the mutant structures, protein stability may be effected. www.nature.com/scientificreports www.nature.com/scientificreports/ To further understand the interaction of native and substituted residues with surrounding residues, we analyzed non-bonding interactions after MD simulation. Mutant W239R showed a number of interactions with the surrounding residues which are significantly decreased due to the substitution of Trp with Arg at the 239 th position, as shown in Table 2 and Fig. 7a. This decrease is largely due to the failure to form any hydrophobic interactions such as pi-alkyl interaction by R239 in the mutant structure. H-bond interaction with V236 is absent in the mutant type but forms new H-bond interactions with Q220 and K235. R239 also forms a pi-cation interaction with F255, whereas W239 in the native structure does not form any interaction with this residue. On the other hand, a number of hydrophobic interactions such as pi-pi stacked, pi-pi T-shaped, and pi-alkyl are significantly decreased due to the substitution of Trp with Cys at position 308 (Table 2 and Fig. 7b). Moreover, a pi-sulfur and an alkyl interaction with H306 and P28, respectively are present in the W308C mutant, but absent in native type.
Additional information on the flexibility of STK11 mutation was acquired by the analysis of secondary structures of native, W239R and W308C variants during 150 ns MD simulation as presented in Fig. S3. The average percentage values of secondary structures of native and mutant proteins are close in approximation. In Fig. 8, the average alpha helix values of native, W239R, and W308C mutants are 34.23%, 33.22%, and 33.78%, respectively. Although, alpha helix showed a decreasing trend in W239R mutant between 35-85 ns, after that it was stable (Fig. S3a). The average sheet and turn percentage values were found to be similar for native (20.29% and 13.97%) and W308C (20.34% and 13.27%), but in W239R, slightly different values were found (19.67% and 14.24%) (Fig. S3b).  www.nature.com/scientificreports www.nature.com/scientificreports/ In Fig. S3c, the β turn exhibited an increasing trend in the W239C mutant between 45-87 ns when compared to the native protein and W308C mutant. Average coil percentage values of native, W239R and W308C structures were determined to be 30.40%, 31.97%, and 31.76%, respectively (Fig. S3d). Secondary structure analysis also showed that residues in the coil region have a significant difference in fluctuation between the native and mutant structures after 90 ns MD simulation. The average 3 10 -helix percentage values for native STK11, W239R, and W308C are 1.08%, 0.88%, and 0.81%, respectively (Fig. S3e).
Energy and structural information of three protein structures (native STK11, mutant W239R, and mutant W308C) were analyzed using principal component analysis (PCA) to understand the structural quality of proteins during MD simulation. The energy and structural information are the composition of the following variables; bond energies, bond angle energies, dihedral angle energies, planarity energies, Van der Waals energies, and electrostatic energies. The exploratory PCA analysis of these variables revealed the similarity and dissimilarity between native and mutant proteins in the scores plot (Fig. 9a). The scores plot showed that the distance between native STK11 and mutant W239R was further whereas native STK11 and mutant W308C was overlapped. Dihedral angle energies and bond energies loaded significantly into the second PC resulting in the dissimilarity of mutant W239R with respect to native STK11 and mutant W308C (Fig. 9b).
We analyzed position level variations in secondary structure elements of native and mutants using the STRIDE program. In the native complex, helix regions (residues 222-226 and 307-310) are disrupted and formation of 3 10 -helix was observed in mutant W239R (Fig. 8). Furthermore, the most prominent alteration was observed in residues 195-197 and 276-278 positions due to an acquisition of 3 10 -helix conformation in mutant W239R. Moreover, two strands (residues 209-210 and 231-232) are present in native STK11, whereas these strands have converted to coils in mutant W239R. On the other hand, W308C mutant showed variation at positions 217-219 and 338-340, which transformed from 3 10 -helixes to turns, and also showed variation by converting to a 3 10 -helix from a turn at position 294-296 (Fig. 8). Furthermore, an alpha helix (residues 235-250) is found slightly reduced in the W308C mutant structure.

Discussion
All in silico prediction algorithms disclose two nsSNPs, W239R and W308C, that are highly deleterious based on their compared prediction scores. Typically, conserved residues are involved to control the biological system in proteins like stability and/or folding 41 . Functional amino acids are located at enzymatic sites and show substantial protein-protein interaction. These residues are more highly conserved compared to other residues in the protein 42 . For assessing the deleterious impact of two nsSNPs (W239R and W308C) in the STK11 protein, we estimated the evolutionary conservation profile of all amino acids position in STK11 protein via ConSurf web server. This server predicted that tryptophans at the 239 th and 308 th positions are buried structural residues with a conservation score of 9. For evaluating ConSurf result, we executed the multiple sequence alignment (MSA) among the nine different species through the MEGA 6 program, which revealed that the conserved regions were homologous in nine species. Maximum conserved residues are found in the kinase domain, which is essential for stabilizing the secondary structure and in activating the STK11 protein. Moreover, the phylogenetic tree reveals that human and chimpanzee are close neighbors. One study reported that 99% sequence homology is found between the human and chimpanzee STK11 proteins 34 . This result indicated that the catalytic kinase domain is evolutionarily conserved in STK11 protein.  www.nature.com/scientificreports www.nature.com/scientificreports/ In the catalytical kinase domain, the W239R mutant leads to substitution of tryptophan (a nonpolar aromatic amino acid) by arginine (a basic amino acid). Although the mutated residue is smaller in size than the wild-type residue, it is likely to disturb the interaction between other domains that are important for the protein  activity as predicted by Project HOPE. Mass and charge difference in the protein affect spatiotemporal dynamics of protein-protein interactions 43,44 . The mutation introduces a charge which may lead to repulsion between the mutant residues and neighboring residues 44 . Hence, this variant altered the binding interaction with surrounding residues, thereby disturbing normal biological processes. The mutant W308C involves the substitution of tryptophan to cysteine, where tryptophan is larger in size than cysteine. Moreover, a buried structural tryptophan residue is more hydrophobic than the cysteine residue, which probably will not fit at the 308 th position. As a result, this variant will cause a loss of hydrophobic interactions in the core of the protein, as predicted by ProjectHOPE. Structural mutations affect buried residues in the protein core, causing changes in amino acid size and charge, hydrogen bonds, salt bridges, and S-S bridges. These changes cause loss of thermodynamic stability as well as aberrant folding and aggregation of the proteins 45 .
To further evaluate our hypothesis as to whether W239R and W308C mutants have a deleterious effect on STK11 protein, we performed molecular docking and MD simulation analysis. From docking analysis of STK11 with ATP, it is well revealed that both mutants perturbed the binding pocket quite significantly. The most prominent change was noticed in W239R where a significant loss of H-bond interactions within the binding pocket residues can be observed when compared to that in the native protein. In the STK11-ATP complex, ATP binds to a cleft between N-and C-lobes of the protein kinases through formation of H-bonds with the glycine-rich loop (G58-K62), β3 strand (K78), the hinge regions (E130-C132), αD-helix (E130-M136), the catalytic loop (H174-N181), and the activation loop (S193-A198) 46,47 . ATP is strongly bound to the binding cleft of STK11, so these mutants disrupt the favorable contacts which are essential for the functional activity of STK11. Moreover, the deviation observed in the bound ATP molecule can lead to a reduction in catalytic efficiency of STK11.
Proteins are dynamic entities in an aqueous environment. We performed 150 ns MD simulations to observe the effect of mutation on the structural dynamics of the STK11 protein. RMSD values remained relatively constant for both native STK11 and W308C mutant, indicating that the W308C mutant is likely to form a stable structure in physiological conditions; however, consistently higher RMSD values throughout the MD simulation for the W239R mutant indicate that this mutation is likely to make the protein structure less stable. Furthermore, higher fluctuation and a loss of stability were observed for the W239R mutant in RMSF, Rg, SASA, and H-bond analysis. This feature was also confirmed by the principal component analysis (Fig. 9a). The deviation observed after MD simulation in the F204-L228 residue loop region supports our earlier hypothesis that the positively charged R239 residue is likely to disrupt the interactions with surrounding residues. Non-bonding interaction analysis revealed that the residues Q220, P221, and P222 are directly in contact with native W239. Furthermore, L201-A206 (β9-β9′ loop) from the STK11 activation loop have strong interactions with a hydrophobic pocket on the concave surface of MO25α 33,46 . The disruption in the activation loop regions (D194-E223) may cause inhibition in activation of the STK11-MO25α complex. On the other hand, the W308C mutant showed a stable structure like the native type, but higher residual fluctuation can be observed in the residues (I322-S334) of the C-terminal flanking tail. The W308C mutant lost five polar contacts with H154 and H313 and one hydrophobic contact with L282 (Table 2). Polar contacts are important, as they are involved in the formation of H-bonds and these H-bonds and hydrophobic interactions help in maintaining the stability of the protein 48,49 . A Cys residue inside the polypeptide chain has the possibility to form a disulfide bond with another Cys residue and can alter the tertiary structure of the protein. Mehenni et al. have reported that C308 is likely to form a disulfide bond with C158. During the folding of STK11, this interaction may result in divergence of the tertiary structure with slight or no kinase activity 37 . We executed secondary structure analysis for a definitive understanding of the disruption in the native and mutant secondary structures over time. As shown in Fig. 8 and S3, a significant difference is observed in alpha helix, beta sheet, beta turn, and coil regions for both mutants when compared to the native structure. We analyzed the position level variations in secondary structure elements of native and mutants through use of the STRIDE program. The most significant difference is observed in mutant W239R, from the amino acid residue positions of L195-V197 (from β8-β9 loop), P217-F219 (from β9΄-αEF loop), P221-I224 (from αEF-helix), G276-C278 (from αG-αH loop), and S307-F309 (from αI-helix), as shown in Fig. 8c and www.nature.com/scientificreports www.nature.com/scientificreports/ Fig. 10a. Furthermore, T209-C210 (β9΄strand) is totally disrupted in the mutant W239R structure. In the native structure, the αEF-helix is comprised by P222-N226 residues and the αI-helix is comprised by I300-R310 residues, whereas P221-I224 (from αEF-helix) and S307-F309 (from αI-helix) are converted to 3 10 -helix in mutant W239R structure, as shown in Fig. 8d. The conformational changes support our previous results obtained from RMSF analysis that major changes occurred at residues 204-228 in the W239R mutant (Fig. 6b). The co-activator protein, MO25α express strong interactions with E165, S169, Q170, G171 (from αE-helix), L201-A206 (β9-β9′ loop), R301, S307, and R310-K312 (from αI-helix) of STK11 33,46 . Furthermore, H174-D176 motif, β7, and β8 sheets are essential for forming the catalytic region. The activation loop segment is formed by D194-G196 motif, β9, β9′, and β9′′ and ends at P221-E223 motif, in which H174 from H174-D176 motif and D199 from D194-G196 motif are coordinated to Mg 2+ along with PO 4 − ions of ATP 46 . Consequently, the W239R mutant changed the conformation of the C-lobe in the kinase domain (Fig. 10a) and could disrupt the STK11-MO25α interaction, which might have an impact on complex assembly as well as could suppress the STK11 activation 33 .
In the W308C mutant structure, the most prominent alteration was observed from the amino acid residue positions of P294-A295 (from αH-αI loop) and V338-Y340 (from the C-terminal tail) acquiring a 3 10 -helix conformation (Figs 8d and 10b). Interestingly, these 3 10 -helix conformations are only observed in the W308C mutant structure, but are not present in the native type. Furthermore, C134-G135 (from β5-αD loop) is converted to a β-strand in the mutant structure. The proportion of αC-helix (94-104), αF-helix (234-250), and αI-helix (300-310) are slightly reduced in the W308C mutant structure when compared to those in the native structure (Fig. 10b). Binding of both STRADα and MO25α reorients the αC-helix of STK11 in such a way that a salt bridge is formed between K78 of the VAIK motif and E98 of the αC-helix 33 . The regulatory spine is comprised of four hydrophobic residues such as L113, L102, H174, and L195, which are anchored to D237 of the αF-helix 46  From PCA analysis, mutant W239R showed a significant difference in flexibility than native and W308C mutant. This deviation might be due to disruption of secondary structure elements, which in turn affect the protein folding, thereby decreasing the stability of protein. Therefore, we suggest that the W239R mutant might have a great impact on protein function. This hypothesis is in good concordance with the results obtained by Rungsung et al. 50 .

conclusion
This study reports two nsSNPs (W239R and W308C) that were found to be deleterious and have a mutational impact on structure and function of the STK11 protein. These mutations may lead to disruption of the original conformation of the native protein. The mutant W239R structure displays a significant difference in binding with the STRADα-MO25α complex, which could lead to disruption in activation and reduction in catalytic efficiency of the STK11 protein. Our molecular dynamics approach presented a change of deviation in important regions of the mutant structures when compared to the native protein structure. These deviations can interrupt the confirmation of the secondary structure, and thereby, the stability of protein may be disrupted. We also noticed that the ATP binding capability of the mutant proteins was less than that of the native protein. Although the W308C mutant has been previously associated with Peutz-Jeghers Syndrome (PJS) according to the literature, no one has predicted the W239R mutant to be linked with any diseases. Therefore, it is likely that predisposition of the unreported nsSNP can induce disease by altering protein activation or efficiency. The findings of this study will help in future genome association studies to distinguish deleterious SNPs associated with different individual patients with PJS. Hence, comprehensive clinical-trial-based studies are required on a large population to characterize this data on SNPs and also experimental mutational studies are required for the validation of the findings. www.nature.com/scientificreports www.nature.com/scientificreports/ Materials and Methods collection of Snps dataset. The SNP data for the human STK11 gene was collected from various webbased data sources such as: OMIM (Online Mendelian Inheritance in Man) 51 , SNPs information from NCBI dbSNP 52 , and the protein sequence was retrieved from UniProt database (UniProtKB ID Q15831) 53 .

Assessment of functional consequences of missense mutations. Functional consequences of
nsSNPs in the coding region were predicted by using SIFT, PolyPhen 2, I-Mutant 3.0, PROVEAN, P-Mut, SNAP2, PON-P, and Mutation Assessor algorithms. SIFT uses the homology sequence of the proteins and alignment of natural nsSNPs with orthologous and paralogous protein sequences to predict nsSNPs as deleterious. A SIFT score less than 0.05 indicates deleterious impact of nsSNPs on protein function 54 . Another algorithm, PolyPhen2, utilizes the protein sequence and substitution of amino acids in protein sequence to predict the structural and functional effect on the protein. If amino acid are changed or a mutation is found in protein sequence, it classifies SNPs as possibly damaging (probabilistic score >0.15), probably damaging (probabilistic score >0.85), and benign (remaining). Furthermore, PolyPhen2 calculates the position-specific independent count (PSIC) score for each variant in protein. The difference of PSIC score between variants indicates that the functional influence of mutants on protein function directly 55 . The Protein Variation Effect Analyzer (PROVEAN) algorithm was used to predict the damaging effect of nsSNPS in the STK11 protein sequences. This tool utilizes delta alignment scores on the basis of the variant version and reference of the protein sequence regarding the homologous sequences. An equal score or below the threshold of −2.5 indicates deleterious nsSNP alignment 56 . The change in stability caused by mutation was predicted by I-Mutant 3.0. It is a support vector machine (SVM) based tool server. I-Mutant 3.0 prediction is classified into three categories, such as neutral mutation (−0.5 ≤ DDG ≥ 0.5 kcal/mol), large decrease (<−0.5 kcal/mol), and large increase (>0.5 kcal/mol). I-Mutant predicts the Gibbs free energy change (ΔG) dependent on the difference of mutated protein and native type protein. PMUT allows the fast and accurate prediction (~80% success rate in humans) of the pathological character of single amino acid point mutations based on the use of neural networks. Inputing a FASTA sequence in the PMut server provided results which showed the difference among neutral variations and disease-related protein sequence. A prediction score more than 0.5 indicates nsSNPs having a pathological impact on protein function 57 . SNAP2 is a neural network based classifier. It was used to predict the functional impact of single amino acid substitutions in the STK11 protein. This server accepts a FASTA sequence and provides a prediction score (ranges from −100 strong neutral prediction to +100 strong effect prediction) that reflects the likelihood of a specific mutation to alter the native protein function 58 . Another algorithm, PON-P2, predicts the pathogenicity (harmfulness) of amino acid substitutions. It is a machine learning-based tool and utilizes amino acid properties, GO annotations, evolutionarily conserved sequence and functional annotations. PON-P2 distributes variants into 3 groups: pathogenic, neutral, or unknown classes 59 . The Mutation Assessor algorithm predicts the functional impact of amino acid substitutions. The functional impact is evaluated based on evolutionary conservation of the affected amino acid in protein homologs 60 . Identification of mutant nsSNPs position in different domains. The InterPro (http://www.ebi.ac.uk/ interpro/) tool was used for identification of different conserved domains in the STK11 protein and also mapping of nsSNPs positions in different domains 61 . Protein sequence in FASTA format or protein ID was inserted as a query to predict domains and motifs.
identification of functional Snps in conserved regions and phylogenetic analysis of STK11. Amino acid substitutions in the evolutionarily conserved region were predicted by the ConSurf server 32 . According to the Bayesian method, conservation scores were classified into 3 categories: 1-4 score is variable, 5-6 score indicates intermediate, and 7-9 score is conserved 62,63 . A protein structure or protein FASTA sequence of STK11 was input and conserved patterns were predicted in order to find a conservation score and colouring scheme. Structural and functional amino acids were also predicted. High-risk nsSNPs residing in the highly conserved region were selected for further analysis.
To execute phylogenetic analysis for human STK11 protein sequence (UniProtKB ID STK11_HUMAN) and eight different species protein sequences such as Macaca mulatta (H9ETP1_MACMU), Pan troglodytes (H2QET9_PANTR), Gallus gallus (STK11_CHICK), Rattus norvegicus (STK11_RAT), Mus musculus (STK11_ MOUSE), Xenopus tropicalis (F6PM85_XENTR), Bos taurus (E1BCU9_BOVIN), and Ovis aries (W5PNM2_ SHEEP) were retrieved from the UniProtKB and were subjected to evolutionary conservation. Then, multiple sequence alignment (MSA) was executed by the ClustalW tool and the phylogenetic tree was created using 1,000 bootstraps in the MEGA6 package 64,65 . Identification of high-risk nsSNPs in the UTR regions. The 5′ and 3′UTRs have significant roles in the post-transcriptional regulation of gene expression, message stability, translational efficiency, and subcellular localization 66 . UTRScan was used to predict high-risk nsSNPs in UTRs 67 . Its input format requires submission of a protein′s FASTA format. Output was in the form of signal name and its position in the transcript.
Modeling of native and nsSnps structure. The mutant SNPs can notably alter the stability of proteins.
Consequently, 3D structures of native and mutant proteins were constructed to investigate the structural stability and deviations difference within native and mutant proteins. We generated the 3D structure of native and mutant proteins using the SWISS-MODEL server 38 . Only the protein kinase domain of the proteins was modeled which is comprised of residues 47-342. Structural validation was carried out by the RAMPAGE server 68

Molecular Dynamics
Simulation. MD simulations were performed using the YASARA dynamics program 76 to reveal changes at the atomic level in different time scales for native and mutant STK11-ATP complexes. Before starting the simulation, the structures of native and mutant proteins were cleaned and also the H-bond network was optimized 77 . Then, a cubic cell was formed by extending 8 Å on each side of the protein and a periodic boundary condition was maintained. The AMBER14 force field was applied for simulations 78 . The system was implemented by adding water molecules and NaCl salt at 0.9% concentration to replicate the physiological ion concentration. For short-range Coulomb and van der Waals interaction, the cut-off radius was set to 8 Å. The long-range electrostatic interactions were measured by the PME (particle-mesh Ewald) method 79 . MD simulation of each system was run for 150 ns at 310 K with a time step interval of 2.5 fs. The trajectory files were evaluated to get RMSD, RMSF, Rg, SASA, H-bond, and secondary structure analysis.

Principal component analysis (PCA).
The PCA method projects multivariate energy factors into low-dimensional space, which highlights the variabilities present in the collected MD trajectory data although they do not specifically aim to identify them. 80,81 Any potential energy variabilities due to hidden data structures such as groups of samples, local fluctuations in data densities, and unique samples can be visualized by this technique. PCA decomposes the multivariate response arranged in an X matrix into a product of two new matrices as indicated in the following equation: where T k is the matrix of scores which represents how sample relate to each other, P k is the matrix of loadings which contains information about how variables relate to each other, k is the number of factors included in the model and E is the matrix of residuals, which contains the information not retained by the model. Mutant proteins may have different energy profile compared to the native protein which should be discernable by this data decomposition mechanism. All calculations were performed with MATLAB 2011a (The Mathworks, Natick, MA, USA) equipped with the PLS Toolbox v. 6.2.1 (Eigenvector Research Inc., Wenatchee, WA, USA). In this analysis, the last 50 ns of MD simulation trajectory data of different potential energy were used. Before applying PCA, autoscale function was used to preprocess the data. First 4 PCs explained >80% of the energy variation.

Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.