Pathogenic genetic variants from highly connected cancer susceptibility genes confer the loss of structural stability

Genetic polymorphisms in DNA damage repair and tumor suppressor genes have been associated with increasing the risk of several types of cancer. Analyses of putative functional single nucleotide polymorphisms (SNP) in such genes can greatly improve human health by guiding choice of therapeutics. In this study, we selected nine genes responsible for various cancer types for gene enrichment analysis and found that BRCA1, ATM, and TP53 were more enriched in connectivity. Therefore, we used different computational algorithms to classify the nonsynonymous SNPs which are deleterious to the structure and/or function of these three proteins. The present study showed that the major pathogenic variants (V1687G and V1736G of BRCA1, I2865T and V2906A of ATM, V216G and L194H of TP53) might have a greater impact on the destabilization of the proteins. To stabilize the high-risk SNPs, we performed mutation site-specific molecular docking analysis and validated using molecular dynamics (MD) simulation and molecular mechanics/Poisson Boltzmann surface area (MM/PBSA) studies. Additionally, SNPs of untranslated regions of these genes affecting miRNA binding were characterized. Hence, this study will assist in developing precision medicines for cancer types related to these polymorphisms.

SNP datasets retrieved from dbSNP database. We retrieved the SNPs of BRCA1, ATM and TP53 genes from dbSNP database. We found a total number of 25,754 SNPs of BRCA1, 41,948 SNPs of ATM and 6977 SNPs of TP53 in the dbSNP database. The percentage of different types of SNPs of each gene are shown in Fig. 2. We selected only the nsSNPs of the three genes for our investigation. Finally, including the multiple allele changes, we have analyzed a total number of 3467 nsSNPs of BRCA1, 4650 nsSNPs of ATM and 1106 nsSNPs of TP53 for our investigation.
Deleterious and damaging nsSNPs in BRCA1, ATM and TP53. We subjected all the nsSNPs of the three genes to SIFT, PolyPhen-2 and PROVEAN tools to investigate the effect of amino acid substitution on the respective protein function. We shortlisted those nsSNPs as highly deleterious that were predicted damaging/ probably damaging or deleterious by at least two of these tools. Total 1013 nsSNPs of BRCA1, 1815 nsSNPs of ATM and 528 nsSNPs of met the criteria and we classified them as highly deleterious nsSNPs. Prediction results by SIFT, PolyPhen2 and PROVEAN are shown in Supplementary File 1-3.
Disease-associated nsSNPs in BRCA1, ATM, and TP53. The above shortlisted nsSNPs were submitted to PhD-SNP, PANTHER, SNPs&GO, SNAP2, and PredictSNP tools to identify the disease-associated nsS-NPs. We shortlisted the nsSNPs those were found deleterious by all the five tools. A total number of 250 nsSNPs of BRCA1, 796 nsSNPs of ATM, and 341 nsSNPs of TP53 were predicted deleterious by all the five tools and were considered for further investigation (Supplementary File 4-6).
Identified nsSNPs in conserved domains. NCBI's conserved domain search tool revealed that each of the BRCA1, ATM, and TP53 proteins were found to have four domains shown in Fig. 3. Results showed that the BRCA1 protein had a serine-rich domain associated with BRCT, the first BRCT domain, the second (C-terminal) BRCT domain and a RING finger domain. Originally BRCT domain was identified in the tumor suppressor protein and missense mutations in this region correspond to a high risk for breast and ovarian cancers 9,29 . The ATM protein was found to have a catalytic domain, a FAT domain, a TAN domain, and a FATC domain. The  30 . The FAT domain is located adjacent and upstream of the kinase domain and the name of this domain is derived from FRAP (mTOR), ATM, and TRAPP, all of which members of the phosphoinositide 3-kinase-like family 31 . ATM protein also contains a Tel1/ATM N-Terminal Motif (TAN) that is essential for telomere length maintenance and DNA damage response 32 . Search results of p53 protein showed that it had a p53 DNA-binding domain (DBD), a p53 tetramerization motif, the transactivation domain 2 (TAD2), and a p53 transactivation motif. The DNA-binding domain is absent in p63, p73 and other p53 homologues in primitive organisms thus making it a unique feature for the vertebrate p53 33 . A flexible linker region connects the structured DNA-binding and tetramerization domains of p53 34 . There are 2 distinct TADs found in p53 and it was found that TAD2 also play important part in tumor suppression 35 . As domains play significant role in proteins, we shortlisted the disease-associated nsSNPs that fall on the domain sequences of the three proteins. From CD-search results, we found that, 171 nsSNPs of BRCA1, 313 nsSNPs of ATM and all the shortlisted 340 nsSNPs of TP53 occur on the predicted domains and were considered for further analysis (Supplementary File 7-9).
Conservation profile of deleterious nsSNPs in BRCA1, ATM and TP53. We calculated the evolutionary conservation of amino acid residues of the three proteins using ConSurf server to further explore the possible effects of shortlisted nsSNPs. Results were collected in the form of structural representation of the protein sequence. Based on the location either on protein surface or inside its core, the highly conserved residues are predicted as either functional or structural. Amino acids involved in various vital biological processes appear to be more conserved than others. Considering this, the nsSNPs located at these conserved regions are highly damaging to proteins as compared to those located at non-conserved sites 36,37 . Hence, we focused only on the residues of domains matching their positions with the shortlisted high risk nsSNPs. The results predicted that out of the shortlisted nsSNPs, 89 nsSNPs of BRCA1, 218 nsSNPs of ATM and 282 nsSNPs of TP53 were highly conserved (either exposed or buried). Supplementary File 10-12 contains the graphical representation of Con-Surf results of the three proteins.
Predicted stability modification. We analyzed the stability alterations in the three proteins using  Tables S1-S4) showing the effect of SNPs in the 3′ and 5′ region of ATM and TP53 genes.
Higher conservation score indicates greater effect of the SNPs. In addition, higher context + score denotes higher likelihood of disruption or creation that occurs in the miRNA target site. The miRNAs greatly affected by the SNPs of ATM and TP53 genes based on highest conservation score are shown in Table 3.   www.nature.com/scientificreports/ the catalytic domain and the DNA-binding domain of the two proteins respectively. The crystal structure of BRCA1 BRCT (PDB ID: 4JLU) and p53 DNA binding domain (PDB ID: 2PCX) were available in the RCSB PDB database. These two structures were retrieved and cleaned by removing the ligands and inhibitors (Fig. 4). The "Mutagenesis" tool of Pymol was utilized to carry out mutation with the selected five nsSNPs of each protein. The 3D structure of the catalytic domain of ATM was not available in PDB database. So, the structure was modeled through MODELLER 9.22 using the 3D structure of closed dimer of human ATM solved by electron microscopy at 5.70 Å resolution 41 (PDB ID: 5NP0) and was later refined using GalaxyRefine server. The Ramachandran plot analysis, ERRAT server and ProSA-Web analysis results of the refined structure are shown in Supplementary File 16 ( Supplementary Fig. S1). Later the structural analysis was extended by calculating the RMSD values for each mutant model using Maestro 11.8 tool of the Schrodinger suite. The average distance among all atoms, α-carbon atoms and backbones of WT and mutant models were measured from RMSD values. Greater RMSD value indicates greater deviation of mutant structures from that of the WT proteins. To further validate the mutation inducing method, we carried out homology modeling of all the 15 mutants from 3 proteins though MODELLER 9.22 using the same templates as stated earlier and upon performing refinement using GalaxyRefine server, we calculated their RMSD values. Both the Pymol generated mutant models and homology modeled mutant structures for V1687G and V1736G of BRCT, I2865T and V2906A of Catalytic domain, V216G and L194H of p53 exhibited the maximum RMSD values shown in Table 4. From the Stride server results, it was found that V1687G and V216G mutations were in β-strand region, V1736G, V2906A and L194H mutations were in coil region and I2865T mutation was in α-helix region of the respected proteins ( Supplementary Fig. S2). Also, majority of the mutations caused significant increase in solvent accessible area (Supplementary Table S5). Analysis from ProtParam server revealed that all the mutants had lower instability index and aliphatic index than the WT proteins (Supplementary Table S6).

Structural insights into
Stabilization of high-risk mutants by PK083. PK083 belongs to a class of organic compounds known as carbazoles and they contain a three-ring system containing a pyrrole ring fused on either side to a benzene ring. Several carbazole derivatives such as PK083, PK9284, PK9295, PK9318, PK9320 etc. were tested to stabilize Y220S and Y220N p53 mutants and their binding constant (Kd) were also determined 42 . Among them, PK083 (PhiKan083), was found to bind to the cavity formed due to mutation with a dissociation constant of ≈ 150 μM50 42 . It was also found to raise the melting temperature of the mutant and to slow down its rate of thermal denaturation. The crystal structure of the protein-PK083 complex at 1.5-Å resolution (PDB ID: 2VUK) was solved and available in the PDB database. From molecular interaction analysis we found that PK083 forms a    Table 4. Predicted RMSD (All atoms, Cα atoms and Backbone) values of WT and mutants of BRCA1, ATM and TP53 calculated on both mutated proteins and modeled mutants.

Amino acid substitutions
Proteins mutated using Pymol Proteins modeled using MODELLER www.nature.com/scientificreports/ pi-sulfur bond with the mutant Cys220 residue. As BRCA1, ATM and TP53 were found to be more enriched in connectivity, we hypothesized that, PK083 might stabilize the most damaging mutations of these three proteins. Previously, in-silico investigation was performed to identify the stabilization capability of PK083 to other mutants of Y220 43 . From the study, it was found that Y220S showed a similar interaction pattern with PK083.

RMSD (all atoms) RMSD (Cα atoms) RMSD (backbone) RMSD (all atoms) RMSD (Cα atoms) RMSD (backbone)
The study also showed that structural optimization of PK083 might possibly lead to a novel drug that can interact favorably with another mutant, Y220N 43 . The Y220H and other mutants of p53 including the WT protein were not found to favorably interact with PK083. Molecular docking of PK083 with six mutants as well as WT target sites exhibited nearly same score in binding affinity. The binding affinity of this molecule with the mutants ranges from − 21.1 to − 99.5 kcal/mol shown in Table 5. PK083 binds at the binding pockets of mutation positions as the defined binding sites of three proteins. Highest number of interactions was observed in L194H-PK083 complex where PK083 was bonded to the mutant residue with a Pi-alkyl bond. Interaction of PK08 with the mutants were found to be largely hydrophobic (Fig. 5). It was also observed that Val1713 was common participant residue in the BRCT mutant-PK083 interactions while Ala159 and Ile195 were found to be common interacting residues in the DBD mutant-PK083 complexes. Pi-Sigma, Pi-Alkyl, Pi-Lone Pair and conventional hydrogen bonding interactions were observed in the binding of PK083 with the mutant residues. The common residues of interactions in WT and mutants with PK083 are shown in Supplementary Table S7.
Molecular dynamics (MD) simulation. As physiological conditions are not considered in evaluating the damaging nature of the mutants using computational tools, we performed MD simulation of both the WT protein domains and the six mutants to view the various conformations that they might acquire in the solvated state. Their dynamic behavior was analyzed by RMSD, RMSF, Rg, and SASA analysis. The analyses from 1st independent MD run are presented in Fig. 6 while analyses from 2nd are 3rd MD run is shown in Supplementary Figs. S3 and S4 respectively. Table 7 contains the average values obtained from trajectory analyses of 3 independent production runs. The number of solvent molecules, Na, and Cl ions added to each system are shown in Supplementary File 16 (Supplementary Table S8).
Configuration changes of all the WT and mutant proteins were analyzed in terms of RMSD during the simulation period. Figure 6(A1,A2) depicts that RMSD values from the mutant structures are quite unstable comparing with the WT-BRCT and WT-DBD. The WT BRCT shows steady fluctuation throughout the 100 ns in the WT structure. V1687G and V1736G structures showed similar way of deviation till 45 ns from their starting structure, after that, fluctuated up to ~ 0.3 nm for V1687G. RMSD values of V216G and L194H were higher than the WT protein's RMSD at major points demonstrating that the mutations have considerable destabilizing effects on DBD. Interestingly, RMSD values of I2865T were lower than the WT-catalytic domain whereas significant fluctuation was observed in the RMSD values of V2906A with several spikes of ~ 1 nm from 25 to 32 ns period. We have monitored the RMSF to calculate the average fluctuation of amino acid residue in order to determine the mutation's effect on the protein residues dynamic behavior. From Fig. 6(B1-B3), it can be inferred that residue level fluctuations for V1736G, V2906A and V216G were quite high, up to ~ 0.5 nm, ~ 0.6 nm and ~ 1 nm respectively when compared with native proteins and other mutations. Analysis of the fluctuations also revealed that the greatest degree of flexibility was shown by the V2906A mutant. We have also analyzed the radius of gyration (R g ) for the WT proteins along with its associated mutations contributing to their compactness shown in Fig. 6C1-C5. From Table 6, it can be concluded that V1687G and V216G had approximately similar compactness as of their respective WT proteins. Rather than these, all the mutations possessed higher Rg values than their WT proteins suggesting their structural destabilizing effects caused by the mutations, ultimately leading to the loss of protein compactness. Finally, the solvent-accessible surface areas (SASAs) were analyzed to understand the changes in the protein volume upon mutation. V1687G, I2865T and V2906A mutants showed increased SASA values compared to their WT proteins. The decreased SASA value in the remaining mutants denotes their relatively shrunken nature as compared to the WT structures. The change of SASA value of WT and mutant   Stability of the docked protein-ligand complexes. The stability of the docked complexes during simulation period were assessed analyzing RMSD of the protein backbone and ligand structure as well as the hydrogen bonds analysis formed by PK083 with the WT and mutants (Fig. 7). We performed 3 independent production runs of the protein-ligand complexes, the 1st production run included both the WT and mutant-PK083 complexes (Fig. 7) while the 2nd and 3rd production run included the mutant-PK083 complexes (Supplementary Figs. S5, S6). It was observed that, all the WT and mutant proteins exhibited similar backbone RMSD except the L194H mutant, the WT (L194) protein were more stable compared to the mutant. In case of the six mutants, it was found that rather than the V2906A complex, all the remaining protein-ligand complexes were stable. For the five stable complexes, the ligand RMSD was less than 0.1 nm indicating the initial ligand-backbone contacts remained intact during the simulation period. In case of V2906A, the PK083 showed multiple binding orientations and it re-equilibrated several times during the 100 ns simulation (Fig. 7B3). It was also observed that the www.nature.com/scientificreports/ protein backbone RMSD of I2865T and have significantly decreased upon binding of PK083 than the apo form shown in Fig. 7B1. Further, we calculated the number of hydrogen bonds formed during the simulations period for the WT and mutant complexes, presented in Fig. 7A2,A4,B2,B4,C2,C4 as hydrogen bonding is one of the principal components responsible for the molecular interactions in biological systems. In the V1736G-PK083 complex, highest number of conformations formed up to three hydrogen bonds during the simulation. A very few conformations showed less than two hydrogen bonds. Except the V1687G-PK083 complex, the conformations of the rest of the complexes formed up to two hydrogen bonds throughout the simulation of all the 3 production runs. Moreover, the snapshot of conformers from the 1st MD run showed that PK083 remained in the binding sites of mutants throughout the entire simulation process (Fig. 8). We have also analyzed the interactions of PK083 with the mutants at the final timeframe (100 ns) and found that 2 interactions of V1687G, 3 interactions of L194H, 4 interactions of V1736G, I2865T and V216G with PK083 were stable at the end of the simulations. Although there were no interactions proposed in docking remain stable at the end of the simulation in case of V2906A mutant, it formed 6 interactions with PK083 near the mutant residue. The simulation trajectories of all the complexes were further exploited to study the interaction between the mutants and PK083.  www.nature.com/scientificreports/ Post molecular dynamics binding free energy calculation. We calculated the binding free energy of the last 20 ns of 1st MD production run of the mutant-PK083 complexes with an interval of 50 ps (picoseconds) from MD trajectories using MM/PBSA method. We also utilized the MmPbSaStat.py script included in g_mmpbsa package calculating the average free binding energy and its standard deviation/error from the simulation output files ( Table 7). The interaction between a ligand and protein is shown in the form of binding energy where lesser the binding energy, the better is the binding of the ligand and protein. The cumulative sum of van der Wall, electrostatic, polar solvation, and SASA energy is the final binding energy. PK083 showed the least binding free energy (− 113.211 kJ/mol) with the V216G variant of p53 among the mutants. The carbazole derivative showed almost similar binding free energy with the two variants of BRCT. By plotting the binding energy versus time graphs, a comparison of the binding free energies of all the six complexes were made, shown in Fig. 9. These results verify that PK083 might possess stabilizing effect on majority of the deleterious mutations of BRCA1, ATM and TP53 effectively. Further, we identified the contribution of each residue of the six mutants in terms of binding free energy to the interaction with PK083. By decomposing the total binding free energy of the system into per residue contribution energy, the contribution of each residue was calculated, shown in Fig. 9. This gives us an insight into the 'hotspot' residues that contributes favorably to the binding of this molecule to the mutants. It was found that except the V2906A variant of ATM, more than five residues of the remaining mutants contributed higher than − 1 kJ/mol binding energy. These identified key residues from our analysis will facilitate the study of mutation sites stabilization of three significant domains of these proteins.

Discussion
In human genome, non-synonymous single nucleotide polymorphisms account for about 50% of allele variation of all hereditary diseases 44 . It can help in improving medication strategies by facilitating more tailored personalized treatment to patients 45 . Also, new compounds can be tested to correct the effects of those mutations studying the effects generated by nsSNPs in disease-associated proteins. Identification of such nsSNPs responsible for specific phenotypes using molecular approaches is time-consuming and expensive 46 . Bioinformatics predicting approaches can help in narrowing down the number of high-risk pathogenic nsSNPs to be screened in genetic association studies, and in a better understanding of the function and structure of protein products.
In this study, we performed an intensive in silico evaluation to identify pathogenic nsSNPs of BRCA1, ATM and TP53 genes using a wide variety of computational tools. We selected these genes based on gene enrichment analysis from a list of nine genes. Only two studies have been carried out to evaluate the nsSNPs of human BRCA1 and ATM gene previously. Previously, a mutation of proline to serin at position 1812 as a main target mutation in the BRCA1 gene was reported analyzing only 65 nsSNPs of this gene in 2007 47 . Also, upon analyzing  48 . We expanded our study to include all the nsSNPs currently available in dbSNP database and hypothesized that a more reliable and precise estimation of a substitution consequence could be provided by using a variety of computational methods based on different algorithms to filter the pathogenic and neutral variants.
In this study, we retrieved all the available nsSNPs of the corresponding three genes and annotated them using nine computational tools to distinguish between the functional and neutral variants. Assessing the pathogenicity of functional nsSNPs, we filtered those that occur in the conserved domains of respective proteins. Further, evolutionary conservation analysis revealed that majority of the pathogenic nsSNPs occupy conserved amino acid positions whether they decrease or increase protein stability. The nsSNPs greatly decreasing the stability of proteins were finally selected as the high-risk ones. Combining these results, we found that V1687G and V1736G variants of BRCA1, I2865T and V2906A variants of ATM, V216G and L194H variants of TP53 were highly www.nature.com/scientificreports/ damaging mutations that greatly decrease protein stability and might alter their respective protein functions ( Table 2). All these six variants were found to occur in BRCT domain, catalytic domain and DNA-binding domain of BRCA1, ATM and p53 respectively (Fig. 3). The missense mutations in these domains might cause severe consequences disrupting their ability of functioning. The harmful polymorphic mutations are mainly found to be located in helixes and coil regions of a protein structure 49 . Our secondary structure analysis also revealed that three of these six mutations occur in coil regions, two in β-strand region and one in α-helix region which is in accordance with the previous recognition. Therefore, these mutations might result in significant distortion of the backbone over a turn leading to the likelihood of impaired molecular assembly. Hence, these novel findings encouraged us to study how the dynamic properties differ from WT and mutant amino acids using MD simulation analysis. The RMSD and RMSF analysis in agreement to each other revealed that all the variants except the I2865T, decreased stability and increased the flexibility of protein (Fig. 6A1-B3). The radius of gyration revealed that the WT proteins showed higher level of compactness throughout the time, whereas all the variants showed differential level of compactness (Fig. 6C1-C3). The SASA analysis showed both increased and reduced volumes gained by the mutants and thereby might be responsible for change in function of protein (Fig. 6D1-D3). Based on the simulation study, we demonstrated that the variants V1687G, V1736G, V2906A, V216G and L194H imparted changes in the native conformation or structure of the BRCA1, ATM and TP53 proteins in any sense of behavior and hence speculated to affect the respective protein function and structure in damaging manner.
Further, we carried out our investigation to stabilize the high-risk mutations using a small molecule inhibitor PK083. Mutation site specific molecular docking analysis revealed that PK083 had a strong binding affinity towards all the six mutation sites of three proteins ( Table 5, Fig. 5). More than five interactions were observed with PK083 in the mutation sites of all the mutant proteins. Highest docking energy of − 99.5 kcal/mol was observed with the WT-V2906 protein forming a carbon hydrogen bond with PK083. In case of the mutants, PK083 showed highest docking energy of − 72.3 kcal/mol forming a hydrogen bond with mutant residue. For the validation of docking process, we performed MD simulation (3 times) of six mutant-PK083 complexes over 100 ns and findings demonstrated that PK083 showed RMSD of less than 0.1 nm and no disassociation of bound PK083 is observed throughout the 100 ns simulations (Fig. 7A1,A3,B1,B3,C1,C3). Moreover, 2 interactions of V1687G, 6 interactions of V1736G, 3 interactions of I2865T, 4 interactions of both V216G and L194H with PK083 remained stable during the 100 ns MD simulation. Stability of PK083 during MD simulation were also supported by several H-bonds estimations (Fig. 7A2,A4,B2,B4,C2,C4). The results of MM/PBSA indicated that PK083 binds to all the six mutants efficiently as they exhibit good binding free energies (Table 7). We also identified several key residues essential for binding of PK083 to the mutation sites that will provide valuable insight in drug development against these deleterious mutations (Fig. 9).
Our further analysis from PolymiRTS database identified both the target sites disrupted and created by SNPs and INDELs in miRNA seeds. Four miRNAs, hsa-miR-6796-3p, hsa-miR-1233-5p, hsa-miR-525-3p and hsa-miR-548i were affected by the SNPs rs3745198, rs71309450, rs190453265, rs201549145 respectively having high conservation score (Table 3). As a result, the areas affected by those SNPs might have evolutionary important function.
Our study reports that six nsSNPs (V1687G and V1736G of BRCA1, I2865T and V2906A of ATM, V216G and L194H of TP53) in the BRCT, Catalytic and DNA-binding domains are highly damaging to the structure and function of three proteins. We also found that these mutants are drug targets for PK083 as revealed by molecular interaction results. Several miRNAs were affected by SNPs in the 3′ and 5′ untranslated regions of ATM and TP53 genes and hsa-miR-6796-3p, hsa-miR-1233-5p, hsa-miR-525-3p and hsa-miR-548i due to their higher likelihood of causing disruption or creation in the miRNA target site. Therefore, our findings could provide a cornerstone to the study of potential therapeutic inventions upon clinical-trial and experimental mutational studies.

Methods
A step-wise protocol was followed to identify the pathogenic nsSNPs of the selected cancer susceptibility genes. The work flow is depicted in Fig. 10.
Gene enrichment analysis. Gene symbols of nine selected genes were uploaded to Enrichr 50 web-server that evaluates the biological properties of genes based on enrichment analysis. The z-score method was used for computation of enrichment and a combined scoring method was used to compute a combined P value from Fisher's exact test 50 .

Retrieval of nsSNPs.
Information of nsSNPs (SNP rs IDs, position, and residue changes) of the selected genes was retrieved from the NCBI dbSNP database 51 . The "Missense" filter was used in the function class and all the nsSNPs were retrieved for analysis. Evolutionary conservation analysis. The evolutionary conservation of amino acid substitution was analyzed using ConSurf web server 62 . This server uses an empirical Bayesian inference to automatically analyze evolutionary conservation of amino acid substitutions in protein. The corresponding ConSurf conservation score ranges from 1 to 9, where 1 designates rapidly evolving (variable) regions, 5 designates mildly evolving regions, and 9 indicates conserved regions 63 . The exposed residues having high scores are thought to be functional residues, whereas the buried residues having high scores are considered structural.
Prediction of changes in protein stability. I-Mutant 2.0 was used to analyze the protein stability changes upon nsSNPs. The tool is based on support vector machine (SVM) that provides a free energy change value (ΔΔG) of protein after and before mutation as output and uses data derived from ProTherm database which is one of the most comprehensive of experimental data on protein mutations 64,65 . ΔΔG value of less than Functional effect analysis of SNPs in 3′ and 5′ untranslated regions (UTR). PolymiRTS v3.0 database was used to characterize the SNPs in 3′ and 5′ regions and to analyze the functional impact of genetic polymorphisms in miRNA seed regions and miRNA target sites. This is an integrated platform for analyzing SNPs that affect miRNA. We entered the gene symbols of the selected genes and acquired a list of the miRNAs affected by these mutations that might lead to a decrease/increase of the expression of genes.
Homology modeling and structural analysis of variants. To evaluate whether the high risk nsSNPs alter the WT structure of protein domains, we analyzed the three-dimensional (3D) structures of both WT and mutant domains. The PDB database (https:// www. rcsb. org/) was used to retrieve the available protein structures whereas MODELLER 9.22 66 was used to generate the 3D structures that were not available in the database. The DOPE and GA341 objective functions were used to choose the best structure from MODELLER, where higher GA341 and/or lower DOPE indicates higher quality of a generated model. The best modelled structures were refined through the GalaxyRefine 67 server. The resultant structures were verified by PROCHECK 68 and ERRAT 69 tool from SAVES 6.0 server and ProSA-web 70  (1-(9-ethylcarbazol-3-yl)-N-methylmethanamine) was used as ligand. Carbazole based small-molecules were tested to act as stabilizers in restoring the function of several mutants of p53 DBD 42,43 . So, we tested the stabilizing capability of PK083 to the highly damaging mutants of our study. All the mutant proteins were subjected to energy minimization using SwissPdb viewer 76 and the ligand structure was optimized with MMFF94 force field using steepest descent algorithm prior to docking. The binding site was defined covering the mutant residue to assess the binding affinity of PK083 to each mutation region. The docking processes were composed of maximum iteration of 1500, maximum population size of 50 and Grid solution of 0.3. Further, we carried out post docking processes by hydrogen bonds optimization and energy minimization, simplex evolution at max steps 300 and neighbor distance technical setting fast at 1.00. The energy of the receptor-ligand complexes was minimized using Nelder Mead Simplex Minimization. Later on, the interactions of mutants with ligand were visualized in Discovery Studio 4.1. The GROMOS96 54a7 78 forcefield was selected as the force field for proteins and the ligand topologies were generated from the Automated Topology Builder version 3.0 79 (ATB) server. Due to the enhanced capacity of the backbone NH and CO groups to form hydrogen bonds with each other in the GROMOS96 54A7 parameter set, this force field reproduces the folding equilibria slightly better and can sample more 314-helical or hairpin conformations than the previous 53A6 or 45A3 force fields 80 . Also, based on fitting to a large set of high-resolution crystal structures, the torsional angle terms were reparametrized in this parameter set 81 . The proteins and mutants-ligand complexes were solvated using simple point charge (SPC) water molecules in a rectangular box where every structure was placed in the center at least 1.0 nm from the box edges. Required number of Na+ and Cl− ions were added to make the simulation system electrically neutral. The salt concentrations were set to 0.15 mol/L in all the systems. The solvated systems were subjected to energy minimization for 5000 steps using the steepest descent method. Afterwards, three steps were conducted in the MD simulation: NVT (constant number of particles, volume, and temperature) series, NPT (constant number of particles, pressure, and temperature) series, and the production run. The NVT and the NPT series were conducted at a 300 K temperature and 1 atm pressure for the duration of 100 ps. V-rescale and Parrinello-Rahman were selected as the thermostat and barostat respectively of the performed simulation. Finally, 3 independent production runs of nine proteins (WT and mutants) and six protein-ligand complexes were performed at 300 K for a duration of 100 ns (nanoseconds) in a supercomputing system provided by the Bioinformatics Division of National Institute of Biotechnology (NIB), Bangladesh. Thereafter, a comparative analysis was performed between WT and mutants measuring root mean square deviation (RMSD), root mean square fluctuation (RMSF), radius of gyration (Rg), solvent accessible surface area (SASA) and hydrogen bonds. Qtgrace program was used to represent all these analyses in the form of plots 82 . Further, the g_mmpbsa 83 package of GROMACS was used to calculate the MM/PBSA (Molecular Mechanics/Poisson Boltzmann Surface Area) binding free energies followed by final MD run to get a more detailed overview of the biomolecular interactions between the mutated proteins and ligand. The tool was tested on 37 structurally divergent HIV-1 protease inhibitor complexes by performing comparison of the calculated relative binding energy with the experimental binding free energies 83 . Also, the results obtained using g_mmpbsa package were comparable to results obtained with the AMBER package in general within differences of 1-2 kcal/mol. Furthermore, the package can be used to approximate the energy contribution per residue to the binding energy and it has been used to identify the crucial residues for binding a range of inhibitors with www.nature.com/scientificreports/ HIV-1 protease 83 . The free solvation energy (polar and nonpolar solvation energies) and potential energy (electrostatic and Van der Waals interactions) of each protein-ligand complex were analyzed to determine the total ΔG bind . The binding energies were calculated using the following equation in this method:

Molecular dynamics (MD
Here, the ΔG binding = the total binding energy of the protein-ligand complex, G protein = the binding energy of free protein, and G ligand = the binding energy of unbounded ligand.

Conclusion
BRCA1, ATM and TP53 protein plays an important role as tumor suppressor in several cancer types. The structural conformation of the functional domains is very crucial for exerting their functional role. This in silico study of the functional SNPs of BRCA1, ATM and TP53 provides significant insight into the damaging effects that the nsSNPs might cause to these proteins. Our study reports that six nsSNPs (V1687G and V1736G of BRCA1, I2865T and V2906A of ATM, V216G and L194H of TP53) in the BRCT, Catalytic and DNA-binding domains are highly damaging to the structure and function of three proteins. We also found that these mutants are drug targets for PhiKan083 as molecular interaction results were evaluated by MD simulation and MMPBSA study. miRNAs were affected by SNPs in the 3′ and 5′ untranslated regions of ATM and TP53 genes and four are noteworthy, hsa-miR-6796-3p, hsa-miR-1233-5p, hsa-miR-525-3p and hsa-miR-548i due their high conservation score. This is the first in silico analysis that combinedly analyzes the impacts of nsSNPs on the structure and function of three cancer susceptibility proteins and predicts stabilizing possibility. Our findings will guide in the study of potential therapeutic inventions upon clinical-trial and experimental mutational studies.