In Silico identification of SNP diversity in cultivated and wild tomato species: insight from molecular simulations

Single Nucleotide Polymorphisms (SNPs), an important source of genetic variations, are often used in crop improvement programme. The present study represented comprehensive In silico analysis of nucleotide polymorphisms in wild (Solanum habrochaites) and cultivated (Solanum lycopersicum) species of tomato to explore the consequence of substitutions both at sequence and structure level. A total of 8978 SNPs having Ts/Tv (Transition/Transversion) ratio 1.75 were identified from the Expressed Sequence Tag (EST) and Next Generation Sequence (NGS) data of both the species available in public databases. Out of these, 1838 SNPs were non-synonymous and distributed in 988 protein coding genes. Among these, 23 genes containing 96 SNPs were involved in traits markedly different between the two species. Furthermore, there were 28 deleterious SNPs distributed in 27 genes and a few of these genes were involved in plant pathogen interaction and plant hormone pathways. Molecular docking and simulations of several selected proteins showed the effect of SNPs in terms of compactness, conformation and interaction ability. Observed SNPs exhibited various types of motif binding effects due to nucleotide changes. SNPs that provide the evidence of differential motif binding and interaction behaviour could be effectively used for the crop improvement program.

In tomato, SNPs were identified by various techniques such as Expressed Sequence Tag (EST) 10 , oligonucleotide arrays, intron and amplicon sequencing of Conserved Orthologous Set (COS) of genes and genotyping arrays 11 . The accomplishment of tomato genome sequence 12 represents the most important asset in genetic research for deciphering Next Generation Sequencing (NGS) based SNPs. An enhancement in the ESTs dataset (from 8261 to 26019 of Solanum habrochaites species), available at http://www.ncbi.nlm.nih.gov/ and high throughput sequence dataset under Sequence Read Archive (SRA) have provided an oppertunity for the polymorphism detection. Generally, non-synonymous SNPs or substitutions have the ability to alter the amino acid sequence of a protein in contrast with synonymous SNPs which do not alter amino acid sequence. These non-synonymous substitutions could result in a biological change in any individual. Therefore, considerable attention has been given to non-synonymous SNPs (alter amino acid) because sequence change leads to various functional impact in terms of protein stability, which hampers the interaction with other proteins 13 .
Differential motif binding has a direct correlation with the gene expression, suggesting the functional consequence of binding variation 14,15 . In human genetics, various studies have explored the allele-specific motif binding and their role in human diseases 16,17 . Hence it is also important to examine the allele-specific differential binding effect of non-synonymous SNPs i.e motif binds to a specific allele of polymorphism but not to others. This information could be helpful in the study of the gene expression and chromatin modifications. The present study has been carried out to get insight about the differential motif binding due to SNPs among cultivated and wild tomato species.
Protein coding genes have a tendency to develop either negative or positive evolutionary routes in nature 18 . Previous study 19 compared transcriptomic level selection pattern between one cultivated and six wild species, and characterized the footprints of positive selection. However, negative selection causes deleterious impacts on the host fitness and evolves at a slower rate in contrast to positive selection. Thus, we have given more emphasis on prediction of both positive and negative evolutionary selection pressure of genes to provide the detail picture of evolutionary selection pressure in studied species.
In this study, we focused on analysis of functional consequences of variations, by the consolidated utilization of ESTs (26019 and 298306 ESTs of Solanum habrochaites and Solanum lycopersicum) and high-throughput dataset under SRA (SRP041563) which enabled numerous SNPs mining among these two species. In addition, in-depth computational analysis of SNPs was performed at the sequence and structural level. Structural analysis enabled molecular dynamics simulations revealing many additional effects of SNPs in comparison to its native-type protein. The results may have the potential to improve our understanding of the consequence of SNPs in a novel way. Identified SNPs could be used as markers for traits of interest such as fruit ripening, color and texture under breeding program.

Results
ESTs and NGS-based SNP Discovery and chromosomal distribution. A total of 8978 numbers of SNPs were predicted by comparing both the species as described in Fig. 1. The SNPs density (inner circle highlighted with green color) along with gene density (outer circle highlighted with yellow color) in the non-overlapping window of 1000/kb was assessed for each chromosome and plotted by Circos 20 (Fig. 2). It clearly demonstrated a non-uniform pattern of SNPs distribution in all the chromosomes. A closer look of the data indicated that the density of both SNPs and genes were higher at one end of all the chromosomes except for chromosome 2 which showed SNPs hump from 34 Mb genomic regions (Fig. 2). Next, SNPs were classified on the basis of their nucleotide substitutions, either transition (purine-purine and pyrimidine-pyrimidine) or transversion (purine-pyrimidine and pyrimidine-purine). The rate of transition was 1.75 times higher than the transversion. The observed proportion of C/A+ T/G transversion was slightly lower (1601) than the C/G + A/T transversions (1662). Likewise, the transition to transversion ratio (Ts/Tv) was significantly (p < 0.001) higher at third codon position (2.01) than first (1.9) and second codon (1.89) positions. Observed transition to transversion ratio reflected the trend of genomic conservation at codon sites during evolution.
Functional classification of SNPs. All SNPs identified through the use of NGS and ESTs approach, were classified into various groups. Approximately 73% SNPs were found in coding region comprises both of synonymous (4739) and non-synonymous (1838) SNPs (Table 2), and distributed among 1840 and 988 protein coding genes, respectively. Using the SnpEff 21 tool, we classified the overall impact of all the SNPs into four categories: low impact (53%), modifier (26%), moderate (20%), and high impact (1%) (Fig. 3a) the direct effect on the gene functionality, i.e caused either stop codon gain or stop codon loss in their respective genes. Stop gain and stop loss may lead to high level of functional consequences due to protein truncation and degradation of the transcript. Interestingly, we observed stop gain (G/A SNP) and stop lost (C/G SNP) in calmodulin 5/6/7/8-like and Glucan endo-1 3-beta-glucosidase 1 protein respectively, that play a role in the network of fruit ripening processes (Fig. 3b). Low impact SNPs consisted of synonymous SNPs whose amino acid remains unaltered in genes, therefore had a low impact on gene functionality. Gene observed under the effect of this group was Auxin response factor 8 (A/G SNP) which regulates the plant hormone regulatory pathway (Fig. 3c). Moderate impact was observed in non-synonymous SNPs which acquire the change in the amino acid due to nucleotide substitutions. This moderate type of impact was found in the HMG1 (high mobility group 1) gene known to involved in DNA regulatory processes (Fig. 3d). Modifier SNPs changed the functionality of respective gene and consist of both UTR (carries the binding site for various miRNA, transcription factors) and intergenic SNPs (Fig. 3e). As a result, transketolase 1 (T/A SNP) protein and Diphosphate-fructose-6-phosphate 1-phosphotransferase (A/G SNP) enzyme were found to consist of 3′ UTR SNPs.
Evolutionary selection pressure under the effect of SNPs. The non-synonymous and synonymous substitution rate (Ka/Ks) were assessed to deduce the direction of natural selection over studied species. This analysis revealed the list of 205 and 34 genes representing the purified (Ka/Ks < 1.0) and diversified (Ka/Ks > 1), groups, respectively (Supplementary Table S1). The Ka/Ks of identified genes ranged from 0.02 to 0.3 in purified group while 1.0 to 4.16 in diversified group. Interactome network analysis provided the relationship of these genes and arranged into three major modules by the use of K-mean clustering as shown in Fig. 4a. Module I represented the densely packaged interaction among genes in comparison to the rest of the modules (II and III). Pathway analysis of depicted network indicated the enrichment of ribosome specific regulatory genes (Fig. 4b).  Furthermore, common and unique pathways were analysed in order to get specific feature of each module. Counting the common pathways between the three modules, we observed that metabolic pathways and biosynthesis of secondary metabolism pathways were present in all the three modules. Whereas, glyconeogenesis and photosynthesis pathways were uniquely present in Module I ( Fig. 4b purple Bar) and Ribosome biogenesis and ubiquitin mediated proteolysis pathways were uniquely present in Module II. Proteosome pathway was enriched in two modules (II and III) only. These results indicated the evolutionary selection pressure of genes regulating the metabolic pathways.

Distinction of tolerable SNPs from Deleterious SNPs.
In order to add another layer of refinement in SNP analysis, SIFT (Sorting Intolerant from Tolerant) 22 and PANTHER (Protein ANalysis THrough Evolutionary Relationships) 23 were used to distinguish the tolerable SNPs from deleterious SNPs. As a result, PANTHER predicted 110 SNPs to be deleterious, possessing the subPSEC score ≤− 3 (5% of aggregate non-synonymous SNPs). SIFT predicted 277 SNPs possessing tolerance index score of ≤ 0.05, named as deleterious SNPs. Afterward, 28 non-synonymous SNPs were commonly depicted as deleterious under SIFT and PANTHER (Table 3). Next, 8 deleterious non-synonymous SNPs were found to be the part of the active site of eight individual genes. Complete functional details along with active site information are listed in Supplementary Table S2. Two deleterious SNPs replaced polar AA (amino acid) with hydrophobic AA (Ser67Ala, Ser170Phe). Overall consequence pinpoints that such type of SNPs are considered to be lethal and alter the critical component of biological reaction such as transcription process and DNA binding ability.
Dynamic change in motif binding in allele-specific manner. Nucleotide change from cultivated to wild species resulted in preferential motif binding on either allele at sequence level, therefore called as allele-specific motif binding event. For better understanding, we created five possible combinations or classes as shown in Supplementary Fig. S1a. Further, we observed that Class I (3) and II (8) SNPs caused depletion in motif binding in wild and cultivated species respectively. Class III (4) SNPs exhibited no effect on motif binding due to allele variation, i.e both alleles show binding to the same motif, while class IV (6) SNPs showed dynamic change in motif binding from cultivated to wild species. No motif binding was observed in the rest of the deleterious SNPs (class V) as represented in Supplementary Fig. S1b. In this way, Class I, II and IV specific SNPs might be useful for the breeding program. Besides this, a collective list of genes regulating the complex network of fruits ripening and other complexes was prepared, which allowed us to investigate the consequence of predicted SNPs in these complexes. As a result, 96 SNPs were found to be the part of the gene regulating traits such as fruit ripening, cold response, trichome development and fruit texture. This study revealed that one of the texture regulatory gene Fasciclin-like arabinogalactan protein 13 carries A/G allelic variation, where allele A binds to Myb motif while allele G binds to Storekeeper motif, indicating the dynamic change in motif binding, therefore included in Class IV. A similar dynamic change in motif binding (from Myb to Homeodomain) was found in beta-glucosidase D4 gene carrying A/T point variation represents the Class IV motif binding effect as listed in Supplementary Table S3. Overall the result indicated that allelic effect had a wider biological role in the multiple fruit ripening complexes as shown in the Fig. 5.

Structural modelling and consequence of deleterious SNPs depicted via protein-ligand complexes.
To directly readout the effect of deleterious SNPs, we superimposed the native (represent the allele of cultivated species) and mutant (represent the allele of wild species) protein models after the energy minimization. Six SNPs significantly changed the structure alignment of native and mutant protein models ( Supplementary Fig. S2a-f ). Detailed analysis presented the immense deviation between the native and Tyr416Asp mutant structures of HMG1 protein pointing towards alteration in the structural geometry ( Supplementary Fig. S2a). Superimposed structure of native and Ser67Ala mutant of H1 Histone-like protein showed a change in loop to helix conformation after energy minimization of protein structures ( Supplementary Fig. S2b). This change may affect the functionality of Histone-like protein which could lead to the change in the DNA binding pattern with their interacting members. Other three deleterious SNPs (Val133Ala, Ser170Phe, Lys67Asn), although observed to be forming the loop conformation in both native and mutant models (Histone H1, Ribonuclease P protein and Reticulon family protein), changed the 3D conformation in mutant as shown in Supplementary Fig. S2c-e. One of the SNP (Leu164Pro) in elongation factor G gene, brought the change from loop (native) to sheet (mutant) conformation in 3-dimensional protein structures ( Supplementary Fig. S2f). This change illustrated the higher stability of the mutant structure of elongation factor G protein as compared to its native form.
To analyse interaction behaviour using docking simulations, one of the protein HMG1 consist of Tyr416Asp AA change, was selected. This SNP affected the active site of the HMG1 protein which could be clearly implied through docking simulations. HMG1 is known for its interaction with p53 ligand. Both the native HMG1-p53   Structural deviation and fluctuation due to deleterious SNPs. For the detailed functionality of genes carrying the 28 deleterious SNPs, pathway analysis was performed using KOBAS web server. Deleterious SNPs were predicted to be the part of base excision repair pathways (affected the HMG1-high mobility group protein B gene), plant hormone signal transduction (affected the T1R1 and PR1 b-pathogen responsive 1b) and plant pathogen interaction pathway (affected the PR1-pathogen responsive 1) ( Supplementary Fig. S3a-d). Under the molecular simulations, we observed that complex of native PR-1 protein (blue) shows almost same pattern of Root Mean Square Deviation (RMSD) growth with respect of time, with slight peaks between 0 to 500 ps and 1750 to 2000 ps (Fig. 7a). After a certain time range, Cys284Arg mutant PR-1 protein (red) obtained convergence towards its folding as compared to the native structural fold, which was an indication of change in required functional distance between the atoms, thus change in functional geometry. A clear change in the peaks of RMSD was detected in Asp115Glu mutant HMGB1 protein, almost following the peak & valley pattern of the native protein up to 1 ns (nanosecond) and revealed interrupted peak line at 1100 ns, while the native one maintains a pattern in remaining time intervals (Fig. 7b). It further indicated that change in functional geometry of protein was affected specifically on the basis of scalar distance between the atom sets of corresponding amino acid. Radius of gyration (Rg) represents the measure of compactness of the protein conformation in biomolecular simulations. In one hand, the native and Cys284Arg mutant PR-1 protein showed a clear difference in gyration radius where native protein exhibited affinity towards convergence and maintains the steady flow between 2.4 to 2.2 nm (nanometer). Whereas the mutant protein represented its compactness point at 1100 ps between 2.6 to 2.4 nm (Fig. 7c). It indicated the effect of mutation over the structural compactness and thus effect on the folding of PR-1 protein. On the other hand, the native and Asp115Glu mutant HMGBI protein, showed almost similar pattern of gyration plot with overlap of peaks at 750 ps to 1250 ps range (Fig. 7d). Following up the similar pattern of gyration radius and overlap for a time range show that mutation was also favourable for maintaining the compactness of structure.

Discussion
In plants, the majority of traits of interest are linked with SNPs and are thought to bring the individual variation, community diversity and the evolution of species. The link between single nucleotide change and gene function has been reported for a number of traits 24 . Modern science is mainly focused on finding the differential behaviour in genomes by utilizing microarray, transcriptome (in the form of up and down regulated genes) and ChIP-seq (in the form of differential peaks) like high throughput approaches. The present study was centered around finding the differential behaviour of non-synonymous SNPs in terms of their motif binding and put forward a methodological approach for SNPs analysis. In this study, we applied more stringent filters for SNP detection as compared to the previous report 25 . Numerous SNPs were identified by making the use of ESTs and sequencing approach as discussed in preceding studies 26 . Identified SNPs were found to be under the influence of transition biases i.e occurrence of higher transition as compared to transversion 27 . However, a contrasting result was observed in ginger EST leaves sequences 28 . The occurrence of higher transition as compared to transversion might be associated to the high frequency of the C to T variation after methylation 29 . The frequency of SNPs was highest at third and lowest at second codon positions 30 . Furthermore, predicted SNPs density was found to be in accordance with gene density, over each chromosome. A recent report 31 claimed the major divergence at heterochromatin as compared to euchromatin regions in their studied species. Here, we found that chromosome 1 consists of maximum variation as expected due to its largest size, which was contradictory from earlier study 11 . Valuable resource of functional data was depicted in the form of SNPs identified in UTR, coding and splice site regions that could provide the ample support for association studies.
The stringency of functional or structural constraints is thought to be a noteworthy key component behind the rate of amino acid substitutions 32 . We identified 205 purifying genes (Ka/Ks < 1), which tend to evolve slower than proteins with weaker constraints and 34 diversifying genes (Ka/Ks > 1) that are known to evolve quickly and tend to acquire a new function. Different threshold were applied in the other reports for positive (purifying) and negative (diversifying) groups such as in pacific white shrimp 18 , authors adopted Ka/Ks > 0.6 and Ka/Ks < 0.2 to be the realistic parameter for the diversifying and purifying respectively, in Eucalyptus grandis 33 Ka/Ks < 0.15 for positive and Ka/Ks > 0.5 for negative selection. Many purifying genes were traced among cultivated and wild tomato species 34 and provides the conclusive influence of SNPs in the evolutionary study.
SNPs are known to be associated with many aspects of human development and diseases. In plants, SNPs are reported to be regulating various Quantitative Trait Loci (QTL) responsible for cold and disease resistance such as such as blight, bacterial canker and gray mold 35,36 . In this study, various SNPs were observed on chromosome 5 and 11, already reported for their involvement in QTLs linked with reduced self-seed, late blight disease which influences fruit size, canopy density and plant size 37,38 and appears to be convincing evidence for the significance of the other predicted SNPs. Differential binding had elucidated the significance of each allele in terms of their motif binding ability. The comparable conclusion comes from another study 39 where researchers observed the influence of SNPs on WRKY domain and found a lesser TGAC binding affinity in WKRY domain due to the presence of non-synonymous SNP. Interpretation of non-synonymous SNPs carrying genes which are regulating fruit ripening, texture and cold responses along with their differential behaviour could provide in-depth information to the biologist. The pathways of biosynthesis of secondary metabolites and plant hormone signal transduction are known for their role in fruit ripening and defense responses 40 . It has been reported that changes in cell wall structure and the conversion of starch into simple sugars play a significant contribution during fruit ripening 41 . Deleterious SNPs were found to be the part of such biological pathways (plant hormone signal transduction and metabolic pathways) and thus enhance the significance of the present study. Metabolites act as a key component of tomato fruit flavour and participate in plant defense responses against pathogens and herbivore 42 . Involvement of non-synonymous SNPs in various metabolic pathways (glycine, serine and threonine metabolism, carbon metabolism and biosynthesis of secondary metabolites) directly implied their role in variation among studied species. Furthermore, 11 SNPs were identified in carotenoid genes, known for its role in fruit ripening and impart colors of fruits which could be used to understand the fruit colouring mechanism. Involvement of 58 SNPs in genes reported to be playing a role in the fruit texture, could be addressed in detail at breeding level. Also, 26 SNPs were identified in the cold-responsive genes which could be used to understand the complex mechanisms of cold stress because cultivated species are unable to tolerate freezing. RMSD and Rg values were considered as selective attributes via molecular simulation analysis. Molecular simulation analysis revealed the SNPs influence over the proteins playing a role in the defense response (PR1) and metabolic activities; both are noteworthy phenotypic difference known to exist in both the species. A similar approach was applied in earlier studies [43][44][45] to study the influence of non-synonymous SNPs. Present study must enhance the knowledge of the scientific community and provide support for the future breeding program.

Materials and Methods
Pre-processing of Transcriptome sequencing data and ESTs sequences. Transcriptome sequencing data (SRP041563) was downloaded from the SRA (http://www.ncbi.nlm.nih.gov/sra/) belonging to proximal and distal leaves of tomato of Solanum lycopersicum and Solanum habrochiates. We used SRA toolkit to get the single end reads (42 samples) in fastq format. In parallel, the EST sequence data of Solanum lycopersicum and Solanum habrochaites comprising a large amount of leaf, fruit, Trichome I and Trichome IV specific cDNA sequences, were downloaded from National Center for Biotechnology Information (NCBI) in fasta format. As ESTs were sequenced only once, sometimes the data has low-quality sequence and repeats generate problems during downstream analysis. To remove the interspersed repeats and low complexity regions, RepeatMasker 46 was run on EST datasets of both species separately against plant based Repbase libraries (http://www.girinst. org/server/RepBase/index.php). This reduced the chance of biases in SNP analysis and the overall noise in EST sequences.

SNP analysis in Illumina short reads and ESTs sequences.
Illumina short reads of wild and cultivated species were mapped to the reference genome of Solanum lycopersicum 2.5 version utilizing the Tophat v2.0 47 with the default option of -n/mismatch 2, -i/min-intron-length 50 except for -p/num-threads that set to 8. Only reliable mapped reads were considered for SNP calling and unmapped reads were discarded. SNP positions within mapped reads were predicted by use of samtools 48 . VCFtools 49 was employed on the raw Variant Calling Format (VCF) files for the minimum depth (DP) 10 and SNP quality (Q) 30 to get high-quality SNPs as described in Fig. 1 under short reads SNP detection workflow. For ESTs based SNPs analysis, AutoSNP v 2 50 tool was used with some modifications, and SNPs were differentiated from sequencing error according to sequence redundancy. Pre-processed EST sequences were assembled using the Contig Assembly Program v.3 (CAP3) program 51 within the AutoSNP pipeline at 85% identity (-p) and 100 bp overlap (-o). The resulted ACE format files were then parsed for SNP identification using custom scripts. Few filtration steps were applied in ESTs dataset, in order to diminish false SNP prediction caused by sequencing errors such as -depth of reads (minimum 3/3 reads belongs to both species) and removal of paralogous polymorphism at SNP site as shown in Fig. 1 under ESTs SNP detection workflow. Rest of indels and variants involving more than one nucleotide change in both (short read as well as EST) approaches were excluded. Furthermore, we performed the blast of the ESTs contig at e-value ≤ 1e-10 and 90% identity, against the tomato genic sequences to get their position in reference genome. A portion of the SNPs comprised of allelic variation with respect to (w.r.t) genic sequences. In this manner, such SNPs were excluded in further processing.
SNPs classification and interactome network of purified genes during selection pressure of SNPs. To annotate the effect of SNPs (synonymous and non-synonymous), SnpEff 21 software was used.
Identified SNPs were scanned for the ratio of the number of non-synonymous substitutions per non-synonymous site (K a ), to the number of synonymous substitutions per synonymous site (K s ), in the given timeframe. In this way, cultivated and wild allele carrying gene sequences were created based on the SNP position and compared in Ka/Ks_calculator 52 tool with an optional parameter of standard genetic code and NG method. Two variables (positive and negative) were created by Ka/Ks ratio, i.e genes with Ka/Ks ≥ 1 falls in the diversifying group while genes with Ka/Ks ≤ 1 genes fall in the purifying group. These two groups were further analysed for their interaction network in STRING database 53 (http://string-db.org/).

Assessment of deleterious SNPs under PANTHER and SIFT and their active site prediction.
To find out the potential functional effect of amino acid substitution on corresponding proteins, SIFT 22 (subPSEC score ≤− 3) and PANTHER 23 (tolerance index score of ≤ 0.05) were used for the detection of deleterious SNPs. Three dimensional structure of both native and mutant proteins were produced by the Phyre2 server 54 available at www.sbg.bio.ic.ac.uk/phyre2/html/ under intensive mode. Simultaneously, active sites of protein model carrying deleterious SNP were predicted by using CASTP web server (http://sts.bioe.uic.edu/castp/). Furthermore, native and mutant structures were superimposed in UCSF-Chimera molecule viewer tool after energy minimization using Needleman-Wunch alignment method, applying gap extension penalty 1, gap opening penalty 12 and structure score > 30% with distance range of 2.0 A.
Probable consequence of deleterious SNPs at sequence level. To understand SNP impact at the sequence level, differential binding events were predicted by utilizing CIS-BP web server (http://cisbp.ccbr.utoronto.ca/TFTools.php), where sequences of each of 10 bp (in upstream and downstream) from SNPs site were extracted and used as input in CIS-BP web server 55 by the use of 8mer motif model. Moreover, SNPs were classified into five classes based on motif binding ability of each SNP allele as demonstrated in Supplementary Fig. S1a. Protein conformation changes at structural level. One of the deleterious SNP was randomly selected for protein-ligand analysis. Docking analysis was performed to predict the putative modification of binding modes of studied ligand (p53) with selected protein (HMG1) using AutoDock 4.0 (http://autodock.scripps.edu). The grid size was set to cover acting domains present in selected protein with the grid spacing of 0.375 Å. Genetic algorithm (GA) was applied as searching parameter with 10 numbers of GA runs and setting population size 150, a maximum number of energy evaluations was set to 25,00,000 with considering the maximum number of generations to 27,000. The lowest binding energy conformation with H-Bonds in cluster was considered as the most favourable docking pose. Protein-ligand complexes obtained from AUTODOCK 4 were further viewed in UCSF-Chimera molecule viewer tool for better analysis of interaction. In each case, 10 different docking arrangements were produced. The conformations obtained as a result of rigid body docking were sorted by total binding energy, hydrogen bonds formed, bond lengths and close contacts between enzyme active sites.
Detail Annotation and molecular simulation of deleterious SNPs at structural level. To elucidate key role of deleterious SNPs at a biological level, Kobas web server (http://kobas.cbi.pku.edu.cn/home. do) was used for pathway analysis. Furthermore, molecular simulations were performed on the selective 2 non-synonymous SNPs to gain insight into the impact of deleterious SNPs in terms of protein functionality. All the molecular simulations were carried out using Gromacs program 56 opting GROMOS96 54a7 force field and SPCE water model. A cubic box with 1.0 nm spacing was generated for each protein, followed by the energy minimization using the steepest descent method. After minimization, equilibrium (canonical NVT ensemble) step was initiated using leap-frog integrator and set the LINCs order value 4. Again, Leap-frog integrator was used for the production MD run, considering all bond constrains with PME for long-range electrostatics.