Mutation analysis of pathogenic non-synonymous single nucleotide polymorphisms (nsSNPs) in WFS1 gene through computational approaches

A single base changes causing a change to the amino acid sequence of the encoded protein, which is defined as non-synonymous single nucleotide polymorphisms (nsSNPs). Many of the nsSNPs can cause disease, and these nsSNPs are considered as pathogenic mutations. In the study, the high-risk nsSNPs of WFS1 and their influence on the structure and function of wolframin protein were predicted by multiple bioinformatics software. We obtained 13 high-risk nsSNPs of WFS1. All the 13 high-risk nsSNPs are highly conserved residues with a conservative score of 9 or 8 and mostly may cause a decrease in protein stability. The high-risk nsSNPs have an important effect on not only amino acid size, charge and hydrophobicity, but also protein's spatial structure. Among these, 11 nsSNPs had been previously published or cited and 2 nsSNPs (G695S and E776K) had not been reported to date. The two novel variants increased or decreased hydrogen bonds. In conclusion, through different computational tools, it is presumed that the mechanism of pathogenic WFS1 nsSNPs should include the changes of physicochemical properties, significant structural changes and abnormal binding with functional partners. We accomplished the computational-based screening and analysis for deleterious nsSNPs in WFS1, which had important reference value and could contribute to further studies of the mechanism of WFS1 related disease. The computational analysis has many advantages, but the results should be identified by further experimental studies in vivo and in vitro.

www.nature.com/scientificreports/ effect on the function of wolframin protein, it is not only time-consuming but also expensive to deeply explore their functional effect. It is worthwhile to use different bioinformatics tools to analyze the high-risk nsSNPs. Our study focused on the relationship among these nsSNPs and protein function in depth.
Prediction of high-risk nsSNPs in WFS1. To assess the potential effect of SNPs in the WFS1, we performed analyses utilizing a range of database servers. When all the computational tools predict one nsSNP is deleterious, we consider it as the high-risk nsSNP, which is highly likely to have harmful effects on the function of protein and even lead to diseases. Sorting intolerant from tolerant (SIFT) (http:// sift. jcvi. org/) and Protein variation effect analyzer (PROVEAN) (http:// prove an. jcvi. org) can predict the potential influence of an amino acid substitution in a protein according to the sequence homology 6 . Polymorphism phenotyping V2 (PolyPhen-2) (http:// genet ics. bwh. harva rd. edu/ pph2/) can calculate the potential functional effect of amino acid substitutions from its individual characteristics via Naïve Bayes classifier 7 . Likelihood Ratio Test (LRT) uses the statistical method of likelihood ratio test to make predictions by analyzing the conservation of amino acids 8 . Unlike SIFT and PloyPhen-2, LRT does not need to analyze the evolutionary distance between homologous protein sequences to predict amino acid conservation, and has a wider range of applications. Functional Analysis through Hidden Markov Models (FATHMM) (http:// fathmm. bioco mpute. org. uk/ inher ited. html) can predict the impact of missense mutations on the function of protein with optional species-specific weights 9 . Its MKL algorithm can be used to predict both coding and non-coding variants. Mutation Taster (http:// www. mutat ionta ster. org/ ChrPos. html) is an analysis tool which have recruits several biomedical databases, and predict that the mutation is a polymorphism or disease causing through a naive Bayes classifier2 10 . Based on evolutionary conservation of the mutant amino acid in protein homologs, Mutation Assessor (http:// mutat ionas sessor. org/ r3/) can assess the functional influence of nsSNPs 11 . Protein variation effect analyzer Variant Effect Scoring Tool 3 (VEST3) (http:// karch inlab. org/ apps/ appVe st. html) can predict the functional influence of variants according to the probability of missense mutations causing disease 12 . There are also following six comprehensive prediction tools using machine learning and other related algorithms to score the pathogenicity of a SNP and other variants: CADD 13  Prediction of stability of mutant proteins. MUpro (http:// mupro. prote omics. ics. uci. edu) can predict protein stability changes without tertiary structures with two machine learning methods: Neural Networks and Support Vector Machines 18 . The confidence score is between 1 and -1. The bigger the absolute value, the more confident the prediction is. I-Mutant2.0 (https:// foldi ng. biofo ld. org/i-mutant/ i-mutan t2.0. html) can evaluate the change of protein stability upon single site mutation starting from the protein sequence or structure. INPS-MD (http:// inpsmd. bioco mp. unibo. it), also named as Impact of Non-Synonymous Mutations on Protein Stability-Multi Dimension, is a web server devised to prediction of protein stability change upon single point mutation 19 . The iStable (http:// predi ctor. nchu. edu. tw/ istab le/ index Seq. php) is a comprehensive predictor of protein stability change after single mutation 20 .
Evolutionary conservation analysis of nsSNPs. The ConSurf server (http:// consu rf. tau. ac. il) can calculate the conservation score to estimate the conservation of amino acids in evolution through a maximum likelihood (ML) method or an empirical Bayesian method 21 . The score between 7 and 9 is considered evolutionarily conservative.
Prediction of secondary structure and membrane protein topology. SOPMA is also named as Self-Optimized Prediction Method with Alignment based on the homologue method. It can predict the secondary structure of protein through five independent algorithms 22 . TMHMM Server 2.0 (https:// servi ces. healt htech. dtu. dk/ servi ce. php? TMHMM-2.0) is an online tool, based on a hidden Markov model (HMM) approach, for prediction of transmembrane structures in proteins.
Model building and evaluation of wolframin protein, analysis of mutation-induced structural changes. The Robetta (http:// robet ta. baker lab. org/) is a prediction server of protein structure, using the Robetta fragment-insertion method. It can predict a full chain protein structure automatically for ab initio and comparative modeling. The three-dimension (3D) model of the wolframin protein was built by Alphafold2, with the template performed using Robetta server. PyRAMA was used to geometrically evaluated the modeled 3D structure by calculating the Ramachandran plot. Structural presentation of wild and mutant proteins was made by using PyMOL programs. Cartoon drawings of the structures were obtained. PyMOL software was used to label native as well mutant amino acids and present the hydrogen bond between them. Modeled mutant proteins was superimposed with PyMOL on the wild protein for comparison of three-dimensional structure of wild and mutant proteins. As an online web service, HOPE (http:// www. cmbi. umcn. nl/ hope) can analyze the impact of a given mutation on the protein structure. The Accessible Surface Area and Accessibility Calculation for Pro-Identification of the pathogenicity of nsSNPs. We used fourteen computational tools to predict whether every one of the nsSNPs is deleterious or not (Table S1). The number of deleterious nsSNPs predicting by each software was shown in Fig. 1c. The screening conditions predicted as harmful by each software are as follows: "Deleterious" by M-CAP, FATHMM, MetaLR, LRT, MetaSVM and PROVEAN; "Probably damaging" or "Possibly damaging" by Polyphen2; "Damaging" by SIFT; "High" or "Medium" by MutationAssessor; "Disease causing automatic" or "Disease causing" by Mutation Taster2; rank score higher than 0.9 by FATHMM -MKL and DANN; score higher than 25 by CADD_Phred; score higher than 0.9 by VEST3 and REVEL. There were 13 nsSNPs which considered as the highly harmful nsSNPs by all the computational tools. As shown in Fig. 1d, the darkest purple region was highlighted and had the positive correlation with highly harmful nsSNPs in WFS1. The detailed results are highlighted in Table 1. Finally, the 13 nsSNPs were considered high risk and selected for in-depth analysis. www.nature.com/scientificreports/ dicted that all 13 nsSNPs resulted in decreased protein stability. MUpro software, I-Mutant 2.0 and iStable software predicted 10, 10 and 11 nsSNPs leading to a decline in stability of wolframin protein, respectively. Meanwhile, the stability of protein had a sharp decline after the L723P and L829P mutations, because their total score was both below -6. Moreover, P724L and P885L were predicted to increase the stability of wolframin protein by three software tools.
The secondary structure, and transmembrane helices prediction of the wolframin protein. The secondary structure of the wolframin protein was predicted by SOPMA (Fig. 3b). Four secondary structures composed the wolframin protein with 890 amino acids ( Fig S2). The alpha helix consisted of 442 amino acids (accounting for 49.66%), the beta turn consisted of 44 amino acids (4.94%), the beta sheet consisted of 98 amino acids (11.01%), and the random coil consisted of 306 amino acids (34.38%). We used the TMHMM  www.nature.com/scientificreports/ to characterize the amino acid of WFS1 for their inside/outside of membrane region and transmembrane region and investigated the effect of mutations of high-risk nsSNPs. It was showed that there were nine transmembrane regions in WFS1 by using TMHMM server (Fig. 4a-c). The G494R and P885L mutation significantly increased the probability that the corresponding amino acid site is located at transmembrane region. However, none of the nsSNPs resulted in changes in the structure of the wolframin transmembrane region. Notably, most pathogenic variants were found in the C-terminal region of wolframin rather than the transmembrane domain.

Protein modeling of wolframin and analysis for the structural effects of mutation. The 3D
structures of wolframin and its mutant proteins were predicted by Robetta server (Fig. 5a). The Ramachandran analysis was carried out for wolframin protein. The residues of the wild type protein were greater than 90% in most favored and allowed region, which showed the structure was usual (Fig. 5b). According to the comparison of the qualitative electrostatic representation of wild and mutant G494R proteins, it was found that the G494R mutation changed the charge of the amino acid at this site from neutral to positive (Fig. 5c,d). Figure 5e exhibited the wildtype protein model highlighting substitution regions. Modeled mutant proteins was superimposed on the wild protein by PyMOL for comparison of three-dimensional structure of wild and mutant proteins (Fig. 5f). Almost all nsSNPs resulted in the structural drifting, further confirming by energy refinement. We calculated root-mean-square deviation (RMSD) values for all mutant models ( Table 3). The value means the average distance of α-carbon backbones between mutant and wild model. The structure deviation between mutant and wild protein was positively correlated with RMSD value. The model of the G107R mutation had the greatest deviation with 1.730B RMSD value followed by P885L, G702S, L829P, G494R and P724L with 1.573B, 1.504B, 1.496B, 1.469B and 1.468B RMSD values, respectively. Others had slight changes including G736S (1.367B RMSD), L723P (1.133B RMSD), and A684T (1.106B RMSD).
The changes of amino acid substitutions on the size, hydrophobicity, structure and so on of wolframin were predicted by HOPE (Table 3). All 13 nsSNPs brought about changes in size of amino acids (10 larger and 3 smaller) and 5 nsSNPs resulted in change of charge. Besides, 4 nsSNPs decreased the hydrophobicity. It is speculated that these changes can lead to changes of intramolecular interactions so that affect the function of wolframin protein. The changes of solvent accessible surface areas (SASA) were analyzed by the Accessible Surface Area and Accessibility Calculation for Protein (ver. 1.2) online server, which is considered as an important factor in protein folding and stability studies. According to SASA analysis, the similar residual fluctuations were shown between the wild and mutant protein (Fig. 6a). SASA parameter is proven that the protein is accessible to other ligand and/or proteins. As shown in Fig. 6b, there were 385 (43.26%) amino acids on the surface, 270 (30.34%) in the core and 235 (26.40%) in other parts of wolframin protein. The proportion of protein surface amino acids was increased except for P724L.The proportion of protein core amino acids was decreased in all 13 WFS1 high-risk pathogenic nsSNPs mutations, meaning that more amino acids were exposed and eventually may have harmful effects on interaction with other proteins.
In the next step, we selected two novel nsSNPs (G695S and E776K) that have not been reported (Fig. 7a,b). All the novel variants increased or decreased hydrogen bonds (Fig. 7c,d). In the wild type, Gly695 has a hydrogen bond with Tyr660 and Leu829, respectively. In the mutant type G695S, the original hydrogen bond distances are changed and a hydrogen bond between Ser695 and Glu694 is added. Wild type has two hydrogen bonds between Glu776 and Arg708, Arg805, respectively. The variant E776K eliminates hydrogen bonds between Lys776 and Arg805. Changes in hydrogen bonds may influence the stability and intramolecular interactions of the wolframin protein, then causing diseases.    (Table S3). At high confidence score 0.700, the number of average node degree, nodes and interaction number of edges were 4.57, 21 and 48 respectively (Fig. 8a,b). It is shown that WFS1 interacts with ATP1A1, ATP1A2, ATP1A3 and ATP1B1, then connecting with Na/K-ATPase. It is also named as sodium/potassium adenosine triphosphatase or ATP1A protein. ATP1A consists of a α subunit and a β subunit and plays an important part in keeping the electrochemical gradient on the cell membrane. ATP1A1, ATP1A2 and ATP1A3 belong to the α subunit, and ATP1B1 belongs to the β subunit. Furthermore, WFS1 interacted with ATF6, ATF6B and XBP1, which were involved in the unfolded protein response (UPR) and endoplasmic reticulum stress (ERS), and maintained interactions with Ca 2+ -associated folding factors (HSPA5, HSP90B1/GRP94, and CALR) and other chaperones (ERN1 and DNAJC3). Among the other interacting proteins of the WFS1, CISD2 and TMEM38A play a vital part in regulation of cytosolic Ca 2+ homeostasis, and ADCY8 is essential for activating the glucose-induced signaling pathways in beta cells. WFS1 www.nature.com/scientificreports/ also interacts with the insulin release-related proteins (HHEX and CDKAL1), which are strongly associated with genetic risk variants for diabetes. www.nature.com/scientificreports/

Discussion
We screened out 13 high-risk nsSNPs of WFS1 gene, in which 11 nsSNPs had been reported in the literature. It is noteworthy that one nsSNP (L829P) is associated with non-syndromic hearing loss and eight nsSNPs (G107R, A684T, G702S, L723P, P724L, G736S, G736R and P885L) are associated with Wolfram syndrome (WS) [24][25][26][27] . The G494R and R732H were reported as the variants of uncertain significance. Wolfram syndrome is an autosomal recessive disorder, and its clinical features are diabetes insipidus, diabetes mellitus, optic atrophy and deafness. The mutational studies of Wolfram syndrome reported most pathogenic variants were located in transmembrane region and carboxy tail of wolframin protein, inside exon 8. However, the nsSNP (c.319G > C, p.G107R) was detected in two siblings from Southern Italy with Wolfram syndrome (WS), inside exon 4 28 . We obtained two novel nsSNPs (G695S and E776K) from WFS1 high-risk nsSNPs. We speculate that they are highly likely to be pathogenic mutations, because:(1) they were all predicted to be highly harmful by all predictive tools; (2) they all had highly score 9 in conservation analysis; (3) they could all lead to decreasing protein stability; (4) they could all cause changes in amino acid properties and tertiary structure. However, clinical literature reports and other evidence are needed to verify their pathogenicity.
In order to further explore the potential pathogenic mechanism of WFS1 high-risk nsSNPs, we analyzed the stability, conservation, the physical and chemical properties, tertiary structure through many software tools. The conservation analysis showed that all 13 high-risk nsSNPs were highly conserved in the WFS1, regarded as massively damaging, because the residues of the conserved domain have an important effect on biological process such as interactions among proteins. The wild-type residues of G107R, G494R, G695S, G702S, G736S and G736R are glycine, which is flexible enough to make torsion angles. Their mutant residues cause torsion angles to be unusual so that force the local backbone into an incorrect conformation and disturb the local structure. As the wild-type residues of P724L and P885L, proline has a very rigid structure, thus inducing a special backbone conformation which might be required at corresponding positions. These mutations with leucine residue maybe disturb the local structure and function of wolframin protein.
According to multiple studies, the wolframin protein have a vital function on the following aspects: (1) interaction with Na+/K + ATPase β subunit 29 ; (2) regulation of the ER stress response 30 ;(3) regulation of the cellular calcium homeostasis 31 ; and (4) regulation of insulin production and secretion from pancreatic β-cells 32 . The same conclusion can be drawn from the WFS1 protein interaction network analyzed by STRING. The C-terminal region of wolframin is located on the cytoplasmic side of the ER membrane, adopting a folded confirmation. It can interact with the C-terminal region of the ER-localized Na + /K + ATPase beta1 subunit, which is important for subunit maturation. Na + /K + ATPase deficiency is known to be responsible for apoptosis and neural degenerative disease. If a similar association exists within the inner ear, amino acid substitutions may result in hearing loss in this way. The state of accumulation of misfolded and unfolded proteins in the organelle is ER stress. The unfolded protein response (UPR), also called the ER stress signaling network, can deal with ER stress in cells. Wolframin can negatively regulate the ER stress signaling network through interaction with the master regulators of the UPR (such as ATF6). Under normal conditions, WFS1 recruited ATF6α to an E3 ligase, HRD1, and the proteasome, prevents ATF6 activation and promotes ATF6 ubiquitination and proteasomal degradation. WFS1 also can reduce the expression of ATF6α target genes, for example HPSA5/GRP78/BiP and XBP-1. In patients with Wolfram syndrome, because of the variation of WFS1, ATF6 is hyperactivated, leading to dysregulated ATF6 signaling pathway. Wolframin can modulate the filling state of the ER Ca 2+ store to participate in the regulation of cellular Ca 2+ homeostasis. Once a variant occurs in WFS1, ER stress is strongly induced, and endolymphatic ion composition and homeostasis are disrupted, which leads to deafness. The C-terminal segment of wolframin protein in ER lumen bind to vesicular cargo proteins including proinsulin directly. The pathogenic variants in the domain may disrupt the interaction and result in abnormal accumulation of proinsulin in endoplasmic reticulum, which impede insulin secretion and proinsulin processing. www.nature.com/scientificreports/  www.nature.com/scientificreports/ In conclusion, the bioinformatics analysis is useful to efficiently identify high-risk nsSNPs. Pathogenicity of some high-risk WFS1 nsSNPs has been confirmed by pedigree and genetic analysis, but further vivo and vitro functional studies are required to verify the accuracy of our methods.

Data availability
The data supporting the results reported in the article can be found in the Supplementary Information files. Web resources: dbSNP database, http:// www. ncbi. nlm. nih. gov/ proje cts/ SNP/; ClinVar database, https:// www. ncbi.