Computational pipeline to probe NaV1.7 gain-of-function variants in neuropathic painful syndromes

Applications of machine learning and graph theory techniques to neuroscience have witnessed an increased interest in the last decade due to the large data availability and unprecedented technology developments. Their employment to investigate the effect of mutational changes in genes encoding for proteins modulating the membrane of excitable cells, whose biological correlates are assessed at electrophysiological level, could provide useful predictive clues. We apply this concept to the analysis of variants in sodium channel NaV1.7 subunit found in patients with chronic painful syndromes, by the implementation of a dedicated computational pipeline empowering different and complementary techniques including homology modeling, network theory, and machine learning. By testing three templates of different origin and sequence identities, we provide an optimal condition for its use. Our findings reveal the usefulness of our computational pipeline in supporting the selection of candidates for cell electrophysiology assay and with potential clinical applications.

1 Variants List Table S1 shows the list of mutations of the PAT group and their characteristics. Table S2 shows the same information for the NEUTRAL group.  Table S1: List of deleterious mutations of the PAT group with related information and frequencies among individuals: gpos is the genomic position, cpos is the coding position, rsID is the genetic variation code as reported in dbSNP, Freq is the allele frequency in annotation databases.  Table S2: List of NEUTRAL variants with related information and frequencies among individuals: gpos is the genomic position, cpos is the coding position, rsID is the genetic variation code as reported in dbSNP, Freq is the allele frequency in annotation databases. Mammals: pseudo mutations identified among SCN9A homologous genes from mammalian species sharing > 90% nucleotide sequence identity. Human: genetic variants from dbSNPs with uncertain significance or benign that do not alter the biophysical properties of the channel.

Templates alignments
In this section we present the alignments of the WT sequence (NCBI code NP 002968.1 ) with the MOESM3, 6A90 and 6J8J templates sequences. All the alignments have been performed by using Clustal Omega.

Alignment WT -MOESM3
In this section we present the alignment of the WT sequence with the MOESM3 template sequence. The sequence identity is 50.8%.

Alignments WT -6A90
In the following we present the alignments of the WT sequence and the 6A90 sequence. We also present the alignments of their four domains separately.

Alignment of the first domain DI
For the first domain the sequence identity is 43%.

Quality assessment: detailed analysis
In this section we report further results concerning the quality assessment of the three-dimensional structures obtained with our computational pipeline. Each structure obtained by FG-MD was subjected to an quality evaluation with the tool QMEANBrane, which has a special scoring function designed for membrane proteins. As reported in the paper, QMEANBrane shows that the produced models are of high quality within each domain area, while, in the inter-domains loops area, the reliability of the models are significantly lower. Figure S1 shows the quality results for the MOESM3 template. In particular, part (a) depicts the structure of the WT and part (b) shows the quality values along peptide sequence: the higher values correspond to positions falling into the transmembrane region. Moreover, part (c) of Figure S1 shows the RAMPAGE results for the MOESM3 WT. The analysis considers only the protein segments corresponding to the transmembrane region, i.e. all the α-helices. It turns out that 95.8% amino acids fall within a favorable region, 4% fall within a permitted region and only one amino acid falls in the forbidden region.
The same analysis is reported in Figure S2 for template 6J8J: it shows that the percentage of amino acids in the favorable zone is 98.7%, while amino acids in the permitted area are 0.9% of the whole modeled sequence.
To complete the quality assessment, the local quality values of all the considered point mutations are reported in Tables S3 and S4 for the MOESM3  template; in Tables S5 and S6 for the 6A90 template and in Tables S7 and S8 for the 6J8J template.
A further quality check was performed on the Ramachandran plot of the WT structure of the three considered templates: for each genetic variant of the two groups PAT and NEUTRAL, we check if the corresponding amino acid is present in a high quality portion of the protein reported by QMEANBrane. The results are reported in Tables S9 and S10.  Table S3: QMEANBrane results for the models generated by the MOESM3 template: quality value of each point mutation in the PAT group before (WT) and after (mutated) the amino acid change.

Energy Landscape
A pivotal step, immediately following homology modeling, is the step involving the energy minimization. It is well known the connection between function and structure of a protein, therefore mutations that cause similar physiological changes will also have similar structures. We used the same starting geometry (reference template) for all the protein sequences that we modeled. And the basic idea is that the energy minimization step, by taking into account the punctual amino acid differences, would have led the models to divide into two groups characterized by similar geometries. Indeed, these two groups of similar geometries correspond precisely to the two groups having different physiological behaviors (see Figure S3).      Table S8: QMEANBrane results for the models generated by the 6J8J template: quality value of each point mutation in the NEUTRAL group before (WT) and after (mutated) the amino acid change.

Graph kernels and Dominant Set further results
In this section we complete the presentation of the kernels results by showing the similarity matrices and the dominant set results not included in the main paper. We recall that each similarity matrix has rows and columns numbered in the range 0-84, which are the mutations ids already shown in the main paper: according to the ids list, note that ids 0-29 are relative to pathogenic mutations while ids 30-84 identify mutations not associated with pain disorders. Each cell (i,j) in a matrix shows the similarity value between the i-th and j-th RINs. The lighter is the cell color the more similar are the two graphs (the main diagonal shows always the lightest color, being the result of the comparison of a graph with itself). The first analysis examines the role of each interaction separately, with the aim of checking their contribution to the pattern observed in the whole RINs comparison. Figures S4 and S5 show the similarity matrices of the WL kernel for the MOESM3 template and 6A90 template, respectively. Note that interactions H-bond, Van Der Waals and Ionic, taken separately, are in agreement with the pattern observed in the comparison of the whole RINs as shown in the main paper.
Additional insights can be obtained from unsupervised learning techniques applied to the obtained similarity matrices. Figure S6 shows in the (a) and (b) diagrams, the Dominant Set (DS) results for templates MOESM3 and 6J8J, respectively. The diagrams are composed of 85 rows labelled with the ids of considered variants and two columns: one showing the classification resulting from the application of the Dominant Set method and the other one showing the correct classification, which is known, and distinguish between pain related (PAT) mutations (dark color) and neutral (NEUTRAL) variants (yellow). The DS classification of the two templates are in line with their WL kernel results. In particular, template MOESM3 obtains a good classification: among the pathogenic mutations only two are misclassified. Concerning template 6J8J, since the kernel is not able to discriminate the two clusters, it is not a surprise that also DS does not show a good result.
6 Considering other human NaV1.7 templates We considered three further templates that stem from the paper by Xu et al (Cell 2019) and refer to the following structures that model only the voltage-sensor domain II (VSD2) of Nav1.7. The PDB entry 6N4Q represents the protein in the activated state and is in complex with a spider toxin; the PDB entry 6N4R represents NaV1.7 in deactivated state, again in complex with a spider toxin. Neither of these two cases refer to the closed state and hence cannot be compared with those of the present study. Instead, the PDB entry 6N4I is is a chimeric structure that model the NaV1.7 protein in closed state. As we further elaborated below, however, the quality of this template is significantly lower compared to the other human template 6J8J and it cannot be used efficiently in our computational pipeline. We start by superimposing the two structures in Figure S7 with 6J8J in cyan and 6N4I in gold.
The 6N4I template is obtained via X-ray crystallographic diffraction with 3.54Å resolution and is formed by only 245 amino acids that belong to the second domain. Such structure has been then replicated by symmetry in order to model the other three domains and complete the structure of the sodium channel. Instead, the 6J8J template represents 1193 amino acids out of the nearly 2000 of the whole protein sequence. We compared the two templates 6N4I and 6J8J and calculated their Root Mean Squared Deviation (RMSD): while the RMSD between 128 selected and pruned amino acids is 1.25 A, the total RMSD is larger that 35 A.
This notwithstanding, we have used Swiss-model to perform homology modelling using the WT sequence and the 6N4I template, in a way akin to that we did for the 6J8J template in the original manuscript. The resulting structure is depicted in Figure S8.
As noticeable, the geometry of the sodium channel is quite different with respect to the original template at variance with what was happening when using the 6J8J template. We ascribed this result both to the significantly lower quality of the original crystal of the 6N4I deposited structure and to the much lower number of represented amino acids. Hence, the structure of 6N4I is clearly insufficient to be used for homology modeling.