Target-Driven Evolution of Scorpion Toxins

It is long known that peptide neurotoxins derived from a diversity of venomous animals evolve by positive selection following gene duplication, yet a force that drives their adaptive evolution remains a mystery. By using maximum-likelihood models of codon substitution, we analyzed molecular adaptation in scorpion sodium channel toxins from a specific species and found ten positively selected sites, six of which are located at the core-domain of scorpion α-toxins, a region known to interact with two adjacent loops in the voltage-sensor domain (DIV) of sodium channels, as validated by our newly constructed computational model of toxin-channel complex. Despite the lack of positive selection signals in these two loops, they accumulated extensive sequence variations by relaxed purifying selection in prey and predators of scorpions. The evolutionary variability in the toxin-bound regions of sodium channels indicates that accelerated substitutions in the multigene family of scorpion toxins is a consequence of dealing with the target diversity. This work presents an example of atypical co-evolution between animal toxins and their molecular targets, in which toxins suffered from more prominent selective pressure from the channels of their competitors. Our discovery helps explain the evolutionary rationality of gene duplication of toxins in a specific venomous species.

The adaptive evolution of the scorpion α -toxin family has been an intriguing model for studying functional diversification of proteins via accelerated substitutions in the bioactive surface 13,14 . The pioneering analysis of this family by maximum likelihood (ML) models of codon substitution identified 11 sites evolved by positive selection 13 . Subsequently, a similar analysis was reported based on sequences from more scorpion species 15 . To investigate the evolutionary significance of positive selection of toxins in a specific lineage, Zhu et al. re-analyzed α -toxins from two sibling scorpion species (M. eupeus and M. martensii) and detected nine positively selected sites (PSSs) that are associated with functional diversification of Mesobuthus α -toxins 14 . More recently, Sunagar et al. showed that α -toxins from several Australian scorpion species had experienced episodic influence of positive selection with 14 sites evolved by positive selection 16 . Despite these remarkable progresses, it is still unclear what factors have driven the adaptive evolution of these toxins in a specific species.
Here, we report the discovery of PSSs in scorpion α -toxins from a single species (Mesobuthus martensii) and their locations in the toxin-channel interface. In combination with structural and evolutionary analyses of Na v channels from both prey and predators of scorpions, we provide convincing evidence for an atypical co-evolutionary manner between scorpions and their competitors, in which toxin-bound regions of the ion channel evolved by relaxed purifying selection to accumulate sequence mutations may act as a driver of the adaptive evolution of toxins. Our work therefore addresses a key question regarding the evolutionary rationality of the presence of multiple paralogous α -toxins in a scorpion's venom.

Results and Discussion
Positively Selected Sites of M. martensii Sodium Channel Toxins. The completion of the whole genome sequencing of the first scorpion species (M. martensii) 17 allows us to gather nearly all scorpion α -toxin genes of this species to analyze their evolutionary selection pattern and potential driving force. A total of 29 non-redundant toxin sequences were collected and aligned, which cover all three different pharmacological subgroups, e.g. α -like BmKM1 and BmKM10, insect-specific BmKα IT1, and classic BmKα TX11 (Fig. S1). To test positive selection of the M. martensii α -toxin multigene family, we employed maximum likelihood (ML) models of codon substitution to identify potential PSSs (sites with a nonsynonymous to synonymous substitution rate ratio, d N /d S = ω , > 1 significantly) 18 . As shown in Table 1, the ML estimates (MLEs) under M0 predicts that all amino acid sites have a ω of 0.85, approximately equals 1, indicative of neutral selection. However, M0 fits the data worse than other models, suggesting that ω values (i.e. selective pressure) may be heterogeneous among sites. The MLEs under M2a suggest that 27% of sites are under positive selection with ω = 2.47 ( Table 1). The likelihood ratio test (LRT) statistic between M1a and M2a is 16.6 (2Δ l), much greater than critical values from a χ 2 distribution with degree of freedom = 2 (p < 0.001), indicating the presence of PSSs. M2a did identify 10 PSSs with P ≥ 0.9 by methods of the Bayes Empirical Bayes (BEB) ( Table 1) and the Naïve Empirical Bayes (NEB) ( Table S1). The test using M7 and M8 models results in a highly similar conclusion. Although M8 predicts two more PSSs, 10 of them are identical to those predicted by M2a (Table 1 and Table S1). It is remarkable that these PSSs are scattered in the primary structure of the toxin, yet they cluster in the functional regions previously characterized [19][20][21][22] , in which six (15E, 17A, 18R, 37Q, 38W and 39V, numbered according to BmKM1) are seated on the core-domain comprising two loops: B-and J-loop; and others on the NC-domain (Fig. 1). When compared with the PSSs identified previously [13][14][15] , it was found that the majority of them were overall similar in positions, suggesting that Na v channel toxins from different scorpion species might suffer from similar selective pressure. Analysis of amino acid composition of the PSSs revealed their high variability ( Fig. 1), as identified by nearly all PSSs containing different polar and non-polar amino acids and some even are occupied by residues with opposite charges. For example, at site 15 there are four different types of amino acids (Glu, Gly, His and Phe), varying from small neural glycine to large aromatic phenylalanine, and from positively charged histidine to negatively charged glutamic acid.
PSSs Located at the Toxin-Channel Interface. Previous experiments have indicated that two extracellular loops linking S1-S2 and S3-S4 in the domain IV VSD of Na v channels (L DIVS1-S2 ; L DIVS3-S4 ) are main regions of site 3 directly interacting with the core-domain of α -toxins 23,24 . The pore module of domain I in Na v channels forms a secondary site by the interaction with the NC-domain of toxins 22,[24][25][26][27] . To observe the location of the PSSs in the interface of toxin-channel, we constructed a complex model between BmKM1, an extensively studied and structurally known M. martensii α -like toxin, and the DIV VSD of rNa v 1.2 using the ZDOCK server 28 . The channel structure used here is the one previously successfully used in exploring the binding mode of a scorpion α -toxin (AaHII) 29 . As shown in Fig. 2, in the five top ZDOCK models, BmKM1 similarly binds to the VSD (Fig. 2) and we thus selected the top 1 model for further analysis.
In the complex structure, four toxin-VSD residues pairs were identified to contact directly, including: 1) Glu15/Ala17-L1611; 2) Arg18-Glu1613; 3) Trp38-Thr1560; and 4) Lys62-D1554, where pairs 2 and 4 form salt bridges to stabilize the complex structure (Fig. 3A). Evidences supporting the reliability of this complex come from the following observations: 1) Functional importance of the toxin residues (i.e. sites 15, 17, 18, 38 and 62) in the interface of the complex has been verified by prior mutagenesis data in multiple toxins, such as BmKM1, Lqh2, Lqh3 and Lqhα IT [19][20][21][22][23][24]30 ; 2) Thr1560 and Glu1613 have been found to be key residues of rNa v 1.2 implicated in Lqh2 binding and Glu1613 was previously identified as a primary component of the receptor site for α -toxins 24,25,31 ; 3) Pairs 1, 2 and 4 were also observed in the AaHII-rNav1.2 VSD model that was built based on atomistic molecular dynamics simulations 29 ; 4) The overall orientation of the toxin relative to the VSD in our complex is similar to models of Lqh2-rNav1.2 25 and AaHII-rNav1.2. In these three complexes, residues from the toxin's J-loop are close to L DIVS3-S4 of the channel and residues from the toxin's B loop to L DIVS1-S2 (Fig. 3A). Apart from the four residues mentioned above implicated in direct contact with the channel, two additional PSSs (Q37 and V39) are also located on the interface of the complex (Fig. 3B). Only one exception in our complex is F1610, a   channel residue derived from L DIVS3-S4 of rNa v 1.2 conferring to toxin binding 25 , is a bit far away from the interface (Fig. 3B). A possible explanation is that this residue directly interacts with the toxin through a conformational adjustment, as proposed in the Lqh2-rNa v 1.2 model 25 . Alternatively, this residue might have a synergistic role with its adjacent residue (e.g. L1611) to bind to the toxin (Fig. 3B).

Models p l
Parameter estimates 2Δl PSSs    32 . To investigate whether these prey and predators have driven positive selection of scorpion α -toxins, we analyzed the evolutionary pattern of their Na v channels in the toxin-bound region, which include 84 members from birds, 28 from lizards, 27 from mammals and 45 from insects ( Fig. S2-Fig. S5). Firstly, we used the same ML models of codon substitution to test selective pressure in the VSD from the four different lineages (birds, lizards, mammals and insects) and, unexpectedly, we detected no any  Table 5. Parameter estimates and likelihood ratio statistics (2Δl) for insect sodium channel genes. signals for their adaptive sequence evolution. The MLEs under M0 suggest that average ω ratios for overall sequence pairs in these lineage ranged from 0.02 to 0.09, far smaller than 0.85 in the toxin gene (Tables 1-5), indicating strong purifying selection on the VSD, consistent with the prediction of M2a and M8 that detected no sites evolved by positive selection. Although M8 fits the data better than M7 in the bird Na v channel (2Δ l = 11.6, 0.002 < p < 0.005), the extra class of sites (3%) has a ω s = 1 rather than > 1 ( Table 2). In the mammalian and insect Na v channels, the MLEs under M8 gave ω s > 1, but their proportion (p1) equals 0 (Tables 4 and 5). Hence, no site was positively selected in these genes. Comparison of the log likelihood values revealed that M0 fits the data worse than other models in all the lineages (Tables 2-5), supporting the presence of variable selective pressure among sites of the VSD. We therefore calculated ω values for each site in the VSD of different lineages by Selecton, a server for detecting evolutionary forces at a single amino-acid site, under M8a 33 . This model is a variation of M8 with the ω s of the extra class sites set to 1 other than > 1 in M8 18,34 and thus suitable for testing purifying and neutral selection in each site (0 < ω ≤ 1). As shown in Fig. 4, each site has different evolutionary rates and in particular a significantly higher ω was observed in sites belonging to the two extracellular loops (L DIVS1-S2 and L DIVS3-S4 ) than those in their respective adjacent helical segments in the predators (Fig. 4). In insects, a similar case was also observed in L DIVS1-S2 . The elevated evolutionary rates in the loops suggest they are under lower selective constraints than the helices, evidence for relaxed purifying selection. This could be ascribed to the distinct physiological role of VSD (DIV) in the Na v channel fast inactivation, in which the helices are structurally and functionally important elements 35 and they thus are expected to be constrained by stronger purifying selection to maintain their sequence and structural conservation. On the contrary, the two loops can tolerate more amino-acid substitutions in the absence of functional constraint.
To provide further evidence for relaxed purifying selection in the toxin-bound regions (L DIVS1-S2 and L DIVS3-S4 ) of Na v channels, we employed six fixed-sites models (A to F, from simple to complex) developed by Yang and Swanson 36 to compare their ω values with those of the non-toxin-bound regions (S1-S4 and the intracellular loop) from the four different lineages. As shown in Supplementary Tables 2 to 5, in all the lineages, the three more complex models (D, E, and F) that assume different ω ratios between two partitions convergently fit the data better than the three more simple models (A, B, and C). For example, the LRT statistics between C and E are 14.4 for birds, 33.0 for lizards, 43.2 for mammals, and 12.4 for insects (p < 0.005) (Table S2-Table S5). These models all suggest that the two partitions have very different ω values and overall the toxin-bound regions evolved two folds more quickly than its neighboring regions (Fig. 5), in line with the opinion of relaxation of purifying selection in these extracellular loops (Fig. 4).

Amino Acid Variability in Toxin-Bound Regions.
Given the extensive existence of co-evolution between proteins and their interaction partners (e.g. ligand-receptor pairs) 37,38 , the lack of matched PSSs in the toxin-bound region of Na v channels is puzzling as strong positive selection signals exist in the channel-bound region of the toxins (Fig. 3). To address this puzzling, we further analyzed sequence conservation in the VSD using WebLogo, a web-based application designed to make the generation of sequence logos 39 . The results indicate that the two loops (L DIVS1-S2 ; L DIVS3-S4 ) are highly variable in birds, lizards and mammals relative to their adjacent transmembrane helices (S1-S4) that show more conservation (Fig. 6A). In insects, more variability was also found in L DIVS1-S2 . In parallel, we calculated the sequence logo of the toxin family, confirming the variability of the two positively selected loops (B-and J-loop) (Fig. 6B). The evolutionary variability of the toxin-bound region in VSDs of the Na v channels is further confirmed by ConSurf, an algorithmic tool for the identification of variable and conserved regions in proteins by surface mapping of phylogenetic information 40 . As shown in Fig. 7A, BmKM1 binds to the two variable loops of the mammalian VSDs primarily via its PSSs (shown in blue). In the other two vertebrate predators (birds and lizards), the variability also occurs in similar regions of their VSDs. In accordance with the sequence logo, the insects have only one variable loop (Fig. 7B). These results suggest that the amino acid variability observed within the VSD in predators and prey of scorpions is a consequence of relaxed purifying selection, leading to their higher evolutionary rates.

Channel's Variability Driving Adaptive Evolution of Scorpion Toxins.
In the toxin-channel complex, we have established several pairs of interactions, in which three channel sites located in the two loops are involved in interaction with the PSSs of toxins, including 1560, 1611 and 1613 (Fig. 3), and two of them exhibit a side-chain variability: 1560: E/K/Q/T/V in birds, D/E/I/K/T/V in lizards, E/D/K/S/T/V in mammals, N/S/T in insects; 1613: E/D/G/K in birds, E/D/G/K/Q in lizards, A/D/E/G/T in mammals. Besides these point mutations, L DIVS3-S4 also contains some insertion/deletion (indel) mutations in birds and mammals (Figs. S2-S4). Collectively, these variable toxin-bound sites derived from the predators and prey may drive accelerated changes of their interacting residues (PSSs) in scorpion toxins for maintaining abilities in both defense and attack during evolution (Fig. 8). These observations also account for evolutionary rationality of gene duplication of scorpion toxins in a specific species since multiple toxin members may facilitate to deal with a diversity of competitor species via rapidly changing their bioactive surfaces by positive selection (Fig. 8). Given that the majority of animal toxins exist in a multigene form 41 , our finding may be of general significance in understanding forces driving the evolution of these toxins in diverse venomous species.
The conservation analyses presented here indicate that the predators may exert greater selective pressure on scorpions than their prey as the toxin-bound region of these predators' sodium channels display higher variability than their insect counterparts via point mutations and indels. Higher evolutionary divergence rates of the VSD in these vertebrate predators could be also related to their gene numbers. In insects and other invertebrates, Na v channels exist as single or very low copy genes 9,42,43 and this could lower their divergence rates among orthologous genes. In contrast to the prey's counterpart, Na v channels in the predators have undergone gene expansion to form a multigene family with differential tissue distribution 9,42,43 . In mammals, Na v 1.1-Na v 1.3 are mainly expressed in the central nervous system (CNS); Na v 1.4 in skeletal muscles; Na v 1.5 in cardiac muscles; Na v 1.6 in both central and peripheral nervous system; Na v 1.7 in the peripheral nervous system; Na v 1.8 and Na v 1.9 in sensory neurons. Therefore, to cope with these diverse Na v channels from different tissues of the large-sized predators, scorpions need to evolve more paralogous toxins with altered bioactive surfaces (Fig. 8). However, it was found that scorpions seldom use venom to prey on small-sized insects 32 , in line with the opinion that insects may exert smaller impact to scorpions during evolution. The discovery that scorpion α -toxins bind to evolutionarily variable regions of their targets is of rather importance in that it suggests that mutations at these regions by relaxed purifying selection might be a key factor driving the adaptive evolution of scorpion α -toxins. The potential role of purifying selection in promoting the evolution of protein-protein interfaces from cytochrome c oxidase I was recently reported 44 .

Conclusions
Rapid evolution driven by positive selection has been observed in a variety of gene families, such as viper phospholipase A2, spider HSP70, vertebrate antimicrobial cathelicidin, neurotoxins from scorpions and cone snails 13,45-48 . However, driving forces responsible for their evolution remain unsolved. Based on a combination of analyses of the evolution of both toxins and channels, we provide evidence in favor of the variability of toxin-bound region in Na v channels from predators and prey of scorpions as a driver for accelerated evolution of toxin functional regions following gene duplication (Fig. 8).
It has been proposed that co-evolution between predators and prey might engender evolutionary forces to motivate rapid diversification of functional surfaces of toxins 49 . However, the discovery of relaxed purifying selection in the toxin-bound region of Na v channel raises a possibility of non-classical co-evolution, in which toxins are more selected to evolve rapidly for deterring physically huge predators 50,51 . On the contrary, the channels of predators and prey appear to experience no selective pressure from scorpions, as identified by the lack of positive selective signal in their toxin-bound regions. In addition to scorpion toxins, toxins from other venomous animals also target the two evolutionarily variable loops of Na v channels 52 . Our conclusion might thus be of general significance in understanding the evolutionary rationality of gene duplication in venomous animals. Na v channels from three mammalian species (Mus musculus, Rattus norvegicus and Homo sapiens), each consisting of nine paralogous genes, 45 from insects of eight orders, 84 from birds, and 28 from lizards. Nucleotide sequences were aligned by Clustal X (1.83) and corresponding alignments were used to construct neighbor-joint (NJ) trees for maximum likelihood (ML) models of codon substitution.
Multiple sequence alignments of toxins and channels generated by Clustal X were used to create sequence logos, a graphical representation for depicting conserved and variable regions within a protein family, by WebLogo 39 . To calculate conservation scores for sites in the M. martensii α -toxins and the VSD of Na v channels from different lineages (birds, lizards, mammals and insects), we used the ConSurf program (http://consurf.tau.ac.il/) to analyze their amino acid sequences under default parameters. Similar to WebLogo, ConSurf also identifies both conserved and variable regions of a protein family, but it depends on not only sequence alignment but also phylogenetic tree. A Bayesian tree was constructed with the JTT evolutionary substitution model for each data 53 . Also, Consurf can clearly define the variability and conservation of proteins on their molecular surfaces.
Positive Selection Analysis. It is generally accepted that the nonsynonymous to synonymous substitution rate ratio (ω = d N /d S ) > 1 significantly is key evidence of protein adaptive evolution. To make a reliable test of positive selection, we employed two pairs of maximum likelihood (ML) models (M1a/ M2a and M7/M8) to measure selective pressure among sites of the M. martensii α -toxins and the Na v channels from different evolutionary lineages, in which the two null models (M1a and M7) do not allow sites with ω > 1 whereas their alternative models (M2a and M8) assume the existence of such sites (ω > 1) 54 . As M1a and M7 are nested within their respective alternative models (M2a and M8) and have two more parameters, the χ 2 distribution may be used for a likelihood ratio test (LRT) to compare the fit between the two competing models. The Naïve Empirical Bayes (NEB) and the Bays Empirical Bayes (BEB) analyses were then used to calculate the posterior probabilities of ω classes for each site. The BEB is an improvement of NEB as it accounts for sampling errors in the ML estimates of parameters in the model 18 . Sites with a high probability (≥ 90%) of coming from the class with ω > 1 are likely to be under positive selection and those located in the core-domain of scorpion α -toxins were used for next analysis.
The ω values for each sites of the VSD in different lineages were calculated on the Selecton server 33 under the M8a model 34 and ω values between different partitions of the VSD were compared by fixed-sites models implemented in PAML 36 . In these analyses, the two extracellular loops of VSD for toxin binding is considered as one part while the remaining region as another part. Six fixed-sites models were used here: Model A, totally identical parameters including branch lengths, transition/transversion rate ratio κ , ω , and the nine parameters for the codon frequencies; Model B, different codon frequencies but equal different κ , ω and codon frequencies between two partitions; Model C, different codon frequencies and branch lengths but equal different κ and ω between two partitions; Model E, different κ , ω , branch lengths and codon frequencies between two different partitions; and Model F, separate analysis which assumes these two partitions have independent substitution parameters. The likelihood ratio text can be used to determine whether or not κ and ω are identical between two partitions 36 . Molecular Docking. ZDOCK (version 3.0.2), a Fast Fourier transform (FFT)-based, rigid-body protein-protein docking program 28 (http://zdock.umassmed.edu/), was employed to construct the complex model between BmKM1 and the rNa v 1.2 VSD. ZDOCK searches all possible binding modes in the translational and rotational space between the two proteins and performs scoring calculations based on a combination of statistical potential of interface atomic contact energies (IFACE), shape complementarity and electrostatics 55 . The atom coordinates of BmKM1 (PDB entry 1SN1) and the previously published structure of the VSD of rNa v 1.2 (residues 1520-1645) 29 were used as inputs for ZDOCK calculations. On the basis of previous mutational data both from the toxins and the Na v channels, we specified binding site residues for filtering output predictions, which include sites 15, 17, 18, 38, and 40-43 in scorpion α -toxins (BmKM1, Lqh3, Lqhα IT, and Lqh2) 14 ; and sites 1560, 1610, 1613, 1617-1620 in the VSD of Na v channels (rNav1.2 and rNa v 1.4) 25,56 . All these sites are associated with the toxin-channel interaction, as identified by their mutations significantly affecting the binding of at least one toxin or channel mentioned here.
In terms of structural displays, unless otherwise indicated, structural figures presented here were prepared by PyMOL (www.pymol.org).