Strong and widespread action of site-specific positive selection in the snake venom Kunitz/BPTI protein family

S1 family of serine peptidases is the largest family of peptidases. They are specifically inhibited by the Kunitz/BPTI inhibitors. Kunitz domain is characterized by the compact 3D structure with the most important inhibitory loops for the inhibition of S1 peptidases. In the present study we analysed the action of site-specific positive selection and its impact on the structurally and functionally important parts of the snake venom Kunitz/BPTI family of proteins. By using numerous models we demonstrated the presence of large numbers of site-specific positively selected sites that can reach between 30–50% of the Kunitz domain. The mapping of the positively selected sites on the 3D model of Kunitz/BPTI inhibitors has shown that these sites are located in the inhibitory loops 1 and 2, but also in the Kunitz scaffold. Amino acid replacements have been found exclusively on the surface, and the vast majority of replacements are causing the change of the charge. The consequence of these replacements is the change in the electrostatic potential on the surface of the Kunitz/BPTI proteins that may play an important role in the precise targeting of these inhibitors into the active site of S1 family of serine peptidases.


Results and Discussion
Snake venom Kunitz/BPTIs are encoded by multigene families. By screening of Vipera ammodytes genomic library we isolated four different Kunitz/BPTI genes. Five additional Kunitz/BPTI genes were obtained by PCR amplification of V. ammodytes genomic library. The lengths of the PCR amplified genes were 1449-1456 bp, while more than 2 kb of genomic clones have been sequenced. Seven V. ammodytes Kunitz/BPTI genes are chymotrypsin inhibitors since P1 amino acid residue is leucine (L), while two genes (clones 562E and 648) encode trypsin inhibitors since a typical lysine (K) is located at the P1 site 8 (Suppl. Fig. 1).
To determine the gene structure of V. ammodytes Kunitz/BPTI genes, we first obtained the sequence of genomic DNA. A comparison of the V. ammodytes Kunitz/BPTI genomic sequence with the corresponding cDNA sequences 10 revealed that the genes contain three exons (a 5′ exon, an internal exon, and a 3′ exon) interrupted by two introns, which possess canonical GT-AG splice sites. First exon encodes signal peptide and 4 amino acid residues of a propeptide, major part of the protein is encoded in the second exon, while the remaining 8-13 amino acid residues are encoded in the third exon together with the 3′ UTR ( Fig. 1, Suppl. Fig. 2). V. ammodytes Kunitz/BPTI genes possess the same exon-intron organization as the other known snake Kunitz/BPTI genes [19][20][21][22][23] (Suppl. Table 1). The lengths of exons in all V. ammodytes Kunitz/BPTI genes range in size from 85 to 162 bp. The first and the third exon are the most conserved exons of all Viperidae Kunitz/BPTI genes. Both introns of the V. ammodytes Kunitz/BPTI genes are located in positions homologous to those occupied by the introns of the snake Kunitz/BPTI genes [19][20][21][22][23] (Suppl. Table 1). In Kunitz/BPTI genes of venomous snakes, the comparison of the sizes and nucleotide sequences of all introns shows a very high level of conservation (Suppl. Table 1). The CR1 LINE element is present in the second intron of V. ammodytes Kunitz/BPTI genes.
New snake venom Kunitz/BPTI genes have been searched in different WGS databases. In draft genomes of Vipera berus, Crotalus mitchellii and Ophiophagus hannah we have found numerous novel genes with Kunitz domain. Some of them contain single Kunitz domain while others encoded both Kunitz and WAP domains. Since the genome of C. mitchellii contain quite short contigs, we have obtained only four partial genes with Kunitz domain. In draft genomes of V. berus and O. hannah we have found several full length Kunitz/BPTI genes and Kunitz-WAP genes (Suppl. Table 1).
KU-WAP genes contain 4 exons and 3 introns. It is apparent that the exon encoding WAP domain is inserted into the intron 2 of Kunitz/BPTI gene (Fig. 1). Interestingly, exons 1 and 2 and intron 1 of Kunitz/BPTI genes are nearly the same in KU-WAP genes as well as the last exons (exons 3 and 4, respectively). Kunitz/BPTI and KU-WAP genes have highly similar sequences and gene organization. In contrast to the Kunitz/BPTI genes, the introns 2 and 3 of KU-WAP genes are longer, with some sequence similarities of intron 2 from Kunitz/BPTI genes. The lengths of introns 2 and 3 are quite variable (Suppl. Table 1).
Tandem duplications of Kunitz/BPTI MFs in Viperidae snakes-evidence from V. berus draft genome. The availability of a few snake draft genomes allowed us to examine not only gene sequences but also the locations and orientations of Kunitz/BPTI and KU-WAP genes in Viperidae snake genomes. We found that V. berus possess three KU-WAP genes within a single contig (accession number JTGP01134455, size of this contig is 19,647 bp). The presence of three KU-WAP genes in the same genomic contig is highly suggestive of intrachromosomal gene duplications. These three KU-WAP paralogs in the V. berus genome occur next to one another in a head-to-tail orientation.
The only data on tandem duplications of Kunitz/BPTI genes are available from the ticks 24 , where the authors found 46 dispersed genes and 9 tandemly duplicated genes. However, the latest tick (Ixodes scapularis) genome data available on Ensemble (http://metazoa.ensembl.org/Ixodes_scapularis/Info/Index) revealed the opposite, with tandem duplications of Kunitz/BPTI genes prevailing in this genome. This also shows that the inference about tandem duplications of genes is highly dependent on the quality of the genome assembly and the lengths of genomic contigs. However, the short lengths of genomic contigs of Viperidae snakes prevent an in-depth insight into the real extent of tandem duplication events of the Kunitz/BPTI genes. As evident from the phylogenetic tree ( Fig. 2), there were numerous duplication events in three Viperidae species with genomic data, and at least a few of them were in tandem.
Evolutionary relationships in snake Kunitz/BPTI multigene families. A ML phylogeny of the snake Kunitz/BPTI multigene families (MF) has been made, with the inferred best fit model of amino acid sequence evolution for these proteins, which was JTT+ G (Fig. 2). In the phylogenetic analysis we included Kunitz/BPTI proteins from venomous snakes only, since we focused on the Kunitz/BPTI genes which were functionally diversified (as evident from the presence of MF for these genes). From the snake Kunitz/BPTI ML tree we can infer how old the genes are and how and where they originated. A number of species-specific paralogs can be recognized in the ML tree, such as in Daboia, Bungarus etc. These paralogs group together in a species-specific manner, indicating that they are quite young gene duplicates. We also found older genes, where paralogs show dispersed pattern of distribution and are present in diverse species that belong to different snake families (Viperidae, Elapidae and even Colubridae). By searching with V. ammodytes trypsin inhibitor we found highly conserved genes that show up to 80% amino acid identity in Elapidae and Colubridae, which indicates that they belong to older Kunitz/ BPTI genes that are about 50 My old 30 . These genes are indeed KU-WAP genes. ML phylogeny of venomous snake Kunitz/BPTI multigene families has shown that the observed diversity of these proteins is the consequence of several rounds of gene duplications at different time points. It is very likely that the observed diversity of venomous snake Kunitz/BPTI multigene families was strongly influenced by the predator/prey arms race.
Strong positive selection is acting on the snake Kunitz/BPTI family. Evolution of snake Kunitz/ BPTI genes was first evaluated under several standard models of sequence evolution as implemented in the Selecton. We used the site-specific model that accounts for the rate variation across the sites (Suppl. Table 2). Visualization of site-specific ω estimations under the M8 and MEC models was obtained by translating the scores to a discrete color scale and their projection onto the 3D structure of textilinin-1 11 (PDB: 3BYB) using the PyMol program 32 (Fig. 3).
Positive selection, as detected by the Selecton, has been conclusively supported by the Single Likelihood Ancestral Counting (SLAC) 33 , Fixed Effects Likelihood (FEL), Fast Unconstrained Bayesian AppRoximation Figure 2. ML phylogeny of the snake venom Kunitz/BPTIs. Phylogenetic analysis of the snake venom Kunitz/ BPTI genes was conducted using the translated amino acid sequences of the exon 2. We used MEGA6 53 to select the best model of sequence evolution for the snake venom Kunitz/BPTI family. The best fit model of amino acid substitution for our data set was determined as JTT + G. ML tree represents the bootstrap consensus following 500 replicates, nodes with confidence values greater than 50% are indicated. As an outgroup, we used bovine BPTI. Phylogenetic analyses were performed with the program MEGA6 53 .
(FUBAR) 34 , Mixed Effects Model of Evolution (MEME) 35 and the integrative selection analyses, implemented through the Datamonkey server 29 . MEME, SLAC, FEL, FUBAR and integrative approach have identified a very similar set of positively selected codons (Suppl. Table 3). The MEME model found the largest number of positively selected codons, followed by the SLAC, FUBAR, and FEL (Suppl. Table 3). Some sites were identified only by the Selecton models or by the Datamonkey models (Suppl . Tables 2 and 3). Evidence provided by various analyses (M8, MEC, SLAC, FEL, MEME, FUBAR and integrative selection analyses) revealed the strong influence of positive diversifying selection on the snake venom Kunitz/BPTI genes.
We compared the amount of positively selected sites in snake Kunitz/BPTI inhibitors with the only two previous studies that also included analysis of positively selected sites in Kunitz/BPTIs, one in vampire bats 25 and the other in ticks 24 . No positively selected sites were found in tick Kunitz/BPTI inhibitors; however, they were found in tick channel inhibitors that evolved from the Kunitz/BPTI inhibitors 24 . The number of positively selected sites under M8 model in two-domain Kunitz/BPTIs of vampire bat is indeed much smaller than in snake venom Kunitz/BPTI proteins. In vampire bat Kunitz/BPTIs 4 positively selected sites were found under the M8 model in the first Kunitz domain and 5 in the second Kunitz domain. From our analysis it is evident that snake venom Kunitz/BPTI inhibitors possess the highest number of positively selected sites in the Kunitz/BPTI family. Our results clearly confirm that positive selection in snake venom Kunitz/BPTIs is much more influential than in vampire bat or ticks both in terms of the number of amino acids under selection and in the strength of selection.
Positively selected amino acids in snake Kunitz/BPTI genes are located in functionally important regions. As a means to investigate selection patterns and to assess their influence on the structure and function of these proteins, we mapped the sites under selection on the 3D structure of snake Kunitz/BPTI protein textilinin-1 11 (PDB: 3BYB). The numbering of amino acid positions is the same as in the textilinin-1. The locations of major structural elements in the textilinin-1 are the following: 3 10 11 .
In order to obtain some insights into the roles that positive selection might play, we mapped positively selected amino acids onto the 3D structure of the textilinin-1. We have mapped 18 positively selected sites detected under M8 model in all snake Kunitz/BPTI data set and found that five amino acids are located in the canonical loop Mapping of the positively selected codons that have been identified by MEME 35 , SLAC 33 , FEL, FUBAR 34 and integrative approach has shown that they are mainly located in the same locations as in the M8 and MEC models (Suppl . Tables 2 and 3 The role of electrostatics in the function and evolution of snake venom Kunitz/BPTIs. Protein surface properties, such as electrostatic potential, play important roles in specific protein-protein and protein-ligand recognition 36 . Proteins and protein surfaces that interact with one another have electrostatic complementarity, and this property of protein-protein interaction is important for defining specificity. Intermolecular interactions involve associations between surfaces with complementary electrostatic potential. All protein charges, and not just the salt bridges or surface charges, contribute towards the value of electrostatic complementarity at the interface 37 . The distribution of positively selected sites on the 3D Kunitz/BPTI structure has shown an interesting pattern. We demonstrated the presence of positively selected sites in the inhibitory loops 1 and 2, which form a binding interface. These loops are the most important parts of the Kunitz/BPTI proteins for the inhibition of the S1 serine peptidases 6,16 . The remainder of the Kunitz/BPTI structure forms the inhibitory scaffold, which is important in maintaining the loop conformation 3,16 . Surprisingly, we found unexpectedly high numbers of positively selected sites in the scaffold (Suppl. Table 4). Under all models, the positively selected sites in the scaffold strongly outnumber the positively selected sites in the inhibitory loops.
What is the consequence of the positively selected sites in the Kunitz/BPTI scaffold? When we investigated the alterations of charge and polarity for the amino acids identified as being under positive selection, the majority of amino acid sites exhibited at least one charged state alteration (Suppl. Fig. 3). Thus, the substitutions in positively selected sites could contribute with greater potency to the variations of the electrostatic potential on the surface of the snake venom Kunitz/BPTI proteins.
The program TreeSAAP v.3.2 38 has been used to detect positive selection based on the physiochemical changes in protein sequences. The sliding window analysis in TreeSAAP identified 20 amino acid properties to be under positive destabilizing selection in the snake venom Kunitz/BPTI data set (P < 0.001) ( Fig. 4; Suppl. Fig. 4). The following 8 amino acid properties are highly important for the electrostatics of the snake venom Kunitz/BPTI: hydropathy, surrounding hydrophobicity, isoelectric point, polar requirement, polarity, solvent accessible reduction ratio, long range energy as well as the short and medium range energy. These properties belong to the following categories: hydrophobicity, ionization constants, polarity and polarizability, solvent accessibility and  non-bonded energy ( Fig. 4; Suppl. Fig. 4). The remaing 12 amino acid properties belong to the following categories: molecular size and composition, secondary structure and tertiary structure ( Fig. 4; Suppl. Fig. 4) and are not important for the electrostatics of the snake venom Kunitz/BPTI proteins.
The above analyses have demonstrated that the selection pressure has favored the acquisition of a distinct distribution of surface electrostatic potential in the snake venom Kunitz/BPTIs for more efficient inhibition of the diverse prey S1 serine peptidases. To better understand the potential functional changes during the evolution of snake venom Kunitz/BPTI multigene families, we analysed their 3D surface electrostatic potentials. Figure 5 shows the electrostatic surface potentials for a few snake venom Kunitz/BPTIs. We found that the distribution of surface electrostatic potential is markedly different among these proteins. The electrostatic potential of the surface might be affected more frequently with amino acid substitutions in the positively selected sites.
The BPTI fold is indeed quite rigid, without any dynamic conformational changes upon or before the inhibition of the target S1 peptidase 3 . Enzymes and their inhibitors have coevolved to form an interface with a high degree of surface complementarity 39 . Studies of the trypsin-BPTI complex have demonstrated the striking electrostatic and steric complementarity between the interacting surfaces in a "typical" protein-protein interaction. The driving forces for the binding of BPTI to trypsin are thus a complementary surface fit and electrostatic complementarity. The combination of the attractive and repulsive forces may help guide BPTI into the correct position for binding before actually contacting the active site 40 .
It is apparent that the consequence of the functional diversification in the snake venom Kunitz/BPTI multigene families has been the alteration of the electrostatic potential on the surface. The functional consequences Figure 5. Electrostatic potentials of the Vipera berus Kunitz/BPTI inhibitor (VbeJTGP01198494), Textilinin-1 (3byb) and beta-bungarotoxin B2 chain (1bun). The electrostatic potential of the models was calculated by the Adaptive Poisson-Boltzmann Solver (APBS) 58 using the PyMOL plug-in with the default parameter settings. ± 5 kT/e electrostatic potential of the Vbe Kunitz/BPTI, Textilinin-1 and beta-bungarotoxin B2 chain was plotted on the solvent-accessible surface in PyMOL. The model of Vipera berus Kunitz/BPTI inhibitor (VbeJTGP01198494) was generated using Modeller 59 with the structure PDB 3byb as a template. All images were prepared using The PyMOL Molecular Graphics System (Schrödinger, LLC). can be associated with interactions and the recognition of the snake venom Kunitz/BPTIs with the diverse prey S1 serine peptidases.

Why snakes need numerous and diverse Kunitz/BPTI inhibitors in their venoms?
The presence of diverse and numerous Kunitz/BPTI inhibitors in snake venoms (encoded by multigene families) represents an unusual situation 10,41,42 . It was demonstrated long ago that Kunitz/BPTI inhibitors are broad spectrum inhibitors of S1 family of serine peptidases 4,5,8,43 . In vertebrates S1 family peptidases belong to at least 8 putative clades (cell-mediated immunity, kallikrein, plasmin/HGF, degradative clades 1 to 3, fibrinolytic and ≫clotting and humoral immunity≪ ) 44 . In addition to trypsin and chymotrypsin (members of the degradative 1 and 2 clades), Kunitz/BPTI inhibitors can inhibit diverse representatives of serine peptidases from the majority of above mentioned clades 9,11,43 . The differences in the inhibition are attributable to the specificity and affinity of inhibition, and the inhibitory range can vary from micro to pico/nanomolar 45 . S1 family of serine peptidases is the largest family of peptidases 46 and can reach particularly large numbers in metazoans (from 100 to nearly 500) as well as in vertebrates (from 100 to over 300), where mammalian representatives − mouse (208), rat (217) and human (199)-possess quite large numbers of S1 peptidases (data obtained from the Merops Db). Since the vertebrate prey of venomous snakes possess very large numbers of S1 peptidases (from 100 to few hundreds), it is apparent why venomous snakes have large Kunitz/BPTI multigene families that can easily cope with the diverse S1 peptidases in their prey. It is interesting that the majority of venomous snakes also possess the combined KU and WAP domains, because both domains are inhibitors of S1 peptidases 47 . In this way the venomous snakes can easily increase their inhibitory range and the efficiency against the diverse prey S1 peptidases.
In snake venoms, Kunitz/BPTI inhibitors play important roles in inhibiting of quite large repertoires of prey S1 peptidases. Nevertheless, the snakes can also protect themselves in the case of attacks. In fact, the snake venom Kunitz/BPTI inhibitors and prey S1 peptidases are caught in a constant arms race-as is the case for other snake venom components 48 . Since venomous snakes attack diverse prey or use venom for defense, the cocktail of diversified Kunitz/BPTIs can serve as a good source of effective inhibitors of the S1 family of serine peptidases.

Conclusions
Although we demonstrated adaptive evolution in snake venom Kunitz/BPTI proteins 10 , the exact mechanism of their evolution has remained unknown. In the present study we used a large collection of snake venom cDNAs and genes encoding Kunitz/BPTI family of proteins to analyse the action of site-specific positive selection and their impact on the structurally and functionally important parts of the Kunitz fold. By using diverse models we demonstrated the presence of large numbers of site-specific positively selected sites that can reach between 30-50% of the Kunitz domain. The mapping of the positively selected sites on the 3D model of Kunitz/BPTI inhibitors has shown that these sites are located in the structurally most important part of the molecule, in the inhibitory loops, but also in the Kunitz scaffold. Amino acid replacements are localized exclusively on the surface of the molecule, and the vast majority of replacements are causing the change of the charge. The consequence of these replacements is the change in the electrostatic potential on the surface of the Kunitz/BPTI proteins that may play an important role in the precise targeting of these inhibitors into the active site of S1 family of serine peptidases. The presence of the multigene families of Kunitz/BPTIs in venomous snakes can be explained by the target-oriented arms race, since the number of S1 peptidases in vertebrate preys can reach up hundreds of representatives. Since Kunitz/BPTIs are broad spectrum inhibitors, they can be functionally diversified to target numerous and diverse S1 peptidases in their prey. Comparison of the Kunitz/BPTI inhibitors from venomous snakes with ticks and vampire bats has demonstrated that they experienced strong and widespread action of site-specific positive selection only in the venomous snakes.

Materials and Methods
Screening the genomic library, cloning and sequencing of V. ammodytes Kunitz/BPTI genes.
The genomic DNA library from V. ammodytes 49 , prepared in phage λ GEM 12, was screened with a 155 bp long fragment of the V. ammodytes trypsin inhibitor. The preparation of 35 S-labelled probe was described previously 10 . The library was screened by the plaque hybridization method 50 under the following conditions: 24 h hybridization at 42 °C in hybridization buffer (6 X SSC (NaCl/Na-citrate), 50%(v/v) formamide, 5 X Denhardt's solution, 0.5% (w/v) SDS and 500 μ g denatured herring sperm DNA. The filters were washed in 4 X SSC/0.1% SDS for 15 min and 2 X SSC/0.1% SDS for 5 min at room temperature. Positive clones were purified by repeating the screening procedure. Phage DNA was isolated from plate lysates 50 and digested with Eco RI, Hind III, Sac I and Bam HI restriction enzymes. Restriction fragments were separated by 0.7% agarose gel electrophoreses. Positive fragments were determined by Southern blotting using 35 S-labelled probe described above and ligated into pUC19 vector. The inserts were sequenced on both strands with an ABI 310 sequence analyzer using BigDye chemistry (Applied Biosystems). electrophoresed on 1% agarose gel and cloned into pGEM-T by pGEM-T Easy Vector System I kit (Promega). PCR amplified V. ammodytes Kunitz/BPTI genes were sequenced as described above.

PCR amplification of V. ammodytes
Database mining. The databases analyzed were the nonredundant, EST, TSA and WGS databases at the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov). Comparisons were performed using the diverse BLAST tools 51 , with the E-value cutoff set to 10 −5 and other parameters to default settings. Snake Kunitz/BPTI proteins, cDNAs or genes have been used as queries. DNA sequences were translated using the Translate program (http://web.expasy.org/translate). Orthologs and paralogs of the snake Kunitz/BPTI family have been identified in NCBI snake transcriptome, genome and proteome databases. The reference set of all representatives of the snake Kunitz/BPTI family is available in the Supplementary Table 5.
Phylogenetic analysis of snake Kunitz/BPTI multigene families. We analysed 100 Kunitz/BPTI (including KU-WAP) genes and cDNAs, representing 32 venomous snake species (Suppl. Fig. 5). For the phylogenetic analysis we used 22 novel Kunitz/BPTI genes from four venomous snakes (V. ammodytes, V. berus, C. mitchellii and O. hannah). The Kunitz domain in the newly discovered representatives of the Kunitz/BPTI family has been identified using the SMART (http://smart.embl-heidelberg.de/), InterPro (https://www.ebi.ac.uk/interpro/) and Pfam (http://pfam.xfam.org/) domain databases. The protein sequences were aligned using Clustal Omega 52 . Molecular phylogenetic analyses of snake Kunitz/BPTI genes were conducted using the translated amino acid sequences. We used MEGA6 53 to select best models of sequence evolution for the data from snake Kunitz/BPTI family. The best fit model of amino acid substitution for our data set was determined as JTT + G according to either BIC (Bayesian information criterion), AICc (Akaike information criterion) or lnL. Phylogenetic trees were reconstructed using the neighbor-joining (NJ) method 54 and the maximum likelihood (ML) method 55 . The reliability of the resulting topologies was tested by the bootstrap method, 500 bootstrap replications were used. Condensed ML tree is shown as a Suppl. Fig. 6. As an outgroup, we used bovine BPTI. Phylogenetic analyses were performed with the program MEGA6 53 .

Selection analyses. The Selecton server (Server for the Identification of Site-Specific Positive and Purifying
Selection, version 2.4; http://selecton.tau.ac.il/index.html) 27,28 has been used to identify the ratio of nonsynonymous and synonymous substitutions (dN/dS; termed ω ) at each codon site based on an empirical Bayesian method. The pressure of selection can induce either purifying or positive selection at specific areas of the sequence where sites with ω values significantly higher or lower than one are an indication of positive or purifying selection, respectively 26 . 100 aligned coding sequences (exon 2) of the snake Kunitz/BPTI genes have been analysed. To identify codons under positive selection, we used two tests: M8 vs. M8a and MEC vs. M8a. M8 assumes a beta distribution, from 0 to 1, of ω for sites and an additional class of sites under positive selection (ω > 1), while M8a acts as a null model by fixing this last class of sites at ω = 1. Following these analyses, a likelihood ratio test was conducted on each model pair to determine if there were significant likelihood gains by allowing positive selection. Nested pairs of models (M8a and M8) and a non-nested pair (M8a and MEC) were compared using the likelihood ratio test implemented in the Selecton program. The results from Selecton version 2.4 are visualized with seven-color scale for representing the different types of selection. To identify the statistically significant levels of the results, the LRT was conducted to compare the nested models 56 . The Bayes empirical Bayes (BEB) approach 57 was used to identify amino acids under positive selection by calculating the posterior probabilities that a particular amino acid belongs to a given selection class (neutral, conserved or highly variable). Sites with greater posterior probability (PP ≥ 95%) of belonging to the 'ω > 1 class' were inferred to be positively selected.
We further assessed the impact of positive selection using additional codon models to estimate the rates of synonymous and nonsynonymous substitutions 33 . Four methods implemented on the DATAMONKEY web server 29 (http://www.datamonkey.org/), Single Likelihood Ancestral Counting (SLAC), Fixed Effects Likelihood (FEL), and FUBAR 34 , were used. After reconstructing ancestral sequences, SLAC compares normalized expected and observed numbers of synonymous and nonsynonymous substitutions per variable site. FEL compares the instantaneous synonymous site rate (α ) and the instantaneous nonsynonymous site rate (β ) on a per site basis, without assuming a prior dN/dS distribution. Sites with cut-off values of P < 0.1in SLAC and FEL and PP < 0.9 in FUBAR were considered as candidates to have evolved under positive selection. In all analyses performed in DATAMONKEY, the most suited model of evolution for each data set, directly estimated on this web server, was used. In addition, the mixed effects model of evolution (MEME), a branch-site method incorporated in the DATAMONKEY server, was used to test for both pervasive and episodic diversifying selection 35 . MEME is a generalization of FEL but models variable ω across lineages at individual sites, restricting ω to be ≤ 1 in a proportion p of branches and unrestricted at a proportion (1 − p) of branches per site. Positive selection was inferred with this method for a P value < 0.05. PyMOL 32 was used to visualize the structure and locations of the positively selected sites that have been identified by the likelihood ratio tests as well as for the visualization of the electrostatic potential maps.
Investigating selective influences. Amino acid substitutions have a wide range of effects on a protein depending on the difference in physicochemical properties and location in the protein structure. This approach provides further resolution to differentiating between types of selective pressures with the ability to detect positive and negative and stabilizing and destabilizing selection and offers insights into the structural and functional consequences of the identified residues under selection. Significant physicochemical amino acid changes among residues in snake Kunitz/BPTI genes were identified by the algorithm implemented in TreeSAAP 38 , which compares the observed distribution of physicochemical changes inferred from a phylogenetic tree with an expected distribution based on the assumption of completely random amino acid replacement expected under the condition of selective neutrality. The evaluation of the magnitude of property change at nonsynonymous residues and their location on a protein 3D structure may provide important insight into the structural and functional consequences of the substitutions. Eight magnitude categories (1 to 8) represent one-step nucleotide changes in a codon and rank the correspondent variation in a property scale of the coded amino acid. Categories 1 to 3 indicate small variation in the amino acid characteristics while categories 6 to 8 represent the most radical substitutions. By accounting for the property changes across the data set, a set of relative frequencies changes for each category is obtained allowing to test the null hypothesis under the assumption of neutral conditions [65]. The categories for which the observed numbers of amino acid replacements in the data set is significantly different from the null model (z-scores > 1.645; P < 0.05) are considered as being potentially affected by selective pressures. Here we focus on amino acid differences that correspond to radical physicochemical variation (positive-destabilizing selection) and are expected to be linked with significant changes in function. TreeSAAP categorizes each amino acid site by positive and negatively destabilizing using 31 properties. To detect strong directional selective pressure, only changes corresponding to categories 6 to 8 (the most radical property change categories) at the P ≤ 0.001 level were considered. The total number of changes per site is the sum of those occurring in each branch of the phylogeny.