Molecular diversity and evolutionary trends of cysteine-rich peptides from the venom glands of Chinese spider Heteropoda venatoria

Heteropoda venatoria in the family Sparassidae is highly valued in pantropical countries because the species feeds on domestic insect pests. Unlike most other species of Araneomorphae, H. venatoria captures insects with its great speed and strong chelicerae (mouthparts) equipped with toxin glands rather than with a web. Therefore, H. venatoria provides unique opportunities for venom evolution research. The venom of H. venatoria was explored by matrix-assisted laser desorption/ionization tandem time-of-flight mass spectrometry and by analysis of expressed sequence tags. The 154 sequences coding cysteine-rich peptides (CRPs) were grouped into 24 families based on phylogenetic analyses of the precursors and the cysteine frameworks in the putative mature regions. Intriguingly, four kinds of motifs are described for the first time in spider venom. Furthermore, by combining the diverse CRPs of H. venatoria with previous spider venom peptidomics data, the structures of precursors and the patterns of cysteine frameworks were analyzed. This work revealed the dynamic evolutionary trends of venom CRPs in H. venatoria: the precursor has evolved an extended mature peptide with more cysteines and a diminished or even absent propeptide between the signal and mature peptides; and the CRPs evolved by multiple duplications of an ancestral ICK gene as well as by recruitment of non-toxin genes.

Family/cluster identification. In the present study, CRP toxins were classified into 24 families based on the alignment of precursors and their known or predicted cysteine frameworks. The formation of disulfide bonds stabilizes the three-dimensional (3D) structures of toxins, and the cysteine framework is therefore commonly used to classify toxins.
Family 1-8. The full primary sequences of the CRPs in Families 1-8 comprise a signal sequence (19-25 residues) and a propeptide (11-19 residues) preceding the mature toxin sequence. The N- and C-termini of the mature peptides are highly variable regions. The 11 members of Family 1 are homologs of Kappa-SPRTX-Hv1c, including its five different precursors (κ-SPRTX-Hv1c_1-5). The signal peptide mode of the CRPs in this family is 'MKh 12 Sh 5 ', where 'h' indicates a hydrophobic residue, the Arabic numerals denote the number of residues, and capital letters indicate the corresponding amino acids. The propeptides of Family 1 are 19 residues long and end with a highly conserved DEQR endoproteolytic site preceding the mature peptide, named the Processing Quadruplet Motif (PQM) 22 . The mature peptide mode of Family 1 is 'XC I X 6 C II X 5 C III C IV X 4 C V X 3 C VI X 4-6 ', where 'X' is any amino acid. At the C-terminus of the mature peptides, 'GK' serves as the amidation site. The characteristics of the signal peptides, propeptides and mature peptides of Families 2-8 are compared with those of Family 1 in Table 1. According to homology analysis, the mature peptides of Families 1-8 are predicted to adopt the 'classical' Inhibitory Cystine Knot (ICK) motif, which contains three disulfide bonds with I-IV, II-V and III-VI connectivity. The first two disulfide bonds (I-IV and II-V) form an embedded ring which is threaded by the third disulfide bond (III-VI). The backbone regions between successive Cys residues are referred to as loops, numbered starting with loop 1 between Cys I and Cys II 23,24 . There is less amino acid sequence divergence in loops 1 and 3 than in the much more variable loops 2 and 4 and the N- and C-termini of the mature peptides. The precursors of Families 1-7 are more similar to precursors from the same species than to those from other species. Only the sequences of the CRPs in Family 8 show high homology with U23-ctenitoxin-Pn1a and U4-agatoxin-Ao1a from Phoneutria nigriventer and Agelena orientalis, respectively (Supplementary Information File 1, Suppl-Fig. 1). Eight superfamilies of 6-cys ICK motif toxins were also recently identified in Hadronyche infensa 25 .
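The framework notation used for the families can be derived mechanically from a mature peptide sequence. The following is a minimal sketch, not the authors' pipeline: the function names, the bracketed output format ('C[I]', 'X6') and the toy sequence are assumptions made for illustration, and the hard-coded I-IV, II-V, III-VI pairing is simply the classical ICK connectivity described above.

```python
# Roman numerals for labelling successive cysteines (enough for a 12-Cys scaffold).
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X", "XI", "XII"]

# Classical ICK connectivity for a six-cysteine peptide (1-based cysteine indices).
ICK_PAIRS = [(1, 4), (2, 5), (3, 6)]


def _x_token(run):
    """Format a run of non-Cys residues: 'X' for one residue, 'X6' for six."""
    return "X" if run == 1 else f"X{run}"


def cys_framework(mature):
    """Return a framework string such as 'X C[I] X6 C[II] ...' for a mature peptide."""
    tokens, run, seen = [], 0, 0
    for residue in mature.upper():
        if residue == "C":
            if run:
                tokens.append(_x_token(run))
            run = 0
            label = ROMAN[seen] if seen < len(ROMAN) else str(seen + 1)
            tokens.append(f"C[{label}]")
            seen += 1
        else:
            run += 1
    if run:
        tokens.append(_x_token(run))
    return " ".join(tokens)


def loop_lengths(mature):
    """Lengths of the loops between successive cysteines (loop 1 = Cys I to Cys II)."""
    positions = [i for i, residue in enumerate(mature.upper()) if residue == "C"]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


# Hypothetical toy sequence built to match the Family 1 mode; not a real toxin.
toy = "G" + "C" + "A" * 6 + "C" + "A" * 5 + "CC" + "A" * 4 + "C" + "A" * 3 + "C" + "A" * 4
print(cys_framework(toy))
print(loop_lengths(toy))
```

Grouping precursors by the string returned from `cys_framework` gives a first-pass clustering by scaffold; loop-length ranges such as 'X 4-6' would then be summarized across the members of each group.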
Family 9. The precursors of Family 9 have a high content of acidic amino acids in the putative mature peptides, which carry a novel Cys scaffold 'C I X 5 C II X 3 C III X 5 C IV XC V XC VI ' . Since no significant homologous sequence has been found in public protein databases, post-transcriptional processes such as alternative splicing, as well as post-translational modifications, remain uncertain. For this reason, the putative short propeptides are indicated by a dotted border (Supplementary Information File 1, Suppl-Fig. 2). The motif of Family 9 is reported here for the first time in spider venom.
Family 10. Family 10 includes eleven homologous sequences, which are characterized by two consecutive Cys residues in the middle of the signal peptide, which is directly followed by the mature region with a Cys scaffold 'C I X 13 C II X 2 C III X 12 C IV X 3 C V X 8 C VI ' . The scaffold in the mature peptide has been identified as the conserved domain pfam01147 (representative proteins gi: 221468699, 5921747 shown in Supplementary Information File 1, Suppl-Fig. 3), which includes all known crustacean hyperglycemic hormones (CHHs) found in the sinus gland of isopods and decapods 26 and the molt-inhibiting hormone (MIH) of the lobster Homarus americanus 27 . The three disulfide bridges are C I -C V , C II -C IV , and C III -C VI 28 . In addition, the amino acid sequences of several translated cDNAs (gi: 304306070, 304307035, 304306844, 304306583) from the Loxosceles intermedia venom gland library 29 are also similar to those of Family 10 (Supplementary Information File 1, Suppl-Fig. 3). The latrodectins, which were identified in widow spider venom glands, also share six conserved cysteines that adopt the same disulfide bond pairing in the mature peptide 30,31 .

Table 1. Sequence diversity of the predicted 6-cys ICK motif toxins in H. venatoria. In the signal peptide mode, 'h' indicates a hydrophobic residue, and the Arabic numerals denote the number of residues. Capital letters indicate the corresponding amino acids. 'C' is cysteine, 'X' is any residue but cysteine, and 'W/O' denotes no putative C-terminal propeptide.

Family | Signal peptide mode | Length of propeptides/PQM | Mature peptide mode | C-terminal propeptides
1 | MKh 12 Sh 5 | 19/DEQR | XC I X 6 C II X 5 C III C IV X 4 C V X 3 C VI X 4-6 | G(K)
2 | MKTh 11 Sh 5 | 15/AVAR | X 2 C I X 6 C II X 5 C III C IV X 4 C V X 3 C VI X 3 | G
3 | MKh 18 | 15/VAAR | X 5 C I X 6 C II X 6 C III C IV X 3 C V X 3 C VI X 5 | GK(K)
4 | MKITh 15 | 11/VQAR | XC I X 6 C II X 6 C III C IV X 4 C V X 4 C VI X 6 | GK
5 | MKTTh 3 Th 6 Sh 5 | 15-17/ATGR | X 3 C I X 6 C II X 5 C III C IV X 4 C V X 5 C VI X | W/O
6 | MKTTh 3 Th 6 Sh 5 | 15/VTGR | X 2-4 C I X 6 C II X 5 C III C IV X 4 C V X 5 C VI X 3 | RKX 4-5
7 | MKh 5 Th 6 Sh 5 | 15/ITVR | X 2-3 C I X 6 C II X 5 C III C IV X 4 C V X 9 C VI X 4 | G
8 | MKSSh 7 Th 4 Sh 2 EFTRS | 12/VQER | X 2 C I X 6 C II X 4 C III C IV X 4 C V X 8 |
Family 11, 12, 13 and 14. The four families have eight cysteines with a typical motif 'C I X 6 C II X n C III C IV X 4 C V XC VI X n C VII XC VIII ' where X is any residue but cysteine. However, the number and properties of the residues in the loops between C II and C III and between C VI and C VII , and at the C-terminus, vary greatly. The sequences of the signal and precursor proteins, as well as the endoproteolytic sites, are also diverse. In Family 14, there is a long loop between C VI and C VII and a very short propeptide preceding the mature region. Significantly, no propeptide is predicted in the precursor of U32-sparatoxin-Hv1a. The amino acid sequences of Families 11, 12, 13, and 14 are aligned with the most similar known homologs (Supplementary Information File 1, Suppl-Fig. 4). The same cysteine framework is also employed by superfamilies 3 and 23 of H. infensa venom. The eight cysteines are arranged in four disulfide bonds (C I -C IV , C II -C V , C III -C VIII , C VI -C VII ), which form an extended ICK motif 25 .
Family 15. Family 15 includes 20 unique sequences that are highly homologous. It is noteworthy that Family 15 includes transcripts coding a new venom peptide type with a high mRNA expression level in H. venatoria. All 20 sequences code the full-length cysteine framework. The transcripts of U25-sparatoxin-Hv1c and U25-sparatoxin-Hv1j are represented by 30 and 29 copies, respectively, making them the two most abundant precursors identified in the H. venatoria venom cDNA library. There are twelve residues between the signal peptide and the first Cys, but no usual PQM. The mature region is characterized by a novel eight-Cys scaffold 'C I X 21 C II X 4 C III X 9 C IV X 10 C V X 11 C VI C VII X 4 C VIII ' . The transcript of a secretory protein with the identical Cys framework has been identified from the black widow spider (gi: 318087504). However, it is hypothesized to be involved in wrapping silk fibers. Moreover, several hypothetical non-secretory proteins from Amblyomma maculatum also adopt the same eight-Cys framework (Supplementary Information File 1, Suppl-Fig. 5).
Family 16. The six precursors are highly homologous to Omega-agatoxin-1A (gi: 2507406) from Agelenopsis aperta and contain a ten-Cys scaffold 'C I X 6/8 C II XC III X 6 C IV XC V X 7/11 C VI XC VII X 7 C VIII X 5 C IX X 19/20 C X ' , so they belong to the omega-agatoxin superfamily. A notable feature of the prepropeptides in this superfamily is the occurrence of two glutamate-rich sequences interposed between the signal sequence, the major toxin peptide, and the minor toxin peptide. The two subunits form a heterodimer linked by a disulfide bond 32 (Supplementary Information File 1, Suppl-Fig. 6).
Family 17 and 18. Family 17 is homologous to U19-ctenitoxin-Pn1a (gi: 50401390), Hainantoxin-XIV-7 (gi: 310946827), the HWTX-XIVa2 (gi: 166007861) precursor, and a toxin-like peptide (gi: 380692240) from G. rosea. Family 18 is homologous to U3-aranetoxin-Ce1a (gi: 27805756). The precursors in both families contain a signal peptide and a mature peptide with a ten-Cys scaffold of the form 'C I X n C II X 4 C III C IV X n C V X 9 C VI X n C VII XC VIII X 5 C IX X n C X ' . However, the residues in the loops between the cysteines are very different. The loops C IV -C V , C VI -C VII , and C IX -C X are longer in Family 18 than in Family 17 (Supplementary Information File 1, Suppl-Fig. 7). Recently, the ten-Cys scaffold was reported as superfamily 2 (SF2) in the Australian funnel-web spider H. infensa, whose members were named mamba intestinal toxin 1 (MIT1)-like toxins. The precursors of SF2 were likewise found to have no propeptide region 25 .
Family 19, 20 and 21. All three families are composed of a signal peptide and a mature region with a ten-Cys framework 'C I X 7 C II X 8 C III C IV X 4 C V X 5 C VI C VII X 3 C VIII X 3 C IX X 17 C X ' , which has a high degree of similarity to the cysteine framework of U7-agatoxin-Ao1a (gi: 74845728) and U20-lycotoxin-Ls1a/c/d (gi: 313471673/313471696/313471677) from A. orientalis and L. singoriensis, respectively. The amino acids in the loop between C VIII and C IX are conserved in the peptides even across different spider families (Sparassidae, Agelenidae and Lycosidae). The sequences between C IV and C V are also conserved among the three families from the same spider, H. venatoria. However, the sequences in the other regions are less homologous; in particular, the amino acid sequences at the N- and C-termini are highly diverse in Families 19, 20, and 21 (Supplementary Information File 1, Suppl-Fig. 8).
Family 22. The predicted peptide sequences in Family 22 have a disulfide bonding pattern and structure similar to those of U9-agatoxin-Ao1a (gi: 74845712). Their typical Cys scaffold is 'C I X 6 C II X 3 C III XC IV C V X 5 C VI XC VII X 4 C VIII XC IX X 8 C X X 6 C XI X 12 C XII ' . However, the sequences differ considerably in the signal peptides, propeptides, and loops between the cysteines, and a PQM is apparent in U9-agatoxin-Ao1a but not in the precursors of Family 22 from H. venatoria (Supplementary Information File 1, Suppl-Fig. 9).
Family 23. There are three secretory proteins with a long 12-Cys framework in Family 23, as follows: 'C I X 7 C II X 23 C III X 9 C IV X 7 C V X 22 C VI X 15 C VII X 11 C VIII X 11 C IX X 8 C X X 8 C XI X 22 C XII ' . The precursors had no homologs when they were aligned against the GenBank, EMBL, and DDBJ databases. However, TBLASTN searches of the spider EST database matched two sequences that have not been identified as toxins. The amino acid sequences of gi: 304306221 and gi: 189216028, which come from the venom gland cDNA libraries of L. intermedia and Acanthoscurria gomesiana, respectively, are homologous to U28-sparatoxin-Hv1a with 43% (E-value 5e−07) and 48% (E-value 1e−06) positives (Supplementary Information File 1, Suppl-Fig. 10).
Family 24. The two precursors in Family 24, with only two cysteines 'C I X 6 C II X 16 ' in the mature region, have no significant sequence homolog in the GenBank, EMBL, and DDBJ databases. The propeptides were predicted using SpiderP (Supplementary Information File 1, Suppl-Fig. 11). Notably, there are two different probable cleavage modes.
Mass spectrometry reveals complex PTM in spider peptide toxins. The peptides eluted from RP-HPLC separation were collected and analyzed by MALDI-TOF mass spectrometry. Each eluted fraction, defined by its retention time (RT), contained several distinct components. About 140 different peptide masses were detected, most of which fall in the 2800-5000 Da mass range and only 19 of which fall between 5000 and 7000 Da. Given that low-abundance peptides may be lost during isolation and purification, the crude venom was also analyzed directly by MALDI-TOF mass spectrometry, which yielded 227 peptide masses, a considerable part of which are in the 5000-8000 Da mass range. Intriguingly, only a few peptide masses (22) directly match the theoretical molecular weights, even when the disulfide bonds and C-terminal amidation are considered (Supplementary Information File 2 and Supplementary Information File 3). Significantly, many abundant long precursors could not be matched to any mass, which suggests that post-translational modifications are prevalent in H. venatoria venom and remain to be elucidated. Several features were observed. Firstly, the C-terminal loops of most CRP precursors (101 out of 154) are ≥ 5 amino acid residues long. Secondly, the equivocal hydrolysis sites of the propeptides are not the usual motifs recognized by SpiderP. For example, U1-sparatoxin-Hv1c and U23-sparatoxin-Hv are each predicted to have a long N-terminal loop (> 22 amino acids). Both the long C-terminal and the long N-terminal loops offer more opportunities for processing. Thirdly, a propeptide can occur between the cysteines. For example, the precursor of U21-sparatoxin-Hv1a is predicted to undergo post-translational proteolytic processing by proprotein convertases at two sites, one (EQAR) following the signal peptide and the other (REEDELER) between the ninth and tenth cysteines (Supplementary Information File 1, Suppl-Fig. 6).
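The comparison between observed masses and theoretical molecular weights can be illustrated with a short calculation. This is a sketch under stated assumptions rather than the procedure used in the study: it uses monoisotopic residue masses (average masses may be more appropriate, depending on instrument resolution), treats the observed values as neutral masses, and takes a hypothetical matching tolerance of 1 Da. The function names and example values are invented for illustration.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565      # one water added for the free N- and C-termini
HYDROGEN = 1.007825    # each disulfide bond removes two hydrogens
AMIDATION = -0.984016  # C-terminal -OH replaced by -NH2


def peptide_mass(sequence, n_disulfides=0, amidated=False):
    """Neutral monoisotopic mass of a mature peptide with simple PTMs applied."""
    mass = sum(MONO[residue] for residue in sequence.upper()) + WATER
    mass -= 2 * HYDROGEN * n_disulfides
    if amidated:
        mass += AMIDATION
    return mass


def match_masses(observed, theoretical, tol_da=1.0):
    """Pair each observed mass with every named peptide within tol_da (assumed tolerance)."""
    hits = []
    for obs in observed:
        for name, theo in theoretical.items():
            if abs(obs - theo) <= tol_da:
                hits.append((obs, name))
    return hits


# Hypothetical six-cysteine toy peptide with three disulfide bonds and an amidated C-terminus.
toy = "GCAKGDPCSWNCCKGLKCRWFCA"
theoretical = {"toy_peptide": peptide_mass(toy, n_disulfides=3, amidated=True)}
print(match_masses([theoretical["toy_peptide"] + 0.3, 5000.0], theoretical))
```

Precursors whose predicted mature peptides find no partner in `match_masses` would be candidates for additional processing or modification, in line with the unmatched long precursors reported above.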
Phylogenetic study of the CRPs in H. venatoria. The 151 precursor sequences of CRPs from the H. venatoria venom gland were aligned using Clustal X 2.0. The resulting alignment was imported into MEGA X software to construct the phylogenetic tree with the neighbor-joining method. All ambiguous positions were removed for each sequence pair (pairwise deletion option). There were a total of 205 positions in the final dataset. Most of the 6-cys ICK motif precursors (Families 1, 2, 3, 5, 6, and 7) were assigned to the relatively original clade. Only Families 4 and 8, with shorter propeptides (11 and 12 aa, respectively), were placed outside this "older" clade. Remarkably, Family 8, whose signal peptides differ from and are longer than those of the other families with a speculative 6-cys ICK motif, was located far away from the original-clade precursors (Families 1, 2, 3, 5, 6, and 7). Intriguingly, four 8-cys ICK-like motif families (Family 12, 13, 14 and 15) were placed in different clades. It is therefore reasonable to treat them as four families, although the Cys scaffolds of their mature peptides look similar. Families 17 and 18, which both adopt a ten-Cys framework 'C I X n C II X 4 C III C IV X n C V X 9 C VI X n C VII XC VIII X 5 C IX X n C X ' in the mature peptide domains, were also placed in two distant phylogenetic clades (Fig. 2). The three loops (C IV -C V , C VI -C VII , and C IX -C X ) are longer in Family 18 than in Family 17.
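The pairwise deletion option can be made concrete with a small distance calculation of the kind that feeds a neighbor-joining tree. This is an illustrative sketch only: the text does not state which distance model was used in MEGA X, so the p-distance here is an assumption, as are the function names, the set of characters treated as ambiguous, and the toy alignment.

```python
from itertools import combinations

# Characters treated as gaps or ambiguous residues (an assumption for this sketch).
AMBIGUOUS = set("-?X")


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping sites ambiguous in either sequence.

    Sites are dropped per pair (pairwise deletion), not across the whole alignment.
    """
    compared = different = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in AMBIGUOUS or b in AMBIGUOUS:
            continue
        compared += 1
        if a != b:
            different += 1
    if compared == 0:
        raise ValueError("no comparable positions between the two sequences")
    return different / compared


def pairwise_distances(alignment):
    """Distances for every pair of named sequences in an aligned dict."""
    return {
        (name_a, name_b): p_distance(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment of three precursor fragments.
toy_alignment = {"a": "MKC-A", "b": "MKCGA", "c": "MRC-S"}
for pair, distance in pairwise_distances(toy_alignment).items():
    print(pair, distance)
```

With pairwise deletion, the gapped column is ignored only for the pairs in which a gap is present, so sequence pairs can be compared over different numbers of sites.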
Evolutionary selection analysis of each family was also conducted in MEGA X. The sequences of each family of H. venatoria venom CRPs were aligned, and gapped positions were omitted from the subsequent analysis. The nucleotide sequences of the signal peptides, propeptides, and mature peptides of Families 1-10 and 15-23 were compared separately. The original Nei-Gojobori model (p-distance) was used to estimate the number of synonymous substitutions per synonymous site (Ds) and the number of non-synonymous substitutions per non-synonymous site (Dn), and to calculate ω (Dn/Ds) values. The p-value of each family was calculated and analyzed (Table 2).
The ω values of both the propeptide and mature peptide coding regions are over 1.0 only in Family 6, which suggests that the rate of non-synonymous mutation is higher than that of synonymous mutation and that positive selection may increase the diversity of venom peptides in Family 6. The ω values of the signal peptide, propeptide and mature peptide coding regions are under 1.0 with p-value ≤ 0.05 in almost all CRP families except Family 6, which means that most CRPs are under negative selection. We also noticed that the ω values of some families are unreliable because their p-values are > 0.05, which may be attributable to the small number of sequences.
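For readers unfamiliar with the ω calculation, the following is a simplified sketch of the Nei-Gojobori p-distance approach for a single pair of aligned coding sequences. It is not the MEGA implementation: changes to stop codons are counted here as non-synonymous when counting sites, mutation paths through stop codons are skipped, and no per-family averaging or significance test is included. All function names and the toy sequences are hypothetical.

```python
from itertools import permutations, product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites in one codon (between 0 and 3)."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            # Simplification: a change to a stop codon counts as non-synonymous.
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                sites += 1.0 / 3.0
    return sites


def codon_differences(c1, c2):
    """Synonymous and non-synonymous differences, averaged over mutation paths."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    totals = []
    for order in permutations(positions):
        current, syn, nonsyn = c1, 0, 0
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODON_TABLE[step] == "*":
                break  # skip paths that pass through a stop codon
            if CODON_TABLE[step] == CODON_TABLE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = step
        else:
            totals.append((syn, nonsyn))
    if not totals:  # every path hit a stop codon: count all changes as non-synonymous
        return 0.0, float(len(positions))
    n = len(totals)
    return sum(t[0] for t in totals) / n, sum(t[1] for t in totals) / n


def nei_gojobori(seq1, seq2):
    """Return (pS, pN, omega) for two aligned, in-frame coding sequences."""
    sd = nd = s_sites = 0.0
    codons = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue  # omit codons containing gaps or ambiguous bases
        if "*" in (CODON_TABLE[c1], CODON_TABLE[c2]):
            continue  # omit stop codons
        d_syn, d_non = codon_differences(c1, c2)
        sd += d_syn
        nd += d_non
        s_sites += (syn_sites(c1) + syn_sites(c2)) / 2.0
        codons += 1
    n_sites = 3.0 * codons - s_sites
    p_s = sd / s_sites if s_sites else 0.0
    p_n = nd / n_sites if n_sites else 0.0
    omega = p_n / p_s if p_s else None
    return p_s, p_n, omega


# Hypothetical toy pair: one synonymous (TTT/TTC) and one non-synonymous (AAA/AGA) change.
print(nei_gojobori("TTTGCTAAA", "TTCGCTAGA"))
```

An ω below 1 for a pair, as in this toy example, is the pattern interpreted above as negative selection; family-level values would come from averaging over all sequence pairs within a family.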

Common types of spider venom CRPs from multiple families.
In the present study, the spider CRPs discovered with Sanger sequencing and Edman degradation sequencing methods were gathered from the ArachnoServer spider toxin database 18 . There are more than 28 scaffolds of CRPs from 55 spider species in 19 families. The short CRPs with 6-cys are common in Mygalomorphae, where they account for 82.5% of CRPs and mainly include three kinds of scaffolds (ICK, disulfide-directed β-hairpin (DDH) and Kunitz). By comparison, the 6-cys peptides account for 22.2% of CRPs in Araneomorphae; these are mainly ICK-like motif peptides, with a few CHH and low molecular weight protein (LMWP) motifs. The long CRPs with more than 6 cysteines account for 17.5% in Mygalomorphae and present six scaffolds with 8-cys and one with 10-cys. However, the long CRPs are much more common in Araneomorphae, where they account for 77.8% and include diversified scaffolds (Table 3).

Evolution trend analyses of the propeptides of CRPs and cysteines in mature peptide.
The high-quality cDNA libraries and full-length EST sequences from eight spiders, four of Mygalomorphae and four of Araneomorphae, were used for the analysis of the propeptides and of the cysteines in the mature domain. The length of the propeptide varies among the species. Propeptides are longer in Mygalomorphae than in Araneomorphae. The propeptides longer than 25 aa account for 72.7%, 90%, 53.6% and 53.8% in C. schmidti, C. hainanus, G. rosea and C. guangxiensis, respectively. By contrast, only 5%, 13.2% and 2.5% of the propeptides are longer than 25 aa in L. singoriensis, D. mizhoanus and A. ventricosus, respectively, and none is longer than 25 aa in H. venatoria. Conversely, the percentages of precursors with a propeptide shorter than 10 residues are 4.6%, 5.7%, 17.8% and 29.2% in C. schmidti, C. hainanus, G. rosea and C. guangxiensis, respectively, and 47.7%, 43.4% and 92.5% in three species of Araneomorphae (H. venatoria, D. mizhoanus and A. ventricosus). Although the percentage of precursors with the shortest propeptides is only 10.4% in L. singoriensis, the precursors with a 10-25 aa propeptide account for 84.7% (Fig. 3).
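The three propeptide length classes compared above (shorter than 10 aa, 10-25 aa, longer than 25 aa) amount to a simple binning of per-precursor propeptide lengths. The sketch below shows that calculation; the function name and the list of lengths are invented for illustration and are not data from the study.

```python
def propeptide_bins(lengths):
    """Percentage of precursors in each propeptide length class.

    Classes follow the text: shorter than 10 aa, 10-25 aa, and longer than 25 aa.
    A length of 0 represents a precursor with no predicted propeptide.
    """
    if not lengths:
        raise ValueError("at least one propeptide length is required")
    total = len(lengths)
    short = sum(1 for length in lengths if length < 10)
    long_ = sum(1 for length in lengths if length > 25)
    medium = total - short - long_
    return {
        "<10": 100.0 * short / total,
        "10-25": 100.0 * medium / total,
        ">25": 100.0 * long_ / total,
    }


# Hypothetical propeptide lengths for one species (made-up values).
example_lengths = [0, 5, 12, 15, 30]
print(propeptide_bins(example_lengths))
```

Applying the same function to each species' precursor set would give percentages that are directly comparable across species.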
As for the cysteines in the mature domain, 69.3%, 83.6%, 75% and 76.9% of the peptides have a 6-cys motif in the four species of Mygalomorphae (C. schmidti, C. hainanus, G. rosea and C. guangxiensis), respectively. However, no 6-cys CRPs have been described in L. singoriensis and D. mizhoanus so far, and 55% and 10% of the peptides have a 6-cys motif in H. venatoria and A. ventricosus, respectively. By contrast, there are more peptides with ≥ 8 cysteines in Araneomorphae (Fig. 4). The types of cysteine frameworks of CRPs with more than 6 cysteines are far fewer in Mygalomorphae than in Araneomorphae. The CRPs expressed in the venom of H. infensa, a spider in Mygalomorphae, were recently shown to have evolved largely by duplication and elaboration of a single ancestral knottin gene 25 . However, the CRPs in the venom of H. venatoria, a spider in Araneomorphae, evolved by duplication as well as recruitment. In the present study, Families 1-8, 11-14, 17 and 18 are predicted to comprise simple or highly derived knottins and to have evolved by multiple duplications of an ancestral ICK gene followed by periods of diversification, a pattern similar to that of the CRPs in funnel-web spider venom 25 . Furthermore, ten families with different cysteine frameworks predicted to lie beyond the ICK scaffold are also identified in the venom of H. venatoria. Among them, four novel cysteine scaffolds (Families 9, 15, 23 and 24) are uncovered for the first time in spider venom.

Discussion
The construction of a cDNA library combined with Sanger sequencing of ESTs has proven to be a rapid and reliable method for discovering new genes and obtaining data on the expression of CRP genes in venom glands, which belong to multigene families displaying high similarity in part of their sequences 36 . In our group, second-generation sequencing technologies were applied to explore the diverse peptide toxins in the venoms of C. schmidti 37 and Latrodectus tredecimguttatus 38 , the sequencing assembly of which strongly relied on the ESTs ... to a molecular mass with consideration of common PTMs such as disulfide bonds and C-terminal amidation (Supplementary Information File 2). The diversity of primary structure within the H. venatoria venom families suggests that there have been few evolutionary restraints on CRP diversification outside of the disulfide bridges that direct the 3D fold of these peptides. These findings highlight the extensive diversity of CRPs in H. venatoria venom, which can not only provide important data for evolutionary analysis of CRPs in spider venom but also be exploited as a source of novel therapeutic and biopesticide lead molecules.
In the present study, CHHs-like peptide genes are identified in Family 10 from the venom gland of H. venatoria; such genes are also widely distributed in very distantly related families: Agelenidae, Sicariidae, Theridiidae, Sparassidae, Pisauridae and Nephilidae. This strongly suggests that CHHs-like peptides are derived from the superfamily of neuropeptides containing the Crustacean Hyperglycemic Hormones (CHH) and were independently recruited into spider venom glands 31 . The structure of U1-agatoxin-Ta1a from Eratigena agrestis was recently determined using heteronuclear NMR and shown to be a structural homolog of the CHH family, which confirms the molecular evolutionary analyses indicating that CHHs-like toxins are highly derived members of the ITP/CHH family 40 . The hormone-derived venom peptides, named HAND toxins, are among the most stable peptides ever described and provide a novel molecular scaffold for engineering drugs and insecticides 40 . Family 15 from the venom gland of H. venatoria, with a novel cysteine scaffold in the mature region, also appears to have evolved by recruitment of genes encoding normal body proteins followed by extensive duplication and neofunctionalization to play a role in killing and paralyzing prey or defending the organism. A secretory protein with the identical Cys framework has been identified from the black widow spider (gi: 318087504). However, it is annotated as a gene involved in coding silk fibers. Moreover, several hypothetical non-secretory body proteins from A. maculatum also adopt the same eight-Cys framework (Supplementary Information File 1, Suppl-Fig. 5) without a predicted signal peptide. Therefore, two clusters of transcripts detected in the H. venatoria venom gland EST library showed similarity to previously reported non-venom proteins.
Spiders have evolved over some 300 million years and have become the most diverse group of terrestrial organisms, second only to insects. Spiders have evolved efficient weapons, represented by the venom and/or the silk, for their hunting strategies. The venom system has evolved to restrain prey, to defend, and to deter competitors. However, the spiders investigated in venom molecular research so far represent no more than 0.5% of all known species 11 , and studies have focused on large species and medically important species. During the evolution of spiders from Mygalomorphae to Araneomorphae, body size has decreased and the predators have adopted webs as a capture strategy. Species that hunt without silk are considered more offensive, and their venoms often show higher complexity and potency 41 . A recent study suggested that peptides (2-15 kDa) in spider venoms are mainly responsible for the paralysis efficacy of the venom 42 . H. venatoria in the family Sparassidae, a relatively primitive species of Araneomorphae, does not directly use a web or silk to capture prey, which is similar in a way to the predation tactics of the cursorial spiders of Lycosidae, Hexathelidae and Theraphosidae. It is therefore an important candidate for the evolutionary investigation of CRPs in spider venom. The transcript data of the H. venatoria venom gland, along with previous work on spider peptide toxins in our group and high-quality cDNA library data in public databases, provide the opportunity to take a holistic view of the evolution of spider venom CRPs. According to the analysis of the sequences gathered from the ArachnoServer spider toxin database, the short CRPs with 6-cys account for at least 69% of all discovered CRPs in each of the four species of Mygalomorphae (C. schmidti, C. hainanus, G. rosea and C. guangxiensis) (Fig. 4), and mainly include three kinds of scaffolds (ICK, DDH and Kunitz) 18 . By contrast, in Araneomorphae, there is no 6-cys CRP in L. singoriensis and D. mizhoanus. Both belong to the superfamily Lycosoidea (Lycosidae and Dolomedes, respectively), which is among the most evolutionarily derived groups of spiders. The 6-cys CRPs account for 10% and 55% in A. ventricosus and H. venatoria, respectively. A. ventricosus mainly uses webs for predation, which may explain the low percentage of 6-cys CRPs in this species. Intriguingly, the 6-cys CRPs in H. venatoria have a much shorter propeptide compared with those in Mygalomorphae. The propeptide, once thought to be a necessary part of a spider CRP precursor, is short or even absent in several families of CRPs in Araneomorphae venom.
Overall, the present study shows the high diversity of CRPs in H. venatoria and suggests the evolutionary trends of CRPs in spider venom from Mygalomorphae to Araneomorphae: the mature peptides have become longer with more cysteines; the propeptides have diminished and even vanished; and the CRPs evolved by multiple duplications as well as by recruitment of non-toxin genes.

... α-cyano-4-hydroxycinnamic acid (CHCA), 50% ACN, and 0.1% TFA. The mixture was allowed to dry at room temperature. Calibration of the instrument was performed externally with peptide calibration standard II (Bruker, Germany). Mass spectrometry was performed using an acceleration voltage of 25 kV. After desalting with ZipTips (C4), 1 μL of cleaned crude venom was subjected to MALDI-TOF MS analysis.

Methods
cDNA library construction and expression sequence tag sequencing. The cDNA library was prepared and sequenced as previously described 43 . Briefly, eight adult female spiders were stimulated to secrete their venom gland contents in order to encourage the production of venom transcripts 44 . Four days later, the venom glands of the eight spiders were harvested and homogenized in liquid nitrogen in the presence of TRIzol reagent (Invitrogen). Polyadenylic acid (+) [polyA(+)]-containing RNAs were purified from the total RNA on an oligo(dT)-cellulose affinity column using the mRNA Purification Kit (Promega) according to the manufacturer's protocol.
The full-length cDNA library was constructed using the primer extension protocol described in the Creator SMART cDNA Library Construction Kit (Clontech). The polymerase chain reaction was performed with the M13 forward and reverse primers from the kit to rapidly screen recombinant clones. The clones containing inserts ≥ 500 base pairs were grown in LB medium containing chloramphenicol (30 mg/mL) in 96-well plates for 16 h. The plasmids were extracted by alkaline lysis and sequenced from the 5′-end on an automated ABI PRISM 3700 sequencer (Perkin Elmer) using the T7 promoter primer and the ABI PRISM Big Dye terminator v3.1 ready reaction cycle sequencing kit (Applied Biosystems).
CRPs identification and evolutionary analyses. The sequenced cDNA was trimmed by removal of vector sequences, primer sequences and poly(A) tails with ABI PRISM DNA Sequencing Analysis Software V.3.3 45 .
The consensus sequences of each cluster were further filtered by screening for homology to ribosomal RNA, mitochondrial DNA and E. coli genome sequences 46 . After the matches were deleted, the remaining sequences were searched against public databases (nr/NCBI, SwissProt/UniProtKB and TrEMBL/UniProtKB) using the BLASTn or BLASTx programs to identify putative functions of the new expression sequence tags (ESTs) 47 . The signal peptides were predicted with the SignalP 4.1 program (http://www.cbs.dtu.dk/services/SignalP/) 48 and SpiderP (http://www.arachnoserver.org) 49 . Furthermore, the putative CRPs were searched in the KNOTTIN database (http://knottin.cbs.cnrs.fr) 50,51 . Multiple sequences of precursors were aligned using the ClustalW program 52 .
The resulting alignments were then hand-edited using the BioEdit program (http://www.mbio.ncsu.edu/BioEdit/BioEdit.html). Sequences were aligned using Clustal X 2.0, and gapped positions were omitted from subsequent analyses. The original Nei-Gojobori model (p-distance) in MEGA7.0 software was used to estimate Ds for each synonymous site and Dn for each non-synonymous site, and ω was calculated 53 . The resulting alignments were imported into MEGA X software to construct a phylogenetic tree with the neighbor-joining method 54,55 . Microsoft Excel 2019 MSO (16.0.12730.20342), 64-bit, was used to calculate and analyze the p-value of each family.

Data availability
The cDNA sequences of the CRPs have been submitted to a public database (http://www.ncbi.nlm.nih.gov/entrez, GenBank accession numbers: KC145575-KC145728). Part of this article is available online at https://www.researchsquare.com/article/rs-24220/v1. This article has not been published and is not under consideration for publication elsewhere.