Insights into the venom composition and evolution of an endoparasitoid wasp by combining proteomic and transcriptomic analyses

Parasitoid wasps are abundant and diverse hymenopteran insects that lay their eggs into the internal body (endoparasitoid) or on the external surface (ectoparasitoid) of their hosts. To make a more conducive environment for the wasps’ young, both ecto- and endoparasitoids inject venoms into the host to modulate host immunity, metabolism and development. Endoparasitoids have evolved from ectoparasitoids independently in different hymenopteran lineages. Pteromalus puparum, a pupal endoparasitoid of various butterflies, represents a relatively recent evolution of endoparasitism within pteromalids. Using a combination of transcriptomic and proteomic approaches, we have identified 70 putative venom proteins in P. puparum. Most of them show higher similarity to venom proteins from the related ectoparasitoid Nasonia vitripennis than from other more distantly related endoparasitoids. In addition, 13 venom proteins are similar to venoms of distantly related endoparasitoids but have no detectable venom matches in Nasonia. These venom proteins may have a role in adaptation to endoparasitism. Overall, these results lay the groundwork for more detailed studies of venom function and adaptation to the endoparasitic lifestyle.


Results
Assembly and analyses of P. puparum transcriptome. Three cDNA libraries were separately constructed and sequenced for transcriptome assembly: whole female adults, venom glands and female body carcasses without venom apparatus. Then all the raw data was filtered and de novo assembled to create a P. puparum transcriptome (Fig. 1). Assembly statistics showed that the N50 was 2226 bp, and N80 was 825 bp (Table 1). Unigene represents a set of transcripts from the same transcription locus. Here the longest copy of redundant transcripts was regarded as a unigene. Finally, 39,738 unigenes that represented 55,958 transcripts were obtained (supplementary Figure S1). Among all unigenes, 43.73% (17,379) unigenes got matches in the nr database using blastx (e-value < 1e −5 ).
Venom gland cells from parasitoid wasps secrete venoms into the lumen of venom glands. Therefore, venom proteins are expected with secretory signal peptides in their amino acid sequences. For signal peptide analysis, transcripts with complete N terminal (subject start position of the best hit alignment = 1) were computationally translated into proteins and subjected to software SignalP. Simultaneously, their best hit sequences in nr database were also retrieved from NCBI as reference sequences for signal peptide analysis. The identity of the results by two different methods was 99% (Supplementary Table S1). Therefore, the signal peptide analysis of all unigenes was finally conducted using reference sequences. By this method, 2714 unigenes with signal peptides were identified in P. puparum combined transcriptome in total ( Fig. 2A).
The expression levels of unigenes in both venom gland and carcass without venom apparatus were estimated by software eXpress 35,36 . To control false positive rate, the expression level cut-off was set as FPKM_VG (Venom gland) > 10, and a venom gland to carcass expression ratio log 2 (FPKM_VG/FPKM_Carcass) > 1 and corrected P-value < 0.001 to define differentially expressed genes in venom gland. By this criterion, 2355 unigenes were identified differentially expressed in venom gland relative to carcass (whole female body minus the venom apparatus) ( Fig. 2A).

Identification of venom proteins by proteomic approach.
For proteomic identification, venom proteins were separated by SDS-PAGE. Several apparent bands were observed with molecular masses ranging from less than 14 kDa to more than 97 kDa (Fig. 2B). And the most abundant band was a little below 66 kDa. The SDS-PAGE gel was cut into 21 slices as the graph showed (Fig. 2B). These slices were in-gel digested by trypsin and subjected to LC-MS/MS to identify the proteins. The database for proteomic research was generated by computationally translating transcriptomic sequences into proteins according to the blastx results. Finally, 630 unigenes were identified from the venom reservoir by this approach (Fig. 2A).

Identification of putative venom proteins by combined analyses of transcriptomic and proteomic information.
To identify a robust set of venom proteins, all data were analyzed under the assumption that venom proteins were secretory and differentially expressed in venom gland (Fig. 1). Combined transcriptomic and proteomic information, 70 unigenes were identified as secretory, differentially expressed in venom gland and confirmed by proteomic approach (Fig. 2A). In this study, these unigenes were defined as putative venom proteins for further analysis.
Because the transcriptomic and proteomic sequencing are not replicated, gene expression in the venom gland was examined for 34 putative venom protein genes by qPCR, and 8 proteins were examined for their presence in venom reservoirs by Western blotting. All 34 tested venom protein genes were differentially expressed in venom gland related to carcass (Fig. 3A), all 8 proteins were confirmed by Western blotting using antibodies to the specific venom proteins (Fig. 3B). These results showed that the putative venom proteins set in this study is reliable.
Similarity comparison of P. puparum venom proteins to N. vitripennis and other endoparasitoid venoms. Comparisons of P. puparum venoms to venom and non-venom proteins in other parastitoids were investigated by three general methods. In our initial comparisons, we performed a blastx of P. puparum venom proteins against the nr database. Excluding self matches, the large majority (68 of 70) of P. puparum venom proteins have a best hit to proteins from the ectoparasitoid, N. vitripennis (Fig. 4A, Table 2). The remaining two gave best matches to a venom protein from the parasitoid Chelonus inanitus and a non-venom protein from the bee Megachile rotundata, respectively. We next specifically compared P. puparum venom proteins to venoms reported in N. vitripennis and to our database of venoms from other endoparasitoids (OEP, see methods) using blastp (Supplementary Table S2 Figure S3). This likely reflects the closer evolutionary relationship of the endoparasitoid P. puparum to the ectoparasitoid N. vitripennis, which are in the same subfamily Pteromalinae with similar morphology (Supplementary Figure S4), than to species in the OEP, which occur in other families and superfamilies of parasitoids (e.g. Leptopilina boulardi, L. heterotoma 21 , Aphidius ervi 20 , Microplitis demolitor 27 , Microctonus sp 24 . and C. inanitus 18 ). Using cut-off criteria (e-value ≤ 1e −5 , bit score ≥ 50, see methods), we then assigned all proteins from the three venom data sets to shared and unshared categories (Fig. 4B). Based on the criteria, 14 venom proteins were found to be unique to P. puparum, while 25 were shared only with N. vitripennis, 13 were shared only with OEP, and 18 were shared among all three sets. Pteromalus puparum venom proteins are significantly more likely than are N. vitripennis venom proteins to show similarities only to OEP venoms (13/70 versus 3/79, two tailed fisher extract test, p = 0.006). Examples includ adenosine deaminase CECR1-like, protein lethal (2) essential for life-like, disulfide-isomerase A3-like, pancreatic triacylglycerol lipase-like, GILT-like, protein FAM151A-like. This set of venom proteins which only shared between P. puparum and OEP may have a role in the adaptation to endoparasitism.
Twenty-five venom proteins in P. puparum and 22 in N. vitripennis were shared in P. puparum and N. vitripennis venom only, and might be Pteromalinae venom specific. Some venom proteins which were previously reported as unique in N. vitripennis were also detected in P. puparum venom, including venom protein D, G, J, L, O, U and Z. Eighteen venoms were shared among all three data sets, and might present a core of venom proteins in parasitoid wasps, including venom allergen, calreticulin, serine protease, acid phosphatase, glucose dehydrogenase, gamma-glutamyltranspeptidase and so on (Supplementary Table S2).
Test of P. puparium venom antibodies against N. vitripennis venom. Antibodies against P. puparum venom proteins were tested on N. vitripennis venom to see whether they could cross detect N. vitripennis venom proteins. Antibodies against P. puparum calreticulin, GOBP-like venom protein, venom protein U, serine protease 22 and serine protease homolog 29 could also cross detect the venom proteins in N. vitripennis (Fig. 3B). The results support the view that similar proteins are present in N. vitripennis venom and share antigenic similarities. Western blotting results also showed several venom proteins were not detected in N. vitripennis venom by the antibodies against P. puparum venom proteins (Fig. 3B). GILT-like protein was absent in the venom set of N. vitripennis, and as expected, couldn't be cross detected in N. vitripennis venom by antibody against P. puparum GILT-like protein. And antibodies against P. puparum lipase-like venom protein and serine protease 87 also failed to cross detect the venom proteins in N. vitripennis. These failures might be caused by the divergence of antigens between P. puparum and N. vitripennis venom proteins, which could be sequence and/or modification differences.

Discussion
Using high throughput RNA-sequencing technology, we first assembled the transcriptome of P. puparum, which is a pupal endoparasitoid. Differential expression analysis and signal peptide analysis were conducted to search the differentially expressed secretory proteins in venom gland. In parallel, we used the shotgun proteomic approach to analyze the composition of venom. Combined the venom proteomic data with the transcriptomic information, we finally identified a robust set of putative venom proteins.
In this study, we assumed that venom proteins from P. puparum were secretory and differentially expressed in venom gland. However, proteins that were not differentially expressed in venom gland could not be totally excluded, as venom proteins, such as heat shock proteins and arginine kinases that are commonly found in parasitoid venoms. In some extreme cases, like L. boulardi 21 , venom proteins were even not specifically expressed in the venom gland. In P. puparum, most venom proteins are likely to be differentially expressed in the venom gland, as confirmed by qPCR in this study. In addition, many unigenes (116) from the whole body transcriptome encoded secretory proteins and were significantly more highly expressed in venom gland, but not identified by the proteomic approach. These proteins could just have local functions in the venom gland or have been missed by the proteomic approach, especially for small peptides which may not be retained by SDS-PAGE and are easy to be missed especially when there was a lack of genomic information.
Despite the rigorous filtering, the venom composition of P. puparum is still found to be quite complex. It is reasonable to believe that parasitoid venoms are much more complex than venoms from social Hymentoptera 23 . The parasitoid venoms must target immunity, development, metabolism and sometimes even the host nervous system to ensure successful parasitism 4,23 . This is quite different from the function of venoms from social Hymenoptera, which are mainly used for predation and defense.
Pteromalus puparum evolved endoparasitism from an ectoparasitoid ancestor relatively recently within the pteromalids. In the subfamily Pteromalinae, the majority of species are ectoparasitoids, such as Urolepis rufipes 37 , Trichomalopsis sarcophagae 38 , Muscidifurax raptor 39 , Nasonia and so on. There are also several ectoparasitoid wasps in the genus Pteromalus. For example, both P. cerealellae 40 and P. sequester 41 are solitary ectoparasitoids of larvae of Coleoptera. Moreover, according to the phylogenetic analysis and substitution rate results of calreticulin from parasitoid wasps, P. puparum and N. vitripennis has a relatively small evolutionary distance (supplementary Figure S5, Table S3). The evolutionary distance between P. puparum and N. vitripennis is even smaller than that between L. boulardi and L. heterotoma, which are in the same genus and have been intensively compared 25,42 . Thus, P. puparum and N. vitripenis provide a good model for comparative studies between endo-and ectoparasitoids, and particularly to the evolutionary changes that occur when endoparasitism evolves from ectoparasitism.
As expected, most of (68/70) the identified venom proteins from the endoparasitoid P. puparum had significant similarities with proteins from the ectoparasitoid wasp N. vitripennis, which belongs to the same subfamily (Pteromalinae). Moreover, most of P. puparum venom proteins showed higher similarities to venom proteins from N. vitripennis rather than to other reported endoparasitoids. All these results are consistent with the fact that these endoparasitoids have different independent origins from ectoparasitoids 30 .
Although endoparasitoid wasps have different independent evolutionary origins, convergent recruiting of some similar proteins could still be expected. Strikingly, several P. puparum venoms are only shared with other reported endoparasitoids, and not present in venoms of its closest sequenced relative, N. vitripennis, which is an ectoparasitoid. These venom proteins may have a role in the adaptation to endoparasitism. However, it is  also possible that this pattern is caused by incomplete characterization of the venom repertoire of these species. Further investigation is therefore needed. The development of the stinger and venoms in Hymenoptera had a single origin 30 . So it is expected that parasitoid wasps might contain some ancestral venom proteins. Venom antigen 5 is an example of such conservation as it is found from social Hymenoptera to parasitoid wasps (Supplementary Figure S6). In addition, different proteins have been recruited into venom for similar functions in different parasitoid wasps. These include superoxide   44 which are known to inhibit the pro-phenoloxidase activation. A second example is RhoGAP (Ras homologous GTPase activating protein) from L. boulardi 45 , VPr3 from Pimpla hypochondriaca 46,47 and SERCA (sarco/endoplasmic reticulum calcium ATPase) from Ganaspis sp.1 48 , which are very different proteins, but all are known to alter the behavior of host hemocytes. Taking into consideration the complexity and diversity of parasitic factors delivered to hosts (including venom, PDV and others), parasitoid wasps seem to be an untapped source of valuable molecules with agricultural and medical potential 16,17 . Of course, a lot of work on the compositions of parasitoid venom, functions and applications of individual venom proteins is still needed. The identification of venom composition from P. puparum in this current study is the basis for further detailed analyses of the functions of these venom proteins.
Our research revealed closer relationship of most P. puparum venom proteins to those from the pupal ectoparasitoid N. vitripennis, rather than to other reported endoparasitoid wasps. Thirteen P. puparum venom proteins show similarity to other endoparasitoid venoms but not to venom proteins of the more closely related ectoparasitoid N. vitripennis. These proteins are promising candidates for a functional role in the evolution of endopariasitism. These results will open the way to a better understanding of venom evolution in the transition from ectoparasitoids to endoparasitoids. Analysis of transcriptomic data. The transcriptomic raw data was assembled using Trinity v2013-02-16 49 . All unigenes were annotated by blastx search against NCBI nr database (March, 2013) with a cutoff of 1e −5 . The expression level was estimated by software eXpress v1.3.3 35,36 . Differential expression analysis between venom gland and carcass was performed using the R package DEGSeq v1.2.2 50 . The p-values were adjusted using the Benjamini & Hochberg method. Corrected p-value < 0.001, log 2 (FPKM_VG/FPKM_Carcass) > 1 and FPKM_VG (Venom gland) > 10 were set as the threshold for significantly differential expression in venom gland. Presence of signal peptides was analyzed by software SignalP 4.1 51 . The putative venom unigenes were manually checked using blastx on NCBI website, and categorized into enzymes, protease inhibitors, recognition and binding proteins, others and unknown based on their blast results and domain information.

Comparison of P. puparum venom proteins to N. vitripennis and other endoparasitoid venoms.
For similarity comparison, blastp were performed between three different venom data sets, P. puparum venom, N. vitripennis venom 23 and a database of other endoparasitoid venoms generated here. The other endoparasitoid venom database was manually generated, including venom proteins from Leptopilina boulardi, L. heterotoma 21 , Aphidius ervi 20 , Microplitis demolitor 27 , Microctonus sp 24 . and Chelonus inanitus 18 . All the nucleotide acid sequences from P. puparum venom and other endoparasitoid venoms were translated into proteins by OrfPredictor 40 (http://proteomics.ysu.edu/tools/OrfPredictor.html). And venom proteins in other endoparasitoid venom database were further clustered by CD-HIT 41 with sequence identity cutoff= 0.5 (http://weizhong-lab. ucsd.edu/cdhit_suite/cgi-bin/index.cgi?cmd= cd-hit) to remove redundancy. As bit score is independent on database size and more suitable than e-value for comparing similarity scores from different searches (http://www.ncbi. nlm.nih.gov/BLAST/tutorial/), we analyzed a range of bit scores to determine how these criteria affect similarity scores among the different venom protein sets (Supplementary Table S4). Criteria that incorporated a bit score criterion (e-value ≤ 1e −5 , bit sore ≥ 50) was used for the further analyses.
Multiple amino acid sequence alignments were performed using MUSCLE v3.8 52 . Phylogenetic analysis was conducted by MEGA 5 using maximum likelihood algorithm 53 . Pairwise substitution rates were calculated by CodeML in PAML v4.8 54

Mass spectrometric venom protein identification by LC-MS/MS. Pteromalus. puparum venom sam-
ple containing 100 μ g proteins dissolved in 20 μ l rehydration solution (7 M urea, 2 M thiourea, 4% CHAPS, 0.5% Triton X-100, 65 mM DTT, 0.5% Bio-Lyte, and 0.001% bromophenol blue) were separated by SDS-PAGE and stained with Coomassie Brilliant Blue R-250 (Bio-Rad, USA). The gel was excised into 21 slices, depending on the molecular masses of protein bands. Each gel slice was digested by trypsin and lyophilized separately followed by 1DLC-LTQ-Velos (Thermo Finnigan, San Jose, CA). In this study, samples were desalted on Zorbax 300 SB-C18 (Agilent Technologies, Wilmington, DE) and then separated on a RP-C18 column (150 μ m i.d., 150 mm length) (Column technology Inc., Fremont, CA). The buffer A was water with 0.1% formic acid, buffer B was 84% acetonitrile with 0.1% formic acid, and the gradient was from 4% buffer B to 50% buffer B in 1h. The charge-to-mass ratios of peptides and fractions of peptides were collected 20 times after every full scan. The resulting MS/ MS spectra were searched against the translated P. puparum transcriptome using Sequest search algorithm 56 . Carbamidomethyl of cysteine and oxidation of methionine were set as fixed and variable modifications, respectively. Delta CN (≥ 0.1) and cross-correlation scores (Xcorr, one charge ≥ 1.9, two charges ≥ 2.2, three charges ≥ 3.75) were used to filter the peptide identification. This part was done by Shanghai Applied Protein Technology Co., Ltd (Shanghai, China).
Quantitative real-time PCR (qPCR). cDNA from venom glands and carcasses without venom apparatus was synthesized, respectively, using TransScript one-step gDNA Removal and cDNA Synthesis SuperMix (TransGen, China) with random primers. All the primer sequences (Supplementary Table S5) used were designed on website Primer 3 57 and synthesized commercially (Sangon, Chnia). The PCR reaction was run in ABI 7500 Real Time PCR System (Applied Biosystems, Foster City, CA) using SsoFast EvaGreen Supermix with Low Rox (Bio-Rad, USA) according to the manufacture's protocol. The cycling conditions for qPCR were as follows: enzyme activation at 95 °C for 30 sec, followed by 40 cycles with denaturation at 95 °C for 5 sec, annealing at 60 °C for 34 sec. Relative expression level of putative venom proteins was normalized to reference gene (18S rRNA) using 2 −ΔΔCT method 58 . Statistical analysis was performed using Student's t test. Unigenes with log 2 (Expression ratio venom gland/carcass) > 1 and p-values < 0.05 were considered differentially expressed in venom gland.
Western blotting. The antibodies against different P. puparum venom proteins were prepared as previously described 59 . Recombinant venom protein GOBP was expressed in the pGEX-4T-2 vector with a GST tag, others were expressed in pET-28a vector with a His tag. The primary antibody against β -actin was bought commercially (Huabio, China). The venom and carcass proteins of P. puparum and N. vitripennis were separated by 12% SDS-PAGE, and then transferred to a polyvinylidene difluoride (PVDF) membrane (Bio-Rad, USA) using Mini-ProTEAN Tetra system (Bio-Rad, Hercules, CA) at 16 V for 16 h. The PVDF membrane was blocked and washed. Anti-venom protein antibodies (diluted from 1: 500 to 1: 2000, depending on the antibody) and anti-actin antibody (diluted 1: 5000) were respectively used as the primary antibody. And goat anti-rabbit IgG-horseradish peroxidase (HRP) conjugate (Sigmae Aldrich, Taufkirchen, Germany; diluted 1: 5000) was used as the secondary antibody. The PVDF membranes were detected using ECL Western Blotting Substrate (Promega, Madison, WI, USA) and imaged in Chemi Doc-It TM 600 Imaging System (UVP, Cambridge, UK).
Availability of supporting data. All RNA-seq raw data have been deposited at the NCBI Sequence Read Archive under accession number SRP055738. This Transcriptome Shotgun Assembly project has been deposited at GenBank under the accession GECT00000000. The version described in this paper is the first version, GECT01000000.