Metazoan ribotoxin genes acquired by Horizontal Gene Transfer

Ribosome inactivating proteins (RIPs) are RNA N-glycosidases that depurinate a specific adenine residue in the conserved sarcin/ricin loop of 28S rRNA. These enzymes are widely distributed among plants and their presence has also been confirmed in several bacterial species. Recently, we reported for the first time in silico evidence of RIP encoding genes in metazoans, in two closely related species of insects: Aedes aegypti and Culex quinquefasciatus. Here, we have experimentally confirmed the presence of these genes in mosquitoes and attempted to unveil their evolutionary history. A detailed study was conducted, including evaluation of taxonomic distribution, phylogenetic inferences and microsynteny analyses, indicating that the culicine RIP genes derived from a single Horizontal Gene Transfer (HGT) event, probably from a Cyanobacterial donor species. Moreover, evolutionary analyses show that, after transference, these genes evolved under purifying selection, strongly suggesting that they play functional roles in these organisms. In this work we confirm the presence of RIP genes in Culicinae species, and show solid evidence supporting the hypothesis that these genes are derived from a single prokaryotic transferred gene through HGT. In addition, clear evidence of purifying selection pressure has been recorded, supporting the hypothesis that these genes are functional within this subfamily.


Introduction
Ribosome inactivating proteins (RIPs, EC 3.2.2.22) irreversibly modify ribosomes through the depurination of an adenine residue in the conserved alpha-sarcin/ ricin loop of 28S rRNA (1)(2)(3)(4). This modification prevents the binding of elongation factor 2 to the ribosome, arresting protein synthesis (5,6). The occurrence of RIP genes has been experimentally confirmed in a wide range of plant taxa, as well as in several species of gram positive and negative bacteria (7)(8)(9). Additionally, the exponential increase of information in databases has suggested the presence of these toxin genes in Fungi, Cyanobacteria and Metazoan lineages (10)(11)(12). Although several RIPs have been extensively studied at the biochemical level, their biological roles remain open to speculation. In some cases, it seems reasonable to predict their functions. For instance, the high toxicity of ricin supports an antifeedant role, whereas shiga and shiga-like toxins are strong virulence factors for their harboring bacteria. Antiviral and other defense activities have been postulated for other plant RIPs, but no concluding evidence has been obtained. Recently, the RIP of the symbiotic Spiroplama in Drosophila neotestacea was shown playing a defensive role in preventing a virulent nematode from infecting this insect (13).
In a previous work, we have described that phylogeny of RIP genes show incongruence with those of species. Most of these inconsistencies can be explained by gene duplication, loss and/or lineage sorting (11). Another mechanism leading to phylogeny incongruence is horizontal gene transfer (HGT); namely the non-genealogical transmission of DNA among organisms. HGT is accepted as an important force driving prokaryotic genome evolution (14,15). In contrast, its impact on genomes from multicellular eukaryotes, in particular animals, is largely controversial (16,17). To be maintained permanently in animal species, heritable changes (i.e. the transferred gene) must be incorporated into germline cells and transmitted to the offspring. Nevertheless, in the particular case of herbivore arthropods, HGT has been postulated to play a role in the adaptation to phytophagy, including the efficient assimilation and detoxification of plant produced metabolites (18).
Detection of bona fide HGT derived genes is not trivial, and careful data revision is required for its corroboration. Many cases of putative foreign genes have been shown, after further revision, to result from artifacts or misinterpretations, such as contamination of genomic data, incomplete sampling of sequences and/or taxa, incorrect phylogenetic inferences and hidden paralogy. Two emblematic cases illustrating these issues are the initial conclusion that the human genome contained a high percent of bacterial derived genes (19), and the recent claim that tardigrade genomes contain significant amounts of foreign DNA (20). In both cases, subsequent sounder analyses demonstrated that contamination or incomplete sampling better explained the available data (21,22). Consequently, careful case-by-case analyses of HGT candidates are required for their efficient detection. To do so, independent evidences and alternative evolutionary scenarios should be taken into account.
Based on the previous finding of in silico evidence of the presence of RIP genes in two closely related species of mosquitoes, we aim to confirm the presence and determine the location of RIP genes in species of the Culicinae subfamily. Moreover, we show solid evidence supporting the hypothesis that these genes derive from a single prokaryotic transferred gene.

PCR experiments
PCR experiments were conducted to confirm the presence of RIP encoding sequences in selected organisms. Individuals of C. quinquefasciatus strain JHB were obtained from the MR4 colony (Malaria Research and Reference Reagent Resource Center) of the Center for Diseases Control and Prevention (CDC). Genomic DNA from these specimens was extracted employing the protocol previously described by Collins (23). Genomic DNA from wild specimens of C. molestus, C. pipiens and C. torrentium was kindly provided by Dr. Stefanie Becker (Bernhard Nocht Institute for Tropical Medicine, Hamburg) (24). Primer sets were designed to amplify the full-length RIP encoding sequence of C. quinquefasciatus and also to identify the predicted neighbor gene (S1 Table). High fidelity Phusion DNA polymerase (New England Biolabs) was used. PCR conditions were: initial denaturation 30 s at 98°C, followed by 35 cycles of denaturation (10s at 98°C), annealing (30s at Ta°C see S1 Table) and extension (30s at 72°C), final extension of 10 min at 72°C. PCR products were cloned into the pGEMT-easy vector (Promega) following standard methods, and sequenced. The obtained sequences are available with the following GenBank codes: KX674699, KX674697, KX644696 and KX674698.

Multiple sequence Alignments (MSAs) and phylogenetic inferences
We used the amino acids sequences of previously reported RIPs (11) as probes for iterative BLAST searches. All collected sequences were employed to construct MSAs using MAFFT 7 on-line server (http://mafft.cbrc.jp/alignment/server/). MSAs were manually edited to remove gaps. Phylogenetic trees were obtained employing Bayesian inference using Mr. Bayes 3.2 software (25). A mixed amino acid substitution model was set up, 4 gamma categories and a proportion of invariable sites were considered. The analyses were concluded after 1,000,000 generations when the split frequency was < 0.02. FigTree 1.4.2 software was used to visualize and edit the obtained trees.

Genomic context analyses
The whole contigs containing RIPs of C. quinquefasciatus (DS232037), Aedes aegypti (NW001810221) and Anopheles gambiae (chromosome 3L) were subjected to BLASTx searches on different protein databases to identify individual genes upstream and downstream from the RIP open reading frame (ORF). The retrieved sequences were subjected to reciprocal BLASTp and tBLASTn searches in order to determine putative orthologous sequences. The obtained orthologs were confirmed using the comparative genomic tool available at VectorBase website (https://www.vectorbase.org/).

Evolutionary analyses
Nucleotide encoding RIP sequences found in mosquitoes were aligned by codons, using PAL2NAL algorithm (26). Then, the synonymous and non-synonymous substitution rates were estimated employing SLAC, FEL and REL tests available in the Datamonkey Package (27,28).
Finally sequence logo diagram was constructed employing the codon alignment using the Weblogo software (29) and codons under purifying selection were highlighted.

Culex spp genomes harbor RIP encoding genes
Recently, we have found in silico evidence for the presence of RIP genes in two closely related species of metazoans: Aedes aegypti and Culex quinquefasciatus (11). These intriguing findings led us to design experimental strategies to confirm their presence by ruling out possible database artifacts (i.e. contamination). For this purpose, genomic DNA was obtained from a pool of four mosquitoes of C. quinquefasciatus strain JHB. Then, two independent PCR experiments were designed to demonstrate the presence of the RIP gene, and to confirm its physical linkage to the predicted neighbor gene, which is an intron-containing metazoan-derived gene (XM_001850822).  Once the presence and location of the RIP sequence in C. quinquefasciatus genome were experimentally confirmed, we successfully amplified the full length RIP coding sequences (~1300 bp) of the closely related species C. pipiens, C. molestus, and C. torrentium (S1 Fig). PCR products were cloned and sequenced, and the obtained sequences were aligned. As

Culicine RIP genes are monophyletic and syntenic
We have previously shown that RIP genes from particular lineages (e.g. monocots or dicots, bacteria and fungi) are not monophyletic. Moreover, many RIP clades include sequences belonging to largely distant taxa (11, 30). Based on this evidence, we postulated that the evolutionary history of RIP genes is consistent with the existence of several ancient paralogues, followed by multiple lineage-specific gene duplications and losses (11). However, metazoan RIP genes are particularly interesting because they are restricted to closely related insects of the subfamily Culicinae. As can be seen in along with their apparent narrow taxonomic distribution, suggest that these genes are derived from a rather recent, single ancestral sequence. In order to test this, microsynteny analyses were carried out using scaffolds from Aedes aegypti, Culex quinquefasciatus and Anopheles gambiae (the closest relative specie lacking RIP genes). As expected, partially conserved synteny blocks were identified in the three species (Fig 3 and S2 Table).   Table S2.

Culicine RIP genes are derived from a unique HGT event
Recent sequence homology searches further confirmed that metazoan RIP genes are restricted to the Culicinae subfamily. In addition to the above described genes, we found in silico seven RIP genes in Aedes albopictus (GenBank: KXJ78156, KXJ78155, KXJ78158, KXJ73132, KXJ78157, KXJ73133 and KXJ73764), which are clearly syntenic with Aedes aegypti RIPs (Data File S1) and two transcriptomic sequences from another Culicine mosquito; Armigeres subalbatus, partially covering the ORFs (GenBank: EU212208, EU211398). In light of these findings, two alternative hypotheses were postulated: i) These genes have been vertically inherited from the metazoan cenancestor, and were purged from other metazoan genomes by a number of independent gene loss events.
ii) These genes are derived from a unique HGT event which took place in the common ancestor of the Culex and Aedes species.
The minimal number of independent gene loss events required was determined in order to evaluate the plausibility of the vertical transmission hypothesis (see Data File S2 for details).
Following a conservative approach, at least 15 gene losses should be postulated, from Bilateria to Culicidae, to explain the narrow taxonomic distribution of RIP genes in Metazoa. The alternative HGT hypothesis involves a single gene loss event in the lineage to extant metazoans followed by a single recent HGT event in the ancestor of Culex and Aedes, yielding a more parsimonious evolutionary scenario.
Based on the strong evidence supporting that culicinae RIPs are derived from an HGT event, a search for possible donors was conducted. The phylogeny of RIPs shows that culicine RIPs form a well supported monophyletic group (PP: 1) embedded within bacterial sequences (Fig 2), suggesting a prokaryotic origin of these sequences. The lack of introns in culicine RIPs also supports this idea. Moreover, BLASTp and tBLASTn searches using mosquito RIPs as queries yielded sequences belonging to Tolypothrix bouteillei (Cyanobacteria), Calothrix parietina (Cyanobacteria) and Spiroplasma spp (Tenericutes). In order to perform a more robust study we focused on culicine, including the recently reported sequences from A. albopictus, and on bacterial RIPs. Phylogenetic analysis showed that culicine RIPs form a monophyletic group (PP: 0.96) with sequences from Spiroplasma spp and Cyanobacteria species (Fig 4). Thus, these organisms or others closely related-could have been potential donors.

Horizontally acquired RIP genes evolve under purifying selection pressure
Except for a few well characterized potent toxins (e.g. ricin, shiga and shiga-like), the physiological role of most RIPs remains unknown (31). The defensive role demonstrated for Spiroplasma RIP genes in Drosophila neotestacea (13) invites us to postulate a biological function for HGT acquired RIP genes in mosquitoes. Therefore, we searched for signs of selective pressure on HGT derived sequences as reliable evidence of functionality in insects. To do this, all RIP encoding sequences from C. quinquefasciatus, C. molestus, C. pipiens, C. torrentium, A. aegypti and A. albopictus were aligned (Dataset S1). Interestingly, the observed INDELs were always multiple of three nucleotides, strongly suggesting that frame-shifting mutations were actively purged by selection pressure.  Table).

Discussion
Ribosome inactivating proteins form a very interesting protein family displaying a patchy taxonomic distribution. As it was mentioned above, in a previous report we have found in silico evidence of the presence of RIP genes in two closely related species of mosquitos (11).
Due to the uniqueness of this finding, in this work we have experimentally confirmed the presence and location of RIP genes in the C. quinquefasciatus genome, as well as in other Culex species. Moreover, new exhaustive searches by BLAST and HMMER on metazoan databases, revealed the presence of additional homologous genes in A. albopictus and Armigeres sulbabatus, confirming that the RIP gene family is taxonomically restricted to the Culicinae subfamily (Data File S1 and S2).
Currently, there is not a "gold standard" methodology for automatic and reliable detection of HGT. Moreover, several reports claiming the presence of foreign genes have been undermined after more thorough analyses due to artifacts or misinterpretations (19,(32)(33)(34).
Therefore, careful integration of information derived from taxonomic distribution, phylogenetic inferences and biological information is needed to detect bona fide horizontally acquired genes.
The evidence obtained in this work shows that the most plausible origin of culicine RIPs is a single HGT event to the cenancestor of Culex and Aedes genera. This model is supported by the monophyly of metazoan RIPs and their very narrow taxonomic distribution, the gathering in a clade along with prokaryotic sequences and the shared genomic context among Culicinae species.
According to phylogenetic inferences (Figs 2 and 4), the donor of the RIP gene was, most likely, a prokaryotic organism. An obvious donor candidate for insects is Wolbachia spp, since several HGT events between these bacteria and arthropods have been clearly documented (35)(36)(37). It is expected that animal genomes are marginally affected by HGT because of the separation of the germline from somatic cells. This barrier to HGT, known as Weismann barrier is not present in the case of bacteria infecting germline cells, as Wolbacchia spp, which is consistent with the relatively high number of Wolbacchia to insect HGT events (17). However, no RIPs encoding sequences can be found in any of the Wolbachia spp databases (including 27 fully sequenced genomes). Interestingly, homology searches using mosquitoes RIPs as probe yielded significant similarity to several Spiroplasma genes (Fig 5). The facts that Spiroplasma species lack cell walls and that they are frequent endosymbionts of arthropods make them logical donor candidates. However, considering Spiroplasma spp as donor involves two major drawbacks. Spiroplasma spp coding sequences harbor very low GC content (around 23%), whereas RIP genes from Diptera range from 44.8% to 55.6% (Table 1). Secondly, Spiroplasma spp and some species of Millicutes use a non-universal UGA tryptophan codon. This variation in the genetic code is presumed to have occurred in the early divergence of these genera (dating 250 mya approximately) (38), and the Culicinae subfamily has diverged more recently (between 51 to 204 mya) (39, 40), the transferred genes containing the non-universal UGA tryptophan codons would be read as a stop. Although CG content of a transferred functional gene could be gradually modified by amelioration, the reversion of several nonsense codons to Trp does not seem plausible. Therefore, these pieces of evidence lead us to reject the hypothesis that Diptera RIP genes are derived from Spiroplasma.
The phylogenetically closest sequences to culicine RIPs belong to Cyanobacteria species (Tolypothrix bouteillei and Calothrix parietina). Cyanobacteria constitute a significant fraction of the microbiota at breeding sites of mosquitoes. Remarkably, Cyanobacteria species account for 40% of the bacterial midgut content of larval and pupal stages in An. gambiae (41).
Moreover, Calothrix sp has been detected in the midgut bacterial flora of An. stephensi in larval stage (42). In addition, GC content of Cyanobacteria genomes is closer to the Diptera RIPs (Table 1). Altogether; the presented phylogenetic inferences, the shared ecological niches between these bacteria and insects, and the colonization of mosquitoes in their early developmental stages (41) strongly suggest that Cyanobacteria are the most plausible donor species of RIP genes through HGT to the Culicinae subfamily. In line with Huang (17), we postulate that Weisman barrier is, if not absent, markedly weakened in the egg, pupa and larva stages of mosquitoes. Consequently, these early developmental stages of mosquitoes could be particularly prone to the acquisition of heritable foreign genes by environmental bacteria.
An important question about the HGT derived genes is their fate. In other words, the issue at stake is whether "foreign" genes will impact on fitness, and if so, to what extent these genes will be affected by natural selection and/or genetic drift. Probably, mosquitoes RIP genes display a defensive role which has helped them to be fixed by natural selection. However, the fact that individuals from the C. quinquefasciatus MR4 colony are homozygous for a null mutation of RIP gene shows that this gene is not essential for viability under laboratory conditions. On the other hand, all mosquitos' sequences have several codons under purifying selection, which is also reflected by the occurrence of the majority of changes at the third position of codons (Fig 5). According to the nearly-neutral evolutionary theory, slightly deleterious mutations can be fixed in populations with low effective size (as in the case of laboratory colonies). On the contrary, these mutations are efficiently purged from larger populations, such as natural populations (43). This seems to be the case of culicine RIPs, since null mutant C. quinquefasciatus are viable in captivity, while clear evidence of selection pressure on coding sequences in wild specimens has been found.

Conclusions
We have found solid evidence suggesting that culicine RIP genes derive from a single ancestral sequence acquired by HGT from a prokaryotic organism, probably a Cyanobacterium.
Moreover, we have also demonstrated that these genes have evolved, after transference, under purifying selection, implying a functional role in the host organism. S1