Conserved and ubiquitous expression of piRNAs and PIWI genes in mollusks antedates the origin of somatic PIWI/piRNA expression to the root of bilaterians

PIWI proteins and a specific class of small non-coding RNAs, termed Piwi interacting RNAs (piRNAs), suppress transposon activity in animals on the transcriptional and post-transcriptional level, thus protecting genomes from detrimental insertion mutagenesis. While in vertebrates the PIWI/piRNA system appears to be restricted to the germline, somatic expression of piRNAs directed against transposons is widespread in arthropods, likely representing the ancestral state for this phylum. Here, we show that somatic expression of PIWI genes and piRNAs directed against transposons is conserved in mollusks, suggesting that somatic PIWI/piRNA expression was already realized in an early bilaterian ancestor. We further describe lineage specific adaptations regarding transposon composition of piRNA clusters and show that different piRNA clusters are dynamically expressed during oyster development. Finally, bioinformatics analyses suggest that different populations of piRNAs participate in the ping-pong amplification loop in a tissue specific manner.


Introduction
In virtually all animals, PIWI proteins protect germ cells from the steady threat of mobile genetic elements, so-called transposons (Thomson andLin 2009, Iwasaki et al. 2015). Based on sequence complementarity to their target transcripts, 23-31 nt non-coding RNAs, termed PIWI-interacting (pi-) RNAs, function as guide molecules for PIWI proteins that endonucleolytically slice snatched targets. Besides post-transcriptional transposon silencing, PIWI proteins and piRNAs can trigger the establishment of repressive epigenetic DNA or chromatin modifications, thus inducing efficient transposon silencing on the transcriptional level (Reuter et al. 2011, Giacomo et al. 2013, Pezic et al. 2014, Manakov et al. 2015. Analysis on piRNA pathways in representatives of many animal taxa have unveiled a great diversity of lineage specific adaptations, challenging the universal validity of insights obtained from model organisms (Grimson et al. 2008, Houwing et al. 2008, Das et al. 2008, Li et al. 2013, Lim et al. 2014, Ha et al 2014, Hirano et al. 2014, Gebert et al. 2015, Madison-Villar et al. 2016, Praher et al. 2017, Lewis et al. 2018. For a long time, PIWI proteins and piRNAs were thought to be dispensable for female germ cell development in mammals until it became clear that the model organisms mouse and rat represent an exception from the mammalian rule (Flemr et al. 2013. Similarly, evidence for a gene regulatory role of piRNAs (Zhang et al. 2015, Gebert et al. 2015, Russel et al. 2017 and their widespread somatic expression in many animals (Palakodeti et al. 2008, Perrat et al. 2013, Nandi et al. 2016, Jones et al. 2016, Lewis et al. 2018 have eroded the dogma that the piRNA pathway is restricted to the germline, being exclusively responsible for silencing of transposons. Beyond transposon control, piRNAs are essential for regeneration and stem cell maintenance in the flatworm Schmidtea mediterranea (Palakodeti et al. 2008), they provide adaptive immunity against virus infections in Aedes aegypti (Miesen et al. 2015), are responsible for sex determination in Bombyx mori (Kiuchi et al. 2014), memory-related synaptic plasticity in Aplysia californica (Rajasethupathy et al. 2012) and cleave mRNA in pig, mouse and human (Zhang et al. 2015, Gebert at el 2015. Despite the more than eighty thousand living molluskan species (Rosenberg 2014) there exists only one description of PIWI proteins and piRNAs for this taxon, based on experiments in the sea slug Aplysia californica (Rajasethupathy et al. 2012), making any conclusions on conserved or lineage-specific features of the PIWI/piRNA system in mollusks impossible. In order to overcome this lack of information, we have reconstructed the evolution of PIWI genes in mollusks based on 11 sequenced genomes. We performed quantitative real-time PCR experiments to analyze the expression patterns of the identified PIWI paralogs across a representative set of tissues from the great pond snail Lymnaea stagnalis (L. stagnalis) and the pacific oyster Crassostrea gigas (C. gigas). We applied high-throughput sequencing of small RNAs from L. stagnalis to verify the presence of piRNAs in germline and muscle tissue. We further reanalyzed published small RNA sequence data from C. gigas to characterize the dynamic expression of piRNAs from distinct piRNA clusters during oyster development. Finally, we used bioinformatics approaches to show that different piRNA populations participate in the ping-pong amplification loop in a tissue specific manner.

Results
The molluskan PIWI gene repertoire A number of previously published gene tree reconstructions of PIWI family members suggest that Drosophila Ago3 and deuterostomian Piwi-like genes derived from an ancestral gene that was present in the common ancestor of today-living bilaterian species (Seto et al. 2007). Since the number of PIWI paralogs differs across model organisms, we first wanted to characterize the PIWI protein equipment of sequenced mollusks to infer the ancestral state and subsequent evolution of PIWI paralogs in the molluskan clade. We used available PIWI protein sequence data from six molluskan species (Biomphalaria glabrata, Aplysia californica, Crassostrea gigas, Crassostreas virginica, Mizuhopecten yessoensis, Octopus bimaculoides) and further manually annotated PIWI genes based on five publicly available but yet unannotated genomes (Lymnaea stagnalis, Radix auricularia, Lottia gigantea, Bathymodiolus platifrons, Pinctada martensii). We found that the PIWI family members Piwil1 and Piwil2 are conserved in mollusks and orthologous to Piwil1 and Piwil2 in vertebrates, suggesting a duplication event in an early bilaterian ancestor with subsequent loss of Piwil1 in arthropods ( Figure 1A). While we did not observe further gene duplication events within the molluskan Piwil2 clade, several duplication events are present in the Piwil1 clade resulting in two Piwil1 paralogs in Bathymodiolus platifrons and even three Piwil1 paralogs in Lymnaea stagnalis and Radix auricularia. Generally, PIWI gene duplication events are in line with the previously described erratic evolution of PIWI family genes in arthropods (Lewis et al. 2016, Dowling et al. 2016, Lewis et al. 2018. Noteworthy, it was also a successive duplication of Piwil1 on the eutherian lineage that gave rise to Piwil3 (with subsequent loss on the murine lineage) and Piwil4 (Sasaki et al. 2003, Murchison et al. 2008, Figure 1A).

Ubiquitous expression of PIWI genes in Lymnaea stagnalis and Crassostrea gigas
To investigate the expression of PIWI genes in mollusks we chose two representative species, the pacific oyster Crassostrea gigas (C. gigas, Bivalvia) showing no Piwil1 duplication, and the great pond snail Lymnaea stagnalis (L. stagnalis, Gastropoda), featuring three predicted Piwil1 paralogs ( Figure 1A). We performed quantitative real-time PCR for each PIWI paralog on a representative set of tissues from both species. For the hermaphroditic great pond snail L. stagnalis we measured PIWI expression on the mRNA level in the reproductive tract, comprising both male and female gametes, foot muscle, lung and brain. Significant expression was detectable for Piwil1 and particularly Piwil2, while the Piwil1 duplicates Piwil1b and Piwil1c are expressed only at very low levels ( Figure 1B and 1C). As expected, we observed the highest expression of Piwil1 and Piwil2 in the reproductive tract. However, both genes are significantly expressed in the other analyzed tissues as well, reaching 15%, 62% and 21% of germline expression for Piwil1 in brain, muscle and lung, respectively, and 12%, 36% and 53% of germline expression for Piwil2 in brain, muscle and lung, respectively ( Figure 1D). Next, we turned to the dioecious pacific oyster C. gigas, were we measured PIWI expression on the mRNA level in the male gonad, labial palps, gill, adductor muscle and mantle. We detected significant expression of Piwil1 and Piwil2 across all analyzed tissues, particularly in gonadal tissue ( Figure 1E and 1F). In relation to gonadal expression, Piwil1 and Piwil2 are expressed in levels ranging from 21% (Piwil1 in labial palps) to 111% (Piwil2 in adductor muscle, Figure 1G). The observed expression patterns suggest that a functional PIWI machinery acting in the soma and the germline is conserved in mollusks. Considering the somatic expression of PIWI proteins and piRNAs in many arthropod species (Lewis et al. 2018), it is parsimonious to assume that somatic PIWI/piRNA expression represents the ancestral state that was established in an early bilaterian ancestor.

piRNAs in Lymnaea stagnalis muscle and reproductive tract
In order to characterize molluskan piRNAs, we sequenced small RNA transcriptomes from L. stagnalis extracted from the reproductive tract and (foot-) muscle, since muscle tissue was found to exhibit the highest somatic PIWI expression in both L. stagnalis and C. gigas. The sequence read length profiles of both probes show a maximum for 21 nt RNAs, with a considerable amount of 22 nt RNAs being present in the muscle, but not in the reproductive tract. We further observed a smaller fraction of RNAs in the range of 24-29 nt in both probes ( Figure 2A). sRNA sequence annotation with unitas (Gebert et al. 2017) revealed a similar proportion of different sRNA classes in the two probes, with miRNAs accounting for 46% and 51% of reads in the reproductive tract and muscle, respectively ( Figure 2B). Interestingly, we ascertained a substantial difference considering the abundance of tRNA fragments (tRFs). In both probes, 21 nt RNAs derived from the 3' end of tRNAs (3' tRFs) constitute the vast majority of tRNA fragments.
However, the share of 3' tRFs in the reproductive tract is nearly twice as high compared to muscle (18% and 10%, respectively, Figure 2B). Recently, 3' tRFs were found to silence Long Terminal Repeat (LTR) retrotransposons in mouse stem cells by targeting their functionally essential and highly conserved primer binding sites (Schorn et al. 2017). The remarkable amount of 3' tRFs in the analyzed probes supports the idea proposed by Schorn and coworkers who assume that this mechanism could be highly conserved across different species, providing an innate immunity against LTR propagation.
Focusing on putative piRNAs, we analyzed the fraction of sequence reads that did not match to any other class of non-coding RNA. This dark matter of sRNAs comprises 18% and 17% of sequence reads in the reproductive tract and in muscle, respectively. Strikingly, these sRNAs are strongly enriched for transposon sequences, suggesting their involvement in transposon control ( Figure 2B). To verify the presence of piRNAs, we checked for the so-called ping-pong signature (bias for 10 bp 5' overlap of   mapped sequence reads), which is a hallmark of secondary piRNA biogenesis and requires the catalytic activity -and thus expression -of PIWI proteins (Czech and Hannon 2016). Remarkably, we detected a significant ping-pong signature in both, the reproductive tract and muscle ( Figure 2C), suggesting active PIWI/piRNA-dependent transposon silencing in the germline and the soma. Next, we used proTRAC (Rosenkranz and Zischler 2012) to identify 151 piRNA clusters, covering 0.13% of the genome of L. stagnalis ( Figure 2D). Although piRNAs originate from essentially identical clusters in the reproductive tract and in muscle, we found that 11.4% of sequence reads from the reproductive tract map to piRNA clusters, while only 2.7% of sequence reads from muscle do so, indicating rather moderate production of primary piRNAs in the soma compared to the germline ( Figure 2E). Nevertheless, we found that the number of piRNAs that participate in ping-pongamplification (ping-pong reads per million bootstrapped reads, ppr-mbr) is even slightly higher in muscle compared to the situation in the reproductive tract, emphasizing the functional importance of somatic PIWI/piRNA expression ( Figure 2E). In line with the transposon-suppressive role of piRNAs, the identified piRNA clusters are clearly enriched for transposon sequences compared to the whole genome situation (45% and 31%, respectively, figure 2F and 2G). Interestingly, the transposon composition in piRNA clusters does not reflect the transposon landscape of the genome. Instead, piRNA clusters are enriched for Gypsy retrotransposons and particularly DNA transposons such as PiggyBac or hAT5, showing up to 120-fold enrichment in piRNA clusters (figure 2G and 2H). This non-random distribution suggests a selective regime that favors insertion events of evolutionary young and active transposons.

Ubiquitous and dynamic expression of piRNAs in Crassostrea gigas
Based on our observation that PIWI genes and piRNAs are expressed in the soma and the germline of L. stagnalis, we reanalyzed previously published small RNA datasets from C. gigas that were used to investigate the dynamic expression of miRNAs during oyster development without further examination of a putative piRNA fraction (Xu et al. 2014, NCBI Sequence Read Archive Project ID SRP007591). We annotated C. gigas sRNAs from the male and female gonad, different developmental stages ranging from the egg to juvenile, and a representative set of somatic tissues from adult animals (Supplementary Table 1). In all datasets, particularly in gonadal tissues, eggs and early embryo stages but also in hemolymph we detected a large amount of sequence reads that did not match to any known ncRNA class but was instead enriched for transposon sequences. The transposonmatching sub-fraction itself was enriched for antisense sequences (Supplementary Table 1). Analogous to the procedure applied for the L. stagnalis datasets, we verified the presence of primary and secondary piRNAs by analyzing the pingpong signature of each dataset. Remarkably, we detected a significant ping-pong signature across all analyzed datasets ( Figure 3A, Supplementary Figure  1), but also found that the number of ping-pong reads (measured as ppr-mbr) differs considerably depending on the tissue and developmental stage ( Figure 3A, Supplementary Figure 2). Noteworthy, a ping-pong signature is also detectable when taking only those reads into account that match protein coding sequences, suggesting a relevant role of the PIWI/piRNA pathway in post-transcriptional regulation of protein coding genes in gonads, egg blastula, digestive gland and hemolymph (Supplementary Table 2). We further used sequences without ncRNA annotation to predict piRNA clusters with proTRAC and checked whether we can observe a differential expression of specific piRNA clusters in time and space ( Figure 3A). In contrast to the situation in L. stagnalis, we found that different genomic loci are responsible for production of primary piRNAs in the germline and in the soma, but also during different developmental stages. A clustering approach based on average linkage (Babicki et al. 2016) revealed four distinct groups of piRNA clusters which we named class 1-4 piRNA clusters ( Figure 4A). Class 1 piRNA clusters are active in the adult germline (male and female) and in the early embryo until the D-shaped veliger stadium where larvae are approximately 14 hours old. The same applies to class 2 piRNA clusters, however, following the D-shape veliger stadium, class 1 piRNA clusters become inactive, while class 2 piRNA clusters remain active and class 3 piRNA clusters start piRNA production. Both, class 2 and class 3 piRNA cluster activity is measurable until the juvenile stadium, where oysters are approximately 20 days old. In somatic tissues of adult oysters, class 4 piRNA clusters represent the main source of primary piRNAs ( Figure 4A, bottom). Interestingly, all four classes of piRNA clusters are active in hemocytes, which also feature the highest amount of clustered reads, and ping-pong reads compared to other somatic tissues. This might reflect the presence of stem cells within the hemocyte cell population which is subject to complex differentiation processes (Fisher 1986, Lau et al. 2017. Interestingly, the four classes of piRNA clusters differ considerably regarding the overall transposon content as well as the specific transposon composition ( Figure 3B-D). Class 1 and class 2 piRNA clusters are generally enriched for transposon sequences showing 38% and 36% transposon derived sequences, respectively, compared to a genomic transposon content of 29%. The surprisingly high accumulation of young (as deduced from the divergence from their consensus) Gypsy elements in piRNA clusters, suggests a strong selection for Gypsy element insertions, probably as a consequence of Gypsy activity in C. gigas. Considering transposons that are generally enriched in piRNA clusters ( Figure 3E) we found that R2 retrotransposons (149-fold enrichment in piRNA clusters) and Dada DNA transposons (40fold enrichment in piRNA clusters) are most abundant in class 1 piRNA clusters. In contrast, Polinton DNA transposons (32-fold enrichment in piRNA clusters) and BEL retrotransposons (5-fold enrichment in piRNA clusters) are most abundant in class 2 piRNA clusters. Different from class 1 and class 2 piRNA clusters, class 3 and class 4 piRNA clusters display only slight transposon enrichment (30% and 31%, respectively). Noteworthy, high copy number Gypsy retrotransposons (5-fold enrichment in piRNA clusters) are most abundant in class 3 piRNA clusters, while Academ, Crypton and Tx1 transposons are most abundant in class 4 piRNA clusters. These results contrast with the situation in L. stagnalis, where identical piRNA producing loci are active in the germline and in the soma. Moreover, we can observe considerable differences in the transposon composition of piRNA clusters in the two species, which likely reflect divergent transposon activity and resulting selective constraints on the different phylogenetic lineages.

Homotypic and heterotypic ping-pong amplification
The so-called ping-pong amplification loop describes a process that is responsible for the posttranscriptional silencing of transposable elements. In Drosophila and mouse, this process typically involves two PIWI paralogs (heterotypic ping-pong), one loaded with antisense piRNAs targeting transposon transcripts, and the other loaded with sense piRNAs targeting piRNA cluster transcripts, which contain transposon sequences in antisense orientation , Aravin et al. 2008. Likely for steric reasons, premature piRNAs loaded onto the different PIWI paralogs are more or less rigorously trimmed at their 3' ends. This is why piRNA populations bound to different PIWI paralogs not only differ regarding the amount of sense-and antisensetransposon sequences, but also in their sequence length profiles , Kawaoka et al. 2011, Czech and Hannon 2016. In addition to the heterotypic ping-pong amplification, homotypic pingpong has been shown to occur in qin mutant flies (Aub:Aub, Zhang et al. 2011), andwildtype prenatal mouse testis (Miwi2:Miwi2, Mili:Mili, Aravin et al. 2008).
Since the typical molluskan genome encodes two ubiquitously expressed PIWI paralogs, Piwil1 and Piwil2, we asked whether we are able to show the participation of distinct piRNA populations in the ping-pong cycle. We conducted a bioinformatics approach under the premise that Piwil1-and Piwil2bound piRNAs exhibit different length profiles, which is the case for the corresponding mouse homologs Piwil1 (Miwi) which preferentially binds 29/30 nt piRNAs, and Piwil2 (Mili), which preferentially binds 26/27 nt piRNAs (Vourekas et al. 2012). We analyzed pairs of mapped C. gigas and L. stagnalis sequence reads that showed a 10 bp 5' overlap (ping-pong pairs), with respect to the sequence length of each ping-pong partner (Figure 4, Supplementary Figure  3). In the female gonad of C. gigas, most ping-pong pairs combine piRNAs with a length of 25 nt and 29 nt, which suggests heterotypic ping-pong amplification. (Figure 4) In support of this, 29 nt piRNAs, presumably bound to Piwil1, are heavily biased for a 5' uridine (a hallmark of primary piRNAs), whereas 25 nt piRNAs, presumably bound to Piwil2, show a stronger bias for an adenine at position 10 (typical for secondary piRNAs). In contrast, ping-pong pairs in C. gigas muscle predominantly combine two 29 nt piRNAs, suggesting homotypic, Piwil1-based pingpong amplification. Generally, the observed patterns of ping-pong pairs are very diverse across the different samples, for instance displaying heterotypic ping-pong in the digestive gland and homotypic, Piwil2-based ping-pong in hemolymph cells (Supplementary Figure 3). Since the expression of Piwil1 compared to Piwil2 is considerably lower in L. stagnalis, we were curious to check whether the corresponding ping-pong pairs might reflect this fact. Indeed, 26/26 nt pairs (homotypic, Piwil2-based ping-pong) represent the majority of ping-pong pairs in the reproductive tract, followed by 25/28 nt pairs (Figure 4). In addition, homotypic Piwil2-based ping-pong amplification with 24/25 nt ping-pong pairs is also dominant in the L. stagnalis muscle.

Conclusions
Our results reveal that mollusks utilize the PIWI/piRNA pathway as a defense against transposable elements in the germline and in the soma, which corresponds to the situation in arthropods and therefor suggests somatic PIWI/piRNA expression as an ancestral bilaterian character state. In addition, based on the observation that a substantial fraction of arthropod and oyster piRNAs targets messenger RNAs (a gene annotation for L. stagnalis was not available) it seems likely that the last common ancestor of arthropods and mollusks applied the PIWI/piRNA pathway also for post-transcriptional regulation of protein coding genes. In vertebrates, somatic PIWI/piRNA expression appears to have fade away and reports on somatically expressed piRNAs in mammals are often considered with skepticism. However, remnants of the former somatic expression might have outlasted to fulfill special functions in specific cells and/or in narrowly defined timespans of development or cell differentiation in the one or the other clade. In any case, we should be aware of the attitude that experiments with Drosophila and mouse will tell us everything that is worth to know about the PIWI/piRNA pathway.

Piwi gene annotation and tree reconstruction
In order to reconstruct the phylogenetic relations of mollusk Piwi proteins, we first searched for Piwi genes in species with an available genome sequence that lack proper annotation (Lymnaea stagnalis, Radix auricularia, Lottia gigantea, Bathymodiolus platifrons, Pinctada martensii). To this end, we scanned the relevant genomes for sequences that are homologous to annotated Piwi paralogs of the pacific oyster (EKC35279 and EKC29295) by aligning translated DNA sequences using tblastx. Neighboring hits with a distance smaller than 10 kb were grouped as exons of distinct gene loci. Only groups containing the overall best hits for a given locus were retained. Finally, the predicted gene homologs were checked for presence of PIWI and PAZ domains using NCBI conserved domain database (Marchler-Bauer et al. 2015). Similarly, for Piwi expression analysis by qPCR in the pond snail, we identified the housekeeping gene GPI (glucose-6-phosphate isomerase) by comparison with the human ortholog (ARJ36701). The predicted and annotated Piwi protein sequences of all available molluskan species together with human Piwi paralogs (Piwil1-4) and Drosophila Ago3 were aligned using MUSCLE (Edgar 2004). The resulting protein alignment was then used for phylogenetic tree reconstruction with PhyML (Guindon et al. 2009) using approximate likelihood-ratio test (SH-like) and Dayhoff substitution model.

qPCR
To estimate the expression of the Piwil homologs in several tissues of L. stagnalis and C. gigas we performed qPCR with cDNA synthesized from the total RNA fraction of these tissues. Total RNA was isolated with TriReagent and the polyadenylated transcriptome was reversely transcribed with SuperScript IV using the RT-primer 5'-CGAATTCTAGAGCTCGAGGCAGGCGACATGT25VN-3'. Primers amplifying ~ 200 bp long products of the respective Piwil homologs and housekeeping genes were designed with the NCBI tool primer-BLAST on basis of the L. stagnalis genome scaffold (14639) and the C. gigas genome assembly (10758). To prevent amplification of residual genomic DNA, primers were designed to be exon-junction spanning or to span at least several intronic regions. The respective biological replicates were analyzed as technical duplicates on a Corbett Rotor-Gene 6000 real-time PCR cycler and the transcript expression was quantified by standard curves of the individual primer pair amplicons. For each cDNA sample the calculated PIWI concentrations were finally normalized by the calculated copy numbers of the housekeeping genes.

Small RNA extraction and sequencing
Experiments were performed on commercially available C. gigas animals from the western French Atlantic coast (lle d'Oleron) and captured wild living L. stagnalis animals from South-western Germany (Heppenheim). We extracted total RNA from L. stagnalis reproduction tract (incl. ovotestis, oviduct, spermatheca, spermiduct, prostate, uterus, vagina, vas deferenc) and foot muscle, and total RNA from C. gigas adductor muscle and gonadal tissue with TriReagent according to the manufacturer's instructions. For each species we sampled two different individuals per tissue. The small RNA fractions of each obtained total RNA probe were sequenced at BGI, Hong Kong, on a BGISEQ-500 unit. Sequence datasets are deposited at NCBI's Sequence Read Archive (SRA) and can be accessed under the SRA project ID SRP123456. We further used previously published small RNA sequence data from C. gigas (Xu et al. 2014) to analyze piRNA expression and characteristics with respect to different developmental stages.

Repeat annotation
We performed de novo prediction of repetitive elements in the genome of L. stagnalis with RepeatScout (v. 1.0.5, Price et al. 2005). Predicted repetitive elements were classified with RepeatClassifier which is part of the RepeatModeler (v. 1.0.11) package. The resulting repeat sequences, as well as a complete collection of currently available molluskan repeat sequences from RepBase (Bao et al. 2015) were used as reference sequences for repeat masking of the L. stagnalis and C. gigas genomes with RepeatMasker (v. 4.0.7) using the cross_match search engine and the option -s for most sensitive masking.
Processing and annotation of small RNA sequence data Small RNA sequence datasets were collapsed to non-identical sequences, retaining information on sequence read counts using the Perl script collapse. Sequences >36nt were rejected using the Perl script length-filter. Finally, low complexity sequences were filtered using the Perl script duster with default parameters. All Perl scripts mentioned are part of the NGS toolbox . We then applied a customized mapping strategy of the remaining small RNA sequence reads based on the consideration that our datasets presumably contain considerable amounts of transposonderived piRNAs as well as post-transcriptionally edited (e.g. A-to-I) or tailed miRNAs and piRNAs. Genomic mapping was performed with SeqMap (Jiang and Wong 2008) using the option /output_all_matches and allowing up to three mismatches. The obtained alignments were further filtered using an in-house Perl script that is available upon request. For the final alignments we allowed up to two non-template 3' nucleotides and up to one internal mismatch. For each sequence, we only considered the best alignments in terms of mismatch counts, but did not reject alignments with equal quality in case of multiple mapping sequences. Sequences that did not produce at least one valid alignment to the reference genome were rejected. To improve small RNA sequence annotation, we performed de novo tRNA, rRNA and miRNA prediction based on the available reference genome assemblies GCA_900036025.1 v1.0 (L. stagnalis) and GCA_000297895.1 oyster_v9 (C. gigas). tRNA annotation was performed with a local copy of tRNAscan (v. 1.3.1, Lowe and Chan 2016). Only tRNAs with less than 5% N's were taken for further analysis. rRNA sequences were predicted using a local copy of RNAmmer (v. 1.2, Lagesen et al. 2007) and hmmer (v. 2.2g, Johnson et al. 2010). Both tools were run with default parameters. We pooled small RNA sequence reads from different replicates and tissues for each species separately to perform miRNA de novo prediction with ShortStack (v. 3.8.4, Axtell 2013) using default parameters. The predicted tRNA, rRNA and miRNA precursor sequences, as well as previously published miRNA precursor sequences (Xu et al. 2014, Zhou et al. 2014, Zhao et al. 2016, were used as additional reference sequences for small non-coding RNA annotation with unitas (v.1.4.6, Gebert at. al. 2017).

piRNA cluster identification
Sequences that did not produce a match to known non-coding RNAs were considered as putative piRNAs and were used for prediction of piRNA clusters with proTRAC (v. 2.4.0) applying default settings. piRNA clusters were predicted for each dataset and species separately. The resulting piRNA cluster predictions for each species were condensed, merging clusters with less than 10 kb distance from each other. Finally, we calculated the sequence read coverage [rpm] for each of the resulting piRNA clusters per dataset. For C. gigas piRNA clusters, a heat map for the top 100 piRNA clusters in terms of maximum rpm coverage (accounting for 64% of summed rpm values) was constructed with Heatmapper (Babicki et al. 2016) applying Pearson distance and average linkage clustering. For L. stagnalis piRNA clusters we

Ping-pong quantification
In order to compare ping-pong signatures across multiple datasets with different sequencing depth, we constructed a software tool, PPmeter, that creates bootstrap pseudo-replicates from original datasets and subsequently analyzes the ping-pong signature and number of ping-pong sequence reads of each pseudo-replicate (default: 100 pseudo-replicates each comprising one million sequence reads). The obtained parameters 'ping-pong score per million bootstrapped reads' (pps-mbr) and 'ping-pong reads per million bootstrapped reads' (ppr-mbr) can be used for quantification and direct comparison of ping-pong activity in different small RNA datasets. The software is freely available at http://www.smallRNAgroup.uni-mainz.de/software.html. Z-score Z-score Z-score Z-score Z-score Z-score Z-score Z-score Z-score Z-score Z-score Z-score Z-score Z-score Z-score Z-score 10 10 10 10 10 10 5' overlap 5' overlap 5' overlap 5' overlap 5' overlap 5' overlap 10 10 10 10 10 10 5' overlap 5' overlap 5' overlap 5' overlap 5' overlap 5' overlap 10 10 10 10 10 5' overlap 5' overlap 5' overlap 5' overlap 5' overlap Supplementary Figure 2. Graphs depict the average number of sequence read pairs with a specific 5' overlap for 100 pseudo-replicates (PR) per dataset with one million reads per PR. The value for read pairs with 10 nt overlap can serve as a measurement for the intensity of ping-pong amplification and is also directly comparable across different datasets.