Decoding the 5′ nucleotide bias of PIWI-interacting RNAs

Stein, Chad B.; Genzor, Pavol; Mitra, Sanga; Elchert, Alexandra R.; Ipsaro, Jonathan J.; Benner, Leif; Sobti, Sushil; Su, Yijun; Hammell, Molly; Joshua-Tor, Leemor; Haase, Astrid D.

doi:10.1038/s41467-019-08803-z

Article
Open access
Published: 19 February 2019

Nature Communications volume 10, Article number: 828 (2019) Cite this article

Abstract

PIWI-interacting RNAs (piRNAs) are at the center of a small RNA-based immune system that defends genomes against the deleterious action of mobile genetic elements (transposons). PiRNAs are highly variable in sequence with extensive targeting potential. Their diversity is restricted by their preference to start with a Uridine (U) at the 5′ most position (1U-bias), a bias that remains poorly understood. Here we uncover that the 1U-bias of Piwi-piRNAs is established by consecutive discrimination against all nucleotides but U, first during piRNA biogenesis and then upon interaction with Piwi’s specificity loop. Sequence preferences during piRNA processing also restrict U across the piRNA body with the potential to directly impact target recognition. Overall, the uncovered signatures could modulate specificity and efficacy of piRNA-mediated transposon restriction, and provide a substrate for purifying selection in the ongoing arms race between genomes and their mobile parasites.

Introduction

PIWI-interacting RNAs (piRNAs) and their PIWI protein partners establish restriction of transposons in germ cells, and thus guard genomic identity^1,2,3,4. Mutations in core piRNA pathway genes result in sterility and threaten the survival of a species. While mechanisms of piRNA biogenesis differ greatly from those of other small silencing RNAs, core concepts of small RNA-guided regulation of gene expression are conserved: At the heart of all RNA silencing mechanisms resides the RNA-induced silencing complex (RISC), which consists of a small non-coding RNA and its Argonaute protein partner (AGOs and PIWIs)⁵. Within RISC, the small RNA determines target specificity by complementary base-pairing, and the Argonaute protein initiates transcriptional or post-transcriptional silencing mechanisms^6,7.

It has long been observed that several classes of small RNAs preferentially harbor a Uridine (U) at their 5′-most position, particularly piRNAs⁸. In contrast to the well-defined mechanism of microRNA (miRNA) biogenesis, processing of piRNAs from long single-stranded transcripts is poorly understood⁹. The 5′ ends of mature piRNAs are generated first, either by the action of the conserved endonuclease Zucchini(Zuc)/PLD6^10,11 or by the slicer activity of a PIWI protein itself^8,12,13. After association with PIWI proteins, the 3′ end of mature piRNAs is generated either by a second Zuc/PLD6-cut¹⁴, or by exonucleolytic trimming to resemble what is believed to be the footprint of the associated PIWI protein^14,15,16. The resulting piRNAs are highly diverse in sequence and variable in length, and are best defined by the association with their PIWI partners.

While the first base of piRNAs is hidden within PIWI proteins, all other nucleotides could contribute to target recognition. The rules of their target engagement, however, remain largely elusive, and potentially range from fully complementary base-pairing to a seed-based recognition mechanism^17,18. Together with the high diversity of piRNAs, with hundreds of thousands of unique sequences, and lack of sequence conservation, it remains unclear how target-specificity of these potent silencing pathways is regulated^4,8,12,19,20.

One prominent feature restricts the enormous sequence space of mature piRNAs: they preferentially harbor a Uridine (U) at their 5′ most position. This 1U-bias is conserved across species and has also been observed for other classes of small RNAs⁹. While the molecular source of the 1U-bias and its function are poorly understood, the physical position of the small RNA’s first base within RISC is well-defined as anchored to a specialized pocket in the middle (MID) domain of Argonaute proteins, termed the specificity loop (SL)²¹. Based on structural data and in vitro binding studies, the SL has been proposed to establish the 1U bias by selecting for 1U small RNAs^{16,22,23,24,25}. However, the recent observation of phased piRNA production by the endonuclease Zucchini (Zuc) implies 1U-specific processing of piRNA precursors instead^26,27. Here, we investigate both hypotheses in vivo, using Drosophila as a model system. Our data support that the 1U-bias of Piwi-piRNAs is established by the consecutive and differential selection against all nucleotides but U, first during piRNA biogenesis and then by Piwi’s specificity loop. Furthermore, we uncover processing-dependent selection against U within the piRNA body with the potential to directly shape the piRNA-target repertoire. Overall, we propose that the complex establishment of an ultimate 1U-bias could provide a substrate for purifying selection to improve the specificity and efficiency of transposon silencing.

Results

The specificity loop (SL) contributes to but does not solely determine the 1U-bias

To directly probe the impact of the SL on the 1U-bias of piRNAs, we generated mutants for Piwi’s SL, characterized their associated piRNAs, and investigated their function in Drosophila ovaries and in ovarian somatic sheath cells (OSC)^28,29. Our Piwi-SL mutants comprise three categories by design (Fig. 1a, b): (I) Substitute mutants exchange Piwi’s SL with the SL of other Argonaute proteins, whose physiological first nucleotide bias is well defined, including a 1C-bias (At AGO5)^22,24 and equal first nucleotide distribution (Dm Ago3)^8,13. (II) Synthetic loops were designed to weaken the 1U-fit based on available protein structures^{21,30,31,32,33,34}, including a complete replacement of Piwi’s SL with a flexible stretch of three Glycine-Serine-Serine repeats (GSS). Finally, we generated a loop deficient (LD) mutation (III) that removes the SL entirely. We purified FLAG-HA-tagged Piwi mutants (FH-Piwi-SL) from OSC and analyzed their associated small RNAs by Illumina sequencing (Fig. 1c). PiRNAs associated with Piwi-SL mutants were similar to wild type in length, targeting potential, and genomic annotation (Fig. 1d–f, Supplementary Fig. 1a, b). Surprisingly, while most alterations to Piwi’s SL slightly reduced the 1U frequency of Piwi-piRNAs, none of the mutants abrogated the observed 1U-bias, thus suggesting an SL-independent establishment (Fig. 1g, Supplementary Fig. 1c).

The identity of Piwi’s SL is not required for Piwi’s function in vivo

To re-evaluate this observation in vivo, and to investigate the biological impact of alterations to Piwi’s SL, we characterized Piwi-SL mutants in Drosophila ovaries. FH-Piwi-SL(Ago3) and FH-Piwi-SL(GSS) restored fertility and ovary morphology in a piwi null background similar to a wild type rescue construct, in contrast to the loop-deficient mutant (Piwi-LD) (Fig. 2a, and Supplementary Fig. 2a–c). The potential of Piwi-SL mutants to restore piwi function correlated with their ability to form functional Piwi-piRISC, as indicated by the accumulation of nuclear FH-Piwi-SL protein^35,36 (Fig. 2b). Thus, a particular primary sequence of Piwi’s SL is not required for Piwi function in vivo, but complete removal of the SL likely impairs protein folding and consequently piRISC formation as previously described^36,37. PiRNAs associated with Piwi-LD (in piwi > Gal4 heterozygous flies, indicated by *), however, closely resemble those associated with other SL mutants and Piwi wt in length (Fig. 2c), targeting potential (Fig. 2d), and genomic annotation (Fig. 2e). As in OSC, Piwi-piRNAs maintained a 1U-bias in SL mutants with 1U frequencies above 60% in all experiments (Fig. 2f). Taken together, our comprehensive characterization of Piwi-SL mutants revealed that the 1U-bias of Piwi-piRNAs is established independently of but reinforced by Piwi’s SL. Our SL mutants eliminate SL-dependent nucleotide selection in the first position but retain the ability to associate with Zuc-dependent piRNAs. Thus, they likely grant an unrestricted view on piRNA 5′-termini generated by the Zuc-processor complex, which are otherwise rapidly degraded in the absence of their Piwi protein partner^38,39.

Preferences during piRNA biogenesis establish sequence restrictions

To characterize potential processing preferences that could extend beyond the first position of mature piRNAs in the context of the precursor transcripts, we collapsed all uniquely mapping piRNAs by their 5′ end (position 0), extended their genomic interval, and calculated nucleotide frequencies for each position within a 101 nt window (Fig. 3). This metagene analysis revealed that the U-rich mature 5′ ends of Piwi-piRNAs reside in a relatively U-poor and G-rich environment, which spans the observed piRNA body (0 to +25; indicated in gray) and one piRNA length upstream (−1 to −25) (Fig. 3a). We also observed peaks in U frequencies and reciprocal drops in purine frequencies (A and G) one piRNA-length upstream (−26) and downstream (+26) of the observed piRNA 5′ end, perhaps indicating the first position of neighboring piRNAs. This pattern is reminiscent of a phased processing signature^26,27,40, and likely represents coupled processing of the 5′ and 3′ termini of neighboring piRNAs by a single Zuc-cleavage event. The observed sequence preferences are dependent on piRNA processing as they do not reflect a bias in the genomic sequence of piRNA clusters when considering all possible 5′ positions, stratified by the first nucleotide, in our background analysis (Fig. 3a insert). These signatures were consistent between Piwi-piRNAs from OSC and from fly ovaries, and to a great extent conserved in Miwi/Piwil1-piRNAs from adult mouse testis (Fig. 3c, d). Importantly, these preferences were independent of Piwi’s SL with correlation coefficients (Pearson) for individual nucleotides greater than 0.6 across the 101 nt interval in OSC and Drosophila ovaries (Fig. 3d, e, and Supplementary Fig. 3). While the 5′-most nucleotide of piRNAs is deeply buried in PIWI’s MID domain and not available for target recognition by base pair complementarity, the 1U-bias has the potential to modulate the overall target space by altering the representation of individual piRNA sequences. In contrast, the observed preferences throughout the piRNA body directly shape the piRNA-target interface.
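
The metagene computation described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code (the Methods state that the analyses were run in R). The function name `metagene_frequencies`, the in-memory `genome` dictionary, and the `(chrom, position, strand)` layout of the 5′ ends are hypothetical choices made for this example. Genomic sequence is in the DNA alphabet, so U appears as T.

```python
from collections import Counter

def metagene_frequencies(genome, five_prime_ends, flank=50):
    """Per-position nucleotide frequencies around collapsed piRNA 5' ends.

    genome: dict of chromosome name -> plus-strand sequence (A/C/G/T).
    five_prime_ends: iterable of (chrom, position, strand), 0-based position.
    Returns a dict: offset (-flank..+flank) -> {nucleotide: frequency}.
    With flank=50 the window is 101 nt wide, offset 0 being the 5' end.
    """
    complement = str.maketrans("ACGT", "TGCA")
    offsets = range(-flank, flank + 1)
    counts = {offset: Counter() for offset in offsets}
    # Collapse reads that share a 5' end so each site is counted once.
    for chrom, pos, strand in set(five_prime_ends):
        seq = genome[chrom]
        start, end = pos - flank, pos + flank + 1
        if start < 0 or end > len(seq):
            continue  # skip windows that run off the chromosome
        window = seq[start:end]
        if strand == "-":
            # Reverse-complement so the window reads 5'->3' on the piRNA strand.
            window = window.translate(complement)[::-1]
        for offset, base in zip(offsets, window):
            counts[offset][base] += 1
    freqs = {}
    for offset, counter in counts.items():
        total = sum(counter[b] for b in "ACGT")
        freqs[offset] = {b: counter[b] / total for b in "ACGT"} if total else {}
    return freqs
```

A background model of the kind described for Fig. 3a insert could be approximated by passing every position of a cluster as a candidate 5′ end instead of the observed ones.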

The 1U-bias is established by a universal mechanism across piRNA generating regions

Most piRNAs originate from a few piRNA generating regions termed piRNA clusters^8,12. If the 1U-bias is indeed primarily established by a general piRNA processing mechanism, we would predict that the 1U frequency of piRNA-groups, according to their generating clusters, should correlate with the T-content of the cluster itself and result in a constant 1U-enrichment factor. To test this hypothesis, we correlated the first nucleotide frequencies of piRNA-groups with the genomic nucleotide frequencies of the corresponding clusters (Fig. 4a). The observed 1U frequencies of piRNA-groups positively correlated with the genomic T-content of their generating clusters (Pearson correlation coefficient > 0.5) with a universal enrichment of about 2.3-fold (observed over expected 1U frequencies) (Fig. 4a insert), further supporting 1U enrichment during piRNA biogenesis.

Our metagene analysis as well as characterization of individual piRNA cluster-groups suggest the establishment of a 1U-bias during piRNA processing. These findings offer an explanation why Ago3-piRNAs exhibit an unexpected 1U-bias when aberrantly fueled by the Zuc-processor instead of ping-pong-generated piRNAs in OSC⁴¹. Nevertheless, the observed processing-dependent sequence preferences are surprising, because neither fly nor mouse Zuc exhibit obvious sequence-specific processing in vitro^10,11. However, cofactors or post-translational modifications likely create sequence preferences of the Zuc-processor complex in vivo^26,42.

Piwi’s SL selects against 1C-piRNAs to reinforce an overall 1U-bias

Interestingly, and reinforcing our interest in Piwi’s SL, the most robust SL-dependent change affected 1C-piRNAs across different clusters in OSC (Fig. 4b) and in flies (Supplementary Fig. 4a, b), despite the overall U-richness and A-richness of these piRNA generating regions (Supplementary Fig. 4c). The observed ~40% increase in 1C-piRNAs eliminates any bias against 1C to frequencies that are expected by genomic origin (Fig. 4a insert). In contrast, purines (A and G) were not significantly altered in SL-mutants (Fig. 4c, d and Supplementary Fig. 4a, b). Interestingly, aversion of 1C due to repulsion by the SL has previously been suggested based on structural modeling and the inability to obtain such Argonaute-small RNA crystals^21,37. The initial hypothesis of 1U-selection by attractive interactions, based on the visible fit of 1U in structural studies, might have diverted our attention from the less visible but potentially more significant negative selection of 1C: while 1U-enrichment by affinity requires two hydrogen bonds to stabilize the 1U-PIWI interaction, a single repulsive interaction is expected to reduce 1C binding (Fig. 5 and Supplementary Fig. 5). An exclusionary rather than affinity function of PIWI’s SL offers an explanation as to why, in the only available structure of a PIWI protein (silkworm PIWI, i.e., SIWI), only one hydrogen bond is observed between the SL and 1U, despite the protein’s marked preference for 1U piRNAs in vivo³³. Taken together, our data suggest that the 1U-bias of Piwi-piRNAs is established through a two-step gating mechanism that sequentially reduces A, G, and C by SL-independent and SL-dependent preferences (Fig. 5).

Discussion

The intricate establishment of the final 1U-bias inspires hypotheses about its role beyond reflecting a “key-lock fit” to stabilize piRISC. Binding preferences of silkworm PIWI in vitro¹⁶, and the extensive interaction of human AGOs with the first U or A of small RNAs^21,30,31,34, might be a consequence rather than the cause of the 1U-bias, ensuring optimal RISC stability. This hypothesis could explain why positive interactions between the SL and the first base of the small RNA, requiring at least two hydrogen bonds, would favor both 1U and 1A equally, in contrast to the strong preference for 1U over 1A small RNAs in vivo^21,34. With the A/U-richness of piRNA precursors and the rejection of A and G during piRNA biogenesis, PIWI’s SL only has to discriminate between 1U and 1C to reinforce an overall 1U preference in vivo.

But what is the main function of this 1U-bias and the subsequent restriction of piRNA sequence space? Reduction of 1G- and 1C-RNAs might decrease mis-incorporation of RNA fragments from protein coding sequences, and preferentially represent non-coding A/U-rich transcripts, including piRNA precursors. This effect might contribute to the observed preference of mRNA-derived piRNAs to originate from the A/U-rich 3′UTRs rather than G/C-rich coding sequences^29,43. Furthermore, sequence preferences during piRNA processing are associated with relative U-depletion and G-enrichment in piRNA bodies, and directly modulate the sequence space for complementary target interactions. While we do not know the targeting rules of any nuclear piRISC, a mechanism following miRNA-like pairing¹⁷ was recently identified for C. elegans PRG-1¹⁸. Such a seed-based mechanism involves base-pairing of positions two to eight of the piRNA with the target transcript, and additional 3′ supplemental interactions, all of which would be affected by the observed processing preferences.

Overall, the sequence bias during piRNA processing and piRISC formation could provide substrates for purifying selection in the ongoing arms race between genomes and their mobile parasites to improve specificity and efficiency of piRNA silencing, and avoid auto-aggression while restricting the entire transposome. As a defense against ancient retroviral invaders and other transposons, rules of piRNA silencing might foremost follow those that govern efficient viral defense: establishment of restriction and specificity at multiple independent steps to hamper development of resistance by the invader, where every contribution counts⁴⁴.

Methods

DNA constructs and transgenic flies

Plasmids and oligonucleotides are listed in Supplementary Data 1. Piwi coding sequence (cds) was cloned into pENTRD/TOPO (ThermoFisher). Mutations in Piwi’s specificity loop (SL) were introduced using the Q5 Site-Directed Mutagenesis Kit (NEB), and mutagenic primers were purchased as recommended by the manufacturer (NEBuilder). Expression vectors were generated through the LR Clonase (ThermoFisher) reaction, using the pPFHW destination vector (UASp promoter, N-terminal 3×FLAG/3×HA) (DGRC: #1125).

CRISPR constructs for transgenic piwi>Gal4 flies

For the sgRNA construct, oligonucleotides (piwi_sgRNA_f and piwi_sgRNA_r) were annealed and ligated into BbsI-digested U6-BbsI-chiRNA (Addgene # 45946). The donor construct for genome-editing was generated as follows: pDONR-ETG3R and its homology arm-containing derivatives were cloned by HiFi Assembly (NEB). First, pDONR221 (ThermoFisher) was digested with PciI and EcoRI (NEB) and the KanR-containing 2.4 kb fragment was gel extracted and purified (Zymo Research). Next, EGFP-T2A-Gal4 was PCR amplified from burs-mCD8-EGFP-T2A-Gal4 (a gift of Dr. Benjamin White, Addgene plasmid #39463), and 3×P3-driven dsRed2 was PCR amplified from pT-GEM(1) (Addgene plasmid #62893). These three fragments were HiFi assembled to produce pDONR-ETG3R. Homology arms corresponding to 1 kb upstream and 1 kb downstream of the Piwi start codon were PCR amplified using genomic DNA from TH_attP2 nos-Cas9 flies (gift of Dr. Benjamin White) extracted using the GenElute Genomic DNA Miniprep Kit (Sigma-Aldrich). Finally, pDONR-ETG3R was digested with PciI and EcoRI (NEB), and the homology arms were assembled flanking the Gal4-containing cassette by HiFi Assembly (NEB). All oligonucleotides were purchased from IDT.

The CRISPR piwi>Gal4 allele was generated by co-injection of plasmids encoding the sgRNA and the donor construct for genome-editing into TH_attP2 nos-Cas9 flies (BestGene Plan RI; marker: RFP+/DsRed+). UAS-PiwiΔSL transgenic flies were generated by random insertion into w1118 (Service type: Plan C; screening marker w+).

Immunofluorescence and microscopy

Drosophila ovaries fixed in 4% paraformaldehyde were dispersed into ovarioles to ensure maximum exposure to antibodies. Samples were washed 3 × 10 min in PBSTw (PBS + 0.1% Tween) on a rocker. Ovarioles were permeabilized for 30 min in PBSTr (PBS + 0.2% Triton), washed, and blocked in 2% BSA for 1 h with rocking. Primary antibodies (α-HA mouse 1:100, Biolegend Cat#:90501; α-Piwi rabbit polyclonal antibody, raised against the peptide MADDQGRGRRRPLNEDC, 1:100) in 2% BSA were incubated overnight at 4 °C. Secondary antibodies coupled to Alexa fluorophores, goat α-mouse Alexa 488 (cat#: A-11029, 1:500) and goat α-rabbit Alexa 488 (cat#: A-21245, 1:500), were incubated in 2% BSA for 1 h, followed by a 30 min incubation with DAPI (1 µg/mL) and a 1 h wash in PBSTw. The stained ovarioles were mounted with VectaShield HardSet antifade mounting medium on Fisherbrand Superfrost Plus microscope slides under a coverslip, and the edges were sealed with nail polish. Images were taken using a Zeiss LSM710 confocal microscope (NIDDK microscopy core).

Tissue culture of ovarian somatic sheath cells (OSC)

OSC were purchased from DGRC (OSS, stock #190) and cultured according to the DGRC protocol. Cells were transfected using Xfect (Clontech) according to the manufacturer’s guidelines.

Immunoprecipitation and small RNA library preparation for Illumina sequencing

FLAG-HA-Piwi wt and ΔSL constructs were immunoprecipitated using anti-FLAG M2 magnetic beads (Sigma-Aldrich). Lysis, immunoprecipitation and three washes were performed in IP buffer (20 mM Tris HCl pH 7.5, 250 mM NaCl, 1% NP-40, 2 mM MgCl₂). NaCl was increased to 0.5 M for a fourth wash, followed by a final rinse with IP buffer. RNA was extracted (Zymo Direct-zol RNA miniprep kit). Small RNA libraries for Illumina sequencing were prepared according to Benhalevy et al.⁴⁵ with small modifications: in brief, libraries were size selected after 3′ ligation and 5′ linker-ligation. cDNA generated with Superscript IV reverse transcriptase was amplified with Q5 Polymerase (NEB) in a low cycle PCR (12 cycles). Pippin prep was used to remove ligated linker-linker (143–185 bp). A pilot PCR on 10% of the pippin-prepped cDNA was performed to determine the appropriate number of additional PCR cycles. Tapestation (Agilent) was used to determine quality and concentration of the resulting libraries. The samples were sequenced on a HiSeq2500 sequencing system (Illumina) (NIDDK genomics core facility).

Analyses of the next-generation sequencing data and code availability

Mapping statistics are provided in Supplementary Data 2. Adapters and barcodes were removed using cutadapt (cutadapt 1.16). Structural RNAs were removed by mapping to a concatenated fasta file that includes miRNA, rRNA, snRNA, snoRNA, and tRNA sequences (UCSC) using STAR (STAR 2.5.4a)⁴⁶. Unmapped reads were obtained and mapped to the Drosophila genome (dm6, iGenomes) using STAR. Primary alignments for multimapping reads (M ≤ 100) were extracted using the flag filter (-F 256). Unique mappers (M = 1) were selected by the tag “NH:i:1”. When both unique mappers and multimappers are included in our analyses, we call them All Mappers. PiRNA cluster intervals were defined according to the original definition: ≥1 uniquely mapping read per kilobase (kb) of genome space, over at least five kb⁸. Original and processed data files, and count tables are deposited in GEO (GSE115839). All analyses and plotting were performed using R (version 3.5.0 or 3.3.1) and RStudio (Version 1.0.136) either locally or using the National Institutes of Health High Performance Computing Cluster (http://hpc.nih.gov). The packages used in the analysis were: data.table, plyr, dplyr, reshape, reshape2, Rmisc, GenomicRanges, GenomicAlignments, ShortRead, Rsamtools, Biostrings, rlist, ggplot2, ggrepel, scales, parallel, extrafont, BSgenome.Dmelanogaster.UCSC.dm6, and BSgenome.Mmusculus.UCSC.mm10⁴⁷.
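
The two alignment filters named above can be illustrated with a short sketch. In the SAM format, flag bit 256 (0x100) marks a secondary alignment, so excluding it keeps primary alignments, and the optional tag `NH:i:1` reports a single hit for the read. The Python below is an assumption-laden illustration that works on plain SAM text lines; the helper names `is_primary`, `is_unique`, and `split_mappers` are hypothetical, and the text does not specify which tool the authors used for this step.

```python
def is_primary(sam_line):
    """True if the secondary-alignment bit (256) is not set in the FLAG field."""
    flag = int(sam_line.split("\t")[1])
    return flag & 256 == 0

def is_unique(sam_line):
    """True if the optional fields contain the tag NH:i:1 (one reported hit)."""
    optional_fields = sam_line.rstrip("\n").split("\t")[11:]
    return "NH:i:1" in optional_fields

def split_mappers(sam_lines):
    """Return (all_mappers, unique_mappers) as lists of SAM alignment lines.

    all_mappers holds one primary alignment per read, whether the read maps
    once or several times; unique_mappers is the NH:i:1 subset.
    """
    all_mappers, unique_mappers = [], []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, not an alignment
        if is_primary(line):
            all_mappers.append(line)
            if is_unique(line):
                unique_mappers.append(line)
    return all_mappers, unique_mappers
```

Keeping only the primary record of a multimapper counts each read once, which is what makes the "All Mappers" set comparable in size to the library.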

Generating count tables and plotting of data

Count tables for each figure are provided in Supplementary Data 3–6. The sample bam files were loaded into the R environment and all unspliced reads 19 to 50 nt long were retained for further analysis. Reads mapping to non-standard chromosomes (Un_*) were removed. The GenomicRanges R package was used to perform all the counting⁴⁸. To count transposable elements (TEs), the most recent rmsk.gtf file for the dm6 genome was downloaded from the TEtoolkit website⁴⁹. To count cluster piRNAs, a cluster.bed file was defined as described above.
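
As a rough illustration of the read-retention rules above (unspliced, 19 to 50 nt, standard chromosomes only), the following Python sketch applies them to a chromosome name and CIGAR string. It is not the authors' R code. The helper names are hypothetical, and two points are assumptions: that "unspliced" means a CIGAR without an `N` operation, and that the `Un_*` pattern matches any chromosome name containing `Un_`.

```python
import re

def cigar_query_length(cigar):
    """Read length implied by a CIGAR string (M, I, S, = and X consume the read)."""
    operations = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(count) for count, op in operations if op in "MIS=X")

def keep_read(chrom, cigar, min_len=19, max_len=50):
    """Apply the three retention rules to one alignment."""
    if "N" in cigar:
        return False  # N marks a skipped reference region, i.e. a spliced alignment
    if "Un_" in chrom:
        return False  # unplaced scaffold (assumed meaning of the Un_* pattern)
    return min_len <= cigar_query_length(cigar) <= max_len
```

For example, `keep_read("chr2L", "25M")` is retained, whereas a `10M100N15M` alignment is dropped as spliced.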

To plot read size distribution, the individual sample read counts were normalized by total library size to the wt libraries. When replicates were available, we calculated and plotted mean and standard error values (Figs. 1d, 2c). To plot transposon targeting potential, reads mapping antisense to TEs were retrieved and annotated to transposon families according to the repeat masker file (rmsk.gtf from TEtoolkit) (Figs. 1e, 2d). For plotting genomic annotation, reads mapping to sense and antisense TEs, exons, and introns were filtered and annotated, respectively (Figs. 1f, 2e). Nucleotide frequency was calculated by counting the identity of the first nucleotide of each read (Figs. 1g, 2f). The genomic tracks were prepared in IGV using a tiled data file (.tdf) that was derived from bam files using igvtools. Sequence logos were generated with WebLogo 3 from fasta files of piRNA sequences (Supplementary Figs. 1 & 2). For metagene analysis, the genomic start position of every read was first extended by 50 nucleotides upstream and downstream. Then, the sequences of the corresponding genomic ranges were aggregated and the nucleotide abundance at each position was counted. To compare different nucleotides, individual nucleotide signatures were plotted on the same baseline using the median over the 101-nt window as the baseline shift distance (Fig. 3a). To generate the background model, every nucleotide of piRNA-generating clusters was considered as a potential piRNA 5′ end, and the metagene was generated in the same manner as before (Fig. 3a insert). To simplify comparison of nucleotide signatures between samples, individual nucleotide frequencies were offset to avoid overlap, and correlation coefficients were calculated between wt and mutants (Fig. 3b–e). The same data were used to plot actual nucleotide frequencies and ratios of mutants compared to wt (Supplementary Fig. 3a–d).
For calculating nucleotide bias, the cluster genomic nucleotide frequencies (expected) and the nucleotide frequencies of the 1st position of all reads (observed) were calculated. Then the ratio between observed (o) and expected (e) was designated as “nucleotide bias” (Fig. 4, Supplementary Fig. 4). To calculate correlation values, only complete pairwise observations were considered and Pearson correlation coefficients were calculated (Fig. 4). With the exception of size distribution plots, we only considered reads from 24 to 29 nt for all analyses and plotting. Fig. 4c is based on GSM2516689 (GSE95580).
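
The observed-over-expected ratio and the pairwise-complete Pearson correlation described above can be sketched in a few lines. This is a hedged Python illustration, not the R code used for the paper. The names `nucleotide_bias` and `pearson` are hypothetical, the cluster sequence is assumed to be supplied in the orientation of the precursor transcript, and U is written as T because frequencies are taken from genomic sequence. The 24 to 29 nt read filter is assumed to have been applied before counting.

```python
import math

def nucleotide_bias(first_nt_counts, cluster_sequence):
    """Observed/expected ratio per nucleotide for one piRNA cluster.

    first_nt_counts: dict nucleotide -> number of reads starting with it.
    cluster_sequence: genomic sequence of the cluster (A/C/G/T).
    """
    seq = cluster_sequence.upper()
    total_observed = sum(first_nt_counts.get(n, 0) for n in "ACGT")
    total_expected = sum(seq.count(n) for n in "ACGT")
    if total_observed == 0 or total_expected == 0:
        raise ValueError("need at least one read and one A/C/G/T base")
    bias = {}
    for n in "ACGT":
        observed = first_nt_counts.get(n, 0) / total_observed
        expected = seq.count(n) / total_expected
        bias[n] = observed / expected if expected > 0 else float("nan")
    return bias

def pearson(xs, ys):
    """Pearson correlation using only complete pairs (no None or NaN)."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if x is not None and y is not None
             and not math.isnan(x) and not math.isnan(y)]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)
```

As a worked case, a cluster with 25% T in which 70% of reads start with T (U) gives a bias of 0.70 / 0.25 = 2.8 for that nucleotide.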

Molecular modeling

Initial structural models of Drosophila melanogaster Piwi (from wildtype and mutant primary sequences) were generated using the Phyre2⁵⁰ protein fold recognition server in intensive mode. The top structures informing the Dm Piwi model were that of silkworm Piwi (Siwi)³³ and human Ago2³⁰. The models output from Phyre2 then served as input for loop remodeling of the specificity loop sequences using Rosetta version 3.9⁵¹. Fifty structures were generated and scored for each mutant; the lowest energy structure for each sequence is depicted in Fig. 1 and Supplementary Fig. 5. The lowest energy structures were then superposed onto the structure of Siwi bound to RNA to assess the predicted interfaces of the specificity loops with the 5′ nucleotide. Molecular graphics were generated using PyMOL⁵². Electrostatic surfaces were calculated with APBS⁵³.

Reporting summary

Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.

Data availability

Original and processed data files, and count tables are deposited in GEO (GSE115839). All data is available from the authors upon reasonable request.

References

Vagin, V. V. A distinct small RNA pathway silences selfish genetic elements in the germline. Science 313, 320–324 (2006).
Article ADS CAS PubMed Google Scholar
Girard, A., Sachidanandam, R., Hannon, G. J. & Carmell, M. A. A germline-specific class of small RNAs binds mammalian Piwi proteins. Nature 418, 244 (2006).
Google Scholar
Aravin, A. et al. A novel class of small RNAs bind to MILI protein in mouse testes. Nature 431, 343 (2006).
Google Scholar
Lau, N. C. Characterization of the piRNA complex from rat testes. Science 313, 363–367 (2006).
Article ADS CAS PubMed Google Scholar
Hammond, S. M., Bernstein, E., Beach, D. & Hannon, G. J. An RNA-directed nuclease mediates post-transcriptional gene silencing in Drosophila cells. Nature 404, 293–296 (2000).
Article ADS CAS PubMed Google Scholar
Zamore, P. D., Tuschl, T., Sharp, P. A. & Bartel, D. P. RNAi: double-stranded RNA directs the ATP-dependent cleavage of mRNA at 21 to 23 nucleotide intervals. Cell 101, 25–33 (2000).
Article CAS PubMed Google Scholar
Verdel, A. et al. RNAi-mediated targeting of heterochromatin by the RITS complex. Science 303, 672–676 (2004).
Article ADS CAS PubMed PubMed Central Google Scholar
Brennecke, J. et al. Discrete small RNA-generating loci as master regulators of transposon activity in Drosophila. Cell 128, 1089–1103 (2007).
Article CAS PubMed Google Scholar
Iwasaki, Y. W., Siomi, M. C. & Siomi, H. PIWI-interacting RNA: its biogenesis and functions. Annu. Rev. Biochem. 84, 405–433 (2015).
Article CAS PubMed Google Scholar
Nishimasu, H. et al. Structure and function of Zucchini endoribonuclease in piRNA biogenesis. Nature 491, 284–287 (2012).
Article ADS CAS PubMed Google Scholar
Ipsaro, J. J., Haase, A. D., Knott, S. R., Joshua-Tor, L. & Hannon, G. J. The structural biochemistry of Zucchini implicates it as a nuclease in piRNA biogenesis. Nature 491, 279–283 (2012).
Article ADS CAS PubMed PubMed Central Google Scholar
Aravin, A. A., Sachidanandam, R., Girard, A., Fejes-Toth, K. & Hannon, G. J. Developmentally regulated piRNA clusters implicate MILI in transposon control. Science 316, 744–747 (2007).
Article ADS CAS PubMed Google Scholar
Gunawardane, L. S. et al. A slicer-mediated mechanism for repeat-associated siRNA 5′ end formation in Drosophila. Science 315, 1587–1590 (2007).
Article ADS CAS PubMed Google Scholar
Hayashi, R. et al. Genetic and mechanistic diversity of piRNA 3′-end formation. Nature 539, 588–592 (2016).
Article CAS PubMed PubMed Central Google Scholar
Izumi, N. et al. Identification and functional analysis of the pre-piRNA 3′ trimmer in silkworms. Cell 164, 962–973 (2016).
Article CAS PubMed PubMed Central Google Scholar
Kawaoka, S., Izumi, N., Katsuma, S. & Tomari, Y. 3′ End formation of PIWI-interacting RNAs in vitro. Mol. Cell 43, 1015–1022 (2011).
Article CAS PubMed Google Scholar
Bartel, D. P. Metazoan MicroRNAs. Cell 173, 20–51 (2018).
Article CAS PubMed PubMed Central Google Scholar
Shen, E.-Z. et al. Identification of piRNA binding sites reveals the argonaute regulatory landscape of the C. elegans germline. Cell 172, 937–951.e18 (2018).
Article CAS PubMed PubMed Central Google Scholar
Williams, Z. et al. Discovery and characterization of piRNAs in the human fetal ovary. Cell Rep. 13, 854–863 (2015).
Article CAS PubMed Google Scholar
Roovers, E. F. et al. Piwi proteins and piRNAs in mammalian oocytes and early embryos. Cell Rep. 10, 2069–2082 (2015).
Article CAS PubMed Google Scholar
Frank, F., Sonenberg, N. & Nagar, B. Structural basis for 5′-nucleotide base-specific recognition of guide RNA by human AGO2. Nature 465, 818–822 (2010).
Article ADS CAS PubMed Google Scholar
Frank, F., Hauver, J., Sonenberg, N. & Nagar, B. Arabidopsis argonaute MID domains use their nucleotide specificity loop to sort small RNAs. EMBO J. 31, 3588–3595 (2012).
Article CAS PubMed PubMed Central Google Scholar
Izumi, N. et al. Hsp90 facilitates accurate loading of precursor piRNAs into PIWI proteins. RNA 19, 896–901 (2013).
Article CAS PubMed PubMed Central Google Scholar
Mi, S. et al. Sorting of small RNAs into Arabidopsis argonaute complexes is directed by the 5′ terminal nucleotide. Cell 133, 116–127 (2008).
Article CAS PubMed PubMed Central Google Scholar
Montgomery, T. A. et al. Specificity of ARGONAUTE7-miR390 interaction and dual functionality in TAS3 trans-acting siRNA formation. Cell 133, 128–141 (2008).
Article CAS PubMed Google Scholar
Han, B. W., Wang, W., Li, C., Weng, Z. & Zamore, P. D. piRNA-guided transposon cleavage initiates Zucchini-dependent, phased piRNA production. Science 348, 817–821 (2015).
Article ADS CAS PubMed PubMed Central Google Scholar
Mohn, F., Handler, D. & Brennecke, J. piRNA-guided slicing specifies transcripts for Zucchini-dependent, phased piRNA biogenesis. Science 348, 812–817 (2015).
Article ADS CAS PubMed PubMed Central Google Scholar
Niki, Y., Yamaguchi, T. & Mahowald, A. P. Establishment of stable cell lines of Drosophila germ-line stem cells. Proc. Natl Acad. Sci. USA 103, 16325–16330 (2006).
Saito, K. et al. A regulatory circuit for piwi by the large Maf gene traffic jam in Drosophila. Nature 461, 1296–1299 (2009).
Schirle, N. T. & MacRae, I. J. The crystal structure of human Argonaute2. Science 336, 1037–1040 (2012).
Elkayam, E. et al. The structure of human Argonaute-2 in complex with miR-20a. Cell 150, 100–110 (2012).
Nakanishi, K., Weinberg, D. E., Bartel, D. P. & Patel, D. J. Structure of yeast Argonaute with guide RNA. Nature 486, 368–374 (2012).
Matsumoto, N. et al. Crystal structure of silkworm PIWI-clade Argonaute Siwi bound to piRNA. Cell 167, 484–497 (2016).
Faehnle, C. R., Elkayam, E., Haase, A. D., Hannon, G. J. & Joshua-Tor, L. The making of a slicer: activation of human Argonaute-1. Cell Rep. 3, 1901–1909 (2013).
Saito, K. et al. Roles for the Yb body components armitage and Yb in primary piRNA biogenesis in Drosophila. Genes Dev. 24, 2493–2498 (2010).
Le Thomas, A. et al. Piwi induces piRNA-guided transcriptional silencing and establishment of a repressive chromatin state. Genes Dev. 27, 390–399 (2013).
Cora, E. et al. The MID-PIWI module of Piwi proteins specifies nucleotide- and strand-biases of piRNAs. RNA 20, 773–781 (2014).
Haase, A. D. et al. Probing the initiation and effector phases of the somatic piRNA pathway in Drosophila. Genes Dev. 24, 2499–2504 (2010).
Olivieri, D., Sykora, M. M., Sachidanandam, R., Mechtler, K. & Brennecke, J. An in vivo RNAi assay identifies major genetic and cellular requirements for primary piRNA biogenesis in Drosophila. EMBO J. 29, 3301–3317 (2010).
Homolka, D. et al. PIWI slicing and RNA elements in precursors instruct directional primary piRNA biogenesis. Cell Rep. 12, 418–428 (2015).
Sato, K. et al. Krimper enforces an antisense bias on piRNA pools by binding AGO3 in the Drosophila germline. Mol. Cell 59, 553–563 (2015).
Rogers, A. K., Situ, K., Perkins, E. M. & Toth, K. F. Zucchini-dependent piRNA processing is triggered by recruitment to the cytoplasmic processing machinery. Genes Dev. 31, 1858–1869 (2017).
Robine, N. et al. A broadly conserved pathway generates 3′UTR-directed primary piRNAs. Curr. Biol. 19, 2066–2076 (2009).
Daugherty, M. D. & Malik, H. S. Rules of engagement: molecular insights from host-virus arms races. Annu. Rev. Genet. 46, 677–700 (2012).
Benhalevy, D., McFarland, H. L., Sarshad, A. A. & Hafner, M. PAR-CLIP and streamlined small RNA cDNA library preparation protocol for the identification of RNA binding protein target sites. Methods 118–119, 41–49 (2017).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Huber, W. et al. Orchestrating high-throughput genomic analysis with bioconductor. Nat. Methods 12, 115–121 (2015).
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Jin, Y., Tam, O. H., Paniagua, E. & Hammell, M. TEtranscripts: a package for including transposable elements in differential expression analysis of RNA-seq datasets. Bioinformatics 31, 3593–3599 (2015).
Kelley, L. A., Mezulis, S., Yates, C. M., Wass, M. N. & Sternberg, M. J. E. The Phyre2 web portal for protein modeling, prediction and analysis. Nat. Protoc. 10, 845–858 (2015).
Mandell, D. J., Coutsias, E. A. & Kortemme, T. Sub-angstrom accuracy in protein loop reconstruction by robotics-inspired conformational sampling. Nat. Methods 6, 551–552 (2009).
Rigsby, R. E. & Parker, A. B. Using the PyMOL application to reinforce visual understanding of protein structure. Biochem. Mol. Biol. Educ. 44, 433–437 (2016).
Dolinsky, T. J., Nielsen, J. E., McCammon, J. A. & Baker, N. A. PDB2PQR: an automated pipeline for the setup of Poisson–Boltzmann electrostatics calculations. Nucleic Acids Res. 32, W665–W667 (2004).

Acknowledgements

We thank Jenny Hinshaw, Andy Kehr, Leo Kong, and Jinwei Zhang for advice on the design of the mutant specificity loops; Benjamin White’s group (NIMH) for plasmids and fly strains; and Markus Hafner for extensive discussions and for comments on the manuscript. We thank members of the Haase, Hafner and Mammen groups and the NIH RNA community, especially Aishe Angeletti Sarshad, Daniel Benhalevy, Dimitrios Anastasakis, Bob Crouch, Rich Maraia, Vivian Cheung, Nick Guydosh, and Pedro Batista, for constructive discussions. L.J. is an investigator of the Howard Hughes Medical Institute. M.H. is a scholar of the Rita Allen Foundation. We also thank the NIH High-Performance Computing group, and the genomics and microscopy core facilities at the NIDDK. This work was supported by the intramural research program of the NIDDK.

Author information

Chad B. Stein
Present address: PhD Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA 02115, USA
These authors contributed equally: Chad B. Stein, Pavol Genzor, Sanga Mitra, Alexandra R. Elchert.

Authors and Affiliations

National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
Chad B. Stein, Pavol Genzor, Sanga Mitra, Alexandra R. Elchert, Leif Benner, Sushil Sobti & Astrid D. Haase
W.M. Keck Structural Biology Laboratory, Howard Hughes Medical Institute, Cold Spring Harbor, NY, 11724, USA
Jonathan J. Ipsaro & Leemor Joshua-Tor
Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
Jonathan J. Ipsaro, Molly Hammell & Leemor Joshua-Tor
Department of Biology, Johns Hopkins University, Baltimore, MD, 21218, USA
Leif Benner
National Institute of Biomedical Imaging and Bioengineering, National Institutes of Health, Bethesda, MD, 20892, USA
Yijun Su

Contributions

C.B.S. initiated the project and generated FH-Piwi-SL mutants. P.G. and S.M. analyzed Illumina sequencing data. A.R.E. and C.B.S. performed experiments in tissue culture and in flies with the help of L.B., S.S. and Y.S. J.J.I. generated select SL mutants and performed structural modeling experiments. M.H. advised on bioinformatic and biostatistical analyses. L.J. advised on structural design. A.D.H. conceived the project, designed experiments and wrote the manuscript. All authors commented on the manuscript.

Corresponding author

Correspondence to Astrid D. Haase.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Journal peer review information: Nature Communications thanks Katalin Tóth and the other anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Supplementary Data 5

Supplementary Data 6

Peer Review File

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Stein, C.B., Genzor, P., Mitra, S. et al. Decoding the 5′ nucleotide bias of PIWI-interacting RNAs. Nat Commun 10, 828 (2019). https://doi.org/10.1038/s41467-019-08803-z

Received: 01 August 2018
Accepted: 30 January 2019
Published: 19 February 2019
DOI: https://doi.org/10.1038/s41467-019-08803-z
