Efficient and specific oligo-based depletion of rRNA

In most organisms, ribosomal RNA (rRNA) contributes to >85% of total RNA. Thus, to obtain useful information from RNA-sequencing (RNA-seq) analyses at reasonable sequencing depth, typically, mature polyadenylated transcripts are enriched or rRNA molecules are depleted. Targeted depletion of rRNA is particularly useful when studying transcripts lacking a poly(A) tail, such as some non-coding RNAs (ncRNAs), most bacterial RNAs and partially degraded or immature transcripts. While several commercially available kits allow effective rRNA depletion, their efficiency relies on a high degree of sequence homology between oligonucleotide probes and the target RNA. This restricts the use of such kits to a limited number of organisms with conserved rRNA sequences. In this study we describe the use of biotinylated oligos and streptavidin-coated paramagnetic beads for the efficient and specific depletion of trypanosomal rRNA. Our approach reduces the levels of the most abundant rRNA transcripts to less than 5% with minimal off-target effects. By adjusting the sequence of the oligonucleotide probes, our approach can be used to deplete rRNAs or other abundant transcripts independent of species. Thus, our protocol provides a useful alternative for rRNA removal where enrichment of polyadenylated transcripts is not an option and commercial kits for rRNA are not available.

communication with ThermoFisherScientific). Similar problems arise in case of production stops. Even though a recent study comparing three kits for the depletion of bacterial rRNA found the Ribo-Zero kit (Illumina) to exhibit the highest efficiency 14 , production of this kit has been discontinued as of November 2018.
Thus, our goal was to establish a protocol for the specific depletion of rRNA molecules that would allow the generation of high-quality transcriptome analyses without the need for commercially available kits.
Here, we describe an efficient and highly specific rRNA removal approach that can be easily adapted to deplete rRNA or other transcripts of any species. Using a set of only 12 biotinylated DNA oligos, we were able to reduce the levels of the most abundant rRNA transcripts to less than 5% of the total RNA with minimal off-target effects.

Results
Design of rRNA-specific biotinylated oligonucleotides. Similar to ribosomes from other eukaryotes, the T. brucei ribosome contains more then 50 different rRNAs 15 . Sequencing total RNA, we found that 28S alpha, 28S beta and 18S are the most abundant rRNA transcripts in T. brucei, contributing to 75% of the total T. brucei RNA (Supplementary Table 1). Thus, to deplete rRNAs of the small and large ribosomal subunits, we established a protocol for the efficient removal of five rRNAs belonging to the major subunits (28S alpha, 28S beta, 18S, 5.8S and 5S) using customized, biotinylated hybridization probes and capturing of the probe:rRNA hybrid by streptavidin-coated beads (outlined in Fig. 1a). To ensure depletion of partially degraded transcripts, rRNA transcripts longer than 1 kb (28S alpha, 28S beta and 18S) were targeted with three probes-one against the 5´-end, one against the middle and the other against the 3′-end of the transcript. The efficiency of rRNA removal strongly relies on sequence-specific hybridization of these probes. Non-specific cross-hybridization can cause biases in transcriptome profiles.
Thus, to obtain highly specific probes, 25 bp probes were designed using Primer3Plus 16 and evaluated for possible cross-hybridizations using BLAST 17 . Next, to increase the annealing temperature, the probe length was extended to 50 bp and re-evaluated for possible cross-hybridization using BLAST. A total of 12 probes (Table 1) were ordered with a 5′-biotin, to allow easy pull-out with streptavidin-coated paramagnetic beads. Alternatively, probes may be designed with specifically developed programs 18 .
Using biotinylated oligos, rRNA levels can be decreased to <5% of total RNA. To establish conditions for efficient rRNA depletion, we compared different template-to-probe ratios and evaluated the effect of multiple rounds of rRNA depletion using different amounts of magnetic beads.
Previously, a template-to-probe mass ratio of 1:2 has been successfully used for the depletion of bacterial rRNA 19 . Thus, we based our tests on similar ratios and compared hybridization of 2 µg of total T. brucei RNA with 0.5 µg, 1 µg, 2 µg or 4 µg (240 pmoles) of biotinylated oligos. Based on the binding capacity of the beads (Dynabeads MyOne Streptavidin C1) and the size of our oligos, 1.8 mg and 3.6 mg of beads seemed ideal to capture 2 µg and 4 µg of oligos, respectively.
Analyzing depleted and control RNA by gel electrophoresis, we observed the most efficient rRNA depletion when 4 µg of probes were used ( Fig. 1b and Supplementary Fig. 1). In addition, our tests indicated that with 2 µg of total RNA and 4 µg of probes, three rounds of oligo capture, using 1.2 mg of beads per round, were necessary for efficient rRNA removal ( Fig. 1c and Supplementary Fig. 2).
Using these conditions, RNA-seq analysis of the rRNA-depleted and control samples indicated that following 0 (control), 1, 2 or 3 rounds of oligo capture, rRNA (28S, 5.8S, 18S and 5S) contributed to 75.6%, 77.6%, 30.4% and 4.3% of total RNA, respectively ( Fig. 1d and Supplementary Table 2). Thus, a set of 12 oligos was sufficient to reduce 28S alpha, 28S beta, 18S, 5.8S and 5S rRNA levels to <5%. www.nature.com/scientificreports www.nature.com/scientificreports/ Depletion of non-rRNA is minimal. The usefulness of hybridization-based rRNA depletion depends strongly on its specificity, since cross-hybridization of rRNA probes to other transcripts will result in incorrect transcript level measurements.
To determine whether the set of 12 biotinylated probes designed for this study cross-hybridized with non-rRNA, we performed differential expression analyses, comparing the effect of different rounds of oligo capture to an untreated control sample.
When RNA was exposed to increasing rounds of oligo capture and analyzed by RNA-seq, we found RNA levels of 0, 13 and 50 non-rRNA transcripts (of 8459 total) to be significantly (padj < 0.1) different by more than 2-fold after 1, 2 and 3 rounds of oligo capture, respectively, compared to the untreated control, (Fig. 2 Table 3).

; Supplementary
Given the reproducibility of off-target effects, we suspected that the 50 co-depleted transcripts shared some homology with our rRNA-probes. Indeed, using BLAST 17 , we observed that some of the co-depleted non-rRNA transcripts did show homology to our probes ( Fig. 3 and Supplementary Table 4). However, other transcripts that exhibited a similar degree of homology showed no co-depletion. Overall, our data contained no evidence that the degree of homology between probe and non-rRNA transcript and the degree of co-depletion of non-rRNA transcripts are correlated.
Thus, while we were unable to determine why a set of 50 transcripts was co-depleted with rRNA transcripts, our rRNA-seq data indicated that 99.6% of genes are not affected by the depletion of rRNA transcripts. rRNA depletion yields a higher percentage of ncRNA than poly(A) enrichment. Most transcriptome analyses benefit from the removal of rRNA. Yet the analysis of transcripts lacking a poly(A) tail, such as many non-protein coding transcripts and immature transcripts, eliminates the possibility of using strategies enriching for polyadenylated RNA. Instead, methods to specifically deplete rRNA are needed. Our RNA-seq data indicated that depletion of rRNA greatly improves the coverage across most genes (Fig. 4).
To better understand the effects of rRNA depletion and mRNA-enrichment-based approaches on transcriptome data, we compared the transcriptome data generated in this study with previously published data generated from poly(A)-enriched RNA and RNA from which the rRNA had been removed using the RiboMinus Eukaryote Kit. We found our dataset to correlate strongly with previously published RNA-seq datasets (Fig. 5 and  Supplementary Table 5). Yet, for individual genes we observed marked differences in apparent transcript levels  www.nature.com/scientificreports www.nature.com/scientificreports/ (Fig. 6). Since mRNA degradation involves shortening of the poly(A) tail, and transcripts with a short poly(A) are captured inefficiently by oligo-dT-based enrichment methods 20 , we investigated whether very long transcripts or transcripts with a short half-life were underrepresented in poly(A)-enriched RNA datasets compared to our rRNA depleted RNA-seq data. While no general correlation could be observed ( Supplementary Fig. 3) for individual RNAs, for example the FUTSCH RNA, which is the third-longest RNA in T. brucei 21 , transcript levels were strongly reduced in the poly(A) data compared to our dataset (fold-change (log2) = −3.298; padj ≤ 0.1).
Overall, we found that transcriptome data from our rRNA-depleted pool of RNA contained a higher percentage of reads aligning outside of ORFs than reads from a poly(A)-enriched RNA pool: 79.8% compared to 54.5% 22 . Such reads could be splicing intermediates or ncRNAs.

Discussion
We describe the establishment of a highly specific and efficient approach to deplete rRNA from total RNA extracts that can be easily adapted to remove any species-specific rRNA or other abundant transcripts that may mask signal from low abundance transcripts.
Performing a systematic RNA-seq analysis following multiple rounds of rRNA depletion, we found that 3 rounds of probe capture were necessary to reduce rRNA levels to <5%. No rRNA depletion was observed following 1 round of oligo capture. We suspect that streptavidin-coated beads have a higher binding affinity for unbound hybridization probes than for probes bound to rRNA. Thus, in the first IP we may have pulled out only unbound probes. In addition, analysis of RNA-seq data revealed a small set of transcripts that was co-depleted with rRNA. Surprisingly, the degree of co-depletion did not correlate with the degree of sequence homology between the co-depleted transcripts and the probes. Thus, we suspect that the off-target effects may be caused by non-specific binding of some transcripts to the streptavidin-coated beads. Given that the off-target depletion was reproducible, it should be possible to normalize for the observed co-depletion if necessary. In addition, the reproducibility means that the co-depletion will not affect comparative RNA-seq analyses. If a specific transcript is co-depleted at similar levels in RNA from wild type and mutant cells, the co-depletion will not affect the measurements of changes in transcript levels. Finally, we suspect that strategies employing enrichment of polyadenylated transcripts introduce biases and that differences in poly(A) tail length may affect the efficiency of poly(T)-primed reverse transcription 20 .
To evaluate the robustness of our protocol, we compared the distribution of RNA-seq reads from our rRNA-depleted samples to previously published RNA-seq datasets derived from poly(A)-enriched samples. RNA-seq data from samples in which rRNA was depleted by hybridization to complementary DNA oligos and RNaseH digestion were not included in the comparison since the RNA was isolated either from a genetically manipulated cell line 23 or a different life cycle stage 11 . Comparing reads mapping to ORFs, we observed very high correlations, underscoring the reproducibility of the different methods. However, in our dataset a much higher percentage of reads mapped to regions outside of annotated ORFs. We suspect this increase to be the result of the lack of enrichment for polyadenylated mRNA and to better represent the true distribution of cellular RNA.
In this analysis we focused on the depletion of five rRNA transcripts. However, the T. brucei transcriptome contains several other highly abundant transcripts such as VSG-2 transcripts (4% of total RNA after 3 rounds of rRNA depletion) or rRNAs from the gamma, delta and zeta subunits (~10% of total RNA in our control sample). To increase the detection of low abundance mRNA even further, our set of 12 oligos may be extended to target other highly abundant transcripts. While the need for three rounds of depletion adds to the cost of our approach, the cost per rRNA depletion is still lower than it would be with most commercially available kits. To further reduce the costs, we propose regeneration of the streptavidin-coated magnetic beads. It has been shown that the incubation of beads in water at temperatures above 70 °C destroys the biotin-streptavidin interactions without www.nature.com/scientificreports www.nature.com/scientificreports/ denaturing the proteins 24 . Alternatively, beads have been regenerated at 90% efficiency by exposure to a solution containing 25% aqueous ammonia and 25% aqueous ammonia in methanol (9:1) at 25 °C 25 .
In summary, our study shows that organism-specific design of hybridization probes can be successfully used for the depletion of abundant transcripts. We describe a protocol that allows the efficient depletion of rRNA with little off-target effects. Thus, our protocol provides a useful alternative for the depletion of rRNA where enrichment of polyadenylated transcripts is not an option and commercial kits for rRNA removal are not available.
Preparation of biotinylated oligonucleotides. Selected oligonucleotides were ordered with a 5´-biotin tag and HPLC-purified from Sigma-Aldrich. Each oligo was diluted to 100 μM in nuclease-free 10 mM Tris pH 8 (diluted from Ambion 1 M Tris pH 8; Invitrogen; cat. no. AM9856). The rRNA depletion mix was generated by combining equal volumes of each 100 μM oligonucleotide stock.
Removal of ribosomal RNA. The set-up for the hybridization reaction is based on a published protocol 19 . All solutions were kept free from nucleases. For each hybridization reaction, 2 μg of total RNA were mixed with 10 μl of formamide (SigmaAldrich; cat. no. F9037-100ML), 2.5 μl of 20× SSC (3 M NaCl, 0.3 M sodium citrate, the pH was adjusted to 7.0 with HCl), 5 μl of 0.005 M EDTA pH 8 (stock solution 0.5 M; ThermoFisherScientific; cat. no. AM9260G), 2.48 μl of 100 μM rRNA depletion mix (total 4 μg of oligos) and RNAse-free water (ThermoFisherScientific; cat. no. AM9938) to a total volume of 50 µl. Hybridization was performed in a Bio-Rad C1000 Touch Thermal Cycler capable of performing temperature ramps with the following program: 5 min at 80 °C, ramp down to 25 °C at intervals of 5 °C per minute. Subsequently, 2 μl of RNAse-OUT (ThermoFisherScientific; cat. no. 10777019) and 50 μl of 1x SCC containing 20% formamide were added. Dynabead MyOne Streptavidin C1 beads (ThermoFisherScientific; cat. no. 65001) were prepared as recommended by the manufacturer for RNA applications and immobilization of nucleic acids. For each round of oligo capture 120 μl (1.2 mg) of magnetics beads were prepared (unless indicated otherwise). For the first round of depletion, the hybridization reaction was added to the beads, incubated at room temperature (RT) for 15 min using gentle rotation followed by bead separation on a magnetic rack (2 min). For the second and third round of depletion, the supernatant was added to a new batch of beads, incubated at RT for 15 min using gentle rotation followed by bead separation on a magnetic rack (2 min). The resulting supernatant, containing rRNA-depleted RNA, was purified using RNeasy MinElute CleanUp Kit (QIAGEN; cat. no. 74204). Depletion of rRNAs was evaluated on a 1.2% TBE-agarose gel and on an Agilent 2100 Bioanalyzer (Agilent Technologies; cat. no. G2939BA) using the RNA 6000 Nano Kit (Agilent Technologies; cat. no. 5067-1511). cDNA synthesis, library preparation and sequencing. Synthesis of cDNA was performed using NEBNext Ultra Directional RNA Library Prep Kit for Illumina (New England Biolabs; cat. no. E7420) according to the manufacturer's instruction. The concentration of cDNA was measured using Qubit dsDNA HS Assay Kit (Invitrogen, cat. no. Q32854) and a Qubit 2.0 Fluorometer (Invitrogen; cat. no. Q32866). Sequencing libraries were prepared as described previously 27 . To generate strand-specific RNA-seq libraries, uracil excision and removing of the second strand was performed prior to conversion of Y-shaped adapters. Therefore, 3 μl of USER enzyme (New England Biolabs; cat. no. M5505) were mixed with 16 μl of adapter-ligated DNA, 1 μl of TruSeq PCR primer cocktail (50 μM) and 20 μl of KAPA HiFi HotStart ReadyMix (KAPA Biosystems, cat. no. KK2601). USER digestion was performed at 37 °C for 15 min, followed by the published amplification protocol. Library concentrations www.nature.com/scientificreports www.nature.com/scientificreports/ were determined in duplicates using Qubit dsDNA HS Assay Kit (Invitrogen, cat. no. Q32854) and a Qubit 2.0 Fluorometer (Invitrogen, cat. no. Q32866) and quantified using the KAPA Library Quantification Kit (KAPA Biosystems, cat. no. KK4824) according to the manufacturer's instruction. Strand-specific RNA-sequencing libraries were sequenced in paired-end mode on an Illumina NextSeq 500 sequencer (2 × 76 cycles).
Processing of sequencing data. Adapter sequences were removed using Cutadapt 28 and the sequencing datasets were mapped to the T. brucei Lister 427 genome assembly (release 36, downloaded from TriTrypDB 29 ) using BWA-mem 30 . The alignments were converted from SAM to BAM format, sorted and indexed using SAMtools version 1.8 31 . Additionally, unmapped, PCR or optical duplicate, not primary aligned and supplementary aligned reads were filtered out from the alignment files (SAM flag: 3332).
The same procedure was applied for processing the poly(A) RNA-seq datasets from Hutchinson et al. 22 . For on overview of the data analyses performed to generate the different figures see Supplementary Table 6.   13,22,[35][36][37] and this study (3x oligo capture). Pearson correlation coefficients were calculated based on the relative abundance of transcripts, excluding rRNA transcripts.