A method for simultaneous detection of small and long RNA biotypes by ribodepleted RNA-Seq

RNA sequencing offers unprecedented access to the transcriptome. Key to this is the identification and quantification of many different species of RNA from the same sample at the same time. In this study we describe a novel protocol for simultaneous detection of coding and non-coding transcripts using modifications to the Ion Total RNA-Seq kit v2 protocol, with integration of QIASeq FastSelect rRNA removal kit. We report highly consistent sequencing libraries can be produced from both frozen high integrity mouse hippocampal tissue and the more challenging post-mortem human tissue. Removal of rRNA using FastSelect was extremely efficient, resulting in less than 1.5% rRNA content in the final library. We identified > 30,000 unique transcripts from all samples, including protein-coding genes and many species of non-coding RNA, in biologically-relevant proportions. Furthermore, the normalized sequencing read count for select genes significantly negatively correlated with Ct values from qRT-PCR analysis from the same samples. These results indicate that this protocol accurately and consistently identifies and quantifies a wide variety of transcripts simultaneously. The highly efficient rRNA depletion, coupled with minimized sample handling and without complicated and high-loss size selection protocols, makes this protocol useful to researchers wishing to investigate whole transcriptomes.


Results
Modified library preparation protocol consistently produces high quality sequencing libraries. To determine whether this protocol (overview shown in Fig. 1) can effectively be used to create whole transcriptome sequencing libraries from total RNA, we extracted RNA from fresh frozen tissue (hippocampal region of APP/PS1 and wild-type littermate control mouse brain) using the mirVANA Paris kit (Invitrogen) total RNA procedure. The extracted RNA was consistently highly concentrated and of high quality, as reported by the Agilent Bioanalyzer RNA Nano Chip ( Table 1). The RIN was 9.1 ± 0.05 (mean ± standard deviation), with an average concentration of 174.6 ± 27.7 ng/μL. A260/280 ratios, as calculated by spectrometry, were > 2.12 (Table 1; 2.14 ± 0.01), strongly implying the lack of double-stranded DNA (dsDNA) contamination in the sample. RNA fragmentation by RNase III was optimised to 8 min to align with manufacturer's recommendations. Representative electropherograms of total input RNA and fragmented RNA are shown in Fig. 2a,b. Total RNA (Fig. 2a) shows clear 18S and 28S rRNA peaks, while post-fragmentation these peaks become distributed with the overall RNA size distribution shifting downwards (Fig. 2b). The characteristic small RNA peak at ~ 100 nucleotides (nt) is also clearly seen and is retained post-fragmentation.
Next, we assessed the remainder of the library preparation protocol, and a few modifications to the manufacturer's protocol resulted in better outcomes overall. We adjusted the adapter ligation period to a 16-h (overnight) incubation at 16 °C, rather than the recommended 30 °C for 30 min. This markedly increased adapter ligation efficiency 15-fold (from 53 to 745.5 pg/μL; Fig. 2g-i). While this does increase the time required for library preparation, we found that this is outweighed by the increase in ligation efficiency.
The next key improvement is the removal of rRNA from the sample prior to reverse transcription, which was achieved by addition of the Qiagen FastSelect rRNA removal agent to the cDNA synthesis steps. Hybridization of www.nature.com/scientificreports/ the FastSelect ONTs was achieved by step-wise cooldown of the reaction mix from 65 to 25 °C, before addition of Superscript Enzyme Mix. This prevents rRNA from undergoing reverse transcription to cDNA. The efficiency of the rRNA removal step is explored further below. The final cDNA libraries showed size distributions in-line with manufacturer's recommendations (up to 200-base fragments for the Ion Proton system), with < 50% of cDNA fragments falling under 160 base pairs (bp) (Fig. 2c). The library preparation protocol described here resulted in sequencing libraries ranging from 25.5 to 159.2 nmol/L in concentration. Library and sequence run metrics are given in Table 2. The mean read length for each library ranged from 63 to 113 bp, with an average of 23,695,819 total raw reads. Removal of adapter sequences and filtering of low quality reads using the specifications described in the methods resulted in between 11,991,333 (sample 1) and 22,422,158 reads (sample 3). Between 73.36% and 83.77% of adapter-trimmed reads remained post-filtering.
Alignment to the reference genome resulted in between 58 and 75% uniquely mapped reads, 17.5% and 35% multi-mapped reads, 1.33% and 2% mapped to too many (> 10) loci, and 5-12% unmapped reads (Table 3; Fig. 2j). The numbers of uniquely mapped reads are consistently on the higher end of previously reported mapping statistics 30 .
High-quality sequencing libraries even from fresh-frozen human post-mortem brain tissue. RNA derived from human post-mortem tissue can be challenging to extract intact, as variations in postmortem interval, sample storage and transfer between locations may promote degradation of RNA within a sample. We therefore aimed to assess whether this protocol is capable of producing sequencing libraries from such samples. RNA extracted from human brain tissue was less intact, as determined by electropherography, with an average RIN of 2.3 ± 0.2, and of lower concentrations than mouse RNA from similar amount of tissue input (68.57 ± 15.77 ng/μL; Table 4, Fig. 2d). A260/280 ratios, as calculated by NanoDrop, all lay above 2 (Table 4; 2.09 ± 0.04), again suggesting that the samples did not contain dsDNA. Notably, however, the resulting libraries were comparable in concentration and size distribution to those resulting from high quality mouse RNA (122.4 ± 21.5 nmol/L; Table 5). Representative electropherograms of starting input RNA, fragmented RNA, and final libraries are shown in Fig. 4. While the input RNA (Fig. 2d) lacks the defined 18S/28S rRNA peaks seen in Fig. 2a, the 1 min fragmented RNA (Fig. 2e) electropherogram shows a very similar size distribution to 8 min fragmented mouse RNA (Fig. 2b). Again, the characteristic small RNA peak at ~ 100 nt is also clearly seen and is retained post-fragmentation. Similarly, the final library size distribution (Fig. 2f) is comparable to that seen in Fig. 2c. Library and sequencing statistics are shown in Table 5. The mean read length varied from 71 to 115 bp, with an average of 25,441,497 reads per sample. Quality filtering and adapter removal resulted in on average 15,855,518 reads per sample, leaving between 70 and 81% of reads post-processing.
Alignment to the human reference genome uniquely mapped between 79 and 82% of reads, and multimapped between 12 and 15% of reads (Table 6; Fig. 2k). Only ~ 3% of reads were mapped to too many loci, and between 2 and 3% of reads were unmapped. The proportion of uniquely-mapped reads is consistent with previously described mapping statistics, though with a considerably lower percentage of unmapped reads 30,31 . We therefore demonstrate that the described protocol produces quality libraries from even fresh-frozen human post-mortem input RNA.
Ribosomal RNA removal by Qiagen FastSelect results in minimal rRNA content. To assess the effectiveness of Qiagen FastSelect rRNA removal agent, we used SeqMonk RNA-Seq QC to quantify the percentage of reads mapped to rRNA sequences in both the mouse and human samples. Ribosomal RNA content in RNA extracted from the mouse hippocampal tissue was between 0.23 and 2.58% (1.77 ± 0.91; n = 8; Table 7), and alignment to mitochondrial RNA (Mt-rRNA and Mt-tRNA) accounted for, on average, 0.25 ± 0.17% and 0.87 ± 0.49% respectively. Similarly, in the human RNA, the same protocol quantified rRNA content between 0.24 -1.34% (0.45 ± 0.40; n = 7; Table 8), with mitochondrial rRNA and tRNA accounting for, on average, 0.28 ± 0.27 and 5.11 ± 3.23% respectively. As an average Ion PI Chip loaded with four samples returns ~ 25 million reads per sample, a total RNA library prep without rRNA depletion would result in ~ 22 million reads mapping to rRNA, whereas the protocol described here resulted in only ~ 100-200,000 reads mapped to rRNA. Compared to other techniques for rRNA removal from sequencing libraries, the technique described here performed consistently www.nature.com/scientificreports/ better that previously published methods, which range anywhere from 1 to 20% rRNA content [18][19][20][21][22] . Thus the considerable reduction in rRNA content achieved by our protocol frees up valuable sequencing resources.  Uniquely mapped reads, multi-mapped reads, reads mapped to too many loci (> 10), and unmapped reads for each sample shown as a percentage of total trimmed and filtered reads. www.nature.com/scientificreports/    www.nature.com/scientificreports/ We calculated normalized Reads Per Kilobase Million (RPKM) for mouse and human RNA samples to normalise the number of unique transcripts detected for sequencing depth and gene length. RPKM was chosen here over the more commonly used transcripts per million reads (TPM) to allow direct comparison to other published reports of genes detected 18 . Mouse samples identified > 15,000 unique transcripts expressed at greater than one RPKM and an additional 4000 expressed at greater than 0.1 RPKM (Fig. 3a). For the human samples, a similar number of transcripts were found at > 1 RPKM, with more than ~ 6000 additional unique transcripts at > 0.1 RPKM (Fig. 3a). The number of genes detected at RPKM > 1 were comparable to that reported by other rRNA-depleted RNA-Seq protocols 18 . While the number and type of genes identified at RPKM > 1 were very similar between mouse and human samples, human samples showed a greater number of lowly-expressed genes (Fig. 3a). These findings are in stark contrast to some of the reported difficulties in RNA-Seq using low-quality input RNA-notably decreased proportions of mappable reads and reduction in sample complexity 26 . In fact, our data suggest that this protocol results in proportions of successfully mapped reads and levels of gene expression comparable to high-quality, undegraded RNA samples.

This library preparation protocol and analysis pipeline identifies a variety of coding-and
Breakdown of read alignment by transcript biotype (as annotated in each reference genome-Mus musculus GRCm38.100 and Homo sapiens GRCh38.96-as well as piRNA and tRNA from piRNABank and UCSC Genome Browser respectively) is shown in Table 7 and 8. The average percentage content by gene biotype is shown in Fig. 3b,c. The largest number of reads mapped to protein-coding mRNA (Mouse-48.86% ± 6.02;   32 . Additionally, the method described here, when compared to data obtained from more conventional library construction methods for both mouse 33,34 and human samples 35 , results in not only a greater number of total unique genes identified, but also a greater proportion of ncRNA biotypes compared to protein-coding mRNA (Supplementary Figs. 1 and 2). Many species of non-coding RNA show very narrow variance between samples (Fig. 4), while others varied significantly.
In particular, small nuclear RNA (snRNA) expression was highly variable, ranging from 4 to 23% in mouse RNA (Fig. 4a), and 2.5 to 15.5% in human RNA (Fig. 4b). Despite their frequent use as reference genes for qRT-PCR and gene array experiments, individual variability in snRNA expression has been reported previously 36,37 , and these observations are supported by the data presented here. This method was able to identify genes that have previously been reported to be differentially-expressed in both the APP/PS1 transgenic mouse model of AD, and in AD patients. Notably, among the top 20 DE genes in APP/PS1 mice, both App and Psen1 were highly altered (log fold change 1.22 and 0.85 respectively, both  Select gene sequencing reads significantly correlate with cycle threshold (Ct) values obtained by quantitative RT-PCR. In order to ascertain to what extent sequencing reads obtained from this protocol can be representative of the actual number of RNA molecules in the sample, we performed qRT-PCR analysis of selected genes and miRNA from the mouse samples. We then determined the correlation coefficient (Pearson's Product-Moment Correlation) of Ct values against normalized read counts (RPKM for mRNA; CPM for miRNA), in order to control for library size and gene length (Fig. 5). Counts per Million reads (CPM) were used for the miRNA for simplicity, due to the lack of variability in mature miRNA gene length. We found that both mRNA (Hprt, Trem2, Tyrobp, c-Fos, and Cst7; R = -0.81, p = 3.2e−10; Fig. 5a) and miRNA (miR-129-1-3p, miR-34a-5p, miR-34c-5p, miR-210-3p; R = −0.59, p = 0.00036; Fig. 5b) showed significant negative correlation between Ct values and normalized sequencing reads. This strongly suggests that data obtained from this sequencing method results in read counts that are largely representative of actual RNA content and lends credence to differential gene expression analyses performed with these data.

Discussion
RNA sequencing is an ever-evolving technique that offers unique insights into the transcriptome. Current protocols often require the researcher to choose between investigating mRNA (by poly-A selection) or small RNA (by size selection). Either one of these alone, while offering depth of sequencing, misses out on a great deal of information from excluded transcripts. Here, we report a significant advancement in RNA-Seq methodology, a novel method to investigate the whole transcriptome, from the same sample at the same time, using ribosomal-depleted RNA. This approach takes advantage of several existing commercially available kits, with some important alterations to manufacturer's protocols. This altered workflow resulted in high quality sequencing libraries from input RNA samples of a variety of quality, from both mouse and human tissue. Low quality input RNA had no negative effect on the final library quality. Qiagen FastSelect rRNA removal agent integrated seamlessly into the existing Ion Total RNA-Seq kit v2 library prep protocol and resulted in highly effective depletion of rRNA from the final libraries, even from degraded samples, which is often a drawback of other rRNA removal techniques. A high number of genes were identified in the RNA-Seq data, including transcripts often overlooked by more targeted RNA-Seq protocols (refer to Fig. 3b,c). The majority of reads mapped to species of www.nature.com/scientificreports/ non-coding RNA, and most of these were also highly consistent between samples within each species. Furthermore, sequencing reads (normalized to library size and gene length) correlate significantly with Ct values from qRT-PCR quantitation (which allows for precise quantification of RNA content), suggesting that read counts obtained from this RNA-Seq protocol can be used to infer quantitative gene expression. One important caveat to consider, however, is that PCR amplification is known to favour smaller fragments over larger ones 43 . As such, it is possible (even likely) that transcripts associated with small non-coding RNA are overrepresented in our final sequencing libraries. However, this effect is likely consistent across samples and libraries, and would certainly not affect the identification of unique transcripts. A similar protocol (fragmented, ribodepleted TGIRT-Seq) has been previously reported [44][45][46] , that also aimed to simultaneously sequence coding and non-coding RNA. While it has seen some limited implementation since 47,48 , it is yet to be widely accepted. There are a few factors where we believe improvements can be made. First, the use of QIASeq FastSelect for rRNA depletion appears to perform better than the RiboZero Gold used by Boivin and colleagues. While we did not compare these two methods directly, the comparison can be inferred through the literature. RiboZero Gold is no longer available to purchase and has been replaced by RiboZero Plus. However, there are no published data available to compare rRNA removal efficiency of RiboZero Plus. Crucially, rRNA-removal by FastSelect requires significantly less sample handling than RiboZero Gold, and does not require additional bead purification, preventing sample loss. Second, the protocol described here appears to give more representative non-coding RNA reads-in particular with regards to miRNA and piRNA-as compared to estimated abundances reported in literature 32,44 .
We recognise the variety of bioinformatics tools available for the analysis of RNA-Seq data. These include bwa, Bowtie2, TopHat2, cufflinks, and HISAT2 for read mapping/alignment, software such as kallisto or salmon for direct quantification, differential gene expression tools such as DESeq2, Cuffdiff, and limma, and integrated pipelines such as exceRpt. Several studies have aimed to evaluate the relative sensitivity and accuracy of these analysis tools and pipelines, but for the most part, no one tool or pipeline consistently performed better than the others [49][50][51][52][53] . An in-depth discussion of these tools is outside the purview of this report.
The ability to capture sequencing reads from a wide variety of RNA species, coding and non-coding, is valuable to investigate many aspects of the transcriptome. In our research into Alzheimer's disease, the ability to take a snapshot of the RNA environment allows us a unique insight into AD pathology, both in the APP/PS1 mouse model and in human post-mortem brain tissue. In particular, since non-coding RNA is being increasingly implicated in the aetiology and pathogenesis of AD [54][55][56] , knowing the changes that occur in the disease state can aid in understanding the disease, developing diagnostic tools, and hopefully developing new treatments. Since non-coding RNA are such a ubiquitous aspect of cellular function, the same approach, and therefore this method, can be applied to a variety of diseases and research areas.
Altogether, we believe this workflow may be useful to researchers wishing to investigate the whole transcriptome simultaneously, with effective rRNA depletion, and without complicated and high-loss size selection protocols commonly used for small RNA-Seq, or poly-A selection for mRNA-Seq.

Materials and methods
Animal studies. All   www.nature.com/scientificreports/ on In Vivo Experiments (ARRIVE) guidelines 57 . In this study we utilised a double transgenic model of Alzheimer's disease (APPswe/PS1dE9, B6C3 background, referred to as APP/PS1) originally sourced from The Jackson Laboratory (https:// www. jax. org/ strain/ 004462) and maintained as a colony at the University of Otago breeding facility. Mice were housed under specific pathogen-free conditions in a day-night controlled light cycle, with food and water access ad libitum. Animals underwent no additional procedures prior to their stated use. All mice were genotyped for the presence of human exon-9-deleted variant PSEN1, which co-segregates with the APPswe gene, as previously described 58  No other neurological diseases were present. Alzheimer's disease brains (n = 3; 3 female; age 76.5 ± 7.7) met the standard criteria for Alzheimer's disease neuropathological diagnosis. There were no significant differences between the ages of the two groups (two-tailed t-test, p = 0.55). Patient sex was self-reported. All samples were stored at −80 °C until used. RNA extracted from human MTG samples is henceforth referred to by the identifier "Patient #". Patient demographics and case information are available in Table 9.
Library preparation. Except where explicitly stated, all samples, regardless of species or group of origin, were treated identically. Sequencing libraries were prepared for Ion Proton using the Ion Total RNA-Seq kit v2 (Life Technologies; Cat #4479789) largely following manufacturer's instructions. Total RNA (500 ng) was used as input to the Ion Total RNA-Seq kit v2 (356 ng input was used instead of 500 ng for Patient 1 due to low RNA yield), to which was added 1 μL of 1:100 ERCC Spike-In Mix 1 (Invitrogen; Cat #4456740), commonly employed to control for cross-sample variation in library preparation. RNA fragmentation by RNase III was performed at 37 °C. The fragmentation time was optimised to 8 min for the mouse RNA, and 1 min for the human RNA. This will vary depending on quality and integrity of the input RNA material. The resulting fragmented RNA was purified using the Magnetic Bead Cleanup Module (Life Technologies; Cat #4475486), and purified RNA eluted in 13 μL nuclease-free water. Ligation of Ion adapters (Ion RNA-Seq Primer Set v2; Cat #4479789) was performed using 3 μL of the eluted purified fragmented RNA, added to 2 μL Ion Adapter Mix v2 and 3 μL Hybridization solution, and incubated in a thermal cycler at 65 °C for 10 min followed by 30 °C for 5 min. To this hybridization reaction was added 10 μL 2× Ligation Buffer and 2 μL Ligation Enzyme Mix, and incubated at 16 °C for 16 h in a thermal cycler. Following ligation, reverse transcription (RT) and rRNA removal was performed simultaneously as follows. RT master mix was prepared on ice (per sample; 1 μL nuclease-free water, 4 μL 10× RT buffer, 2 μL 2.5 mM dNTP Mix, 8 μL ion RT Primer v2, 1 μL QIAseq FastSelect rRNA removal agent). QIAseq FastSelect rRNA removal agent (Qiagen, Cat #334386) consists of ONTs complementary to ribosomal RNA sequences. These ONTs, when bound to rRNA sequences, prevent reverse transcription. The master mix was added to the ligation reaction, and incubated at 70 °C for 10 min, followed by a step-wise cooldown (2 min at 65 °C, 2 min at 60 °C, 2 min at 55 °C, 5 min at 37 °C, 5 min at 25 °C, hold at 4 °C). This step is necessary for the oligonucleotides in the FastSelect Table 9. Case information of source of human post-mortem MTG tissue. Shows age at death, self-reported gender, post-mortem interval (PMI; hours), diagnosis, pH of tissue, immediate cause of death, medication taken (if any), and post-mortem pathology. Sequencing on ion proton platform. The prepared sequencing libraries were diluted to equimolar concentrations of 100 pmol/L for pooling. Emulsion PCR was performed with the Ion OneTouch™ 2 system (Invitrogen) using the Ion PI™ Hi-Q™ OT2 200 kit (Invitrogen; Cat #A26434) according to the manufacturer's instructions. The four pairs of mouse samples (tg and wt) were processed simultaneously end-to-end, as were the seven human MTG samples. Libraries were sequenced on Ion PI™ v3 chips (Invitrogen; Cat #A26770), prepared using the Ion PI™ HiQ™ Seq 200 kit (Invitrogen; Cat #A26433, A26772). The mouse samples of two pools of mixed barcoded libraries were sequenced on two Ion PI v3 chips (2 wt and 2 tg per chip), avoiding the use of all sequential barcodes on the same chip. Similarly, human MTG libraries were sequenced in two pools of mixed barcoded libraries-one contained a pool of four samples (2 AD and 2 control), the other a pool of three samples (1 AD and 2 control).
For miRNA, 10 ng of total RNA from mouse samples was used. cDNA was generated using the TaqMan MicroRNA Reverse Transcription Kit (Applied Biosystems, Cat #4366596) according to manufacturer's instructions. The qRT-PCR reactions were prepared using TaqMan Universal PCR Master Mix (Applied Biosystems, Cat #4304437), with the following TaqMan miRNA primers: miR-34a-5p (Assay ID: 000426), miR-34c-5p (000428), miR-129-1-3p (002298), miR-210-3p (000512). The reactions were amplified on a Applied Biosystems ViiA 7 system as follows: Hold 50 °C 2 min, hold 95 °C 10 min, and 40 cycles at 95 °C for 15 s and 60 °C for 1 min. Genes and miRNA were chosen as a mix of housekeeping genes, genes that are known to be changed from the literature, and some genes of interest whose expression in APP/PS1 mice is unknown.
MicroRNA and mRNA qRT-PCR data were processed separately to account for differing input RNA amounts, and in each case, raw Ct values were used for analysis. Data analysis. Data from each barcoded library were separated into different data files automatically on the Ion Torrent Suite version 5.4 (life Technologies, USA). The Ion Torrent Suite was also used for analysis of ERCC Spike-In controls. Sequence read quality was evaluated using FastQC v0.11.5 (https:// www. bioin forma tics. babra ham. ac. uk/ proje cts/ fastqc/) 59 . Adapter sequences were trimmed from reads using AdapterRemoval v2.1.7 (https:// github. com/ Mikke lSchu bert/ adapt errem oval/) 60 . Reads were then trimmed for quality using Trimmomatic v0.38 (http:// www. usade llab. org/ cms/? page= trimm omatic) 61 using a 5-base sliding window, cutting when the average quality per base drops below 20, and dropping reads less than 17 bases long.
Piwi-interacting RNA (piRNA) sequences were obtained from piRNABank (http:// pirna bank. ibab. ac. in) 64 . Sequences for tRNA were obtained from the UCSC Genome Browser (http:// genome. ucsc. edu). Data were analysed using R version 4.0.2 in RStudio v1.3.959. The package Rsamtools 65 was used to convert STAR output .sam files to .bam files. The featureCounts function from the package Rsubread 66 was used to generate counts tables using aforementioned .gtf annotation files. The package edgeR 67 was used to generate a DGEList object from feature counts, filtered for lowly-expressed genes by the function filterByExpr, and the function exactTest used to perform differential expression analysis.