Efficient Enrichment of Bacterial mRNA from Host-Bacteria Total RNA Samples

Despite numerous advances in genomics and bioinformatics, technological hurdles remain to examine host-microbe transcriptomics. Sometimes the transcriptome of either or both can be ascertained merely by generating more sequencing reads. However, many cases exist where bacterial mRNA needs to be enriched further to enable cost-effective sequencing of the pathogen or endosymbiont. While a suitable method is commercially available for mammalian samples of this type, development of such methods has languished for invertebrate samples. Furthermore, a common method across multiple taxa would facilitate comparisons between bacteria in invertebrate vectors and their vertebrate hosts. Here, a method is described to concurrently remove polyadenylated transcripts, prokaryotic rRNA, and eukaryotic rRNA, including those with low amounts of starting material (e.g. 100 ng). In a Wolbachia-Drosophila system, this bacterial mRNA enrichment yielded a 3-fold increase in Wolbachia mRNA abundance and a concomitant 3.3-fold increase in the percentage of transcripts detected. More specifically, 70% of the genome could be recovered by transcriptome sequencing compared to 21% in the total RNA. Sequencing of similar bacterial mRNA-enriched samples generated from Ehrlichia-infected canine cells covers 93% of the Ehrlichia genome, suggesting ubiquitous transcription across the entire Ehrlichia chaffeensis genome. This technique can potentially be used to enrich bacterial mRNA in many studies of host-microbe interactions.


Results
Assessment of method on D. ananassae and Wolbachia endosymbiont RNA using microfluidics and qRT-PCR. We used both an Agilent Bioanalyzer and qRT-PCR to examine the relative abundance of the various transcripts in total RNA and in the bacterial mRNA-enriched RNA samples from D. ananassae from Malaysia (Table 1). In both cases, in order to enable direct comparisons, the template added reflects the same amount of starting material, as described in the methods. As expected, Drosophila rRNA removal by the Ribo-Zero Human/Mouse/Rat component was effective with a total loss of > 97% of RNA, as assessed on the Agilent Bioanalyzer (Fig. 1). This percentage loss is consistent with our reported prior rRNA reductions in this system 15 . Because the insect 23S rRNA is cleaved and co-migrates with the insect 18S rRNA and the bacterial 16S rRNA, the Wolbachia 16S rRNA is obscured prior to rRNA depletion. However, following rRNA depletion, the Wolbachia 16S and 23S rRNA were not observed, nor were the insect 18S and 28S rRNA, indicating that they were all efficiently removed (Fig. 1).
To better quantify the extent of bacterial mRNA enrichment, qRT-PCR was conducted with primers designed to target bacterial rRNA, eukaryotic rRNA, bacterial mRNA, and eukaryotic mRNA, allowing for a comparison of all four types of molecules. The bacterial mRNA enrichment method efficiently increased the Ct values for Wolbachia 16S rRNA with the Δ Ct ranging from − 5.5 to − 13.9, yielding a 97.7-99.9% depletion of the Wolbachia 16S rRNA (Table 1). Highly abundant Drosophila mRNAs were efficiently removed with the Dynabeads component of the bacterial mRNA enrichment method with the Δ Ct ranging from − 1.3 to − 3.38, which corresponds to a reduction of 71% of Act5C transcripts and 87% of ribosomal protein L32 transcripts removed (Table 1). Yet, Wolbachia mRNA abundance was unchanged by the bacterial mRNA enrichment, as illustrated by the similar qRT-PCR Ct values in pre-and post-enrichment of the samples for Wolbachia mRNA, yielding a Δ Ct near zero (ranging from − 0.33 to 0.69) in the three replicates examined (Table 1). Each replicate is the result of a separate RNA extraction as well as enrichment.
Assessment of method on D. ananassae and Wolbachia endosymbiont RNA using high throughput sequencing. Following the successful assessment of bacterial mRNA enrichment by qRT-PCR and microfluidics, three samples were prepared for Illumina MiSeq sequencing: total RNA, polyA-selected RNA, and the bacterial mRNA-enriched RNA. The MiSeq reads were mapped with BWA MEM 17 to a database containing the reference D. ananassae genome and the Wolbachia endosymbiont genome from strain wRi (NC_012416.1 18  While the percentage of Wolbachia reads is not higher in the bacterial mRNA-enriched sample when compared to total RNA, it is important to remember that the bacterial rRNA has been depleted, indicating more of this percentage should arise from bacterial mRNA than in the other samples. Thus, while the reads from the total RNA sample and polyA-enriched RNA samples span 298 kbp (21%) and 66 kbp (4.7%) of the 1.4 Mbp wRi reference Wolbachia genome, respectively, the reads spanned 977 kbp of the wRi genome (70%) in the bacterial mRNA-enriched sample. Therefore, while the relative abundance of Wolbachia reads are the same in the polyA-enriched and bacterial mRNA-enriched samples, the latter is the only sample where bacterial transcripts can be detected across the majority of the genome.

Comparisons of methods after normalization.
To measure the fold change, the coverage needs to be normalized by the variable numbers of bases sequenced. Therefore, the coverage was normalized by dividing the sequencing coverage by the number of bases sequenced and multiplying by a million to generate the NCPM, or normalized counts per million reads sequenced. This value is analogous to RPKM or FPKM but on a genome scale as opposed to a gene scale. As expected, the D. ananassae rRNA was found in all samples, but relative to the total RNA was 5.2-fold decreased in abundance in the polyA-enriched RNA and 2.1-fold decreased in abundance in the bacterial mRNA-enriched sample ( Fig. 2A), as assessed by integrating the area under the curve. For comparison, the actin transcript was 10-fold overrepresented in the polyA-enriched sample relative to total RNA and 33-fold underrepresented in the bacterial mRNA-enriched sample relative to total RNA, indicating that the polyA depletion was successful (Fig. 2B). Relative to the total RNA, Wolbachia rRNA was 64-fold underrepresented in the bacterial mRNA-enriched sample and 28-fold underrepresented in the polyA-enriched sample, demonstrating that the bacterial rRNA is reduced, but not eliminated, with both polyA enrichment and bacterial mRNA enrichment (Fig. 3A). Demonstrating the power of the method tested, the Wolbachia gene WRi_010910 was 2.7-fold more abundant in the bacterial mRNA-enriched sample than in total RNA and was completely undetected in the polyA-enriched sample. Transcripts could be detected from 70% of the genome in the bacterial mRNA-enriched sample, as opposed to just 21% of the genome in the total RNA sample. Therefore, this method simultaneously increases the number of transcripts detected and the number of reads/transcript.

Assessment of method on Canis lupus and Ehrlichia chaffeensis RNA.
After determining the efficacy of the bacterial mRNA enrichment method using the model system D. ananassae on a partial Illumina MiSeq run yielding ~10 million reads, we sought to further test the method to enrich Ehrlichia mRNA in complex RNA mixtures sequenced with ~50 million reads on the Illumina HiSeq. Total RNA was isolated from DH82 Canis lupus cells infected with Ehrlichia chaffeensis Wakulla (Fig. 4).
Due to limited amounts of RNA, the samples were not examined by qRT-PCR or the Bioanalyzer following bacterial mRNA enrichment, and transcriptome sequencing was undertaken directly of the bacterial mRNA-enriched sample. Mapped reads spanned 1,098,477 of the 1,179,491 bp Wakulla genome, or 93.1% of the genome. Despite there being a lower abundance of bacterial RNA when compared to eukaryotic RNA in these samples ( Fig. 4) relative to the Wolbachia-Drosophila samples (Fig. 1), the enrichment of Ehrlichia transcripts was better than that for Wolbachia, resulting in a higher percentage of reads mapping to the bacterial genome. This is likely due to better removal of vertebrate rRNA with the Ribo-Zero kit than insect rRNA. The coverage varies in a manner consistent with the genes in the genome with coverage troughs at gene boundaries and tRNAs, the latter of which is lost during RNeasy purification (Fig. 5). Previously, it has been demonstrated by proteomics that 99% of proteins with known function and > 80% of hypothetical proteins are expressed in Ehrlichia chaffeensis in infected human cells 19 . Simultaneous expression of all genes may be expected since the genome lacks transcriptional regulators relative to a free-living relative 7 , which in turn is consistent with a restricted intracellular life style. The ECH_0166 gene is the most abundantly transcribed gene, which is annotated as a hypothetical protein but has been shown to encode the immunoreactive protein TRP47, which is secreted during infection 20,21 . Assessment of low input method on RNA from Brugia malayi and its Wolbachia endosymbiont using qRT-PCR. While in many cases, it is possible to obtain 5 μ g of RNA for use with the standard Ribo-Zero kits, for many biologically significant conditions this is a large amount of RNA to acquire. One such example is some of the specific life stages of filarial nematodes, like Brugia malayi, where material is limited. However, newer procedures  for Ribo-Zero reductions allow the use of 100 ng of total RNA. Therefore, we sought to establish if the polyA removal would work efficiently on these low input samples. To this end, we tested a low input method for Ribo-Zero reduction and polyA subtraction using 100 ng of RNA from an adult filarial nematode and assessed the success using qRT-PCR.
As with the previous Wolbachia-host samples, to quantify the extent of bacterial mRNA enrichment, qRT-PCR was conducted with primers designed to target bacterial rRNA, eukaryotic rRNA, bacterial mRNA, and eukaryotic mRNA, allowing for a comparison of all four types of molecules (Table 1). Again, in order to enable direct comparisons, the template added to the qRT-PCR reactions reflects the same amount of starting material, as described in the methods. As expected, Brugia rRNA removal by the Ribo-Zero Human/Mouse/Rat component was effective, with a total loss of > 99% of 18S rRNA, resulting in a Δ Ct of − 17 between the pre-subtraction RNA and the post-subtraction RNA ( Table 1). The bacterial mRNA enrichment method efficiently decreased the Ct values for Wolbachia 16S rRNA with the Δ Ct of − 7.5, yielding a > 99% depletion of the Wolbachia 16S rRNA (Table 1). Highly abundant Brugia mRNAs were efficiently removed with the Dynabeads component of the bacterial mRNA enrichment method, with the Δ Ct of − 2.4, yielding 76-84% reduction, for actin transcripts and a Δ Ct of − 3.5, yielding 88-94% reduction, for Bm1_03910 (Table 1). Wolbachia mRNA abundance for two genes tested (Wbm0276 and Wbm0350) yielded a Δ Ct of − 1.0 following bacterial mRNA enrichment, or a reduction of 47-55%, which we attribute to general loss of RNA with the low input method (Table 1). Therefore, we conclude that lower input starting materials can be efficiently examined with this method, albeit with a larger percentage loss of material.

Discussion
Cost savings. In sequencing from Wolbachia mRNA-enriched samples, we observe a 3-fold increase in reads aligning to the gene analogous to WRi_010910 and 3.3-fold more transcript sequence detected. This is a substantial improvement in the detection of bacterial mRNA. The added cost of conducting the Ribo-Zero and polyA depletion was $124 for the standard input samples and $65 for low input samples (Table 2) with the Dynabeads adding $46 to the cost of the sample. Of course, actual costs, as opposed to list prices, vary greatly by location and currency.
To obtain 10 million mapped bacterial reads, we estimate that the library and sequencing costs would total $12,325 for the Wolbachia samples and $1,165 for the Ehrlichia samples ( Table 2). The 10-fold difference between the two samples relates to the efficiency of the Ribo-Zero depletion. The kit used was designed on the human, mouse, and rat rRNA. As described previously 15 , we are co-opting the use of this kit here for taxa on which it was not designed, specifically for removal of rRNA from canines and insects. Our results indicate that the kit performs better at removing canine rRNA than insect rRNA. Thus, this difference in depletion of rRNA leads to a difference in the sequencing costs to obtain 10 million mapped bacterial reads. Regardless, these techniques substantially decrease the cost of sequencing compared to total RNA, which we estimated to be around $600,000 to obtain 10 million mapped bacterial reads. The real costs to obtain equivalent data would actually be higher than $600 K since the vast majority of bacterial reads in total RNA will map to the bacterial rRNA. We did not include an estimate of the personnel time required to prepare the samples, but it is surely significantly less than the difference between the method described here to obtain bacterial mRNA enriched samples and the alternative, which is sequencing total RNA. As such, employing this technique should greatly enable transcriptome-based studies of bacteria-eukaryote interactions in the largely intractable bacteria in the order Rickettsiales, as well as other important taxa. . While 99% of the genome is transcribed, troughs are apparent for tRNAs, which are too small to be recovered with the RNA isolation method used here. The coverage peaks at the 5′ -end of transcripts and decays over the length of the transcript, as has been observed for other bacterial transcriptomes. Troughs are seen at transcriptional start sites, but not always at the 3′ -ends of transcripts. This suggests that either the 3′ -end of the transcripts overlap the ends of other transcripts, or that this strain lacks discrete transcription termination sites.
Scientific RepoRts | 6:34850 | DOI: 10.1038/srep34850 The magnitude of the removal of rRNA with Ribo-Zero subtraction kit has been a bit unpredictable as observed in the qRT-PCR results (Table 1). It is not clear where that variation arises, whether it is the quality of the total RNA, aspects of the kit, limitations of the qRT-PCR method, or user error. However, even with the lowest level of enrichment, the savings is significant given the alternative, which is sequencing total RNA.
Much of the cost savings likely comes from the rRNA depletion strategy, which was not included in the comparisons here. However, the relatively small cost of including the DynaBeads ($46) and the insubstantial amount of personnel hours added upon inclusion of this step means that it will be cost effective.
Recently, Agilent has introduced custom capture systems designed for capturing RNA, analogous to its systems for capturing DNA. For comparison, the cost of a custom capture of Wolbachia mRNA or Ehlrichia mRNA is estimated from the list price to be $563/sample. While this is substantially more than the cost of the bacterial mRNA enrichment method presented here, it may have benefits for samples where the rRNA depletion is less than ideal, like capturing bacterial mRNA from insect hosts. One study suggests a 1670-fold enrichment of RNA 22 , which would reduce the sequencing costs of the Wolbachia samples described here substantially to the levels of sequencing conducted on Ehrlichia and make the SureSelect the more cost effective option.

Conclusions
Here, we describe a method to efficiently remove eukaryotic host mRNA from 100 ng and 5 μ g of starting material through polyA depletion. Combined with Ribo-Zero reductions, which efficiently remove rRNA from bacteria and eukaryotes, more bacterial mRNA transcripts can be identified with higher coverage in a Wolbachia-Drosophila system. Sequencing of a bacterial mRNA-enriched sample isolated from a canine-Ehrlichia system results in 20% of the sequence reads arising from the bacterial transcriptome. In both cases, these methods enabled more cost-effective transcriptional profiling of host-bacteria samples than conventional methods. While both of the bacteria are from the order Rickettsiales, it is likely that this technique will be widely applicable for studying host-bacteria transcriptomics or host-microbiome metatranscriptomics.

Methods
Wolbachia and D. ananassae RNA Isolation. To examine the contribution of different RNAs in preand post-enrichment samples, we used wild-type D. ananassae from Klang, Selangor, Malaysia (UCSD Stock No. 14024-0371.33). Insects were reared on Jazz-Mix Drosophila food (Applied Scientific, Waltham, MA, USA) in plastic bottles in an insect growth chamber (Caron, Marietta, OH, USA) at 25 °C and 68% humidity. Natural infection by Wolbachia endosymbiont wAna was confirmed with Wolbachia-specific fluorescence in situ hybridization on ovaries 23 prior to total RNA extraction from ~50 adults using an RNeasy Mini Kit (Qiagen, Valencia, CA, USA), for each of three biological replicates. RNA was DNase-treated with the optional on column DNase digestion per the manufacturer's protocol. To further remove contaminating DNA, ≤ 87.5 μ L of the RNA sampled was combined with 10 μ L Buffer RDD, 2.5 μ L RNase-free DNase I (Qiagen, Valencia, CA, USA), and brought up to 100 μ L with Rnase-free water. The mixture was incubated at room temperature for 10 min and then RNA purified with the Qiagen RNeasy Mini protocol following the manufacturer's protocol.
Ehrlichia chaffeensis Wakulla and canine RNA isolation. The Ehrlichia chaffeensis Wakulla strain (originally provided by the Centers for Disease Control and Prevention, Atlanta, GA) was propagated in a canine macrophage cell line DH82 as described previously 24 in DMEM supplemented with 10% FBS and 2% L-glutamine. At 3 d post infection, when ~95% infected of cells were infected, the DH82 cells (~5 × 10 6 cells)  Table 2. Approximate Costs. * Based on a $1,830 kit of 24 reactions and using 8.5 μ L for the standard input and 1.7 μ L for the low input, as described in the methods. ** Based on a $515 kit of 6 reactions and using 1.5 μ L in the standard input and 0.3 μ L in the low input, as described in the methods. *** Based on a $462 kit of 10 reactions.
**** Based on the $7,500 list price for 16 reactions of a Tier 2 custom capture of a 0.5-2.9 Mbp genome and a $1,500 reagent kit. ***** Based on a $2,400 100-bp paired end HiSeq channel generating 200 million read pairs and targeting the acquisition of 10 million mapped bacterial read pairs/transcriptome, using the Drosophila-Wolbachia sequencing results, specifically that 1.0% of reads matched Wolbachia sequences as opposed to 0.02% in the total RNA; In reality the samples without reduction would perform markedly worse since the reads would most likely map to the rRNA. ****** Based on a $2,400 100-bp paired end HiSeq channel generating 200 million read pairs and targeting the acquisition of 10 million mapped bacterial read pairs/transcriptome, using the Ehrlichia-canine sequencing results, specifically that 14.3% of reads mapped to Ehrlichia following enrichment.
Scientific RepoRts | 6:34850 | DOI: 10.1038/srep34850 were harvested, and total RNA was extracted using an RNeasy Mini Kit (Qiagen, Valencia, CA, USA) with the optional on column DNase-digestion per the manufacturer's protocol.
Bacterial mRNA Enrichment. Using 5 μ g total RNA, the Ribo-Zero removal protocol (Epicentre, Madison, WI, USA), was carried out with the standard amount of magnetic beads (225 μ L per reaction), followed by the Invitrogen Dynabeads polyA enrichment protocol (Life Technologies, Grand Island, NY, USA) keeping the polyA depleted material in the supernatant and discarding the polyA-enriched material. The similarity in composition, pH, and ionic strength of Epicentre's and Invitrogen's reaction buffers allowed us to follow the protocols consecutively. Since the total RNA mixture was assumed to have a higher ratio of host rRNA compared to endosymbiont rRNA, we added 8.5 μ L Human/Mouse/Rat Removal Solution to 1.5 μ L Gram Negative Bacteria Removal Solution at the Ribo-Zero rRNA removal step for the standard removal. For the low input removal, we added 1.7 μ L Human/Mouse/Rat Removal Solution to 0.3 μ L Gram Negative Bacteria Removal Solution. The Ribo-Zero Magnetic Kit procedure was followed and, after removal of the magnetic beads, the eluate was processed with the standard Dynabeads protocol. Samples taken before and after bacterial mRNA enrichment were analyzed on an Agilent Bioanalyzer.
Using both the Agilent Bioanlayzer and qRT-PCR (below), the template added reflects the same amount of starting material. For example, if the input volume of starting material was 50 μ L and the final output supernatant was 100 μ L, 1 μ L of the starting material was used per qRT-PCR reaction or Bioanalyzer well for the total/starting RNA sample while 2 μ L of the supernatant was used for the bacterial mRNA enriched sample. This was done to reflect the dilution between the starting material and the final supernatant volumes. As a consequence, these areas of integration and the Ct values described below can be compared directly in order to evaluate the composition of the samples.
Bacterial mRNA Enrichment Using the Low Input Method. Using 100 ng total RNA and a protocol distributed by Clontech (http://www.clontech.com/JP/Products/cDNA_Synthesis_and_Library_Construction/ Next_Gen_Sequencing_Kits/ibcGetAttachment.jsp?cItemId= 75952&fileId= 6660677&sitex= 10025:22372:US), the Ribo-Zero rRNA removal protocol (Epicentre, Madison, WI, USA) was carried out using a smaller amount of rRNA removal beads (90 μ L), followed by the Invitrogen Dynabeads polyA enrichment protocol (Life Technologies, Grand Island, NY, USA) keeping the polyA-depleted material in the supernatant and discarding the poly-enriched material. Like the higher input method, the similarity of Epicentre's and Invitrogen's reaction buffers allowed us to follow the protocols consecutively. Since the total RNA mixture was assumed to have a higher ratio of host rRNA compared to endosymbiont rRNA, we added 1.7 μ L Human/Mouse/Rat Removal Solution to 0.3 μ L Gram Negative Bacteria Removal Solution at the Ribo-Zero rRNA removal step, maintaining the same removal solution ratio as the 5 μ g method. The modified Ribo-Zero Magnetic Kit procedure was followed and, after removal of the magnetic beads, the eluate was processed with the standard Dynabeads protocol.
Scientific RepoRts | 6:34850 | DOI: 10.1038/srep34850 (Life Technologies, Grand Island, NY, USA). The reactions were incubated at 50 °C for 30 min and then denatured at 95 °C for 15 min, followed by amplification with 45 cycles of 94 °C for 15 s, 55 °C for 30 s, and 72 °C for 30 s. Data was analyzed using a comparative cycle threshold (Δ Ct) method, comparing the pre-enrichment Ct to the post-enrichment Ct for each locus tested. Each sample was tested in triplicate.
Transcriptome Sequencing. Illumina RNA-Seq libraries were prepared with the TruSeq RNA Sample Prep kit (Illumina, San Diego, CA) per the manufacturer's protocol. For the bacterial RNA-enriched sample, the polyA isolation step was omitted. The total RNA sample was prepared from all RNA present in the sample, while the polyA-selected library had the polyA isolation step performed. Adapters containing seven nucleotide indexes were ligated to the double-stranded cDNA. The DNA was purified between enzymatic reactions and the size selection of the library was performed with AMPure XT beads (Beckman Coulter Genomics, Danvers, MA). The libraries were pooled and sequenced on an Illumina MiSeq or Illumina HiSeq sequencer paired end run, as specified in the text.
Sequence Read Mapping of D. ananassae Data. Forward and reverse reads were aligned with the BWA MEM command implemented in BWA v0.7.12 using the -M option 17 against a database containing the D. ananassae and wRi genomes. The alignments were sorted and duplicates removed using Picard v.1.129. Coverage across the genome was measured using SAMTOOLS MPILEUP implemented in v.0.1.19 with the options "-BQ0 -d10000000" 25 .