Targeted enrichment outperforms other enrichment techniques and enables more multi-species RNA-Seq analyses

Enrichment methodologies enable the analysis of minor members in multi-species transcriptomic data. We compared the standard enrichment of bacterial and eukaryotic mRNA to a targeted enrichment using an Agilent SureSelect (AgSS) capture for Brugia malayi, Aspergillus fumigatus, and the Wolbachia endosymbiont of B. malayi (wBm). Without introducing significant systematic bias, the AgSS quantitatively enriched samples, resulting in more reads mapping to the target organism. The AgSS-enriched libraries consistently had a positive linear correlation with their unenriched counterparts (r2 = 0.559–0.867). Up to a 2,242-fold enrichment of RNA from the target organism was obtained following a power law (r2 = 0.90), with the greatest fold enrichment achieved in samples with the largest ratio difference between the major and minor members. While using a single total library for prokaryote and eukaryote enrichment from a single RNA sample could be beneficial for samples where RNA is limiting, we observed a decrease in reads mapping to protein coding genes and an increase in multi-mapping reads to rRNAs in AgSS enrichments from eukaryotic total RNA libraries compared to eukaryotic poly(A)-enriched libraries. Our results support a recommendation of using AgSS targeted enrichment on poly(A)-enriched libraries for eukaryotic captures, and total RNA libraries for prokaryotic captures, to increase the robustness of multi-species transcriptomic studies.

Nematode preparation. To examine larval development in the vector, groups of mosquitoes were sampled at 18 h post infection (hpi), 4 days post infection (dpi), and 8 dpi. Because larval development occurs in the thoracic muscle of the mosquito, thoraces of infected mosquitoes containing larval B. malayi were (1) separated from the head, abdomen, legs, and wings, (2) flash frozen in liquid nitrogen, (3) and stored at −80 °C prior to RNA isolation. To generate third stage larvae (L3) of B. malayi, infected mosquitoes at 9-16 dpi were processed in bulk using the NIAID/NIH Filariasis Research Reagent Resource Center (FR3) Research Protocol 8.4 (www. filariasiscenter.org). Larvae were isolated in RPMI media containing 0.4 U penicillin and 4 µg streptomycin per mL (RPMI + P/S), flash frozen in liquid nitrogen, and stored at −80 °C. To generate fourth larval stages (L4) and adult worms, freshly isolated L3s were injected into the peritoneal cavities of Mongolian gerbils (Meriones unguiculatus). Briefly, male Mongolian gerbils three months of age or older (Charles River, Wilmington, MA, USA) were anesthetized with 5% isoflurane, immobilized on a thermal support, and administered ocular Paralube. The inguinal region for each gerbil was shaved and disinfected with iodine. Infections were performed by delivering L3 larvae into the peritoneal cavity using a butterfly catheter, which was left in place and flushed afterwards with 1 mL warmed RPMI + P/S to ensure delivery of all larvae. Afterwards, gerbils were removed from the plane of anesthesia and allowed to fully recover in a hospital cage prior to returning to group housing. Worms were later harvested by euthanizing the gerbils and soaking the peritoneal cavities in warm RPMI + P/S. Worms were washed in warm RPMI + P/S to remove traces of gerbil tissue, flash-frozen with liquid nitrogen, and stored at −80 °C. All animal care and use protocols were carried out in accordance with the relevant guidelines and regulations and were approved by the UWO IACUC.
Mosquito/Nematode/Wolbachia RNA isolation. Mosquito thoraces were combined with TRIzol (Zymo Research, Irvine, CA, USA) at a ratio of 1 mL TRIzol per 50-100 mg mosquito tissue while nematode samples were processed using a 3:1 volume ratio of TRIzol to sample. β-mercaptoethanol was added to a final concentration of 0.1%. The tissues were homogenized in a TissueLyser (Qiagen, Germantown, MD) at 50 Hz for SciENtific RepoRTS | (2018) 8:13377 | DOI: 10.1038/s41598-018-31420-7 5 min. The homogenate was transferred to a new tube and centrifuged at 12,000 × g for 10 min at 4 °C. After incubating at room temperature for 5 min, 0.2 volumes of chloroform were added. The samples were shaken by hand for 15 s, incubated at room temperature for 3 min, then loaded into a pre-spun, phase lock gel heavy tube (5Prime, Gaithersburg, MD, USA) and centrifuged for 5 min at 12,000 × g at 4 °C. The upper phase was removed to a new tube and one volume of 100% ethanol was added prior to loading onto a PureLink RNA Mini column (Ambion, Austin, TX). The samples were then processed following manufacturer instructions, quantified using a Qubit fluorometer (Qiagen, Germantown, MD, USA) and/or a NanoDrop spectrometer (NanoDrop, Wilmington, DE, USA). The RNA was subsequently treated with the TURBO DNA-free kit (Ambion, ThermoFisher Scientific, Waltham, MA, USA) according to the manufacturer's protocol.
A. fumigatus infection. Two different models of invasive aspergillosis, both using male BALB/c mice, were employed 12,13 . In the non-neutropenic model, the mice were immunosuppressed with 5 doses of cortisone acetate, with 500 mg/kg administered subcutaneously every other day, starting four days pre-infection. In the neutropenic model, the mice were administered cyclophosphamide, 250 mg/kg intraperitoneally, and cortisone acetate, 250 mg/kg subcutaneously, two days pre-infection. Four days post infection, the mice were given a second dose of cyclophosphamide, 200 mg/kg intraperitoneally, and cortisone acetate, 250 mg/kg subcutaneously. All mice were infected by placing them in a chamber containing an aerosol of A. fumigatus conidia for 1 h. On days 2 and 4 for the non-neutropenic model and on days 4 and 7 for the neutropenic model, 3 mice per time point were sacrificed, after which their lungs were harvested and stored in RNAlater (Ambion, ThermoFisher Scientific, Waltham, MA, USA) for subsequent RNA isolation.
A. fumigatus RNA isolation. Each lung sample was placed in a lysing matrix C tube (MP Biomedicals, Santa Ana, CA, USA) with a single 0.25 inch diameter ceramic sphere (MP Biomedicals, Santa Ana, CA, USA) and homogenized with a bead beater (FastPrep FP120, Qbiogene, Montreal, Quebec, Canada).The RNA was isolated using the RiboPure kit (Ambion, ThermoFisher Scientific, Waltham, MA, USA) following the manufacturer's instructions, treated with DNase I (ThermoFisher Scientific, Waltham, MA, USA), and purified using the RNA clean & concentrator kit (Zymo Research, Irvine, CA, USA).

Illumina NEBNext Ultra Directional RNA libraries without capture. Whole transcriptome libraries
were constructed for sequencing on the Illumina platform using the NEBNext Ultra Directional RNA Library Prep Kit (New England Biolabs, Ipswich, MA, USA). When targeting eukaryotic mRNA, polyadenylated RNA was isolated using the NEBNext Poly(A) mRNA magnetic isolation module. When targeting bacterial mRNA, samples underwent rRNA-and poly(A)-reductions, as previously described 10,14 . SPRIselect reagent (Beckman Coulter Genomics, Danvers, MA, USA) was used for cDNA purification between enzymatic reactions and size selection. For indexing, the PCR amplification step was performed with primers containing a 7-nt index sequence. Libraries were evaluated using the GX touch capillary electrophoresis system (Perkin Elmer, Waltham, MA) and sequenced on a HiSeq2500, generating 100-bp paired end reads.
AgSS probe design. Capture probes were designed using the SureSelect DNA Advanced Design Wizard for B. malayi, A. fumigatus, and wBm. Probes were designed to capture every 120 bp for each coding sequence in each of the three organisms with no overlap, except at one end to ensure the entire coding sequence was covered. DustMasker was used to identify and mask low complexity regions of the genome from the probe design 15 . A total of 167,997 probes were designed for B. malayi to capture 11,085 genes, ranging from 1-540 probes/gene with an average of 15.3 probes/gene. For A. fumigatus, 126,394 probes were designed to capture 9,835 genes, ranging from 1-213 probes/gene with an average of 12.9 probes/gene. For wBm, 6,393 probes were designed with 1-72 probes/ gene and an average of 8.0 probes/gene. The baits were not tested for specificity against the host genome.
Wolbachia AgSS RNA capture. Pre-capture libraries were constructed from 500-1000 ng of total RNA samples using the NEBNext Ultra Directional RNA Library Prep kit (NEB, Ipswich, MA, USA). First strand cDNA was synthesized without mRNA extraction to retain non-polyadenylated transcripts and was fragmented at 94 °C for 8 min. After adaptor ligation, cDNA fragments were amplified with 10 cycles of PCR before capture. Wolbachia transcripts were captured from 200 ng of the amplified libraries using an Agilent SureSelectXT RNA (0.5-2 Mbp) bait library designed specifically for wBm. Library-bait hybridization reactions were incubated at 65 °C for 24 h then bound to MyOne Streptavidin T1 dynabeads (Invitrogen, Carlsbad, CA, USA). After multiple washes, bead-bound captured library fragments were amplified with 18 cycles of PCR. The libraries were loaded on a HiSeq4000 generating 151-bp paired end reads.
B. malayi AgSS RNA capture. Pre-capture libraries were constructed from 1000 ng of total RNA samples using NEBNext Ultra Directional RNA Library Prep kit (NEB #E7420, Ipswich, MA, USA). Except where noted in the text, poly(A)-enrichment according to the manufacturer's protocol was used. After adaptor ligation, cDNA fragments were amplified for 10 cycles of PCR before capture. B. malayi transcripts were captured from 200 ng of the amplified libraries using an Agilent SureSelectXT Custom (12-24 Mbp) bait library designed specifically for B. malayi. Library-bait hybridization reactions were incubated at 65 °C for 24 h then bound to MyOne Streptavidin T1 Dynabeads (Invitrogen, Carlsbad, CA). After multiple rounds of washes, bead-bound captured library fragments were amplified with 16 cycles of PCR. The libraries were loaded on a HiSeq4000 generating 151-bp paired end reads. In comparisons describing captures on poly(A)-enriched libraries with total RNA libraries, total RNA libraries were constructed after first strand cDNA was synthesized without mRNA extraction to retain non-polyadenylated transcripts and fragmented at 94 °C for 8 min.
SciENtific RepoRTS | (2018) 8:13377 | DOI:10.1038/s41598-018-31420-7 A. fumigatus AgSS capture. Pre-capture libraries were constructed from 700 ng of total RNA samples using NEBNext Ultra Directional RNA Library Prep kit (NEB #E7420, Ipswich, MA, USA) according to the manufacturer's protocol. Briefly, mRNA was extracted with oligo-d(T) beads and reverse-transcribed into first strand cDNA with random primers. First strand cDNA was fragmented at 94 °C for 8 min. After adaptor ligation, cDNA fragments were amplified in a thermocycler for 10 cycles before capture. Aspergillus transcripts were captured from 100 ng of the amplified libraries using an Agilent SureSelectXT Custom (12-24 Mbp) bait library for A. fumigatus. Libraries were incubated at 65 °C for 24 h then bound to MyOne Streptavidin T1 Dynabeads (Invitrogen, Carlsbad, CA, USA). After multiple rounds of washes, bead-bound captured library fragments were amplified with 16 cycles of PCR. The libraries were loaded on a HiSeq4000 generating 151-bp paired end reads.
The sequencing reads for the wBm comparisons were mapped to the wBm assembly 19 using Bowtie v0.12.9 20 . Counts for each wBm gene were calculated by summing the sequencing depth per base pair, obtained using the DEPTH function of SAMtools v1.1 21 , for each of the unique genomic positions in each gene. This value was then divided by the average read length for the sample, yielding a read count value for each gene. Across all comparisons, read counts were normalized between each of the different samples through a conversion to transcripts per million (TPM).
The fold enrichment, F e , of reads conferred by the AgSS to its target organism compared to the other enrichment techniques was calculated as

Results
Agilent SureSelect probe G + C content and specificity. The G + C content of the B. malayi AgSS probes, our largest set of designed probes, was assessed to determine whether the G + C content of a probe impacted its ability to capture its target gene. For any given gene in the B. malayi genome, the G + C content of its set of probes ranged from 19.2-66.2%, with an average G + C content of 38.6% ( Supplementary Fig. 1). For each G + C content, in increments of 1%, the average ratio of the AgSS TPM to the poly(A)-selected TPM average ratio of all genes with probe sets of that specified G + C content were calculated and compared. Between 26-55%, there were >10 genes with probe sets at each of the G + C increments. At these percentages and across six different samples, the relationship between the AgSS TPM to poly(A)-selected TPM remained linear, indicating that the ability of a probe set to capture its gene set is not largely affected by G + C content ( Supplementary Fig. 2).
Across the six B. malayi samples, <2,004 probes were invalid, which accounts for ~1.2% of the 167,997 designed probes for B. malayi. For each of the samples, an invalid probe was defined as a probe whose intended capture positions had a larger read count in the poly(A)-selected sample relative to its AgSS counterpart. to the B. malayi genome has the greatest fold enrichment (110-146x) while the 8 dpi samples, with the highest percentage of B. malayi mapped reads, has the lowest fold enrichment (3-4x).

Comparison of poly(A)-enriched AgSS to only poly(A)-enriched libraries for
A principal component analysis (PCA) using the log 2 -TPM values for the 11,085 predicted B. malayi protein-coding genes reveals that the B. malayi samples cluster based on life stage rather than library preparation ( Fig. 2A), suggesting that the capture does not introduce a systematic bias to the transcriptome of a sample. The exceptions are the 18 hpi vector samples, which we attribute to the poly(A)-enrichment only samples having an insufficient number of reads mapped to features (143,078 and 212,342 reads, accounting for <20 reads/ gene assuming an equal distribution) to accurately represent the transcriptome. Supporting this, in the 18 hpi samples, an average of 2,460 protein coding genes (22.18% of the B. malayi protein coding genes) had reads mapping to them in the poly(A)-enriched AgSS samples but not its poly(A)-enriched counterpart, while in the 4 dpi and 8 dpi samples, an average of 952 (8.58%) and 566 (5.09%) genes, respectively, had reads mapped in the poly(A)-enriched AgSS libraries but not the poly(A)-enriched libraries. Collectively, across all 6 samples, 392 unique genes were detected in only the poly(A)-enrichment, but not its poly(A)-enriched AgSS counterpart in at least one comparison. The AgSS probe design was based on an older version of the annotation compared to the version used for feature calling, and as such, 91 of these 392 genes were not covered by a probe.
The log 2 TPM values from the poly(A)-enriched AgSS and the poly(A)-enriched libraries had a positive linear correlation with r 2 values ranging from 0.559-0.867 (Fig. 2B), with the 4 dpi and 8 dpi correlations being the strongest (Fig. 2B), for the reasons discussed above, namely that fewer reads were recovered from the 18 hpi vector samples without AgSS enrichment. To identify individual genes in which the library preparations significantly affected the calculated TPM, the log 2 ratio of the poly(A)-enriched AgSS TPM relative to the poly(A) enrichment TPM for each gene was calculated (Fig. 2C). Genes with relatively similar expression levels in the two library preparations would have a log 2 ratio of 0 while genes more highly expressed in the poly(A)-enriched AgSS preparation would be >0 and genes more highly expressed in the poly(A) enrichment would be <0. Genes with a log 2 -ratio value of >3 standard deviations from the mean were defined as having significantly altered levels of expression between the two libraries. Across all six comparisons, the number of genes with significantly higher TPM values in the AgSS libraries never accounted for >0.31% (<34 genes) of all protein-coding genes in the B. malayi genome. The number of genes with significantly higher TPM values in the poly(A)-enrichment  Supplementary Dataset 1, Supplementary Fig. 3).

Comparison of poly(A)-enriched AgSS to only poly(A)-enrichment libraries for A. fumigatus.
The A. fumigatus AF293 transcriptome from poly(A)-enriched AgSS and poly(A)-enriched only libraries were compared for 11 RNA samples from immunocompromised mice or neutropenic mice infected with A. fumigatus AF293 (Fig. 1a). Of the 11 samples, 6 samples originated from 3 replicates of immunocompromised mice taken 2 dpi and 4 dpi with A. fumigatus. The remaining 5 samples consist of two replicates of neutropenic mice taken 4 dpi with A. fumigatus and three replicates of neutropenic mice taken 7 dpi with A. fumigatus.
Across all 11 samples, the total percentage of reads mapping to A. fumigatus from the poly(A)-enriched samples was consistently ≤0.02%, while 0.30%-11.74% of reads mapped in the poly(A)-enriched AgSS samples. In the poly(A)-enrichment only samples, a maximum of 27,531 reads mapped to A. fumigatus AF293, accounting for <3 reads/gene assuming an even distribution. As such, comparisons between AgSS and poly(A)-enriched libraries are not possible.  Supplementary Fig. 4). For all 11 comparisons, using the AgSS methodology conferred a 614x to 1,212x fold enrichment of A. fumigatus AF293 reads relative to the poly(A)-enrichment, going from data that was unamenable to a traditional transcriptomics analysis to obtaining data where robust statistical tests can be performed. The proportion of reads mapping to features relative to the total number of reads mapped is consistent between samples of the same library preparation, with A. fumigatus poly(A)-enriched AgSS samples ranging from 52.23-54.87% and the poly(A)-enrichment samples ranging from 45.89-51.22%.
Comparison of total RNA AgSS to rRNA-, poly(A)-depleted libraries for the bacterial Wolbachia endosymbiont wBm. With both of the previously discussed biological systems, we tested the effectiveness of the AgSS platform to study eukaryote-eukaryote host-pathogen interactions. By enriching for wBm reads in B. malayi samples, we sought to test the effectiveness of AgSS libraries in extracting prokaryotic reads. The standard protocol for AgSS capture in eukaryotes relies on capturing targets from poly(A)-enriched libraries, but for prokaryotic samples the capture is performed on total RNA libraries. These total RNA AgSS libraries were compared to rRNA-, poly(A)-depleted library preparations from nine B. malayi samples taken at various stages of the life cycle to analyze the transcriptome of its Wolbachia endosymbiont, wBm (Fig. 1b). Three RNA samples were obtained from the mammalian portion of the Brugia lifecycle where worms are large enough to be physically separated from the mammalian tissue. These RNA samples generate predominantly Brugia reads with a minority of Wolbachia reads. Of these three samples, one sample was recovered at 24 dpi, representative of immature female worms while the other two samples are from two reproductively mature adult females recovered 3 months post-infection. The other six samples are from the vector portion of the B. malayi lifecycle where tripartite RNA samples are obtained, consisting of predominantly mosquito reads, with a smaller percentage of Brugia reads, and an even smaller percentage of Wolbachia reads. These six samples consist of two A. aegypti replicates each taken at18 hpi, 4 dpi, and 8 dpi with B. malayi, representative of the L1 through early L3 B. malayi life stages. Of the six samples taken from the vector life stages, at most 1,552 reads (<0.01% of sequenced reads) mapped to the wBm genome from the rRNA-, poly(A)-depleted samples, of which <730 reads were identified as mapping to protein-coding genes for an average of <1 read mapped/gene (Supplementary Dataset 3, Supplementary Fig. 5). In contrast, the rRNA-, poly(A) depleted adult female B. malayi samples taken from the mammalian host had a minimum of 17,107 reads mapping to wBm protein-coding genes, equating to ~20 reads/gene while the 24 dpi sample had 246,549 reads mapped to protein-coding genes, equating to ~294 reads/gene.  (Fig. 3B).
A total of 45 unique genes (~5.4% of protein-coding genes in wBm) were detected in at least one of the three gerbil comparisons in the sample treated using the rRNA-, poly(A)-depletion libraries but not its total RNA AgSS counterpart. Of these 45 genes, 41 were not covered by any of the AgSS probes while the remaining 4 genes had <11% of their length covered by a probe. To identify genes with significantly greater expression in one enrichment method over the other, a cutoff of three standard deviations from the log 2 ratio of the AgSS TPM to the rRNA-, poly(A)-depleted TPM was used (Fig. 3C). Genes with a log 2 ratio of >3 standard deviations below the average log 2 ratio were determined to have significantly higher expression in the rRNA-, poly(A)-depletion sample while genes with a log 2 ratio >3 standard deviations above the average, were identified as having significantly higher expression in the total RNA AgSS sample. Across the three comparisons, a total of 20 unique genes had TPM values significantly higher in the rRNA-, poly(A)-depleted sample compared to the AgSS sample in at least one comparison. Of these genes, 14 genes were not covered by any probe, four genes had <17% of their length covered by a probe, and the remaining two genes are covered in the entirety of their length by the probe design. Comparison of AgSS B. malayi transcriptome data from total RNA and poly(A)-enriched samples. For eukaryote transcriptome experiments, the AgSS platform is typically preceded by construction of a poly(A)-selected library that is then hybridized to the bait probes. The poly(A)-enrichment step serves to remove high abundance, non-mRNA transcripts, such as rRNAs, tRNAs, and other ncRNAs. For bacterial samples, libraries must be constructed on total RNA since bacterial mRNAs lack the poly(A) tails present in eukaryotic transcripts. For eukaryote-prokaryote transcriptomic experiments, it may be advantageous to use the AgSS platform for both bacteria and eukaryotes on the same single library constructed from total RNA, especially for clinical samples where RNA may be limiting. Therefore, we sought to compare the AgSS enrichment on a library constructed from total B. malayi RNA with one constructed following poly(A)-enrichment using two samples taken from the mosquito vector portion of the B. malayi life stages at 18 hpi and 8 dpi (Fig. 1c).
For both the 18 hpi and 8 dpi time points, most genes yield comparable log 2 TPM values between the two enrichment methods (Fig. 4A) with <1.5% of genes having a bias towards a particular library construction protocol (Fig. 4B). However, despite the 8 dpi comparison yielding a similar number of genes detected unique to each enrichment method, in the 18 hpi sample comparison, 1,033 genes (9.3% of all protein-coding genes) were detected only in the poly(A)-enriched AgSS library while 323 (2.91%) were only detected with the total RNA  AgSS samples were enriched for reads mapping to B. malayi by 2.4x for the 18 hpi sample and 1.1x for the 8 dpi sample relative to the total RNA AgSS. Additionally, in the 18 hpi and 8 dpi poly(A)-enriched AgSS samples, 46.63% and 46.35% of their mapped reads mapped to protein-coding genes while in the total RNA AgSS samples only 20.22% and 32.21% mapped to protein-coding genes (Supplementary Dataset 4, Supplementary Fig. 6). This is explained by the increased number of multi-mapping reads observed in the total RNA AgSS samples, with 70.29% and 39.29% of the total number of mapped reads in the 18 hpi and 8 dpi vector samples multi-mapping to the B. malayi genome. In comparison, only 5.93% and 5.42% of mapped reads were multi-mapping in the 18 hpi and 8 dpi poly(A)-enriched AgSS samples. A total of 80.82% and 69.86% of these multi-mapping reads in the total RNA AgSS 18 hpi and 8 dpi vector samples were found to map to the B. malayi rRNAs compared to only 13.76% and 1.56% in the poly(A)-enriched AgSS samples (Supplementary Dataset 4, Supplementary Fig. 6). The AgSS platform on poly(A)-enriched RNA was able to obtain both a higher percentage of B. malayi mapped reads and a higher percentage of B. malayi reads mapping to protein-coding genes. Therefore, we recommend the use of the AgSS enrichment on poly(A)-enriched libraries for eukaryotic systems whenever possible.
Recovering the transcriptome of the major organism from libraries prepared with the AgSS of the minor organism. We explored whether sequencing data from both the major and minor organism could be accurately obtained using only data collected from a AgSS capture of the minor organism for eukaryote-eukaryote dual-transcriptomics experiments. Using the sample with the highest fold enrichment conferred by the AgSS platform, an immunocompromised mouse at 2 dpi, the host mouse transcriptome was compared between two libraries enriched using only the poly(A)-enrichment versus a poly(A)-enriched library that was AgSS-captured for A. fumigatus transcripts. When plotting the TPM values of two library preparations against one another ( Supplementary Fig. 7), the TPM values share a positive linear correlation with a r 2 of 0.85, similar to the correlation values observed in the B. malayi comparisons (Fig. 2B), indicating that the use of the A. fumigatus AgSS did not appear to largely impact the mouse transcriptome. However, of the 39,179 mouse genes, 5,424 (13.8%) were detected in only the polyA-library but not the corresponding AgSS-captured library. In comparison, only 128 mouse genes were detected only in the A. fumigatus AgSS library but not the poly(A)-library. While the transcripts captured seem to have the same expression profile, >10% of the genes were missed, and thus, we would recommend only assessing the organism for which the AgSS capture system was designed if the goal is to obtain the complete transcriptional profile of the major member.

Discussion
With multi-species RNA-Seq experiments emerging at the forefront of experimental designs for transcriptome analyses, novel methods are being developed to more efficiently extract the transcriptomes for multiple organisms from a single sample. By comparing the transcriptomes generated with standard enrichment methods (e.g. poly(A)-enrichment or rRNA-/poly(A)-depletion) to the AgSS platform, we sought to independently validate the use of the AgSS platform in enriching for transcripts originating from a specific organism of interest while minimizing any type of library bias on the transcriptome. When analyzing the B. malayi transcriptomes of samples prepared with or without the AgSS enrichment, most genes could be detected using both library preparations. However, the samples prepared using the AgSS system enriched the number of reads mapped by 3x-146x compared to samples enriched only with a poly(A)-enrichment enabling the consistent detection of a greater number of genes, most of which were low abundance transcripts. In the A. fumigatus comparisons, the poly(A)-enrichment by itself was unable to extract a sufficient number of A. fumigatus reads to accurately represent the transcriptome of the sample. When the poly(A)-enrichment was supplemented with AgSS capture, the enrichment of A. fumigatus reads was increased 614-1,212x. When using the AgSS instead of the rRNA-, poly(A)-depletion preparation for the Wolbachia endosymbiont wBm transcriptome, the AgSS samples were enriched for wBm mapped reads by 20-41x in the gerbil life cycle samples and 353-2,242x in the vector life cycle samples.
For each of these samples, if the fold enrichment for the number of mapped reads in the poly(A)-enriched or total RNA AgSS is plotted against the percentage of reads mapped in its poly(A)-enrichment or poly(A), rRNA depletion counterpart, the data follows a power law (Fig. 5). As expected, the fold enrichment obtained with the AgSS capture is inversely proportional to the percentage of mapped reads obtained with the currently used enrichment methods of poly(A)-enrichment to capture eukaryotic RNA, or rRNA-and poly(A)-depletion to capture bacterial RNA from multi-species samples, indicating samples with a lower relative abundance of RNA from the target organism yield the greatest fold enrichment.
While the AgSS can enrich a sample for transcripts of interest, the platform requires probes designed for each gene to avoid any enrichment bias. Most instances of the AgSS failing to detect a gene that was detected in samples enriched using only the poly(A)-enrichment or rRNA-, poly(A)-depletion was due to a lack of designed probes for the gene. Because the B. malayi annotation is currently being updated, the baits were designed with an older annotation than the one used for feature counting. Similarly, the bait design for wBm was based on the annotation published by Foster et al. 19 while the annotation used for feature counting originated from GenBank (NC_006833.1). In both cases, most genes present only in the annotation used for the bait design were not detected with the AgSS platform.
These results illustrate the power and advantages of using probe-based enrichment to facilitate the analysis of samples that are the most insightful, like those in animal tissue at a biologically relevant multiplicity of infection. The AgSS platform serves as a method to extract reads from a secondary or tertiary organism in a sample as well as to detect low abundance transcripts. Using this platform, we successfully extracted a sufficient number of reads to represent the B. malayi and wBm transcriptomes in the vector portion of the B. malayi life cycle, the A. fumigatus transcriptome in a mouse infection, and the wBm transcriptome in the mammalian portion of the B. malayi life cycle. Provided a transcriptome experiment has an adequate bait design, we believe the AgSS platform to be ideal and necessary in enriching samples for multi-species transcriptomic experiments, especially those in which secondary organisms are of low abundance. , and wBm (green) was calculated by taking the ratio of the percentage of AgSS reads mapped to the percentage of poly(A)-enrichment/depletion reads mapped. The relationship between these fold enrichment values and the percentage of the poly(A)-enrichment/depletion reads mapped to the reference genome can be fitted to a power law. The fold enrichment conferred using the AgSS platform exponentially increases with the exponential decrease of the percentage of reads mapped using the standard enrichment/depletion methods up to an ~1300-fold enrichment.