Comparison of DNA and RNA sequencing of total nucleic acids from human cervix for metagenomics

Although metagenomics and metatranscriptomics are commonly used to identify bacteria and viruses in human samples, few studies directly compare these strategies. We wished to compare DNA and RNA sequencing of bacterial and viral metagenomes and metatranscriptomes in the human cervix. Total nucleic acids from six human cervical samples were subjected to DNA and RNA sequencing. The effect of DNase-treatment before reverse transcription to cDNA were also analyzed. Similarities and differences in the metagenomic findings with the three different sequencing approaches were evaluated. A higher proportion of human sequences were detected by DNA sequencing (93%) compared to RNA sequencing without (76%) and with prior DNase-treatment (11%). On the contrary, bacterial sequences increased 17 and 91 times. However, the number of detected bacterial genera were less by RNA sequencing, suggesting that only a few contribute to most of the bacterial transcripts. The viral sequences were less by RNA sequencing, still twice as many virus genera were detected, including some RNA viruses that were missed by DNA sequencing. Metatranscriptomics of total cDNA provided improved detection of mainly transcribed bacteria and viruses in cervical swabs as well as detection of RNA viruses, compared to metagenomics.

www.nature.com/scientificreports/ transcribed genes and has also the possibility to detect viruses with RNA genomes, which cannot be detected by shotgun DNA sequencing if RNA is not reverse-transcribed first. Cervical cancer is caused by persistent infection of oncogenic high risk human papillomavirus (HPV) types [11][12][13] . The most oncogenic HPV types that cause neoplasia and cancer of the cervix are HPV types 16 and 18, but other HPV types are also found in cervical neoplasia [14][15][16] . Cervical squamous intraepithelial neoplasia stage 3 (CIN3), cervical cancer in situ, may continue to develop into invasive cancer if not treated, but it is not entirely clear which factors are involved in the persistence of the infection and the progression into cancer or clearance of the infection. The vaginal microbiota has been characterized in both normal and cervical intraepithelial neoplasia as well as cervical cancer 17 . Metagenomic studies suggest that only one or a few species of Lactobacillus within the phylum Firmicutes are usually dominating the vaginal microbiota 18 . However, the diversity tends to increase with neoplasia and HPV-positivity 19 . The vaginal microbiota may have an important role to protect against harmful infections like for instance HPVs, hence, the cervical microbiota may serve as a potential biomarker for cancer progression risk 17,20,21 .
To eliminate cervical cancer, a combination of broad vaccination programs 22 and early biomarkers with high specificity and sensitivity in order to identify women with low grade neoplasia with high risk of progression into cancer is important. In the present study, the aim was to compare massively parallel DNA and RNA sequencing of total nucleic acid extracted material for a comprehensive detection of the detectable and actively transcribing DNA and RNA microbes in cervical specimens.

Material and methods
Study participants. The sample collection of the Swedish Center for Cervical Cancer Prevention currently holds cervical samples from > 400,000 women, stored under validated methods in − 25 °C 23 . All women 23-64 years of age in this sample collection were diagnosed with CIN2 + (including CIN2, CIN3 = squamous cell carcinoma in situ, adenocarcinoma in situ and invasive cervical cancer) during the years 2013-2015 and identified by linkage to the National Cervical Screening Registry (NKCx), which also has the Center for Cervical Cancer Prevention as responsible organization. A random of 6 cervical samples (3 CIN3 + samples and 3 normal with no cervical disease diagnosed) were included for the present study from different women (n = 6). The CIN3 samples were matched with the normal samples by age in 5-year bands and calendar year of liquid-based cytology (LBC) test. Ethical approval was granted by the Regional Ethical Review Board of Stockholm, Sweden (EPN-Dnr: 2014/1242-31/4). Written informed consent was obtained from all participants. All methods were carried out in accordance with relevant guidelines and regulations.
Sample preparation. Isolation of total nucleic acid from the 6 LBC swab samples was performed using MagNA Pure LC instrument and total nucleic acid isolation kit (Roche, Basel, Switzerland) in accordance with the manufacturer´s protocol (Roche Molecular Systems, Inc., Alameda, CA, USA). 100 ul were used for total nucleic extraction and total nucleic acid was eluted in 50 µL elution buffer. The extracted DNA concentration was quantified in all 6 total nucleic acid extracted samples using a fluorometric assay (QuantiFluorST, Promega, US) according to manufacturer´s instructions, and ranged from 0.08 to 2.9 ng/ul (average 1.04 ng/ul).
Total nucleic acid extracted material from the 6 samples was thereafter subjected to 3 different approaches to analyze the microbiota: metagenomic DNA sequencing, metatranscriptomic RNA sequencing and metatranscriptomic RNA sequencing including a step of DNase-treatment of the total nucleic acids before RNA reverse transcription to cDNA. In total, 18 DNA/cDNA libraries were sequenced. DNA approach: library preparation and sequencing. For DNA sequencing, DNA libraries were prepared directly from the total nucleic acid extracted material from the 6 cervical samples using Nextera XT DNA library preparation kit (Illumina, San Diego, CA, USA) following the manufacturers' reference guide, starting with 1 ng of DNA (as recommended by the manufacturer) and using unique indexed adapters to facilitate pooling of the libraries. The 6 DNA libraries were normalized to 1 nM and pooled prior paired-end sequencing at 1.8 pM for 2 × 150 bp using NextSeq500 (Illumina, San Diego, CA, USA).
RNA approach: library preparation and sequencing. For RNA sequencing, RNA libraries were prepared directly from the total nucleic acid extracted material (not RNA extracted material) from the 6 cervical samples using Smarter stranded total RNA-seq kit v2-pico input mammalian (Takara Bio USA, Mountain View, CA, USA) following the user manual. Starting material was 8 ul as recommended in the guidelines, prior fragmentation for 2 min. After cDNA synthesis, adapters including an index part, unique for each individual sample, were added to the cDNA fragments, following ribosomal cDNA depletion and enrichment of uncleaved fragments by 14 cycles of PCR. The 6 RNA libraries were then normalized to 4 nM and pooled prior paired-end sequencing at 1.2 pM for 2 × 75 bp using NextSeq500 (Illumina, San Diego, CA, USA).
RNA with DNase-treatment approach: library preparation and sequencing. To observe if there was an effect of DNA presence in RNA sequencing, another 6 RNA libraries were prepared and sequenced as described in the RNA approach, by treating the total nucleic acid with DNase prior library preparation. 8 ul of total nucleic acid extracted material from the 6 cervical samples (recommended input for library preparation) were treated with DNase using Turbo DNA-free kit (TermoFisher Scientific, MA, USA) according to manufacturer's user guide in order to remove DNA. www.nature.com/scientificreports/ Bioinformatics. Indices, included in the Illumina adaptors, were used to assign raw sequence reads obtained from the NextSeq500 (Illumina, San Diego, CA, USA) platform to the originating samples. Demultiplexing was performed using bcl2fastq2 conversion software version 2.19 (Illumina, San Diego, CA, USA). All reads were filtered based on quality and adaptors were trimmed using Trimmomatic version 0.36 24 with default parameters and 18 bp and 36 bp as minimal read length for RNA and DNA sequenced specimens, respectively. The first 3 nucleotides from every R2 read were trimmed for the RNA sequencing fastq-files using cutadapt version 1.18, as advised within the Smarter stranded total RNA-seq kit used for library preparation. High-quality paired reads were screened against the human reference genome version GRCh38 using Next-GenMap version 0.5.2 25 . The program was run under default settings, except for -i 0.95, -R 0.75 and-silent-clip. Reads that did map to the human reference genome (if they aligned with more than 95% identity over 75% of their length to the human genome) were used to assess sample and protocol adequacy by analyzing presence of reads mapping to the human reference protein coding gene actin beta (ACTB).
High-quality non-human reads were classified using Kraken2 v. 2.1.1 26 , which was run against a reference database containing all RefSeq bacterial and viral genomes (built in December 2020) with a 0.1 confidence threshold. A cut-off of 10 classified reads was used to discriminate positive genera for bacteria and viruses, and results reported all genera which comprised more than 1% of total bacterial or viral reads, respectively.
As HPV is a necessary cause for cervical cancer, we also queried all non-human reads to several HPV databases. Non-human DNA reads were queried against a database of known HPV sequences including all HPV genomes officially established by the International HPV Reference Center (221 officially established HPV types, https:// www. hpvce nter. se, accessed on 2020-01-20), together with complete genome sequences from HPV types that are not officially established yet (n = 222, https:// pave. niaid. nih. gov, accessed on 2020-01-20), using Next-GenMap version 0.5.2 25 with the same parameters as described previously. Reads that mapped with more than 90% identity over 75% of their length (-i 0.9 -R 0.75) were included for further analysis and subjected to visual inspection using Integrative Genomics Viewer to confirm mapping. Samples were considered positive for HPV (cut-off) if a minimum of 10 reads were detected for any HPV type with at least 90% identity and with an HPV genome coverage of above 10% of that particular HPV type (approximately 750 bp).
Non-human reads from RNA sequencing were queried against all HPV protein sequences included in the PaVE database (Papillomavirus Episteme, accessed on 2019-07-28, including all protein sequences from HPV reference and non-reference genomes), using the open source software Diamond 27 for protein and translated DNA alignment by blastx with default parameters and -top 1. The same cut-off applied for HPV-positivity (minimum of 10 reads and above 10% genome coverage).

Ethics approval and consent to participate. Ethical approval was granted by the Regional Ethical
Review Board of Stockholm, Sweden (EPN-Dnr: 2014/1242-31/4). Written informed consent was obtained from all participants.

Results
The metagenome analysis from DNA sequencing of 6 cervical swab samples showed a median number of 138 million (M) high quality reads per sample. The metatranscriptome analysis from RNA sequencing without and with prior DNase-treatment of the extracted genetic material from the swab samples had a median number of 94 M and 46 M high quality reads per sample, respectively (Table 1). However, two of the samples (sample 2 and 5) generated much fewer reads by the RNA sequencing after DNase-treatment (1.4% and 1.7% of the median quality reads) and the quality control by fragment analyses of these two libraries showed no detectable fragments, indicating that the DNase-treatment removed all DNA and that the samples contained very little RNA (Table 1).
In agreement with the differences in total read output among the different sequencing approaches, both human and viral reads showed a corresponding decrease in total reads per sample when comparing RNA sequencing output with DNA sequencing output ( Table 1). The median relative abundance of annotated human reads was 93% in DNA sequencing, 76% for RNA sequencing and only 11% for RNA sequencing after DNasetreatment ( Table 1). The median relative abundance of viral reads followed the human read abundance decrease (0.01%, 0.007% and 0.005% respectively) ( Table 1). On the contrary, the annotated bacterial reads increased in relative frequency as well as in total number of reads by RNA sequencing compared to DNA sequencing (Fig. 1a). When comparing the 3 different sequencing approaches, 65% of all annotated bacterial reads were generated by RNA sequencing after DNase-treatment, 33.4% by RNA sequencing without prior DNase-treatment and only 1.6% by DNA sequencing (Fig. 1a). Sample 2 and 5 generated no detectable levels of library fragments by the bioanalyzer after DNase-treatment and hence, very few sequences were obtained, however a more even distribution of bacterial reads among the 6 samples was detected by the DNA sequencing (Fig. 1b). Overall, RNA sequencing with DNase-treatment generated the highest number of annotated bacterial reads per sample (median of 38.7 M reads per sample) as well as the highest relative abundance of reads (median of 82% bacterial reads of the total quality reads), a 91 and 5 times higher relative abundance than the DNA sequencing and RNA sequencing without prior DNase-treatment, respectively ( Table 1).
As all 3 sequencing approaches originated from total nucleic acid extracted material (not RNA extracted or DNA extracted material), analysis of the ACTB human gene, which is known to be expressed in cervical cells and commonly used as a reference gene in normalization of RT-qPCR data in cervical cancer cell lines 28 , was performed to investigate presence of DNA in both RNA approaches. Analysis of the ACTB gene showed presence of reads in both introns and exons with the DNA approach, mostly in the exons (with a few reads covering introns) for the RNA approach without prior DNase-treatment, and only reads covering exons for the RNA sequencing with prior DNase-treatment approach, implying that RNA sequencing of the total nucleic acids contains a small fraction of DNA, but almost no DNA left after DNase-treatment prior RNA sequencing (Fig. 2 www.nature.com/scientificreports/ Even if transcripts cover less than 5% of the human genome 29 , they are much more abundant, highly transcribed genes, as ACTB in cervical cells, may have several 1000's of copies of a certain transcript in one cell compared to the 2 copies of DNA. Hence, many more reads cover the exon regions compared to introns in RNA sequencing of a total nucleic acids without prior DNase-treatment.
Bacterial genera with at least 1% of the total number of classified bacterial reads in each of the 18 libraries among the 3 sequencing approaches (DNA, RNA and RNA with prior DNase-treatment) are shown in Fig. 3. Overall, Table 1. Distribution and classification of high-quality reads among the cervical liquid-based cytology normal and cervical intraepithelial neoplasia 3 (CIN3) swab specimens. Cyt stage = Cytology stage, CIN3 = cervical intraepithelial neoplasia stage 3, reads = number of generated raw DNA sequences from the sequencing.   www.nature.com/scientificreports/ the 3 sequencing approaches corresponded well to each other within each sample, with both RNA sequencing approaches being more similar in the bacteriome patterns of annotated reads compared to DNA sequencing within each sample. In total, 12 bacterial genera were detected with more than 1% of the bacterial reads in at least one sample, with genome sizes ranging from 1.4 to 5.8 Mb (Fig. 3). Sample 1, 2 and 6 shared a similar bacteriome pattern for the annotated bacterial reads, with Lactobacillus genus showing the highest percentage of reads (66% to 99% of the total bacterial reads) for the 3 sequencing approaches. Sample 3 revealed most annotated reads corresponding to Gardnerella genus (83% to 97% of the total bacterial reads). In sample 4 most of the annotated reads corresponded to both Gardnerella and Lactobacillus genera (27% to 50% of the total bacterial reads). Sample 5 showed no reads belonging to the mentioned genera (Lactobacillus and Gardnerella) and instead, most of the annotated reads belonged to Bifidobacterium (37% to 50% of the total bacterial reads) and Streptococcus (28% to 47% of the total bacterial reads) ( Table 2). Streptococcus was also detected in sample 3 (2% to 4% of the total bacterial reads).
Staphylococcus was present in all 6 samples from the DNA sequencing at 3% to 12% of the total bacterial reads. On the contrary, RNA sequencing detected Staphylococcus in only one sample without the DNase-treatment (1% of total bacterial reads) and in none of the samples with prior DNase-treatment (Table 2). Additional genera with at least 1% of the total bacterial reads from the DNA sequencing were Klebsiella and Pasteurella, detected in all 6 samples (1% to 7%), Ralstonia, detected in 4 of 6 samples (1% to 3%), Escherichia, detected in 1 sample (4%) and Fannyhessea, detected in 1 sample (15%) ( Table 2). Other genera constituted between 2 and 6% of the total bacterial reads, however less than 1% for each individual genus (Table 2). From the two RNA sequencing approaches, it was only Fannyhessea of these additional genera that had at least 1% of the total bacterial reads. However, two other genera were detected only by RNA sequencing, Megasphaera (1%) and Anaerococcus (2%) in one sample each ( Table 2). www.nature.com/scientificreports/ DNA sequencing detected in total 10 bacterial genera with more than 1% of the total annotated bacterial reads, compared to 9 and 7 genera detected when performing RNA sequencing without and with prior DNasetreatment, respectively (Table 2). In all 6 samples, DNA sequencing detected a higher variety of bacterial genera (between 4 and 7 genera) which contributed to at least 1% of the total bacterial annotated reads, compared to RNA sequencing. By RNA sequencing, the bacterial annotated read distribution was dominated by less than 4 genera for all samples except one sample where the bacterial reads were divided between 6 genera with at least 1% of the total bacterial reads ( Table 2).
Cervicovaginal virome. The cervicovaginal virome was analyzed using the Kraken2 program. The viral genera which comprised at least 1% of the total number of annotated viral reads in each of the 18 libraries are shown based on the percentage of reads for the 3 sequencing approaches (DNA, RNA and RNA with prior DNase-treatment) (Fig. 4).
We detected 14 viral genera with at least 1% of the total viral reads whereof 4 genera were detected by DNA sequencing, 10 genera by RNA sequencing and 8 genera by RNA sequencing with prior DNase-treatment (Table 3). One virus genus, Gorganvirus, was present in all 6 samples by all 3 sequencing approaches (Table 3). Gorganvirus belongs to the Siphoviridae, a family of viruses with double-stranded DNA genomes of about 50 kb that infects bacteria and archaea, and the family is known to infect the vagina 4 . In 5 of 6 samples the Gorganvirus genus contributed to most of the viral reads (50% to 100% of the total viral reads) by DNA sequencing. Gorganvirus did also contribute to most of the viral reads in 4 of 6 samples by RNA sequencing without and with DNasetreatment (21% to 100% of the total viral reads) ( Table 3). Only one sample (sample 4) had another viral genus that contributed to most of the viral reads, Roseolovirus (89% by DNA sequencing and 87% by RNA sequencing), which was only present in that one sample, but not detected by RNA sequencing with prior DNase-treatment ( Table 3). The absence of Roseolovirus reads by the RNA sequencing with prior DNase-treatment implies that the detected reads from the RNA sequencing without prior DNase-treatment correspond to the DNA genome of the Roseolovirus and that the virus is not actively transcribing any of the genes. Roseolovirus genus belongs to Herpesviridae, a family of double-stranded DNA viruses with genome size of around 200 kb. Roseolovirus consists of three species that infect humans (Human betaherpesvirus 6A, 6B and 7) which are known to infect the mucosal cells of the cervix 33 .
Alphapapillomavirus genus consists of small double-stranded DNA viruses with genome size of about 8 kb and includes the causative virus types for CIN3-lesions which are commonly detected in cervical swabs. Alphapapillomavirus was present in 3 of 6 cervical samples (37%, 50% and 6% of the total viral reads) by DNA sequencing and in one additional sample (sample 3), but with less than 1% and therefore not included in the table (Table 3). Alphapapillomavirus was also detected by RNA sequencing in the same 3 samples (4%, 29% and 2% of the total viral reads), but not detected by RNA sequencing with prior DNase-treatment.
The fourth virus genus detected by DNA sequencing was Acionnavirus, another genus of double-stranded DNA bacteriophages within Myoviridae known to infect the vagina, detected with 27% of the total viral reads in sample 3, but not detected by RNA sequencing (Table 3). Orthobunyavirus, which is a genus of negative-sense www.nature.com/scientificreports/ www.nature.com/scientificreports/ RNA viruses which belong to Bunyaviridae-the largest family of RNA viruses with more than 350 isolates-was detected by both RNA sequencing approaches in 5 of 6 samples (not detected in sample 2), where it contributed to most or second most of the viral reads (Table 3). Two additional virus genera with RNA genomes were also detected by RNA sequencing, Betaretrovirus and Gammaretrovirus ( HPV detection. HPV types were detected by mapping non-human reads to known HPV nucleotide and protein databases. Both DNA and RNA sequencing approaches detected HPV in all 3 CIN3-samples and in 1 of 3 normal samples, showing a very good concordance between DNA and RNA sequencing ( Table 4). The same HPV types were detected by both DNA and RNA sequencing (HPV16, 33, 42, 45, 56, 58, 59 and 67) except for one type (HPV16), detected with a few reads in one of the CIN3-samples only by DNA sequencing (Table 4). When adding DNase-treatment before RNA sequencing there were no HPV reads in any of the 6 samples (Table 4). HPV mapping results are in agreement with Kraken2 taxonomy classification, which found HPV positivity in the same 4 samples. However, one of the positive samples (sample 3) had less than the cutoff at 1% of the total viral reads by Kraken2 classification (only 7 reads of Alphapapillomavirus 7 and 1 read for HPV type 85) and hence, were not included in Table 3.

Discussion
The aim of the study was to compare three different approaches of sequencing of total nucleic acids from cervical swab samples, (i) DNA sequencing, (ii) RNA sequencing and (iii) RNA sequencing with prior DNase-treatment. Analysis of the metagenomes and metatranscriptomes in 18 cervical swab libraries revealed distinct difference in relative detectability between microbial and viral genomes and the human genome. While the number of human reads and its relative abundance decreased when subjecting the samples to RNA sequencing in comparison with DNA sequencing, bacterial genomes generated many more reads in both absolute number of reads as well as in relative abundance. Human DNA depletion is known to favor and optimize the effective detection of bacterial communities by increasing the microbial sequencing depth, especially in clinical samples where approximately 90-95% of metagenomic sequencing reads from samples are annotated as human 10,[34][35][36][37] . In our study, treatment with DNase I, an endonuclease that cleaves both ssDNA and dsDNA anywhere along the chain, prior to RNA sequencing, decreased the human relative abundance from 76 to 11% and increased the bacterial relative abundance from 15 to 82%, suggesting a higher number of bacterial transcripts compared to human transcripts in the samples, despite more human DNA. www.nature.com/scientificreports/ Metagenome analysis from DNA sequencing using Kraken2 program revealed a wider distribution of the bacterial reads to different genera compared to metatranscriptome analysis from RNA sequencing for all samples. In 5 of 6 samples, most of the bacterial reads (about 90%) was generated from only a couple of genera, between 1 and 3 genera had more than 1% of the total bacterial reads, from both RNA sequencing approaches (without and with prior DNase-treatment). By DNA sequencing, between 4 and 7 genera had more than 1% of the total bacterial reads in all 6 samples. One explanation could be that only a few bacteria in each sample contribute to most of the bacterial transcripts, which are sequenced by RNA sequencing. However, several more bacteria are present, but with less transcription activity and hence, detected by DNA sequencing. Despite those differences, the detected bacteriome is overall in concordance between the 3 different sequencing approaches. Table 3. The relative abundance of the annotated viral reads for each viral genus in percentage of the total number of viral reads detected in the 6 cervical swab samples by each sequencing approach (DNA sequencing, RNA sequencing and RNA sequencing with prior DNase-treatment). DNA = DNA sequencing, RNA = RNA sequencing, DNase = RNA sequencing with prior DNase-treatment. CIN3 = cervical intraepithelial neoplasia stage 3, Normal = healthy without neoplasia. *Other comprises viruses belonging to genera that presented < 1% of the total viral reads and viruses that cannot be classified within a genus. The genera are sorted from the highest to lowest percentage of the total viral reads by DNA sequencing. www.nature.com/scientificreports/ Viral metagenome and metatranscriptome analysis revealed the opposite pattern compared to the bacteriome. RNA sequencing generated a wider distribution of viral reads to different genera (10 and 8 genera with more than 1% of the total viral reads by RNA sequencing without and with prior DNase-treatment, respectively) than DNA sequencing (4 genera with more than 1% of the total viral reads). The virus genera detected with most of the viral reads showed a strong concordance between the 3 sequencing approaches, except for the RNA viruses, which were only detected by both RNA sequencing approaches (without and with prior DNase-treatment) and not by DNA sequencing. This fact is also a strong indicator that RNA sequencing of cDNA libraries from total nucleic acids indeed sequence RNA genomes and transcripts.
All the viral genera with DNA genomes that were detected with more than 1% of the total viral reads were double-stranded DNA viruses, but with different genome sizes, from small genomes like Alphapapillomavirus of 8 kb to large genomes like Roseolovirus around 200 kb and giant genomes like Mimivirus of above 1 Mb. Alphapapillomavirus, the known cause of cervical cancer and pre-cancer lesions, was not detected when samples were subjected to DNase-treatment. Therefore, the DNase-treatment should be avoided if the relative abundance or the number of reads of a specific target is expected to be low, as for small virus genomes, and/or if the virus is known to integrate within the human genome, as the DNase will deplete it.
One theoretical advantage of subjecting specimens to RNA sequencing will be the possible detection of RNA viruses, which could be missed when only sequencing the total DNA without prior enrichment for viruses. Indeed, three virus genera with RNA genomes were detected in 5 of the 6 samples by RNA sequencing, but not with DNA sequencing. Furthermore, RNA sequencing provides the result of transcription status, proving that there is an infection. Some viruses (e.g. HPV detected in skin) can be the result of environmental deposition or a transient microbe and DNA detection is in this case not proof of an active infection 10 .
Analysis of detectability of HPV types found few or no differences between DNA and RNA sequencing. While all CIN3 samples showed presence of HPV types, normal samples showed negativity in 2 of 3 samples and one sample being positive for 4 HPV types. The relative read abundance of Alphapapillomavirus of the total viral reads was 37%, 50% and 6% for DNA sequencing and 4%, 29% and 2% for RNA sequencing in the 3 positive samples and less than 1% in one sample by Kraken2 analysis. However, Alphapapillomavirus is a small DNA virus (approximately 8000 bp) with few genes compared to e.g. Roseolovirus (200 kb), which has approximately 25 times larger genome. The 15 times more reads of Roseoloevirus compared to Alphapapillomavirus would actually be the sequencing result of an equal proportion between the two virus genera or even a dominance of the Alphapapillomavirus. Gardnerella is known to be a biomarker for HPV progression, whereas abundance of Lactobacillus has been associated with clearance of HPV infections 38 . The two samples without HPV infection had no Gardnerella detected above 1% of the total bacterial reads. On the contrary, 3 of 4 HPV positive samples had Gardnerella reads between 15 and 83% of the total bacterial reads. The detection of Lactobacillus showed no difference between normal and CIN3 samples, where 2 of 3 of both normal and CIN3 samples were positive for Lactobacillus. However, the study contained only a small pilot of 6 samples, and hence, case-control analysis cannot be conducted.
In conclusion, DNase-treatment of nucleic acids prior RNA sequencing tends to decrease human DNA sequences, but also loose to much of the virome nucleic acids. RNA sequencing generated much more bacterial reads than DNA sequencing and has the advantages of detecting actively transcribed infections, enabling differential gene expression analysis and the possibility to detect RNA genomes. By RNA sequencing, the variety of bacterial genera decreased despite more bacterial reads compared to DNA sequencing, suggesting that just a few of the bacterial genera contribute to most of the bacterial transcripts. More virus genera were detected by RNA sequencing, including both DNA and RNA virus genomes, the latter not possible to detect by DNA sequencing. Table 4. Human papillomavirus (HPV) types detected in the 6 cervical swab samples by querying the nonhuman reads from DNA sequencing, RNA sequencing and RNA sequencing with prior DNase-treatment against HPV databases. DNA = DNA sequencing, RNA = RNA sequencing, DNase = RNA sequencing with prior DNase-treatment. Cyt stage = Cytology stage, CIN3 = cervical intraepithelial neoplasia stage 3, Normal = healthy without neoplasia, HPV = Human papillomavirus, reads = number of reads mapped for each HPV type (within brackets).