Next-generation sequencing in the diagnosis of viral encephalitis: sensitivity and clinical limitations

Identification of pathogens causing viral encephalitis remains challenging, and in over 50% of cases the etiologic factor remains undetermined. Next-generation sequencing (NGS) based metagenomics has been successfully used to detect novel and rare infections, but its value for routine diagnosis of encephalitis remains unclear. The aim of the present study was to determine the sensitivity of shotgun metagenomic sequencing protocols, which include preamplification, and to test them against cerebrospinal fluid (CSF) samples from encephalitis patients. For sensitivity testing, HIV- and HBV-positive sera were serially diluted in CSF from an uninfected patient. NGS repeatedly detected HIV and HBV sequences present at concentrations from 10^5 to 10^2 and from 10^5 to 10 viral copies/reaction, respectively. However, when the same protocols were applied to RT-PCR/PCR positive CSF samples from 6 patients with enteroviral encephalitis (median viral load 47 copies/ml) and 15 patients with HSV, CMV or VZV encephalitis (median viral load 148 copies/ml), only 7 (28.6%) were identified as positive. In conclusion, while NGS has the advantage of being able to identify a wide range of potential pathogens, it seems to be less sensitive than standard amplification-based assays in the diagnosis of encephalitis, where low viral loads are common.

The aim of the present study was to determine the sensitivity of our shotgun metagenomic sequencing protocols, which include filtration, DNase treatment and preamplification steps, for the detection of RNA and DNA viruses, and to test them against a panel of well-defined cerebrospinal fluid (CSF) samples from encephalitis patients.

Results
Sensitivity of metagenomics for the detection of RNA and DNA viral template. Serial dilutions of HIV and HBV standards were prepared, sequenced and analyzed in two independent runs (Run A and B); the results are presented in Table 1. Total numbers of reads in both runs were similar: 165,980,852 and 156,803,545. After quality control, adapter removal and trimming, the average number of reads per sample was 11,552,731 and 10,991,219 for run A and B, respectively. HIV reads were detected in both runs in every dilution from 10^5 to 10^2 copies/reaction but not in the dilution containing 10 copies/reaction, and the percentage of recovered genome decreased with lower template number (Table 1).
Sequencing of HBV serial dilutions provided 70,965,028 (mean per sample: 10,219,414) and 72,106,781 (mean per sample: 10,137,861) reads for run A and B, respectively (Table 1). HBV-specific reads were detected in all dilutions from 10^5 to 10 copies/reaction in both independent runs, and the percentage of recovered genome was 100% for all but the very last dilutions (Table 1). Importantly, neither HIV nor HBV sequences were detected in samples with no viral template input. However, 413 to 92,406 reads in these samples mapped to various other viruses present in the viral genomic database (Table 1).
Metagenomic detection of viral pathogens in CSF. The protocols described above were applied to CSF samples from 21 patients with encephalitis of well-defined viral origin. Samples from 6 patients with enteroviral (EV) encephalitis were analyzed by RNA-based metagenomics (samples R1-R6), whereas samples from the remaining 15 patients (13 had HSV encephalitis, one had CMV encephalitis and one had VZV encephalitis) were subjected to the DNA metagenomic workflow (samples D1-D15). Viral loads ranged from 12 to 458 copies/ml (median: 47 copies/ml) for EV and from 74 to 344 copies/ml (median: 148 copies/ml) for the three different DNA viruses (Table 2). Next-generation sequencing generated 266,337,771 reads overall, with an average of 12,682,751 reads per sample (Table 2). The average number of reads per sample was similar for RNA (12,521,018) and DNA (12,747,444) analysis, and the mean number of reads mapping to the viral genomic database was 4,949 reads per sample, ranging from 1,210 to 11,052 reads.
When six CSF samples containing EV (samples R1 to R6) were analyzed, metagenomics revealed the presence of the expected pathogen only in the sample which had the highest viral load of all samples. In this sample 2,253 reads mapped to EV and this allowed for the reconstruction of 18% of enterovirus A genome.
In the case of 15 CSF samples containing DNA viruses, metagenomics identified the right pathogen in seven (samples D2, D3, D5, D7, D9, D10 and D14; Table 2). The number of viral reads ranged from 12 (1% of recovered genome) to 1,361 (8% of recovered genome). In sample D14, all HSV reads mapped to the same region of the viral genome; this sample was therefore considered NGS negative, as it did not fulfill our initially established criteria for positivity.

Discussion
NGS-based metagenomics is a promising new tool in the diagnostics of a wide range of pathogens [15]. It has already been successfully applied in respiratory and intestinal infections, and several groups have used this approach to identify causative agents in CNS infections [16-20].
In the present study we evaluated metagenomics for the detection of RNA and DNA viruses using serial dilutions of viral template as well as CSF samples from encephalitis patients. Two major problems in metagenomic analysis of CSF are the host and bacterial genetic background and the low concentration of viral particles in this compartment [13]. To mitigate the first problem, host and bacterial cells were removed by low-speed centrifugation followed by filtration, and samples were then digested with DNase to degrade circulating free DNA not protected by a viral capsid or envelope [13, 21-23]. CSF is a low-biomass sample: for example, HSV, which is the most frequently identified viral encephalitis pathogen, has an average load of only 100 copies/ml [24, 25], and in our samples the median HSV concentration was only 150 copies/ml. While commercial NGS library systems require as little as 1 ng to 100 ng of nucleic acid input, in our study the amount of extracted DNA/RNA was below the level of detection of the Qubit HS (high sensitivity) kit, which is 0.2 ng and 5 ng for DNA and RNA, respectively. To overcome the problem of insufficient nucleic acid input we used the commercially available Ovation RNA-Seq V2 System and SeqPlex Enhanced DNA Amplification Kit.
To evaluate the sensitivity of our methods we prepared serial dilutions of HBV and HIV viral template in CSF collected from an uninfected patient. We selected HBV and HIV because they were not present in any of the studied patients, making cross-contamination unlikely, and because they are not the subject of any research in our lab, which lowers the risk of amplicon contamination. For HIV serial dilutions a positive alignment to the HIV genome was obtained for samples containing from 10^5 to 10^2 viral copies per reaction. While in run A 10^2 viral copies allowed for the reconstruction of 20% of the HIV genome, in run B it was only 1%, suggesting that this input is close to the limit of detection. In analogous experiments with HIV spiked into CSF free matrix, Schlaberg et al. found the sensitivity of metagenomics to be approximately 100 copies/ml [26]. Similar sensitivity was reported by Edridge et al. [27], who used a virus discovery cDNA amplified fragment length polymorphism next-generation sequencing protocol (VIDISCA-NGS) for the detection of RNA viruses in CSF. The authors were able to detect HIV in a sample containing 1.07 × 10^2 viral copies/ml. In a protocol designed for the detection of both RNA and DNA viruses, using serial dilutions of equine arteritis virus and phocine herpesvirus 1 spiked into influenza A virus positive samples (recreating the clinical sample background in respiratory system infections), van Boheemen et al. found the limit of detection to be 50-250 viral copies/reaction [28].
Using serial dilutions of HBV in CSF we identified HBV reads in all dilutions from 10^5 to 10 copies/reaction, but not in the negative controls. The percentage of genome recovery was high even for samples containing as few as 10 HBV copies (23% and 45% for runs A and B, respectively). These values are much higher than for the corresponding dilutions of HIV, which could, at least in part, reflect the fact that the HBV genome (3.2 kb) is about one third the size of the HIV genome (~10 kb). The previously mentioned VIDISCA-NGS failed to detect DNA viruses in CSF samples with viral loads ranging from 5.28 × 10^3 to 1.62 × 10^7 copies/ml, but detected VZV present at a concentration of 9.29 × 10^7 DNA viral copies/ml [27]. However, in two other studies the sensitivity of metagenomics for the detection of DNA viruses was very similar to our findings. Schlaberg et al., using CMV spiked into CSF, found the limit of detection to be 9.4 copies/ml, while a sensitivity of 10 copies/ml was reported by Xia et al. in a case report in which metagenomics was applied to detect human polyomavirus 2 in CSF from a patient with progressive multifocal leukoencephalopathy [17, 26].
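The genome-size argument can be illustrated with a simple idealized model. The sketch below assumes that viral reads land uniformly at random on the genome (the Lander-Waterman approximation), so the expected fraction of the genome covered by N reads of length L on a genome of length G is 1 - exp(-N·L/G). This model, the function name and the read counts in the loop are illustrative assumptions and were not part of the study; real coverage is also shaped by amplification bias and template fragmentation. The genome sizes (3.2 kb and ~10 kb) and the 101 nt read length are taken from the text.

```python
import math


def expected_breadth(n_reads: int, read_len: int, genome_len: int) -> float:
    """Expected fraction of a genome covered by at least one read.

    Idealized Lander-Waterman model: reads are assumed to start at
    uniformly random positions, which real preamplified libraries
    only approximate.
    """
    return 1.0 - math.exp(-n_reads * read_len / genome_len)


HBV_LEN = 3_200    # HBV genome size stated in the text (3.2 kb)
HIV_LEN = 10_000   # approximate HIV genome size stated in the text (~10 kb)
READ_LEN = 101     # read length used for sequencing

# Hypothetical read counts, chosen only to show the trend.
for n in (5, 20, 100):
    hbv = expected_breadth(n, READ_LEN, HBV_LEN)
    hiv = expected_breadth(n, READ_LEN, HIV_LEN)
    print(f"{n:>4} reads: HBV {hbv:.0%} covered, HIV {hiv:.0%} covered")
```

Under this model the same number of viral reads always recovers a larger fraction of the smaller genome, which is consistent with, but does not by itself prove, the explanation offered above.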
When CSF samples from six patients with EV encephalitis were analyzed, the EV genome was detected only in the sample with the highest viral load (458 copies/ml), and only 18% of the enterovirus A genome could be reconstructed. EV was not detected in any of the samples in which the viral load was below 100 copies/ml. Among 15 clinical CSF samples from patients infected with DNA viruses (13 samples with HSV, one with CMV and one with VZV), six were found to contain reads aligning to the HSV genome. In one sample (D14) all 12 HSV reads mapped to the same position on the HSV genome and thus did not meet our initially established criteria for positive pathogen detection. Taking into account the exclusion of the latter sample, metagenomics confirmed etiology in 23.8% of all analyzed cases, a proportion similar to that reported by a tertiary diagnostics center where 29.3% of metagenomic findings matched the results of routine diagnostic tests conducted on CSF, blood samples, throat swabs, stool and tissue biopsy samples [29]. In another study Wilson et al. showed 42% agreement between metagenomics and routine diagnostic tests in patients with CNS infection, but it should be emphasized that their protocol allowed for the detection not only of viruses but also of bacteria, fungi and parasites. Detection of herpesviruses could be negatively affected by DNase treatment. In our previous study we found that while DNase treatment resulted in a more than twofold decrease in the number of host-derived sequences and increased the number of bacterial and other sequences 30-50 times, it reduced the yield of HHV-1 four-fold and markedly lowered gene coverage when plotted against the full-length HHV-1 reference sequence [30]. This sensitivity of HHV-1 to DNase treatment has since been confirmed by others [27] and seems to be due to the fact that in cell-free clinical material the DNA of herpesviruses is largely present in a highly fragmented naked form and not as encapsulated virions [31]. In the study by Hong et al. [32], which did not use the DNase digestion step, metagenomics detected HHV-1 in 5 out of 7 RT-PCR positive patients.
In addition to specific viral targets, multiple reads mapping to various viral genomes, primarily bacteriophages, were present in all analyzed samples, including negative controls. It was previously reported that bacterial DNA contamination of commercial DNA extraction kits and PCR reagents is very common, and this is likely to be an indirect source of bacteriophage genomes [30, 33, 34]. Since similar sets of viral sequences were present in multiple analyzed samples, it is highly likely that they indeed represented such contamination originating from reagents, although some could be the products of amplification errors [35].
In conclusion, while NGS has the advantage of being able to identify a wide range of potential pathogens, its sensitivity in the diagnosis of viral encephalitis is still inferior to standard amplification-based assays.

Methods
Control and patient samples. The sensitivity of our DNA and RNA workflows was evaluated using human immunodeficiency virus type 1 (HIV; viremia 10^6 copies/ml) and hepatitis B virus (HBV; viremia 7 × 10^4 copies/ml) positive sera, which were diluted in CSF from an uninfected patient. Final concentrations were adjusted to contain 10^5, 10^4, 10^3, 500, 100 and 10 viral copies per reaction.
Next, our protocols were tested against a panel of 21 well-defined CSF samples from patients with encephalitis who were part of a large prospective epidemiological study of encephalitis in Poland [2]. Six patients had enteroviral infection, 13 had herpes simplex virus (HSV), one had cytomegalovirus (CMV), and one had varicella zoster virus (VZV). We tested all patients who were CSF-positive by real-time RT-PCR/PCR and in whom an unthawed vial of CSF sample was preserved from the original study. CSF samples were analyzed using in-house quantitative real-time RT-PCR/PCR described previously [37-40].
Nucleic acids extraction. After collection all CSF samples were centrifuged at 1200 rpm for 20 min at 4 °C, aliquoted and kept frozen at −80 °C until analysis. Each 225 µl of CSF supernatant/standard was filtered using a Millex-HV Syringe Filter Unit (Merck KGaA, Germany) with a pore size of 0.45 μm and digested with 2 U of TURBO DNase (Thermo Fisher Scientific, USA) for 30 min.
Next, 250 μl of filtered and digested CSF/standard were subjected to RNA extraction with TRIzol LS (Thermo Fisher Scientific, USA) or DNA extraction using the NucleoSpin Plasma XS kit (Macherey-Nagel, Germany), following manufacturers' protocols. RNA and DNA were eluted in 5 μl and 12 μl of water, respectively.
RNA and DNA preamplification. Since the typical yield of RNA and DNA extraction from CSF was very low and below the limit of detection of the Qubit dsDNA (> 0.2 ng) and RNA (> 5 ng) HS Assays (Thermo Scientific, USA), all samples and all standards underwent preamplification. Five microliters of RNA was first reverse transcribed for 5 min at 65 °C and preamplified by single-primer isothermal amplification (Ribo-SPIA) [41] using the Ovation RNA-Seq V2 system (NuGEN, San Carlos, USA) following the manufacturer's protocol. Preamplification of DNA was done using the SeqPlex Enhanced DNA Amplification protocol (Sigma-Aldrich, USA); 12 μl of extracted DNA was loaded into each reaction, which underwent 29 cycles of amplification. Preamplified cDNA and DNA were subsequently purified using a 0.8 ratio of Agencourt AMPure XP beads (Beckman Coulter, USA) to reaction mixture and finally eluted in 30 μl of water. To assess the ability of the preamplification steps to enrich RNA and DNA input, we spiked a known number of copies of either an RNA virus (represented by HIV) or a DNA virus (represented by HBV) into a negative CSF sample and performed filtration, nuclease digestion and RNA/DNA extraction steps followed by either RNA/DNA preamplification or no preamplification. Real-time PCR revealed that the preamplification step increased the yield of both RNA and DNA viral genomes significantly (Supplementary Information).
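To relate the clinical viral loads (reported in copies/ml) to the per-reaction copy numbers used for the standards, the following minimal sketch converts a load into the approximate number of genome copies contained in one 225 µl CSF input. It assumes that the whole 225 µl aliquot feeds a single reaction and that no template is lost during filtration, DNase digestion or extraction; both are simplifying assumptions, so the figures are upper bounds rather than measured values. The function name is hypothetical; the viral loads are those reported in Results.

```python
def copies_in_input(load_copies_per_ml: float, input_volume_ul: float = 225.0) -> float:
    """Genome copies contained in the CSF input volume.

    Assumes no loss during processing; 225 µl is the CSF volume
    processed per sample in this protocol.
    """
    return load_copies_per_ml * input_volume_ul / 1000.0


# Viral loads (copies/ml) taken from the Results section.
reported_loads = {
    "EV, lowest": 12,
    "EV, median": 47,
    "EV, highest": 458,
    "DNA viruses, lowest": 74,
    "DNA viruses, median": 148,
    "DNA viruses, highest": 344,
}

for label, load in reported_loads.items():
    print(f"{label}: {load} copies/ml is about {copies_in_input(load):.1f} copies per input")
```

Under these assumptions only the EV sample with the highest load reaches roughly 100 copies per input, the lowest HIV dilution that was detected, which is in line with that sample being the only EV case identified by NGS.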
Library preparation and sequencing. Libraries for sequencing were prepared with the Nextera XT Kit (Illumina, USA) using 1 ng of preamplified cDNA/DNA and following the manufacturer's protocol with two minor modifications: the number of amplification cycles was increased from 12 to 14, and the ratio of Agencourt AMPure XP beads (Beckman Coulter, USA) to reaction mixture in the last cleanup step was 0.6. The quality and average length of NGS libraries were assessed using a Bioanalyzer (Agilent Technologies, USA) and DNA HS kit (Agilent Technologies, USA). Next, samples were indexed, pooled and sequenced on Illumina HiSeq (101 nt, paired-end reads).
Next-generation sequencing (NGS) data analysis. Reads generated in NGS were evaluated for their quality using FastQC (Phred quality score above 30) [42]. Adapter removal and trimming were done with Trimmomatic [43]. All filtered reads were first mapped to the human reference sequence (hg19) using Stampy [44], and the remaining unmapped reads were aligned to the NCBI RefSeq viral genomic database (9238 complete viral genomes) by Bowtie2 [45]. All viral reads were sorted and counted with SAMtools and the phyloseq package in R [46, 47]. Visualization of alignments, coverage, and calculations of the percentage of recovered genomes were done using CLC Genomics Workbench (Qiagen, Germany).
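The percentage of recovered genome reported in the tables is a breadth-of-coverage measure. In the study it was computed with CLC Genomics Workbench; the sketch below is only an illustration of the underlying idea, not a reimplementation of that software. It assumes each aligned read is given as a 0-based, half-open (start, end) interval on the reference, merges overlapping intervals, and divides the covered length by the genome length. The function name and the example coordinates are hypothetical.

```python
from typing import Iterable, Tuple


def breadth_of_coverage(alignments: Iterable[Tuple[int, int]], genome_len: int) -> float:
    """Fraction of reference positions covered by at least one read.

    alignments: (start, end) pairs, 0-based and half-open, on one reference.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(alignments):
        # Clip intervals to the reference boundaries.
        start, end = max(start, 0), min(end, genome_len)
        if start >= end:
            continue
        if cur_end is None or start > cur_end:
            # Gap before this read: close the previous merged block.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent read: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_len


# Hypothetical example: three 101 nt reads on a 3,200 nt genome,
# two of them overlapping.
example = [(0, 101), (50, 151), (1000, 1101)]
print(f"{breadth_of_coverage(example, 3200):.1%} of the genome recovered")
```

Reads stacked on the same position add depth but not breadth, which is why a sample can have a non-trivial read count and still a very low percentage of recovered genome.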
The following criteria were applied for positive virus detection by NGS: (1) at least three reads specific for a particular viral species, (2) reads distributed over the whole genome, (3) no reads of that virus present in the negative control samples. The same criteria were previously applied by other groups for NGS identification of viruses [26, 29].
All patients gave written informed consent and all methods were performed in accordance with the relevant guidelines and regulations. The study was approved by the Internal Review Board of Warsaw Medical University.
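The three criteria can be expressed as a small decision function. The sketch below is one possible reading, not the procedure actually used in the study: the text does not define numerically what "distributed over the whole genome" means, so criterion (2) is approximated here by requiring reads to fall into at least two distinct 500 nt windows. The function name, the `window` and `min_windows` parameters and the example coordinates are all assumptions made for illustration.

```python
from typing import Sequence


def is_ngs_positive(
    read_starts: Sequence[int],
    reads_in_negative_controls: int,
    min_reads: int = 3,
    window: int = 500,
    min_windows: int = 2,
) -> bool:
    """Apply the three positivity criteria to one virus in one sample.

    read_starts: reference start coordinate of each read assigned to the virus.
    reads_in_negative_controls: reads of the same virus found in negative controls.
    window, min_windows: ASSUMED stand-in for "distributed over the genome";
    reads must hit at least `min_windows` distinct windows of `window` nt.
    """
    enough_reads = len(read_starts) >= min_reads                 # criterion 1
    windows_hit = {start // window for start in read_starts}
    spread_out = len(windows_hit) >= min_windows                 # criterion 2 (assumed form)
    clean_controls = reads_in_negative_controls == 0             # criterion 3
    return enough_reads and spread_out and clean_controls


# A D14-like case: 12 reads all at one (hypothetical) position, clean controls.
print(is_ngs_positive([5000] * 12, reads_in_negative_controls=0))        # False
# Three reads spread along the genome, clean controls.
print(is_ngs_positive([100, 2500, 9000], reads_in_negative_controls=0))  # True
```

With this reading, a sample such as D14, in which all 12 HSV reads mapped to the same position, passes the read-count criterion but fails the distribution criterion and is therefore called negative.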

Data availability
The datasets generated during the current study are available in the Sequence Read Archive (SRA accession: PRJNA658239).