Single molecule, near full-length genome sequencing of dengue virus

Current methods for dengue virus (DENV) genome amplification amplify the genome in at least five overlapping segments and then combine the outputs to characterize a full genome. This process is laborious and costly, and it requires at least 10 primers per serotype, which increases the likelihood of PCR bias. We introduce an assay that amplifies near full-length dengue virus genomes as intact molecules, sequences these amplicons without fragmentation using third-generation “nanopore” technology, and uses the sequence data to differentiate within-host viral variants with a bioinformatics tool (Nano-Q). The new assay successfully generated near full-length amplicons from DENV serotype 1, 2 and 3 samples, which were then sequenced with nanopore technology. Consensus DENV sequences generated by nanopore sequencing had over 99.5% pairwise sequence similarity to their Illumina-generated counterparts, provided that coverage exceeded 100× on both platforms. Maximum likelihood phylogenetic trees built from nanopore consensus sequences exactly reproduced the trees built from Illumina sequences at a conservative 99% bootstrap threshold (after 1000 replicates and 10% burn-in). Pairwise genetic distances among within-host variants identified with Nano-Q were smaller than those between hosts, thus enabling phylogenetic segregation of variants from the same host.
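The abstract compares nanopore and Illumina consensus sequences by pairwise similarity, and within-host and between-host variants by pairwise genetic distance. The following is a minimal sketch of both quantities for two sequences that have already been aligned to the same length. Skipping gap ('-') and ambiguous ('N') columns, and using the uncorrected p-distance rather than a model-corrected distance, are assumptions made for illustration; the function names are hypothetical and not taken from the authors' pipeline.

```python
def pairwise_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: columns where either sequence has a gap ('-') or an
    ambiguous base ('N') are excluded from the comparison.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # skip gap and ambiguous columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance: proportion of comparable sites that differ."""
    return 1.0 - pairwise_identity(seq_a, seq_b) / 100.0
```

For example, `pairwise_identity("ACGTACGTAC", "ACGTACGTAT")` gives 90.0, because one of ten comparable sites differs. A substitution model (as used for maximum likelihood trees) would give somewhat larger distances than the p-distance at high divergence.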

1) Prepare enough RT reaction mix for the number of reactions plus one. Prepare the mix on ice and do not add template.
2) Pipette 1 µl of primer (see above, 10 µM) and 1 µl of dNTPs (10 mM) into each 200 µl PCR tube.
3) Add 7 µl of extracted RNA to the tubes containing primer and dNTPs (do not add RNA in the pre-PCR room), incubate at 65 °C for 5 min and then place on ice for 1 min.
4) Add 11 µl of RT reaction mix to the RNA/primer mix on ice and cycle as below.
Separate the PCR products on a 0.8% agarose gel, stain with GelRed and visualise under UV light. Size-select the 10 kb band by gel extraction and purify it with Agencourt AMPure XP magnetic beads (Beckman Coulter, A63881) before submitting for sequencing.
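The RT set-up above implies simple volume arithmetic: the mix is prepared for n + 1 reactions, and each tube ends up with primer, dNTPs, RNA and RT mix. The following is a minimal sketch of that arithmetic using only the volumes stated in the protocol. The composition of the RT reaction mix is not given in this section, so only its total volume is computed, and the function names are hypothetical.

```python
# Per-reaction volumes (µl) taken from the protocol steps.
PRIMER_UL = 1.0    # primer, 10 µM
DNTP_UL = 1.0      # dNTPs, 10 mM
RNA_UL = 7.0       # extracted RNA template
RT_MIX_UL = 11.0   # RT reaction mix, added after the 65 °C step


def rt_mix_volume(n_reactions: int) -> float:
    """Total RT reaction mix to prepare: enough for n reactions plus one spare."""
    if n_reactions < 1:
        raise ValueError("need at least one reaction")
    return (n_reactions + 1) * RT_MIX_UL


def final_reaction_volume() -> float:
    """Volume in each tube once all four components have been added."""
    return PRIMER_UL + DNTP_UL + RNA_UL + RT_MIX_UL
```

For eight samples, `rt_mix_volume(8)` returns 99.0 µl of RT mix, and every tube holds 20 µl before cycling.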

Nano-Q tool for within-host variant identification
Nano-Q currently runs from the command line on Linux systems. It has nine user-defined parameters and three optional parameters. The tool and its instructions for use are publicly available at: https://github.com/PrestonLeung/Nano-Q
Nano-Q was run on the dengue samples with the following command:
    python nano-q.py -b ../<example.sorted.bam> -l 10000 -nr 1 -q 5 -j 50 -c 1 -ht 400 -mc 20
-l: length cut-off per read.
-nr: number of references in the alignment (usually one).
-q: base quality score threshold for cleaning reads.
-c: starting codon (in the reference) for eligible reads.
-ht: Hamming distance cut-off; all reads within this distance fall into a single cluster.
-mc: minimum acceptable number of reads per cluster.
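The -l and -q parameters control which reads, and which bases within them, enter the analysis. The following is a hypothetical sketch of such a cleaning step. It assumes that -l discards reads shorter than the cut-off and that -q masks bases below the Phred threshold as 'N'. Neither assumption is confirmed by this text, so consult the Nano-Q repository for the actual behaviour. Reads are represented as plain (sequence, Phred+33 quality string) pairs rather than BAM records.

```python
from typing import Iterable, List, Tuple

# (sequence, FASTQ-style quality string encoded as Phred+33)
Read = Tuple[str, str]


def clean_reads(reads: Iterable[Read], min_length: int, min_quality: int) -> List[str]:
    """Hypothetical analogue of Nano-Q's -l and -q filters.

    Assumptions (not taken from the Nano-Q source):
      * reads shorter than min_length are discarded (-l);
      * bases with Phred quality below min_quality are masked as 'N' (-q).
    """
    cleaned = []
    for seq, qual in reads:
        if len(seq) != len(qual):
            raise ValueError("sequence and quality strings differ in length")
        if len(seq) < min_length:
            continue  # too short to count as a near full-length read
        masked = "".join(
            base if ord(q) - 33 >= min_quality else "N"
            for base, q in zip(seq, qual)
        )
        cleaned.append(masked)
    return cleaned
```

With `clean_reads([("ACGT", "I#II"), ("AC", "II")], min_length=3, min_quality=5)`, the short read is dropped. The base with quality '#' (Phred 2) is masked, giving `["ANGT"]`.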
The values of these parameters were set after a sensitivity analysis on an in silico data set of hepatitis C virus (HCV) plasmids (both between and within host) mixed in known proportions. These experiments, and the development and calibration of Nano-Q, will be published elsewhere.

Safeguards against detecting false variants
There are five safeguards in Nano-Q against detecting false minor variants:
1. Setting the -q parameter high. This ensures that detected single nucleotide polymorphisms are of higher quality and hence more reliable.
2. Increasing the -mc parameter. When similar reads are grouped into clusters to generate a mini consensus, each cluster then contains more reads, which improves reliability.
3. Decreasing the -ht parameter. This ensures that only very similar reads are grouped into a cluster and, together with a high -mc value, further increases the reliability of a true cluster.
4. Decreasing the -l parameter. If amplicons that were not size-selected are sequenced, reducing -l may make more reads available for the analysis (more information for a reliable estimate).
5. Using the optional -d parameter. Nano-Q can plot a dendrogram (-d) that shows how the hierarchical clustering is currently cutting the clusters, which helps in choosing a value for -ht.
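The roles of -ht and -mc in the safeguards above can be illustrated with a small clustering sketch. Reads aligned to the same length are grouped by Hamming distance, clusters smaller than the minimum size are dropped, and each remaining cluster yields a majority-rule mini consensus. The linkage method Nano-Q uses is not stated here, so this sketch assumes single linkage. Cutting a single-linkage dendrogram at height ht gives the same partition as joining every pair of reads within ht and taking connected components, which is what the code does. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length aligned reads."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def cluster_reads(reads: List[str], ht: int, mc: int) -> List[List[str]]:
    """Single-linkage clusters at Hamming cut-off ht, keeping those with >= mc reads."""
    parent = list(range(len(reads)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of reads that lie within the cut-off.
    for i in range(len(reads)):
        for j in range(i + 1, len(reads)):
            if hamming(reads[i], reads[j]) <= ht:
                parent[find(i)] = find(j)

    groups: Dict[int, List[str]] = {}
    for i, read in enumerate(reads):
        groups.setdefault(find(i), []).append(read)
    # Discard clusters too small to give a reliable mini consensus (-mc).
    return [g for g in groups.values() if len(g) >= mc]


def mini_consensus(cluster: List[str]) -> str:
    """Majority base per column; 'N' votes only if no other base is present."""
    consensus = []
    for column in zip(*cluster):
        counts = Counter(b for b in column if b != "N")
        consensus.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(consensus)
```

With reads `["AAAA", "AAAT", "AAAA", "CCCC", "CCCG", "GGGG"]`, `ht=1` and `mc=2`, the sketch keeps two clusters and discards the singleton "GGGG". Lowering `ht` or raising `mc` makes clusters stricter, which mirrors safeguards 2 and 3.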
For each of the above parameters, a default value is built into the tool as a starting point, based on the sensitivity analysis performed with in silico HCV sequence mixes. Setting these parameters more conservatively increases the reliability of major variants but may also cause true minor variants to be lost. When the tool is applied to a new dataset, a sensitivity analysis that varies each parameter is highly recommended.
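A sensitivity analysis of this kind can be organised as a grid of Nano-Q runs around the settings used for the dengue samples. The following is a minimal sketch that only builds the command lines. It uses only the flags shown in the command above, while the value grid and the BAM path in the usage example are hypothetical placeholders. Each command could then be executed with `subprocess.run(cmd, check=True)`, and the resulting variants compared across settings.

```python
import itertools
import shlex
from typing import Dict, List, Sequence

# Settings used for the dengue samples (see the command above).
BASE_ARGS: Dict[str, str] = {
    "-l": "10000", "-nr": "1", "-q": "5", "-j": "50",
    "-c": "1", "-ht": "400", "-mc": "20",
}


def sweep_commands(bam: str, grid: Dict[str, Sequence[str]]) -> List[List[str]]:
    """Build one Nano-Q command per combination of the swept parameter values."""
    names = list(grid)
    commands = []
    for values in itertools.product(*(grid[name] for name in names)):
        args = dict(BASE_ARGS)
        args.update(zip(names, values))  # override only the swept flags
        cmd = ["python", "nano-q.py", "-b", bam]
        for flag, value in args.items():
            cmd += [flag, value]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Hypothetical grid: vary one safeguard parameter at a time or jointly.
    grid = {"-q": ["5", "10"], "-ht": ["200", "400"], "-mc": ["10", "20"]}
    for cmd in sweep_commands("sample.sorted.bam", grid):
        print(shlex.join(cmd))
```

Varying one parameter at a time while holding the others at their defaults keeps the effect of each safeguard interpretable. A joint grid such as the one above shows interactions, for example between -ht and -mc.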

Supplementary Figure 1
Maximum likelihood phylogenetic trees of DENV2 nanopore consensus sequences built from each subgenomic segment, compared with the tree built from the near full-length genome.
Caption: The capsid-prM, envelope and NS5 region trees identify only one of the two true clusters, while the NS3 region tree identifies neither. The NS1 and NS4 trees identify both true clusters but also an additional cluster.
The NS2 region tree identifies only the two true clusters but, unlike the near full-length genome tree, cannot resolve relationships within the larger cluster.
Table S1. .08 × 10^2 – 6.61 × 10^6; 30.01 – 40; 135; 2,780; 2.4 × 10^1 – 3.19 × 10^5
*These data are derived from 299 dengue samples tested with the same protocol, on the same machine, by the same operators. They include samples from the work described in this paper for which full-length genome extraction succeeded or failed, as well as other samples currently awaiting full-genome extraction. Please see the next table for the viral load of each sample described in this paper.
**The mean was calculated after log transformation and back-transformed (inverse log) to give a value in PFU/ml. The standard deviation (SD) and confidence limits (2 SD) were calculated on the log scale, and the upper and lower limits of the log distribution were back-transformed to PFU/ml.
Tables S4-S7. GenBank accession numbers of public database sequences used to design primers for the full-length dengue genome amplification assay.
Table S4. DENV1 reference sequences for primer design
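The Table S1 footnote summarises viral loads on the log scale and then back-transforms the mean and the 2 SD limits to PFU/ml. The following is a minimal sketch of that calculation. Base-10 logarithms and the sample (n - 1) standard deviation are assumptions, since the footnote specifies neither, and the function name is hypothetical. The back-transformed mean is the geometric mean of the viral loads.

```python
import math
import statistics
from typing import Sequence, Tuple


def log_scale_summary(pfu_per_ml: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) in PFU/ml from log10-scale statistics.

    mean  = 10 ** mean(log10 v)          (geometric mean)
    lower = 10 ** (mean_log - 2 * sd_log)
    upper = 10 ** (mean_log + 2 * sd_log)
    Assumes base-10 logs and the sample standard deviation.
    """
    if len(pfu_per_ml) < 2 or any(v <= 0 for v in pfu_per_ml):
        raise ValueError("need at least two positive viral loads")
    logs = [math.log10(v) for v in pfu_per_ml]
    mean_log = statistics.mean(logs)
    sd_log = statistics.stdev(logs)  # SD computed on the log scale
    return (10 ** mean_log,
            10 ** (mean_log - 2 * sd_log),
            10 ** (mean_log + 2 * sd_log))
```

For loads of 10 and 1,000 PFU/ml, the back-transformed mean is 100 PFU/ml rather than the arithmetic 505. The limits are asymmetric on the linear scale but symmetric on the log scale, which is why the footnote computes them as log numbers first.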