Analytical validity of nanopore sequencing for rapid SARS-CoV-2 genome analysis

Viral whole-genome sequencing (WGS) provides critical insight into the transmission and evolution of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). Long-read sequencing devices from Oxford Nanopore Technologies (ONT) promise significant improvements in turnaround time, portability and cost compared to established short-read sequencing platforms for viral WGS (e.g., Illumina). However, adoption of ONT sequencing for SARS-CoV-2 surveillance has been limited by common concerns around sequencing accuracy. To address this, here we perform viral WGS with ONT and Illumina platforms on 157 matched SARS-CoV-2-positive patient specimens and synthetic RNA controls, enabling rigorous evaluation of analytical performance. We report that, despite the elevated error rates observed in ONT sequencing reads, highly accurate consensus-level sequence determination was achieved, with single nucleotide variants (SNVs) detected at >99% sensitivity and >99% precision above a minimum ~60-fold coverage depth, thereby ensuring suitability for SARS-CoV-2 genome analysis. ONT sequencing also identified a surprising diversity of structural variation within SARS-CoV-2 specimens that was supported by evidence from short-read sequencing on matched samples. However, ONT sequencing failed to accurately detect short indels and variants at low read-count frequencies. This systematic evaluation of analytical performance for SARS-CoV-2 WGS will facilitate widespread adoption of ONT sequencing within local, national and international COVID-19 public health initiatives.
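The sensitivity and precision figures quoted above can be illustrated with a minimal sketch. It assumes that SNVs called from Illumina data on the matched specimen serve as the comparison set and that a variant is matched on position, reference base and alternate base; the function name and the example variants are hypothetical and are not taken from the study.

```python
def snv_performance(called, truth):
    """Return (sensitivity, precision) for called SNVs against a comparison set.

    Each variant is a (position, ref, alt) tuple. Matching on this exact
    tuple is an assumption made for illustration.
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # variants found by both
    fp = len(called - truth)   # called, but absent from the comparison set
    fn = len(truth - called)   # in the comparison set, but not called
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, precision


# Hypothetical variant sets for one specimen (illustrative values only).
illumina_snvs = {(241, "C", "T"), (3037, "C", "T"), (14408, "C", "T"), (23403, "A", "G")}
ont_snvs = {(241, "C", "T"), (3037, "C", "T"), (14408, "C", "T"), (11083, "G", "T")}

sens, prec = snv_performance(ont_snvs, illumina_snvs)
print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

In this toy example one comparison-set variant is missed and one extra variant is called, so both metrics fall below 1; in practice the same calculation would be restricted to genome positions that meet the minimum coverage depth.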


Supplementary Figure 1. Performance metrics for ONT and Illumina sequencing of synthetic SARS-CoV-2 controls.
Read lengths (a) and MapQ scores (b) for ONT (gold) and Illumina (blue) sequencing reads. Bars show mean ± 95% CI; n = 8 replicates were analysed. (c) Frequency distributions show Phred base quality scores (root mean square) within ONT and Illumina sequencing alignments. Bases that were mismatched to the reference sequence, indicative of sequencing errors, are shown separately from bases that matched the reference. (d,e) Read-level error rates for ONT (d) and Illumina (e) sequencing reads. Error rates are shown separately for the complete SARS-CoV-2 genome and for low-complexity sites (n = 15, approx. 1% of the genome) where elevated error rates were observed. Bars show mean ± standard deviation across 8 replicates for each technology. (f-i) Scatter plots show the correlation in per-base error frequency profiles between replicates for ONT (f,g) and Illumina (h,i). Substitution and indel errors are plotted separately. Correlation in error frequency profiles between replicates indicates that the profiles are non-random, with local sequence context influencing error rates.
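The replicate comparison in panels f-i can be sketched as a correlation between two per-base error frequency vectors. The caption does not state which correlation measure was used, so the Pearson coefficient here is an assumption, and the frequencies below are hypothetical values chosen for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-base substitution error frequencies at the same five
# genome positions in two replicates (not data from the study).
replicate_1 = [0.010, 0.020, 0.150, 0.010, 0.080]
replicate_2 = [0.012, 0.018, 0.140, 0.011, 0.090]

r = pearson(replicate_1, replicate_2)
print(f"r = {r:.3f}")
```

A coefficient near 1 means that the same positions are error-prone in both replicates, which is the pattern expected when errors depend on local sequence context rather than arising at random.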

Supplementary Figure 2. Erroneous variants detected in ONT sequencing of synthetic SARS-CoV-2 controls.
(a,b) Curves show the number of substitution and indel errors detected in synthetic RNA controls relative to their read-count frequencies. Error bars show mean ± SD. (c,d) Genome browser views of two single-base insertions that were erroneously detected by ONT sequencing (upper panels) in synthetic SARS-CoV-2 controls. Both erroneous insertions were called at error-prone low-complexity sequences and neither was supported by Illumina sequencing alignments (lower panels) at the same regions.

Supplementary Figure 3. False-positive and false-negative variants at error-prone low-complexity sequence.
(a,b) Genome browser views show ONT (upper) and Illumina (lower) sequencing alignments at a T-rich low-complexity site in the orf1ab gene, where false positives and false negatives were found in multiple specimens. In example a, an SNV identified by Illumina reads was erroneously interpreted as a 1 bp deletion by ONT sequencing. In example b, the leftmost SNV was not detected by ONT due to confounding errors at the same position.

Supplementary Figure 4. Examples of large deletions detected by ONT sequencing of SARS-CoV-2 specimens.
(a,b) Genome browser views show ONT (upper) and Illumina (lower) sequencing alignments at the site of deletions detected in orf1ab (a) and ORF6 (b). In both cases, evidence from split short-read alignments supports the candidate variant detected by ONT sequencing. For the deletion in a, breakpoints were accurately resolved by ONT sequencing, whereas the breakpoints identified in b are not precise. (c) Genome browser view shows two identical 328 bp deletions in ORF8 that were identified in multiple independent specimens. (d) Genome browser view shows two highly similar, but non-identical, 28 and 29 bp deletions in S that were identified in multiple independent specimens.

Supplementary Figure 5. Standard curve defining the relationship between SARS-CoV-2 viral titre & measured abundance (Ct).
Synthetic SARS-CoV-2 RNA controls from Twist were supplied in solution at a known RNA concentration (10^6 copies per µL). In order to estimate SARS-CoV-2 RNA concentration in clinical specimens, this stock was serially diluted and analysed by qPCR to generate a standard curve that defines the relationship between measured abundance (Ct value) and concentration (copies/µL) of SARS-CoV-2 RNA.
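A standard curve of this kind is commonly modelled as a straight line relating Ct to log10 concentration, which can then be inverted to estimate the concentration of an unknown specimen. The sketch below assumes that log-linear model and an ordinary least-squares fit; the function names, the ten-fold dilution steps and the Ct values are hypothetical and are not measurements from the study.

```python
import math


def fit_standard_curve(concentrations, ct_values):
    """Least-squares fit of Ct = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_concentration(ct, slope, intercept):
    """Invert the fitted line to obtain copies/µL from a measured Ct."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical ten-fold dilution series starting at 10^6 copies/µL,
# paired with made-up Ct values for illustration.
dilutions = [1e6, 1e5, 1e4, 1e3, 1e2]
cts = [15.0, 18.3, 21.6, 24.9, 28.2]

slope, intercept = fit_standard_curve(dilutions, cts)
specimen_copies = estimate_concentration(21.6, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} copies/µL={specimen_copies:.0f}")
```

Estimates are only meaningful for Ct values within the range spanned by the dilution series, since the fitted line is not validated outside it.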