The Ebola virus disease epidemic in West Africa is the largest on record, responsible for over 28,599 cases and more than 11,299 deaths¹. Genome sequencing in viral outbreaks is desirable to characterize the infectious agent and determine its evolutionary rate. It also allows the identification of signatures of host adaptation, identification and monitoring of diagnostic targets, and characterization of responses to vaccines and treatments. The Ebola virus (EBOV) genome substitution rate in the Makona strain has been estimated at between 0.87 × 10⁻³ and 1.42 × 10⁻³ mutations per site per year. This is equivalent to 16–27 mutations in each genome per year, meaning that sequences diverge rapidly enough to identify distinct sub-lineages during a prolonged epidemic²⁻⁷. Genome sequencing provides a high-resolution view of pathogen evolution and is increasingly sought after for outbreak surveillance. Sequence data may be used to guide control measures, but only if the results are generated quickly enough to inform interventions⁸. Genomic surveillance during the epidemic has been sporadic owing to a lack of local sequencing capacity coupled with practical difficulties transporting samples to remote sequencing facilities⁹. To address this problem, here we devise a genomic surveillance system that uses a novel nanopore DNA sequencing instrument. In April 2015 this system was transported in standard airline luggage to Guinea and used for real-time genomic surveillance of the ongoing epidemic. We present sequence data and analysis of 142 EBOV samples collected during the period March to October 2015. We were able to generate results less than 24 h after receiving an Ebola-positive sample, with the sequencing process taking as little as 15–60 min. We show that real-time genomic surveillance is possible in resource-limited settings and can be established rapidly to monitor outbreaks.
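The conversion from a per-site rate to a per-genome count can be checked with a short calculation. The sketch below is illustrative only: it assumes a genome length of 18,959 nt, which is inferred from the primer coordinates given in the Extended Data Figure 1 legend (position 18578 plus the 381 remaining bases), and the function name is hypothetical.

```python
# Back-of-envelope check of substitutions per genome per year.
# ASSUMPTION: genome length inferred from the Extended Data Figure 1 legend
# (last primer ends at 18578, 381 bases before the end of the genome).
GENOME_LENGTH_NT = 18_578 + 381


def substitutions_per_genome_per_year(rate_per_site_per_year,
                                      genome_length=GENOME_LENGTH_NT):
    """Scale a per-site, per-year rate to the whole genome."""
    return rate_per_site_per_year * genome_length


low = substitutions_per_genome_per_year(0.87e-3)
high = substitutions_per_genome_per_year(1.42e-3)
print(f"{low:.1f} to {high:.1f} substitutions per genome per year")
```

Rounded to whole numbers, the two bounds reproduce the 16–27 range quoted above.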
References
1. World Health Organisation. Ebola Situation Report - 11 November 2015. http://apps.who.int/ebola/current-situation/ebola-situation-report-11-november-2015 (2015)
2. Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak. Science 345, 1369–1372 (2014)
3. Temporal and spatial analysis of the 2014–2015 Ebola virus outbreak in West Africa. Nature 524, 97–101 (2015)
4. Distinct lineages of Ebola virus in Guinea during the 2014 West African epidemic. Nature 524, 102–104 (2015)
5. Ebola virus epidemiology, transmission, and evolution during seven months in Sierra Leone. Cell 161, 1516–1526 (2015)
6. Genetic diversity and evolutionary dynamics of Ebola virus in Sierra Leone. Nature 524, 93–96 (2015)
7. Monitoring of Ebola Virus Makona evolution through establishment of advanced genomic capability in Liberia. Emerg. Infect. Dis. 21, 1135–1143 (2015)
8. Real-time digital pathogen surveillance—the time is now. Genome Biol. 16, 155 (2015)
9. Data sharing: make outbreak research open access. Nature 518, 477–479 (2015)
10. Liberia’s LIBR Genome Center monitors Ebola outbreak, emerging pathogens. https://www.genomeweb.com/sequencing-technology/liberias-libr-genome-center-monitors-ebola-outbreak-emerging-pathogens (2015)
11. A reference bacterial genome dataset generated on the MinION™ portable single-molecule nanopore sequencer. Gigascience 3, 22 (2014)
12. Improved data analysis for the MinION nanopore sequencer. Nature Methods 12, 351–356 (2015)
13. Sequencing ultra-long DNA molecules with the Oxford Nanopore MinION. Preprint at http://biorxiv.org/content/early/2015/05/13/019281 (2015)
14. Rapid draft sequencing and real-time nanopore sequencing in a hospital outbreak of Salmonella. Genome Biol. 16, 114 (2015)
15. Rapid metagenomic identification of viral pathogens in clinical samples by real-time nanopore sequencing analysis. Genome Med. 7, 99 (2015)
16. A complete bacterial genome assembled de novo using only nanopore sequencing data. Nature Methods 12, 733–735 (2015)
17. Recent evolution patterns of Ebola virus obtained by direct sequencing in Sierra Leone. http://virological.org/t/recent-evolution-patterns-of-ebola-virus-obtained-by-direct-sequencing-in-sierra-leone/150 (2015)
18. in Biosensors and Biodetection 504, 441–458 (Humana, 2009)
19. Molecular evidence of sexual transmission of Ebola virus. N. Engl. J. Med. 373, 2448–2454 (2015)
20. Buffer AVL alone does not inactivate Ebola virus in a representative clinical sample type. J. Clin. Microbiol. 53, 3148–3154 (2015)
21. Poretools: a toolkit for analyzing nanopore sequence data. Bioinformatics 30, 3399–3401 (2014)
22. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at http://arxiv.org/abs/1303.3997 (2013)
23. Haplotype-based variant detection from short-read sequencing. Preprint at http://arxiv.org/abs/1207.3907 (2012)
24. Versatile and open software for comparing large genomes. Genome Biol. 5, R12 (2004)
25. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014)
26. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004)
27. FastTree 2–approximately maximum-likelihood trees for large alignments. PLoS One 5, e9490 (2010)
28. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7, 214 (2007)
29. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22, 160–174 (1985)
30. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39, 306–314 (1994)
31. Improving Bayesian population dynamics inference: a coalescent-based model for multiple loci. Mol. Biol. Evol. 30, 713–724 (2013)
32. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4, e88 (2006)
33. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009)
Extended data figures and tables
Extended Data Figures
- Extended Data Figure 1: Primer schemes employed during the study.
We designed PCR primers to generate amplicons that would span the EBOV genome. a, We initially designed 38 primer pairs, which were used in the initial validation study and cover >97% of the EBOV genome. During in-field sequencing we used a 19-reaction scheme or an 11-reaction scheme, both of which generated longer products. The predicted amplicon products are shown with forward and reverse primers indicated by green bars on the forward and reverse strand, respectively, scaled according to the EBOV genome coordinates. b, c, The expected amplicon product sizes are shown for the 19-reaction scheme (b) and the 11-reaction scheme (c). No amplicon covers the extreme 3′ region of the genome: the reverse primer of the last pair, 38_R, ends at position 18578, 381 bases away from the end of the virus genome. The primer diagram was created with Biopython³³.
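As a rough illustration of how genome coverage by a tiled amplicon scheme can be computed, the sketch below merges overlapping amplicon intervals and reports the covered fraction. The amplicon coordinates are made up for the example (only the final end position, 18578, and the 381-base uncovered tail come from the legend above), and the function name is hypothetical; the real primer positions are those shown in the figure.

```python
# ASSUMPTION: genome length inferred from the legend (18578 + 381).
GENOME_LENGTH_NT = 18_578 + 381


def covered_fraction(amplicons, genome_length):
    """Fraction of genome positions covered by at least one amplicon.

    amplicons: iterable of (start, end) pairs, 1-based and inclusive.
    Overlapping intervals are counted once.
    """
    covered = 0
    last_counted = 0  # highest position already counted
    for start, end in sorted(amplicons):
        start = max(start, last_counted + 1)
        end = min(end, genome_length)
        if end >= start:
            covered += end - start + 1
            last_counted = end
    return covered / genome_length


# Hypothetical three-amplicon tiling with overlaps, for illustration only.
toy_amplicons = [(100, 6500), (6300, 12800), (12600, 18578)]
fraction = covered_fraction(toy_amplicons, GENOME_LENGTH_NT)
tail_fraction = 381 / GENOME_LENGTH_NT
print(f"covered: {fraction:.1%}, uncovered 3' tail: {tail_fraction:.1%}")
```

The uncovered 3′ tail alone accounts for about 2% of the genome, which is consistent with a scheme that covers just over 97%.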
- Extended Data Figure 2: List of equipment and consumables to establish the genome surveillance system.
a–c, We show the list of equipment (a), disposable consumables (b) and reagents (c) to establish in-field genomic surveillance. Sufficient reagents were shipped for 20 samples. MinION sequencing requires a mix of chilled and frozen reagents. Recommended shipping conditions are specified. The picture underneath depicts MinION flowcells ready for shipping with insulating material (left) and frozen reagents (right).
- Extended Data Figure 3: Bioinformatics workflow.
This figure summarizes the steps performed during bioinformatics analysis (ordered from top to bottom) to generate consensus sequences. The right column shows an example of the software command executed at each step.
- Extended Data Figure 4: Results of MinION validation.
a, Comparison of four MinION sequences with Illumina sequences generated as part of a previous study³. Each row in the table shows the number of true positives, false positives and false negatives for a sample. False negatives may result in masked sequences, owing to the position being outside the regions covered by the amplicon scheme, having low coverage or falling within a primer binding site. Results before and after quality filtering (log likelihood ratio of >200) are shown. After quality filtering, no false positive calls were detected. All detected false negatives were masked with Ns in the final consensus sequence. No positions were called incorrectly. b, The four consensus sequences, plus an additional sample that had missing coverage in one amplicon, are shown as part of a phylogenetic reconstruction with genomes from Carroll et al.³. Sample labels in red, blue, pink, yellow and blue represent pairs of sequences generated on MinION and Illumina. These fall into identical clusters.
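The counting behind such a validation table can be sketched as a set comparison. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the Illumina calls are treated as the truth set, variants are represented as (position, alternative base) pairs, and the function name and toy data are hypothetical. Only the log likelihood ratio threshold of 200 comes from the legend.

```python
def compare_calls(minion_calls, illumina_calls, min_llr=200.0):
    """Count true positives, false positives and false negatives.

    minion_calls: dict mapping (position, alt_base) -> log likelihood ratio.
    illumina_calls: set of (position, alt_base), treated here as truth.
    Calls with a log likelihood ratio at or below min_llr are discarded.
    """
    kept = {variant for variant, llr in minion_calls.items() if llr > min_llr}
    return {
        "TP": len(kept & illumina_calls),
        "FP": len(kept - illumina_calls),
        "FN": len(illumina_calls - kept),
    }


# Toy data, invented for illustration.
minion = {(800, "T"): 950.0, (1849, "C"): 410.0, (6283, "A"): 35.0}
illumina = {(800, "T"), (1849, "C"), (10218, "G")}

before = compare_calls(minion, illumina, min_llr=0.0)
after = compare_calls(minion, illumina)
print(before, after)
```

In this toy case the low-quality call at position 6283 is a false positive before filtering and disappears after it, while the variant at 10218 remains a false negative that would need to be masked.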
- Extended Data Figure 5: Relationship between coverage and log-likelihood ratio for sample 076769.
Line plot showing the relationship between sequence depth of coverage (x axis) and the log likelihood ratio for detected SNPs, derived by subsampling reads from a single sequencing run to simulate the effect of low coverage. The horizontal and vertical lines indicate the cut-offs for consensus calling (quality and coverage, respectively). All variants are detected even below 25× coverage, and the vast majority meet the quality threshold at 25× coverage or slightly above. Any combination of log likelihood ratio and coverage that placed a variant in the grey box would be represented as a masked position in the final consensus sequence.
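One way to read the masking rule is as a per-position decision on depth and variant quality. The sketch below is an interpretation under stated assumptions rather than the consensus caller used in the study: it uses the 25× coverage cut-off from this legend and the log likelihood ratio threshold of 200 from the Extended Data Figure 4 legend, handles single-base substitutions only, and uses hypothetical names and toy inputs.

```python
MIN_DEPTH = 25     # coverage cut-off from the legend
MIN_LLR = 200.0    # quality cut-off from the Extended Data Figure 4 legend


def call_consensus(reference, depth, variants,
                   min_depth=MIN_DEPTH, min_llr=MIN_LLR):
    """Build a consensus string, masking unreliable positions with 'N'.

    reference: reference sequence as a string.
    depth: per-position read depth, same length as reference.
    variants: dict mapping 0-based position -> (alt_base, log likelihood ratio).
    """
    consensus = []
    for pos, ref_base in enumerate(reference):
        if depth[pos] < min_depth:
            consensus.append("N")          # too little coverage to call
        elif pos in variants:
            alt_base, llr = variants[pos]
            # Accept the variant only if it clears the quality threshold.
            consensus.append(alt_base if llr > min_llr else "N")
        else:
            consensus.append(ref_base)
    return "".join(consensus)


# Toy inputs, invented for illustration.
reference = "ACGTACGT"
depth = [30, 30, 10, 30, 30, 30, 30, 30]
variants = {1: ("T", 350.0), 5: ("A", 120.0)}
consensus = call_consensus(reference, depth, variants)
print(consensus)  # ATNTANGT
```

Masking rather than reverting to the reference base keeps uncertain positions from being mistaken for confirmed reference calls in downstream phylogenetic analysis.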
- Extended Data Figure 6: Duration of MinION sequencing runs.
The sequencing duration of each run is shown, measured as the difference between the timestamp of the first read seen and that of the last read transferred for analysis. 127 runs are shown; 15 outliers with a duration greater than 200 min are excluded.
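The duration measure described here can be expressed in a few lines. The sketch below is illustrative: the read timestamps are invented, the function names are hypothetical, and only the definition (last read minus first read) and the 200 min cut-off come from the legend.

```python
from datetime import datetime


def run_duration_minutes(read_timestamps):
    """Minutes between the first read seen and the last read transferred."""
    elapsed = max(read_timestamps) - min(read_timestamps)
    return elapsed.total_seconds() / 60.0


def split_outliers(durations, cutoff_min=200.0):
    """Separate runs to plot from outliers longer than the cut-off."""
    shown = [d for d in durations if d <= cutoff_min]
    excluded = [d for d in durations if d > cutoff_min]
    return shown, excluded


# Toy timestamps for one run, invented for illustration.
reads = [
    datetime(2015, 4, 1, 10, 0, 0),
    datetime(2015, 4, 1, 10, 12, 30),
    datetime(2015, 4, 1, 10, 47, 0),
]
duration = run_duration_minutes(reads)
shown, excluded = split_outliers([duration, 15.0, 260.0])
print(duration, shown, excluded)
```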
- Extended Data Figure 7: Histogram of Ct values for study samples.
Ct values for samples in the study (where information was available) ranged between 13.8 and 35.7, with a mean of 22.
- Extended Data Figure 8: Sequence accuracy for samples.
a, b, Accuracy measurements for the entire set of two-direction reads were made for the validation samples, sequenced in the United Kingdom (a), and for each of the 142 samples from real-time genomic surveillance (b). Accuracy is defined as in Quick et al.¹¹. Vertical dashed lines indicate the mean accuracy for the sample.
- Extended Data Figure 9: Maximum likelihood phylogenetic inference of 125 Ebola virus samples from this study with 603 previously published sequences.
Coloured nodes are from this study. Node shape reflects country of origin. a–c, The entire data set is shown (a), with zoomed regions focusing on lineages GN1 (b) and SL3 (c) identified during real-time sequencing. Map figure adapted from the SimpleMaps website (http://simplemaps.com/resources/svg-gn).
- Extended Data Figure 10: Root-to-tip divergence plot and mean evolutionary rate estimate.
a, Root-to-tip divergence plot for the 728 Ebola samples generated through maximum likelihood analysis. Samples from real-time genomic surveillance are coloured as per Fig. 3 and Extended Data Fig. 9. b, Mean evolutionary rate estimate (in substitutions per site per year) across the EBOV phylogeny recovered using BEAST under a relaxed lognormal molecular clock. Blue area corresponds to the 95% highest posterior density (HPD) (mean of the distribution is 1.19 × 10⁻³, 95% HPD: 1.09 × 10⁻³ to 1.29 × 10⁻³ substitutions per site per year). Hatched regions in red are outside the 95% HPD intervals.
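A root-to-tip plot is commonly summarized by regressing divergence on sampling date, with the slope giving a crude rate estimate and the x-intercept a rough date for the root. The sketch below shows that ordinary least-squares fit on synthetic data; it is not the BEAST relaxed-clock analysis described in panel b, the data points are generated for illustration (the slope is set to the reported mean rate and the root date of 2014.0 is arbitrary), and the function name is hypothetical.

```python
def root_to_tip_regression(dates, divergences):
    """Least-squares fit of root-to-tip divergence against sampling date.

    dates: sampling dates as decimal years.
    divergences: root-to-tip distances in substitutions per site.
    Returns (slope, intercept, x_intercept); the slope is a rate in
    substitutions per site per year and the x-intercept a root date.
    """
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(divergences) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, divergences))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, -intercept / slope


# Synthetic, noise-free points on a line; NOT data from the study.
TOY_RATE = 1.19e-3      # reported mean rate, used only to build toy data
TOY_ROOT_DATE = 2014.0  # arbitrary value chosen for the example
toy_dates = [2014.3, 2014.8, 2015.2, 2015.7]
toy_divergences = [TOY_RATE * (d - TOY_ROOT_DATE) for d in toy_dates]

rate, intercept, root_date = root_to_tip_regression(toy_dates, toy_divergences)
print(f"rate = {rate:.2e} substitutions per site per year")
```

Real data scatter around the line, and the regression ignores shared ancestry between tips, which is one reason a Bayesian estimate with credible intervals is reported alongside the plot.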
- Supplementary Information
This file contains a Field Guide to Nanopore Sequencing (a detailed discussion of logistical issues that arose during this project) and Supplementary Tables 1–4.