Improved data analysis for the MinION nanopore sequencer

Abstract

Speed, single-base sensitivity and long read lengths make nanopores a promising technology for high-throughput sequencing. We evaluated and optimized the performance of the MinION nanopore sequencer using M13 genomic DNA and used expectation maximization to obtain robust maximum-likelihood estimates for insertion, deletion and substitution error rates (4.9%, 7.8% and 5.1%, respectively). Over 99% of high-quality 2D MinION reads mapped to the reference at a mean identity of 85%. We present a single-nucleotide-variant detection tool that uses maximum-likelihood parameter estimates and marginalization over many possible read alignments to achieve precision and recall of up to 99%. By pairing our high-confidence alignment strategy with long MinION reads, we resolved the copy number for a cancer-testis gene family (CT47) within an unresolved region of human chromosome Xq24.
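
The SNV caller described above marginalizes over many possible read alignments instead of trusting a single best alignment. The Python sketch below illustrates that idea at one reference position. It is not the authors' marginAlign code. It assumes each read has already been reduced to hypothetical posterior probabilities that the position aligns to a given read base (for example, from a pair-HMM forward-backward pass). It also assumes a substitution matrix such as the EM estimates is available; the matrix values used here are made up.

```python
import math

BASES = "ACGT"


def snv_posterior(read_columns, sub_matrix, prior=None):
    """Posterior over the true base at one reference position.

    read_columns: one list per read of (read_base, p_aligned) pairs, where
        p_aligned is the (hypothetical) posterior probability that this
        reference position is aligned to that read base. Leftover mass is
        treated as alignment to a gap.
    sub_matrix: sub_matrix[true][obs] = P(observed read base | true base).
    """
    prior = prior or {b: 0.25 for b in BASES}
    log_post = {b: math.log(prior[b]) for b in BASES}
    for column in read_columns:
        p_gap = max(0.0, 1.0 - sum(p for _, p in column))
        for true_base in BASES:
            # Marginalize over alignments: weight each possible aligned read
            # base by its alignment posterior. A gap says nothing about the
            # base, so it contributes a base-independent background term.
            like = p_gap * 0.25 + sum(
                p * sub_matrix[true_base][obs] for obs, p in column
            )
            log_post[true_base] += math.log(like)
    # Normalize in log space to avoid underflow when there are many reads.
    top = max(log_post.values())
    weights = {b: math.exp(v - top) for b, v in log_post.items()}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}


# Made-up substitution matrix: 90% correct calls, errors spread evenly.
sub = {t: {o: (0.9 if t == o else 0.1 / 3) for o in BASES} for t in BASES}
reads = [[("T", 0.95)]] * 4 + [[("C", 0.6), ("T", 0.3)]]
post = snv_posterior(reads, sub)
best = max(post, key=post.get)
print(best, round(post[best], 3))
```

A variant would then be reported when a non-reference base's posterior exceeds some threshold. That threshold is a tuning choice in this sketch, not a value taken from the paper.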

Figure 1: Molecular events and ionic-current trace for a 2D read of an M13 phage dsDNA molecule.
Figure 2: Read-length distributions and identity plots for M13.
Figure 3: Maximum-likelihood alignment parameters derived using expectation maximization (EM).
Figure 4: M13 sequencing depth.
Figure 5: Exploring SNV calling with MinION reads.
Figure 6: Resolution of CT47 repeat copy-number estimate on human chromosome Xq24.

Accession codes

Primary accessions

European Nucleotide Archive

Acknowledgements

Research reported in this publication was supported by the National Human Genome Research Institute of the US National Institutes of Health under award numbers HG006321 (M.A.), HG007827 (M.A.) and U54HG007990 (B.P.). The authors thank Oxford Nanopore Technologies for their gift to the UCSC Nanopore Group. The authors also thank D. Deamer for support, reading of the manuscript and helpful discussion. The authors gratefully acknowledge D. Haussler and J. Kent for their support.

Author information

Contributions

M.A. conceived experiments and directed research. B.P. conceived and directed bioinformatics analysis. B.P., M.J., I.T.F. and K.H.M. were responsible for bioinformatics analysis and software development. M.J. and H.E.O. completed the sequencing experiments, processed the data and prepared DNA sequencing standards. H.E.O. was responsible for Sanger sequencing of M13 dsDNA. B.P. and I.T.F. were responsible for k-mer and BLAST analysis. B.P. and M.J. were responsible for SNV analysis. B.P. developed and implemented expectation-maximization and realignment strategies. K.H.M. conceived and directed BAC experiments and data analysis. All authors contributed to the writing, editing and completion of the manuscript.

Corresponding authors

Correspondence to Benedict Paten or Mark Akeson.

Ethics declarations

Competing interests

M.A. is a consultant to Oxford Nanopore Technologies.

Integrated supplementary information

Supplementary Figure 1 MinION technology progression.

Progression of read identity distributions with MinION versions since June 2014.

Supplementary Figure 2 Venn diagram representing read mappability for MinION reads across three replicate M13 experiments using R7.3 chemistry.

Mappability represents the proportion of reads that can be aligned to either the M13 genome or the phage λ DNA control using the tuned parameters for each mapper. In our analysis, 2D reads had the highest mappability (99% of reads mappable), followed by complement (98%) and template (95%) reads. Among the four aligners used, LAST and LASTZ performed best for M13, and LAST alone captured the greatest proportion of mappable reads.
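
The bookkeeping behind such a Venn diagram can be sketched as set operations over read IDs. This minimal Python sketch assumes each aligner's output has already been reduced to a set of mapped read IDs, and that a read counts as mappable if any aligner maps it. The read IDs and the third aligner label below are hypothetical.

```python
def mappability_summary(total_reads, mapped_by_aligner):
    """Summarize read mappability across aligners.

    mapped_by_aligner: dict of aligner name -> set of IDs of reads it mapped.
    Returns the fraction of reads mapped by at least one aligner and, per
    aligner, how many reads only that aligner mapped.
    """
    union = set().union(*mapped_by_aligner.values())
    only = {}
    for name, ids in mapped_by_aligner.items():
        others = set().union(
            *(s for other, s in mapped_by_aligner.items() if other != name)
        )
        only[name] = len(ids - others)
    return len(union) / total_reads, only


frac, only = mappability_summary(
    10, {"LAST": {1, 2, 3, 4}, "LASTZ": {3, 4, 5}, "other": {1}}
)
print(frac, only)
```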

Supplementary Figure 3 Venn diagram representing read mappability to a reversed reference for MinION reads from three replicate M13 experiments using R7.3 chemistry.

Because the reference was reversed, essentially no reads should map to it; the fraction of reads that do map therefore serves as a proxy measure of specificity. Results were obtained using the tuned alignment parameters.
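
The key detail is that the control reference is reversed, not reverse-complemented. The reverse complement is simply the opposite strand, and reads would map to it normally. A small Python sketch of how such a control could be built and scored; the function names are illustrative.

```python
def reversed_reference(seq):
    """Reverse a reference without complementing it.

    Plain reversal keeps the base composition but generally destroys
    homology with the original, so reads that map to it are likely
    false positives.
    """
    return seq[::-1]


def false_mapping_rate(n_mapped_to_reversed, n_reads):
    """Fraction of reads mapped to the reversed reference (lower is better)."""
    return n_mapped_to_reversed / n_reads


decoy = reversed_reference("AACGT")
print(decoy, false_mapping_rate(3, 1000))
```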

Supplementary Figure 4 Structure for the hidden Markov model (HMM) used for expectation maximization (EM).

Structure of the HMM used for EM, with the estimated transition probabilities for template, complement and 2D reads. For each transition, in order, the mean estimate and standard error across all experiments for that read type are shown.
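
As a rough illustration only, the sketch below computes the forward log-likelihood of a read against a reference under a simplified three-state pair HMM (match, insertion, deletion). The state layout, the parameter values and the absence of explicit start and end states are assumptions of this sketch, not the model shown in the figure. In Baum-Welch EM, forward values like these are combined with an analogous backward pass to obtain expected transition and emission counts, which are re-normalized into the next iteration's parameters.

```python
import math


def pair_hmm_forward(read, ref, delta=0.05, epsilon=0.3, match=0.85):
    """Forward log-likelihood of read vs. ref under a 3-state pair HMM.

    States: M (read base aligned to a reference base), I (read base only),
    D (reference base only). Parameter values are illustrative guesses.
    Plain probabilities are used for clarity; long reads need log space.
    """
    n, m = len(read), len(ref)
    t = {"MM": 1 - 2 * delta, "MI": delta, "MD": delta,
         "IM": 1 - epsilon, "II": epsilon,
         "DM": 1 - epsilon, "DD": epsilon}
    q = 0.25  # gap states emit bases at background frequency

    def emit(a, b):
        # 16 base pairs: identities share `match`, the 12 mismatches the rest
        return match / 4 if a == b else (1 - match) / 12

    f_m = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_i = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_d = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_m[0][0] = 1.0  # begin in M before emitting anything
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                f_m[i][j] = emit(read[i - 1], ref[j - 1]) * (
                    t["MM"] * f_m[i - 1][j - 1]
                    + t["IM"] * f_i[i - 1][j - 1]
                    + t["DM"] * f_d[i - 1][j - 1])
            if i > 0:
                f_i[i][j] = q * (t["MI"] * f_m[i - 1][j] + t["II"] * f_i[i - 1][j])
            if j > 0:
                f_d[i][j] = q * (t["MD"] * f_m[i][j - 1] + t["DD"] * f_d[i][j - 1])
    return math.log(f_m[n][m] + f_i[n][m] + f_d[n][m])


print(pair_hmm_forward("ACGTTGCA", "ACGTTGCA"))
print(pair_hmm_forward("ACGTTGCA", "ACGATGCA"))
```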

Supplementary Figure 5 Convergence of log-likelihood ratios achieved using expectation maximization.

Log-likelihood convergence for three independent runs of expectation maximization, each initialized from a randomly parameterized model and trained for 100 iterations. The y-axis gives the likelihood normalized by the highest log-likelihood found. Training used 2D reads from one MinION run of the M13 data with R7.3 chemistry and a guide alignment generated by tuned LAST.

Supplementary Figure 6 Frequency plots for insertions and deletions in MinION read alignments.

Representative insertion and deletion frequency plots, fitted with an exponential distribution, for reads from one M13 experiment using R7.3 chemistry and aligned using expectation maximization–trained LAST.
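
Because indel lengths are whole numbers, a simple discrete stand-in for the exponential fit is the geometric distribution, its discrete analogue. For both, the maximum-likelihood estimate is the reciprocal of the mean length (for the exponential, the rate estimate is 1/mean). The geometric choice and the example lengths below are assumptions of this sketch, not the paper's exact fitting procedure.

```python
def fit_geometric(lengths):
    """Maximum-likelihood p for P(k) = (1 - p)**(k - 1) * p, k = 1, 2, ...

    The MLE is 1 / mean length, the same form as the exponential-rate MLE.
    """
    if not lengths or min(lengths) < 1:
        raise ValueError("need at least one indel length, all >= 1")
    return len(lengths) / sum(lengths)


def expected_counts(lengths, k_max):
    """Expected number of indels of length 1..k_max under the fitted model."""
    p = fit_geometric(lengths)
    return [len(lengths) * (1 - p) ** (k - 1) * p for k in range(1, k_max + 1)]


# Hypothetical insertion lengths from one set of alignments.
insertions = [1, 1, 1, 2, 1, 3, 1, 2, 1, 1]
print(fit_geometric(insertions), expected_counts(insertions, 4))
```

Comparing the expected counts with the observed length histogram gives the fitted curve overlaid on a frequency plot.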

Supplementary Figure 7 Substitution matrices from alignments using expectation maximization–trained model.

Maximum-likelihood estimates and standard errors for the substitution matrices show trends across template, complement and 2D reads in three M13 experiments using R7.3 chemistry. The top row shows the average maximum-likelihood estimate for each substitution, and the lower row shows the corresponding standard error. For all aligners, thymine-to-adenine and adenine-to-thymine substitution rates were low, indicating that the device rarely miscalled one as the other. The color scale is logarithmic, whereas the substitution values are shown on an absolute scale.

Supplementary Figure 8 Substitution matrices from alignments using tuned parameters.

Substitution matrices for each of the four tuned aligners across three M13 experiments using R7.3 chemistry. For all aligners, thymine-to-adenine and adenine-to-thymine substitution rates were low, indicating that the device rarely miscalled one as the other. The color scale is logarithmic, whereas the substitution values are shown on an absolute scale.
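
For alignments produced with fixed, tuned parameters, a substitution matrix can be tallied directly from the aligned columns. The Python sketch below does this for a pair of gapped alignment strings (a hypothetical input format). The EM-trained matrices of Supplementary Figure 7 are instead estimated from expected counts over many possible alignments, which this simple count does not reproduce.

```python
from collections import Counter

BASES = "ACGT"


def aligned_pairs(ref_aln, read_aln):
    """(reference base, read base) for gap-free columns of a gapped alignment."""
    return [(r, q) for r, q in zip(ref_aln, read_aln) if r != "-" and q != "-"]


def substitution_matrix(pairs, pseudocount=0.0):
    """Row-normalized estimate of P(read base | reference base)."""
    counts = Counter(
        (r.upper(), q.upper()) for r, q in pairs
        if r.upper() in BASES and q.upper() in BASES
    )
    matrix = {}
    for r in BASES:
        row = {q: counts[(r, q)] + pseudocount for q in BASES}
        total = sum(row.values())
        # Rows with no observations (and no pseudocount) are left as zeros.
        matrix[r] = {q: (v / total if total else 0.0) for q, v in row.items()}
    return matrix


m = substitution_matrix(aligned_pairs("AAG-T", "AGGCT"))
print(m["A"])
```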

Supplementary Figure 9 Realignment improves read identity.

Read identity for template, complement and 2D reads for three M13 replicate experiments using R7.3 chemistry, aligned using LAST. Three versions of the LAST alignment are shown: tuned LAST, trained LAST realignments and naive LAST realignments.
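
Read identity can be defined in several ways. As one common definition, assumed here, the sketch divides matches by all alignment columns, including insertions and deletions, using a SAM-style extended CIGAR string. The paper's own definition may differ, for example by excluding indels from the denominator.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def read_identity(cigar):
    """Identity = matches / (matches + mismatches + inserted + deleted bases).

    Requires extended CIGAR ('=' and 'X'); a plain 'M' cannot separate
    matches from mismatches. Clipped and skipped bases are ignored.
    """
    totals = {}
    for length, op in CIGAR_OP.findall(cigar):
        totals[op] = totals.get(op, 0) + int(length)
    if totals.get("M"):
        raise ValueError("use =/X operations instead of M")
    matches = totals.get("=", 0)
    columns = matches + totals.get("X", 0) + totals.get("I", 0) + totals.get("D", 0)
    return matches / columns if columns else 0.0


print(read_identity("8=1X1I"), read_identity("5S5=2D3="))
```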

Supplementary Figure 10 An alignment quality measurement for 2D reads across three M13 replicate experiments.

Alignments were obtained using expectation maximization–trained LAST realignments. The two density clusters correspond to M13 reads and reads from the phage λ DNA control.

Supplementary Figure 11 Error profiles for 2D reads after realigning using expectation maximization–trained model.

Error profile analysis of 2D reads aligned using expectation maximization–trained LAST realignments indicates a moderate correlation between mismatches and indels per aligned base, and a weak correlation between insertions per aligned base and deletions per aligned base.
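
The "moderate" and "weak" correlations above refer to a correlation coefficient computed over reads. This sketch assumes Pearson's r with one point per read, and takes "aligned bases" to mean matches plus mismatches; the caption specifies neither, and the per-read counts below are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def per_aligned_base_rates(read):
    """Mismatch, insertion and deletion rates per aligned base for one read."""
    aligned = read["matches"] + read["mismatches"]
    return (read["mismatches"] / aligned,
            read["insertions"] / aligned,
            read["deletions"] / aligned)


reads = [  # hypothetical per-read alignment counts
    {"matches": 900, "mismatches": 50, "insertions": 40, "deletions": 70},
    {"matches": 850, "mismatches": 80, "insertions": 45, "deletions": 95},
    {"matches": 920, "mismatches": 30, "insertions": 35, "deletions": 50},
]
rates = [per_aligned_base_rates(r) for r in reads]
mismatch_rate = [mis for mis, _, _ in rates]
indel_rate = [ins + dele for _, ins, dele in rates]
print(pearson(mismatch_rate, indel_rate))
```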

Supplementary Figure 12 The coverage and percentage of GC across the M13 genome.

(a–c) Coverage, smoothed by binning over a sliding 5-bp window, matching the k-mer length used in base calling. The GC content was calculated by binning over a 50-bp sliding window. Halving and doubling this window size did not drastically alter the result. (d–f) Coverage histograms across three M13 replicate experiments using R7.3 chemistry and aligned using expectation maximization–trained LAST realignments. About 2.1%, 2.0% and 2.6% of the M13 genome was underrepresented in template, complement and 2D reads, respectively.
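
Both tracks in panels a–c are sliding-window averages. This Python sketch assumes a step of one base and a per-base coverage list as input. The `min_depth` cutoff for "underrepresented" is a hypothetical parameter, since the caption does not define it.

```python
def sliding_mean(values, window):
    """Mean of each length-`window` run of values, sliding by one position."""
    if window <= 0 or len(values) < window:
        return []
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # add the new item, drop the oldest
        means.append(total / window)
    return means


def gc_track(seq, window=50):
    """Fraction of G/C bases in each sliding window of the sequence."""
    return sliding_mean([1 if b in "GCgc" else 0 for b in seq], window)


def underrepresented_fraction(coverage, min_depth):
    """Fraction of positions whose depth falls below a chosen cutoff."""
    return sum(1 for c in coverage if c < min_depth) / len(coverage)


print(gc_track("GGCCAATT", 4), sliding_mean([3, 5, 4, 6, 2, 8], 5))
```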

Supplementary Figure 13 MinION data for full-length (48-kb) λ phage dsDNA.

MinION data for a 2D read of a full-length λ phage dsDNA molecule. (a) Molecular events for translocation of a single 48-kb λ dsDNA molecule through the MinION nanopore sequencer. DNA length and conformation are simplified for purposes of illustration. (i) Open channel. (ii) dsDNA with ligated loading (blue and brown) and hairpin adaptors (red) captured by the nanopore with the aid of a membrane anchor and an applied voltage across the membrane. (iii) Translocation of the 5′ end of the loading adaptor through the nanopore under control of a molecular motor and driven by the applied potential across the membrane; DNA translocation through the nanopore begins. (iv) Translocation of the template strand of DNA (gold). (v) Translocation of the hairpin adaptor (red). (vi) Translocation of the complement strand (blue). (vii) Translocation of the 3′ portion of the loading adaptor. (viii) Return to open-channel nanopore. (b) Raw current trace for the entire passage of the DNA construct through the nanopore (approximately 2,789 s). Regions of the ionic current trace corresponding to steps i–viii are labeled. (c) Expanded 1-s time scale of raw current traces for DNA capture and translocation of 5′ loading adaptors (i–iii), template strand (iv), hairpin adaptor (v), complement strand (vi), 3′ loading adaptor and return to open channel (vii–viii). Each adaptor generates a unique signal used for position reference in base determination. The FASTA sequence is available at http://figshare.com/articles/UCSC_Full_Length_Lambda_2D_Read/1209636.

Supplementary Figure 14 Pulse-field gel electrophoresis of RP11-482A22 BAC DNA to determine insert length.

The span of BAC end sequences relative to the GRCh38 reference assembly provides estimates of 57 kb to the right of the repeats and 76 kb to the left of the repeats (depicted in black). To determine the length of the repeats, we performed NotI and AatII digests on RP11-482 DNA. The NotI digest isolates the insert DNA in its entirety from the cloning vector, pBACe3.6, providing evidence for a cloned insert in the range of 170–175 kb (blue) and an 11.6-kb cloning vector band (red). After subtraction of the known flanking region sizes, this estimate gives a repeat region of 36.7–41.7 kb, or 7.5–8.5 copies of the CT47 repeat. The AatII digest was expected to cut the BAC three times, as illustrated in the schematic, yielding three fragments: (a) a 108-kb fragment including the upstream flanking region (50 kb), the downstream flanking region (46 kb) and the cloning vector (11.6 kb), shown in purple; (b) a 23-kb region directly downstream from the repeat array (blue); and (c) a region observed by PFGE to be 50 kb that spans the CT47 repeat cluster (providing evidence for a 37-kb repeat region after subtraction of 12 kb of known flanking sequence, marked with gray shading). Regions providing evidence for repeat copy number are highlighted in yellow.
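
The copy-number range is the repeat-region length divided by the length of one CT47 repeat unit. The caption does not state the unit length. About 4.9 kb is assumed here because that is what the caption's own pairs imply (36.7 kb for 7.5 copies, 41.7 kb for 8.5 copies).

```python
def copy_number(repeat_kb, unit_kb):
    """Number of tandem-repeat units that fit in a repeat region."""
    return repeat_kb / unit_kb


UNIT_KB = 4.9  # assumed CT47 unit length, inferred from the caption's ratios
low, high = copy_number(36.7, UNIT_KB), copy_number(41.7, UNIT_KB)
print(f"{low:.1f}-{high:.1f} copies")  # prints "7.5-8.5 copies"
```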

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–14, Supplementary Tables 1, 2, 4–6 and Supplementary Notes 1–4 (PDF 13928 kb)

Supplementary Table 3

BLAST hits of unmapped reads (XLSX 11 kb)

Supplementary Data

MinION long read CT47-repeat characterization (XLSX 56 kb)

Supplementary Software 1

nanopore pipeline (ZIP 35023 kb)

Supplementary Software 2

marginAlign pipeline (ZIP 109 kb)

About this article

Cite this article

Jain, M., Fiddes, I., Miga, K. et al. Improved data analysis for the MinION nanopore sequencer. Nat Methods 12, 351–356 (2015). https://doi.org/10.1038/nmeth.3290

  • DOI: https://doi.org/10.1038/nmeth.3290
