De novo sequencing and variant calling with nanopores using PoreSeq

Szalay, Tamas; Golovchenko, Jene A

doi:10.1038/nbt.3360

Letter
Published: 09 September 2015

Nature Biotechnology volume 33, pages 1087–1091 (2015)

Abstract

The accuracy of sequencing single DNA molecules with nanopores is continually improving, but de novo genome sequencing and assembly using only nanopore data remain challenging. Here we describe PoreSeq, an algorithm that identifies and corrects errors in nanopore sequencing data and improves the accuracy of de novo genome assembly with increasing coverage depth. The approach relies on modeling the possible sources of uncertainty that occur as DNA transits through the nanopore and finds the sequence that best explains multiple reads of the same region. PoreSeq increases nanopore sequencing read accuracy of M13 bacteriophage DNA from 85% to 99% at 100× coverage. We also use the algorithm to assemble Escherichia coli with 30× coverage and the λ genome at a range of coverages from 3× to 50×. Additionally, we classify sequence variants at an order of magnitude lower coverage than is possible with existing methods.
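The central idea of the abstract, searching for the sequence that best explains many reads of the same region, can be illustrated with a toy greedy polisher. This is a minimal sketch under stated assumptions, not PoreSeq itself: it replaces PoreSeq's model of the nanopore signal with plain edit distance to base-called reads, and the names `edit_distance`, `single_edits` and `polish` are hypothetical. Starting from one read, it repeatedly applies the single-base substitution, insertion or deletion that most reduces the total distance to all reads, and stops when no single edit helps.

```python
ALPHABET = "ACGT"


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,               # delete ca
                         cur[j - 1] + 1,            # insert cb
                         prev[j - 1] + (ca != cb))  # match or substitute
        prev = cur
    return prev[-1]


def total_cost(candidate: str, reads: list[str]) -> int:
    """Stand-in for a negative log-likelihood: summed distance to every read."""
    return sum(edit_distance(candidate, r) for r in reads)


def single_edits(seq: str):
    """Yield every sequence one substitution, insertion or deletion away."""
    for i in range(len(seq) + 1):
        for base in ALPHABET:
            yield seq[:i] + base + seq[i:]              # insertion
        if i < len(seq):
            yield seq[:i] + seq[i + 1:]                 # deletion
            for base in ALPHABET:
                if base != seq[i]:
                    yield seq[:i] + base + seq[i + 1:]  # substitution


def polish(reads: list[str], start: str) -> str:
    """Greedy steepest-descent search over single-base mutations."""
    current, cost = start, total_cost(start, reads)
    while True:
        best = min(single_edits(current),
                   key=lambda s: total_cost(s, reads), default=current)
        best_cost = total_cost(best, reads)
        if best_cost >= cost:  # no single edit improves the fit
            return current
        current, cost = best, best_cost


print(polish(["ACGT", "ACGT", "AGGT"], start="AGGT"))
```

Starting from the erroneous read `AGGT`, the only sequence with a lower total cost is `ACGT`, so the search moves there and stops. A greedy search like this can stall in local optima; more reads make the objective better behaved, which loosely mirrors the coverage dependence described above.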

**Figure 1: Nanopore sequencing fundamentals.**

Accession codes

Accessions

European Nucleotide Archive

ERP007108

Acknowledgements

We would like to thank E. Brandin for molecule preparation, D. Branton for obtaining MinION sequencers, S. Fleming for helpful algorithmic discussions and Figure 1a, and A. Kuan and M. Burns for feedback on this manuscript. The computations in this paper were run on the Odyssey cluster supported by the Faculty of Arts and Sciences Division of Science, Research Computing Group at Harvard University, and the work was supported by the National Institutes of Health Award no. R01HG003703 to J.A. Golovchenko and D. Branton.

Author information

Authors and Affiliations

School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, USA
Tamas Szalay & Jene A Golovchenko
Department of Physics, Harvard University, Cambridge, Massachusetts, USA
Jene A Golovchenko

Contributions

T.S.: algorithm development, data analysis and interpretation, writing of the manuscript; J.A.G.: data analysis and interpretation, writing of the manuscript.

Corresponding author

Correspondence to Jene A Golovchenko.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Allowed 5-mer transitions

An illustration of all of the 5-mer transitions from a single state (GCTAT), including normal steps (black), skips (red) and stays (green).
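The transition structure in this figure can be enumerated directly. This is a minimal sketch assuming that a normal step advances the 5-mer by one base (4 successors), a skip advances it by exactly two bases (16 successors) and a stay keeps the same 5-mer; the function name `transitions` is hypothetical, and PoreSeq's model may weight or extend these moves differently.

```python
BASES = "ACGT"


def transitions(kmer: str) -> dict[str, list[str]]:
    """Successor 5-mers of `kmer`, grouped by move type (assumed model)."""
    stays = [kmer]                                   # pore stays on the same 5-mer
    steps = [kmer[1:] + b for b in BASES]            # DNA advances one base
    skips = [kmer[2:] + a + b                        # DNA advances two bases
             for a in BASES for b in BASES]
    return {"stay": stays, "step": steps, "skip": skips}


moves = transitions("GCTAT")
for kind, states in moves.items():
    print(kind, len(states), states[:4])
```

For GCTAT this gives one stay, four steps (CTATA, CTATC, CTATG, CTATT) and sixteen skips, all beginning with TAT.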

Supplementary Figure 2 Local likelihood alignment and mutation finding

a) The local (differential) log-likelihood between the candidate and an alternate strand, showing the high degree of noise in the data. b) The cumulative sum of the same data highlights the regions with beneficial mutations (green shading).
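A cumulative sum turns a noisy per-position signal into one where a beneficial region shows up as a sustained rise. One plausible reading of panel b, sketched here as an assumption rather than PoreSeq's exact rule, is to find the interval with the largest total gain, which equals the largest rise of the cumulative sum from an earlier minimum. The input `delta` (alternate minus candidate log-likelihood per position) and the function name `best_gain_region` are hypothetical; the real method may threshold or report several regions.

```python
from itertools import accumulate


def best_gain_region(delta: list[float]) -> tuple[int, int, float]:
    """Return (start, end, gain) maximizing sum(delta[start:end]).

    prefix = sum(delta[:end]); the best interval ending at `end` starts
    where the cumulative sum was lowest before it. (0, 0, 0.0) means no
    interval has positive gain.
    """
    best = (0, 0, 0.0)
    min_prefix, min_idx = 0.0, 0
    for end, prefix in enumerate(accumulate(delta), start=1):
        if prefix - min_prefix > best[2]:
            best = (min_idx, end, prefix - min_prefix)
        if prefix < min_prefix:  # new low point of the cumulative sum
            min_prefix, min_idx = prefix, end
    return best


print(best_gain_region([-1, -2, 3, 1, -1, 2, -5, 1]))
```

Here the cumulative sum bottoms out after the second position and peaks after the sixth, so the sketch reports positions 2 to 6 with a summed gain of 5, even though position 4 on its own is negative.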

Supplementary Figure 3 Optimizations used in PoreSeq

a) Matrix banding optimization, in which cells are calculated only in the blue band near the previous or estimated alignment (black line), while the rest of the matrix is implicitly set to 0 (white). b) Forward-backward optimization, in which the matrix is calculated in both directions so that the full alignment score can be obtained from a single column of the two matrices. To test a mutation in the orange region, only that handful of columns needs to be recalculated in the forward direction.
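The forward-backward trick in panel b can be shown on a simpler problem. This sketch assumes a plain max-plus global alignment between a read and a candidate sequence (linear gap penalty, made-up scores) in place of PoreSeq's likelihood model, and it omits banding; all function names are hypothetical. Because every alignment can be split at any sequence column, the full score is the best sum of matching forward and backward entries in that column, so testing a substitution needs only one new forward column.

```python
MATCH, MISMATCH, GAP = 1, -1, -2  # illustrative scores, not PoreSeq's


def sub(a: str, b: str) -> int:
    return MATCH if a == b else MISMATCH


def next_forward_column(prev, read, base, j):
    """Column j of F, where F[j][i] = best score of read[:i] vs seq[:j]."""
    col = [j * GAP]
    for i in range(1, len(read) + 1):
        col.append(max(prev[i - 1] + sub(read[i - 1], base),  # pair read[i-1] with base
                       prev[i] + GAP,                         # base against a gap
                       col[i - 1] + GAP))                     # read[i-1] against a gap
    return col


def forward(read, seq):
    cols = [[i * GAP for i in range(len(read) + 1)]]
    for j in range(1, len(seq) + 1):
        cols.append(next_forward_column(cols[-1], read, seq[j - 1], j))
    return cols


def backward(read, seq):
    """B[j][i] = best score of read[i:] vs seq[j:]."""
    n, m = len(read), len(seq)
    cols = [None] * (m + 1)
    cols[m] = [(n - i) * GAP for i in range(n + 1)]
    for j in range(m - 1, -1, -1):
        nxt, col = cols[j + 1], [0] * (n + 1)
        col[n] = (m - j) * GAP
        for i in range(n - 1, -1, -1):
            col[i] = max(nxt[i + 1] + sub(read[i], seq[j]),
                         nxt[i] + GAP,
                         col[i + 1] + GAP)
        cols[j] = col
    return cols


def score_substitution(read, seq, F, B, pos, base):
    """Full score with seq[pos] replaced by base, recomputing one column."""
    new_col = next_forward_column(F[pos], read, base, pos + 1)
    return max(f + b for f, b in zip(new_col, B[pos + 1]))


read, seq = "ACGTTA", "ACTTA"
F, B = forward(read, seq), backward(read, seq)
print(F[-1][-1], score_substitution(read, seq, F, B, 2, "G"))
```

Columns left of the mutation are reused from F and columns right of it from B, so each candidate substitution costs one column instead of a full matrix; banding would further restrict each column to rows near the expected alignment.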

Supplementary Figure 4 Flowcell runs used in this work

Details of all MinION flowcell runs used in the manuscript. Note in particular the larger spread in error for the λ DNA run, which resulted from the use of an older sequencing kit (SQK-MAP003), as well as its wider distribution of read lengths due to the g-Tube shearing protocol. The λ run had 6,831 reads in total, of which 761 had 2D sequences and 700 aligned to the reference, for a total coverage of about 125×. The M13 and CS runs used pass/fail filtering that selected only 2D reads: the M13 run had 1,195 reads in total, of which 1,113 aligned, for an average depth of 1,086×, while the CS run had 907 reads, of which 860 aligned, for an average depth of 720×. Although other MinION Access Programme (MAP) participants have seen better performance and higher yield from certain flowcell runs, we found that the number of reads was sufficient for our purposes, and many of the issues we encountered (such as bubbles in the flowcells) have since been fixed by the manufacturer.
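The depth figures quoted above follow the usual definition of mean coverage: total aligned bases divided by reference length. The sketch below states that formula and inverts it for a back-of-the-envelope check. Assuming the commonly used 48,502 bp λ reference, about 125× from 700 aligned reads implies a mean aligned length of roughly 8.7 kb. This is an inference from the caption's numbers, not a value reported in the paper, and the function names are hypothetical.

```python
def mean_depth(aligned_lengths: list[int], reference_length: int) -> float:
    """Mean coverage = total aligned bases / reference length."""
    return sum(aligned_lengths) / reference_length


def implied_mean_aligned_length(depth: float, n_aligned: int,
                                reference_length: int) -> float:
    """Invert mean_depth: average aligned bases per read for a given depth."""
    return depth * reference_length / n_aligned


print(mean_depth([5000, 3000, 2000], reference_length=1000))  # toy reads
print(round(implied_mean_aligned_length(125, 700, 48_502)))
```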

Supplementary Figure 5 Error analysis results for M13mp18

a) All errors (labeled as a percentage of the total) pooled from all trials at 50× coverage, binned by the base in the de novo sequence versus the true base in M13. b) The fraction of all deletions of a particular base that fall within a homopolymer region, defined as five or more identical contiguous bases (top), and the total number of each base belonging to homopolymer regions in M13 (bottom). Note that the distribution of homopolymer-related errors in (a) and (b) largely reflects the underlying base-specific prevalence of homopolymer regions.
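The homopolymer definition in panel b (five or more identical contiguous bases) translates directly into a run-length scan. This minimal sketch counts, per base type, how many bases lie in such runs, analogous to the bottom of panel b; the function name `homopolymer_base_counts` is hypothetical, and the actual M13 counts are not reproduced here.

```python
import re
from collections import Counter


def homopolymer_base_counts(seq: str, min_len: int = 5) -> Counter:
    """Count bases lying in runs of >= min_len identical bases, per base."""
    counts = Counter()
    # (.)\1* matches each maximal run of one repeated character.
    for match in re.finditer(r"(.)\1*", seq):
        run = match.group(0)
        if len(run) >= min_len:
            counts[run[0]] += len(run)
    return counts


print(homopolymer_base_counts("AAAAACGGGGGGTTTTA"))
```

In this toy sequence the five A's and six G's qualify, while the run of four T's falls below the threshold and is not counted.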

Supplementary information

Supplementary Text and Figures

Supplementary Notes 1–6; Supplementary Figures 1–5 (PDF 4266 kb)

Supplementary Data 1 (XLS 30 kb)

About this article

Cite this article

Szalay, T., Golovchenko, J. De novo sequencing and variant calling with nanopores using PoreSeq. Nat Biotechnol 33, 1087–1091 (2015). https://doi.org/10.1038/nbt.3360

Received: 19 February 2015
Accepted: 28 August 2015
Published: 09 September 2015
Issue Date: October 2015
DOI: https://doi.org/10.1038/nbt.3360