A complete bacterial genome assembled de novo using only nanopore sequencing data

Journal name: Nature Methods
Volume: 12
Pages: 733–735
DOI: 10.1038/nmeth.3444

We have assembled de novo the Escherichia coli K-12 MG1655 chromosome in a single 4.6-Mb contig using only nanopore data. Our method has three stages: (i) overlaps between reads are detected and the reads are then corrected by a multiple-alignment process; (ii) corrected reads are assembled using the Celera Assembler; and (iii) the assembly is polished using a probabilistic model of the signal-level data. The assembly reconstructs gene order and has 99.5% nucleotide identity.
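The abstract does not state how the 99.5% nucleotide identity was computed. As a minimal sketch of one common definition (matching columns divided by informative alignment columns), the following Python function scores a pairwise alignment that has already been produced by some aligner. The function name `percent_identity` and the gapped-string input format are assumptions made for illustration, not the authors' method.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both rows carry the same base.

    Both inputs are rows of one pairwise alignment, with '-' marking gaps.
    This is an illustrative definition, not the one used in the paper.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # a column that is a gap in both rows carries no information
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * matches / columns


# Toy alignment: 10 columns, one insertion and one deletion, 8 matches.
print(percent_identity("ACGT-ACGTA", "ACGTTACG-A"))  # 80.0
```

Under this definition every inserted, deleted, or substituted base counts as one non-matching column, so insertions and deletions lower the identity just as substitutions do.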

Figures

  1. Figure 1: Single-contig assembly of E. coli K-12 MG1655.

    (a) Dot plot comparing our assembly with the reference genome. (b) Histogram of read coverage when nanopore reads are aligned against the assembled genome. (c) Read coverage and G+C composition across the length of the assembled genome. (d) View in the genome-assembly tool Tablet (ref. 16) of the read coverage (top blue panel) for a randomly chosen section of the assembled genome, and the underlying reads used to construct the assembly.

  2. Figure 2: Comparing 5-mer counts of the assembly and the reference genome before and after signal-level polishing.

    (a,c) Correlation between 5-mer counts in the reference (x axis) and an assembly (y axis) before (a) and after (c) signal-level polishing. Red dots in a,c denote 5-mers with ≥50% more occurrences in the reference genome than the unpolished assembly. (b,d) The counts for these 5-mers are compared to the reference for the unpolished assembly (b) and the polished assembly (d).

  3. Supplementary Fig. 1: Kernel density plot showing the accuracy of reads from the four individual MinION runs used to generate the de novo assembly.

    The mean accuracy varies from 78.2% (run 3) to 82.2% (run 1).

  4. Supplementary Fig. 2: Kernel density plot demonstrating the raw nanopore read accuracy and effect of two rounds of error correction on accuracy.

    The mauve area represents uncorrected sequencing reads, the green area shows the improvement in accuracy after the first round of correction, and the yellow area shows the improvement from the second round. Additional rounds of correction did not improve accuracy further.

  5. Supplementary Fig. 3: Spec file for Celera Assembler.
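The 5-mer comparison described in the Figure 2 caption can be sketched in a few lines of Python. This is a minimal illustration, assuming both genomes are available as plain strings; the helper names `count_kmers` and `underrepresented_kmers` are hypothetical, and only the forward strand is counted, which may differ from how the figure was produced. A 5-mer is flagged when its count in the reference is at least 1.5 times its count in the assembly, matching the "≥50% more occurrences" rule used for the red dots.

```python
from collections import Counter
from itertools import product


def count_kmers(sequence: str, k: int = 5) -> Counter:
    """Count every overlapping k-mer that consists only of A, C, G and T."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows containing N or other symbols
            counts[kmer] += 1
    return counts


def underrepresented_kmers(reference: str, assembly: str,
                           k: int = 5, excess: float = 0.5) -> dict:
    """Map each flagged k-mer to (reference count, assembly count).

    A k-mer is flagged when it occurs in the reference and its reference
    count is at least (1 + excess) times its assembly count.
    """
    ref_counts = count_kmers(reference, k)
    asm_counts = count_kmers(assembly, k)
    flagged = {}
    for kmer in ("".join(p) for p in product("ACGT", repeat=k)):
        ref_n, asm_n = ref_counts[kmer], asm_counts[kmer]
        if ref_n > 0 and ref_n >= (1 + excess) * asm_n:
            flagged[kmer] = (ref_n, asm_n)
    return flagged


# Toy example: the "assembly" has a shortened run of A bases.
print(underrepresented_kmers("AAAAAAAAAA", "AAAAAAA"))  # {'AAAAA': (6, 3)}
```

Plotting reference counts against assembly counts for all 1,024 possible 5-mers, with the flagged ones highlighted, gives a scatter of the kind described for panels a and c.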

Accession codes

Primary accessions

European Nucleotide Archive

References

  1. Jain, M. et al. Nat. Methods 12, 351–356 (2015).
  2. Koren, S. et al. Genome Biol. 14, R101 (2013).
  3. Koren, S. et al. Nat. Biotechnol. 30, 693–700 (2012).
  4. Rasko, D.A. et al. N. Engl. J. Med. 365, 709–717 (2011).
  5. Chin, C.-S. et al. Nat. Methods 10, 563–569 (2013).
  6. Kim, K.E. et al. Sci. Data 1, 140045 (2014).
  7. Koren, S. & Phillippy, A.M. Curr. Opin. Microbiol. 23, 110–120 (2015).
  8. Quick, J., Quinlan, A.R. & Loman, N.J. Gigascience 3, 22 (2014).
  9. Goodwin, S. et al. Preprint at bioRxiv doi:10.1101/013490 (2015).
  10. Loman, N.J. & Quinlan, A.R. Bioinformatics 30, 3399–3401 (2014).
  11. Myers, G. in Int. Workshop Algorithms Bioinformatics (eds. Brown, D. & Morgenstern, B.) 52–67 (Springer, 2014).
  12. Lee, C., Grasso, C. & Sharlow, M.F. Bioinformatics 18, 452–464 (2002).
  13. Myers, E.W. et al. Science 287, 2196–2204 (2000).
  14. Gurevich, A., Saveliev, V., Vyahhi, N. & Tesler, G. Bioinformatics 29, 1072–1075 (2013).
  15. Darling, A.E., Mau, B. & Perna, N.T. PLoS ONE 5, e11147 (2010).
  16. Milne, I. et al. Brief. Bioinform. 14, 193–202 (2013).
  17. Treangen, T.J., Sommer, D.D., Angly, F.E., Koren, S. & Pop, M. Curr. Protoc. Bioinformatics 33, 11.8 (2011).
  18. Delcher, A.L., Phillippy, A., Carlton, J. & Salzberg, S.L. Nucleic Acids Res. 30, 2478–2483 (2002).
  19. Li, H. Preprint at http://arxiv.org/abs/1303.3997 (2013).
  20. Quinlan, A.R. & Hall, I.M. Bioinformatics 26, 841–842 (2010).
  21. Cock, P.J.A. et al. Bioinformatics 25, 1422–1423 (2009).

Author information

Affiliations

  1. Institute of Microbiology and Infection, University of Birmingham, Birmingham, UK.

    • Nicholas J Loman
    • Joshua Quick
  2. Ontario Institute for Cancer Research, Toronto, Ontario, Canada.

    • Jared T Simpson
  3. Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.

    • Jared T Simpson

Contributions

N.J.L. and J.T.S. conceived the project. N.J.L., J.Q. and J.T.S. implemented the Nanocorrect pipeline. J.T.S. conceived and implemented the Nanopolish pipeline. J.Q. generated the nanopore E. coli sequence data. N.J.L. and J.T.S. performed de novo assembly and analyzed the results. N.J.L. and J.T.S. wrote the manuscript. All authors approved the final manuscript.

Competing financial interests

N.J.L. and J.T.S. are members of the MinION Access Programme (MAP). N.J.L. has received free-of-charge reagents for nanopore sequencing presented in this study. N.J.L., J.Q. and J.T.S. have received travel and accommodation expenses to speak at an Oxford Nanopore–organized symposium. N.J.L. and J.Q. have ongoing research collaborations with Oxford Nanopore but do not receive financial compensation for this.

Supplementary information

PDF files

  1. Supplementary Text and Figures (804 KB)

    Supplementary Figures 1–3, Supplementary Tables 1 and 2 and Supplementary Note
