Abstract

Shotgun sequencing enables the reconstruction of genomes from complex microbial communities, but because assembly does not reconstruct entire genomes, it is necessary to bin genome fragments. Here we present CONCOCT, a new algorithm that combines sequence composition and coverage across multiple samples to automatically cluster contigs into genomes. We demonstrate high recall and precision on both artificial and real human gut metagenome data sets.
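The core idea in the abstract is that each contig is described by two signals: its sequence composition and its coverage across multiple samples. The sketch below builds such a combined feature vector. It is a minimal illustration under stated assumptions, not CONCOCT's code or API. It assumes tetranucleotide (k = 4) counts with each k-mer merged with its reverse complement, a pseudocount of 1, and log-transformed coverage that is first scaled by each sample's total depth and then normalised across samples. All function names (`canonical_kmers`, `composition_profile`, `coverage_profile`, `contig_features`, `distance`) are hypothetical. The clustering step that would group these vectors into genome bins is not shown.

```python
import itertools
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(k: int = 4) -> list[str]:
    """One representative (the lexicographically smaller) per k-mer/reverse-complement pair."""
    all_kmers = ("".join(t) for t in itertools.product(BASES, repeat=k))
    return sorted({min(km, revcomp(km)) for km in all_kmers})


def composition_profile(seq: str, kmers: list[str], k: int = 4,
                        pseudo: float = 1.0) -> list[float]:
    """Log of pseudocount-smoothed canonical k-mer frequencies (assumed transform)."""
    index = {km: i for i, km in enumerate(kmers)}
    counts = [pseudo] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set(BASES):  # skip windows containing N or other symbols
            counts[index[min(kmer, revcomp(kmer))]] += 1
    total = sum(counts)
    return [math.log(c / total) for c in counts]


def coverage_profile(cov: list[float], sample_totals: list[float],
                     pseudo: float = 1.0) -> list[float]:
    """Log of coverage scaled by sample depth, then normalised across samples (assumed transform)."""
    scaled = [(c + pseudo) / t for c, t in zip(cov, sample_totals)]
    total = sum(scaled)
    return [math.log(s / total) for s in scaled]


def contig_features(seq: str, cov: list[float], sample_totals: list[float],
                    kmers: list[str]) -> list[float]:
    """Concatenate composition and coverage into one vector per contig."""
    return composition_profile(seq, kmers) + coverage_profile(cov, sample_totals)


def distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    kmers = canonical_kmers()  # 136 canonical tetramers
    depths = [100.0, 200.0]    # hypothetical total mapped reads per sample
    a = contig_features("ACGTTGCAACGTAAGGCT", [10.0, 0.0], depths, kmers)
    b = contig_features("ACGTTGCAACGTAAGGCA", [12.0, 1.0], depths, kmers)
    print(len(a))  # 138: 136 composition + 2 coverage dimensions
    print(distance(a, b))
```

Contigs from the same genome are expected to share a similar composition and to rise and fall together in coverage across samples, so their vectors should lie close together. Combining both signals is what lets a clustering step separate genomes that either signal alone might confuse.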

Acknowledgements

This research arose out of a workshop funded through the COST project ES1103 and hosted by P. Fernandes at the Instituto Gulbenkian de Ciência. This work was funded by grants (to A.F.A.) from the Swedish Research Councils VR (grant 2011-5689), FORMAS (grant 2009-1174) and EC BONUS project BLUEPRINT. C.Q. is funded by an EPSRC Career Acceleration Fellowship—EP/H003851/1. M.S. is supported by Unilever R&D Port Sunlight, Bebington, UK. L.L. is supported by the Academy of Finland (grant 256950), N.L. by a UK Medical Research Council Special Training Fellowship in Biomedical Informatics and J.Q. by the UK National Institute for Health Research (NIHR) Centre for Surgical Reconstruction and Microbiology. This paper presents independent research funded by the NIHR Surgical Reconstruction and Microbiology Research Centre (partnership between University Hospitals Birmingham National Health Service (NHS) Foundation Trust, the University of Birmingham and the Royal Centre for Defence Medicine). The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

Author information

Author notes

    • Johannes Alneberg & Brynjar Smári Bjarnason

    These authors contributed equally to this work.

    • Anders F Andersson & Christopher Quince

    These authors jointly directed this work.

Affiliations

  1. KTH Royal Institute of Technology, Science for Life Laboratory, School of Biotechnology, Division of Gene Technology, Stockholm, Sweden.

    • Johannes Alneberg, Brynjar Smári Bjarnason, Ino de Bruijn & Anders F Andersson

  2. Bioinformatics Infrastructure for Life Sciences (BILS), Stockholm, Sweden.

    • Ino de Bruijn

  3. School of Engineering, University of Glasgow, Glasgow, UK.

    • Melanie Schirmer, Umer Z Ijaz & Christopher Quince

  4. Institute of Microbiology and Infection, University of Birmingham, Birmingham, UK.

    • Joshua Quick & Nicholas J Loman

  5. National Institute for Health Research (NIHR) Surgical Reconstruction and Microbiology Research Centre, University of Birmingham, Birmingham, UK.

    • Joshua Quick

  6. Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland.

    • Leo Lahti

  7. Laboratory of Microbiology, Wageningen University, Wageningen, the Netherlands.

    • Leo Lahti

Authors

  1. Johannes Alneberg
  2. Brynjar Smári Bjarnason
  3. Ino de Bruijn
  4. Melanie Schirmer
  5. Joshua Quick
  6. Umer Z Ijaz
  7. Leo Lahti
  8. Nicholas J Loman
  9. Anders F Andersson
  10. Christopher Quince

Contributions

C.Q. developed the core algorithm and cluster validation metrics and performed analyses. A.F.A. assisted with the analyses, developed the SCG validation and contributed to algorithm development. J.A. and B.S.B. developed the CONCOCT software pipeline and contributed to algorithm development. I.d.B. performed assemblies and mappings. M.S. generated simulation data. J.Q. performed E. coli mappings. U.Z.I. assisted with SCG validation and production of graphics. L.L. helped with graphics and algorithm design. N.J.L. performed E. coli analysis and contributed to algorithm development. All authors contributed to the writing of the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Anders F Andersson or Christopher Quince.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–15, Supplementary Tables 1–8 and Supplementary Note

Zip files

  1. Supplementary Software

    CONCOCT version 0.3.3 software

About this article

DOI: https://doi.org/10.1038/nmeth.3103
