Efficient de novo assembly of single-cell bacterial genomes from short-read data sets


Whole genome amplification by the multiple displacement amplification (MDA) method allows sequencing of DNA from single cells of bacteria that cannot be cultured. Assembling a genome is challenging, however, because MDA generates highly nonuniform coverage of the genome. Here we describe an algorithm tailored for short-read data from single cells that improves assembly through the use of a progressively increasing coverage cutoff. Assembly of reads from single Escherichia coli and Staphylococcus aureus cells captures >91% of genes within contigs, approaching the 95% captured from an assembly based on many E. coli cells. We apply this method to assemble a genome from a single cell of an uncultivated SAR324 clade of Deltaproteobacteria, a cosmopolitan bacterial lineage in the global ocean. Metabolic reconstruction suggests that SAR324 is aerobic, motile and chemotaxic. Our approach enables acquisition of genome assemblies for individual uncultivated bacteria using only short reads, providing cell-specific genetic information absent from metagenomic studies.
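The idea of a progressively increasing coverage cutoff can be illustrated with a toy contig graph. The sketch below is not the Velvet-SC source code. All names (`assemble`, `remove_low_coverage`, `merge_linear_paths`) and the example graph are hypothetical. It also assumes that, after each removal round, unambiguous neighboring contigs are merged and their coverage is recomputed as a length-weighted average. Under those assumptions, a genuine low-coverage contig can survive a gradually rising threshold where a single fixed threshold would discard it.

```python
def remove_low_coverage(nodes, edges, cutoff):
    """Delete contigs whose average coverage is below the cutoff."""
    low = {name for name, (_, cov) in nodes.items() if cov < cutoff}
    for name in low:
        del nodes[name]
    edges.difference_update({e for e in edges if e[0] in low or e[1] in low})


def merge_linear_paths(nodes, edges):
    """Merge u -> v when the edge is u's only exit and v's only entry."""
    merged = True
    while merged:
        merged = False
        for u, v in sorted(edges):
            outs = [e for e in edges if e[0] == u]
            ins = [e for e in edges if e[1] == v]
            if u != v and len(outs) == 1 and len(ins) == 1:
                len_u, cov_u = nodes[u]
                len_v, cov_v = nodes.pop(v)
                total = len_u + len_v
                # Assumption: merged coverage is the length-weighted mean.
                nodes[u] = (total, (len_u * cov_u + len_v * cov_v) / total)
                edges.remove((u, v))
                # Edges that left v now leave the merged contig u.
                for a, b in list(edges):
                    if a == v:
                        edges.remove((a, b))
                        edges.add((u, b))
                merged = True
                break  # edge set changed, so rescan from the start


def assemble(nodes, edges, cutoffs):
    """Apply each cutoff in turn: remove low-coverage contigs, then merge."""
    nodes, edges = dict(nodes), set(edges)  # work on copies
    for cutoff in cutoffs:
        remove_low_coverage(nodes, edges, cutoff)
        merge_linear_paths(nodes, edges)
    return nodes


# Hypothetical graph: name -> (length in bp, average coverage).
# B is a genuine but poorly amplified region; E1 and E2 are error contigs.
nodes = {"A": (1000, 40.0), "B": (100, 3.0), "C": (1000, 40.0),
         "E1": (30, 1.0), "E2": (30, 1.0)}
edges = {("A", "B"), ("B", "C"), ("A", "E1"), ("E1", "C"), ("B", "E2")}

fixed = assemble(nodes, edges, [5])                 # one fixed cutoff
progressive = assemble(nodes, edges, range(1, 6))   # cutoff 1, 2, ..., 5

print(sorted(length for length, _ in fixed.values()))        # [1000, 1000]
print(sorted(length for length, _ in progressive.values()))  # [2100]
```

With the fixed cutoff of 5, contig B is discarded and the assembly breaks into two pieces. With the rising cutoff, the error contigs are removed at cutoff 2, B is merged with its high-coverage neighbors, and the merged contig stays above every later threshold.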

Figure 1: Assembling single-cell reads using Velvet-SC.
Figure 2: Comparison of contigs generated by Velvet versus EULER+Velvet-SC for single-cell E. coli lane 1.
Figure 3: A maximum likelihood tree of Deltaproteobacterial 16S sequences, including SAR324_MDA (red).

This work was partially supported by grants to R.S.L. from the National Human Genome Research Institute (NIH-2 R01 HG003647) and the Alfred P. Sloan Foundation (Sloan Foundation-2007-10-19), and by a grant to P.A.P. and G.T. from the US National Institutes of Health (NIH grant 3P41RR024851-02S1). We thank M. Kim (J. Craig Venter Institute) for bioinformatics support.

All authors analyzed data. H.C. and G.T. wrote software. M.N., J.L.Y.-G., M.-J.L. and L.J.F. performed wet lab experiments. Illumina sequencing was performed at Illumina Cambridge Ltd. O.S.-T. analyzed sequencing data at Illumina. H.C., J.L.Y.-G., G.T., C.L.D., M.-J.L., L.J.F., N.A.G., P.A.P. and R.S.L. wrote the manuscript. H.C., G.T., M.-J.L., C.L.D., J.H.B., D.B.R. and N.A.G. created figures and tables. R.S.L. and M.-J.L. supervised the JCVI group. P.A.P. and G.T. supervised the UCSD group. N.A.G. and D.J.E. supervised the Illumina group. G.P.S. initiated the Illumina-JCVI collaboration.

Correspondence to Roger S Lasken.

L.J.F., N.A.G., O.S.-T., G.P.S. and D.J.E. are employees of Illumina, the commercial source of Illumina sequencing, which is evaluated in this manuscript.

Supplementary Text and Figures

Supplementary Tables 1–5, Supplementary Methods, Supplementary Data 3 and Supplementary Figures 1–13 (PDF 2029 kb)

Supplementary Data 1

Velvet-SC source code (TGZ 4047 kb)

Supplementary Data 2

EULER-SR Error correction source code (TGZ 129 kb)


Chitsaz, H., Yee-Greenbaum, J., Tesler, G. et al. Efficient de novo assembly of single-cell bacterial genomes from short-read data sets. Nat Biotechnol 29, 915–921 (2011).

