Synthetic long-read sequencing reveals intraspecies diversity in the human microbiome

Kuleshov, Volodymyr; Jiang, Chao; Zhou, Wenyu; Jahanbani, Fereshteh; Batzoglou, Serafim; Snyder, Michael

doi:10.1038/nbt.3416

Article
Published: 01 January 2016

Nature Biotechnology volume 34, pages 64–69 (2016)

Subjects

Metagenomics

Abstract

Identifying bacterial strains in metagenome and microbiome samples using computational analyses of short-read sequences remains a difficult problem. Here, we present an analysis of a human gut microbiome using TruSeq synthetic long reads combined with computational tools for metagenomic long-read assembly, variant calling and haplotyping (Nanoscope and Lens). Our analysis identifies 178 bacterial species, of which 51 were not found using shotgun reads alone. We recover bacterial contigs that comprise multiple operons, including 22 contigs of >1 Mbp. Furthermore, we observe extensive intraspecies variation within microbial strains in the form of haplotypes that span up to hundreds of Kbp. Incorporation of synthetic long-read sequencing technology with standard short-read approaches enables more precise and comprehensive analyses of metagenomic samples.

**Figure 1: The Nanoscope pipeline and the Lens algorithm.**

**Figure 2: Long reads aligned to assembled metagenomic contigs reveal extensive variation within bacterial strains.**

**Figure 3: Bacterial species identified only by long reads (blue), only by short reads (magenta), ordered by abundance.**

Accession codes

Primary accessions

Sequence Read Archive

SRP065223

Acknowledgements

This work was supported by US National Institutes of Health/National Human Genome Research Institute (NIH/NHGRI) grant T32 HG000044. V.K. was supported by a Natural Sciences and Engineering Research Council of Canada (NSERC) post-graduate fellowship. We thank Illumina, Inc. for their assistance in sample preparation.

Author information

Serafim Batzoglou and Michael Snyder: These authors contributed equally to this work.

Authors and Affiliations

Department of Computer Science, Stanford University, Stanford, California, USA
Volodymyr Kuleshov & Serafim Batzoglou
Department of Genetics, Stanford University School of Medicine, Stanford, California, USA
Volodymyr Kuleshov, Chao Jiang, Wenyu Zhou, Fereshteh Jahanbani & Michael Snyder

Contributions

S.B. and M.S. conceived the study. W.Z. and F.J. performed library preparation. V.K. developed the Nanoscope pipeline and the Lens algorithm. V.K. and C.J. performed computational analyses. V.K., C.J., S.B. and M.S. wrote the paper. S.B. and M.S. supervised the study.

Corresponding authors

Correspondence to Volodymyr Kuleshov, Serafim Batzoglou or Michael Snyder.

Ethics declarations

Competing interests

V.K. serves as a consultant for Illumina Inc. S.B. is a co-founder of DNAnexus and a member of the scientific advisory boards of 23andMe and Eve Biomedical. M.S. is a co-founder of Personalis and a member of the scientific advisory boards of Personalis, AxioMx and Genapsys.

Integrated supplementary information

Supplementary Figure 1 Histogram of long read lengths for the mock metagenome

Supplementary Figure 2 Histogram of long read lengths for the real metagenome

Supplementary Figure 3 Fraction of genome covered with short and long reads, per organism, given an equal number of bases sequenced with each technology.

For several organisms, the percentage of the genome covered varies greatly between the two technologies, indicating different types of bias.

Supplementary Figure 4 Estimated abundance using short and long reads.

For several organisms, the estimated abundances vary significantly.

Supplementary Figure 5 Comparison of contig lengths obtained from short and long sequencing (real metagenome).

About twenty contigs obtained from long read sequencing are longer than 1 Mbp.

Supplementary Figure 6 Recovery of operons from the assemblies obtained from short reads, long reads, and from the joint assembly (mock metagenome).

Short reads were assembled using SOAPdenovo2, and long reads were assembled with Celera; the two assemblies were merged with Minimus2. The joint assembly recovers more than half of all operons, twice as many as short reads alone. Interestingly, long and short reads seem to recover different types of operons.

Supplementary Figure 7 Recovery of genes from the assemblies obtained from short reads, long reads, and from the joint assembly (mock metagenome).

Short reads were assembled using SOAPdenovo2, and long reads were assembled with Celera; the two assemblies were merged with Minimus2. The joint assembly recovers more than half of all genes, twice as many as short reads alone. Interestingly, long and short reads seem to recover different types of genes.

Supplementary Figure 8 Fragment of 110 kbp genomic region in which there is variation between several bacterial subspecies.

The contig belongs to the bacterium Parabacteroides distasonis.

Supplementary Figure 9 Genomic region 50 kbp in length in which there is variation between several bacterial subspecies.

The contig belongs to the bacterium Odoribacter splanchnicus.

Supplementary Figure 10 Percentage of genomic regions where all haplotypes are in perfect phylogeny, as a function of the percentage of positions that have to be corrected to ensure phylogeny.

More than 85% of positions are in perfect phylogeny, and by correcting less than 5% of positions, we can increase this number to more than 92%.
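
The perfect-phylogeny criterion mentioned in this caption can be illustrated with the classical four-gamete test: a set of bi-allelic haplotypes admits a perfect phylogeny exactly when no pair of variant sites shows all four allele combinations (00, 01, 10, 11). The sketch below is a minimal Python illustration, assuming haplotypes are encoded as equal-length strings of '0' and '1' with no missing data. The function name is hypothetical, and the caption does not state whether Lens applies this particular test.

```python
from itertools import combinations


def passes_four_gamete_test(haplotypes):
    """Return True if no pair of sites shows all four gametes.

    `haplotypes` is a list of equal-length strings over {'0', '1'}
    (an assumed encoding, not taken from the paper).
    """
    if not haplotypes:
        return True
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("all haplotypes must have the same length")
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct allele pairs observed at sites i and j.
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            return False  # 00, 01, 10 and 11 all present: conflict
    return True


# Three haplotypes that fit a tree, then a set with a conflicting site pair.
compatible = ["000", "010", "011"]
conflicting = ["00", "01", "10", "11"]
print(passes_four_gamete_test(compatible))
print(passes_four_gamete_test(conflicting))
```

Counting how many positions must be changed before a region passes such a test is one way to read the "percentage of positions that have to be corrected" axis, though the correction procedure itself is not described here.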

Supplementary Figure 11 Summary of the length and depth of genomic regions at which there is variation among bacteria.

Blue regions are in perfect phylogeny, and red regions are not.

Supplementary Figure 12 Recovery of a 2.3 Mbp long contig from a species belonging to the genus Acinetobacter for which no finished genome was previously available.

We mapped contigs from an earlier fragmented assembly (bottom) to a 2.3 Mbp contig that we assembled (top). Most of the long contig appears to be covered by shorter contigs from the fragmented assembly.

Supplementary Figure 13 Abundance estimates in the mock metagenome obtained from Nanoscope, compared to the abundances obtained from mapping short reads to the 20 known genome references.

Supplementary Figure 14 Genomic variation statistics for 10 gut microbial species selected from our gut metagenome sample (at least 40% of each genome was covered by reads).

There is no obvious correlation between genome size/coverage and SNP density and π, which may be due to the limited number of genomes analyzed.
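
For readers unfamiliar with π, the statistic is presumably nucleotide diversity in its textbook sense: the average number of differences per site between all pairs of sequences in a sample. The sketch below computes that textbook quantity in Python, assuming a small set of pre-aligned, equal-length sequences. The function name and the toy input are hypothetical, and the estimator actually used for this figure (for example, one based on read alignments) is not described on this page.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site over aligned sequences.

    `sequences` is a list of equal-length strings (assumed pre-aligned).
    """
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be non-empty and of equal length")
    # Sum mismatches over every unordered pair of sequences.
    total_diff = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(sequences, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diff / (n_pairs * length)


# Toy alignment: pairwise differences are 1, 2 and 1 over 4 sites.
toy = ["AAAA", "AAAT", "AATT"]
print(nucleotide_diversity(toy))
```

SNP density, by comparison, only counts variable sites per unit length, so the two statistics can diverge when most variants are rare.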

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–14, Supplementary Tables 1–33 and Supplementary Methods (PDF 3325 kb)

Supplementary Code (TAR 96160 kb)

About this article

Cite this article

Kuleshov, V., Jiang, C., Zhou, W. et al. Synthetic long-read sequencing reveals intraspecies diversity in the human microbiome. Nat Biotechnol 34, 64–69 (2016). https://doi.org/10.1038/nbt.3416

Received: 13 December 2014
Accepted: 23 October 2015
Published: 01 January 2016
Issue Date: January 2016
DOI: https://doi.org/10.1038/nbt.3416
