Structural variation in the gut microbiome associates with host health


Differences in the presence of even a few genes between otherwise identical bacterial strains may result in critical phenotypic differences. Here we systematically identify microbial genomic structural variants (SVs) and find them to be prevalent in the human gut microbiome across phyla and to replicate in different cohorts. SVs are enriched for CRISPR-associated and antibiotic-producing functions and depleted from housekeeping genes, suggesting that they have a role in microbial adaptation. We find multiple associations between SVs and host disease risk factors, many of which replicate in an independent cohort. Exploring genes that are clustered in the same SV, we uncover several possible mechanistic links between the microbiome and its host, including a region in Anaerostipes hadrus that encodes a composite inositol catabolism-butyrate biosynthesis pathway, the presence of which is associated with lower host metabolic disease risk. Overall, our results uncover a nascent layer of variability in the microbiome that is associated with microbial adaptation and host health.

Fig. 1: SVs replicate across cohorts and are stable within individuals over time.
Fig. 2: SVs associate with microbial growth rates and specific functions.
Fig. 3: SVs associate with disease risk, replicated in another cohort.
Fig. 4: Risk-associated SVs harbour functionally diverse genes.

Data availability

The 7 strain samples used in Fig. 1c are available through ENA, accession ENA: PRJEB25194. The 887 samples are publicly available through ENA, accession numbers ENA: PRJEB11532 and ENA: PRJEB17643. The raw metagenomic sequencing data for the Lifelines DEEP cohort, and age and sex information per sample, are available from the European Genome-phenome Archive at accession number EGAS00001001704. Other phenotypic data can be requested from the Lifelines cohort study following the standard protocol for data access.


  1. McCarroll, S. A. & Altshuler, D. M. Copy-number variation and association studies of human disease. Nat. Genet. 39 (Suppl), S37–S42 (2007).

  2. Taniguchi, Y. et al. Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science 329, 533–538 (2010).

  3. Sokurenko, E. V. et al. Pathogenic adaptation of Escherichia coli by natural variation of the FimH adhesin. Proc. Natl Acad. Sci. USA 95, 8922–8926 (1998).

  4. Gill, S. R. et al. Insights on evolution of virulence and resistance from the complete genome analysis of an early methicillin-resistant Staphylococcus aureus strain and a biofilm-producing methicillin-resistant Staphylococcus epidermidis strain. J. Bacteriol. 187, 2426–2438 (2005).

  5. Koeth, R. A. et al. Intestinal microbiota metabolism of l-carnitine, a nutrient in red meat, promotes atherosclerosis. Nat. Med. 19, 576–585 (2013).

  6. Han, B. et al. Microbial genetic composition tunes host longevity. Cell 169, 1249–1262 (2017).

  7. Greenblum, S., Carr, R. & Borenstein, E. Extensive strain-level copy-number variation across human gut microbiome species. Cell 160, 583–594 (2015).

  8. Swann, J. R. et al. Systemic gut microbial modulation of bile acid metabolism in host tissue compartments. Proc. Natl Acad. Sci. USA 108 (Suppl 1), 4523–4530 (2011).

  9. LeBlanc, J. G. et al. Bacteria as vitamin suppliers to their host: a gut microbiota perspective. Curr. Opin. Biotechnol. 24, 160–168 (2013).

  10. Levy, M. et al. Microbiota-modulated metabolites shape the intestinal microenvironment by regulating NLRP6 inflammasome signaling. Cell 163, 1428–1443 (2015).

  11. Zeevi, D. et al. Personalized nutrition by prediction of glycemic responses. Cell 163, 1079–1094 (2015).

  12. Qin, J. et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60 (2012).

  13. Halfvarson, J. et al. Dynamics of the human gut microbiome in inflammatory bowel disease. Nat. Microbiol. 2, 17004 (2017).

  14. Pascal, V. et al. A microbial signature for Crohn’s disease. Gut 66, 813–822 (2017).

  15. Rowan, S. et al. Involvement of a gut–retina axis in protection against dietary glycemia-induced age-related macular degeneration. Proc. Natl Acad. Sci. USA 114, E4472–E4481 (2017).

  16. Li, J. et al. An integrated catalog of reference genes in the human gut microbiome. Nat. Biotechnol. 32, 834–841 (2014).

  17. Manor, O. & Borenstein, E. Systematic characterization and analysis of the taxonomic drivers of functional shifts in the human microbiome. Cell Host Microbe 21, 254–267 (2017).

  18. Franzosa, E. A. et al. Species-level functional profiling of metagenomes and metatranscriptomes. Nat. Methods 15, 962–968 (2018).

  19. Alkan, C., Coe, B. P. & Eichler, E. E. Genome structural variation discovery and genotyping. Nat. Rev. Genet. 12, 363–376 (2011).

  20. Korem, T. et al. Bread affects clinical parameters and induces gut microbiome-associated personal glycemic responses. Cell Metab. 25, 1243–1253 (2017).

  21. Zhernakova, A. et al. Population-based metagenomics analysis reveals markers for gut microbiome composition and diversity. Science 352, 565–569 (2016).

  22. Rothschild, D. et al. Environment dominates over host genetics in shaping human gut microbiota. Nature 555, 210–215 (2018).

  23. Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).

  24. Zerbino, D. R. et al. Ensembl 2018. Nucleic Acids Res. 46 (D1), D754–D761 (2018).

  25. El-Gebali, S. et al. The Pfam protein families database in 2019. Nucleic Acids Res. (2018).

  26. Korem, T. et al. Growth dynamics of gut microbiota in health and disease inferred from single metagenomic samples. Science 349, 1101–1106 (2015).

  27. Hayashi, F. et al. The innate immune response to bacterial flagellin is mediated by Toll-like receptor 5. Nature 410, 1099–1103 (2001).

  28. Shen, Y. et al. Flagellar hooks and hook protein FlgE participate in host microbe interactions at immunological level. Sci. Rep. 7, 1433 (2017).

  29. Weiser, J. N. et al. Phosphorylcholine on the lipopolysaccharide of Haemophilus influenzae contributes to persistence in the respiratory tract and sensitivity to serum killing mediated by C-reactive protein. J. Exp. Med. 187, 631–640 (1998).

  30. Ross, J. I. et al. Inducible erythromycin resistance in staphylococci is encoded by a member of the ATP-binding transport super-gene family. Mol. Microbiol. 4, 1207–1214 (1990).

  31. Zupancic, M. L. et al. Analysis of the gut microbiota in the old order Amish and its relation to the metabolic syndrome. PLoS One 7, e43052 (2012).

  32. Karlsson, F. H. et al. Gut metagenome in European women with normal, impaired and diabetic glucose control. Nature 498, 99–103 (2013).

  33. Yoshida, K. et al. myo-Inositol catabolism in Bacillus subtilis. J. Biol. Chem. 283, 10415–10424 (2008).

  34. Bergman, E. N. Energy contributions of volatile fatty acids from the gastrointestinal tract in various species. Physiol. Rev. 70, 567–590 (1990).

  35. Harig, J. M., Soergel, K. H., Komorowski, R. A. & Wood, C. M. Treatment of diversion colitis with short-chain-fatty acid irrigation. N. Engl. J. Med. 320, 23–28 (1989).

  36. Gao, Z. et al. Butyrate improves insulin sensitivity and increases energy expenditure in mice. Diabetes 58, 1509–1517 (2009).

  37. Mende, D. R. et al. proGenomes: a resource for consistent functional and taxonomic annotations of prokaryotic genomes. Nucleic Acids Res. 45 (D1), D529–D534 (2017).

  38. Olm, M. R., Brown, C. T., Brooks, B. & Banfield, J. F. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J. 11, 2864–2868 (2017).

  39. Marco-Sola, S., Sammeth, M., Guigó, R. & Ribeca, P. The GEM mapper: fast, accurate and versatile alignment by filtration. Nat. Methods 9, 1185–1188 (2012).

  40. Truong, D. T. et al. MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat. Methods 12, 902–903 (2015).

  41. Wood, D. E. & Salzberg, S. L. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 15, R46 (2014).

  42. Lu, J., Breitwieser, F. P., Thielen, P. & Salzberg, S. L. Bracken: estimating species abundance in metagenomics data. PeerJ Comput. Sci. 3, e104 (2017).

  43. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).

  44. Potter, S. C. et al. HMMER web server: 2018 update. Nucleic Acids Res. 46, W200–W204 (2018).

  45. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).

  46. Suez, J. et al. Artificial sweeteners induce glucose intolerance by altering the gut microbiota. Nature 514, 181–186 (2014).

  47. Sczyrba, A. et al. Critical Assessment of Metagenome Interpretation – a benchmark of metagenomics software. Nat. Methods 14, 1063–1071 (2017).

  48. Liu, B., Gibbons, T., Ghodsi, M. & Pop, M. MetaPhyler: taxonomic profiling for metagenomic sequences. 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 95–100 (IEEE, 2010).

Acknowledgements

We thank members of the Segal group and the Center for Studies in Physics and Biology for discussions. E.S. is supported by grants from the European Research Council and the Israel Science Foundation. D.Z. is supported by the James S. McDonnell Foundation and the Dan David Prize Scholarship. D.Z. and T.K. were partly supported by the Israeli Ministry of Science and Technology. Lifelines DEEP was funded by: ERC-2012-322698 and NWO-SPI-92-266 to C.W.; ERC-715772 and NWO-178.056 to A.Z.; NWO-864.13.013 and CVON-2012-03 to J.F.

Reviewer information

Nature thanks Ami Bhatt, Julie Segre and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Contributions

T.K. and D.Z. conceived and designed the study, designed and conducted all analyses, interpreted the results and wrote the manuscript. T.K. and D.Z. equally contributed to this work and are listed in random order. A.G. and N.B. developed methods. A.K., J.F., C.W. and A.Z. analysed the Dutch Lifelines cohort. M.L.-P. and A.W. did experimental work. A.W. designed the study. E.S. conceived, directed and designed the project and analyses, interpreted the results and wrote the manuscript.

Corresponding authors

Correspondence to David Zeevi or Eran Segal.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Superior assignment of metagenomic reads using the Iterative Coverage-based Read-Assignment (ICRA) algorithm.

a, Boxplot (centre, median; box, IQR; whiskers, 10th and 90th percentiles) of ambiguous read assignment ratios of 887 samples (refs 11, 20) mapped to a reference database of 3,953 representative microbial genomes (Methods) before (blue) and after (yellow) ICRA correction. b, Illustration of our computational pipeline. c–e, Swarm-plots of the ratio of correct read assignment per taxonomy level with no assignment correction (blue) or following assignment correction with ICRA (yellow), Kraken (ref. 41; red) or MetaPhyler (ref. 48; green) for CAMI (ref. 47) high complexity (c; n = 5), medium complexity (d; n = 2) and low complexity (e; n = 1) datasets. Note that MetaPhyler did not provide sub-species level read assignments. *P < 0.05, **P < 0.01, two-sided Mann–Whitney U-test.

Extended Data Fig. 2 ICRA estimates relative abundances with accuracy comparable to other tools.

a, Dot-plot of the calculated relative abundances of 7 bacterial species in 100 samples, using either ICRA (yellow), MetaPhlAn2 (ref. 40; blue) or Bracken (ref. 42; red), as compared to the true relative abundances. Inset shows a violin plot (white dot, median; black rectangle, IQR; whiskers, 1.5 × IQR) of Bray–Curtis dissimilarities between the estimates (n = 100) of each method and the true abundances. **, two-sided Wilcoxon signed-rank P = 1.3 × 10⁻⁴; ****P = 3.0 × 10⁻¹⁸. b–h, Dot-plot of the calculated relative abundances (y axis) of A. finegoldii (b), B. faecium (c), C. flavigena (d), E. faecalis (e), L. gasseri (f), S. cristatus (g) and A. muciniphila (h) in 100 samples, using either ICRA (yellow), MetaPhlAn2 (blue) or Bracken (red), as compared to the true relative abundances (x axis). R² was calculated using Pearson correlation.
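
The Bray–Curtis dissimilarity used in the inset of panel a can be illustrated with a minimal Python sketch. The function name and the abundance values below are hypothetical and are not taken from the study; the formula is the standard one for non-negative abundance vectors, the sum of absolute differences divided by the sum of all abundances.

```python
def bray_curtis(estimated, truth):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    if len(estimated) != len(truth):
        raise ValueError("abundance vectors must have the same length")
    total = sum(estimated) + sum(truth)
    if total == 0:
        raise ValueError("at least one abundance must be non-zero")
    # Sum of absolute differences, normalised by the total abundance.
    return sum(abs(e - t) for e, t in zip(estimated, truth)) / total


# Hypothetical relative abundances of four species in one sample.
true_abundance = [0.40, 0.30, 0.20, 0.10]
estimate = [0.35, 0.35, 0.20, 0.10]
print(bray_curtis(estimate, true_abundance))
```

A value of 0 means the estimate matches the true composition exactly, and 1 means the two profiles share no species.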

Extended Data Fig. 3 SV Explorer enables investigation of co-varying genes.

a, b, Illustration of the online SV explorer, spanning the entire R. torques genome (a) and spanning a 26-kbp region of this genome (b).

Extended Data Fig. 4 SVs are prevalent in the human microbiome across two cohorts.

a, Heatmap showing the number of subjects with SVs (yellow colour scale), the number of SVs (green colour scale), the mean SV size (blue colour scale) and the fraction of the genome that is variable (red colour scale), for each microbe analysed, along with their phylogenetic tree. b, Heatmap showing the genomic length percentage of variable and deletion SVs replicated in the Lifelines cohort for each microbe analysed.

Extended Data Fig. 5 Growth rates-associated SVs harbour specific functions.

Fold difference (x axis) and statistical significance (Methods; y axis) of the enrichment of functional KEGG modules in SVs present in regions significantly associated with microbial growth dynamics. A total of 56,088 genes were considered, 3,805 of them in growth rates-associated SVs.

Extended Data Fig. 6 SVs are associated with microbial growth rates.

a, Boxplot (centre, median; box, IQR; whiskers, IQR × 1.5) of microbial growth rates calculated using PTR (ref. 26) in individuals harbouring a 7-segment deletion in the E. eligens genome (blue, n = 281) and individuals with no deletion (maroon, n = 166). b, Genomic map of E. eligens with the 7 segments marked in yellow. c, As in a for a 9-segment deletion SV in the E. eligens genome (blue, n = 57) and individuals with no deletion (maroon, n = 390). d, As in b with the 9 segments marked in orange. P value determined by two-sided Mann–Whitney U-test.
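
The group comparison in this caption (growth rates with versus without a deletion) relies on a two-sided Mann–Whitney U-test. The following is a minimal Python sketch of that test, assuming a normal approximation with tie and continuity corrections; it is not the authors' implementation, and the function name and PTR values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must contain at least one value")
    n = n1 + n2
    # Pool the values, remembering which group (0 = x, 1 = y) each came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                      # size of this block of tied values
        mid_rank = (i + 1 + j) / 2.0   # average of ranks i+1 .. j
        rank_sum_x += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all values identical: no evidence of a shift
    # Continuity-corrected z score; clamp at zero so p never exceeds 1.
    z = max((abs(u_x - mean_u) - 0.5) / math.sqrt(var_u), 0.0)
    return u_x, math.erfc(z / math.sqrt(2.0))


# Hypothetical peak-to-trough ratios for samples with and without a deletion.
ptr_with_deletion = [1.10, 1.15, 1.08, 1.20, 1.12, 1.05]
ptr_without_deletion = [1.30, 1.25, 1.40, 1.22, 1.35, 1.28]
u_stat, p_value = mann_whitney_u(ptr_with_deletion, ptr_without_deletion)
print(u_stat, p_value)
```

The normal approximation is reasonable for group sizes in the hundreds, as in this figure; for very small groups an exact test would be preferable.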

Extended Data Fig. 7 SVs are associated with disease risk, replicated in a second cohort.

Full heatmap of statistically significant correlations (Methods) between disease risk factors and variable SVs, depicting associations replicated (yellow star), replicated using a different variable (orange star) or reversed (grey star) in the Lifelines cohort.

Extended Data Fig. 8 Gene content of SVs associated with host risk factors.

a, Boxplot of glycated haemoglobin in individuals harbouring an 11-kbp deletion in the E. rectale genome (blue, n = 253) and individuals with no deletion (maroon, n = 377). b, Same as Fig. 4d for this 11-kbp genomic region of E. rectale. c, Boxplot of BMI in individuals harbouring a 4-kbp deletion in the A. hadrus genome (blue, n = 276) and individuals with no deletion (maroon, n = 403). d, Same as Fig. 4d for this 4-kbp genomic region of A. hadrus. e, Depiction of the genes encoded in the region, which encode key enzymes in the folate biosynthesis pathway. Note correspondence of enzyme commission (EC) numbers with d. f, Boxplot of total cholesterol in individuals harbouring an 18-kbp deletion in the R. intestinalis genome (blue, n = 194) and individuals with no deletion (maroon, n = 68). g, Same as Fig. 4d for a 10-kbp stretch of the 18-kbp region in R. intestinalis. h, Boxplot of BMI in individuals harbouring an 8-kbp deletion in the C. comes genome (blue, n = 158) and individuals with no deletion (maroon, n = 294). i, Same as Fig. 4d for this 8-kbp genomic region of C. comes. P value determined by two-sided Mann–Whitney U-test. Boxplots: centre, median; box, IQR; whiskers, IQR × 1.5.

Extended Data Fig. 9 Detailed examples of SV replication.

Replication of deletion and variable regions depicted in Fig. 4 and Extended Data Fig. 8 between the Israeli (yellow) and Dutch Lifelines DEEP (blue) cohorts.

Extended Data Fig. 10 SV of A. hadrus associated with host risk factors.

a–c, Boxplot of waist circumference (a), BMI (b) and HDL cholesterol (c) in individuals of the Israeli cohort harbouring the 31-kbp deletion in the A. hadrus genome depicted in Fig. 4 (blue, n = 213) and individuals with no deletion (maroon, n = 468). d, Boxplot of BMI in individuals of the Dutch Lifelines DEEP cohort harbouring the same 31-kbp deletion in the A. hadrus genome (blue, n = 249) and individuals with no deletion (maroon, n = 547). P value determined by two-sided Mann–Whitney U-test. Boxplots: centre, median; box, IQR; whiskers, IQR × 1.5.

Supplementary information

Supplementary Information

This file contains Supplementary Note 1 (Validation of the Iterative Coverage-based Read Assignment (ICRA) Algorithm) and Supplementary Note 2 (Community metabolic potential (CMP) of a 31-kbp deletion-SV in A. hadrus).

Reporting Summary

Supplementary Table 1

Modules enriched and depleted in SVs. KEGG (ref. 23) modules enriched (p < 0.05) in variable-SVs (columns A-E), deletion-SVs (columns G-K) and conserved regions (columns M-Q). Each table records the KEGG module ID (‘KEGG ID’), module name (‘Name’), number of genes belonging to the module that were in each region type (‘Module genes in region’), number of genes in the module (‘Module genes count’), fold change as compared to non-SV regions of the genome (‘Fold change’), whether the module is enriched or depleted in SVs (‘isEnriched’; TRUE if enriched, FALSE if depleted) and two-sided permutation test p-value (‘p’; Methods). 167,389 genes were analyzed in total, of which 14,147, 34,372 and 112,343 were in variable-SVs, deletion-SVs and conserved regions, respectively.
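
The ‘Fold change’ and ‘p’ columns described above can be illustrated with a rough Python sketch. The exact procedure is given in the paper's Methods and is not reproduced here; this sketch assumes that fold change is the fraction of SV genes in a module divided by the fraction of non-SV genes in that module, and that the permutation test shuffles SV labels across genes. The function name, that permutation scheme and all counts are hypothetical.

```python
import random


def module_enrichment(in_sv, in_module, n_permutations=1000, seed=0):
    """Return (fold change, two-sided permutation p-value) for one module.

    in_sv[i] is True if gene i lies in an SV; in_module[i] is True if gene i
    belongs to the module. Both lists cover the same genes in the same order.
    """
    n = len(in_sv)
    sv_total = sum(in_sv)
    module_total = sum(in_module)
    observed = sum(1 for s, m in zip(in_sv, in_module) if s and m)
    outside = module_total - observed
    if sv_total in (0, n) or outside == 0:
        raise ValueError("fold change is undefined for these counts")
    fold = (observed / sv_total) / (outside / (n - sv_total))
    # Expected number of module genes in SVs if labels were assigned at random.
    expected = sv_total * module_total / n
    rng = random.Random(seed)
    labels = list(in_sv)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)
        hits = sum(1 for s, m in zip(labels, in_module) if s and m)
        # Two-sided: count permutations deviating at least as much as observed.
        if abs(hits - expected) >= abs(observed - expected):
            extreme += 1
    p = (extreme + 1) / (n_permutations + 1)
    return fold, p


# Hypothetical genome: 1,000 genes, 100 of them in SVs; a 40-gene module
# with 20 genes inside SVs and 20 outside.
in_sv = [True] * 100 + [False] * 900
in_module = [True] * 20 + [False] * 80 + [True] * 20 + [False] * 880
fold, p = module_enrichment(in_sv, in_module)
print(fold, p)
```

Because the number of SV genes is fixed under shuffling, ranking permutations by the count of module genes in SVs is equivalent to ranking them by fold change, which avoids division by zero in permutations where no module gene falls outside SVs.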

Supplementary Table 2

Deletion-SVs associated with growth rates of the harboring bacteria. Columns record harboring microbe (‘Microbe’; formatted as <NCBI taxonomy ID>.<NCBI bioproject accession>), SV (‘Region’), the difference in median PTR between microbes harboring the SV and those that do not (‘EffectSize’), number of samples where the given region was deleted (‘Samples with deletion’), number of samples where it was retained (‘Samples with retention’) and two-sided Mann-Whitney U p-value (‘p’). Only associations with p < 3 × 10⁻⁵ (FWER) are shown.

Supplementary Table 3

Genes on two E. eligens growth rate-associated SVs. Genes on E. eligens SVs negatively (columns A-F) and positively (columns H-M) associated with growth of E. eligens.

Supplementary Table 4

Genes on a 31-kbp deletion-SV in A. hadrus significantly associated with lower body weight, waist circumference, BMI, and higher HDL cholesterol.

Supplementary Table 5

NCBI taxonomy ID and bioproject accession for all microbial genomes in our reference database.

Supplementary Table 6

Growth specifications of microbial strains used for validation of ICRA.

Supplementary Table 7

Difference in the community metabolic potential (CMP) of compounds in subjects with a 31-kbp deletion-SV in A. hadrus (n=213) as compared to subjects with no deletion (n=468). p, two-sided Mann-Whitney U test; q, FDR-corrected p-value.
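
The FDR-corrected q-values in this table can be illustrated with a minimal Python sketch of the Benjamini–Hochberg procedure (ref. 45). That the table's q-values were produced by exactly this procedure is an assumption based on the reference list; the function name and the p-values below are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p to the smallest, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p-values for four compounds.
p_values = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(p_values))
```

Each q-value is the smallest false discovery rate at which the corresponding test would be called significant.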

Cite this article

Zeevi, D., Korem, T., Godneva, A. et al. Structural variation in the gut microbiome associates with host health. Nature 568, 43–48 (2019).
