Tracking microbial evolution in the human gut using Hi-C reveals extensive horizontal gene transfer, persistence and adaptation


Despite the importance of horizontal gene transfer for rapid bacterial evolution, reliable assignment of mobile genetic elements to their microbial hosts in natural communities such as the human gut microbiota is lacking. We used high-throughput chromosomal conformation capture coupled with probabilistic modelling of experimental noise to resolve 88 strain-level metagenome-assembled genomes of distal gut bacteria from two participants, including 12,251 accessory elements. Comparisons of two samples collected 10 years apart for each of the participants revealed extensive in situ exchange of accessory elements as well as evidence of adaptive evolution in core genomes. Accessory elements were predominantly promiscuous and prevalent in the distal gut metagenomes of 218 adult individuals. This research provides a foundation and approach for studying microbial evolution in natural environments.


Fig. 1: Genomic configuration space and an anchor–union representation.
Fig. 2: Genotyping complex microbial communities using Hi-C.
Fig. 3: Core and accessory divergence from species-level reference genomes.
Fig. 4: Attributes of accessory genes.
Fig. 5: Ten-year community evolution.
Fig. 6: Population-based perspective on accessory genes for the two participants.

Data availability

Unprocessed DNA sequence reads and recovered MAGs are available in the NCBI database under project PRJNA505354. MAGs can be downloaded from

Code availability

HPIPE is available for download as an open-source tool at


References

1. Soucy, S. M., Huang, J. & Gogarten, J. P. Horizontal gene transfer: building the web of life. Nat. Rev. Genet. 16, 472–482 (2015).
2. von Wintersdorff, C. J. H. et al. Dissemination of antimicrobial resistance in microbial ecosystems through horizontal gene transfer. Front. Microbiol. 7, 173 (2016).
3. Allen, H. K. et al. Call of the wild: antibiotic resistance genes in natural environments. Nat. Rev. Microbiol. 8, 251–259 (2010).
4. Smillie, C. S. et al. Ecology drives a global network of gene exchange connecting the human microbiome. Nature 480, 241–244 (2011).
5. Maiques, E. et al. β-Lactam antibiotics induce the SOS response and horizontal transfer of virulence factors in Staphylococcus aureus. J. Bacteriol. 188, 2726–2729 (2006).
6. Zhang, X. et al. Quinolone antibiotics induce Shiga toxin-encoding bacteriophages, toxin production, and death in mice. J. Infect. Dis. 181, 664–670 (2000).
7. Modi, S. R., Lee, H. H., Spina, C. S. & Collins, J. J. Antibiotic treatment expands the resistance reservoir and ecological network of the phage metagenome. Nature 499, 219–222 (2013).
8. Stecher, B. et al. Gut inflammation can boost horizontal gene transfer between pathogenic and commensal Enterobacteriaceae. Proc. Natl Acad. Sci. USA 109, 1269–1274 (2012).
9. Faith, J. J. et al. The long-term stability of the human gut microbiota. Science 341, 1237439 (2013).
10. Koonin, E. V. & Wolf, Y. I. Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world. Nucleic Acids Res. 36, 6688–6719 (2008).
11. Tettelin, H., Riley, D., Cattuto, C. & Medini, D. Comparative genomics: the bacterial pan-genome. Curr. Opin. Microbiol. 11, 472–477 (2008).
12. Nielsen, H. B. et al. Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes. Nat. Biotechnol. 32, 822–828 (2014).
13. Truong, D. T., Tett, A., Pasolli, E., Huttenhower, C. & Segata, N. Microbial strain-level population structure and genetic diversity from metagenomes. Genome Res. 27, 626–638 (2017).
14. Brown Kav, A. et al. Insights into the bovine rumen plasmidome. Proc. Natl Acad. Sci. USA 109, 5452–5457 (2012).
15. Jørgensen, T. S., Xu, Z., Hansen, M. A., Sørensen, S. J. & Hansen, L. H. Hundreds of circular novel plasmids and DNA elements identified in a rat cecum metamobilome. PLoS ONE 9, e87924 (2014).
16. Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods 11, 1144–1146 (2014).
17. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
18. Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).
19. Sexton, T. et al. Three-dimensional folding and functional organization principles of the Drosophila genome. Cell 148, 458–472 (2012).
20. Le, T. B. K., Imakaev, M. V., Mirny, L. A. & Laub, M. T. High-resolution mapping of the spatial organization of a bacterial chromosome. Science 342, 731–734 (2013).
21. Duan, Z. et al. A three-dimensional model of the yeast genome. Nature 465, 363–367 (2010).
22. Rao, S. S. P. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
23. Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013).
24. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
25. Marie-Nelly, H. et al. High-quality genome (re)assembly using chromosomal contact data. Nat. Commun. 5, 5695 (2014).
26. Putnam, N. H. et al. Chromosome-scale shotgun assembly using an in vitro method for long-range linkage. Genome Res. 26, 342–350 (2016).
27. Marbouty, M. et al. Metagenomic chromosome conformation capture (meta3C) unveils the diversity of chromosome organization in microorganisms. eLife 3, e03318 (2014).
28. Burton, J. N., Liachko, I., Dunham, M. J. & Shendure, J. Species-level deconvolution of metagenome assemblies with Hi-C-based contact probability maps. G3-Genes Genom. Genet. 4, 1339–1346 (2014).
29. Beitel, C. W. et al. Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products. PeerJ 2, e415–e419 (2014).
30. Marbouty, M., Baudry, L., Cournac, A. & Koszul, R. Scaffolding bacterial genomes and probing host-virus interactions in gut microbiome by proximity ligation (chromosome capture) assay. Sci. Adv. 3, e1602105 (2017).
31. Press, M. O. et al. Hi-C deconvolution of a human gut microbiome yields high-quality draft genomes and reveals plasmid-genome interactions. Preprint at (2017).
32. Stalder, T., Press, M. O., Sullivan, S., Liachko, I. & Top, E. M. Linking the resistome and plasmidome to the microbiome. ISME J. 13, 2437–2446 (2019).
33. Mukherjee, S. et al. Genomes OnLine Database (GOLD) v.6: data updates and feature enhancements. Nucleic Acids Res. 45, D446–D456 (2017).
34. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).
35. Kang, D. D., Froula, J., Egan, R. & Wang, Z. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ 3, e1165 (2015).
36. DeMaere, M. Z. & Darling, A. E. bin3C: exploiting Hi-C sequencing data to accurately resolve metagenome-assembled genomes. Genome Biol. 20, 46 (2019).
37. Duchêne, S. et al. Genome-scale rates of evolutionary change in bacteria. Microb. Genom. 2, e000094 (2016).
38. Puigbò, P., Lobkovsky, A. E., Kristensen, D. M., Wolf, Y. I. & Koonin, E. V. Genomes in turmoil: quantification of genome dynamics in prokaryote supergenomes. BMC Biol. 12, 66 (2014).
39. McDonald, J. H. & Kreitman, M. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351, 652–654 (1991).
40. Bishara, A. et al. High-quality genome sequences of uncultured microbes by assembly of read clouds. Nat. Biotechnol. 36, 1067–1075 (2018).
41. Kuleshov, V. et al. Synthetic long-read sequencing reveals intraspecies diversity in the human microbiome. Nat. Biotechnol. 34, 64–69 (2016).
42. Zhao, S. et al. Adaptive evolution within gut microbiomes of healthy people. Cell Host Microbe 25, 656–667 (2019).
43. Garud, N. R., Good, B. H., Hallatschek, O. & Pollard, K. S. Evolutionary dynamics of bacteria in the gut microbiome within and across hosts. PLoS Biol. 17, e3000102 (2019).
44. Sickle v.1.33 (GitHub, 2011);
45. SeqPrep (GitHub, 2011);
46. Schmieder, R. & Edwards, R. Fast identification and removal of sequence contamination from genomic and metagenomic datasets. PLoS ONE 6, e17288 (2011).
47. Li, D. et al. MEGAHIT v1.0: a fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods 102, 3–11 (2016).
48. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).
49. Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol. 5, R12 (2004).
50. Zhu, W., Lomsadze, A. & Borodovsky, M. Ab initio gene identification in metagenomic sequences. Nucleic Acids Res. 38, e132 (2010).
51. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
52. Benson, D. A. et al. GenBank. Nucleic Acids Res. 41, D36–D42 (2013).


Acknowledgements

We thank M. Kennedy for help in processing clinical samples and the members of the Relman and Holmes laboratories for discussion and feedback. This work was supported by NIH R01AI112401 and NIH R56AI147023 (D.A.R.), EMBO Long-Term Fellowship ALTF 772-2014 (E.Y.), the Chan Zuckerberg Biohub Microbiome Initiative (D.A.R.) and the Thomas C. and Joan M. Merigan Endowment at Stanford University (D.A.R.).

Author information

E.Y. and D.A.R. designed the study. E.Y. developed the methodology, performed and supervised the experiments, and performed the analysis. E.Y. and D.A.R. reviewed the analysis and wrote the manuscript.

Correspondence to David A. Relman.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Hi-C contact density as a function of linear distance.

Intra-contig read density as a function of the distance between mapped read sides, colored according to the relative strand orientation of the two read sides.

Extended Data Fig. 2 Validation on a simulated microbial community.

The genomes of 55 common gut microbes (GOLD database) were downloaded and 120 M simulated shotgun reads and 100 M simulated Hi-C reads were generated, with relative representation ranging from 1 to 1000. HPIPE identified 32 MAGs. Shown is the density plot of the relative abundance of the entire metagenomic assembly (contigs >1 kb), as in Fig. 1d. The abundance is the enrichment of the read coverage over a uniform distribution of reads. White/gray stripes denote chunks of 10 Mb. The fraction of the assembly that was included in any recovered MAG (‘anchored contigs’) is depicted with a red line.
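As a rough illustration of the abundance measure in this caption, the sketch below reads "enrichment of the read coverage over a uniform distribution of reads" as the per-contig read density divided by the assembly-wide read density. That reading, the function name `coverage_enrichment` and the input layout are assumptions made for this example; they are not taken from HPIPE.

```python
def coverage_enrichment(contigs):
    """Per-contig read-density enrichment over a uniform read distribution.

    contigs: dict mapping contig name -> (length_bp, mapped_reads).
    This input layout is hypothetical, chosen only for the example.
    """
    total_len = sum(length for length, _ in contigs.values())
    total_reads = sum(reads for _, reads in contigs.values())
    if total_len <= 0 or total_reads <= 0:
        raise ValueError("assembly must have positive length and read count")

    # Read density expected if reads were spread uniformly over the assembly.
    uniform_density = total_reads / total_len

    return {
        name: (reads / length) / uniform_density
        for name, (length, reads) in contigs.items()
    }


example = coverage_enrichment({"c1": (1000, 300), "c2": (3000, 100)})
```

Under this reading, a value above 1 marks a contig covered more densely than a uniform spread of reads would produce, and a value below 1 marks a contig covered less densely.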

Extended Data Fig. 3 Validation on a synthetic microbial community.

The community was composed of Pediococcus pentosaceus (ATCC 25745), Lactobacillus brevis (ATCC 367), Burkholderia thailandensis (E264) and two strains of Escherichia coli (BL21 and K-12), as described in Beitel et al. (ref. 29). The pipeline recovered 4 anchor/union pairs. Shown is a pairwise gene alignment between the 4 inferred MAGs (genome unions) and the 5 reference genomes.

Extended Data Fig. 4 Contig-anchor contact enrichments over all anchors.

On the x-axis is the observed number of contacts between the contig and the anchor, and on the y-axis is the enrichment score over the background model. Anchor contigs are colored red, contigs belonging to other anchors are colored blue, and all other contigs are colored gray. Anchors are extended into MAGs (genome unions) by including contigs with ≥10-fold contact enrichment (dashed horizontal line), ≥8 contacts (dashed vertical line), and a false-positive probability of 10⁻⁶ assuming a binomial distribution (transition between vertical and horizontal line).
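The three inclusion criteria in this caption can be sketched as a single filter. This is a minimal illustration and not the HPIPE implementation. It assumes that the false-positive probability is the binomial upper tail P(X ≥ observed), that the criterion is met when this tail is at or below the stated value, and that the background model can be reduced to one per-contact probability. The names `binom_upper_tail`, `joins_union`, `n_contacts` and `background_p` are hypothetical.

```python
import math


def binom_upper_tail(k_obs, n, p):
    """P(X >= k_obs) for X ~ Binomial(n, p), with each term computed in log space."""
    total = 0.0
    for k in range(k_obs, n + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def joins_union(observed, n_contacts, background_p,
                min_enrichment=10.0, min_contacts=8, max_fp=1e-6):
    """Return True if a contig passes all three thresholds from the caption.

    observed     : contacts seen between the contig and the anchor
    n_contacts   : total contacts of the contig (assumed number of binomial trials)
    background_p : assumed background probability that one contact hits the anchor
    """
    if not 0.0 < background_p < 1.0:
        raise ValueError("background_p must lie strictly between 0 and 1")
    expected = n_contacts * background_p
    enrichment = observed / expected
    return (
        observed >= min_contacts
        and enrichment >= min_enrichment
        and binom_upper_tail(observed, n_contacts, background_p) <= max_fp
    )
```

For example, `joins_union(40, 200, 0.01)` passes all three thresholds (20-fold enrichment, 40 contacts and a negligible tail probability), whereas `joins_union(5, 200, 0.001)` fails on the contact count alone even though its enrichment is 25-fold.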

Extended Data Fig. 5 Examples of 2 putative novel MAGs.

On top, 68% of the genes of MAG a27 align to the Ruminococcaceae family (mean identity 74.3%), suggesting it is a novel species in that family. On the bottom, 88% of the genes of MAG a70 align to the Clostridiales order (mean identity 74.5%), indicating it is a novel genome within Lachnospiraceae or Eubacteriaceae. Each taxon is colored according to the mean amino acid identity, and the colored fraction of each rectangle represents the percentage of the aligned genes.

Extended Data Fig. 6 Comparison of HPIPE to alternative metagenomic binning methods.

Single-copy gene estimates of genome completeness percentage (in black) and contamination percentage (in red) with HPIPE, metaBAT2 and bin3C, sorted according to completeness. Minimal completeness (50%) and maximal contamination (10%) thresholds are depicted with dashed horizontal lines. Our results (HPIPE, as in Fig. 2c) are compared to metaBAT2 (a tool based on abundance and tetranucleotide frequency) and bin3C (a tool based on clustering of Hi-C data).

Extended Data Fig. 7 Comparison of anchors and cores.

(a) Shown for all 44 MAGs (genome unions) is the breakdown of genes into ‘core-only’, ‘anchor-only’, ‘both’ or ‘neither’, sorted according to the ‘both’ fraction. (b) The fraction of the 4 gene classifications, colored as in (a), is averaged over all 44 MAGs. Core-only genes (29%) are present due to the stringent selection of anchors, which considers only long contigs (>10 kb).

Extended Data Fig. 8 Species-level reference genomes for participant B.

Shown are the core and accessory fractions for the 44 MAGs that had a species-level reference for participant B. For both the recovered MAGs (left) and the matching species-level reference genomes (right), the core fraction is depicted using a colored rectangle, and the accessory fraction (that is, strain-specific genes) is depicted using a gray rectangle. Cores are colored according to genome similarity (nucleotide sequence identity) between MAG cores and matching reference cores.

Extended Data Fig. 9 Polymorphism and 10-year divergence patterns for participant B.

(a) Polymorphism levels, estimated using the density of intermediate alleles (SNPs with a frequency in the range 20–80%), are shown for 35 MAGs of participant B that had at least 10× coverage. (b) Host classification for the 44 MAGs of participant B. (c) The distribution among element classes, stratified according to element type (shared and non-shared). Data are normalized so that each type sums to 100%. (d) The distribution among element classes, stratified according to host class. Data are normalized so that each host class sums to 100%. Standard deviations are depicted using error bars.
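The polymorphism estimate in panel (a) can be illustrated with a short sketch. It assumes that a per-site list of allele frequencies is available for a MAG and that the density is normalized by the number of sites examined. The caption does not specify the input layout, the normalization or whether the range bounds are inclusive, so all three are assumptions, and the function name is hypothetical.

```python
def intermediate_allele_density(allele_freqs, n_sites, low=0.2, high=0.8):
    """Fraction of examined sites that carry an intermediate-frequency allele.

    allele_freqs : allele frequencies (0..1) at the variable sites of one MAG
    n_sites      : number of sites examined (assumed normalization)
    low, high    : frequency range counted as intermediate (20-80% in the caption),
                   treated here as inclusive
    """
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    n_intermediate = sum(1 for f in allele_freqs if low <= f <= high)
    return n_intermediate / n_sites


# Hypothetical input: five variable sites out of 1,000 examined sites.
density = intermediate_allele_density([0.05, 0.2, 0.5, 0.8, 0.95], 1000)
```

In this toy input, three of the five frequencies fall inside the range, so the density is 3 per 1,000 sites.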

Extended Data Fig. 10 Attributes of the 12 MAGs classified as persistent over the 10-year period.

Columns indicate the participant (or Subject) in whom the MAG was found, the number of non-persistent accessory genes (HGT column), the number of non-synonymous (#Pn) and synonymous (#Ps) sites that were polymorphic within the genotyped sample, and the number of non-synonymous (#Dn) and synonymous (#Ds) sites that were divergent between the genotyped sample and the 10-year sample. Matching site densities (Pn, Ps, Dn and Ds) equal the number of sites divided by the total number of sites of each type (synonymous or non-synonymous). P-values are for the McDonald–Kreitman test (χ²), which examines whether the ratios Pn/Ps and Dn/Ds differ significantly.
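A χ² form of the McDonald–Kreitman comparison can be sketched on the 2×2 table of site counts (#Dn, #Ds, #Pn, #Ps). The sketch uses the standard Pearson statistic with one degree of freedom and no continuity correction. Whether the authors applied a correction is not stated in the caption, so that choice is an assumption, and the counts in the example are hypothetical.

```python
import math


def mk_chi2(dn, ds, pn, ps):
    """Pearson chi-square test on the 2x2 McDonald-Kreitman table.

    Rows: divergent (dn, ds) and polymorphic (pn, ps) site counts.
    Returns (chi2, p_value) for 1 degree of freedom, no continuity correction.
    """
    n = dn + ds + pn + ps
    row1, row2 = dn + ds, pn + ps
    col1, col2 = dn + pn, ds + ps
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")

    chi2 = n * (dn * ps - ds * pn) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, for illustration only.
chi2, p_value = mk_chi2(dn=20, ds=5, pn=5, ps=20)
```

When Dn/Ds exceeds Pn/Ps, as in this toy table, a small P-value is commonly read as an excess of non-synonymous divergence and therefore as a signal of adaptive evolution.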

Supplementary information

Supplementary Information

Supplementary Notes 1 and 2, and Supplementary Figs. 1–6.

Reporting Summary

Supplementary Table 1

This table includes three Supplementary Tables: Supplementary Table 1 contains gut genomes used for simulated data; Supplementary Table 2 contains public gut microbiome samples used in this study; and Supplementary Table 3 contains gene ontology enrichment tables a–e.


About this article


Cite this article

Yaffe, E., Relman, D.A. Tracking microbial evolution in the human gut using Hi-C reveals extensive horizontal gene transfer, persistence and adaptation. Nat Microbiol 5, 343–353 (2020).
