A host–microbiota interactome reveals extensive transkingdom connectivity

Abstract

The myriad microorganisms that live in close association with humans have diverse effects on physiology, yet the molecular bases for these impacts remain mostly unknown (refs. 1–3). Classical pathogens often invade host tissues and modulate immune responses through interactions with human extracellular and secreted proteins (the ‘exoproteome’). Commensal microorganisms may also facilitate niche colonization and shape host biology by engaging host exoproteins; however, direct exoproteome–microbiota interactions remain largely unexplored. Here we developed and validated a novel technology, BASEHIT, that enables proteome-scale assessment of human exoproteome–microbiome interactions. Using BASEHIT, we interrogated more than 1.7 million potential interactions between 519 human-associated bacterial strains from diverse phylogenies and tissues of origin and 3,324 human exoproteins. The resulting interactome revealed an extensive network of transkingdom connectivity consisting of thousands of previously undescribed host–microorganism interactions involving 383 strains and 651 host proteins. Specific binding patterns within this network implied underlying biological logic; for example, conspecific strains exhibited shared exoprotein-binding patterns, and individual tissue isolates uniquely bound tissue-specific exoproteins. Furthermore, we observed dozens of unique and often strain-specific interactions with potential roles in niche colonization, tissue remodelling and immunomodulation, and found that strains with differing host interaction profiles had divergent interactions with host cells in vitro and effects on the host immune system in vivo. Overall, these studies expose a previously unexplored landscape of molecular-level host–microbiota interactions that may underlie causal effects of indigenous microorganisms on human health and disease.

Fig. 1: Assembling a host exoproteome–microbiome interaction atlas using BASEHIT.
Fig. 2: Organizational principles of human microbiome–host exoprotein interactions.
Fig. 3: Shared and divergent host exoprotein-binding patterns define distinct subsets of phylogenetically related bacterial strains.
Fig. 4: Exoprotein interactions imply key roles in bacterial colonization and disease modulation.
Fig. 5: Differential effects of exoprotein-binding and non-binding strains.

Data availability

All data supporting this study are included in the paper and its associated supplementary tables or deposited in publicly available databases. Source data are available for all figures (Figs. 1–5 and Extended Data Figs. 1–10). Raw BASEHIT sequence data were deposited and are available at the NCBI Sequence Read Archive with the BioProject identifier PRJNA1039280. Mapped barcode data have been deposited and are available at Zenodo (https://doi.org/10.5281/zenodo.10606150; ref. 51). RNA sequencing data and whole-genome sequences for Staphylococcus strains were also deposited and can be found at PRJNA1039280. Public databases used: bioBakery 3 (https://github.com/biobakery), Species Genome Bin (http://segatalab.cibio.unitn.it/data/Pasolli_et_al.html), ProTraits (http://protraits.irb.hr/), UniProt (https://www.uniprot.org/), Gene Ontology (https://geneontology.org/), protein physical properties (ref. 55) and the Human Protein Atlas (https://www.proteinatlas.org). Source data are provided with this paper.

Code availability

The custom code for the analysis of BASEHIT data has been deposited and is available at Zenodo (https://doi.org/10.5281/zenodo.10606150; ref. 51).

References

  1. Ruff, W. E., Greiling, T. M. & Kriegel, M. A. Host–microbiota interactions in immune-mediated diseases. Nat. Rev. Microbiol. 18, 521–538 (2020).

  2. Fan, Y. & Pedersen, O. Gut microbiota in human metabolic health and disease. Nat. Rev. Microbiol. 19, 55–71 (2021).

  3. Fischbach, M. A. Microbiome: focus on causation and mechanism. Cell 174, 785–790 (2018).

  4. Niemann, H. H., Schubert, W. D. & Heinz, D. W. Adhesins and invasins of pathogenic bacteria: a structural view. Microbes Infect. 6, 101–112 (2004).

  5. Poole, J., Day, C. J., von Itzstein, M., Paton, J. C. & Jennings, M. P. Glycointeractions in bacterial pathogenesis. Nat. Rev. Microbiol. 16, 440–452 (2018).

  6. Chatterjee, S., Basak, A. J., Nair, A. V., Duraivelan, K. & Samanta, D. Immunoglobulin-fold containing bacterial adhesins: molecular and structural perspectives in host tissue colonization and infection. FEMS Microbiol. Lett. 368, fnaa220 (2021).

  7. Foster, T. J., Geoghegan, J. A., Ganesh, V. K. & Hook, M. Adhesion, invasion and evasion: the many functions of the surface proteins of Staphylococcus aureus. Nat. Rev. Microbiol. 12, 49–62 (2014).

  8. Langley, R., Patel, D., Jackson, N., Clow, F. & Fraser, J. D. Staphylococcal superantigen super-domains in immune evasion. Crit. Rev. Immunol. 30, 149–165 (2010).

  9. Rooijakkers, S. H. & van Strijp, J. A. Bacterial complement evasion. Mol. Immunol. 44, 23–32 (2007).

  10. Okumura, R. et al. Lypd8 promotes the segregation of flagellated microbiota and colonic epithelia. Nature 532, 117–121 (2016).

  11. Gur, C. et al. Binding of the Fap2 protein of Fusobacterium nucleatum to human inhibitory receptor TIGIT protects tumors from immune cell attack. Immunity 42, 344–355 (2015).

  12. Walch, P. et al. Global mapping of Salmonella enterica–host protein–protein interactions during infection. Cell Host Microbe 29, 1316–1332.e12 (2021).

  13. Penn, B. H. et al. An Mtb–human protein–protein interaction map identifies a switch between host antiviral and antibacterial responses. Mol. Cell 71, 637–648.e5 (2018).

  14. Schweppe, D. K. et al. Host–microbe protein interactions during bacterial infection. Chem. Biol. 22, 1521–1530 (2015).

  15. Weimer, B. C., Chen, P., Desai, P. T., Chen, D. & Shah, J. Whole cell cross-linking to discover host–microbe protein cognate receptor/ligand pairs. Front. Microbiol. 9, 1585 (2018).

  16. Nicod, C., Banaei-Esfahani, A. & Collins, B. C. Elucidation of host–pathogen protein–protein interactions to uncover mechanisms of host cell rewiring. Curr. Opin. Microbiol. 39, 7–15 (2017).

  17. Martinez-Martin, N. Technologies for proteome-wide discovery of extracellular host–pathogen interactions. J. Immunol. Res. 2017, 2197615 (2017).

  18. Wood, L. & Wright, G. J. Approaches to identify extracellular receptor–ligand interactions. Curr. Opin. Struct. Biol. 56, 28–36 (2019).

  19. Wang, E. Y. et al. High-throughput identification of autoantibodies that target the human exoproteome. Cell Rep. Methods 2, 100172 (2022).

  20. Korotkova, N. et al. A subfamily of Dr adhesins of Escherichia coli bind independently to decay-accelerating factor and the N-domain of carcinoembryonic antigen. J. Biol. Chem. 281, 29120–29130 (2006).

  21. Berger, C. N., Billker, O., Meyer, T. F., Servin, A. L. & Kansau, I. Differential recognition of members of the carcinoembryonic antigen family by Afa/Dr adhesins of diffusely adhering Escherichia coli (Afa/Dr DAEC). Mol. Microbiol. 52, 963–983 (2004).

  22. Garrett, W. S. et al. Enterobacteriaceae act in concert with the gut microbiota to induce spontaneous and maternally transmitted colitis. Cell Host Microbe 8, 292–300 (2010).

  23. Brbic, M. et al. The landscape of microbial phenotypic traits and associated genes. Nucleic Acids Res. 44, 10074–10090 (2016).

  24. Jung, P. et al. Isolation and in vitro expansion of human colonic stem cells. Nat. Med. 17, 1225–1227 (2011).

  25. Lee, S. M. et al. Bacterial colonization factors control specificity and stability of the gut microbiota. Nature 501, 426–429 (2013).

  26. Van Rossum, T., Ferretti, P., Maistrenko, O. M. & Bork, P. Diversity within species: interpreting strains in microbiomes. Nat. Rev. Microbiol. 18, 491–506 (2020).

  27. Crost, E. H. et al. Utilisation of mucin glycans by the human gut symbiont Ruminococcus gnavus is strain-dependent. PLoS ONE 8, e76341 (2013).

  28. Hall, A. B. et al. A novel Ruminococcus gnavus clade enriched in inflammatory bowel disease patients. Genome Med. 9, 103 (2017).

  29. Kostic, A. D. et al. Genomic analysis identifies association of Fusobacterium with colorectal carcinoma. Genome Res. 22, 292–298 (2012).

  30. Castellarin, M. et al. Fusobacterium nucleatum infection is prevalent in human colorectal carcinoma. Genome Res. 22, 299–306 (2012).

  31. Kostic, A. D. et al. Fusobacterium nucleatum potentiates intestinal tumorigenesis and modulates the tumor–immune microenvironment. Cell Host Microbe 14, 207–215 (2013).

  32. Gur, C. et al. Fusobacterium nucleatum supresses anti-tumor immunity by activating CEACAM1. Oncoimmunology 8, e1581531 (2019).

  33. Abed, J. et al. Colon cancer-associated Fusobacterium nucleatum may originate from the oral cavity and reach colon tumors via the circulatory system. Front. Cell. Infect. Microbiol. 10, 400 (2020).

  34. Parhi, L. et al. Breast cancer colonization by Fusobacterium nucleatum accelerates tumor growth and metastatic progression. Nat. Commun. 11, 3259 (2020).

  35. Matsui, S. et al. Human Fat2 is localized at immature adherens junctions in epidermal keratinocytes. J. Dermatol. Sci. 48, 233–236 (2007).

  36. Jonca, N. et al. Corneodesmosomes and corneodesmosin: from the stratum corneum cohesion to the pathophysiology of genodermatoses. Eur. J. Dermatol. 21, 35–42 (2011).

  37. Johnson, N. C. XG: the forgotten blood group system. Immunohematology 27, 68–71 (2011).

  38. Uhlen, M. et al. Proteomics. Tissue-based map of the human proteome. Science 347, 1260419 (2015).

  39. Bourhis, E. et al. Wnt antagonists bind through a short peptide to the first β-propeller domain of LRP5/6. Structure 19, 1433–1442 (2011).

  40. Kahn, M. Can we safely target the WNT pathway? Nat. Rev. Drug Discov. 13, 513–532 (2014).

  41. Anastas, J. N. & Moon, R. T. WNT signalling pathways as therapeutic targets in cancer. Nat. Rev. Cancer 13, 11–26 (2013).

  42. Carvalheiro, T. et al. Leukocyte associated immunoglobulin like receptor 1 regulation and function on monocytes and dendritic cells during inflammation. Front. Immunol. 11, 1793 (2020).

  43. Weiskopf, K. et al. Engineered SIRPα variants as immunotherapeutic adjuvants to anticancer antibodies. Science 341, 88–91 (2013).

  44. Blondel, C. J. et al. CRISPR/Cas9 screens reveal requirements for host cell sulfation and fucosylation in bacterial type III secretion system-mediated cytotoxicity. Cell Host Microbe 20, 226–237 (2016).

  45. Sauer, M. M. et al. Catch-bond mechanism of the bacterial adhesin FimH. Nat. Commun. 7, 10738 (2016).

  46. Adrian, J., Bonsignore, P., Hammer, S., Frickey, T. & Hauck, C. R. Adaptation to host-specific bacterial pathogens drives rapid evolution of a human innate immune receptor. Curr. Biol. 29, 616–630.e5 (2019).

  47. Baker, E. P. et al. Evolution of host–microbe cell adherence by receptor domain shuffling. eLife 11, e73330 (2022).

  48. Xiang, H. et al. Crystal structures reveal the multi-ligand binding mechanism of Staphylococcus aureus ClfB. PLoS Pathog. 8, e1002751 (2012).

  49. The UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 49, D480–D489 (2021).

  50. Carpenter, B. et al. Stan: a probabilistic programming language. J. Stat. Softw. 76, 1–32 (2017).

  51. andrewGhazi/basehitmodel: basehitmodel-0.1.0. Zenodo https://doi.org/10.5281/zenodo.10606151 (2024).

  52. Pasolli, E. et al. Extensive unexplored human microbiome diversity revealed by over 150,000 genomes from metagenomes spanning age, geography, and lifestyle. Cell 176, 649–662.e20 (2019).

  53. The Gene Ontology Consortium. The Gene Ontology Resource: 20 years and still going strong. Nucleic Acids Res. 47, D330–D338 (2019).

  54. Zhou, X., Kao, M. C. & Wong, W. H. Transitive functional annotation by shortest-path analysis of gene expression data. Proc. Natl Acad. Sci. USA 99, 12783–12788 (2002).

  55. Wang, T. & Tang, H. The physical characteristics of human proteins in different biological functions. PLoS ONE 12, e0176234 (2017).

  56. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate — a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).

  57. Beghini, F. et al. Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3. eLife 10, e65088 (2021).

  58. Suzek, B. E. et al. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 31, 926–932 (2015).

  59. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).

  60. Asnicar, F. et al. Precise phylogenetic analysis of microbial isolates and genomes from metagenomes using PhyloPhlAn 3.0. Nat. Commun. 11, 2500 (2020).

  61. Segata, N., Bornigen, D., Morgan, X. C. & Huttenhower, C. PhyloPhlAn is a new method for improved phylogenetic and taxonomic placement of microbes. Nat. Commun. 4, 2304 (2013).

  62. Sukumaran, J. & Holder, M. T. DendroPy: a Python library for phylogenetic computing. Bioinformatics 26, 1569–1571 (2010).

Acknowledgements

We thank all members of the Palm, Ring and Huttenhower laboratories for helpful advice and assistance. This work was supported by a grant from the Leona M. and Henry B. Helmsley Charitable Trust (3083 to N.W.P. and A.M.R.). N.W.P. is additionally supported by an NIH Director’s New Innovator Award (DP2DK125119), the NIA and NIGMS (R01AG068863 and RM1GM141649), a Pew Scholar Award, the Chan Zuckerberg Initiative, Aligning Science Across Parkinson’s, F. Hoffmann-La Roche Ltd, and gifts from the Mathers Family Foundation and Ludwig Family Foundation. A.M.R. is additionally supported by an NIH Director’s Early Independence Award (DP5OD023088), a Pew-Stewart Scholar award, and gifts from the Mathers Family Foundation, the Ludwig Family Foundation and the Robert T. McCluskey Foundation. C.E.R. and N.D.S. were supported by the National Science Foundation Graduate Research Fellowship Program. The computations in this paper were run in part on the FASRC Cannon cluster supported by the FAS Division of Science Research Computing Group at Harvard University. Illustrations in Figs. 1a, 4a and 5a,c were generated with BioRender (https://biorender.com).

Author information

Contributions

C.E.R., N.D.S., N.W.P. and A.M.R. designed the study. C.E.R. and N.D.S. established the BASEHIT platform and performed BASEHIT screens. C.E.R., Y.D., S.F. and A.M.R. created the exoprotein yeast display library. A.R.G. developed the BASEHIT statistical model and performed associated analysis. E.A.F. performed the global network and phylogenetic analysis. N.D.S. and C.E.R. performed all other analyses. C.E.R., N.D.S., A.A.B. and Y.C. acquired and grew bacteria for BASEHIT screens. C.E.R., N.D.S., B.D.-L., J.A.G.-H., J.D.H. and T.A.R. contributed essential reagents for and performed orthogonal validations. N.D.S., B.D.-L. and J.A.G.-H. performed the in vitro functional experiments. N.D.S., Y.Y., M.T.N. and D.S. assessed potential phenotypes and performed the in vivo experiments. Y.Y. performed the whole-genome sequencing of Staphylococcus strains. C.G. and J.O. contributed Staphylococcus strains. A.L.M. assisted with the gnotobiotic mouse experiments. C.H., A.M.R. and N.W.P. supervised the study. C.E.R., N.D.S., A.R.G., E.A.F., C.H., A.M.R. and N.W.P. wrote the paper with input from all authors.

Corresponding authors

Correspondence to Aaron M. Ring or Noah W. Palm.

Ethics declarations

Competing interests

C.E.R., N.W.P. and A.M.R. are inventors of patents related to the BASEHIT technology and specific host–microorganism interactions discovered through BASEHIT. N.W.P. is a co-founder of Artizan Biosciences and Design Pharmaceuticals. All other authors declare no competing interests.

Peer review

Peer review information

Nature thanks Mikhail Savitski and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Yeast exoproteome library composition and diversity and bacterial strain collection composition and diversity.

a, Extracellular protein sequences are curated and cloned into a standardized backbone featuring a C-terminal epitope tag. Proper display is confirmed via epitope tag staining, as well as binding by conformation-specific antibodies or endogenous ligands for a subset of proteins. b, Schematic of expression construct used in the yeast display library. c, Proportion of the human exoproteome represented in the yeast display library. d, Each protein is represented by multiple barcodes, with a median of 20 barcodes per protein. Boxplot shows median, IQR, and whiskers extend to 1.5× IQR for n = 3,406 epitopes from 3,336 proteins in the library. e, Tissue expression (defined as Normalized Expression (NX) > 10 in the Human Protein Atlas) of proteins in the library, grouped by barrier, immune, and sterile tissues. f, Percentage of proteins in the library belonging to highly represented protein families. g, Number of strains from indicated genera, showing all genera with 9 or more strains. h, Number of strains from indicated species, showing all species with 5 or more strains. i, Number of strains from different body sites, showing all body sites with 5 or more strains.

Extended Data Fig. 2 BASEHIT optimization with AIEC identifies conditions that yield selectivity and specificity and are broadly specific across diverse known host-microbe interactions.

a, Enrichment of CD55 and CEACAM1 by AIEC using different bead:cell ratios. Enrichment is defined as the fold change in frequency of reads for the indicated protein in the post-selection library relative to the pre-selection library. Enrichment of both CD55 and CEACAM1 decreases with increasing cell:bead ratio. b, Enrichment of CD55 and CEACAM1 by AIEC labelled with variable concentrations of sulfo-NHS-biotin reagent. Increasing or decreasing concentrations of biotin decrease enrichment of CD55 and CEACAM1. c, Enrichment of CD55 and CEACAM1 by various E. coli strains with or without expression of Dr-family adhesins as indicated. CD55 and CEACAM1 are specifically enriched by the Dr-adhesin-containing AIEC strain. d, Exoproteome-wide host exoprotein binding pattern of AIEC determined by BASEHIT. CD55 and CEACAM1 are enriched substantially more than any other protein. Data in a,b represent the mean ± s.d. from n = 3 independent samples. e, Diverse bacterial strains with previously described interactions with human exoproteins were screened by BASEHIT and assessed for enrichment. Interactions that were successfully detected by BASEHIT are shown as filled circles, while interactions that BASEHIT failed to detect are shown as empty circles. The overall rate of detection of previously reported interactions (54%) is shown in the pie chart on the right.
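The enrichment definition in panel a (fold change in read frequency, post-selection relative to pre-selection) can be illustrated with a minimal Python sketch. The function name, the dictionary layout and the read counts below are hypothetical and chosen only for illustration; this is not the authors' pipeline, which models barcode-level counts statistically.

```python
def enrichment(pre_counts, post_counts, protein):
    """Fold change in read frequency for one protein (post / pre).

    pre_counts and post_counts map protein names to read counts.
    Both the layout and the names are assumptions for this sketch.
    """
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())
    pre_reads = pre_counts.get(protein, 0)
    post_reads = post_counts.get(protein, 0)
    if pre_reads == 0 or post_total == 0:
        raise ValueError("enrichment is undefined without pre-selection reads")
    # (post_reads / post_total) / (pre_reads / pre_total), rearranged
    # so that integer products are formed before the single division.
    return (post_reads * pre_total) / (pre_reads * post_total)


# Made-up counts, not data from the study.
pre = {"CD55": 10, "CEACAM1": 10, "OTHER": 980}
post = {"CD55": 200, "CEACAM1": 100, "OTHER": 700}
cd55_enrichment = enrichment(pre, post, "CD55")
ceacam1_enrichment = enrichment(pre, post, "CEACAM1")
```

With these invented counts, CD55 rises from 1% to 20% of reads, so its enrichment is 20-fold.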

Extended Data Fig. 3 Impacts of biotinylation and bacterial cell density on the detection of interactions via BASEHIT.

a, Four bacterial strains with differing interaction profiles were grown and labelled with a titration of biotin ranging from 50 nM to 500 µM and then screened by BASEHIT. The enrichments of each protein hit are shown across all conditions, along with the enrichments of two predicted inert proteins (the coronaviral spike protein 229E-S1 and the arylsulfatase ARSA), which serve as internal negative controls. The biotin concentration used for labelling in our large-scale screen (5 µM) is highlighted in teal. Across all tested interactions, 5 µM biotin exhibited enrichments within two-fold of the “optimal” condition, and no appreciable enrichment of inert proteins was observed under any conditions. Data represent the mean ± s.d. from n = 3 independent experiments. b, Five strains were screened via BASEHIT at bacterial amounts ranging from 50 µL of 0.25 OD/mL to 10 OD/mL per well. The enrichments of hits identified in the BASEHIT screen are shown, along with those of the predicted inert proteins 229E-S1 and ARSA. The density used in our large-scale BASEHIT screen, 5 OD/mL, is highlighted in each graph. Across all tested interactions, an input of 50 µL of 5 OD/mL provided enrichment within two-fold of the “optimal” condition, and no appreciable enrichment of inert proteins was observed under any conditions. The density of bacterial particles was determined via volumetric counts for 97 strains used in our large-scale BASEHIT screen (all strains were at ~5 OD/mL). The five strains selected approximated the lower and upper bounds of particle density (~1 × 10^7 to ~3 × 10^8 particles/mL).

Extended Data Fig. 4 Modelling and scoring procedure metrics.

a, A histogram of the protein barcode representation in the input library. The wide spread on the log10-scaled x-axis indicates a high degree of variability. The model accounts for this by using barcode input concentration as an offset term. Each tick mark across the x-axis below the histogram represents a protein. b, A Venn diagram showing interaction counts that pass each of the three hit-calling thresholds for the standard threshold set (95% interval excludes zero, estimated effect size > 0.5, and concordance score > 0.75). c, A plot of normalized counts demonstrating the utility of the concordance threshold. Both interactions shown have about the same interaction score (around 1.9) and similarly variable inputs in the Pre library (top panels), but the concordance between normalized output counts (bottom panels) in the TFF2:HM645 interaction is much higher than in SLC6A9:HM1171. Grey cells represent zero counts. d, A histogram of concordance scores for all interactions in the assay. Dashed vertical lines indicate the stringent and standard thresholds. e, Saturation curves from repeated rarefaction analysis. Given that both sets of thresholds have roughly plateaued, we can conclude that we have identified most of the interactions that are detectable under the experimental conditions. f, Comparison of the results of an initial run of the scoring method against five repeated runs where the standard deviation of the normal prior on interaction scores varied from 0.075 to 0.3. Each dot represents the score of a particular interaction. Only interactions that were a hit in at least one run are shown. The middle panel uses the same value as the initial run, showing the extent of Monte Carlo error. As expected, the rank and relative magnitude of scores are highly consistent between runs, while narrower priors lead to lower scores and fewer hits and wider priors lead to higher scores and more hits. The two distinct groups of interactions visible in the panels with wide priors represent subpopulations of interactions that are either more or less amenable to the zero-inflation component of the model.
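The three standard hit-calling thresholds listed in panel b can be combined as in the following minimal Python sketch. It assumes the interval bounds, effect size and concordance score have already been estimated elsewhere (in the study, by the deposited statistical model); the function and argument names are hypothetical, and the stringent threshold values are not given in this legend, so only the standard set appears as defaults.

```python
def is_hit(ci_low, ci_high, effect_size, concordance,
           min_effect=0.5, min_concordance=0.75):
    """Apply the three standard thresholds described in panel b.

    ci_low and ci_high are the bounds of the 95% interval of the
    interaction score. All argument names are illustrative assumptions.
    """
    interval_excludes_zero = ci_low > 0 or ci_high < 0
    return (interval_excludes_zero
            and effect_size > min_effect
            and concordance > min_concordance)


# Invented values: similar scores, different concordance (compare panel c).
concordant_call = is_hit(0.8, 3.0, 1.9, 0.90)
discordant_call = is_hit(0.8, 3.0, 1.9, 0.40)
```

An interaction must pass all three criteria, which is why two interactions with nearly identical scores can receive different calls when their concordance differs.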

Extended Data Fig. 5 Proteins from multiple tissues bind bacteria with a power-law distribution, and bacteria from different tissues or phyla show similar distributions of host protein binding.

a, Plot of number of bacterial strains bound (interaction called as a hit) for proteins expressed in multiple host tissues. Tissue expression is defined as Human Protein Atlas normalized expression NX > 10. b, Plot of number of proteins bound (interaction called as a hit) for all bacteria with hits as well as for all bacteria including non-binders. c, Same plot as b but depicting strains isolated from specific tissues. Maximum and mean reported for bacteria with one or more hits. d, Same plot as b but depicting strains from indicated phyla. Maximum and mean reported for bacteria with one or more hits.

Extended Data Fig. 6 Biophysical properties are significantly different between interacting and non-interacting proteins.

Proteins that bound at least one bacterial strain (“Targets”) are compared with “Non-targets” for various biophysical properties as indicated. FDR shown is for a two-tailed Wilcoxon Rank-Sum test. Box plots show median, IQR, and whiskers extending to 1.5× IQR, for n = 631 “Targets” and n = 2,705 “Non-targets”.
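The legend reports FDR values without naming the adjustment procedure. The reference list includes the Benjamini–Hochberg method (ref. 56), so the following minimal Python sketch shows that procedure as an assumption about how per-property P values might be adjusted. The P values are invented and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented P values, one per hypothetical biophysical property.
raw_p = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw_p)
```

Each adjusted value is the smallest FDR level at which the corresponding test would still be called significant.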

Extended Data Fig. 7 Relationships between similarity in strains’ interaction profiles and their phylogenetic distance.

a, We computed a phylogenetic tree over 108 genomes of tested strains based on ~400 broadly distributed protein families. We compared distances in this tree with similarity of strains’ interaction profiles using Spearman correlation (n = 5,565 strain pairs). Phylogenetic distance is expressed in units of amino acid substitutions per amino acid site. Interaction similarity was measured as the Jaccard overlap score between strains’ sets of human protein binding partners (ignoring strains with no binding partners). b, We separately considered the subset of n = 907 strain pairs with phylogenetic distance <0.02 substitutions per site, which was largely synonymous with a conspecific relationship in taxonomy. In both regimes, interaction similarity and phylogenetic distance were strongly and significantly negatively correlated. In both cases a two-tailed Mantel test with 10^4 permutations with FDR adjustments was performed.
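The interaction-similarity measure described in panel a can be sketched in a few lines of Python: a Jaccard overlap between two strains' sets of binding partners, computed for every strain pair after dropping strains with no partners. The strain names and protein sets below are invented, and the helper names are hypothetical. The subsequent correlation against phylogenetic distance (Spearman, with a Mantel permutation test in the study) is not reproduced here.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap between two non-empty sets of binding partners."""
    union = a | b
    if not union:
        raise ValueError("Jaccard overlap is undefined for two empty sets")
    return len(a & b) / len(union)


def pairwise_similarity(profiles):
    """Jaccard score for each strain pair, ignoring strains with no partners."""
    binders = {strain: hits for strain, hits in profiles.items() if hits}
    return {
        (s1, s2): jaccard(binders[s1], binders[s2])
        for s1, s2 in combinations(sorted(binders), 2)
    }


# Invented profiles: strain_C has no binding partners and is skipped.
profiles = {
    "strain_A": {"CD55", "CEACAM1"},
    "strain_B": {"CEACAM1", "CD7"},
    "strain_C": set(),
}
similarity = pairwise_similarity(profiles)
```

Here strain_A and strain_B share one of three distinct partners, giving a score of 1/3, and the pair list contains only strains with at least one hit.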

Extended Data Fig. 8 Superbinder Staphylococcus show highly overlapping sub-networks.

a, Network of 7 S. pasteuri and 8 other Staphylococcus superbinders, highlighted in green and orange respectively. The 5 proteins bound by the most strains are labelled. b, Overlap in interaction profiles across strains. Proteins are binned according to whether they are bound by more than half of the S. pasteuri strains (“Pasteuri core”), or by multiple or only one superbinder strains (“Multiple” and “Unique”, respectively). c, Top proteins bound by multiple superbinders. Overall interaction profiles of proteins bound by 7 or more superbinder strains are coloured according to the strains they recognize, including all other Staphylococcus strains as well as non-Staphylococcus strains. d, Interactions for skin-expressed proteins CDSN, FAT2, and XG for all 519 bacterial strains organized by tissue of origin. Dashed red line at 0.5 represents hit threshold.

Extended Data Fig. 9 Phylogenetic specificity of interactions with tissue-specific proteins across all tested strains.

The interaction scores for all 519 tested strains are shown for the indicated proteins, which are highlighted in Fig. 4a. Strains are colored by phylum, and all scores above the hit threshold line at 0.5 are indicated and labeled with the genus of the strain. Parentheses indicate the frequency of hits within a genus.

Extended Data Fig. 10 Ruminococcus gnavus and Fusobacterium strains influence host cell binding and function.

a, Representative flow cytometry plots of CD7-binding and non-binding R. gnavus strains labelling mock, CD7-, and CD55-expressing EXPI293 cells as shown in Fig. 5b. b, Representative flow cytometry plots of THP-1 phagocytosis of CFSE-labelled Fusobacterium spp. and of fluorescein-labelled E. coli K12 BioParticles incubated with unlabelled Fusobacterium spp. from Fig. 5d,e.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sonnert, N.D., Rosen, C.E., Ghazi, A.R. et al. A host–microbiota interactome reveals extensive transkingdom connectivity. Nature 628, 171–179 (2024). https://doi.org/10.1038/s41586-024-07162-0

  • DOI: https://doi.org/10.1038/s41586-024-07162-0
