Abstract
Bacteriophages, viruses that infect bacteria, have great specificity for their bacterial hosts at the strain and species level. However, the relationship between the phageome and associated bacterial population dynamics is unclear. Here we generated a computational pipeline to identify sequences associated with bacteriophages and their bacterial hosts in cell-free DNA from plasma samples. Analysis of two independent cohorts, including a Stanford Cohort of 61 septic patients and 10 controls and the SepSeq cohort of 224 septic patients and 167 controls, reveals a circulating phageome in the plasma of all sampled individuals. Moreover, infection is associated with overrepresentation of pathogen-specific phages, allowing for identification of bacterial pathogens. We find that information on phage diversity enables identification of the bacteria that produced these phages, including pathovariant strains of Escherichia coli. Phage sequences can likewise be used to distinguish between closely related bacterial species such as Staphylococcus aureus, a frequent pathogen, and coagulase-negative Staphylococcus, a frequent contaminant. Phage cell-free DNA may have utility in studying bacterial infections.
Data availability
Sequencing data with human reads removed have been deposited into NCBI SRA under BioProject PRJNA860730. This study also used publicly available data: the SepSeq study data were previously published under BioProject PRJNA507824. No new computational tools were developed as part of this study. The INPHARED v1.7 database was downloaded and used for analyses in this study (https://github.com/RyanCook94/inphared). Infection aetiology metadata associated with samples sequenced for this study are included in the Stanford sepsis cohort sheet of Supplementary Data. The CPD FASTA file used for creating the BLAST database is publicly available at https://doi.org/10.5281/zenodo.7154236. The Phage dictionary and Coliphage dictionary are additionally available as sheets in Supplementary Data. All associated supplementary files have additionally been made publicly available at https://doi.org/10.5281/zenodo.7644125.
Code availability
The R code used to summarize BLAST phageome annotations with the CPD has been made publicly available at https://doi.org/10.5281/zenodo.7734114. This includes an R markdown file detailing processing of BLAST outputs to create phage hit tables across all samples, and subsequent use of the CPD to summarize representation of phage taxonomic families and known bacterial host characteristics. A phage hit table for our sequenced samples is available along with this R code and can be used to re-create phage summary tables as well as for calculation of diversity using the R package ‘vegan’. Processing of raw data, removal of human reads, and BLAST annotations were done using existing software and are described in the relevant Methods sections.
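To illustrate the kind of summarization described here, the following Python sketch builds a per-sample phage hit table from BLAST tabular output and computes a Shannon diversity index from it. It is not the authors' R code. The input layout (tab-separated lines with the phage identifier in the second column, as in BLAST tabular output), the helper names `build_hit_table` and `shannon_diversity`, and the example identifiers are all assumptions made for illustration. The index uses the natural logarithm, which is also the default base in vegan's `diversity` function.

```python
import math
from collections import Counter


def build_hit_table(blast_outputs):
    """Count hits per phage identifier for each sample.

    blast_outputs maps a sample name (hypothetical) to an iterable of
    tab-separated lines whose second column is assumed to hold the
    subject (phage) identifier. Every row is counted as one hit; a real
    pipeline might first keep only the best hit per read.
    """
    table = {}
    for sample, lines in blast_outputs.items():
        counts = Counter()
        for line in lines:
            fields = line.rstrip("\n").split("\t")
            # Skip comment lines and rows too short to carry a subject ID.
            if line.startswith("#") or len(fields) < 2:
                continue
            counts[fields[1]] += 1
        table[sample] = counts
    return table


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over phages with hits."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (n / total) * math.log(n / total)
        for n in counts.values()
        if n > 0
    )


if __name__ == "__main__":
    # Toy input with made-up read and phage names.
    example = {
        "sample_A": [
            "read1\tphage_X\t98.5",
            "read2\tphage_X\t97.0",
            "read3\tphage_Y\t99.1",
            "read4\tphage_Y\t96.2",
        ],
        "sample_B": ["read1\tphage_X\t99.0"],
    }
    hit_table = build_hit_table(example)
    for sample, counts in hit_table.items():
        print(sample, dict(counts), round(shannon_diversity(counts), 4))
```

A sample whose hits are spread evenly over more phages yields a higher index, whereas a sample dominated by a single phage yields an index near zero.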
Acknowledgements
We thank T. Blauwkamp (Karius Inc.), S. Bercovici (Karius Inc.) and N. Noll (Karius Inc.) for their assistance providing additional metadata for the SepSeq dataset. We thank the funding sources supporting this work: NIH R01HL148184-01 (P.L.B.), NIH R01AI12492093 (P.L.B.), NIH R01DC019965 (P.L.B.), Cystic Fibrosis Foundation (P.L.B.), grant from the Emerson Collective (P.L.B.), NSF GRFP (N.L.H.), NIH T32HL129970-06 (L.J.B.), NIH R01AI148623 (A.S.B.), NIH R01AI143757 (A.S.B.), Stand Up 2 Cancer grant (A.S.B.), the Allen Distinguished Investigator Award (A.S.B.), NIH R21GM147838 (S.Y. and P.L.B.), NIH R01AI153133 (S.Y.), NIH R01AI137272 (S.Y.) and NIH R01AI138978 (S.Y.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author information
Contributions
N.L.H., L.J.B., N.R.-M., G.K., S.Y. and P.L.B. designed the study. N.L.H., L.J.B. and N.R.-M. performed experiments. N.L.H., L.J.B., G.K., N.R.-M., S.Y., A.S.B., C.Y.C. and P.L.B. analysed data. N.L.H., L.J.B. and P.L.B. wrote the manuscript.
Ethics declarations
Competing interests
A.S.B. has consulted for biomX and is on the scientific advisory boards of ArcBio and Caribou Biosciences. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Microbiology thanks Bryan Kraft, Evelien Adriaenssens, Jeremy Barr, Paul Turner and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Non-Human reads in Asymptomatic and Septic individuals.
(A) Average proportion of bacterial hit genus in asymptomatic (N = 10) and septic (N = 61) nonhuman cfDNA as identified by BLAST search. (B) Violin plots of proportions of non-human read identities by BLAST search from both asymptomatic (N = 10) and septic (N = 61) individuals. Descriptive statistics are available in Extended Data Table 1. (C) Proportions of bacteriophage hits removed in secondary human sequence homolog removal step (mean 0.042 SD 0.038, N = 71). (D) Average distribution of unique phages by bacterial host genus with and without secondary human sequence homology removal (N = 71). Uncleaned Hits, mean proportions and SD (Pseudomonas: mean 0.029 SD 0.30, Enterobacter mean 0.021 SD 0.015, Escherichia mean 0.042 SD 0.038, Klebsiella mean 0.015 SD 0.020, Salmonella mean 0.013 SD 0.009, Not Annotated mean 0.603 SD 0.070, Staphylococcus mean 0.009 SD 0.011, Streptococcus mean 0.014 SD 0.036, Enterococcus mean 0.001 SD 0.002, Bacillus mean 0.006 SD 0.007, Other mean 0.245 SD 0.052), Cleaned Hits, mean proportions and SD (Pseudomonas: mean 0.047 SD 0.058, Enterobacter mean 0.050 SD 0.058, Escherichia mean 0.076 SD 0.065, Klebsiella mean 0.023 SD 0.036, Salmonella mean 0.014 SD 0.016, Not Annotated mean 0.35 SD 0.100, Staphylococcus mean 0.020 SD 0.040, Streptococcus mean 0.032 SD 0.076, Enterococcus mean 0.002 SD 0.004, Bacillus mean 0.012 SD 0.015, Other mean 0.373 SD 0.123). All violin plots are shown with individual data points with median and quartiles shown by dashed lines.
Extended Data Fig. 2 Bacterial host distribution does not change in sepsis, though individual variation remains.
(A) Violin plot of bacteriophage host genus proportions between Asymptomatic (N = 10) and Septic (N = 61) patient samples; associated statistics are in Extended Data Table 2. (B) Heatmap of Pearson dissimilarity matrix between patients with sepsis (N = 61). (C) Heatmap of Pearson dissimilarity matrix between asymptomatic controls (N = 10). (D) Histogram of prevalence across sequenced samples of each phage. (E) Phage bacterial host genus proportions per asymptomatic patient (N = 10). (F) Phage bacterial host genus proportions per septic patient (N = 61).
Extended Data Fig. 3 Proportion of ‘Not Annotated’ Phages from CHVD Gut Metagenome Phages.
(A) Proportion of ‘Not Annotated’ Phages from Gut Metagenome Phages in Stanford Sepsis Cohort (Asymptomatic mean: 0.413, SD: 0.060, N = 10. Sepsis mean: 0.300, SD: 0.119, N = 61. Unpaired two-sided t test, P = 0.0046). (B) Proportion of ‘Not Annotated’ Phages from Gut Metagenome Phages in SepSeq cohort (Asymptomatic mean: 0.464, SD: 0.271, N = 10. Sepsis mean: 0.418, SD: 0.29, N = 61. Unpaired two-sided t test, P = 0.101).
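The comparison in panel A can be partially reproduced from the reported summary statistics alone. The sketch below computes a pooled-variance (Student's) two-sample t statistic and its degrees of freedom. The caption does not state whether equal variances were assumed, so the pooled form is an assumption, and `pooled_t_from_summary` is a hypothetical helper name. Converting the statistic to a P value needs a t-distribution function from a statistics library, which is omitted to keep the example dependency-free.

```python
import math


def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic from group means, SDs and sizes.

    Assumes equal variances in both groups (pooled variance).
    Returns (t statistic, degrees of freedom).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


if __name__ == "__main__":
    # Panel A values as given in the caption: asymptomatic vs sepsis.
    t_stat, df = pooled_t_from_summary(0.413, 0.060, 10, 0.300, 0.119, 61)
    print(f"t = {t_stat:.2f}, df = {df}")
```

Because the inputs are rounded values from the caption, the result is only an approximation of the statistic underlying the reported P value.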
Extended Data Fig. 4 Number of E. coli phages by genetically classified taxonomic phage family.
(A) Sankey diagram of taxonomic family classifications from NCBI Taxonomy classification (Left) to genetically classified family (Right). (B) Number of E. coli phages by genetically classified taxonomic phage family, tested by Brown-Forsythe and Welch ANOVA test with two-sided Dunnett’s T3 multiple comparisons test for Asymptomatic (N = 166), SIRS (N = 95), Other Sepsis (N = 55) and E. coli Sepsis (N = 36).
Extended Data Fig. 5 E. coli Phage Host Characteristic Proportions.
(A) Proportion of E. coli phage host characteristics in violin plots with individual data points with median and quartiles shown by dashed lines. Analyzed by two-way ANOVA with Sidak’s multiple comparisons only in samples with E. coli phage for Asymptomatic (N = 100), SIRS (N = 62), Other Sepsis (N = 36) and E. coli Sepsis (N = 36) patients. Phage characteristic source of variation P = 7.93E-292, Patient category source of variation P > 0.99, Interaction of Phage category and patient category P = 1.75E-50. Lab Strain Associated Phage (Mean: Asymptomatic 0.69, SIRS 0.58, Other Sepsis 0.45, E. coli Sepsis 0.36. Multiple comparisons: E. coli Sepsis vs: Asymptomatic P = 4.37E-33, SIRS P = 4.45E-13, Other Sepsis P = 0.03. Other Sepsis vs: SIRS P = 5.88E-5, Asymptomatic P = 2.19E-18. Asymptomatic vs SIRS P = 1.80E-6), Unspecified Host Associated Phage (Mean: Asymptomatic 0.18, SIRS 0.17, Other Sepsis 0.27, E. coli Sepsis 0.17. Multiple comparisons: E. coli Sepsis vs: Asymptomatic P > 0.99, SIRS P > 0.99, Other Sepsis P = 0.01. Other Sepsis vs: SIRS P = 2E-3, Asymptomatic P = 3E-3. Asymptomatic vs SIRS P = 0.994), STEC Associated Phage (Mean: Asymptomatic 0.05, SIRS 0.14, Other Sepsis 0.13, E. coli Sepsis 0.31. Multiple comparisons: E. coli Sepsis vs: Asymptomatic P = 4.76E-22, SIRS P = 1.19E-8, Other Sepsis P = 5.22E-8. Other Sepsis vs: SIRS P = 0.998, Asymptomatic P = 0.02. Asymptomatic vs SIRS P = 1.79E-4), ETEC Associated Phage (Mean: Asymptomatic 0.03, SIRS 0.04, Other Sepsis 0.03, E. coli Sepsis 0.05. Multiple comparisons: E. coli Sepsis vs: Asymptomatic P = 0.85, SIRS P = 0.99, Other Sepsis P = 0.98. Other Sepsis vs: SIRS P > 0.99, Asymptomatic P > 0.99. Asymptomatic vs SIRS P > 0.99), EPEC Associated Phage (Mean: Asymptomatic 3.67E-3, SIRS 0.01, Other Sepsis 6.78E-4, E. coli Sepsis 0.01. Multiple comparisons: E. coli Sepsis vs: Asymptomatic P > 0.99, SIRS P > 0.99, Other Sepsis P > 0.99. Other Sepsis vs: SIRS P > 0.99, Asymptomatic P > 0.99. Asymptomatic vs SIRS P > 0.99), ExPEC Associated Phage (Mean: Asymptomatic 0.02, SIRS 0.04, Other Sepsis 0.04, E. coli Sepsis 0.05. Multiple comparisons: E. coli Sepsis vs: Asymptomatic P = 0.89, SIRS P > 0.99, Other Sepsis P > 0.99. Other Sepsis vs: SIRS P > 0.99, Asymptomatic P = 0.99. Asymptomatic vs SIRS P > 0.99), Sewage/Manure/Water Associated Phage (Mean: Asymptomatic 0.02, SIRS 0.03, Other Sepsis 0.07, E. coli Sepsis 0.03. Multiple comparisons: E. coli Sepsis vs: Asymptomatic P > 0.99, SIRS P > 0.99, Other Sepsis P = 0.73. Other Sepsis vs: SIRS P = 0.66, Asymptomatic P = 0.33. Asymptomatic vs SIRS P > 0.99).
Supplementary information
Supplementary Data
This file contains the following sheets: Phage dictionary, Coliphage dictionary, Stanford sepsis cohort (infection metadata), negative control summary, PBS control BLAST hits and water control BLAST hits.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Haddock, N.L., Barkal, L.J., Ram-Mohan, N. et al. Phage diversity in cell-free DNA identifies bacterial pathogens in human sepsis cases. Nat Microbiol 8, 1495–1507 (2023). https://doi.org/10.1038/s41564-023-01406-x
DOI: https://doi.org/10.1038/s41564-023-01406-x