The gut of healthy human neonates is usually devoid of viruses at birth, but quickly becomes colonized, which, in some cases, leads to gastrointestinal disorders [1–4]. Here we show that the assembly of the viral community in neonates takes place in distinct steps. Fluorescent staining of virus-like particles purified from infant meconium or early stool samples shows few or no particles, but by one month of life particle numbers increase to 10⁹ per gram, and these numbers seem to persist throughout life [5–7]. We investigated the origin of these viral populations using shotgun metagenomic sequencing of virus-enriched preparations and whole microbial communities, followed by targeted microbiological analyses. Results indicate that, early after birth, pioneer bacteria colonize the infant gut and by one month prophages induced from these bacteria provide the predominant population of virus-like particles. By four months of life, identifiable viruses that replicate in human cells become more prominent. Multiple human viruses were more abundant in stool samples from babies who were exclusively fed on formula milk compared with those fed partially or fully on breast milk, paralleling reports that breast milk can be protective against viral infections [8–10]. Bacteriophage populations also differed depending on whether or not the infant was breastfed. We show that the colonization of the infant gut is stepwise, first mainly by temperate bacteriophages induced from pioneer bacteria, and later by viruses that replicate in human cells; this second phase is modulated by breastfeeding.
Sample information and raw sequences are available in the National Center for Biotechnology Information Sequence Read Archive under BioProject ID PRJNA524703 (Supplementary Table 8). The isolated bacterial genome sequences have been deposited at DDBJ/ENA/GenBank under the accession numbers WVTF00000000–WVUC00000000 (Supplementary Table 3).
All bioinformatic scripts are available on GitHub (https://github.com/guanxiangliang/liang2019).
Breitbart, M. et al. Viral diversity and dynamics in an infant gut. Res. Microbiol. 159, 367–373 (2008).
Lim, E. S. et al. Early life dynamics of the human gut virome and bacterial microbiome in infants. Nat. Med. 21, 1228–1234 (2015).
Liu, L. et al. Global, regional, and national causes of under-5 mortality in 2000–15: an updated systematic analysis with implications for the Sustainable Development Goals. Lancet 388, 3027–3035 (2016).
Oude Munnink, B. B. & van der Hoek, L. Viruses causing gastroenteritis: the known, the new and those beyond. Viruses 8, 42 (2016).
Kim, M. S., Park, E. J., Roh, S. W. & Bae, J. W. Diversity and abundance of single-stranded DNA viruses in human feces. Appl. Environ. Microbiol. 77, 8062–8070 (2011).
Lepage, P. et al. Dysbiosis in inflammatory bowel disease: a role for bacteriophages? Gut 57, 424–425 (2008).
Hoyles, L. et al. Characterization of virus-like particles associated with the human faecal and caecal microbiota. Res. Microbiol. 165, 803–812 (2014).
Bahl, R. et al. Infant feeding patterns and risks of death and hospitalization in the first half of infancy: multicentre cohort study. Bull. World Health Organ. 83, 418–426 (2005).
Arifeen, S. et al. Exclusive breastfeeding reduces acute respiratory infection and diarrhea deaths among infants in Dhaka slums. Pediatrics 108, e67 (2001).
Victora, C. G. et al. Infant feeding and deaths due to diarrhea. A case–control study. Am. J. Epidemiol. 129, 1032–1041 (1989).
Aagaard, K. et al. The placenta harbors a unique microbiome. Sci. Transl. Med. 6, 237ra65 (2014).
Lauder, A. P. et al. Comparison of placenta samples with contamination controls does not provide evidence for a distinct placenta microbiota. Microbiome 4, 29 (2016).
Theis, K. R. et al. Does the human placenta delivered at term have a microbiota? Results of cultivation, quantitative real-time PCR, 16S rRNA gene sequencing, and metagenomics. Am. J. Obstet. Gynecol. 220, 267.E1–267.E39 (2019).
de Goffau, M. C. et al. Human placenta has no microbiome but can contain potential pathogens. Nature 572, 329–334 (2019).
Baumann-Dudenhoeffer, A. M., D’Souza, A. W., Tarr, P. I., Warner, B. B. & Dantas, G. Infant diet and maternal gestational weight gain predict early metabolic maturation of gut microbiomes. Nat. Med. 24, 1822–1829 (2018).
Reyes, A. et al. Viruses in the faecal microbiota of monozygotic twins and their mothers. Nature 466, 334–338 (2010).
Minot, S., Grunberg, S., Wu, G. D., Lewis, J. D. & Bushman, F. D. Hypervariable loci in the human gut virome. Proc. Natl Acad. Sci. USA 109, 3962–3966 (2012).
Reyes, A., Semenkovich, N. P., Whiteson, K., Rohwer, F. & Gordon, J. I. Going viral: next-generation sequencing applied to phage populations in the human gut. Nat. Rev. Microbiol. 10, 607–617 (2012).
Aggarwala, V., Liang, G. & Bushman, F. D. Viral communities of the human gut: metagenomic analysis of composition and dynamics. Mob. DNA 8, 12 (2017).
Wolf, Y. I. et al. Origins and evolution of the global RNA virome. mBio 9, e02329-18 (2018).
Jacob, F., Sussman, R. & Monod, J. On the nature of the repressor ensuring the immunity of lysogenic bacteria (in French). C. R. Acad. Sci. 254, 4214–4216 (1962).
Ptashne, M. A Genetic Switch (Blackwell, 1986).
Jacob, F. & Wollman, E. Spontaneous induction of the development of bacteriophage lambda during genetic recombination in Escherichia coli K12 (in French). C.R. Acad. Sci. 239, 317–319 (1954).
McNair, K., Bailey, B. A. & Edwards, R. A. PHACTS, a computational approach to classifying the lifestyle of phages. Bioinformatics 28, 614–618 (2012).
Dutilh, B. E. et al. A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes. Nat. Commun. 5, 4498 (2014).
Shkoporov, A. N. et al. ΦCrAss001 represents the most abundant bacteriophage family in the human gut and infects Bacteroides intestinalis. Nat. Commun. 9, 4781 (2018).
Turin, C. G. & Ochoa, T. J. The role of maternal breast milk in preventing infantile diarrhea in the developing world. Curr. Trop. Med. Rep. 1, 97–105 (2014).
Newburg, D. S., Ruiz-Palacios, G. M. & Morrow, A. L. Human milk glycans protect infants against enteric pathogens. Annu. Rev. Nutr. 25, 37–58 (2005).
Lewis, E. D., Richard, C., Larsen, B. M. & Field, C. J. The importance of human milk for immunity in preterm infants. Clin. Perinatol. 44, 23–47 (2017).
Chehoud, C. et al. Transfer of viral communities between human individuals during fecal microbiota transplantation. mBio 7, e00322-16 (2016).
Wang, D. et al. Viral discovery and sequence recovery using DNA microarrays. PLoS Biol. 1, e2 (2003).
Hill, D. A. et al. Metagenomic analyses reveal antibiotic-induced temporal and spatial changes in intestinal microbiota with associated alterations in immune cell homeostasis. Mucosal Immunol. 3, 148–158 (2010).
Clarke, E. L. et al. Sunbeam: an extensible pipeline for analyzing metagenomic sequencing experiments. Microbiome 7, 46 (2019).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Davis, N. M., Proctor, D. M., Holmes, S. P., Relman, D. A. & Callahan, B. J. Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data. Microbiome 6, 226 (2018).
Li, D., Liu, C. M., Luo, R., Sadakane, K. & Lam, T. W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674–1676 (2015).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).
Pundir, S., Magrane, M., Martin, M. J., O’Donovan, C. & The UniProt Consortium. Searching and navigating UniProt databases. Curr. Protoc. Bioinformatics 50, 1.27.1–1.27.10 (2015).
Minot, S. et al. Rapid evolution of the human gut virome. Proc. Natl Acad. Sci. USA 110, 12450–12455 (2013).
Yutin, N. et al. Discovery of an expansive bacteriophage family that includes the most abundant viruses from the human gut. Nat. Microbiol. 3, 38–46 (2018).
Guerin, E. et al. Biology and taxonomy of crAss-like bacteriophages, the most abundant virus in the human gut. Cell Host Microbe 24, 653–664 (2018).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Abbas, A. A. et al. Bidirectional transfer of Anelloviridae lineages between graft and host during lung transplantation. Am. J. Transplant. 19, 1086–1097 (2019).
Jothikumar, N. et al. Quantitative real-time PCR assays for detection of human adenoviruses and identification of serotypes 40 and 41. Appl. Environ. Microbiol. 71, 3131–3136 (2005).
Abbas, A. A. et al. The perioperative lung transplant virome: torque teno viruses are elevated in donor lungs and show divergent dynamics in primary graft dysfunction. Am. J. Transplant. 17, 1313–1324 (2017).
Verstrepen, W. A., Kuhn, S., Kockx, M. M., Van De Vyvere, M. E. & Mertens, A. H. Rapid detection of enterovirus RNA in cerebrospinal fluid specimens with a novel single-tube real-time reverse transcription-PCR assay. J. Clin. Microbiol. 39, 4093–4096 (2001).
van Maarseveen, N. M., Wessels, E., de Brouwer, C. S., Vossen, A. C. & Claas, E. C. Diagnosis of viral gastroenteritis by simultaneous detection of adenovirus group F, astrovirus, rotavirus group A, norovirus genogroups I and II, and sapovirus in two internally controlled multiplex real-time PCR assays. J. Clin. Virol. 49, 205–210 (2010).
Oka, T. et al. Detection of human sapovirus by real-time reverse transcription-polymerase chain reaction. J. Med. Virol. 78, 1347–1353 (2006).
Rolfe, K. J. et al. An internally controlled, one-step, real-time RT-PCR assay for norovirus detection and genogrouping. J. Clin. Virol. 39, 318–321 (2007).
Truong, D. T. et al. MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat. Methods 12, 902–903 (2015).
Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).
Boetzer, M., Henkel, C. V., Jansen, H. J., Butler, D. & Pirovano, W. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27, 578–579 (2011).
Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).
Arndt, D. et al. PHASTER: a better, faster version of the PHAST phage search tool. Nucleic Acids Res. 44, W16–W21 (2016).
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Kao, D. et al. ERE database: a database of genomic maps and biological properties of endogenous retroviral elements in the C57BL/6J mouse genome. Genomics 100, 157–161 (2012).
Young, G. R., Kassiotis, G. & Stoye, J. P. Emv2, the only endogenous ecotropic murine leukemia virus of C57BL/6J mice. Retrovirology 9, 23 (2012).
We thank J. You and co-workers for help with imaging, M. Goulian and co-workers for bacterial strains, and F. Rohwer and associates for assistance with phage induction; members of the Bushman laboratory for help and suggestions; and L. Zimmerman for artwork and help with the manuscript. The Botswana Infant Microbiome Study team thank Copan Italia for their donation of the eNAT medium and flocked swabs used for the collection of rectal swab specimens. This work was supported by NIH grants R61-HL137063 (F.D.B.), R01-HL113252 (F.D.B.) and R01DK107565 (G.D.W. and E.F.). The project described was also supported by the Penn Center for AIDS Research (P30 AI 045008) (F.D.B.), the PennCHOP Microbiome Program (F.D.B., G.D.W. and R.N.B.) and a Tobacco Formula grant under the Commonwealth Universal Research Enhancement (CURE) program (grant number SAP 4100068710) (F.D.B. and R.N.B.). Funding was also provided by an unrestricted donation from the American Beverage Foundation for a Healthy America to the Children’s Hospital of Philadelphia to support the Healthy Weight Program (C.Z., K.B., E.F., J.S.G. and B.S.Z.). The project was also supported by the National Center for Advancing Translational Sciences, National Institutes of Health, through grants UL1TR000003 and UL1TR001878 (E.F.). Funding for the Botswana Infant Microbiome Study was provided by the Duke Center for AIDS Research (5P30 AI064518) and the NIH (K23 AI135090).
The authors declare no competing interests.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Overview of total stool microbial shotgun metagenomic sequencing.
a, Percentage of reads mapped to human or microbial genomes or that were unassigned. The types of DNA detected are indicated on the right. b, Correlation between the percentage of human DNA and sampling time after delivery using month-0 samples (n = 20). The percentage of human DNA is shown on the y axis, and the sampling time after delivery is shown on the x axis. The black dashed line shows the linear regression line and the grey-shaded region shows the 95% confidence interval for the slope. Two-sided Spearman’s rank-order correlation method was used to test significance (R represents Spearman’s ρ). c, Taxonomic composition of bacteria at the phylum level. The total read number is shown on the y axis; the x axis shows different samples. d, Bacterial richness. The y axis shows the richness calculated as the number of observed species. e, Bacterial diversity. d, e, A two-sided Wilcoxon rank-sum test was used to test the difference between different age groups (n = 20 infants at three time points). The horizontal lines in the box plots represent the third quartile, median and first quartile; whiskers extend to ±1.5× the interquartile range. The dots represent the outliers.
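The rank-order correlation used in panel b can be illustrated with a minimal sketch. This is not the authors' code (their scripts are in the repository cited above); the function names `average_ranks` and `spearman_rho` and the example numbers are hypothetical, and the sketch computes only Spearman's ρ, not the associated P value.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values: hours after delivery and percentage of human DNA.
hours_after_delivery = [2, 5, 9, 14, 30]
percent_human_dna = [88.0, 71.5, 60.2, 35.0, 12.4]
rho = spearman_rho(hours_after_delivery, percent_human_dna)
```

Because ρ depends only on ranks, it is insensitive to the scale of either axis, which suits percentages that may not vary linearly with time.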
Extended Data Fig. 2 Summary of virome sequencing of infant stool.
a, Heat map summarizing the representation of the top five most-abundant DNA viral contigs in each sample. Samples are grouped sequentially by infant on both the x axis and y axis. The last group of infants on the x axis are negative control samples. Circularity indicates whether a contig is circular (orange colour) or not (light-green colour). The heat map colour represents the abundance (log-transformed reads per million total reads) of each contig in each sample. b, Contig read abundance compared between different infants and within the same individuals. Time points were pooled for each individual. c–e, Percentage of DNA virome reads assigned to viruses (c), unassigned (d) and contamination (e). f–h, Percentage of RNA virome reads assigned to viruses (f), unassigned (g) and contamination (h). b–h, n = 20 infants at three time points were tested. The horizontal lines in box plots represent the third quartile, median and first quartile; whiskers extend to ±1.5× the interquartile range. The dots represent the outliers.
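The abundance measure in panel a (log-transformed reads per million total reads) can be sketched as follows. The legend does not state the logarithm base or how zero counts were handled, so the base-10 logarithm and the pseudocount of 1 below are assumptions, and `log_rpm` is a hypothetical helper name.

```python
import math


def log_rpm(contig_reads, total_reads, pseudocount=1.0):
    """Log10 of reads per million total reads.

    The base (10) and the pseudocount (1) are assumptions made for this
    sketch; they keep samples with zero reads finite (they map to 0.0).
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if contig_reads < 0:
        raise ValueError("contig_reads must be non-negative")
    rpm = contig_reads * 1_000_000 / total_reads
    return math.log10(rpm + pseudocount)


# Hypothetical example: 250 reads on one contig out of 2 million reads.
abundance = log_rpm(250, 2_000_000)
```

Normalizing to reads per million before taking the logarithm makes samples with different sequencing depths comparable on the same colour scale.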
Extended Data Fig. 3 Correlation between viral and bacterial communities.
a, Pairwise correlations among sample measures including: VLP count number, bacterial 16S qPCR copy number, viral richness, bacterial sequence read proportion, bacteria richness and diversity. The size of circles indicates the R value of the correlation. Blue colour indicates a positive correlation, and red colour indicates a negative correlation. Samples from different time points were pooled (n = 60). A two-sided Spearman’s rank-order correlation method was used in this analysis. b, As in a, but showing the raw data of the statistical analysis. P values, FDR-corrected P values and R (Spearman’s ρ) values are shown.
Extended Data Fig. 4 Life cycles of bacteriophages.
a, Diagram of lytic and lysogenic bacteriophage replication (based on a previous study [22]). Not shown are additional phage replication strategies, such as chronic infection and pseudolysogeny. b, Prediction of replication modes from contig sequences using PHACTS. The x axis shows the probability, as predicted by PHACTS, that a contig belongs to a lytic or temperate phage. The y axis shows the number of viral contigs. In total, 1,029 phage contigs with at least 10 open-reading frames were used in this analysis. Of 1,029 contigs, 233 were predicted to be lytic and 794 were predicted to be temperate. Probability values obtained from PHACTS were rescaled to the range −1 to 1 and are presented as the probability that a contig is lytic or temperate.
Extended Data Fig. 5 Prophage induction in the early-life virome.
a, Comparison of the extent of sequence alignment of induced VLP sequences from bacterial strains compared with VLP sequences from stool samples. Contigs were generated from mitomycin-C-induced VLPs from purified bacterial strains from stool (n = 33 phage contigs from 16 bacterial isolates), then VLP reads from faeces were aligned to these contigs and quantified. ‘Within infants’ indicates matching stool VLPs to induced VLPs from purified bacteria for samples all from the same infant. ‘Between infants’ indicates alignment of stool VLPs versus induced VLPs from different infants. The horizontal lines in box plots represent the third quartile, median and first quartile. The dots represent the outliers. Samples were compared using a two-sided Wilcoxon rank-sum test. b, Correlation between the proportion of each bacterium in the infant gut community and the proportion of prophages from that bacterial species in the infant’s gut virome. This plot is based on VLP sequences of phages produced by spontaneous induction (n = 42 phage contigs from 20 bacterial isolates). This is different from Fig. 2d, which is based on VLP sequences of phages produced after induction with mitomycin C. The black dashed line shows the linear regression line and the grey-shaded region shows the 95% confidence interval for the slope. The correlation was tested using a two-sided Spearman’s rank-order correlation (R represents Spearman’s ρ).
Extended Data Fig. 6 Colonization by crAssphages in different age groups.
The percentage of crAssphage-positive infants (as scored by requiring that the crAssphage genome was more than 33% covered by sequence reads from stool VLPs).
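The scoring rule above (a genome counts as detected when more than 33% of it is covered by reads) can be sketched as below. This is an illustration only, not the authors' pipeline: the interval representation (0-based, half-open `(start, end)` pairs), the function names and the toy genome length are all assumptions.

```python
def covered_fraction(alignments, genome_length):
    """Fraction of a genome covered by at least one aligned read.

    alignments: iterable of (start, end) pairs, 0-based and half-open.
    Overlapping or adjacent intervals are merged so bases count once.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(alignments):
        # Clip to the genome and skip empty intervals.
        start, end = max(0, start), min(genome_length, end)
        if start >= end:
            continue
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


def is_positive(alignments, genome_length, cutoff=0.33):
    """True when coverage strictly exceeds the cutoff (33% by default)."""
    return covered_fraction(alignments, genome_length) > cutoff


# Hypothetical read alignments on a toy genome of 1,000 bases.
reads = [(0, 100), (50, 150), (300, 400)]
fraction = covered_fraction(reads, 1000)  # 250 of 1,000 bases covered
```

Using breadth of coverage rather than read count guards against calling a genome present when many reads pile up on a single short region.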
Extended Data Fig. 7 Profiling of animal-cell viruses by virome sequencing.
a, c, f, h, Percentage of infants positive for animal cell-associated viruses using different viral genome coverage cut-offs in the discovery cohort (a, f) and validation cohort (c, h). The green line shows the data from infants who were formula fed (a, c) or born by caesarean (C)-section delivery (f, h), and the yellow line shows the data from infants fed with breast milk or who were mixed fed (a, c) or were born by spontaneous vaginal delivery (f, h). b, d, g, i, Two-sided Fisher’s exact test on infant feeding types (b, d) and delivery types (g, i) using different viral genome coverage cut-offs in the discovery cohort (b, g) and validation cohort (d, i). The horizontal red line indicates P = 0.05. e, j, Comparison of the relative abundance of animal-cell viruses between different feeding types (e) and delivery types (j). The abundance (reads per million total reads after log transformation) is shown on the y axis. A two-sided Wilcoxon rank-sum test was used to test the difference. The horizontal lines in box plots represent the third quartile, median and first quartile; whiskers extend to ±1.5× the interquartile range. The dots represent the outliers. k, Genome coverage fraction of negative control samples for animal-cell viruses. The maximal fraction of animal viral genome coverage for each negative control sample (n = 25) is shown on the y axis. Different negative control samples are shown on the x axis. Note that coverage never exceeds 10%. a, b, f, g, n = 20 samples from the discovery cohort were used; c–e, h–j, n = 125 samples from the validation cohort were used.
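The cut-off sweep in panels b, d, g and i (a two-sided Fisher's exact test repeated at different genome coverage cut-offs) can be sketched in plain Python. This is a hedged illustration, not the authors' implementation: the function names and the example coverage values are hypothetical, and the two-sided P value uses the common convention of summing all tables that are no more probable than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Sums hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so ties survive floating-point rounding.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def sweep_cutoffs(coverage_group1, coverage_group2, cutoffs):
    """P value at each cut-off; an infant is positive if coverage > cut-off."""
    results = {}
    for cutoff in cutoffs:
        pos1 = sum(c > cutoff for c in coverage_group1)
        pos2 = sum(c > cutoff for c in coverage_group2)
        results[cutoff] = fisher_exact_two_sided(
            pos1, len(coverage_group1) - pos1,
            pos2, len(coverage_group2) - pos2,
        )
    return results


# Hypothetical maximal genome coverage fractions per infant.
formula_fed = [0.9, 0.6, 0.4, 0.05]
breast_or_mixed = [0.5, 0.1, 0.0, 0.0, 0.0, 0.0]
p_by_cutoff = sweep_cutoffs(formula_fed, breast_or_mixed, [0.1, 0.33, 0.5])
```

Repeating the test across cut-offs shows whether an association depends on one particular detection threshold.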
Extended Data Fig. 8 Phage population structure.
a, Statistical tests of the association of clinical variables with phage population structure. Variables are shown in the first column. P values and FDR-corrected P values are shown in the second and third columns. All categorized variables, such as infant age, infant feeding type, infant delivery type, infant gender, mother body type, formula type, mother pregnancy induced hypertension or diabetes and mother chorioamnionitis were tested by PERMANOVA. Continuous variables, including gestational age, infant birth weight, household underage number, household number and mother pregnancy weight gain were tested by Envfit. All samples from both discovery US and validation US cohorts (n = 185) were used to test infant age effects, and pooled samples at month 3 and month 4 from both discovery US and validation US cohorts (n = 145) were used to test other variables. b, PCoA plot based on phage Pfam counts per sample, coloured by infant ages. This analysis is based on the Bray–Curtis dissimilarity index for all stool samples from both discovery US and validation US cohorts (n = 185). Negative control samples were not included for Bray–Curtis dissimilarity assessment and statistical tests. c–e, PCoA plots of phage Pfam components, coloured by infant feeding types (c), delivery type (d) and infant gender (e). This analysis is based on pooled samples at month 3 and month 4 from both discovery US and validation US cohorts (n = 145), and as in a, PERMANOVA was used to test the differences. FDR-corrected P values are shown.
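The Bray–Curtis dissimilarity underlying the PCoA plots can be sketched as below. This covers only the pairwise dissimilarity step; the ordination, PERMANOVA and Envfit steps are not shown. The function names and the Pfam count vectors are hypothetical, and the legend does not say whether counts were normalized first, so raw counts are assumed.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative count vectors.

    Defined as sum(|u_i - v_i|) / sum(u_i + v_i): 0 for identical
    profiles, 1 when the two samples share no features.
    """
    if len(u) != len(v):
        raise ValueError("count vectors must have the same length")
    total = sum(u) + sum(v)
    if total == 0:
        raise ValueError("at least one vector must contain counts")
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def pairwise_matrix(samples):
    """Symmetric dissimilarity matrix for a list of count vectors."""
    n = len(samples)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = bray_curtis(samples[i], samples[j])
    return matrix


# Hypothetical phage Pfam counts for three stool samples.
pfam_counts = [
    [6, 7, 4],
    [10, 0, 6],
    [6, 7, 4],
]
distances = pairwise_matrix(pfam_counts)
```

A matrix of this kind is the usual input to principal coordinates analysis and to permutation-based tests of group differences.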
Extended Data Fig. 9 16S qPCR before and after VLP purification.
Red and light-blue dots show values before and after VLP purification, respectively, and the horizontal lines represent the means (n = 20 infants at three time points were tested). A two-sided Wilcoxon signed-rank test was used to test the difference.
Extended Data Fig. 10 Percentage of DNA aligning to sequences of HERVs in each sample.
The percentage of HERV sequences in stool VLPs is shown on the y axis. Sample type and time point are shown on the x axis. The proportion of HERV sequences paralleled that of long interspersed nuclear elements and short interspersed nuclear elements, indicating that they are derived from human DNA contamination. Data are mean ± s.e.m.; n = 20 infants at three time points were tested.
This file includes Supplementary Tables 1–8.
Cite this article
Liang, G., Zhao, C., Zhang, H. et al. The stepwise assembly of the neonatal virome is modulated by breastfeeding. Nature 581, 470–474 (2020). https://doi.org/10.1038/s41586-020-2192-1