Maturation of the infant microbiome community structure and function across multiple body sites and in relation to mode of delivery



Human microbial communities are characterized by their taxonomic, metagenomic and metabolic diversity, which varies by distinct body sites and influences human physiology. However, when and how microbial communities within each body niche acquire unique taxonomical and functional signatures in early life remains underexplored. We thus sought to determine the taxonomic composition and potential metabolic function of the neonatal and early infant microbiota across multiple body sites and assess the effect of the mode of delivery and its potential confounders or modifiers. A cohort of pregnant women in their early third trimester (n = 81) were prospectively enrolled for longitudinal sampling through 6 weeks after delivery, and a second matched cross-sectional cohort (n = 81) was additionally recruited for sampling once at the time of delivery. Samples across multiple body sites, including stool, oral gingiva, nares, skin and vagina were collected for each maternal–infant dyad. Whole-genome shotgun sequencing and sequencing analysis of the gene encoding the 16S rRNA were performed to interrogate the composition and function of the neonatal and maternal microbiota. We found that the neonatal microbiota and its associated functional pathways were relatively homogeneous across all body sites at delivery, with the notable exception of the neonatal meconium. However, by 6 weeks after delivery, the infant microbiota structure and function had substantially expanded and diversified, with the body site serving as the primary determinant of the composition of the bacterial community and its functional capacity. Although minor variations in the neonatal (immediately at birth) microbiota community structure were associated with the cesarean mode of delivery in some body sites (oral gingiva, nares and skin; R2 = 0.038), this was not true for neonatal stool (meconium; Mann–Whitney P > 0.05), and there was no observable difference in community function regardless of delivery mode. 
For infants at 6 weeks of age, the microbiota structure and function had expanded and diversified with demonstrable body site specificity (P < 0.001, R2 = 0.189) but without discernable differences in community structure or function between infants delivered vaginally or by cesarean surgery (P = 0.057, R2 = 0.007). We conclude that within the first 6 weeks of life, the infant microbiota undergoes substantial reorganization, which is primarily driven by body site and not by mode of delivery.


The human microbiome comprises a rich ecosystem of microbes that are essential to human health and physiology. In the adult, the microbiota inhabiting each body site is characterized by a distinct microbial community structure and function1,2,3,4. For instance, in the gastrointestinal tract, enteric microbes produce and modify numerous biologically active compounds that support host metabolism, including bile acids, vitamins and other macromolecules, whereas Lactobacillus spp. in the vagina support a low vaginal pH5,6. Indeed, the importance of microbiota to human health is underscored by the observation that dysbiotic shifts in these microbial communities have been associated with a number of human diseases, including obesity, inflammatory bowel disorders, autoimmune disease and gastrointestinal cancer7,8,9,10. However, to understand how our microbiota may contribute to disease progression later in life, the mechanisms by which host–microbial symbiosis is established and maintained in early life require further exploration.

Evidence from germ-free (GF) mouse models has demonstrated that normal development is dependent on the presence of commensal microbiota, particularly in the gastrointestinal tract11. For example, GF mice show an altered immune phenotype, with deficits in both innate and adaptive immune components of the gut mucosa12,13. Reintroducing microorganisms postnatally (after birth) partially corrects many of these defects, although even a brief GF neonatal period can induce immunological changes that persist into adulthood12,14. Notably, different bacterial species have been shown to distinctly modulate the host immune system, indicating that the presence of specific bacteria within a given developmental window is important for normal patterning of host immunity15,16,17.

Work by the Human Microbiome Project Consortium and others has demonstrated that in adults, distinct microbial communities uniquely inhabit each body site1,4. Even sites in close proximity (for example, supragingival plaque versus tongue) have discernable differences in microbial composition1 that may be due to differences in microenvironment conditions, such as oxygen exposure and nutrient availability. A previous study in a small cohort of primarily preterm, very-low-birth-weight infants demonstrated that the microbiota of skin, saliva and stool rapidly diverged within the first 3 weeks of life18. However, more precise approximations of when body site differentiation is achieved have not been explored in larger cohorts of healthy term infants. Furthermore, how this assembly process is altered by exogenous factors, such as mode of delivery and breastfeeding practices, is not fully understood.

In recent years, the effect of mode of delivery (i.e., vaginal versus cesarean) on the infant microbiota has been subjected to scrutiny owing to the increased rate of cesarean deliveries worldwide and their potential association with allergic and autoimmune disease19,20,21,22. However, the clinical decision to deliver via the cesarean mode is often indicated by the underlying maternal or fetal medical diagnoses or co-morbidities, and it is accompanied by the varying use of medications, including antibiotics and anti-inflammatory analgesics23. Several previous studies identified distinct microbiota that initially colonized the neonate and were differentially associated with either cesarean or vaginal deliveries, but many of these were limited by small cohort sizes, the potential for significant confounders or a lack of longitudinal infant sampling24,25,26. A subsequent study by Azad et al. reported differences in the infant gut microbiota by the type of cesarean delivery performed (emergent versus non-labored or elective), suggesting that the differences associated with cesarean births may be due to the underlying medical indication, rather than the surgical procedure per se27. Indeed, the overwhelming majority of cesarean births in the United States are not initiated by maternal request but are rather performed for a medical indication (for example, arrest of active labor, fetal malpresentation, severe preeclampsia and 'hemolysis, elevated liver enzymes and low platelets' (HELLP) syndrome, fetal macrosomia and/or prior cesarean delivery28), and yet the effect of such potential confounders on the offspring's microbiota has not been established and remains underexplored23.
In addition, although cesarean deliveries can be classified on the basis of whether or not the mother was in active labor before the cesarean surgery (potentially indicating descent of the fetus into the vaginal canal or exposure to vaginal microbes), the effect of these courses of events during delivery on the offspring's microbiota remains underexplored and is often overlooked27. Finally, because diet is known to be a potent and persistent modifier of the microbiota in both adults and children29,30 and because human milk modifies the infant microbiota31,32,33, infant-feeding practices in association with mode of delivery have been accounted for in some studies, but they need to be more thoroughly evaluated27,32.

Given the complexities of the clinical context surrounding the decision to deliver an infant via cesarean surgery, we conducted a large, population-based prospective cohort study of maternal–infant pairs, first, to assess the taxonomic composition and potential metabolic function of the early neonatal microbiota across multiple body sites and up to 6 weeks of age and, second, to determine the effect of a cesarean mode of delivery and its potential confounders or modifiers on the neonatal and infant microbiota structure and function. Mothers and their infants were sampled at two time points in early life across multiple body sites, and sequencing of the gene encoding the 16S rRNA and whole-genome shotgun (WGS) sequencing were performed on collected samples to interrogate their bacterial composition and function.


We enrolled a population-based cohort of mother–infant dyads for longitudinal sampling (at the time of delivery and 4 to 6 weeks after delivery) across multiple body sites (skin, oral cavity, nares, stool, posterior fornix and vaginal introitus; Supplementary Fig. 1). An end point of approximately 4–6 weeks after delivery (hereafter referred to as 6 weeks) was chosen because at this age infants still consume a relatively homogenous diet of human milk and/or formula, have limited person-to-person contact and are not yet exposed to a wide variety of environmental sources of microbes (for example, they are not yet attending daycare and are not yet crawling). In all, a cohort of mother–neonate dyads (n = 81) was prospectively enrolled from our county hospital (Ben Taub General Hospital, Harris Health System, Houston, Texas) and sampled starting in the mother's early third trimester. The Dirichlet-multinomial distribution was used to power our study to detect a difference in the infant microbiota by the mode of delivery at the 6-week time point. Preliminary sequencing efforts made at the time the subjects were enrolled demonstrated lower read counts in neonatal samples taken at the time of delivery as opposed to those taken at 6 weeks after delivery (Supplementary Table 1). Therefore, to increase our power to detect a difference based on mode of delivery at birth, we enrolled a matched cross-sectional cohort of gravidae (n = 82) that was sampled once at the time of delivery (Supplementary Fig. 1a). Of the longitudinally sampled cohort, two subjects withdrew, and four delivered at another hospital site and thus were excluded. An additional fifteen subjects were lost to follow up and did not present at the second (6-week) visit. No losses were incurred in the cross-sectional cohort. In all, 157 mother–neonate dyads were sampled at delivery, of which 60 dyads were longitudinally sampled (Supplementary Fig. 1a). 
As shown in Supplementary Table 2, and consistent with the typical population that delivers at our hospital, the enrolled cohort consisted primarily of Hispanic women (90.0%) who delivered singleton pregnancies (96.2%) at term (88.5%). The rate of cesarean delivery was similar to the US national incidence (33.1% versus 32.7%, respectively, P = 0.91)20. The baseline maternal characteristics of the two cohorts were similar, except for a greater proportion of mothers with gestational diabetes (GDM) who were enrolled in the longitudinal cohort (44.0% versus 17.1%; P < 0.001; Supplementary Table 2). Furthermore, the baseline maternal characteristics were similar between mothers who delivered vaginally (n = 105) and those who delivered by cesarean surgery (n = 52) (all P > 0.05), except for a greater proportion of twin pregnancies in subjects undergoing cesarean delivery (Student's t-test, P = 0.008; Supplementary Table 3). Finally, the rate of GDM, the occurrence of class I–class III obesity, the mean pre-pregnancy body mass index (BMI) and the incidence of fetal macrosomia did not vary among women who underwent cesarean surgery as compared to those among women who delivered their infants vaginally (Supplementary Tables 2 and 3). Overall, the characteristics of the subjects of the enrolled cohort may not reflect the typical demographic of patients seen in most maternal–fetal medicine clinics. However, as evidenced by the nationally comparable rate of cesarean delivery, intrapartum management of these subjects' pregnancies was probably in line with the standard of care provided to the general population.
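As a quick consistency check, the enrollment and delivery counts reported above can be tallied in a few lines. This is only an arithmetic sketch that uses the numbers stated in the text; the variable names are ours and purely illustrative.

```python
# Counts as reported in the text (variable names are illustrative).
longitudinal_enrolled = 81   # prospectively enrolled dyads
withdrew = 2                 # withdrew from the study
delivered_elsewhere = 4      # delivered at another hospital, excluded
lost_to_follow_up = 15       # missed the 6-week visit
cross_sectional = 82         # matched cohort sampled once at delivery

# Longitudinal dyads that were actually sampled at delivery.
longitudinal_at_delivery = longitudinal_enrolled - withdrew - delivered_elsewhere

# All dyads sampled at delivery, and those with both time points.
sampled_at_delivery = longitudinal_at_delivery + cross_sectional
longitudinally_sampled = longitudinal_at_delivery - lost_to_follow_up

# Cesarean rate from the reported delivery counts.
vaginal, cesarean = 105, 52
cesarean_rate = 100 * cesarean / (vaginal + cesarean)

print(sampled_at_delivery, longitudinally_sampled, round(cesarean_rate, 1))
```

The totals agree with the 157 dyads sampled at delivery, the 60 dyads sampled longitudinally and the 33.1% cesarean rate given in the text.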

Neonates and mothers were sampled across five different body habitats (antecubital fossa, retro-auricular crease, keratinized gingiva, anterior nares and stool) representing four major body sites (skin, oral cavity, nares and gut). Samples from the maternal vagina (introitus and posterior fornix) were additionally obtained (Supplementary Fig. 1b). A second set of samples was obtained for all subjects who were enrolled in the longitudinal cohort at a follow-up visit (mean 43.6 d; s.d. of 17.6 d after delivery). After quality control and sequencing analysis of the gene encoding the 16S rRNA by 454 pyrosequencing, 1,429 high-quality samples were available for downstream analysis (mean 8,579; s.d. 5,681 filtered sequences per sample; Supplementary Table 1). A summary of filtered sequences per sample by age, body habitat and time point is provided in Supplementary Table 1.

Early-life microbiota community structure

We first sought to examine how the neonatal microbiota varied across different body sites at the time of delivery and then at 6 weeks of age, using the maternal microbiota as a reference of the adult microbiota. Unlike the maternal microbiota, whose community structure was driven primarily by the major body site groupings (Adonis P < 0.001, R2 = 0.192; Supplementary Fig. 2), the neonatal microbial community structure at the time of delivery did not demonstrate strong body site differentiation (Fig. 1a). Only the microbiota in the neonatal meconium seemed to cluster separately from those of the skin, nares and oral cavity (Adonis P < 0.001), with a small R2 value (0.038) suggestive of minimal body site variation. Pairwise Bray–Curtis dissimilarity comparisons between samples among subjects (beta diversity) further corroborated these observations. In maternal subjects, the beta diversity between maternal samples obtained from the same body site was significantly lower than that between different body sites (Fig. 1b, gray solid line versus gray dashed line). Conversely, among neonatal subjects, Bray–Curtis distances were as dissimilar among samples from the same body site as they were between different body sites (Fig. 1b, red solid line versus red dashed line).
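The within-site versus between-site comparison described above can be illustrated with a minimal sketch. It assumes each sample is a vector of taxon counts labeled by body site; the function names and the toy data are hypothetical and are not taken from the study pipeline.

```python
from itertools import combinations

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: 0 = identical profiles, 1 = no shared taxa."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0

def within_between(samples):
    """Split all pairwise distances into same-site and different-site lists.

    samples: list of (site_label, abundance_vector) tuples.
    """
    within, between = [], []
    for (site1, u), (site2, v) in combinations(samples, 2):
        target = within if site1 == site2 else between
        target.append(bray_curtis(u, v))
    return within, between

def mean(values):
    return sum(values) / len(values)

# Toy counts for three taxa; an adult-like pattern with clear site structure.
toy = [
    ("stool", [8, 1, 1]),
    ("stool", [7, 2, 1]),
    ("oral", [1, 8, 1]),
    ("oral", [1, 7, 2]),
]
within, between = within_between(toy)
print(mean(within), mean(between))
```

In this toy example the within-site distances are much smaller than the between-site distances, which is the pattern reported for the maternal samples. For the neonatal samples at delivery, the two distributions would be expected to overlap.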

Figure 1: Neonatal (at birth) microbial community structure.
Figure 1

(a) Principal coordinate analysis (PCoA) on unweighted UniFrac distances between the neonatal microbiota is shown along the first two principal coordinate (PC) axes. Box-and-whisker plots shown along each PC axis represent the median and interquartile range with whiskers determined by Tukey's method, indicating the distribution of samples along the given axis. Each point represents a single sample and is colored by body site: meconium (Mec.), orange (n = 117); skin, blue (n = 104); oral cavity, red (n = 82); nares, green (n = 73). Ellipses represent a 95% confidence interval (CI) around the cluster centroid. Clustering significance by body site was determined by the Adonis statistical method; P < 0.001. (b) Cumulative distribution of Bray–Curtis dissimilarity distances calculated pairwise between samples of the same body site (solid lines) and between different body sites (dashed lines). Distance comparisons for neonates (within-site comparisons, n = 15,487; between-site comparisons, n = 55,013) and maternal samples (within-site comparisons, n = 20,473; between-site comparisons, n = 128,362) are shown in red and gray, respectively. Smaller values indicate a greater similarity between samples. (c) The average relative abundance (circle size) of the most prevalent genera (y axis) in each body site (x axis) is plotted for neonates (oral gingiva, n = 82; skin, n = 104; nares, n = 73; meconium, n = 117) and mothers (oral gingiva, n = 141; skin, n = 113; nares, n = 96; stool, n = 60; vaginal introitus, n = 147; posterior fornix, n = 144) at the time of delivery. The indicator value index (related to the shading of the circle color) represents the strength of association between a taxon and a given body site, with larger values indicating greater specificity.

Although microorganisms belonging to many different taxa were present in neonates at the time of delivery, few were characteristic of a given body site. To measure the specificity of a taxon to a given site, we determined its indicator value (IndVal) index, which considers the relative abundance of a taxon in a given site and its relative frequency of occurrence across all sites34. A maximum IndVal represents taxa that are found in only a single site and are found within all individuals. Thus, a taxon with a large IndVal can be thought of as a signature taxon for a given body site. Linear discriminant analysis effect size (LEfSe) analysis was additionally performed to further corroborate representative taxa (Supplementary Fig. 3). For example, in the maternal subjects Lactobacillus was both highly abundant and highly specific for the vagina (average abundance 64.7%; IndVal = 0.922; Fig. 1c and Supplementary Table 4), whereas Bacteroides was prevalent and highly specific for the maternal stool (average abundance 27.8%; IndVal = 0.943; Fig. 1c and Supplementary Table 4), which is consistent with previous observations by ourselves and others1,6,35. In contrast, few signature taxa were detected in the neonatal microbiota at the time of delivery. Consistent with previous studies6,24, predominant members of the vaginal and skin microbiota of the adult (namely Lactobacillus, Propionibacterium, Streptococcus and Staphylococcus) were also the most abundant genera in the neonates across all body sites (Fig. 1c and Supplementary Table 4). However, none of these taxa were specific to any of the body sites (all IndVal < 0.5). Of interest, many samples from the neonatal meconium harbored Escherichia and Klebsiella (average abundance 14.3% and 6.4%, respectively), which was not seen in any other body site (all < 0.03%).
Notably, these taxa are known facultative anaerobes that are typical of the early gastrointestinal tract30,36 and have previously been detected in the placenta and amniotic fluid by both sequencing of the gene encoding the 16S rRNA and WGS sequencing37,38,39.
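To make the indicator value concrete, the following minimal sketch computes an IndVal on a 0 to 1 scale. It assumes the commonly used formulation in which IndVal is the product of specificity (the mean abundance of the taxon in the site divided by the sum of its mean abundances across all sites) and fidelity (the fraction of samples from that site in which the taxon is present). Whether this matches the exact variant used in the cited method is an assumption, and the function name and toy values are hypothetical.

```python
def indval(abundances_by_site, site):
    """Indicator value of one taxon for one site (assumed formulation).

    abundances_by_site: dict mapping site -> list of relative abundances
    of a single taxon, one value per sample from that site.
    """
    means = {s: sum(vals) / len(vals) for s, vals in abundances_by_site.items()}
    total = sum(means.values())
    if total == 0:
        return 0.0  # taxon absent everywhere
    specificity = means[site] / total
    site_vals = abundances_by_site[site]
    fidelity = sum(1 for x in site_vals if x > 0) / len(site_vals)
    return specificity * fidelity

# Toy taxon that dominates one site and is rare elsewhere.
toy_taxon = {
    "vagina": [0.6, 0.7, 0.5],
    "stool": [0.0, 0.0, 0.0],
    "skin": [0.0, 0.1, 0.0],
}
for site_name in toy_taxon:
    print(site_name, round(indval(toy_taxon, site_name), 3))
```

A taxon that occurs in every sample of one site and nowhere else reaches the maximum value of 1, whereas a taxon spread evenly across sites scores low for each of them, as described for the abundant but non-specific genera in the neonates.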

By 6 weeks of age, however, the community structure of the infant microbiota seemed to be primarily driven by body site differences, similarly to that of the maternal microbiota. Microbiota from the stool, oral gingiva and skin clustered distinctly, with that of the nares bridging the oral gingiva and skin body sites (Adonis P < 0.001, R2 = 0.189; Fig. 2a). Likewise, patterns of beta diversity (Fig. 2b) and signature taxa (Fig. 2c) resembled those of the maternal communities. For instance, similarly to the microbiota in the equivalent maternal body sites, the infants' oral gingiva was characterized by Streptococcus (60.7%, IndVal = 0.73), whereas the infants' skin and nares were characterized by Staphylococcus and Corynebacterium (Supplementary Table 5). More appreciable differences could be detected between the microbiota of the infant and maternal stool. As has been previously published, the maternal stool was dominated by either Bacteroides or Prevotella35, although only Bacteroides was considered a signature taxon for the maternal gut (Bacteroides: IndVal = 0.943; Supplementary Table 4). In contrast, Escherichia and Klebsiella were both prominently abundant (10% on average) and highly specific for the infant gut (IndVal of 0.95 and 0.75, respectively), which conforms to previously published observations of the typical microbial constituents of infant stool at this age30,40 (Fig. 2c and Supplementary Table 5). Although the infant and maternal microbiota, as a whole, shared a similar community structure and taxonomic membership (classified at the genus level) (Fig. 2), the microbial communities of the infant within most body habitats remained distinct at the operational taxonomic unit (OTU) level. Microbiota from the infant nares, oral cavity and gut clustered distinctly from their maternal counterparts, whereas no difference could be detected in the microbiota between the infant and maternal skin (Supplementary Fig. 4).
Furthermore, measurements of taxonomic diversity within a sample revealed that at 6 weeks of age, with the exception of the skin microbiota, the infant tended to harbor simpler communities with fewer unique taxa than those in the mother (Supplementary Fig. 5). Thus, although body site differences seem to drive the reorganization of the infant microbiota across body habitats within the first 6 weeks of life, these site-specific communities are generally less ecologically rich than those of the mother and remain distinct from them at the OTU level.

Figure 2: The microbiota of infants at 6 weeks of age demonstrates body site specificity.
Figure 2

(a) PCoA on unweighted UniFrac distances between the infant microbiota is shown along the first two PC axes. Box-and-whisker plots shown along each PC axis represent the median and interquartile range with whiskers determined by Tukey's method, indicating the distribution of samples along the given axis. Each point represents a single sample and is colored according to body site: stool, orange (n = 53); skin, blue (n = 52); oral cavity, red (n = 58); nares, green (n = 51). Ellipses represent a 95% CI around the cluster centroid. Clustering significance by body site was determined by Adonis; P < 0.001. (b) Cumulative distribution of Bray–Curtis dissimilarity distances calculated pairwise between samples of the same body site (solid lines) and between different body sites (dashed lines). Distance comparisons for infant (within-site comparisons, n = 5,037; between-site comparisons, n = 17,159) and maternal samples (within-site comparisons, n = 3,521; between-site comparisons, n = 33,695) are shown in red and gray, respectively. Smaller values indicate a greater similarity between samples. (c) The average relative abundance (circle size) of the most prevalent genera (y axis) in each body site (x axis) plotted for samples from 6-week-old infants (oral gingiva, n = 58; skin, n = 52; nares, n = 51; stool, n = 53) and from mothers at 6 weeks after delivery (oral gingiva, n = 59; skin, n = 51; nares, n = 21; stool, n = 43; vaginal introitus, n = 55; posterior fornix, n = 55). The indicator value index (related to the shading of circle color) represents the strength of association between a taxon and a given body site, with larger values indicating greater specificity.

Effect of the mode of delivery on microbiota community structure

Although patterns of microbial composition in neonates were influenced by mode of delivery at birth, these differences were absent in the infants at 6 weeks of age. We used principal coordinate analysis (PCoA) and observed significant clustering in the neonatal microbiota by the mode of delivery in only the oral cavity, nares and skin (Adonis R2 = 0.038) but not the meconium (Mann–Whitney P > 0.05; Fig. 3a). Similarly, for neonates born by cesarean delivery, alpha diversity was reduced at the time of delivery within the oral cavity and skin microbiota (P < 0.001), but not within the nares or meconium (P > 0.05; Supplementary Fig. 6). These differences seemed to be driven by an increased association of Propionibacterium and Streptococcus with cesarean-born neonates, whereas Lactobacillus was associated with vaginally delivered neonates (Supplementary Fig. 7). Consistent with this observation, pairwise beta diversity comparisons between mother–neonate dyads revealed that the microbiota of neonates born vaginally tended to be more similar to that of the maternal vagina (Supplementary Fig. 8), which was composed predominantly of Lactobacillus at the time of delivery (Fig. 1c). However, for infants at 6 weeks of age, the microbiota did not significantly cluster by mode of delivery at any body site (Fig. 3b), and the alpha diversity did not differ significantly (Supplementary Fig. 6). Hierarchical clustering of taxa at the genus level further demonstrated primary clustering by body site but without appreciable clustering by mode of delivery (Supplementary Fig. 9).
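The R2 and P values reported for the Adonis tests can be read as the fraction of total distance-based variation explained by a grouping, with significance assessed by permuting group labels. The following is a minimal, illustrative sketch of such a one-factor permutational analysis on an arbitrary distance matrix. It is not the software used in this study, it omits the stratification by body site, and the function name, the permutation count and the toy data are assumptions.

```python
import random
from itertools import combinations

def permanova(dist, labels, permutations=999, seed=0):
    """One-factor permutational test on a square distance matrix.

    Returns (R2, pseudo-F, permutation P value).
    """
    n = len(labels)
    # Total sum of squares depends only on the distances, not on the labels.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n

    def stats(lab):
        groups = {}
        for idx, g in enumerate(lab):
            groups.setdefault(g, []).append(idx)
        ss_within = sum(
            sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
            for members in groups.values()
        )
        ss_between = ss_total - ss_within
        a = len(groups)
        if ss_within == 0:
            return ss_between / ss_total, float("inf")
        f_stat = (ss_between / (a - 1)) / (ss_within / (n - a))
        return ss_between / ss_total, f_stat

    r2, f_obs = stats(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if stats(shuffled)[1] >= f_obs:
            hits += 1
    return r2, f_obs, (hits + 1) / (permutations + 1)

# Toy example: six samples on a line, two well-separated groups.
points = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
toy_labels = ["vaginal"] * 3 + ["cesarean"] * 3
toy_dist = [[abs(p - q) for q in points] for p in points]
r2, f_obs, p_value = permanova(toy_dist, toy_labels)
print(round(r2, 3), round(p_value, 3))
```

A small R2, such as the 0.038 reported for mode of delivery at birth, indicates that the grouping accounts for only a small share of the overall variation even when the permutation test is significant.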

Figure 3: Failure to demonstrate a significant effect of mode of delivery on the infant microbiota across body sites and over time.
Figure 3

(a,b) PCoA on unweighted UniFrac distances between the neonatal microbiota at the time of delivery (oral gingiva, n = 53 for vaginal delivery, n = 28 for cesarean delivery; skin, n = 79 for vaginal delivery, n = 23 for cesarean delivery; nares, n = 47 for vaginal delivery, n = 25 for cesarean delivery; meconium, n = 82 for vaginal delivery, n = 33 for cesarean delivery) (a) and at 6 weeks after delivery (oral gingiva, n = 45 for vaginal delivery, n = 13 for cesarean delivery; skin, n = 40 for vaginal delivery, n = 12 for cesarean delivery; nares, n = 40 for vaginal delivery, n = 11 for cesarean delivery; stool, n = 40 for vaginal delivery, n = 12 for cesarean delivery) (b). Data are stratified by body site, with mode of delivery indicated by color (cesarean, red; vaginal, gray). Ellipses represent the 95% CI around the cluster centroid. Significance of clustering, which was seen among neonatal (at the time of delivery; a) nasal, oral and skin communities, but not meconium microbiota (Mann–Whitney test of PC1 values: P > 0.05), was determined by an Adonis test by mode of delivery, stratified by body site (R2 = 0.038). Conversely, among infants (6 weeks after delivery; b), significant clustering was not observed (Adonis P = 0.057; R2 = 0.007). (c,d) Three-axis ternary plots indicating the proportion of OTUs within a neonatal sample (each point) that is predicted to originate from a maternal body site (indicated by the triangle vertices). Each point represents a neonatal sample, and its position indicates the predicted relative contribution from the maternal vagina (posterior fornix or introitus), maternal skin (retroauricular crease or antecubital fossa) or another maternal site (supragingival plaque, anterior nares, stool or unknown source). Points closer to the vertices indicate that a greater proportion of the sample's OTUs are predicted to originate from the microbiota of the indicated maternal body site. 
Data for samples obtained at the time of delivery (oral gingiva, n = 53 for vaginal delivery, n = 28 for cesarean delivery; skin, n = 79 for vaginal delivery, n = 23 for cesarean delivery; nares, n = 47 for vaginal delivery, n = 25 for cesarean delivery; meconium, n = 82 for vaginal delivery, n = 33 for cesarean delivery) (c) and at 6 weeks after delivery (oral gingiva, n = 45 for vaginal delivery, n = 13 for cesarean delivery; skin, n = 40 for vaginal delivery, n = 12 for cesarean delivery; nares, n = 40 for vaginal delivery, n = 11 for cesarean delivery; stool, n = 40 for vaginal delivery, n = 12 for cesarean delivery) (d), are stratified by body site and by mode of delivery (vaginal, labored cesarean or unlabored cesarean). A two-dimensional point density topography map (blue shading) for each plot is given to indicate the point density.

The cesarean mode of delivery has been demonstrated to affect patterns of maternal transmission of microbiota at the time of delivery, although the extent to which this is modified by other maternal or infant clinical co-variates or comorbidities remains underexplored. Having extensively collected samples from multiple maternal body sites in parallel with the equivalent sites in neonates and infants, we performed SourceTracker analysis41 to predict the most probable maternal source of origin for the neonatal microbiota. Samples from different maternal body sites were considered to be potential sources of OTUs for each neonatal sample. From this analysis, we obtained an estimated proportion of OTUs of a given neonatal sample that were predicted to originate from the maternal skin, vagina, stool, nares or oral cavity or from an unknown source (Supplementary Figs. 10 and 11). Because neonates who are delivered by cesarean surgery are thought to be populated by maternal skin microbiota rather than by vaginal microbiota24,40, data were plotted on a ternary plot to highlight these two maternal body sites (Fig. 3c,d). Moreover, because differences in the neonatal microbiota have previously been shown on the basis of cesarean indication (emergent versus non-emergent)27, comparisons were further stratified by whether or not the mother had undergone a period of labor before the cesarean procedure (hereafter referred to as a labored or unlabored cesarean delivery). For neonates who were delivered vaginally, most body sites demonstrated a bimodal pattern of maternal origin: although many sites were predominantly populated by microbiota originating from the mother's vagina, a substantial number were predominantly populated by microbiota from the maternal skin (Fig. 3c). Notably, meconium samples additionally harbored OTUs that were predicted to originate from the maternal stool (Supplementary Fig. 10).
Thus, except for the infant gut, the maternal vagina and skin together seem to contribute the majority of the early taxa in the vaginally delivered neonate. This bimodal pattern among vaginally born neonates was similarly seen in neonates who were delivered by a labored cesarean procedure but not in those delivered by an unlabored cesarean procedure (Fig. 3c). In the latter case, most of the samples were primarily populated by microbiota found in the maternal skin. Thus, whether or not labor had occurred before delivery, rather than the cesarean procedure itself, seemed to have the greatest effect on the maternal origin of the infant microbiota.

Samples from infants at 6 weeks of age, however, were predominantly populated by microbiota from the corresponding maternal body site, irrespective of the mode of delivery (Fig. 3d and Supplementary Fig. 11). Although microbiota from the maternal skin and vagina represented a substantial portion of taxa of the neonatal microbiota at the time of delivery, most seemed to be replaced in favor of taxa that were more typical of the given body niche. For example, although either Propionibacterium or Lactobacillus largely comprised the neonatal oral microbiota at the time of delivery (Supplementary Fig. 7), by 6 weeks of age, Streptococcus was the most prominent genus in nearly all of the infant oral cavity samples (Supplementary Fig. 9).

Previous studies24,40,42,43,44 have indicated that mode of delivery has the greatest influence on the presence and abundance of several notable taxa of the early infant gut microbiome, namely Bacteroides, Lactobacillus and Bifidobacterium. We next sought to examine how these three genera were affected by mode of delivery and other clinical variables in infants at 6 weeks of age. The abundance of each of these three taxa was classified as being undetectable (0%), sparsely present (>0.1%), appreciably present (>1%) or moderately abundant (>10%). Comparisons of the gut microbiota at the time of delivery revealed that in all cesarean-delivered infants and in most of the vaginally delivered infants, Lactobacillus, Bifidobacterium and Bacteroides were appreciably present (i.e., detectable at greater than 0%) at the time of delivery (Fig. 4a–d). By 6 weeks of age, Lactobacillus and Bifidobacterium continued to be appreciably present in the infant stool, at similar levels regardless of the mode of delivery (Fig. 4a). Bacteroides were undetectable in just three infant stool samples, as determined by shotgun sequencing, and all of these belonged to infants who were born by vaginal delivery (Supplementary Fig. 12). To further interrogate whether mode of delivery had a significant effect on the relative abundance of these three genera while controlling for other factors, we used a generalized linear model. On the basis of previously published literature, breastfeeding practices26,32,42, gestational age36, maternal and infant obesity and macrosomia45,46, maternal diet (percentage fat intake)47 and maternal GDM48 have all been reported to affect the very-early infant microbiome; thus, these were used in model creation. Of the imputed factors, breastfeeding practice was the only substantial contributor to the abundance of Bacteroides in the infant stool at 4–6 weeks of age, with exposure to formula associated with increased levels of Bacteroides (Supplementary Fig. 13 and Supplementary Table 6). We have recently published47 that increased maternal fat intake during gestation is significantly associated with decreased abundance of Bacteroides, but only after a potential outlier was excluded from analysis. No factors were significantly associated with altered Bifidobacterium or Lactobacillus abundance (Supplementary Fig. 13 and Supplementary Table 6).
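The four abundance categories used above can be illustrated with a minimal Python sketch. The function name and the handling of values between 0% and 0.1% are assumptions made for illustration only; the text does not state how such values were binned, so they are given a separate, hypothetical label here.

```python
def classify_abundance(percent: float) -> str:
    """Bin a genus's relative abundance (in percent) into the categories
    named in the text: undetectable (0%), sparsely present (>0.1%),
    appreciably present (>1%) and moderately abundant (>10%)."""
    if percent < 0:
        raise ValueError("relative abundance cannot be negative")
    if percent > 10.0:
        return "moderately abundant"
    if percent > 1.0:
        return "appreciably present"
    if percent > 0.1:
        return "sparsely present"
    if percent == 0:
        return "undetectable"
    # Assumption: values in (0%, 0.1%] are not assigned a category in the
    # text, so this label is hypothetical.
    return "below sparse threshold"


if __name__ == "__main__":
    for value in (0.0, 0.05, 0.5, 5.0, 25.0):
        print(value, classify_abundance(value))
```

Checking the thresholds from highest to lowest keeps each sample in exactly one category, which is one plausible reading of the nested cut-offs.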

Figure 4: Taxonomic profiles of infant and maternal microbiomes from stool and oral gingiva, according to mode of delivery and time.

(a–d) A phylogenetic representation of the taxonomic composition in samples from the infant stool (neonate: vaginal delivery, n = 82; cesarean delivery, n = 33; infant: vaginal delivery, n = 40; cesarean delivery, n = 12) (a), maternal stool (at the time of delivery: vaginal delivery, n = 42; cesarean delivery, n = 18; 6 weeks after delivery: vaginal delivery, n = 34; cesarean delivery, n = 9) (b), infant oral cavity (neonate: vaginal delivery, n = 53; cesarean delivery, n = 28; infant: vaginal delivery, n = 45; cesarean delivery, n = 13) (c) and maternal oral cavity (at the time of delivery: vaginal delivery, n = 95; cesarean delivery, n = 47; 6 weeks after delivery: vaginal delivery, n = 47; cesarean delivery, n = 12) (d) at the time of delivery (inner two rings) and at 6 weeks after delivery (outer two rings). Each time point is further divided by mode of delivery (indicated by the label at the bottom of the rings). The average relative abundance of each genus is plotted within the concentric rings and represented by the shaded cells, with higher relative abundance indicated by a darker shade. The phylum to which each taxon belongs is indicated by the phylogenetic tree. Three notable taxa, namely Bacteroides, Bifidobacterium and Lactobacillus, are indicated by the red outline.

Expansion and diversification of microbial community function

We next sought to examine how the microbial metabolic and functional pathways of the early neonatal metagenome changed from the time of delivery to 6 weeks of age across different body sites. Overall, the metagenomic profiles clustered by age and by body site (Fig. 5a,b), indicating overall similarities in site-specific microbial activities. However, although the maternal gut metagenome seemed much more consistent and evenly diverse than the representation of relative abundance of all identified taxa (Fig. 5a and Supplementary Table 5), a significant amount of heterogeneity was seen in the metagenomes of the neonatal oral cavity and gut at the time of delivery (Supplementary Fig. 14). This was particularly apparent in the neonatal oral cavity. For example, genes related to alanine, aspartate and glutamate metabolism in neonatal oral cavity samples were undetectable in half of the sequenced samples, but were present at greater than 2.3% in the other half (median abundance across all samples = 2.4%; Supplementary Table 7). However, at 6 weeks, metagenomes from both the infant oral cavity and stool were significantly more similar between body sites of different individuals (Supplementary Fig. 14), indicating convergence toward a shared set of microbial metabolic pathways and activities unique to each body site. Comparisons of the infant stool and oral cavity metagenomes revealed that each was enriched for pathways that potentially reflected a selective advantage of specific microbiota to its local environment (Fig. 5d). For example, lipopolysaccharide biosynthesis is a key feature of Gram-negative enteric bacteria and was fittingly increased in the infant stool metagenome as compared to that of the oral cavity. Additionally, taurine and hypotaurine metabolism, which is critical for bile acid metabolism and conjugation49, was also increased in the infant stool metagenome as compared to that of the infant oral cavity. 
In contrast, amino acid biosynthesis and metabolism was seen to be a predominant feature of the infant oral cavity metagenome.

Figure 5: Expansion and diversification of microbial community structure and function in infants by 6 weeks of age.

(a) Heat map showing distinct microbial gene (according to KEGG pathway analysis) profiles of the infant stool and oral cavity metagenome at the time of delivery (meconium, n = 9; oral cavity, n = 11) and at 6 weeks of age (stool, n = 34; oral cavity, n = 11). The microbial gene profile for maternal stool (n = 24) is shown as a comparison. The relative abundance of a pathway in a given sample is colored by its row z-score ((value − row mean)/row s.d.). The vertical color bar represents the higher-order KEGG module to which each pathway belongs. (b) PCoA of Bray–Curtis distances based on the relative abundances of pathways demonstrates clustering by body site and time point. P < 0.001 by permutational analysis of variance (PERMANOVA). (c) Enumeration of the unique number of genes and species within the neonatal meconium (at the time of delivery, n = 9), infant stool (at 6 weeks of age, n = 34) and maternal stool (n = 24). ***P < 0.001 by ANOVA and post hoc Tukey's tests. (d) Results from LEfSe analysis, which was conducted to identify pathways that differentiated the infant stool metagenome at the time of delivery and that at 6 weeks of age (top left), the infant and maternal stool metagenomes at 6 weeks after delivery (top right), and the oral and stool metagenomes in infants at 6 weeks of age (bottom) (neonatal meconium, n = 9; infant stool, n = 34; maternal stool, n = 24; neonatal oral cavity, n = 11; infant oral cavity, n = 11). Only significant pathways with a linear discriminant analysis (LDA) score >3.0 are shown.
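The row z-score used to color the heat map in panel a, (value − row mean)/row s.d., can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, as the legend does not specify which estimator was used.

```python
from statistics import mean, stdev


def row_z_scores(row):
    """Scale one pathway's abundances across samples to
    (value - row mean) / row s.d."""
    mu = mean(row)
    sd = stdev(row)  # assumption: sample s.d. with an n - 1 denominator
    if sd == 0:
        # A pathway with identical abundance in every sample has no spread;
        # return zeros rather than dividing by zero.
        return [0.0 for _ in row]
    return [(value - mu) / sd for value in row]


if __name__ == "__main__":
    print(row_z_scores([1.0, 2.0, 3.0]))
```

Scaling each row separately puts pathways with very different absolute abundances on a common color scale, so the heat map emphasizes between-sample contrasts within a pathway.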

From the time of delivery to 6 weeks of age, the number of unique species and microbial genes found within the infant gut metagenome was also significantly increased, indicating an overall expansion and diversification within this early time period (Fig. 5c). The neonatal meconium was enriched for several microbial pathways, including the pentose phosphate pathway and the phosphotransferase system (PTS), which participate in sequestering and using glucose for anabolism of amino acids and cell wall components. In comparison, the infant gut metagenome at 6 weeks was significantly enriched for pathways related to co-factor metabolism and biosynthesis, including those for folate, biotin and vitamin B6, which are known to be important for growth, metabolism and neurodevelopment50. However, the maternal gut metagenome showed more gene counts and unique species than the infant gut metagenome (Fig. 5c). Furthermore, several pathways discriminated the infant and maternal gut metagenomes (Fig. 5d), including an enrichment of genes involved in the tricarboxylic acid (TCA) cycle and amino acid metabolism in the maternal gut. Thus, although the infant gut metagenome was significantly expanded and diversified from birth, both its taxonomic composition and metabolic capability remained distinct from those in the maternal gut.

To assess how the metabolic and functional pathways of the infant stool metagenome varied with cesarean or vaginal mode of delivery, breastfeeding practices and other clinical metadata, we fitted a generalized linear mixed model to infant stool samples to identify significantly differing pathways. This approach allowed us to quantify the contribution of each variable to the pathway abundances, while controlling for possible confounding between clinical metadata. Using previously published observations, we considered mode of delivery, intrapartum antibiotics51, breastfeeding practices at 6 weeks of age32, gestational weight gain38, maternal GDM or pre-pregnancy BMI46, and gestational age36 at the time of delivery as possible fixed effects in our model. Of all the pathways considered, few correlated significantly with any clinical variable considered in our model, indicating that the relative frequency of most functions throughout the infant stool microbiota was consistent across individuals and robust to exogenous perturbations, including cesarean delivery (Fig. 6a). Similarly, few species in the infant stool identified by WGS sequencing were significantly altered in our model (Supplementary Fig. 15). Intrapartum antibiotics seemed to have the greatest effect on pathway variation, as ten distinct pathways were either positively or negatively correlated with intrapartum antibiotic usage (Fig. 6b). In contrast, gestational age and pre-pregnancy BMI had little effect on pathway variation within the infant stool. Of particular interest, given co-linearity with antibiotic usage and breastfeeding practices, cesarean mode of delivery per se did not bear a differential effect on the infant metagenome or its function when subjected to robust linear mixed modeling controlling for covariates as fixed effects.

Figure 6: Infant microbial community function with clinical metadata in a generalized linear model.

(a) Heat map of the relative abundances of KEGG pathways found in infant stool samples (n = 36) as determined by WGS sequencing. The vertical bar on the left indicates mode of delivery, breastfeeding practices and intrapartum antibiotic (Abx) usage. Dendrograms represent hierarchical clustering on Euclidean distances using average linkage. (b) A generalized linear mixed model was fitted for each pathway to identify pathways whose abundances differed significantly between individuals by mode of delivery, antibiotic usage, breastfeeding, gestational weight gain, BMI and gestational age. The strength of the linear model predictions for each pathway is represented by bar height. Significant correlations (P < 0.05 by Student's t-test) are indicated by the darker color (dark red or dark gray). Labels correspond to the following comparisons: vaginal delivery, pathways enriched in infants born vaginally (n = 26) (red, >0) or by cesarean surgery (n = 8) (gray, <0); intrapartum antibiotics, pathways enriched in infants exposed to intrapartum antibiotic usage (n = 18) (red, >0) relative to those in infants not exposed to antibiotic treatment (n = 16) (gray, <0); mixed formula and human milk feeding, pathways increased in partially breast-fed (human milk and formula) infants (n = 26) (red, >0) relative to those in infants fed exclusively with human milk (n = 8) (gray, <0); exclusive formula feeding, pathways higher in infants exclusively fed formula (n = 1) (red, >0) than in those fed human milk only (n = 8) (gray, <0); excess gestational weight gain, pathways higher in maternal cases of excess gestational weight gain (n = 15) (red, >0) as compared to those in maternal cases of normal weight gain (n = 19) (gray, <0); pre-pregnancy BMI and gestational age, pathways positively (>0, red) or negatively (<0, gray) correlated with pre-pregnancy BMI or gestational age (n = 36).


In this large cohort of term infants that is reflective of characteristics common to the contemporary US birth population, we found significant reorganization of the neonatal microbiota across multiple body sites within the first 6 weeks of life, which was primarily shaped by the major body site groupings as seen previously in the adult1. Although the neonatal microbiota at the time of delivery was sparsely populated and predominantly composed of taxa of the maternal skin and vaginal microbiota, by 6 weeks of age the major patterns of community variation were most strongly associated with body site grouping, with concomitant coalescence of functional and metabolic microbial pathways that were reflective of probable selection within, or adaptations to, a given body niche. However, although the infant skin microbiota most resembled the maternal skin microbiota, the infant stool, nares and oral cavity harbored microbial communities distinct from those of their maternal counterparts. These differences probably reflect age-related physiological differences between the maternal and infant body habitats, including nutrient availability and oxygen exposure.

Overall, our data lead to conclusions similar to those derived from observations of low-birth-weight infants18, which found that changes to microbiota composition within the first weeks of life are primarily shaped by body site. We similarly found that after several weeks, each body niche was enriched for taxa that were characteristic of their adult counterparts (such as Streptococcus in the oral cavity), likely indicating a common maturation process. However, our study further extended these observations through the generation of rich WGS data, which provided the benefit of interrogating both taxonomic abundances and community function within this early time period. Our results suggest that, similar to microbiota abundance, site-specific functional pathways emerge very early on in life, before extensive contact with environmental sources that may influence community membership and consequent function.

There are several distinctions between the characteristics of our study cohort and those of other studies that have reported a variation in the neonatal or infant microbiome by mode of delivery24,40,42,43,52. First, our cohort was ethnically homogeneous (mostly Hispanic) and comprised primarily term infants; thus, specific observations of the predominant taxonomic and pathway abundances of the infant metagenome may not be generalizable to preterm neonates or other ethnic populations. Previous observations have indicated that in adult and infant populations43, there are large differences in the gut microbiota between individuals of different ethnic populations residing in different countries and societal structures. Thus, the specific taxa or functions that become enriched early on may vary depending on the environmental context. Nevertheless, considering that the adult microbiota is driven to a larger extent by body site habitat and not ethnicity, we anticipate that gross reorganization of the infant microbiota by body site within this early time period is generalizable across populations, although additional studies are needed to confirm this.

Second, and reflective of the clinical population from which these women were recruited for our longitudinal cohort, we observed a high prevalence of GDM. However, the rate of GDM did not vary among women who later delivered their infants via cesarean or vaginal birth (Supplementary Table 2), and GDM (as well as other potential or likely confounders) was included in our generalized linear model analysis. Because other studies have not reported maternal gestational or type II diabetes as a characteristic of their cohorts, we cannot directly compare or speculate as to whether their findings might be a result of confounding or co-linearity24,40,42,43,44,52. This would necessarily include those characteristics and comorbidities that render risk or indication for cesarean delivery, such as poorly controlled diabetes with resultant fetal macrosomia. For example, it is worth noting that the landmark study of Dominguez-Bello et al.24 does not report at-risk maternal characteristics such as diabetes or BMI, but it does describe infant birth weight ranges of up to 5.2 kg. Thus, in their study, which comprised 11 neonates from Venezuela (six born via cesarean surgery), at least one neonate was macrosomic (“Babies weighed between 2 and 5.2 kg (the smallest baby was the twin in second order of birth, after his 3-kg brother)”)24. A 5.2-kg infant would be 1.0 kg heavier than more than 98% of the birth population by World Health Organization standards. Although there are multiple underlying causes of fetal macrosomia, common causes include poorly controlled maternal diabetes. Additional causes include genetic and epigenetic overgrowth disorders, chronic caloric excess and maternal obesity. These were excluded or controlled for in our study analysis, and our mean infant birth weight was 3.23 kg (±0.68 kg), with no significant difference in birth weight observed when comparing infants born vaginally to those delivered by cesarean (P = 0.85; Supplementary Tables 2 and 3). 
Notably, in our entire cohort of 162 infants, only one exceeded 5 kg (birth weight = 5.09 kg; Supplementary Table 8).

Our observations of the early neonatal microbiota are consistent with the growing body of evidence that is challenging the idea of a sterile in utero environment. Previous studies have detected bacteria associated with the placenta and amniotic fluid of preterm and healthy, term pregnancies37,39,53,54,55, and work in murine models has demonstrated maternal transmission of bacteria to the fetal gut during gestation, consistent with the idea that microbial colonization of the mouse fetus occurs before delivery56. More recently, the neonate's first intestinal discharge (meconium) has been shown to harbor a microbial community similar to that of the amniotic fluid and placenta39,57,58. Of note, the meconium microbiota was shown to vary with maternal glycemic control but not mode of delivery, indicating that establishment of the neonatal microbiota can potentially be altered by gestational exposures48. Consistent with these findings, we observed that the meconium microbiota was markedly different from that at other neonatal body sites at the time of delivery, indicating a potentially different maternal origin. Additionally, unlike the skin, oral cavity or nares microbiota, the neonatal gut microbiota at the time of delivery did not significantly vary by mode of delivery. The content of the first meconium is hypothesized to reflect the in utero environment (in which the infant is swallowing amniotic fluid continuously from mid to late gestation), and thus we speculate that these microbes were similarly transmitted from the mother to the fetus during gestation, suggesting that seeding of the early microbiota may occur earlier than was previously thought. However, additional studies are needed to evaluate potential mechanisms of transmission and its potential impact on fetal programming and the establishment of the infant gut microbiome long-term.

In agreement with previous observations24,25, mode of delivery was associated with differences in the neonatal microbiota immediately after delivery only within the nares, skin and oral cavity. However, this was not true of the infant gut microbiota, which seemed to have a maternal origin distinct from that of the microbiota at the other body sites. Several studies have indicated that the early composition of the gut may be influenced by maternal exposures or health status47,48, which may account for the lack of separation by mode of delivery in our study. Our previously published work in a nonhuman primate model demonstrated that maternal diet during gestation and lactation has a persistent effect on the gut microbiota of the offspring at least up to 1 year of age. In conjunction with these findings, we have similarly seen in human cohorts that the composition of the early infant gut microbiome was associated with maternal diet in the last trimester of pregnancy, independently of mode of delivery and maternal obesity47. In addition to maternal gestational diet, maternal pre-pregnancy obesity and maternal glycemic control have also been linked to differences in the early infant gut microbiome. Notably, these maternal health states also increase the risk that the pregnancy will be delivered by cesarean surgery28. However, after accounting for these potential confounders in our study, we did not find significant differences in the infant gut microbiome that could be reliably or significantly attributed to a cesarean or vaginal mode of delivery.

We additionally sampled multiple body sites from both the neonate and mother, which allowed for an in-depth interrogation of the potential maternal origin of neonatal microbiota. The consideration of cesarean deliveries by indication (labored versus unlabored) revealed that the differences in the neonatal microbiota at the time of birth associated with mode of delivery were most pronounced when comparing unlabored cesarean-delivered neonates to vaginally delivered neonates. In our cohort, these unlabored cesarean deliveries were generally among women with a history of prior cesarean surgery who elected for a repeat cesarean procedure, rather than a trial of labor with attempted vaginal birth after a prior cesarean procedure. This is in agreement with the observations of Azad et al.27, who identified differences in the infant gut microbiome at 4 months of age when cesarean deliveries were classified as either being emergent (typically labored) or elective (typically unlabored). This is not unexpected, as cesarean delivery is not a randomly allocated procedure but rather is indicated for an underlying maternal or fetal medical condition23. Thus, prior studies with small cohorts that do not account for the underlying indication for the cesarean surgery may be prone to both type I and type II error, resulting in misclassification of the attribution of cesarean delivery per se as a driving force in any dysbiosis in the infant microbiota24,25.

A number of perinatal and postnatal factors have been associated with relatively low abundance of Bacteroides in the neonatal and infant stool. These include cesarean mode of delivery40,43,52, exclusive breastfeeding26,42,59, maternal high-fat diet47, maternal obesity46, transition to solid foods30 and other as-yet-unknown factors52. Bäckhed et al.40 have shown that Bacteroides spp. can be acquired postnatally, with a notable increase at 4 months of age among both cesarean and vaginally delivered infants. Therefore, any supposition relating the relative abundance, or lower abundance, of important microbiota to cesarean delivery per se must be treated with caution. Postnatal factors that could confound such an association may include the introduction of antibiotics51,52,60, feeding of formula42,59 and feeding of solid foods30. In our current study, we focused on early exposures because they are most proximal to the intrauterine and intrapartum period of exposure and are least likely to be subjected to exogenous postnatal influences. We have now demonstrated with WGS data that both vaginally delivered and cesarean-delivered infants can have low Bacteroides counts. In our cohort, and distinct from that of Bäckhed et al.40, we found no Bacteroides at the time of birth or at 6 weeks of age in a subset of vaginally delivered infants (n = 3), whereas we detected Bacteroides in every cesarean-delivered infant (Supplementary Fig. 12). These findings are consistent with a recent publication by Yassour et al.60 that similarly showed low Bacteroides counts in 20% of vaginally delivered infants. Even after we extended our definition of significantly variant taxa to include those identified by Dominguez-Bello et al. as being perturbed in cesarean-delivered infants and only restored with vaginal wiping44, we failed to see significantly altered taxonomic profiles. Specifically, sequential definitions of low levels of Bifidobacterium or high Lactobacillus were equally probable in vaginally delivered and cesarean-delivered infants (Supplementary Fig. 12). 
Finally, when we examined 4- to 6-week-old infants with either absent or relatively low levels of Bacteroides and who were assigned relative risk that was estimated for a number of antenatal factors, low levels of Bacteroides were most likely associated with delivery by vaginal birth (Supplementary Fig. 12).

In summary, we undertook the largest study to date using WGS analysis to analyze both the composition and function of the neonatal and infant microbiota with paired maternal–infant subjects across multiple body sites. We observed that by 6 weeks of age, the microbial community structure and function had significantly expanded and diversified. We further demonstrated that there was no discernable effect of the cesarean mode of delivery on the early microbiota beyond the immediate neonatal period (and never inclusive of that in the meconium or stool) and that early taxonomic distinctions were not recapitulated by functional pathway analysis. These findings underscore the likely importance of interrogating not only the relative abundance of members of the microbiota but their potentially overlapping metabolic pathways when considering the impact of an intervention on diversity of the microbiota.


Online Methods

Experimental design.

The intent of this prospective cohort study was to characterize the neonatal microbiota in early life and subsequently compare the neonatal microbiota as a consequence of different modes of delivery. An overview of the study design is shown in Supplementary Figure 1a. An initial cohort of gravidae and their infants (n = 81) was prospectively enrolled in the mothers' third trimester from the Ben Taub County Hospital in Houston, Texas. By assuming a cesarean delivery rate equivalent to that of the US national average (32%)20, this cohort size achieved a power of at least 0.80 to detect a difference in the taxonomic composition by the mode of delivery at the 6-week time point, based on an estimated read count of at least 5,000 and a small effect size (Φ = 0.07 at an α = 0.05). Power calculations were based on the Dirichlet-multinomial distribution as described in La Rosa et al.61. A second cross-sectional cohort (n = 81) was additionally enrolled to increase the power to detect differences by the mode of delivery at the time of delivery. This second cohort increased enrollment of cesarean deliveries, thus capturing a larger number of types of cesarean indications (labored versus unlabored; n = 26 for each; Supplementary Table 9). All subjects were enrolled under Baylor College of Medicine Institutional Review Board (IRB) protocol H-27393. Subjects included in this study had a viable pregnancy >28 weeks gestation, were 18 years of age or older and were willing to consent to all aspects of the protocol. Subjects were excluded if there was known HIV or hepatitis C infection, known immunosuppressive disease, known use of cytokines or immunosuppressive agents within the last 6 months, a history of cancer (except for squamous or basal cell carcinoma of the skin that could be managed by local excision), treatment for or suspicion of ever having had toxic shock syndrome, or major surgery of the gastro-intestinal tract (except for cholecystectomy or appendectomy) in the past 5 years. During the consent process, subjects were informed of the potential risks of participation, including the minimal physical risks associated with specimen collection and the possibility that protected health information or de-identified project data could be accidentally released. Several precautions, described in the protocol and consent form, were taken to reduce these risks.

Clinical metadata.

Clinical metadata for all subjects are listed in Supplementary Tables 8 and 9. The cohort that each subject was recruited to is listed under “Study” as either longitudinal or cross-sectional. Gestational age at delivery is provided as weeks and days. Standard definitions of 'neonate' (28 d or less) and 'infant' (29 d or greater) were used, although neonates were sampled within 1 h of birth. Neonates were considered preterm if they were delivered at <37 weeks gestational age. Twin births (multiples) are indicated. The indication for cesarean delivery is provided as either being primary or repeat (first cesarean or repeat cesarean). Whether the mother was in labor before cesarean delivery was also recorded (labored versus unlabored). Antepartum or intrapartum (around the time of delivery) antibiotic usage was noted. Breastfeeding practices were assessed by careful chart review and classified as being fed exclusively by formula, exclusively fed by human milk, or fed by both formula and human milk (partial human milk) within the first 6 weeks of life. Birth weight was provided in grams and percentile, given as a function of gestational age and gender. Maternal diet during pregnancy was determined for the 1-month period before enrollment using the National Health and Nutrition Examination Survey Dietary Screener Questionnaire, as described previously47.

Sample collection and processing.

Specimens were collected in a sterile and uniform manner by trained personnel according to a standardized protocol as previously described62. A summary of specimens collected at each time point from each maternal–infant pair is shown in Supplementary Figure 1b. All specimens were stored at 4 °C and processed in the lab within 24 h. In a decontaminated, sterile environment, genomic DNA was isolated from each specimen using the MOBIO PowerSoil DNA Isolation Kit (MOBIO), using the manufacturer's standard protocol. Extracted DNA was prepared for sequencing according to the Human Microbiome Project consortium–outlined protocol1 for sequencing the gene encoding the 16S rRNA.

Sequencing analysis for the gene encoding the 16S rRNA.

To characterize the microbial composition of each specimen, the hypervariable regions (V) of the gene encoding the 16S rRNA were sequenced from the extracted DNA as described in the Human Microbiome Project1,62. The V5–V3 region of the gene encoding the 16S rRNA was amplified by PCR using bar-coded universal primers 926R (R, reverse primer) and 357F (F, forward primer). Amplicons were quantified by PicoGreen (Thermofisher) and pooled in equimolar amounts. Multiplex sequencing was performed on a 454-FLX Titanium Sequencer (Roche) at the Human Genome Sequencing Center (HGSC) at the Baylor College of Medicine (BCM). In total, all samples were sequenced across 32 sequencing runs (Supplementary Table 10). To reduce bias related to sequencing in different runs, samples from both mother and infant for all body sites were sequenced as they were obtained, and researchers were blinded to subject clinical metadata. Given the nature of enrollment, samples from the cross-sectional and longitudinal cohorts tended to be sequenced in different pools; however, as shown in Supplementary Figure 16, the data did not appear to strongly vary by cohort enrollment.

Raw sequence data for the 16S-rRNA-encoding gene were processed using the QIIME platform63 (v. 1.9.0), using the default parameters of the suggested pipeline, with notable changes as described below. Sequences were preprocessed to remove those shorter than 200 nt, longer than 700 nt or with an average quality score below 25. The remaining sequences were trimmed of reverse primer sequences and trimmed at the first ambiguous base call. Sequences shorter than 200 nt after trimming were subsequently removed. Retained high-quality sequences were assigned to the sample of origin by using the uniquely identifying barcode sequence and thereafter trimmed of the barcode and 5′ primer. To remove human contamination, high-quality reads were mapped to the human genome sequence (GRCh38) using DeconSeq (v. 0.4.3), and sequences highly similar to the human genome reference were removed from the data set64. The remaining sequence data and associated flowgram data were aggregated by body site and denoised to reduce sequencing errors, using the QIIME denoising pipeline. All denoised sequences were aggregated, and de novo operational taxonomic units (OTUs) were identified. Sequences were clustered into distinct OTUs at 97% similarity using the UCLUST65 method. Chimeric sequences were identified and removed using ChimeraSlayer in the QIIME workflow script (v. 1.9.0). The RDP classifier67 (v. 2.2), retrained against the May 8, 2013 version of the GreenGenes68 taxonomic database, was used to assign taxonomy for each OTU at a confidence greater than 50% at the lowest level of assignment. Read counts for each OTU were tabulated for downstream analysis. In all, 2,581 samples were collected from the maternal–infant pairs and used for sequencing the 16S-rRNA-encoding gene. After the filtering of singletons and removal of samples with fewer than 100 sequences, 1,429 samples were retained for downstream analysis (mean 8,578.5; s.d. 5,680.9 filtered sequences per sample; Supplementary Table 1). Relative abundances of the identified taxa found within each sample are provided in Supplementary Table 11.
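The read-level filters described above can be illustrated with a minimal Python sketch. It is not the QIIME implementation: the function name `preprocess_read` is hypothetical, an ambiguous base is assumed to be encoded as `N`, and reverse-primer trimming is omitted because the primer handling is not detailed here. Only the thresholds (200 nt, 700 nt, average quality 25) come from the text.

```python
from statistics import mean

# Thresholds stated in the text.
MIN_LEN, MAX_LEN, MIN_MEAN_Q = 200, 700, 25


def preprocess_read(seq, quals):
    """Return the trimmed read, or None if it fails any filter.

    seq   -- nucleotide string
    quals -- per-base quality scores, same length as seq
    (Hypothetical helper; ambiguous bases are assumed to be 'N'.)
    """
    # Initial filters: length window and minimum average quality.
    if not (MIN_LEN <= len(seq) <= MAX_LEN):
        return None
    if mean(quals) < MIN_MEAN_Q:
        return None
    # Trim at the first ambiguous base call, if there is one.
    first_n = seq.upper().find("N")
    if first_n != -1:
        seq = seq[:first_n]
    # Re-apply the minimum length after trimming.
    if len(seq) < MIN_LEN:
        return None
    return seq
```

Applying the length check twice mirrors the order given in the text: a read can pass the initial window and still be discarded once trimming shortens it below 200 nt.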

The specificity of certain taxa to a given body site was assessed by calculating the indicator value index as previously described34. Each body grouping (skin, nares, oral cavity, stool and vagina) within the infant or mother was considered a different site. For each genus in each site, we computed the mean abundance of each taxon relative to that in all of the sites, multiplied by the frequency of appearance of that taxon (abundance > 0.0%) across all samples within the site. Bubble plots of relative taxonomic abundances were generated using the R package ggplot. Linear discriminant analysis effect size (LEfSe v. 1.0) was performed with an LDA cut-off of 2.0 and a significance value (P) of 0.05 (ref. 69). Diversity within samples (alpha diversity) and between samples (beta diversity) was evaluated on the OTU table (16S) or Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway table (WGS) as indicated. For alpha diversity, samples were first rarefied at 2,000 sequences before diversity indexes were calculated. Significance was determined by Mann–Whitney U tests. To evaluate OTU discovery as a function of sequencing depth, reads per sample were plotted relative to the number of unique OTUs using PRISM (GraphPad Software, La Jolla, CA), and regression lines were fitted. Significance between linear regression slopes was determined by Student's t-test. For beta diversity measurements, both phylogenetic (UniFrac) and nonphylogenetic (Bray–Curtis, Jaccard) distance matrices were determined. Cumulative distribution plots of beta diversity distances were generated using PRISM. Principal coordinate analysis (PCoA) was performed on distance matrices as indicated, with significance of clustering determined by PERMANOVA or Adonis with 999 permutations. PCoA plots and confidence ellipses were generated by the R package ggplot. Heat maps were generated using the R package pheatmap.
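As an illustration of the indicator value calculation described above, the following Python sketch multiplies a specificity term (mean abundance in one site relative to the sum of the site means) by a fidelity term (fraction of samples in that site where the taxon is present). The function name and input layout are assumptions made for this example, and the scaling to the range 0 to 1 (rather than 0 to 100) is a choice of this sketch.

```python
def indicator_values(abundance_by_site):
    """Indicator value of one taxon for each site.

    abundance_by_site -- {site: [relative abundance in each sample]},
                         with at least one sample per site
    Returns {site: value between 0 and 1}.
    (Hypothetical helper written for illustration only.)
    """
    means = {s: sum(v) / len(v) for s, v in abundance_by_site.items()}
    total = sum(means.values())
    result = {}
    for site, values in abundance_by_site.items():
        # Specificity: mean abundance in this site relative to all sites.
        specificity = means[site] / total if total > 0 else 0.0
        # Fidelity: fraction of samples in this site with abundance > 0.
        fidelity = sum(1 for a in values if a > 0) / len(values)
        result[site] = specificity * fidelity
    return result
```

A taxon scores highest for a site when it is both concentrated there and found in most of that site's samples; being abundant in a single sample is not enough.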
Hierarchical clustering was performed using complete linkage on Jaccard similarity coefficients or Bray–Curtis dissimilarity distances using the R package vegan as indicated. Phylogenetic trees were created with PhyloPhlAn (v. 0.99). Association of the metadata categories with the hierarchical clustering in the data set was determined by a Chi-squared test. A generalized linear model with a normal distribution and a log link function was fit by maximum likelihood using JMP (v. 12.2.0). All statistical analysis was performed on the QIIME platform (v. 1.9) or R (v. 3.2.2) as indicated.
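The two nonphylogenetic measures named above can be written out in a few lines of Python. This is a sketch for orientation only, not the vegan or QIIME code used in the study; the function names and the dictionary-based sample format are assumptions of the example.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {otu: count} samples."""
    otus = set(a) | set(b)
    # Sum of the lesser count for every OTU (zero when an OTU is absent).
    shared = sum(min(a.get(o, 0), b.get(o, 0)) for o in otus)
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total if total else 0.0


def jaccard(a, b):
    """Jaccard distance based on presence or absence of each OTU."""
    present_a = {o for o, n in a.items() if n > 0}
    present_b = {o for o, n in b.items() if n > 0}
    union = present_a | present_b
    return 1.0 - len(present_a & present_b) / len(union) if union else 0.0


def distance_matrix(samples, metric):
    """Pairwise distances for {sample_id: {otu: count}}."""
    ids = sorted(samples)
    return {(i, j): metric(samples[i], samples[j]) for i in ids for j in ids}
```

Bray–Curtis weights OTUs by abundance, whereas Jaccard only asks whether an OTU is shared, so the two can rank the same pair of samples quite differently.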

SourceTracker analysis to predict the maternal origin of OTUs in neonatal samples.

SourceTracker analysis was performed on the QIIME platform to predict the likely origin of the neonatal microbiota, using the maternal microbiota as potential sources41. OTUs present in fewer than 1% of samples were first filtered, and the resultant OTU table was imputed using default parameters, with the neonatal samples identified as the 'sink' and the maternal samples identified as the 'source'. Results were aggregated into three categories: skin (retroauricular crease and antecubital fossa), vagina (posterior fornix and vaginal introitus) and other (stool, nares and supragingival plaque). Data were visualized as ternary plots generated with the R package ggtern.
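The bookkeeping around the SourceTracker run, namely the 1% prevalence filter and the collapse of per-site source proportions into three categories, might look like the Python sketch below. It does not reimplement SourceTracker's Bayesian estimation. The function names, the input layouts and the renormalization step (which drops any proportion not assigned to a listed maternal site) are assumptions of this example.

```python
# Mapping of maternal sites to the three plotted categories, as in the text.
SITE_CATEGORY = {
    "retroauricular crease": "skin",
    "antecubital fossa": "skin",
    "posterior fornix": "vagina",
    "vaginal introitus": "vagina",
    "stool": "other",
    "nares": "other",
    "supragingival plaque": "other",
}


def filter_by_prevalence(table, min_fraction=0.01):
    """Keep OTUs present in at least min_fraction of samples.

    table -- {otu: {sample: count}} (hypothetical layout)
    """
    samples = {s for row in table.values() for s in row}
    n = len(samples)
    kept = {}
    for otu, row in table.items():
        present = sum(1 for c in row.values() if c > 0)
        if n and present / n >= min_fraction:
            kept[otu] = row
    return kept


def ternary_coordinates(source_proportions):
    """Collapse {site: proportion} into (skin, vagina, other) summing to 1."""
    totals = {"skin": 0.0, "vagina": 0.0, "other": 0.0}
    for site, p in source_proportions.items():
        totals[SITE_CATEGORY[site]] += p
    s = sum(totals.values())
    if s == 0:
        return None  # nothing attributed to any listed maternal site
    return tuple(totals[k] / s for k in ("skin", "vagina", "other"))
```

Each neonatal sample then maps to a single point in the ternary plot, with its position reflecting the relative weight of the three maternal source groups.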

Whole-genome shotgun (WGS) sequencing and analysis.

Owing to limited sample volumes, WGS sequencing and analysis were restricted to a subset of samples. We focused our sequencing efforts on the neonatal meconium (n = 9), infant stool (n = 34), maternal stool (n = 24), neonatal oral samples obtained at delivery (n = 11) and infant oral samples obtained at 6 weeks (n = 11). A list of samples is provided in Supplementary Table 10. Isolated genomic DNA was sheared, and Illumina adapters were ligated to the resultant fragments. The DNA was pooled and paired-end sequenced on the Illumina HiSeq 2500 platform at the HGSC at BCM. Sequence data were then processed and analyzed through in-house pipelines as described previously37. In brief, human contamination was identified and removed with Best Match Tagger according to the HMP human sequence removal standard operating procedure. The resultant filtered high-quality sequences were deposited in the NCBI Sequence Read Archive (SRA) database with BioProject ID SRP078001. Identification and quantification of taxa in each sample were determined using Metagenomic Phylogenetic Analysis (MetaPhlAn) v. 1.7.7 (ref. 70), whereas the relative abundance of microbial pathways was determined by the HMP Unified Metabolic Analysis Network (HUMAnN) v. 0.99 (ref. 71). The total counts of mapped reads by each algorithm are provided in Supplementary Table 12. Gene counts and unique species counts were determined as described in the MetaHIT project72. After quality-filtering and removal of human contamination, reads were aligned against the MetaHIT gene catalog by using SOAPalign version 2.21 with parameters '-m 200 -x 600'. As long as a read could be aligned to a gene, the aligned gene was taken as one gene count. To assess the number of unique species, the collection of genes for each sample was then checked against the species of origin using the MetaHIT gene catalog taxonomic reference. Species information was tabulated for any gene that was represented in the data set.
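One reading of the gene and species counting rule above is that a gene is counted once if any read aligns to it, and a species is counted once if any detected gene is annotated to it. The Python sketch below follows that interpretation; the function name and the input formats are hypothetical and do not correspond to SOAPalign output or to the MetaHIT catalog file layout.

```python
def gene_and_species_counts(alignments, gene_to_species):
    """Count distinct genes hit by reads and distinct species they map to.

    alignments      -- iterable of (read_id, gene_id or None) pairs
    gene_to_species -- {gene_id: species name} from a catalog annotation
    (Both inputs are assumed layouts for this illustration.)
    """
    # A gene counts once no matter how many reads align to it.
    genes = {gene for _, gene in alignments if gene is not None}
    # Genes without a taxonomic annotation contribute no species.
    species = {gene_to_species[g] for g in genes if g in gene_to_species}
    return len(genes), len(species)
```

Under this reading the gene count measures richness of the gene repertoire rather than read depth per gene.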

Statistical modeling of infant gut metagenome pathway and species relative abundance.

Heat maps for WGS-based KEGG pathways and species were generated using the Python seaborn package. Hierarchical clustering was performed using average linkage on Euclidean distance. To identify factors with a significant impact on either species relative abundance or gene pathway abundance while controlling for other factors, we fitted a generalized linear model for each pathway or taxon with a normal distribution and a linear link function by residual maximum likelihood using the lme4 package in R. The KEGG pathway or taxon abundance was treated as the dependent variable, with mode of delivery, feeding method (any formula versus exclusively breastfeeding), gestational weight gain category (excess versus normal), maternal age, pre-pregnancy BMI, gestational diabetes mellitus (GDM), antibiotic usage and gestational age as fixed effects. The categorical variables (mode of delivery, feeding and gestational weight gain) were represented using dummy variables. The base case for linear regression was set as vaginal delivery, exclusively breastfeeding, normal gestational weight gain and no antibiotic usage.
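The dummy-variable coding with the stated base case can be sketched in Python as follows. This only builds one row of a design matrix and does not fit the model, which the study did in R. The field names and the category labels are invented for the example, and treating the absence of GDM as its reference level is an assumption, since the text does not list GDM in the base case.

```python
# Reference level for each categorical covariate. The first four follow the
# base case in the text; the GDM baseline is an assumption of this sketch.
BASELINE = {
    "delivery": "vaginal",
    "feeding": "exclusive breastfeeding",
    "gwg": "normal",
    "antibiotics": "no",
    "gdm": "no",
}
CATEGORICAL = ("delivery", "feeding", "gwg", "antibiotics", "gdm")
NUMERIC = ("maternal_age", "prepregnancy_bmi", "gestational_age")


def design_row(subject):
    """Return [intercept, dummies..., numeric covariates...] for one infant.

    subject -- dict with the hypothetical keys listed above
    """
    row = [1.0]  # intercept term
    for key in CATEGORICAL:
        # 0 for the reference level, 1 for the alternative level.
        row.append(0.0 if subject[key] == BASELINE[key] else 1.0)
    row.extend(float(subject[key]) for key in NUMERIC)
    return row
```

With this coding, each dummy coefficient is read as the shift in pathway or taxon abundance relative to a vaginally delivered, exclusively breastfed infant whose mother had normal gestational weight gain and no antibiotic exposure.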

Data availability.

All specimens and associated sequence data were assigned a de-identified code and stored in controlled-access repositories. Sequence data for the 16S-rRNA-encoding gene were deposited in the NCBI Sequence Read Archive (SRA) database with BioProject ID PRJNA322188. WGS sequence data were deposited in the NCBI Sequence Read Archive (SRA) database with BioProject ID SRP078001.




References

  1. Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214 (2012).
  2. et al. Enterotypes of the human gut microbiome. Nature 473, 174–180 (2011).
  3. et al. A metagenomic approach to characterization of the vaginal microbiome signature in pregnancy. PLoS One 7, e36466 (2012).
  4. et al. Bacterial community variation in human body habitats across space and time. Science 326, 1694–1697 (2009).
  5. et al. An obesity-associated gut microbiome with increased capacity for energy harvest. Nature 444, 1027–1031 (2006).
  6. et al. Vaginal microbiome of reproductive-age women. Proc. Natl. Acad. Sci. USA 108(Suppl. 1), 4680–4687 (2011).
  7. et al. Gut microbiota from twins discordant for obesity modulate metabolism in mice. Science 341, 1241214 (2013).
  8. et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60 (2012).
  9. et al. Dysfunction of the intestinal microbiome in inflammatory bowel disease and treatment. Genome Biol. 13, R79 (2012).
  10. et al. High-fat-diet-mediated dysbiosis promotes intestinal carcinogenesis independently of obesity. Nature 514, 508–512 (2014).
  11. The gnotobiotic animal as a tool in the study of host microbial relationships. Bacteriol. Rev. 35, 390–429 (1971).
  12. et al. Microbial exposure during early life has persistent effects on natural killer T cell function. Science 336, 489–493 (2012).
  13. et al. Microbial colonization influences early B lineage development in the gut lamina propria. Nature 501, 112–115 (2013).
  14. et al. The maternal microbiota drives early postnatal innate immune development. Science 351, 1296–1302 (2016).
  15. et al. Induction of intestinal TH17 cells by segmented filamentous bacteria. Cell 139, 485–498 (2009).
  16. et al. The microbial metabolites, short-chain fatty acids, regulate colonic Treg cell homeostasis. Science 341, 569–573 (2013).
  17. et al. Induction of colonic regulatory T cells by indigenous Clostridium species. Science 331, 337–341 (2011).
  18. Microbiome assembly across multiple body sites in low-birthweight infants. mBio 4, e00782–13 (2013).
  19. Factors influencing rising cesarean section rates in China between 1988 and 2008. Bull. World Health Organ. 90, 30–39 (2012).
  20. Trends in low-risk cesarean delivery in the United States, 1990–2013. Natl. Vital Stat. Rep. 63, 1–16 (2014).
  21. The impact of birth mode of delivery on childhood asthma and allergic diseases—a sibling study. Clin. Exp. Allergy 42, 1369–1376 (2012).
  22. Planned repeat cesarean section at term and adverse childhood health outcomes: a record-linkage study. PLoS Med. 13, e1001973 (2016).
  23. American College of Obstetricians and Gynecologists (College) & Society for Maternal–Fetal Medicine. Safe prevention of the primary cesarean delivery. Am. J. Obstet. Gynecol. 210, 179–193 (2014).
  24. et al. Delivery mode shapes the acquisition and structure of the initial microbiota across multiple body habitats in newborns. Proc. Natl. Acad. Sci. USA 107, 11971–11975 (2010).
  25. et al. Mode of delivery affects the bacterial community in the newborn gut. Early Hum. Dev. 86(Suppl. 1), 13–15 (2010).
  26. et al. Intestinal microbiota of 6-week-old infants across Europe: geographic influence beyond delivery mode, breast-feeding and antibiotics. J. Pediatr. Gastroenterol. Nutr. 51, 77–84 (2010).
  27. et al. Gut microbiota of healthy Canadian infants: profiles by mode of delivery and infant diet at 4 months. CMAJ 185, 385–394 (2013).
  28. et al. Indications contributing to the increasing cesarean delivery rate. Obstet. Gynecol. 118, 29–38 (2011).
  29. et al. Diet rapidly and reproducibly alters the human gut microbiome. Nature 505, 559–563 (2014).
  30. et al. Succession of microbial consortia in the developing infant gut microbiome. Proc. Natl. Acad. Sci. USA 108(Suppl. 1), 4578–4585 (2011).
  31. Vertical mother–neonate transfer of maternal gut bacteria via breastfeeding. Environ. Microbiol. 16, 2891–2904 (2014).
  32. et al. Impact of maternal intrapartum antibiotics, method of birth and breastfeeding on gut microbiota during the first year of life: a prospective cohort study. BJOG Int. J. Obstet. Gynaecol. 123, 983–993 (2015).
  33. et al. Breast-fed and bottle-fed infant rhesus macaques develop distinct gut microbiotas and immune systems. Sci. Transl. Med. 6, 252ra120 (2014).
  34. Species assemblages and indicator species: the need for a flexible asymmetrical approach. Ecol. Monogr. 67, 345–366 (1997).
  35. et al. Host remodeling of the gut microbiome and metabolic changes during pregnancy. Cell 150, 470–480 (2012).
  36. et al. Patterned progression of bacterial populations in the premature infant gut. Proc. Natl. Acad. Sci. USA 111, 12522–12527 (2014).
  37. et al. The placenta harbors a unique microbiome. Sci. Transl. Med. 6, 237ra65 (2014).
  38. et al. The pre-term placental microbiome varies in association with excess maternal gestational weight gain. Am. J. Obstet. Gynecol. 212, 653.e1–653.e16 (2015).
  39. Human gut colonization may be initiated in utero by distinct microbial communities in the placenta and amniotic fluid. Sci. Rep. 6, 23129 (2016).
  40. et al. Dynamics and stabilization of the human gut microbiome during the first year of life. Cell Host Microbe 17, 852 (2015).
  41. et al. Bayesian community-wide culture-independent microbial source tracking. Nat. Methods 8, 761–763 (2011).
  42. et al. Factors influencing the composition of the intestinal microbiota in early infancy. Pediatrics 118, 511–521 (2006).
  43. et al. Variation in microbiome LPS immunogenicity contributes to autoimmunity in humans. Cell 165, 842–853 (2016).
  44. et al. Partial restoration of the microbiota of cesarean-born infants via vaginal microbial transfer. Nat. Med. 22, 250–253 (2016).
  45. Maternal obesity is associated with alterations in the gut microbiome in toddlers. PLoS One 9, e113026 (2014).
  46. et al. Birth-mode-dependent association between pre-pregnancy maternal weight status and the neonatal intestinal microbiome. Sci. Rep. 6, 23133 (2016).
  47. et al. The early infant gut microbiome varies in association with a maternal high-fat diet. Genome Med. 8, 77 (2016).
  48. et al. Diversified microbiota of meconium is affected by maternal diabetes status. PLoS One 8, e78257 (2013).
  49. Bile acids and the gut microbiome. Curr. Opin. Gastroenterol. 30, 332–338 (2014).
  50. B vitamins and the brain: mechanisms, dose and efficacy—a review. Nutrients 8, 68 (2016).
  51. et al. Developmental dynamics of the preterm infant gut microbiota and antibiotic resistome. Nat. Microbiol. 1, 16024 (2016).
  52. et al. Antibiotics, birth mode and diet shape microbiome maturation during early life. Sci. Transl. Med. 8, 343ra82 (2016).
  53. et al. Microbial prevalence, diversity and abundance in amniotic fluid during preterm labor: a molecular- and culture-based investigation. PLoS One 3, e3056 (2008).
  54. et al. Bacteria and inflammatory cells in fetal membranes do not always cause preterm labor. Pediatr. Res. 57, 404–411 (2005).
  55. et al. Isolation of commensal bacteria from umbilical cord blood of healthy neonates born by cesarean section. Curr. Microbiol. 51, 270–274 (2005).
  56. et al. Is meconium from healthy newborns actually sterile? Res. Microbiol. 159, 187–193 (2008).
  57. et al. Meconium microbiome analysis identifies bacteria correlated with premature birth. PLoS One 9, e90784 (2014).
  58. et al. First-pass meconium samples from healthy-term vaginally delivered neonates: an analysis of the microbiota. PLoS One 10, e0133320 (2015).
  59. et al. Determinants of the human infant intestinal microbiota after the introduction of first complementary foods in infant samples from five European centers. Microbiology 157, 1385–1392 (2011).
  60. et al. Natural history of the infant gut microbiome and impact of antibiotic treatment on bacterial strain diversity and stability. Sci. Transl. Med. 8, 343ra81 (2016).
  61. et al. Hypothesis testing and power calculations for taxonomic-based human microbiome data. PLoS One 7, e52078 (2012).
  62. et al. The Human Microbiome Project strategy for comprehensive sampling of the human microbiome and why it matters. FASEB J. 27, 1012–1022 (2013).
  63. et al. QIIME allows analysis of high-throughput community sequencing data. Nat. Methods 7, 335–336 (2010).
  64. Fast identification and removal of sequence contamination from genomic and metagenomic data sets. PLoS One 6, e17288 (2011).
  65. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461 (2010).
  66. et al. Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Res. 21, 494–504 (2011).
  67. et al. The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res. 37, D141–D145 (2009).
  68. et al. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol. 72, 5069–5072 (2006).
  69. et al. Metagenomic biomarker discovery and explanation. Genome Biol. 12, R60 (2011).
  70. et al. Metagenomic microbial community profiling using unique clade-specific marker genes. Nat. Methods 9, 811–814 (2012).
  71. et al. Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLoS Comput. Biol. 8, e1002358 (2012).
  72. et al. Richness of human gut microbiome correlates with metabolic markers. Nature 500, 541–546 (2013).


The authors gratefully acknowledge the support of the NIH Director's New Innovator Award (DP2 DP21DP2OD001500; K.M. Aagaard), the NIH–NINR (NR014792-01; K.M. Aagaard), the NIH National Children's Study Formative Research (N01-HD-80020; K.M. Aagaard), the Burroughs Wellcome Fund Preterm Birth Initiative (K.M. Aagaard), the March of Dimes Preterm Birth Research Initiative (K.M. Aagaard), the Baylor College of Medicine Medical Scientist Training Program (NIH NIGMS T32 GM007330; D.C. and K.M. Aagaard), the National Institute of General Medical Sciences (T32GM088129; D.M.C.), Baylor Research Advocates for Student Scientists (D.M.C.) and the Human Microbiome Project funded through the NIH Director's Common Fund at the National Institutes of Health (as part of NIH RoadMap 1.5; K.M. Aagaard). All sequencing and adaptation of protocols for WGS sequencing were performed by the Baylor College of Medicine Human Genome Sequencing Center (BCM–HGSC), which is funded by direct support from the National Human Genome Research Institute (NHGRI) at NIH (U54HG004973 (BCM); R. Gibbs, Principal Investigator). The authors also thank the staff members who were directly involved in clinical recruitment and specimen processing (M. Moller, B. Boggan, R. Benjamin, J. Chen, C. Cook and D. Racusin). The authors are grateful to M. Belfort, J. Versalovic, T. Savidge, R.A. Luna, D. Racusin, M. Suter and K. Meyer for critical review of the manuscript.

Author information


  1. Department of Obstetrics and Gynecology, Division of Maternal–Fetal Medicine, Baylor College of Medicine, Houston, Texas, USA.

    • Derrick M Chu, Jun Ma, Amanda L Prince, Kathleen M Antony, Maxim D Seferovic & Kjersti M Aagaard
  2. Interdepartmental Program in Translational Biology and Molecular Medicine, Baylor College of Medicine, Houston, Texas, USA.

    • Derrick M Chu & Kjersti M Aagaard
  3. Medical Scientist Training Program, Baylor College of Medicine, Houston, Texas, USA.

    • Derrick M Chu & Kjersti M Aagaard
  4. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA.

    • Kjersti M Aagaard
  5. Department of Molecular and Cell Biology, Baylor College of Medicine, Houston, Texas, USA.

    • Kjersti M Aagaard



D.M.C. and K.M. Aagaard designed and conceived the study; K.M. Aagaard and K.M. Antony assembled the cohort and developed the infrastructure to obtain swabs, samples and clinical metadata from all subjects; K.M. Aagaard and K.M. Antony recruited and sampled all subjects; A.L.P., D.M.C. and M.D.S. prepared samples for sequencing of the gene encoding 16S rRNA and for WGS sequencing; D.M.C., J.M. and K.M. Aagaard performed and supervised all analysis and statistical modeling; and D.M.C. and K.M. Aagaard wrote the manuscript, with contributions from J.M., A.L.P., K.M. Antony and M.D.S.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Kjersti M Aagaard.

Supplementary information

PDF files

  1. Supplementary Figures 1–16

Excel files

  1. Supplementary Tables 1–12