HIV-exposure, early life feeding practices and delivery mode impacts on faecal bacterial profiles in a South African birth cohort

There are limited data on meconium and faecal bacterial profiles from African infants and their mothers. We characterized faecal bacterial communities of infants and mothers participating in a South African birth cohort. Stool and meconium specimens were collected from 90 mothers and 107 infants at birth, and from a subset of 72 and 36 infants at 4–12 and 20–28 weeks of age, respectively. HIV-unexposed infants were primarily exclusively breastfed at 4–12 (49%, 26/53) and 20–28 weeks (62%, 16/26). In contrast, HIV-exposed infants were primarily exclusively formula fed at 4–12 (53%; 10/19) and 20–28 weeks (70%, 7/10). Analysis (of the bacterial 16S rRNA gene sequences of the V4 hypervariable region) of the 90 mother-infant pairs showed that meconium bacterial profiles [dominated by Proteobacteria (89%)] were distinct from those of maternal faeces [dominated by Firmicutes (66%) and Actinobacteria (15%)]. Actinobacteria predominated at 4–12 (65%) and 20–28 (50%) weeks. HIV-exposed infants had significantly higher faecal bacterial diversities at both 4–12 (p = 0.026) and 20–28 weeks (p = 0.002). HIV-exposed infants had lower proportions of Bifidobacterium (p = 0.010) at 4–12 weeks. Maternal faecal bacterial profiles were influenced by HIV status, feeding practices and mode of delivery. Further longitudinal studies are required to better understand how these variables influence infant and maternal faecal bacterial composition.

16S ribosomal ribonucleic acid (rRNA) amplicon library preparation. We performed two polymerase chain reactions (PCRs) using primers targeting the V4 hypervariable region of the 16S rRNA gene, with minor modifications 39. In the first PCR, we used modified-F515 (5′-GTGCCAGCHGCYGCGGT-3′) and R806 (5′-GGACTACNNGGGTWTCTAAT-3′) to amplify the V4 region of the bacterial 16S rRNA gene. The reaction consisted of 12.5 μl of 2X MyTaq™ HS Mix (Bioline, MA, USA), 2 μl forward and 2 μl reverse primer (each at a 10 μM initial concentration), 0.75 μl of dimethyl sulfoxide (catalogue no. D2650, Sigma-Aldrich®, MO, USA) and 4 μl template, made up to a final volume of 25.25 μl using PCR-grade water (Thermo Fisher Scientific Inc., MA, USA). Cycling conditions included a denaturation step at 95 °C for 3 min; an amplification step at 95 °C for 30 sec, 50 °C for 30 sec and 72 °C for 1 sec (proceeding for 10 cycles); and a final extension step at 72 °C for 5 min. In the second PCR, we used 4 μl of the amplicon from the first PCR to add adapters, barcodes, 12-15 staggered nucleotides (NNNNNNNNNNNN) and priming regions. During this second PCR, we used the composite primers F515-composite (5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNNNGTGCCAGCHGCYGCGGT-3′) and R806-composite (5′-CAAGCAGAAGACGGCATACGAGATACGAGACTGATTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNNNNGGACTACNNGGGTWTCTAAT-3′). The two-step amplification approach was used to reduce the risk of non-specific binding when using adapters/sequencing primers of more than 100 bp 40. Staggered random nucleotides were incorporated to improve cluster identity and imaging in more diverse sample types 40. Golay barcodes (12 bases underlined in the reverse primer R806-composite) 39 served to multiplex samples. The only change made to the cycling conditions in the second PCR was the addition of 20 cycles during the amplification step.
Amplicons were cleaned using the Agencourt® AMPure® XP PCR Purification kit (Beckman Coulter, CA, USA). Slight modifications to the manufacturer's protocol included the use of a 0.65:1 ratio of Agencourt AMPure XP solution to PCR products in step 2. PCR products were verified and annotated by agarose gel electrophoresis. We quantified amplicons using the Quant-iT™ PicoGreen® dsDNA Reagent (Life Technologies, CA, USA) on the Infinite M1000 Pro® microplate reader (Tecan Group Ltd., Grödig, Austria) equipped with Tecan i-Control™ 1.7 software. Following pooling of amplicons at 100 ng, we quantified and purified the library using the Nanodrop ND 1000 (Thermo Fisher Scientific Inc., MA, USA) equipped with ND-1000 3.7.1 software and Agencourt AMPure XP solution (at a 1:1 ratio), respectively. The pooled 16S library was excised following agarose gel electrophoresis, and purified using the QIAquick Gel Extraction kit (Qiagen, MA, USA). Minor modifications to the manufacturer's protocol included incubation of the sample at 37 °C for 5 min at step 10 and heating of the elution buffer, Tris-EDTA buffer (pH 8.0), to between 60 and 70 °C at step 13.
Scientific Reports | (2018) 8:5078 | DOI: 10.1038/s41598-018-22244-6
16S ribosomal RNA gene sequencing. The 16S library from faecal specimens was sequenced using the MiSeq Reagent Kit v3, 600 cycles (Illumina, CA, USA). Sequencing controls included two Human Microbiome Project mock community controls (HM-782D and HM-783D) (BEI Resources, ATCC, VA, USA) and two no-template water controls. In addition, to serve as inter-run reproducibility measures, we randomly selected nucleic acid extracts from 17 faecal specimens within our cohort to be amplified and sequenced in duplicate. The KAPA qPCR quantification kit (KAPA Biosystems, MA, USA) and the Agilent DNA 1000 kit (Agilent Technologies, CA, USA) were used to quantify and size the library. Using these metrics, we diluted the library to 4 nM using Buffer EB (Qiagen, Hilden, Germany), and denatured and neutralized the library using 0.2 N NaOH and hybridization buffer (HT1). We prepared a final library dilution at 4 pM, to which the internal control (15% PhiX) was added at 4 pM. The denatured library was loaded onto the Illumina® MiSeq™ platform as per the manufacturer's instructions 41.
Bioinformatics pipeline. We assessed the sequencing quality of FASTQ files using FastQC 42 and SolexaQA 43.
We then merged forward and reverse sequences using USEARCH7 fastq_mergepairs (fastq_maxdiffs set to 3) 44, followed by quality filtering using USEARCH7 fastq_filter (sequences truncated to 250 and fastq_maxee set to 0.1). Sequences from no-template water controls were aligned to biological sample sequences using USEARCH7 usearch-global in order to remove potential contaminants. We matched each unique sequence from the two no-template controls to reads present in our biological samples. Where reads matched at 100% similarity, the average number of reads calculated from the two no-template controls was removed from the biological samples. USEARCH sortbysize allowed for dereplication and selection of sequences occurring more than twice. We clustered sequences into operational taxonomic units (OTUs) using USEARCH7 cluster_otus (with a clustering radius of 3). The ChimeraSlayer reference database 45 and the USEARCH7 uchime_ref tool were used to remove chimeras. OTU counts were obtained using USEARCH7 usearch-global. Further processing of data was performed using the Quantitative Insights Into Microbial Ecology (QIIME 1.7.0) suite of software tools 46. We assigned taxonomy to representative reads, selecting SILVA 47 as the reference database and a 97% sequence similarity, using the RDP classifier method implemented through the assign_taxonomy.py script in QIIME 48. Sequences were aligned (align_seqs.py) at 97% similarity using the PyNAST algorithm and filtered using the filter_alignment.py script 48 […] for all statistical analyses as well as graphical representations of the data. We determined the reproducibility of our experiment using nucleic acid extracts from 17 faecal specimens randomly selected for processing in duplicate. Proportions of each OTU from each of the 17 faecal specimens were compared to the proportions from their technical repeats, using simple linear regression analysis to calculate the coefficient of determination (R²) 50. We colour coded the plotted proportions based on template nucleic acid concentrations, as well as sequencing depth.
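The replicate comparison described above can be illustrated with a minimal Python sketch. It is not the authors' code (the analyses in this study were run in R); the function names and the example counts are hypothetical. It relies on the fact that, for simple linear regression with one predictor, R² equals the squared Pearson correlation between the two sets of OTU proportions.

```python
def relative_abundance(counts):
    """Convert raw OTU counts for one specimen into proportions."""
    total = sum(counts)
    if total == 0:
        raise ValueError("specimen has no reads")
    return [c / total for c in counts]


def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x.

    With a single predictor this equals the squared Pearson correlation.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("R-squared is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical OTU counts for one specimen and its technical repeat.
original = relative_abundance([120, 30, 45, 5])
repeat = relative_abundance([110, 35, 50, 5])
print(round(r_squared(original, repeat), 3))
```

Values close to 1 indicate that the OTU proportions of a specimen and its technical repeat agree closely.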
We determined alpha and beta diversity using the Shannon diversity index (H′) 51,52 and the Bray-Curtis dissimilarity index (calculated using the [vegdist] function from the R package vegan 53) 52,54-56, respectively. We performed agglomerative clustering by applying Complete Linkage (furthest neighbour) clustering using the [hclust] function in the R package stats 49 together with a matrix based on the Bray-Curtis dissimilarity index. Clustering was performed on all OTUs with a relative abundance >0.5%. Due to the variation in the total number of reads sequenced across different specimens within a single run, we transformed count data 57 to compositional data by calculating the relative abundance of each OTU per specimen 58,59. As we were dealing with compositional data, we constructed log-ratio biplots 60 using only OTUs whose proportions differed significantly (at the 5% significance level) between participant groups (mothers, infants at birth, infants at 4-12 weeks and infants at 20-28 weeks). The data were adjusted in a Bayesian context to remove zeros 61-63. Lambda scaling 64 was employed to construct log-ratio biplots, ensuring evenness in the spread of the modes (OTUs and specimens).
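For readers unfamiliar with the two indices, the following minimal Python sketch shows how they are defined. It is an illustration only, not the R code used in this study; the function names and example counts are hypothetical. It assumes the natural logarithm for H′ and the standard Bray-Curtis form (sum of absolute differences divided by the sum of all abundances), which are the defaults in vegan.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over OTUs with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("specimen has no reads")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two specimens' OTU abundance vectors."""
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    denominator = sum(x + y for x, y in zip(a, b))
    if denominator == 0:
        raise ValueError("both specimens are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


# Hypothetical OTU counts for two specimens (same OTU order in both).
specimen_a = [50, 30, 20, 0]
specimen_b = [10, 10, 40, 40]
print(shannon_index(specimen_a), shannon_index(specimen_b))
print(bray_curtis(specimen_a, specimen_b))
```

H′ increases with both the number of OTUs and the evenness of their proportions, whereas the Bray-Curtis index ranges from 0 (identical profiles) to 1 (no shared OTUs).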
We used Fisher's exact test for two-way tables or Pearson's chi-square test to determine whether significant associations were present between covariates. We used generalized linear models (GLMs) to test the effect of covariates on the composition and diversity of maternal and infant faecal microbiota profiles, respectively. GLMs were used to test the effect of covariates at each of the time-points under study. We performed hypothesis testing at a 5% significance level, controlling for the false discovery rate as described by Benjamini & Hochberg 65. We implemented the negative binomial model in RStudio 66 through the family function quasipoisson; specified the offset as equal to "root OTU counts" 67; and used parameters estimated by the iteratively reweighted least squares method in the function [glm] in the package stats 49. Final models were based on OTUs with a relative abundance >0.5%. These models were designed to test covariates with missing variables separately. Covariates included mode of delivery (vaginal delivery versus Caesarean-section delivery); gestational age; birth weight (low birth weight: <33 weeks gestation); birth length; gender; mode of feeding (exclusive breastfeeding, exclusive formula feeding or mixed feeding, i.e. a combination of breastfeeding and/or formula feeding and/or solid food); residential area [TC Newman (primarily a mixed ancestry population) vs. Mbekweni (primarily a black African population)]; maternal education; maternal HIV status; maternal smoking status; maternal cotinine levels; maternal body mass index (BMI) at 6-10 weeks postpartum; and the number of household members. Each statistically significant result obtained from our GLMs was investigated graphically to eliminate spurious results. Where significant variables showed uneven data distributions, we used "refitted GLMs" to validate our findings. Results from GLMs on uneven data were reported only once confirmed by refitted GLMs.
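The false discovery rate control mentioned above can be sketched as follows. This is an illustrative Python implementation of the Benjamini-Hochberg step-up adjustment, not the authors' R code; the function name and example p-values are hypothetical. Each adjusted p-value is the raw p-value multiplied by m/rank (m being the number of tests), made monotone by taking a running minimum from the largest p-value downwards.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values, e.g. one per OTU tested against a covariate.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in adjusted]
print(adjusted, significant)
```

An OTU would then be reported as significant at the 5% level only if its adjusted p-value is at most 0.05.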
Refitted GLMs were based on subsets of comparable data points. To confirm the effect of maternal HIV status on faecal bacterial profiles, we used refitted GLMs to eliminate possible confounding effects of residential area and feeding practices. These refitted GLMs were therefore based on data from participants residing in Mbekweni who did not practice exclusive formula feeding. The effect of mode of feeding on faecal bacterial profiles was confirmed by eliminating potential confounding effects of HIV status and residential area in the refitted GLMs. Here we included only data from HIV-infected mothers or HIV-exposed infants from Mbekweni. Refitted GLMs confirming the effect of residential area on faecal bacterial profiles included only data from HIV-uninfected mothers or HIV-unexposed infants for whom exclusive breastfeeding was practised. Violin plots were used to summarise significant findings from our GLMs on all variables under study, including variables validated using refitted GLMs.
Differences in faecal bacterial profiles between mother-infant pairs, and amongst infants over time (using a subset of 36 infants with complete longitudinal data sets), were determined by generalized linear mixed models (GLMMs) 68 using the glmmPQL function in the R package MASS 69. The same covariates as above were tested in these models. Furthermore, these models also investigated possible interaction effects between each of the covariates and time (interaction-with-time models). A significant interaction effect would mean that the effect of the covariate differed across time-points. Significant findings from GLMs and GLMMs were also represented graphically; results were only deemed significant if both the p-value and visual inspection of the related plots showed noticeable effects.

Table 1 describes the characteristics of our study participants. We investigated the meconium bacterial profiles from 107 infants, of which the mothers of 90 infants provided a maternal faecal specimen at the time of delivery. Of the 107 infants investigated in this study, a subset of 72 and 36 infants had faecal specimens collected at 4-12 weeks and 20-28 weeks, respectively (Supplementary Fig. S1). The median age at which meconium samples were collected was 12 hours (interquartile range (IQR): 4.2-24.6) postnatally, while infant faecal samples were collected at median ages of seven and 24 weeks during the two longitudinal collection periods (Table 1). The median age of maternal participants was 25 years (Table 1). Maternal faecal specimens from mothers undergoing vaginal delivery were collected at a median of 0 days (IQR: 0-1) after delivery. Collections from mothers undergoing Caesarean-section deliveries took place at a median of two days after delivery (IQR: 1.5-2.5). Infants were primarily delivered vaginally (82%), and had a median gestational age of 39 weeks and a median birth weight of 3 kg (Table 1).
We observed that some of our data were unevenly distributed across the groups under study (Supplementary Tables S4). Twenty percent of mothers were HIV-infected (18/90) (Table 1), of whom the majority resided in Mbekweni (94%, 17/18) (Supplementary Table S1). Twenty-four percent of the infants studied at birth were HIV-exposed (Table 1), but all were uninfected. Sixty-nine percent of the HIV-exposed uninfected infants had a paired maternal sample. Most (85%) infants were exclusively breastfed prior to hospital discharge, while 47% and 17% were exclusively breastfed until 4-12 and 20-28 weeks, respectively (Table 1). All HIV-uninfected mothers exclusively breastfed prior to discharge, compared to 50% (9/18) of HIV-infected mothers (Supplementary Table S1). More HIV-unexposed infants were exclusively breastfed compared to HIV-exposed infants at birth (99%, 76/77 vs. 42%, 11/26), at 4-12 weeks (49%, 26/53 vs. 37%, 7/19), and at 20-28 weeks (17%, 6/36 vs. 0%, 0/10) (Supplementary Table S1). In addition, fewer HIV-unexposed infants were exclusively formula fed compared to those who were HIV-exposed at birth (1%, 1/77 vs. 58%, 15/26), at 4-12 weeks (2%, 1/53 vs. 53%, 10/19), and at 20-28 weeks (0%, 0/26 vs. 70%, 7/10) (Supplementary Table S1). A subset of infants also received mixed feeding (a combination of breastfeeding and/or formula feeding and/or solid food introduction) at 4-12 weeks and 20-28 weeks of age (Table 1). Most mothers achieved a secondary level of education and many were exposed to cigarette smoke (Table 1).

Meconium bacterial profiles are distinct from maternal faecal bacterial profiles at the time of delivery.
Clustering patterns of meconium specimens did not appear to be associated with timing of specimen collection at birth (Fig. 1). When compared to maternal faecal specimens, infant meconium had significantly lower alpha diversity indices [Shannon index: H′ = 2.6 (IQR: 1.9-3.1) vs. H′ = 2.9 (IQR: 2.7-3.1); p < 0.001]. Infant meconium and maternal faecal specimens also had distinct bacterial compositions at the time of delivery (Supplementary Table S2). Table 2 summarises significant differences in bacterial proportions at phylum- and class-level observed for 90 mothers and their infants studied at the time of delivery.
Unsupervised clustering patterns at genus-level showed that maternal faecal specimens predominantly contained Firmicutes and clustered together (clusters 7 and 8) (Fig. 2). Meconium specimens had high abundances of Proteobacteria and formed clusters 1, 2, 3, 4 and 6 (Fig. 2). Clusters no. 5 and 9 had a mix of meconium and maternal faecal specimens; however, no mother-infant pairs grouped within these clusters (Fig. 2). Overall, only two mother-infant pairs grouped closely together, in clusters no. 8 (mother-infant pair no. 44) and no. 1 (mother-infant pair no. 45) (Fig. 2A). Apart from the distinction between maternal and infant faecal bacterial profiles at the time of delivery, no other characteristics were associated with these profiles using unsupervised clustering (Fig. 2B). Similar clustering profiles were observed when each genus, irrespective of its relative abundance, was treated as a single unit during unsupervised clustering (data not shown). Log-ratio biplot analysis (at genus-level) on the 36 infants with complete longitudinal sets (Fig. 4A) revealed clustering profiles similar to those seen for the complete data set (Fig. 4B). Of note, we observed that infant faecal specimens sampled at 4-12 and 20-28 weeks were dominated by Lactobacillus, Streptococcus and Bifidobacterium and overlapped (Fig. 4A and B). Maternal faecal and infant meconium specimens collected at the time of delivery were distinct from each other and from infant faecal profiles at 4-12 and 20-28 weeks (Fig. 4A and B). Maternal faecal specimens primarily clustered around genera representing the class Clostridia. Infant meconium specimens clustered around Alpha-, Beta- and Gammaproteobacteria. We also investigated whether faecal bacterial profiles from the three infant age groups (birth, 4-12 weeks and 20-28 weeks of life), where complete longitudinal sample sets were available (n = 36), were significantly different (Supplementary Table S4; Supplementary Fig. S2).

The effect of maternal HIV status, mode of feeding and residential area on faecal bacterial profiles.
GLMs incorporating all covariates and full datasets from the groups under study showed that infant meconium bacterial profiles were influenced by mode of feeding (Supplementary Fig. S3). Meconium from infants […] (Supplementary Fig. S3; Fig. 5). Infants exclusively formula fed prior to discharge had significantly higher proportions of the family Enterobacteriaceae (p = 0.039), as well as the genera Enterococcus (p = 0.002) and Streptococcus (p = 0.045) (Fig. 5). GLMs showed that maternal faecal bacterial profiles were significantly associated with maternal HIV status, mode of feeding and residential area (Table 3). Faecal bacterial profiles of infants at 4-12 weeks were also significantly associated with maternal HIV status, feeding practices and residential area (Table 3). At 20-28 weeks, infant faecal bacterial profiles were associated with maternal HIV status and mode of feeding (Table 3). A detailed exploratory investigation of the variables maternal HIV status, mode of feeding and residential area showed an uneven distribution of data across the participants under study (Supplementary Table S1). Based on our refitted models, used to validate the effect of these three variables, we found that only maternal HIV status and mode of feeding had true significant effects on maternal faecal bacterial profiles (Table 3). Infant faecal bacterial profiles at 4-12 and 20-28 weeks were only influenced by maternal HIV status (Table 3).
Maternal faecal bacterial profiles were influenced by both their HIV status and feeding practices (Table 3; Supplementary Figs S4 and S5). Mothers who exclusively breastfed prior to discharge (77/86) also had higher faecal diversity compared to mothers who exclusively formula fed [H′ = 2.9 (IQR: 2.7-3.1) vs. H′ = 2.8 (IQR: 2.6-2.8); p = 0.034]. Infant faecal bacterial profiles at 4-12 weeks were significantly associated with maternal HIV status (Table 3; Supplementary Fig. S6). We found that HIV-exposed infants (19/72) had higher faecal bacterial diversity compared to HIV-unexposed infants […].
The effect of HIV-exposure on infant faecal bacterial profiles over time was assessed using an interaction-with-time model.
This model included complete datasets from the 36 infants with samples collected from birth until 20-28 weeks. We found that Proteobacteria proportions decreased with an increase in age among HIV-unexposed infants (Supplementary Fig. S8A). In contrast, Clostridia proportions increased with an increase in age among HIV-unexposed infants (Supplementary Fig. S8A). A less prominent effect of infant age on faecal bacterial profiles was observed among HIV-exposed infants (Supplementary Fig. S8B). Our interaction-with-time model also showed that infants exclusively formula fed had the highest faecal bacterial diversity measured during the first 28 […]
[…] Acinetobacter (p < 0.001) and Enterobacter (p = 0.03) when compared to mothers undergoing vaginal delivery (Supplementary Fig. S9). Higher proportions of Catenibacterium (p = 0.04), Coprococcus (p = 0.047), and Incertae Sedis (p = 0.01) were observed from mothers delivering via vaginal delivery. Fisher's exact tests for two-way tables showed that infant birth weight (p = 0.018), birth length (p = 0.008), as well as the period between delivery and sample collection (p < 0.001) were significantly associated with mode of delivery; however, none showed any direct relationship with maternal faecal bacterial profiles. We further tested whether other potential confounders, such as anaemia, hypertension and urinary tract infections recorded during pregnancy; anaesthetic use during labour; analgesic use during labour; oxytocin-like substance use during labour; the period between membrane rupture and delivery; as well as any medication supplementation during hospitalisation, were significantly associated with delivery mode. Of these, only anaesthetic use during labour was significantly associated with mode of delivery (p < 0.001).
[…] Bacilli (RR = 0.70; p = 0.030) from meconium, whilst Gammaproteobacteria were positively correlated with higher maternal BMI (RR = 1.09; p = 0.024). High maternal BMI was also associated with increased proportions of Veillonellaceae (RR = 1.46; p = 0.025) at 20-28 weeks of age. No correlations were observed when plotting Firmicutes/Bacteroidetes proportions against maternal BMI (Supplementary Fig. S10). Maternal education was associated with significant differences in the proportions of Gammaproteobacteria, Clostridia and Actinobacteria observed from maternal faecal specimens (Supplementary Fig. S11A). Maternal education also had a significant association with infant faecal bacterial profiles measured at 20-28 weeks of life (Supplementary Fig. S11B). Infant gender and birth weight were associated with faecal bacterial composition only at 20-28 weeks of life. Female infants had significantly higher proportions of the family Leuconostocaceae (p = 0.001) and the genus Weissella (p = 0.002). Infant birth weight was inversely associated with proportions of the families Leuconostocaceae (RR = 0.03; p < 0.001) and Ruminococcaceae (RR = 0.003; p < 0.001) as well as the genus Weissella (RR = 0.03; p = 0.012).

Discussion
This study is the first to investigate infant meconium bacterial profiles in an African setting. Infant meconium specimens contain high abundances of the phylum Proteobacteria 70,71. The high proportions of the phylum Proteobacteria in meconium specimens in our study are in agreement with the finding by Ardissone and colleagues 2, who reported high abundances of Proteobacteria in meconium from infants born at a gestational age of more than 33 weeks. Another interesting finding was the positive correlation between maternal BMI measures and the class Gammaproteobacteria from meconium. No differences have previously been shown between the meconium bacterial composition of infants born to overweight or obese (OWOB) mothers and those born to mothers with normal BMIs 72. In contrast, differences have been reported when assessing faecal bacterial profiles collected from older infants 72.
Infant faecal bacterial profiles have also been shown to be influenced by early life feeding practices 17-19. In our study, the effect of mode of feeding was already evident from meconium specimens. Meconium from infants exclusively formula fed had significantly higher proportions of the family Enterobacteriaceae compared to those exclusively breastfed. Feeding has also been reported to promote shifts from the highly variable infant-like GIT bacterial profile towards a more stable adult-like profile when solid foods are introduced 73 […] anaerobes; and an increase in the overall microbial diversity 20. Although we did not test the effect of solid food on infant faecal bacterial profiles, we did observe a slight increase in Bacteroidia and Clostridia amongst infants at 20-28 weeks of age.
In addition, we showed the importance of exploratory analysis in observational studies such as ours. Observational studies, as opposed to designed experiments, potentially allow for the investigation of uneven datasets and reports of spurious p-values as a result thereof. In this study, each statistically significant p-value was carefully investigated graphically in order to assess whether further validation of the results was needed. For example, one of our observations was that HIV-uninfected mothers primarily opted for exclusive breast- or mixed feeding practices whilst HIV-infected mothers tended to practice formula feeding. In our initial analysis, we found that both HIV-unexposed and breastfed infants had higher proportions of beneficial Bifidobacterium 76-78. Upon validation of our results, we concluded that maternal HIV status was the primary driver of this observation among infants in our cohort. Of note, in South Africa, HIV-infected mothers tend to formula feed 31 despite recommendations from the World Health Organization 32. This is relevant, given the important health benefits that breast milk provides, including for HIV-exposed children 29,30.
In this study, we also found low proportions of Bacteroidetes among our adult population, contrary to previous reports 79. Reduced proportions of Bacteroidetes in our cohort compared to other adult participants could be ascribed to the fact that we only studied faecal bacterial profiles from females. Two previous studies, one from the United States of America and one from four European locations (France, Germany, Italy, and Sweden), reported lower abundances of Bacteroidetes among adult females compared to males 80,81. An alternative potential explanation for the low Bacteroidetes proportions observed from our mothers may be their BMI, since the majority of mothers participating in our study were overweight or obese. Overweight/obesity has previously been associated with reduced levels of Bacteroidetes and increased levels of Firmicutes 82-84. The latter, however, was not observed in our study (Supplementary Fig. S10). In addition to the low proportions of Bacteroidetes observed from mothers under study, we noted that mode of delivery influenced maternal faecal bacterial profiles. The mechanism by which this occurs is unknown. We hypothesize that stress may be a potential contributor to changes in maternal faecal bacterial profiles during delivery. Studies have shown that stress may alter the maternal GIT 85 and vaginal microbiota 85,86. In addition, it has been reported that maternal prenatal stress may contribute to shifts in infant faecal bacterial profiles 87. We also considered the effect of the process of delivery, in particular anaesthetic use during delivery, on maternal faecal bacterial profiles. Although we found a significant association between mode of delivery and anaesthetic use, we could not establish whether the use of anaesthetics or some other aspect of Caesarean-section delivery caused these changes. This was primarily because anaesthetics were given to all mothers undergoing Caesarean-section delivery.
In support of the hypothesis that mode of delivery may modulate maternal faecal microbiota profiles, previous reports have shown some influence on maternal colostrum and breastmilk as well as vaginal bacterial profiles [88][89][90][91] . We further found that feeding practices had a significant effect on maternal faecal bacterial profiles. This is surprising given the short time-frame between the birth process and specimen collection from mothers. The bacterial entero-mammary pathway 92 could potentially support this finding due to the potential crosstalk between the maternal GIT and mammary glands. Nevertheless, further investigation is needed to support our finding.

Limitations
In this study, one of the main limitations was the drop in the number of infants assessed at follow-up time-points. Among the 107 infants studied at birth, only 36 were followed longitudinally up until 20-28 weeks. Missed samples were primarily due to the inability of infants to pass stool at the scheduled study visits. In addition, some infants could not attend the scheduled stool sample collection visit. Hence, significant findings from the reduced sample size of 36 assessed at 20-28 weeks of life may need to be interpreted with caution due to the variable nature of the faecal microbiota composition. However, the consistency of results across age groups in our analysis provides confidence that significant effects displayed by our data are indeed true effects. Another potential limitation of this study might be the sequencing depths obtained for faecal specimens under investigation, as previously highlighted by Ni and colleagues 93. However, we are confident that the sequencing depth obtained in our study was sufficient to answer our research questions (Supplementary Fig. S12) 93,94. We did not observe any effect of sequencing depth on the clustering profiles observed for the different groups under study (Supplementary Fig. S13). In contrast, we noted that template concentration had a bigger effect on reproducibility compared to sequencing depth (Supplementary Fig. S14).

Conclusion
The meconium from infants investigated in our study contained high proportions of the phylum Proteobacteria, in particular bacteria within the Enterobacteriaceae family. Infant meconium was distinct from maternal faecal bacterial profiles sampled at birth. In addition, infant faecal bacterial profiles changed during the first 28 weeks of life. Major determinants of infant meconium bacterial profiles were mode of feeding and maternal BMI. HIV-exposure, on the other hand, was an important contributor to the composition of infant faecal bacterial profiles at 4-12 weeks of life, with HIV-exposed infants having higher bacterial diversity and reduced proportions of Bifidobacteria. Maternal faecal bacterial profiles following delivery were also influenced by their HIV status, feeding practices, as well as mode of delivery. Further large longitudinal studies are needed to improve our understanding of the contribution of maternal HIV status and mode of delivery to infant and maternal faecal microbial profiles.
According to Ni and colleagues, a decrease in sequencing depth results in an increase in dissimilarities (beta diversities) between microbial community samples 93 . Multidimensional scaling 95 showed an effect of sequencing depth on dissimilarity indices (beta diversities) 54,55 between samples in our study. Samples with low sequencing depths (in red) had higher beta diversities compared to samples with higher sequencing depths (in green and blue) (Supplementary Fig. S13A). Although we observed an effect of sequencing depth on beta diversities, this did not have a large enough influence to impact on beta diversities calculated between different sampling groups ( Supplementary Fig. S13B). We observed distinct clusters between maternal faecal specimens; infant meconium specimens; and infant faecal specimens collected at 4-12 and 20-28 weeks of age, which corresponds to our log ratio biplot analysis (Fig. 4).
The subset of faecal specimens (n = 17) that were processed in duplicate confirms that our 16S rRNA sequencing technique was reproducible (adjusted R² = 0.98). However, we determined that sequencing data were less reproducible where lower template concentrations were used during library preparation (Supplementary Fig. S14A), as previously reported 96. Of note, template concentration seemed to have a more pronounced effect on reproducibility compared to sequencing depth (Supplementary Fig. S14B). Mock communities (HM-782D and HM-783D) (BEI Resources, ATCC, VA, USA) were sequenced successfully, with profiles similar to those previously reported when targeting the V4 hypervariable region of the 16S rRNA gene 97.