Preterm birth is one of the major determinants of neonatal mortality and morbidity. Prematurity is also well-recognised to carry long term consequences on health well into the infants’ adult life. Children born prematurely have higher rates of developing diabetes1,2, hypertensions3,4 and obesity5 as compared to those born at term. Large cohort study on the longitudinal maturation of full term infant gut microbiome showed three distinct phases of microbiome progression at first year, between second to third year and beyond three years of life6. Further, the rate of gut microbiome maturation accelerated after the cessation of breast-feeding6,7. Additionally, there have been various studies of patterns in gut microbiome of preterm infants in the last two decades, most reporting correlations between the gut microbiota and various factors during early life, such as gestational age8,9, breastmilk10,11,12 and antibiotic exposures11,13. Also, a number of studies have characterised the gut microbiome composition in infants subjected to different modes of delivery14, modes of feeding9, and geographical locations15,16. However, limited studies have advanced interpretations on the impact of degree of prematurity at birth and postnatal environment (for e.g. treatments received in NICU) in terms of longitudinal associated patterns, especially gut microbiota maturation after weaning, despite increasing evidence of the population dynamics of the human gut microbiome17. Tauchi et al.18 studied the longitudinal stool samples of preterm neonates from NICU and observed an age-dependent transition from Gram-positive cocci-dominated microbiota to Enterobacteriaceae and/or Bifidobacteriaceae-dominated microbiota. Understanding how host factors influence diversity is an important step towards unravelling the relationship between gut microbial composition and the host health conditions. The integration of metagenomes and metabolomes provided a systemic view on how gut microbial ecosystems operate. Given that host metabolism is strongly influenced by gut microbes through their collective activities and cooperative responses, the association of health status and gut microbial dynamics should therefore be viewed holistically.

Based on the National Obstetrics Registry report in 2015, preterm birth rate in Malaysia was recorded at 12.4%19. Specifically, preterm birth rates were highest among the indigenous population in Peninsular Malaysia, mothers at extreme ages, parity > 6, and mothers with preeclampsia, eclampsia and renal disease19. The complications of preterm birth and morbidity arise from immature organ systems that are incapable of supporting life in the extra-uterine environment. Despite the significant health burden of preterm birth, the consequences of impaired maturation of gut microbiota to the development of preterm infants to adult life remained poorly studied, especially in South East Asia. Further, there is a lack of information regarding the contribution of gut microbial composition to host metabolism, and the interplay between both in the growth of preterm infants. There is therefore a need to characterise the temporal dynamics of the human gut microbiome and metabolome to understand their roles in the preterm infants’ health and development. In this study, we compared the variations in the faecal microbial composition and metabolome between preterm and term infants at varying time points from birth to month 12. The pattern of gut bacterial assemblages and metabolic profiles were modelled using selected clinical characteristics while the relationship of microbial taxa and metabolites that differed significantly across different time points was assessed through sparse partial least squares correlation analysis.


Characteristics of the preterm and term infants

Of the 19 preterm infants studied, two were lost-to-follow-up due to early death. For comparison, a total of 20 healthy full-term infants born at a gestational age of more than 37 weeks were included in this study. Characteristics of the preterm and term infants are reported in Table 1. Term infants were mainly born by spontaneous vaginal delivery and no exposure to antibiotics while preterm infants were mainly born by caesarean delivery (94.7%, P < 0.0001) and exposed to antibiotics (57.9%, P < 0.0001). Clinical characteristics of the preterm infants during admission are reported in Table 2. Preterm infants’ stool samples collected during NICU admission were also subjected for bacterial isolation for multidrug resistant Enterobacteriaceae, as published in Yap et al.20. Antibiotic susceptibility profile of the isolates was correlated with patient clinical data for further statistical analyses.

Table 1 Demographics and characteristics of all infants studied.
Table 2 Clinical characteristics of the preterm infants studied.

Eleven preterm infants had respiratory distress syndrome (RDS) and five patients received invasive mechanical ventilation. Eleven patients received parental nutrition (PN) and three of them had peripherally inserted central catheter (PICC). Eleven patients were treated with antibiotics for three days to two week with various combinations of penicillin (8 patients), gentamicin (10 patients), vancomycin (4 patients) and meropenem (4 patients).

Stool samples were collected at: day 1 (meconium), week 1, week 2, month 6 and month 12 of life from all infants. However, some time points were missed for selected infants due to the lack of adherence to the sampling schedule. In total, we collected 141 samples, for an average of 3.05 samples per preterm infants and 4.15 samples per term infants.

Microbial composition comparison between term and preterm infants

The alpha diversities (Shannon’s, Inverse Simpson’s, Pielou’s evenness, Chao1 and Faith’s phylogenetic diversity indices) of the faecal microbial community between groups (term and preterm) and between term and preterm after birth, week 1, week 2, month6 and month 12 were calculated. No significant difference in bacterial community richness, evenness and phylogenetic diversity were observed between term and preterm samples (Supplementary Fig. 1). Out of the five indices, only Faith's phylogenetic diversity showed significant difference (P < 0.01) between term and preterm at meconium and month 6 (Supplementary Fig. 2). Separately, beta diversity was assessed using PERMANOVA and PLS-DA. Overall, significant differences in faecal bacterial composition between the term and preterm infants were detected in PERMANOVA (pseudo-F = 2.4834, P(perm) = 0.001). The first PLS-DA plot (Fig. 1) showed clear separation in faecal bacterial compositions between term and preterm infants. When time points were added as factor, it was observed that samples from week 2 were overlapped with meconium and week 1 (Fig. 2). For month 6 and month 12, however, both groups formed two tight clusters with little overlapped. Consistent observation was also obtained using pairwise PERMANOVA analysis (Supplementary Table 2).

Figure 1
figure 1

Faecal bacterial and metabolite composition of preterm and term control. PLS-DA models depicting inter-subject distances based on the (a) 16S bacterial composition (fivefold cross-validation, 2 components, overall error rate = 0.30) and (b) NMR metabolomics (fivefold cross-validation, 2 components, overall error rate = 0.29). Prevalence uncovers distinct clusters among the subject group. Eclipses for each group are calculated based on a 95% confidence level. PLS-DA plots were generated using R statistical software (version 3.5.0,

Figure 2
figure 2

PLS-DA models of (a) 16S bacterial composition and (b) NMR metabolomics. The models were built on inter-subject vs the five collection time points and colour coded by group with 95% confidence eclipses. Both validated with fivefold cross-validation. PLS-DA plots were generated using R statistical software (version 3.5.0,

The overall distributions of the phyla and genera were provided in Supplementary Fig. 3. Differentially abundance OTUs were identified based on negative binomial model. A total of 55 OTUs from 3 phyla (Proteobacteria, Firmicutes and Bacteroidetes) whose abundance differed between two groups across the five time points were identified. The species which matched highest sequence homology with the input were included in the Supplementary Table 2. When the comparison was made at phylum level, the dominant phyla for term infants were Firmicutes and Bacteroidetes; while enriched levels of Proteobacteria were observed in preterm infants’ stool at the first two weeks of life. At 12 months of life, significant abundance of Bacteroidetes (OTU0012, OTU0029, OTU0051 and OTU0110) were observed among the preterm group. Conversely, term infants were observed with abundant species of Proteobacteria at month 12. The preterm infants’ stool samples had significantly enriched Klebsiella OTUs (mainly K. pneumoniae) during the first two weeks of life while the level of Klebsiella OTUs started to elevate in term infant stools only after 6 months of life. It was observed that OTU0029 (Bacteroides fragilis) was elevated in week 1 and week 2 stools of term infants while the significant elevation only observed at month 12 for preterm.

Temporal differences in metabolomics profiles between term and preterm infants

Significant metabolites partitioning between both groups were established based on PLS-DA and PERMANOVA (P(perm) = 0.001, P(MC) = 0.003). PLS-DA plot inferred using metabolomics profiles showed less apparent separation in comparison to the microbial composition, although distinct clusters specific to term and preterm are still discernible (Fig. 1). When time points were added as factor, samples from meconium, week 1 and week 2 were loosely clustered especially among the preterm infants (Fig. 2). Consistent with bacterial composition, contrasting temporal response pertinent term vs preterm was also detected at month 6 (P(perm) = 0.005, P(MC) = 0.005). Nonetheless, both sample groups (i.e. term and preterm) from month 12 did not show significant difference with most of the data points overlapped with month 6. This observation was also reflected in the significant metabolites detected consistently in month 6 and month 12 in the term infants.

Statistical significance of the faecal metabolites was calculated using permutation test (number of permutations = 1000). Only metabolites with a P value of 0.01 and below detected in both groups were reported and summarised in Table 3. The corresponding covariance plots of preterm vs term derived from stools samples obtained from meconium, week 1, week 2, month 6 and month 12 of life were included in Fig. 3. Meconium collected from preterm infants showed elevated glycerol. At week 1, preterm group showed elevated faecal valine, leucine, isoleucine, tyrosine and phenylalanine, whereas α-glucose and methylmalonic acid (MMA) were elevated in the term group. However, no significant metabolic changes in preterm and term infants were observed at week 2. Changes in stool metabolites were more pronounced at month 6 with the preterm group showing higher levels of faecal succinate, citrate and trimethylamine-N-oxide (TMAO), whereas the term infants showed higher levels of faecal β-hydroxybutyric acid (BHBA), fucose and pyruvatoxime. Additionally, the latter consistently showed elevated levels of BHBA and fucose up to month 12, whereas the preterm infants exhibited higher levels of faecal tyrosine and phenylalanine at month 12 as well as week 1.

Table 3 List of 1H NMR-derived metabolites that differ significantly between preterm and term infants at different time-points.
Figure 3
figure 3

1H NMR spectroscopy showing the metabolite profiles of preterm and term controls. Covariance plots showing the colour-coded significance of stool metabolite profiles (red, P < 0.01) calculated using permutation test from (a) meconium, (b) week 1, (c) week 2, (d) month 6 and (e) month 12, were generated using MATLAB (version R2014a, Key: BHBA, β-hydroxybutyrate; Citr, citrate; Fuc, fucose; Glyc, glycine; Ile, isoleucine; Leu, leucine; MMA, methylmalonic acid; Phe, phenylalanine; Pyr, pyruvatoxime; Suc, succinate; Tau, taurine; TMAO, trimethylamine-N-oxide; Tyr, tyrosine; Val, valine.

Association between differentially expressed OTUs and significant metabolites

The relationship between differentially expressed OTUs and faecal metabolites were assessed using sparse partial least square model (sPLS-DA) (Fig. 4). In preterm, three Clostridium-related OTUs (OTU0035, OTU0090 and OTU0112) were positively correlated to the branched-chain amino acids (BCAAs): valine, leucine and isoleucine. In addition, OTU0035 (Erysipelatoclostridium ramosum) was negatively correlated to BHBA, fucose and pyruvatoxime while OTU0090 (Clostridium cocleatum) and OTU0112 (Clostridium spiroforme) were negatively correlated to pyruvatoxime and α-glucose. Veillonella seminalis (OTU0017) also showed correlations with the three BCAAs. Bacteroides fragilis (OTU0029) which was differentially expressed in term group at week 1 and week 2 and subsequently in preterm group at month 12, showed moderate to strong negative correlations to BHBA, fucose, MMA and pyruvatoxime. In term group, Lactobacillus mucosae (OTU0048) showed moderate to strong positive correlations with the three BCAAs and strong negative correlations with pyruvatoxime. OTU0050 which belongs to the genus Delftia, showed distinct clustering with negative correlations with the BCAAs and positive correlation with pyruvatoxime. OTU0050 was consistently significantly expressed in preterm group at week 1, month 6 and month 12, while term group, conversely, was enriched with the same OTU at week 2. Among the 10 OTUs with significant correlations, 50% of the taxa belonged to phylum Firmicutes while 20% of the taxa belonged to Proteobacteria. Importantly, the low abundant bacterial taxa present at relative abundance of less than 1% accounted for 65 total affecter correlations.

Figure 4
figure 4

Spares partial least squared correlations (sPLS) between differentially expressed operational taxonomic units (OTUs) and identified significant metabolites in (a) preterm and (b) term groups. sPLS in regression mode (predict Y from X) to model a causal relationship between the OTUs and metabolites. Columns correspond to faecal metabolites; rows correspond to bacterial taxa. Red and blue denote positive and negative association, respectively. Percentage of mean proportion and bacterial phyla are summarised by the colour code on the right. Only the strongest pairwise associations were displayed, with a cut-off threshold of 0.4 (positive and negative). Heatmap plot was generated using R mixOmics package (version 3.5.0, Key: a.Glu, α-glucose; BHBA, β-hydroxybutyric acid; Fuc, fucose; Ile, isoleucine; Leu, leucine; MMA, methylmalonic acid; Pyr, pyruvatoxime; Val, valine.

Correlations between omics data with demographic and clinical parameters

Distance based linear modelling was carried out to identify the demographic and clinical predictors (Supplementary Data 1) for the differentially expressed 16S rRNA metagenomic and NMR metabolomic profiles. We modelled the faecal microbial and metabolic composition by splitting the data into each respective time point. Birth weight was consistently selected as the best explanatory variable for the total variation in the metabolic profiles all infants at birth and at month 12 (Table 4). On the other hand, gestational age was selected to explain the elevated 16S rRNA metagenomics profile for all infants at month 12. When the data was analysed by considering clinical parameters, the step-wise selection algorithm selected “PICC line insertion” and “isolation of bacteria resistant to third generation cephalosporins” as the best explanatory parameters for the faecal metabolic composition of the preterm group at month 6 and month 12 respectively.

Table 4 Cross-modelling of the stool microbial and metabolites profiles using a stepwise selection procedure.


In comparison to preterm infants, full term infants have completed in utero microbial and organ development. The relatively more matured organs provided distinct ecological landscape for the colonisation and establishment of the gut microbiota21. However, we showed that the difference in alpha diversities between term and preterm infants is generally unremarkable. This lack of difference in bacterial alpha diversities was also reported previously between moderate-late premature infants to term infants9. Nonetheless, significant differences in phylogenetic diversity between groups were observed at meconium and month 6, but not month 12, suggesting that the microbiome diversity of preterm may move towards similar level as term over time.

High Firmicutes:Bacteriodetes ratio of the gut microbes has been previously associated with a “healthier” gut. However, recent publications suggested that such link is over-simplified17. Owing to the diverse functional capability of each phylum, variation in “beneficial gut microbes” might not induce uniform phylum-wide metabolic responses. A change of associated external environment such as diet, infancy infections, living conditions are known to impact the composition of infant gut microbiome22. Although similar developmental patterns with respect to age and gut maturation were observed in various studies, the gut microbiota of healthy infants in a global comparison study exhibited different phylotypes. As reported by Kuang et al., majority of the gut microbiota of infants from western countries belonged to “A-type” (abundant in Actinobacteria), while most infants from Bangladesh were “F-type” (abundant in Firmicutes). In the same study, all healthy Chinese infants (age < 3 months) from China were “P-type” (abundant in Proteobacteria), with Escherichia/ Shigella and Klebsiella being the dominant genera15. Consistently, our data demonstrates that gut microbiota of all infants were dominated by Proteobacteria which gradually decreased thereafter, in line with the previous findings in which the abundance of Proteobacteria (mainly Enterobacteriaceae) gradually decreased from birth up to 24 months of life23. It was reported that geographical location exerted a stronger influence to the compositional differences in gut microbiome than other factors24. However, most major studies of early-life gut microbiome focused more on western cohorts6,7 and may not adequately represent the gut microbiota of the South East Asians which are ethnically diversified. Hence regional variations should be considered during data interpretation for the association of gut microbiota and infant health.

Throughout the years, human gut microbiome has been explored to provide better understanding of health and diseases. However, the accuracy of data reporting the abundance of bacterial gut microbiota is largely dependent on the primers and selected hypervariable regions25. Among the primers used in neonatal faecal 16S rRNA studies, the primers developed by Klindworth et al.26 was found to be able to amplify Bifidobacterium spp. compared to V4–V5 primers when the species is present in low abundance (< 0.05%)27. In a meta-analysis of full-term intestinal microbiota, 1035 publicly available datasets from 15 studies involving primer regions of V1–V2, V3, V4, V3–V4, and V4–V5 were assessed by using the web-tool TestPrime 1.028. In general, all the primers studied produced comparable results. More specifically, the primers developed by Klindworth et al.26 showed an in silico efficiency of > 95% in amplifying the targeted Bifidobacterium, the important commensals of the infant gut microbiome28. In addition, Mazzola et al.29 have also reported a comparable performance of V3–V4 sequencing and qPCR in determining the abundance of Bifidobacterium. The bacterial membership in the stool collected in week 1 and week 2 displayed relatively higher consistency across individuals as compared to the meconium. The homogenisation effect observed might be a natural response of the gut microbiome towards the changes from womb to birth in which a smaller subset of available microbial populations was preferentially selected. It is notable that the divergence in preterm and term microbiomes was strongest at month 12. Previous study observed that the difference in abundance of Klebsiella spp. could last for more than 4 months, long after the initial neonatal period and it was correlated with later development of paediatric allergy30. Differentially abundance of potentially pathogenic Klebsiella OTUs observed among the term infants and their roles in the host immunological response and metabolism remained to be elucidated. The dichotomous development of the microbiome may be due to the available of the different species pools during birth (for e.g. natural birth vs. caesarean). The variation in physical development further resulted in the shift in gut microbial composition. These observations were consistent with the ecological processes such as “historical contingency” and “niche specialisation” 31,32.

In comparison to gut microbial composition, the metabolic profiles were relatively less sensitive to the infant development in terms of sampling time points. Temporal stability of the gut metabolome can be related to the concept of high redundancy of metabolic pathways across different microbial species33,34. Our finding is also consistent with previous literature suggesting that human metabolic profiles are more conserved than gut microbiome34. Elevated serum MMA and stool glucose have been associated with vitamin B1235 and carbohydrate malabsorption36 in children respectively. In this study, MMA and α-glucose were found to be significantly increased in week 1 stools of the term group. The sPLS-DA analysis showed that the elevation of stool MMA is negatively correlated with a Parabacteroides OTU (OTU0051) and two Bacteroides OTUs (OTU0012 and OTU0029). These four OTUs were differentially abundant in the stool samples of term group at week 1 and week 2, and only detected in preterm group at month 12. On the other hand, Prevotella OTUs were exclusively differentially expressed in preterm group at week 1, week 2 and month 6. The abundances of Ruminococcaceae, Bacteroides, and Prevotella had been proposed as enterotype identifiers which explained the most taxonomic variation across individuals37. Shift in Prevotella-Bacteroides microbial taxa has not been previously reported in infants.

The functional output of the gut microbiota, including amino acids and short-chain fatty acids (SCFAs) are thought to be important modulators underlying the pathogenesis of metabolic diseases such as obesity38, insulin resistance and type 2 diabetes mellitus39. Metabolites identified in the preterm group at week 1 include BCAAs such as valine, leucine and isoleucine, and aromatic amino acids including tyrosine and phenylalanine. High concentrations of neonatal faecal excretion of BCAAs and other amino acids metabolism derived metabolites indicate reduced protein absorption or excess protein intake40. Protein catabolism and absorption generally take place in the small intestine. During this process, amino acids that are not re-absorbed will be transferred to colon. We detected a positive correlation between BCAAs with three OTUs of Erysipelatoclostridium (former Clostridium spp.) and Lactobacillus (OTU0048). The former has been reported to be prevalent among the type 2 diabetes mellitus patients41 while Clostridium ramosum was detected in high-fat fed mice42. It was postulated that amino acids can serve as precursors for the synthesis of SCFAs by intestinal microbes to deliver additional energy to the host38,43. Thus, the elevated Erysipelatoclostridium (Clostridium) OTUs in preterm infants may play an important role in the energy metabolism that fuel rapid physical development. Succinate is a key intermediate in the tricarboxylic acid (TCA) cycle in host metabolism. The major producers of succinate in the mammalian gut are Prevotella spp.44. Although significantly higher abundance of Prevotella spp. in preterm as compared to term infants was detected, high succinate concentration was only observed in the stool samples collected at month 6. The lack of correlation might be due to the fact that succinate is only transiently available as it is rapidly converted as an intermediate in the production of propionate by the succinate-utilising bacteria45. Further, the high expression of succinate may also due to the extended administration of antibiotic among the preterm infants46,47. Another TCA cycle intermediate identified in preterm at month 6 is citrate, possibly indicating incomplete fermentation of complex polysaccharides in the gut of preterm. Additionally, the elevated TMAO identified in the preterm group at month 6 might be a response of weaning as the metabolite has been associated with animal proteins such as eggs, red meat and dairy48. TMAO is commonly identified in urine, but it could also be eliminated through faeces, although the levels of TMAO in faeces have not been carefully quantified48. In comparison with adults, children, especially infants have lower expression of TMAO precursors, thus it is less metabolised but the gut microbes49.

At 6 months of life, increased milk metabolites such as fucose and BHBA, as well as energy–related intermediate, pyruvatoxime, were observed in the term infants. Emerging evidence suggests the presence of fucose in mammalian gut could improve host health by supporting the competitive growth of beneficial members of the gut community50 and interact directly with host immune cells51. Studies have demonstrated the ability of Bacteroides fragilis to incorporate fucose into their own glycans for the production of fucosylated glycoproteins, which is necessary for its fitness to colonise the mammalian intestine52,53,54. Our finding supports this hypothesis in which B. fragilis was strongly and negatively associated with fucose. Fletcher et al.52 suggested that this general O-glycosylation system is conserved among the intestinal Bacteroides species. However, contrasting observations were noted in another study, in which Bacteroides thetaiotaomicron produces multiple fucosidases that cleave fucose from host glycans and contributes to the enterohaemorrhagic E. coli (EHEC) virulence55. This suggests that the beneficial or detrimental role of fucose in human gut is determined by the microbes which are responsible for its uptake. Besides being identified abundantly in milk, BHBA is long viewed as a ketone body that serves as a circulating energy source by the brain when blood glucose is low in times of fasting or prolonged exercise56. In this study, majority of the taxa with significant correlations in preterm group belonged to phylum Firmicutes and Bacteroidetes. Proteobacteria, despite being the most dominant phyla, might be transiently available in the neonatal gut23. Thus, when comparison was made, taxa that distinguish between term and preterm group were Firmicutes and Bacteroidetes.

One of the caveats of this study is the lack of detailed dietary data as the duration of breastfeeding, introduction to solid foods and cessation of milk feeding may influence the gut microbiota and subsequently the metabolites levels in the stool57. Nevertheless, feeding regimen of the preterm neonates during NICU admission was taken into account but none explained their stool microbial and metabolites patterns. Based on the DISTLM analysis of the overall elevated metabolites profile, birth weight was found to be the best explanatory variable at birth and at month 12 of life. In comparison, when the model was constructed by selecting the bacterial taxa that differed significantly in all infants, gestational age was associated with the gut microbial profiles at month 12. This is congruent with findings from one of the longest longitudinal preterm neonatal gut microbiota study in which gestational age continues to impact the microbial composition up to four years of age58. Similarly, La Rosa et al.11 also reported that the patterned progression of the gut bacterial community of preterm infants is most strongly influenced by gestational age at birth, than identifiable exogenous factors such as mode of delivery, antibiotic administration, or feeds11. When the DISTLM analysis was repeated on the differentially expressed metabolites of preterm infants with clinical variables during NICU admission, “PICC line insertion” and “isolation of bacteria resistant to third generation cephalosporins” were selected as the best explanatory variable at month 6 and month 12 respectively. Total parenteral nutrition (TPN) can be administered through umbilical catheters or PICCs, the latter should be used preferentially when prolonged PN is anticipated to prevent fewer insertion attempts and extravasation injuries to the neonates59. Data regarding changes in the microbiome and metabolome of humans receiving PICC insertion have been limited, but longitudinal alterations in the gut microbial colonisation patterns of infants who received TPN has been reported60. Nonetheless, blood microbiome signatures in neonatal PICCs associated with bloodstream infections were detected61. Although the aetiology for these observations on the neonatal microbiome is not clearly understood, we ascertain that PICC line insertion may exert a prolonged effect on the neonatal gut metabolism.

The metabolomic profiles showed that preterm infants harboured significantly more carbohydrate metabolism intermediates (citrate, succinate), BCAAs (leucine, isoleucine, valine) and aromatic amino acids (tyrosine, phenylalanine) in the stools through the first year of life as opposed to the term infants. Such differences might imply impaired sugar metabolism among the preterm infants. The metabolism of sugar is an important pathway for energy harvesting and essential of physical and mental growth in children62. In order to generate greater understanding in disease pathogenesis, there is a pressing need to extend the temporal coverage of infant gut microbial research. Despite of our modest sample size, this is the first targeted report on Malaysian preterm infant gut microbiome. Our study contributed to the understanding of the ethnically diverse South East Asian gut microbiota. In conclusion, we suggest that the early-life characteristics and environmental perturbations may exert a sustained effect of at least one year of life in the gut microbial composition and metabolism of the preterm neonates who were admitted to the NICU. Thus, our study highlighted the opportunities for large-scale studies to further investigate the association between early-life factors and a sustained microbiota imprint leading to establishment on methods of therapeutic intervention which could potentially lower the risk of developing metabolic dysfunction in later life.


Sample collection

Written informed consent was obtained from the infants’ parents and investigations were conducted according to the principles approved by the University of Malaya Research Ethics Committee (UMREC) with ethical approval number 201310–0267. The study was performed in accordance with the UMREC guidelines. Preterm infants were recruited among patients admitted to the neonatal intensive care unit (NICU) at University Malaya Medical Centre (Kuala Lumpur, Malaysia) from June 2014 to December 2014. Enrolled preterm infants were born at a gestational age (GA) of < 37 weeks. Clinical data and stool samples from diapers of preterm neonates were collected. Term neonates at a GA > 37 weeks and with no congenital malformations and metabolic diseases were recruited in this study as a comparative group. Stool samples were collected across five time points: at birth, week 1, week 2, month 6, and month 12, and same samples were subjected to both 16S rRNA sequencing and metabolomic profiling. Samples were immediately stored at − 20 °C and then transferred to − 80 °C no more than 3 days post-collection.

DNA extraction, 16S rRNA sequencing and data processing

Nucleic acid extraction of stool was carried out on 220 mg of faecal sample using the QIAamp DNA Stool Mini Kit (Qiagen, Hilden, Germany) in accordance with the manufacturer’s instructions. PCR amplification of the 16S rRNA V3 and V4 regions was performed using primers V3-V4F (5′- TCG TCG GCA GCG TCA GAT GTG TAT AAG AGA CAG CCT ACG GGN GGC WGC AG -3′) and V3-V4R (5′- GTC TCG TGG GCT CGG AGA TGT GTA TAA GAG ACA GGA CTA CHV GGG TAT CTA ATC C -3′)26. The set of primers were carefully selected to minimise bias due to the mismatch of primers to amplify early-life microbial 16S rRNA genes (Supplementary Data 2). The reaction was performed with 1 µM each primer, 2×KAPA HiFi HotStart Ready Mix, 5 ng/µL of gDNA in 10 mM Tris pH 8.5 made up to 25 µL of reaction mixture. The thermoprofile used was as followed: initial denaturation for 3 min at 95 °C followed by 25 cycles of 95 °C for 30 s, 55 °C for 30 s and 72 °C for 30 s with a final extension at 72 °C for 5 min and hold at 4 °C. The PCR amplicons were purified by using Agencourt AMPure XP beads (Beckman Coulter) and the resultant DNA was quantified using a Qubit 3.0 Fluorometer (Life Technologies, Carlsbad, CA, United States) and Illumina sequencing adapters were attached to each sample using Nextera XT Index Kit. Amplicon fragment size was checked with a Bioanalyzer DNA 1000 chip and the DNA library qualities were evaluated with using the library quantification kit (KapaBiosystems) on an M×3000P qPCR system (Agilent Technologies). Following an initial library quantification using the MiSeq V2 reagent kit (Illumina) on the MiSeq system, the libraries were normalised and pooled at equimolar concentration with 15% PhiX spike-in. The final library was then subjected to paired-end 2 × 250 bp sequencing on a HiSeq 2500 platform using HiSeq rapid SBS kit v2 (Illumina).

Raw reads generated from Illumina paired-end sequencing were imported into Mothur software (version 1.39)63 for quality filtering and processing, according to the MiSeq SOP ( Briefly, 99,122,189 paired-end sequences were joined into contigs. Contigs with ambiguous bases, homopolymer > 6 bp and sequence length below/ exceed the cut-off (< 460 bp and > 466 bp) were trimmed. Approximately 40% of the sequences were removed based on the filter and a total of 59,308,069 sequences were used for subsequent procedures. Chimeric sequences detected using Uchime command were removed before clustering the sequences into operational taxonomic units (OTUs). The taxonomic assignment was carried out in reference to SILVA database. Sequences affiliated with chloroplast, mitochondria, unknown, archaea, and eukaryote linages were further removed. Sequence for each OTU was blasted using NCBI database to identify the microorganisms at species level. The data was rarefied to equal depth for ease of comparison. Alpha diversity metrics including Shannon’s, Inverse Simpson’s, Pielou’s evenness, Chao1 and Faith’s phylogenetic diversity indices were calculated and projected in box plots using R microbiomeSeq package ( In additional, bar charts were constructed using phyloseq package in R64 to display the proportional differences in genus and phylum across groups. The beta-diversity was evaluated using statistical ordinations: Partial least squares discriminant analysis (PLS-DA) and Permutational Multivariate Analysis of Variance (PERMANOVA) from a Bray–Curtis distance matrix of normalised OTU counts. PLS-DA models were built by using the R mixOmics package65. PERMANOVA testing the inter-subject significance with time points was performed by using PRIMER + PERMANOVA software (version 7, PRIMER-E Ltd., Ivybridge, UK). Further, taxa showing significant differences in abundance between term and preterm infants at the respective time points were identified using negative log binomial model implemented in DESeq2 R package66. Correction of multiple corrections was conducted using Benjamini-Hochberg (BH) procedure implemented in DESeq2. Differentially expressed phylum genus and OTUs were selected by using BH adjusted P-value cut off of 0.01.

Metabolomic analysis through nuclear magnetic resonance (NMR)

The faecal metabolite extraction was adapted from faecal metabolite extraction protocol as previously described by Yap et al.67. For each sample, a total of 0.05 g of faecal matter was homogenised with 1 mL of phosphate buffer (90% D2O,1 mM 3-trimethylsilyl-1-[2,2,3,3-2H4] propionate (TSP) and 3 mM sodium azide; pH 7.4). The homogenates were sonicated for 30 min in room temperature using a water bath sonicator followed by centrifugation at 13,000 rpm for 10 min. A total of 600 µL of supernatant was transferred to a 5 mm (outer diameter) NMR tube (Norell, USA) for NMR analysis.

A standard 1-dimensional (1D) 1H-NMR spectrum was acquired for each sample with a pulse [recycle delay (RD)–90°–t1–90°–tm–90°–acquire free induction decay (FID)] on a Bruker 600 MHz spectrometer (Bruker Biospin, Fallenden, Switzerland) with a 5 mm BB(F)O broadband probe operating at 600.13 MHz (ambient probe temperature 27 °C). The spectra were acquired according to parameters used previously68. The field frequency was locked on D2O solvent and water peak suppression was performed by gradient water pre-saturation during RD of 4 s and a mixing time (tm) of 0.01 s. The 90° pulse length was adjusted to ~ 10 μs and an acquisition time of 2.65 s was used.

Phasing and baseline correction of the 1H-NMR spectra were performed manually using Bruker TopSpin (version 4.0.5, Bruker Biospin, Fallenden, Switzerland). TSP was added in the samples as an internal reference peak (0.0 ppm) and the phosphate buffer (pH 7.4) was added to the samples to minimise peak shift due to residual pH changes. All spectra were referenced to the TSP resonance at δ 0.00. The spectra were digitised into 10 k data points using an in-house developed MATLAB (version R2014a, Natwick, USA) script67. The regions containing water resonances (δ 4.5–6.5) were removed from each spectrum to eliminate distortion effects of water peak on the baseline. Additionally, the regions, δ 0.0–0.5 and δ 9.2–10.0 in the stool water spectra which contain only noise were removed. Normalisation to the total sum of residual spectrum and data scaling to unit variance were also carried out prior to pattern recognition analysis.

The covariance plot was used to aid interpretation of the significance of each metabolite from the permutation tests. The significance and validity of statistical differences were calculated using permutation test (number of permutations = 1000). The colours projected onto the spectrum indicate significance of the metabolites with the blue indicating no significance difference at P ≥ 0.05 confidence levels. To minimise false positive results, a conservative cut-off of P < 0.01 (red) were used to select differentially abundant metabolites. All NMR metabolites were validated by 2D-NMR spectroscopy in accordance with in-house standards and database (O. Cloarec, Imperial College London).

Integrated analysis of microbiome and metabolomics datasets

The relative abundance of the dominant bacterial taxa from 16S rRNA gene sequencing and the concentration of selected NMR metabolites were integrated using sparse partial least squares regression (sPLS) implemented under R mixOmics package65. Successive samples collected from the same individual were treated as independent sample considering that the biological variations between subjects are independent where different conditions are applied on the same subject during repeated measurements69. Heat map chart was constructed to observe the association of metabolomics and metagenomics features.

Distance-based linear modelling (DISTLM)

DISTLM was performed to identify the association between the clinical and demographic parameters with microbial assemblage pattern and changes in metabolic profiles68. In brief, the clinical and demographic parameters were selected and fitted to the overall changes in gut microbial composition and metabolic profiles using stepwise regression under the second-order bias-corrected Akaike Information Criterion (AIC).