Bifidobacterium species associated with breastfeeding produce aromatic lactic acids in the infant gut

Breastfeeding profoundly shapes the infant gut microbiota, which is critical for early life immune development, and the gut microbiota can impact host physiology in various ways, such as through the production of metabolites. However, few breastmilk-dependent microbial metabolites mediating host–microbiota interactions are currently known. Here, we demonstrate that breastmilk-promoted Bifidobacterium species convert aromatic amino acids (tryptophan, phenylalanine and tyrosine) into their respective aromatic lactic acids (indolelactic acid, phenyllactic acid and 4-hydroxyphenyllactic acid) via a previously unrecognized aromatic lactate dehydrogenase (ALDH). The ability of Bifidobacterium species to convert aromatic amino acids to their lactic acid derivatives was confirmed using monocolonized mice. Longitudinal profiling of the faecal microbiota composition and metabolome of Danish infants (n = 25), from birth until 6 months of age, showed that faecal concentrations of aromatic lactic acids are correlated positively with the abundance of human milk oligosaccharide-degrading Bifidobacterium species containing the ALDH, including Bifidobacterium longum, B. breve and B. bifidum. We further demonstrate that faecal concentrations of Bifidobacterium-derived indolelactic acid are associated with the capacity of these samples to activate in vitro the aryl hydrocarbon receptor (AhR), a receptor important for controlling intestinal homoeostasis and immune responses. Finally, we show that indolelactic acid modulates ex vivo immune responses of human CD4+ T cells and monocytes in a dose-dependent manner by acting as an agonist of both the AhR and hydroxycarboxylic acid receptor 3 (HCA3). Our findings reveal that breastmilk-promoted Bifidobacterium species produce aromatic lactic acids in the gut of infants and suggest that these microbial metabolites may impact immune function in early life.

Human breastmilk is a well-adapted nutritional supply for the infant 1 . Breastfeeding provides the infant with important short-term protection against infections and may also provide long-term metabolic and immunological benefits [1][2][3] . These benefits may partly be mediated through the gut microbiota, since breastfeeding is the strongest determinant of gut microbiota composition and function during infancy [4][5][6] . Human breastmilk contains human milk oligosaccharides (HMOs), which are complex, highly abundant sugars serving as substrates for specific microbes including certain species of Bifidobacterium 7 . This co-evolution between bifidobacteria and the host, mediated by HMOs, to a large extent directs the colonization of the gut in early life, which has critical impact on the immune system 8 . Depletion of specific microbes, including Bifidobacterium, in early life has been associated with increased risk of allergy and asthma development in childhood 9,10 and has been suggested to compromise immune function and lead to increased susceptibility to infectious disease 11,12 . Despite Bifidobacterium dominating the gut of breastfed infants and being widely acknowledged as beneficial, mechanistic insights into the contribution of these bacteria and their metabolites to immune function and development are limited and have mainly focused on short-chain fatty acids 12,13 . Recent studies show that microbial aromatic amino acid metabolites including tryptophan-derived indoles 14 via activation of the aryl hydrocarbon receptor (AhR) 15,16 can fortify the intestinal barrier 17,18 , protect against pathogenic infections 15,19 and influence host metabolism 17,20,21 , making this group of microbial metabolites of particular interest in the context of early life.
Here, we show that breastmilk-promoted Bifidobacterium species, via a previously unrecognized aromatic lactate dehydrogenase (ALDH), produce aromatic lactic acids including indolelactic acid (ILA) in substantial amounts in the infant gut and that faecal concentrations of this metabolite are correlated with the capacity of infant faeces to activate AhR. We furthermore demonstrate that ILA, via AhR- and hydroxycarboxylic acid receptor 3 (HCA3)-dependent pathways, impacts immune functions ex vivo, suggesting that breastmilk-promoted Bifidobacterium, via production of aromatic lactic acids, impact the immune system in early life.

Results
Bifidobacterium associate with aromatic lactic acids in the infant gut. To explore interactions between breastfeeding status, gut microbial composition and metabolism of aromatic amino acids in early life, we used 16S ribosomal RNA amplicon sequencing to infer gut microbiota composition and a targeted ultraperformance liquid chromatography-mass spectrometry (UPLC-MS) metabolomics approach to quantify 19 aromatic amino acids and derivatives thereof (Supplementary Tables 1 and 2) in faecal samples from 59 healthy Danish infants from the SKOT I cohort 22 . The SKOT I infants included were born full term, 9.1 ± 0.3 (mean ± s.d.) months of age at sampling and 40.7% were still partially breastfed (Supplementary Data 1a,b). After stratification of the 9-month-old infants on the basis of breastfeeding status (partially breastfed versus weaned), principal coordinates analysis (PCoA) of weighted UniFrac distances showed a significant separation across the first PC-axis (r 2 = 0.093, P < 0.001, Adonis test; Fig. 1a), which mirrored an increasing gradient in relative abundance of Bifidobacterium (r 2 = 0.397, P < 0.001, Adonis test; Fig. 1b). Other metadata (age, gender, mode of delivery, current formula intake and age of introduction to solid foods) did not explain gut microbiota variation to the same degree as breastfeeding status (r 2 < 0.05, P > 0.03, Adonis tests; Supplementary Data 1c) and no bacterial genera differed significantly according to these parameters (FDR-corrected P > 0.1, Mann-Whitney U-tests; Supplementary Data 1d). Principal component analysis (PCA) of faecal aromatic amino acid metabolite concentrations (Supplementary Data 1e) also suggested a minor separation by breastfeeding status, which was largely driven by three aromatic lactic acids, 4-hydroxyphenyllactic acid (4-OH-PLA), phenyllactic acid (PLA) and indolelactic acid (ILA) (Fig. 1c). 
Correlation analysis revealed that Bifidobacterium, but no other bacterial genera, were significantly associated with faecal concentrations of all three aromatic lactic acids (4-OH-PLA, PLA and ILA), in addition to indolealdehyde (IAld) (Fig. 1d and Supplementary Data 1f). The Bifidobacterium species (Extended Data Fig. 1a and Supplementary Data 1g) enriched in the breastfed infants, B. longum, B. bifidum and B. breve, were positively associated with the faecal concentrations of aromatic lactic acids (4-OH-PLA, PLA and ILA) and IAld (cluster 1 in Fig. 1e) but negatively associated with the faecal concentrations of aromatic propionic acids, aromatic amino acids and, to a lesser degree, with aromatic acetic acids (cluster 2 in Fig. 1e). In contrast, postweaning type Bifidobacterium species, including B. adolescentis, B. animalis/pseudolongum and B. catenulatum group 23,24 , were not significantly associated with aromatic lactic acids or breastfeeding status (Fig. 1e). These associations were in agreement with the observation that the concentrations of the three aromatic lactic acids were higher in the faeces of breastfed than weaned infants (Extended Data Fig. 1b). Furthermore, the abundances of the three aromatic lactic acids in infant urine showed similar positive associations with relative abundances of breastmilk-promoted Bifidobacterium species (Extended Data Fig. 1c). In addition, faecal and urinary levels of ILA were positively correlated (ρ = 0.68, P < 0.0001), showing that faecal levels of this metabolite are reflected systemically. Consistently, urine abundance of ILA, but not of PLA and 4-OH-PLA, was significantly higher in breastfed compared to weaned infants (Extended Data Fig. 1b). Together, this suggests that specific Bifidobacterium species produce aromatic lactic acids in the infant gut (Fig. 1f).

Bifidobacterium species produce aromatic lactic acids in vitro.
To confirm the ability of Bifidobacterium species detected in infants to produce aromatic lactic acids, Bifidobacterium type strains were grown anaerobically in a medium containing all three aromatic amino acids with either glucose or HMOs as sole carbohydrate sources. Analyses of culture supernatants revealed that ILA, PLA and 4-OH-PLA were produced mainly by B. bifidum, B. breve, B. longum ssp. longum, B. longum ssp. infantis and B. scardovii (Fig. 2a), in accordance with the associations observed in the 9-month-old infants (Fig. 1e). Other Bifidobacterium species, namely B. adolescentis, B. animalis ssp. lactis, B. animalis ssp. animalis, B. dentium, B. catenulatum, B. pseudocatenulatum and B. pseudolongum ssp. pseudolongum produced only low amounts of these metabolites (Fig. 2a). The ability of Bifidobacterium species to produce high levels of the aromatic lactic acids was generally convergent with the ability to use HMOs as a carbohydrate source (Fig. 2a), suggesting a link between breastmilk-promoted bifidobacteria and production of aromatic lactic acids. None of the downstream products of the aromatic lactic acids (Fig. 1f) was detected in any of the culture supernatants.
Identification of a responsible ALDH. Since it has been reported that a lactate dehydrogenase (LDH) in Lactobacillus species can convert phenylpyruvic acid to PLA 25 , we hypothesized that a corresponding enzyme would be present in Bifidobacterium species. Alignment and phylogenetic analysis of all genes annotated as ldh in the Bifidobacterium type strains included in this study, revealed four clusters (Fig. 2b). Whereas all Bifidobacterium genomes contain the ldh (here designated as type 1 ldh) responsible for conversion of pyruvic acid to lactic acid in the bifidobacterial fructose 6-phosphate shunt 26,27 , some species have an extra ldh, here designated as type 2, type 3 and type 4, respectively. In agreement with the in vitro fermentations (Fig. 2a), all prominent aromatic lactic acid-producing Bifidobacterium species contain the type 4 ldh, suggesting that this could encode a previously unrecognized ALDH. A further analysis of all available whole-genome sequenced Bifidobacterium strains showed that the type 4 ldh is universally present in B. longum, B. bifidum, B. breve and B. scardovii strains (Supplementary Table 3). Interestingly, genomic analysis of the Bifidobacterium type strains revealed that the type 4 ldh gene is part of a genetic element containing an amino acid transaminase gene (suspected to be responsible for converting the aromatic amino acids into aromatic pyruvic acids) and a haloacid dehalogenase gene (of unknown importance) ( Supplementary Fig. 4), which has been indicated to constitute an operon in B. breve 28 . Cloning of the type 4 ldh gene from the type strain of B. longum ssp. infantis (DSM20088) into a vector transformed into Escherichia coli revealed that the expression of the type 4 ldh gene indeed resulted in the appearance of PLA, 4-OH-PLA and ILA in the culture supernatant (Fig. 2c). 
To verify the type 4 ldh-dependent production of aromatic lactic acids in Bifidobacterium species, we generated a type 4 ldh insertional mutant strain by homologous recombination in B. longum ssp. longum 105-A ( Supplementary Fig. 5), a genetically tractable strain 29,30 containing the type 4 ldh (Supplementary Fig. 6a). The type 4 LDH amino acid sequence of the 105-A strain had >98% identity to the homologues in type strains of B. longum ssp. longum and B. longum ssp. infantis and >91% identity to B. bifidum, B. breve and B. scardovii ( Supplementary Fig. 6b) but no non-bifidobacterial homologues were found by BLAST analysis (amino acid sequence identity cutoff 60%). Cultivation of the wild type (WT), the type 4 ldh mutant strain and a complemented type 4 ldh mutant strain in a medium containing the three aromatic amino acids confirmed that type 4 ldh disruption did not impair growth in a rich medium (Fig. 2d). ILA, PLA and 4-OH-PLA accumulated in the supernatant of the WT and of the complemented type 4 ldh mutant strains but not in the type 4 ldh mutant (Fig. 2e). Importantly, the type 4 ldh mutant was not significantly compromised in its ability to convert pyruvic acid to lactic acid (Fig. 2e), supporting the distinct role of type 4 ldh in converting aromatic pyruvic acids. Further, to demonstrate in vivo production of the indicated aromatic lactic acids, we monocolonized germ-free mice with either the WT or the type 4 ldh mutant strain and found a 20-60-fold increase in their concentrations in WT versus type 4 ldh mutant monocolonized mice (Extended Data Fig. 2). Purification and characterization of the recombinant type 4 LDH enzyme revealed that it had a mass of 33.9 kDa ( Supplementary Fig. 7a), while the native molecular mass was estimated to be 71.9 kDa by size exclusion chromatography, indicating dimer formation in solution ( Supplementary Fig. 7b). 
Lack of added metal ions or addition of ethylenediaminetetraacetic acid (EDTA) did not reduce enzymatic activity, the optimal pH was 8.0-8.5 and the enzyme was most stable at 37 °C (Supplementary Fig. 7c-e). Heterotropic effects were observed neither for fructose 1,6-bisphosphate (an allosteric effector for type 1 LDH) nor for several intermediates for aromatic amino acid synthesis 26,27 (Supplementary Fig. 8). However, we found that phosphate served as a positive effector, suggesting that type 4 LDH is an intracellular enzyme (Supplementary Fig. 9a,b).
phosphate concentrations revealed the type 4 LDH is a K-type allosteric enzyme (Supplementary Fig. 9b). The catalytic rate (kcat) was moderate to high for the aromatic pyruvic acid substrates but very low for pyruvic acid (Fig. 3), in accordance with the non-impaired lactic acid production observed for the type 4 ldh mutant (Fig. 2e). Production of ILA, PLA and 4-OH-PLA from the respective aromatic pyruvic acid substrates was verified by high-performance liquid chromatography (HPLC) (Supplementary Fig. 9c). The enzyme showed highest affinity (lowest K0.5) for indolepyruvic acid but highest catalytic rate for 4-hydroxyphenylpyruvic acid in the presence of 100 mM phosphate (Fig. 3). However, the catalytic efficiency (kcat/K0.5) was highest for indolepyruvic acid (194 s⁻¹ mM⁻¹), followed by 4-hydroxyphenylpyruvic acid (16 s⁻¹ mM⁻¹) and phenylpyruvic acid ( Table 4 and Supplementary Data 2d). The subject-specific gut microbiota profiles revealed a highly individualized species composition (Extended Data Fig. 3c and Extended Data Fig. 4a). As expected from a cohort mainly containing vaginally born, breastfed infants, the gut microbiota was highly dominated by Bifidobacterium (average of 64.2%) and among the top ten dominating taxa, B. longum (38.5%), B. breve (9.1%), B. bifidum (7.9%), B. catenulatum group (6.4%) and B. dentium (1.7%) were found (Extended Data Fig. 3a Fig. 4a and Extended Data Fig. 4b-f). Community abundance of B. longum, B. bifidum and B. breve but not B. catenulatum group and B. dentium (Fig. 4a) matched the measured faecal concentrations of the aromatic lactic acids (Fig. 4b). On the basis of quantitative PCR (qPCR) estimated total bacterial load of all samples, we calculated absolute abundances of each bacterial taxon in the 16S rRNA amplicon dataset and defined infant-type Bifidobacterium species as the summarized abundance of B. longum, B. bifidum, B. breve and B. scardovii.
We observed a significant increase in the absolute abundance of infant-type Bifidobacterium species from birth to around 6 months of age and this occurred concurrently with a progressive increase in the faecal concentrations of ILA, PLA and 4-OH-PLA and a progressive decrease in faecal abundances of HMO residuals (Extended Data Fig. 5). We confirmed by linear mixed models 31 adjusting for subject and age that the absolute abundances of the infant-type Bifidobacterium species were positively associated with faecal levels of ILA, PLA and 4-OH-PLA and additionally negatively associated with abundances of HMOs residuals in faeces (Fig. 4c). Among all bacterial taxa detected, B. longum, B. bifidum and B. breve were most strongly associated with faecal levels of ILA, PLA and 4-OH-PLA (Supplementary Data 2g). These associations were also evident within individuals when using repeated measures correlations 32 (Extended Data Fig. 6) and across individuals at each sampling point using Spearman's rank correlations (Extended Data Fig. 7). Furthermore, re-analysing the microbiome data at the amplicon sequence variant (ASV) level showed very similar results (Extended Data Fig. 8a). Finally, qPCR targeting B. longum ssp. longum, B. longum ssp. infantis, B. breve and B. bifidum confirmed the associations to aromatic lactic acids and HMOs (Extended Data Fig. 8b). Notably, we found that both subspecies of B. longum were associated with the aromatic lactic acids but mainly B. longum ssp. infantis and B. bifidum were associated with the HMO residuals in faeces (Extended Data Fig. 8b). To further corroborate our findings regarding the relevance in early life and impact of breastfeeding, we mined a published metagenomic dataset from faecal samples from a cohort of 98 Swedish mother-infants pairs 5 for bifidobacterial metagenome-assembled genomes (MAGs) containing the aldh gene. 
This analysis revealed a significantly higher abundance of aldh-containing MAGs in exclusively breastfed (compared to mixed-or formula-fed) infants at 4 months and in partially breastfed (compared to weaned) infants at 12 months of age (Extended Data Fig. 9). In addition, we found very low abundance of aldh-containing MAGs in the mothers and a significant decline of these MAGs in infants after introduction to solid foods (4 versus 12 months of age). We have thus established a link between breastfeeding, degradation of HMOs, abundance We noticed among the latter, infants CIG08 and CIG09 were twins, born late preterm and dominated by an operational taxonomic unit (OTU) assigned to Clostridium neonatale (Extended Data Fig. 3c and Supplementary Data 2c) in accordance with previous reports on C. neonatale overgrowth 33 and delayed Bifidobacterium colonization [34][35][36][37] in preterm infants. Infant CIG07 who also showed delayed colonization with infant-type Bifidobacterium, was mixed-fed throughout the whole period and predominantly colonized with E. coli and Clostridium species (Extended Data Fig. 3c). Infant CIG18 had relatively low faecal concentrations of aromatic lactic acids until age 172 days, when B. breve replaced B. dentium (Fig. 4e), consistent with the fact that B. dentium lacks the aldh gene while B. breve contains it ( Fig. 2b and Supplementary Table 3). Finally, in the three infants treated with antibiotics during our study, Bifidobacterium species abundances were temporarily decreased simultaneously with reduced concentrations of the aromatic lactic acids (Fig. 4f). Together, these results demonstrate that HMO-using infant-type Bifidobacterium species determine the abundance of aromatic lactic acids in the infant gut. Yet, the impact of early/late Bifidobacterium colonization, preterm delivery, exposure to antibiotics and formula supplementation with respect to bifidobacterial aromatic lactic acid production warrants further investigation.

Indolelactate modulates immune responses via AhR and HCA3.
The tryptophan-derived metabolite ILA was consistently measured in the faeces of breastfed infants at 0-6 months (Supplementary Table 4) and 9 months of age (Supplementary Table 2). Microbial tryptophan catabolites have been found to contribute to intestinal and systemic homoeostasis, in particular by their ability to bind the AhR 14 . Furthermore, aromatic lactic acids have been found to activate HCA3 (ref. 38 ), which is involved in the regulation of immune function and energy homoeostasis 39,40 . In accordance with previous reports 15,16 , we observed modest but significant dose-dependent increases in agonistic activity of ILA in both rat and human AhR reporter gene cell lines (Supplementary Fig. 11). Furthermore, all three aromatic lactic acids, and especially ILA, showed very potent and dose-dependent agonistic activity towards HCA3 in a reporter cell line assay (Supplementary Fig. 12), in agreement with previous reports 38,40 . To investigate the relationship between gut microbiota, aromatic amino acid metabolites and AhR signalling, the AhR activity induced by sterile-filtered faecal water from selected CIG infants (Fig. 4d-f) was associated with the most abundant bacterial taxa (Fig. 5a) and all quantified aromatic amino acid metabolites (n = 19) in the same samples (Fig. 5b). This revealed that, among other taxa, in particular the infant-type Bifidobacterium species were positively associated with AhR activity across individuals using linear mixed models adjusted for subject and age (Fig. 5a) as well as within individuals using repeated measures correlations (Supplementary Fig. 13a). Of all the aromatic amino acid metabolites measured, only faecal concentrations of ILA were significantly positively associated with AhR activity (Fig. 5b and Supplementary Fig. 13b).
Since ILA on absorption in the gut is circulated in the body 18 , we next asked whether ILA affects immune function via AhR and HCA3. Since the human AhR has adapted to sense microbial tryptophan catabolites 41 and only humans and other hominids contain HCA3 (ref. 38 ), we isolated immune cells from human blood and assessed the impact of ILA on their function. Specifically, we cultured isolated human CD4+ T cells under TH17-polarizing conditions and assessed IL-22 production on exposure to ILA. Interestingly, ILA induced the production of IL-22, an effector cytokine produced by TH17 cells after AhR stimulation [42][43][44] , in a dose-dependent manner (Fig. 6a). Conversely, the addition of AhR antagonist CH-223191 inhibited IL-22 production, further corroborating that ILA acts through AhR to induce IL-22 production (Fig. 6b). We also isolated monocytes from human blood, where both AhR (ref. 45 ) and HCA3 (ref. 46 ) are expressed, stimulated the cells with E. coli lipopolysaccharide (LPS) and interferon-gamma (IFN-γ) to induce pro-inflammatory conditions, and assessed IL-12p70 production on ILA exposure. ILA reduced pro-inflammatory IL-12p70 production in a dose-dependent manner (Fig. 6c). Addition of CH-223191 blocked the ILA-induced inhibition of IL-12p70 production, confirming that ILA also acts through AhR in human monocytes (Fig. 6d). Furthermore, ILA-induced inhibition of IL-12p70 was prevented by small interfering RNA (siRNA) knockdown of HCA3, supporting that ILA also acts as an anti-inflammatory agent via HCA3 in human monocytes (Fig. 6e). Thus, ILA affects human immune responses via AhR- and HCA3-dependent pathways, suggesting that Bifidobacterium-derived ILA is a highly relevant AhR and HCA3 agonist that may impact immune responses in early life.

Discussion
The importance of intestinal commensal bacteria in regulation of the intestinal barrier function and immune development during infancy is well established 47,48 . Yet, specifically the symbiotic role of the breastmilk-promoted Bifidobacterium species, which are highly abundant in many breastfed infants, remains largely unknown.
Here, we identified an ALDH, which catalyses the last step of the conversion of aromatic amino acids into their respective aromatic lactic acids in the infant gut. This ALDH was different in terms of amino acid sequence compared to the ALDH previously identified in Clostridiales species (C. sporogenes, C. cadaveris and P. anaerobius) 18 . Furthermore, these species are neither prevalent nor abundant in the infant gut and have previously been found to convert the aromatic lactic acids into aromatic propionic acids as end products 18 . We show that only the infant-type Bifidobacterium species, adapted to breastfeeding by their HMO-transport and degradation genes providing them with a colonization advantage in the infant gut 13,49-52 , contain the ALDH gene. This fits the observation that Bifidobacterium species commonly isolated from the infant gut in vitro produce relatively higher levels of ILA compared with adult- or animal-associated Bifidobacterium species 53 . Our enzymatic assays showed strong adaptation of ALDH towards indolepyruvic acid, resulting preferentially in the formation of ILA. Importantly, our data suggest that the production of the AhR agonist ILA by breastmilk-promoted Bifidobacterium is a key determinant of AhR-dependent signalling in the gut during infancy. Previous studies have found that ILA decreases inflammation in intestinal cells through activation of AhR 54,55 . Here, we show ex vivo that ILA impacts human primary immune cells via AhR- and HCA3-dependent pathways. The observed dose- and AhR-dependent stimulation of IL-22 secretion by ILA may reflect a mechanism by which infant-type Bifidobacterium species impact intestinal homoeostasis in early life, as IL-22 for example provides protection against gastrointestinal pathogens [56][57][58] and promotes mucus production 57,59 and epithelial function 60 .
Further, the AhR- and HCA3-dependent inhibitory effect of ILA on IL-12p70 secretion by monocytes may constitute a means by which infant-type Bifidobacterium species contribute to the regulation of the pro-inflammatory responses to LPS derived from Enterobacteriaceae species that also often colonize the neonatal/infant gut. While the functional implications of aromatic lactic acids remain to be fully characterized, the phenomenon observed here is probably of fundamental importance, since HCA3, which is only expressed in humans and other hominids 38 , is involved in the regulation of immune functions and energy homoeostasis 39,40 . Furthermore, AhR signalling is involved in protection against gastrointestinal pathogens 15 and plays a key role in enhancement of intestinal barrier function 61,62 , immune development 19,63-65 , attenuation of induced colitis 66 , autoimmunity 67-69 and metabolic syndrome 61 . In addition, ILA and PLA have been shown in vitro to have direct antibacterial 70,71 and antifungal properties 72,73 . Therefore, our findings provide a rationale for further investigation of the implications of aromatic lactic acids in infant health and immune development.

Methods
Human study populations and metadata. SKOT cohort. The discovery cohort consisted of a random subset of 59 healthy infants (30 male, 29 female) of the observational SKOT I cohort 22 . No statistical method was used to predetermine sample size. The infants were originally recruited from Copenhagen and Frederiksberg regions by random selection from the National Danish Civil Registry 74 . Inclusion criteria were single birth and full-term delivery, absence of chronic illness and age of 9 months ± 2 weeks at inclusion. Mode of delivery, gender, age at sampling, use of medication, breast-and formula-feeding prevalence, as well as exclusive and total breastfeeding duration and age of introduction to solid foods were recorded by parental questionnaires (Supplementary Data 1a,b). Anthropometrics, full dietary assessment and other relevant metadata have been published previously 4,75 . Faecal samples were obtained at 9 months ± 2 weeks of age and were stored at -80 °C until DNA extraction, as described previously 4 . Urine samples were collected by the use of cotton balls placed in the infants' disposable nappies from which the urine was squeezed into a sterile tube and stored at -80 °C. In cases of faeces in the nappy, the urine sample was discarded. The study protocol was approved by the Gender, preterm versus full-term birth, mode of delivery, infant/maternal antibiotics, feeding patterns (breastmilk versus formula) and introduction to solid foods were recorded (Supplementary Data 2a,b). The Data Protection Agency (18/02459) approved the study. The Committees on Biomedical Research Ethics for the Capital Region of Denmark confirmed that the CIG study was not notifiable according to the Act on Research Ethics Review of Health Research Projects (paragraph 1, subsection 4), as the study only concerned the faecal microbial composition and activity and not the health of the children. Informed consent was obtained from all parents of infants participating in the CIG study. 
In addition, parents of twins gave informed consent to publish data from the twins although the parents themselves would be able to identify their children using indirect identifiers. The parents did not receive any compensation.
Gut microbiota analysis. 16S rRNA gene amplicon sequencing. Sample preparation and sequencing were performed as previously described 4 using a subset of 59 faecal samples originating from infants participating in the SKOT I cohort and 241 faecal samples from 25 infants participating in the CIG cohort (data from a total of 28 samples were missing due to insufficient sample material (n = 1), insufficient DNA extraction/lack of PCR product (n = 20), very low number of sequencing reads (n = 6) or resemblance of community to sequenced blank buffer DNA extraction negative controls (n = 1)). Briefly, DNA was extracted from 250 mg of faeces or blank buffer negative controls (PowerLyzer PowerSoil DNA isolation kit, MoBio 12855-100) and the V3 region of the 16S rRNA gene was amplified (30 s at 98 °C, 24-30 cycles of 15 s at 98 °C and 30 s at 72 °C, followed by 5 min at 72 °C) using non-degenerate universal barcoded primers including sequencing adaptors (Supplementary Table 5) 76 and then sequenced with the Ion OneTouch and Ion PGM platform with a 318-Chip v.2. Sequences from SKOT and CIG were analysed separately. Briefly, they were demultiplexed according to barcode and trimmed as previously described 76,77 in CLC Genomic Workbench (v.8.5 CLCbio, Qiagen). Quality filtering (-fastq_filter, MAX_EE (SKOT) =2.0, MAX_EE (CIG) =1.0), dereplication, OTU clustering (-cluster_otus, minsize 4), chimaera filtering (-uchime_ref, RDP v.9 database), mapping of reads to OTUs (-usearch_global, id 97%) and generation of OTU tables (python, uc2otutab.py) were done according to the UPARSE pipeline 78 . In QIIME (ref. 79 ), OTU tables (n OTUs(SKOT) = 545, n OTUs(CIG) = 478) were filtered to include only OTUs with abundance across all samples above 0.005% of the total OTU counts (n OTUs(SKOT) = 258, n OTUs(CIG) = 145), ending up with 21,781 ± 13,110 (mean ± s.d.) reads for SKOT I and 40,156 ± 17,614 (mean ± s.d.) reads for CIG. 
OTU relative abundances within samples were then estimated by total sum scaling. Taxonomy was assigned to the OTUs using the RDP classifier with confidence threshold 0.5 (ref. 80 ) and the GreenGenes database v.13.8 (ref. 81 ). To estimate species composition in the CIG cohort, OTUs with identical taxonomy were collapsed; with a cutoff of 0.1% average relative abundance, only 39 bacterial species/taxa remained, representing 97.5% of the total community (Supplementary Data 2c and Extended Data Fig. 3c). On the basis of PyNAST alignment of representative OTU sequences from each cohort separately, a phylogenetic tree was created with FastTree, as described previously 77 . Alpha diversity (Shannon index) and beta diversity (weighted and unweighted UniFrac distances, abundance-weighted and binary Bray-Curtis) were calculated in QIIME, with the sequencing depth rarefied to 2,000 (SKOT)/8,000 (CIG) sequences per sample. To investigate Bifidobacterium species composition, OTU sequences classified as Bifidobacterium according to the GreenGenes database v.13.8 were filtered to remove low-abundant OTUs (cutoff 0.1% of total Bifidobacterium) and the taxonomy of these resulting OTUs (n OTUs(SKOT) = 23, n OTUs(CIG) = 8) was confirmed by BLAST (ref. 82 ) search against the 16S rRNA gene sequence database at the National Center for Biotechnology Information (NCBI). The top BLAST hit indicated species annotation (Supplementary Data 1g and 2e). OTUs were collapsed into Bifidobacterium species (B. longum, B. bifidum, B. breve, B. catenulatum group, B. adolescentis, B. scardovii, B. dentium and B. animalis/pseudolongum) on the basis of the top BLAST hit (Supplementary Data 1g and 2e). In addition, to validate the findings from the OTU analysis, we performed ASV analysis on the CIG cohort samples using the DADA2 pipeline v.1.14 (ref. 83 ) with the demultiplexed and trimmed reads and the same cutoffs as for the OTU analysis (MAX_EE = 1, ASVs filtered to include only those with abundance across all samples above 0.005% of the total ASV counts), resulting in a total of 211 ASVs and 13 ASVs assigned to Bifidobacterium (using the RDP database v.18) above the abundance cutoff of 0.1% of the total Bifidobacterium population (Supplementary Data 2f). As for OTUs, the taxonomy for individual Bifidobacterium ASVs was confirmed by BLAST search against the 16S rRNA gene sequence database at NCBI. Infant-type Bifidobacterium species were defined as the summarized abundance of B. longum, B. bifidum, B. breve and B. scardovii. CIG individuals were grouped on the basis of colonization with infant-type Bifidobacterium species, into those with early colonization (colonized within first month reaching average relative abundance >40% during first 6 months, n = 17) and late colonization (not detectable or on average <0.5% of total community within the first 3 months of life, n = 5), as well as those associated with antibiotics (at least one episode of recorded oral antibiotics during the first 6 months of life, n = 3). Bacterial load data (using the universal primers) were used to estimate absolute abundances of each microbial taxon by multiplying with relative abundances derived from 16S rRNA gene amplicon sequencing.
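The three table operations described above (removal of OTUs below 0.005% of total counts, total sum scaling, and scaling by bacterial load) can be sketched as follows. This is a minimal illustration and not the QIIME workflow used in the study; the function names, the toy counts and the load values are hypothetical.

```python
def filter_low_abundance(otu_table, min_fraction=0.00005):
    """Keep OTUs whose count summed over all samples exceeds
    min_fraction (0.005%) of the total count in the table."""
    total = sum(sum(counts) for counts in otu_table.values())
    return {otu: counts for otu, counts in otu_table.items()
            if sum(counts) > min_fraction * total}


def total_sum_scale(otu_table):
    """Convert counts to within-sample relative abundances."""
    n_samples = len(next(iter(otu_table.values())))
    depths = [sum(counts[i] for counts in otu_table.values())
              for i in range(n_samples)]
    return {otu: [c / d if d else 0.0 for c, d in zip(counts, depths)]
            for otu, counts in otu_table.items()}


def absolute_abundance(rel_table, loads):
    """Multiply relative abundances by a per-sample bacterial load."""
    return {otu: [r * load for r, load in zip(rel, loads)]
            for otu, rel in rel_table.items()}


# Hypothetical toy table: OTU -> counts in two samples.
counts = {
    "OTU_a": [30000, 10000],
    "OTU_b": [10000, 30000],
    "OTU_rare": [1, 0],
}
filtered = filter_low_abundance(counts)
relative = total_sum_scale(filtered)
# Hypothetical bacterial loads per sample (arbitrary units).
absolute = absolute_abundance(relative, [1e9, 2e9])
```

In this sketch the rare OTU is dropped before scaling, so relative abundances are computed on the filtered table; whether scaling preceded or followed filtering in the original analysis is not specified here.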
Identification of aldh gene/operon in Bifidobacterium strains, metagenomic data and homology searches. From the full genome sequences (available at NCBI Genome, https://www.ncbi.nlm.nih.gov/genome/) of Bifidobacterium type strains included in this study (Supplementary Table 6) all genes annotated as LDHs were aligned (gap cost 10, gap extension cost 1) and subsequently a phylogenetic tree (Algorithm = Neighbor-Joining, Distance measure = Jukes-Cantor, 100 bootstrap replications) was constructed in CLC Main Workbench (v.7.6.3, CLCbio, Qiagen). The tree was visualized using FigTree v.1.4.3 (http://tree.bio.ed.ac.uk/software/figtree/). For identification of aldh (type 4 ldh) in Bifidobacterium strains, all complete human gut-associated Bifidobacterium genomes (n = 127) including plasmids were retrieved from NCBI Genome and aldh genes were identified using NCBI tBLASTn with default settings and a cutoff of 70% identity and 70% query coverage. Aligned genomic nucleotide sequences were translated and verified to match LDHs using reciprocal BLASTx against NCBI's non-redundant database. In addition, the ALDH amino acid sequence (translated from the aldh nucleotide sequence) of B. longum ssp. longum 105-A was aligned (gap cost 10, gap extension cost 1) with the ALDH amino acid sequences of the B. longum ssp. longum, B. longum ssp. infantis, B. bifidum, B. breve and B. scardovii type strains and pairwise amino acid identity percentages were calculated in CLC Main Workbench. Potential non-bifidobacterial ALDH homologues were searched for by BLASTp analysis of the 105-A amino acid sequence against the non-redundant protein sequence database and the Swiss-Prot database using a cutoff of 60% amino acid sequence identity. Further, comparison of the aldh gene cluster/operon in 12 Bifidobacterium type strains (Supplementary Fig. 4) was conducted by pairwise alignments in MBGD (Microbial Genome Database for Comparative Analysis; http://mbgd.genome.ad.jp/). 
The amino acid sequences of the gene cluster from B. pseudolongum ssp. pseudolongum type strain was collected from NCBI database (https://www.ncbi.nlm.nih.gov/genome/) and was used for comparison with that from B. animalis ssp. animalis type strain. Using 193 infant samples collected at 4 and 12 months of age with data on feeding practice available and data from 98 mothers 5 , we used IGGsearch and IGGdb v.1.0.0 (ref. 88 ) to identify Bifidobacterium MAGs. MAGs were included in the analysis if they passed the following criteria: --min-reads-gene=2 --min-perc-genes=40 --min-sp-quality=75. For each Bifidobacterium MAG identified, we used the representative genome to search for aldh genes. The aldh genes were identified using NCBI tBLASTn with default settings and a cutoff of 70% identity and 70% query coverage. A total of 2 μl of the cloning mixture was transformed into 50 μl of One Shot TOP10 Competent Cells (Invitrogen, K4300-01) by gentle mix, incubation 15 min on ice and heat-shock for 30 s at 42 °C. A total of 250 μl of S.O.C medium (Invitrogen, K4300-01) was added and incubated at 37 °C for 1 h at 200 r.p.m. and subsequently spread on LB-AMP (LB supplemented with 20 μg ml -1 Ampicillin (Sigma, A9518)) agar plates and incubated at 37 °C ON. Transformants were picked and clean streaked on LB-AMP agar plates, incubated at 37 °C ON and afterwards single colonies of each transformant was inoculated into 5 ml of LB-AMP broth and incubated at 37 °C for 15 h at 250 r.p.m. Plasmid DNA was isolated (QIAprep Spin Miniprep Kit, Qiagen, 27104) from each transformant and subsequently 5 μl of plasmid DNA (80-100 ng μl -1 ) was mixed with 5 μl (5 pmol μl -1 ) pBAD forward (5′-ATGCCATAGCATTTTTATCC-3′) or reverse (5′-GATTTAATCTGTATCAGG-3′) sequencing primers (5 pmol μl -1 ) and shipped for sequencing at GATC (GATC-biotech). 
To remove the leader peptide in pBAD-TOPO, 10 μl of plasmid (0.1 μg) with correct insert was cut with FastDigest NcoI (Thermo Scientific, FD0563) for 10 min at 37 °C and the enzyme inactivated 15 min at 65 °C. Plasmid was ligated using 1 μl (1 U μl -1 ) T4 DNA Ligase (Invitrogen, 15224-017) for 5 min at room temperature and subsequently 2 μl of plasmid was transformed into 100 μl of chemically competent E. coli LMG194 cells by incubation on ice for 30 min, followed by heat-shock at 43 °C for 3 min and incubation on ice for 2 min. A total of 900 μl of LB medium was added and cells were incubated at 37 °C for 1 h at 250 r.p.m., before plating on LB-AMP agar plates and incubation at 37 °C ON. Transformants were picked, clean streaked and plasmid DNA isolated and sequenced as described above. A transformant with correct insert was selected for recombinant expression of the type 4 ldh gene; 2 ml of LB-AMP broth was inoculated with a single recombinant colony or the non-transformed E. coli LMG194 (negative control) and grown at 37 °C ON at 250 r.p.m. In 3× triplicates, 100 μl of the ON cultures (2×3× 100 μl of transformant culture + 1×3× 100 μl of non-transformed E. coli LMG194 culture) were diluted 100-fold into 9.9 ml of prewarmed LB-AMP/LB broth and grown at 37 °C, 250 r.p.m. until OD 600nm ≈ 0.5, at which 9 ml of culture was added to 1 ml of mix of indolepyruvic acid, phenylpyruvic acid and 4-hydroxyphenylpyruvic acid (1 mg ml -1 each). The cultures were sampled (time zero) and subsequently 100 μl of 20% l-arabinose (or 100 μl of sterile water; control for induction) was added to induce gene expression and the cultures were re-incubated at 37 °C, 250 r.p.m., before sampling at 1 and 5 h post-induction for OD 600nm measurements and assessment of production of aromatic lactic acids. For the latter, samples were centrifuged at 16,000g for 5 min at 4 °C and supernatants were stored at -20 °C for UPLC-MS analyses. 
Insertional mutant construction and plasmid complementation. The type 4 ldh gene (BL105A_0985) of B. longum ssp. longum 105-A was disrupted by a plasmid-mediated single crossover event as described previously 89 . The plasmid used for disruption was constructed using the In-Fusion cloning kit (Clontech Laboratories, catalogue no. 639649). E. coli DH5α was used as a host for genetic manipulation. In brief, the internal region of the ldh gene (position 142-638 of the nucleotide sequence of BL105A_0985 (ref. 90 ); Supplementary Fig. 5) was amplified by PCR using a primer pair Pr-580/581 (Supplementary Table 7) and ligated with the BamHI-digested pBS423 fragment carrying pUC ori and a spectinomycin-resistance gene 29 . The resulting plasmid pMSK127 was introduced into B. longum ssp. longum 105-A by electroporation to be integrated into the type 4 ldh locus by single crossover recombination (type 4 ldh::pMSK127). Type 4 ldh disruption was confirmed by genomic PCR with a primer pair (Pr-543/546) designed to anneal outside of the gene (Supplementary Fig. 5 and Supplementary Table 7). The amplified fragment was also sequenced to ensure the correct recombination event. Complementation plasmid pMSK128 was constructed by ligating PCR-amplified xfp (xylulose 5-phosphate/fructose 6-phosphate phosphoketolase) promoter region (Pxfp) and the type 4 ldh coding region with PstI- and SalI-digested pBFS38 (ref. 91 ) using the In-Fusion cloning kit, by which type 4 ldh was placed under the control of Pxfp. Primer pairs of Pr-598/Pr-599 and Pr-600/Pr-601 were used for amplifying Pxfp from pBFS48 (ref. 91 ) and the type 4 ldh gene from the B. longum ssp. longum 105-A genome, respectively (Supplementary Table 7). The resulting plasmid was electroporated into type 4 ldh::pMSK127 to give type 4 ldh::pMSK127/pMSK128 (Pxfp-type4_ldh) (Supplementary Fig. 5).

Biochemical characterization of ALDH (type 4 LDH).
Recombinant expression and purification. Type 4 LDH (BL105A_0985) was recombinantly expressed as a non-tagged form. The gene was amplified by PCR using the genomic DNA of B. longum ssp. longum 105-A as a template and a primer pair of Pr-617 (5′-GGTGGTGGTGCTCGAGTCACAGCAGCCCCTCGCAG-3′) and Pr-635 (5′-AAGGAGATATACATATGGTCACTATGAACCGC-3′). Underlined bases indicate 15 bp for In-Fusion cloning (Clontech). The amplified DNA fragment was inserted into the NdeI and XhoI site of pET23b(+) (Novagen) using an In-Fusion HD cloning kit (Clontech). The resulting plasmid was introduced into E. coli BL21 (DE3) ΔlacZ carrying pRARE2 (ref. 89 ) and the transformant was cultured in LB medium supplemented with ampicillin (100 μg ml −1 ) and chloramphenicol (7.5 μg ml −1 ). When OD 600nm reached 0.5, isopropyl β-d-thiogalactopyranoside was added at a final concentration of 0.02 mM to induce the protein expression. The culture was incubated for 4 d at 18 °C, harvested by centrifugation and resuspended in 50 mM potassium phosphate buffer (KPB; pH 7.0) supplemented with 1 mM 2-mercaptoethanol (2-ME) and 200 μM phenylmethane sulfonyl fluoride. Following cell disruption by sonication, the cleared lysate was saturated with ammonium sulfate (40-60%). The resulting precipitate was dissolved, dialysed against 20 mM KPB (pH 7.0) containing 1 mM 2-ME and concentrated by Amicon Ultra 10 K centrifugal device (Merck Millipore). The sample was then loaded onto an Affigel blue column (Bio-Rad) preequilibrated with 20 mM KPB (pH 7.0) containing 1 mM 2-ME and eluted by the same buffer containing 1 M NaCl. The reducing reactions of phenylpyruvic acid and pyruvic acid were continuously monitored by measuring the decrease of the absorbance at 340 nm (NADH consumption). When 4-hydroxyphenylpyruvic acid and 4-indolepyruvic acid were used as the substrates, the reaction products 4-OH-PLA and ILA were quantified by HPLC after the termination of the reactions by adding 5% (w/v) trichloroacetic acid. 
HPLC analysis was performed using a Waters e2695 separation module (Waters) equipped with a LiChrospher 100 RP-18 column (250 × 4 mm, φ = 5 μm; Merck Millipore) at 50 °C. Following equilibration with a mixture of 10% solvent A (50% methanol, 0.05% trifluoroacetic acid) and 90% solvent B (0.05% trifluoroacetic acid) at a flow rate of 1 ml min -1 , the concentration of solvent A was linearly increased to 100% over 25 min and maintained at 100% for an additional 15 min. The 4-OH-PLA and ILA were detected by a Waters 2475 Fluorescence Detector with λ ex 277 nm and λ em 301 nm and λ ex 282 nm and λ em 349 nm, respectively. The standard curves were created using known concentrations of both compounds. Experiments were performed at least in duplicate. The physicochemical properties of the enzyme were examined using 1 mM phenylpyruvic acid as a substrate. The effects of metal ions (0.1 mM each) on the enzyme activity were examined using 50 mM MES (2-(N-morpholino) ethanesulfonic acid) buffer (pH 7.0). EDTA was added at a final concentration of 0.1, 0.5 or 1 mM. The optimal pH was determined using 50 mM KPB (pH 6.0-8.5) and TAPS (N-Tris(hydroxymethyl)methyl-3-aminopropanesulfonic acid) buffer (pH 8.0-9.0). The thermostability was evaluated by the residual activities after incubating the enzyme (1.0 mg ml -1 in 10 mM KPB (pH 7.0) containing 50 mM NaCl and 1 mM 2-ME) at the indicated temperatures for 30 min before the assay. Fructose 1,6-bisphosphate, shikimate-3-phosphate, d-erythrose-4-phosphate and phosphoenolpyruvic acid were added to the reaction mixtures at concentrations of 0.1 and 1 mM to examine their heterotropic effects. KPB, TAPS buffer or HEPES (4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid) buffer (pH 8.0 each) containing 1 and 4 mM phenylpyruvic acid as a substrate were used. The effect of phosphate ion was analysed by adding various concentrations of KPB (pH 8.0) into 10 mM HEPES buffer (pH 8.0). All experiments were conducted at least in duplicate. 
In the subsequent kinetic analysis, we used phosphate ion at a concentration of 100 mM because (1) no saturation was obtained for phosphate under the tested conditions (Supplementary Fig. 9a), (2) the intracellular phosphate concentration in Gram-positive bacteria is known to be 130 mM at maximum 92 and (3) the strong homotropic effect of the substrate phenylpyruvic acid was observed only in the presence of 10 mM phosphate ion.
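The Statistics section states that k cat , K 0.5 and the Hill coefficient n H were obtained by fitting the data to the Hill equation in GraphPad Prism. As a minimal sketch of the rate law itself (not of the fitting procedure), the function below evaluates the Hill equation for a given substrate concentration; the parameter values in the example are hypothetical and are not results from this study.

```python
def hill_rate(substrate, k_cat, k_half, n_hill, enzyme=1.0):
    """Hill equation: v = k_cat * [E] * [S]^n / (K_0.5^n + [S]^n).

    substrate and k_half share one concentration unit; n_hill > 1
    describes positive cooperativity (a homotropic effect of the substrate).
    """
    s_n = substrate ** n_hill
    return k_cat * enzyme * s_n / (k_half ** n_hill + s_n)


# Hypothetical parameters, for illustration only.
K_CAT = 10.0   # per second
K_HALF = 2.0   # mM
N_HILL = 2.5

rate_at_k_half = hill_rate(K_HALF, K_CAT, K_HALF, N_HILL)
rate_low = hill_rate(0.2, K_CAT, K_HALF, N_HILL)
rate_high = hill_rate(20.0, K_CAT, K_HALF, N_HILL)
```

By construction the rate at a substrate concentration equal to K 0.5 is half of the maximal rate, whatever the value of n H , which is a convenient check on any fitted parameter set.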
In vivo monocolonization experiments. Germ-free (GF) Swiss Webster mice (Tac:SW, originally obtained from Taconic Biosciences) were bred and housed within GF isolators (Scanbur) in type II Makrolon cages (Techniplast) with bedding, nesting material, hiding place and a wooden block at the National Food Institute, Technical University of Denmark. The mice were fed an irradiated standard Altromin 1314 chow (Brogaarden) and the environment was maintained on a 12 h light/12 h dark cycle at a constant temperature of 22 ± 1 °C, with air humidity of 55 ± 5% relative humidity and change of air 50 times per hour. The GF condition of the mice before inoculation of bacteria was confirmed by plating of faecal sample suspensions on blood agar plates (Statens Serum Institut) incubated both aerobically and anaerobically. In two separate experiments, pregnant GF mice were randomized to be colonized with either B. longum 105-A WT (n = 4) or aldh (type 4 ldh) mutant (n = 5) by a single oral gavage (200 μl, ≈5 × 10 7 c.f.u. per dose) 1 week before giving birth. The monocolonized offspring (n wildtype = 21, 12 males and 9 females; n aldhmutant = 29, 18 males and 11 females) were euthanized at 4 weeks of age by cervical dislocation and dissected to collect caecal contents. Successful colonization with B. longum and absence of contamination in monocolonized offspring was confirmed by cultivation of caecal content on MRSc and blood agar plates incubated both aerobically and anaerobically. Aromatic lactic acids were quantified from caecal content. All mouse experiments were approved by the Danish Animal Experiments Inspectorate (license no. 2015-15-0201-00553) and carried out in accordance with existing Danish guidelines for experimental animal welfare.
Extraction of metabolites from faecal samples. Faecal samples (100-500 mg) from the SKOT (n = 59) and the CIG cohort (n = 267, data from two samples missing due to insufficient sample material (n = 1) and problems detecting the internal standards (n = 1)) were diluted 1:2 with sterile MQ water, vortexed for 10 s and centrifuged at 16,000g, 4 °C for 5 min. Subsequently, the supernatant liquor was transferred to a new tube and centrifuged again at 16,000g, 4 °C for 10 min. Finally, an aliquot of 150-300 µl was stored at -20 °C. All samples were later thawed at 4 °C, centrifuged at 16,000g, 4 °C for 5 min, and diluted in a total volume of 80 µl of water corresponding to a 1:5 dilution of the faecal sample. To each sample, 20 µl of internal standard mix (4 µg ml -1 ) and 240 µl of acetonitrile were added. The tubes were vortexed for 10 s and left at -20 °C for 10 min to precipitate the proteins. The tubes were then centrifuged at 16,000g, 4 °C for 10 min, and each supernatant (320 µl) was transferred to a new tube, which was dried with nitrogen gas. Subsequently, the residues were reconstituted in 80 µl of water (equalling a 1:5 dilution of the faecal sample with internal standards having a concentration of 1 µg ml -1 ), vortexed for 10 s, centrifuged at 16,000g, 4 °C for 5 min and transferred to an LC vial, which was stored at −20 °C until analysis.
Extraction of metabolites from urine samples. Urine samples (n = 49) from the SKOT cohort were thawed in a refrigerator and all procedures during the sample preparation were carried out at 0-4 °C using an ice bath. The subjects were randomized between analytical batches by placing all the samples from each subject in the same 96-well plate. The run order of the samples was randomized within the analytical batch. Urine samples were centrifuged at 3,000g for 2 min at 4 °C. A total of 150 µl of each urine sample was added to a separate well and diluted with 150 µl of diluent (MQ water: formic acid (99.9:0.1, v/v) / internal standard mixture (100 µg ml -1 ) (90:10, v/v)). A blank sample (diluent), a standard mixture of external standards containing 44 biologically relevant metabolites (metabolomics standard) 93 and a pooled sample containing equal amounts of each sample (20 µl) were added to spare wells as quality control samples. The plates were stored at −80 °C until the analysis. Immediately before analysis, the plates were thawed and mixed by vortex stirring for 10 min.
Extraction of metabolites from in vitro fermentation samples. Supernatants from in vitro fermentations were thawed at 4 °C, centrifuged at 16,000g, 4 °C for 10 min, before 80 µl was transferred to a new tube. To each sample, 20 µl of internal standard (40 µg ml -1 ) and 300 µl of acetonitrile were added. The tubes were vortexed for 10 s and left at -20 °C for 10 min to precipitate the proteins. Subsequently, the tubes were centrifuged at 16,000g, 4 °C for 10 min before 50 µl of each sample was diluted with 50 µl of sterile water and transferred to an LC vial (equalling a 1:10 dilution of the sample with internal standards having a concentration of 1 µg ml -1 ).

Metabolic profiling of faecal, caecal and in vitro samples using UPLC-MS. Aromatic amino acids and derivatives (Supplementary Table 1) of faecal and in vitro samples were quantified by a semiquantitative UPLC-MS method 94 . In brief, samples were analysed in random order. For the analysis of the CIG faecal samples, a pooled quality control (QC) sample was injected for every ten samples. In all cases, five standard mix solutions (0.1, 0.5, 1, 2 and 4 μg ml -1 ) were analysed once for every ten samples to obtain a standard curve for every ten samples. For each sample, a volume of 2 µl was injected into an ultraperformance liquid chromatography quadrupole time-of-flight mass spectrometry (UPLC-QTOF-MS) system consisting of a Dionex Ultimate 3000 RS liquid chromatograph (Thermo Scientific) coupled to a Bruker maXis time-of-flight mass spectrometer equipped with an electrospray interface (Bruker Daltonics) operating in positive mode.
The analytes were separated on a Poroshell 120 SB-C18 column with a dimension of 2.1 × 100 mm and 2.7 μm particle size (Agilent Technologies) as previously published 94 . Aromatic amino acids and derivatives were detected by selected ions and semiquantified by isotopic internal standards with similar molecular structures as listed in Supplementary Table 1. The recoveries of the internal standards varied but were, relative to each other, in general rather consistent (Supplementary Fig. 14), emphasizing that while the absolute concentrations may not be accurate due to lack of isotope-labelled internal standards for each single analyte, the relative metabolite concentrations across samples were robust with the applied LC-MS method. Data were processed using QuantAnalysis v.2.2 (Bruker Daltonics) and bracket calibration curves for every ten lumen samples were obtained for each metabolite. The calibration curves were established by plotting the peak area ratios of all of the analytes with respect to the internal standard against the concentrations of the calibration standards. The calibration curves were fitted to a quadratic regression.
For untargeted metabolomics, the raw UPLC-MS data, obtained by analysis of the CIG faecal samples in positive ionization mode, were converted to mzXML files using Bruker Compass DataAnalysis 4.2 software (Bruker Daltonics) and preprocessed as previously reported 95 using the R package XCMS (v.1.38.0; ref. 96 ). Noise filtering required that features be detected in at least 50% of the samples. A data table was generated comprising mass-to-charge (m/z), retention time and intensity (peak area) for each feature in every sample. The data were normalized to the total intensity and log-transformed. Subsequently, features with a coefficient of variation >0.3 in the QC samples and features with a retention time <0.5 min were excluded from the data. Parent ion masses of HMO compounds of interest (2′FL/3FL, LNT/LNnT, 3′SL/6′SL) were searched in the cleaned dataset with 0.02 Da m/z and 0.02 min retention time tolerance. Subsequently, the identities of the features of interest were confirmed at level 1 (ref. 97 ) by tandem mass spectrometry and comparison to authentic standards (Supplementary Table 8). Of note, HMO isomers could not be distinguished with the method applied due to identical retention times.
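The feature exclusion step described above (coefficient of variation >0.3 in the QC samples, retention time <0.5 min) can be sketched as follows. This is an illustration only: the study used XCMS in R, the feature names and intensities below are hypothetical, and the use of the sample standard deviation for the coefficient of variation is an assumption.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed definition)."""
    return statistics.stdev(values) / statistics.mean(values)


def keep_feature(qc_intensities, retention_time_min, cv_max=0.3, rt_min=0.5):
    """Retain a feature only if it is reproducible across QC injections
    and does not elute before rt_min minutes."""
    return (coefficient_of_variation(qc_intensities) <= cv_max
            and retention_time_min >= rt_min)


# Hypothetical features: QC intensities and retention time in minutes.
features = {
    "stable": {"qc": [100.0, 110.0, 90.0], "rt": 2.4},
    "noisy": {"qc": [100.0, 200.0, 30.0], "rt": 3.1},
    "early": {"qc": [100.0, 101.0, 99.0], "rt": 0.3},
}
kept = [name for name, f in features.items()
        if keep_feature(f["qc"], f["rt"])]
```

Here only the reproducible, later-eluting feature survives; the noisy feature fails the variation criterion and the early feature fails the retention time criterion.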
The raw data were converted to netCDF format using DataBridge Software v.3.5 (Waters) and imported into MZmine v.2.28 (ref. 99 ). A subset of samples was used to optimize the preprocessing parameters for the positive and negative mode data separately. Optimized preprocessing parameters are listed in Supplementary  Table 9. Data preprocessing was used with the following steps: mass detection, chromatogram builder, chromatogram deconvolution, deisotoping, peak alignment and gap filling. After the preprocessing, each detected peak was represented by a feature defined with a retention time, m/z and peak area.
The data matrix was imported into MATLAB R2015b (MathWorks). Features that were present in the blanks, were very early and late eluting (retention time < 0.30 and retention time > 9.46 min), potential isotopes, duplicates as well as features with masses indicating multiple charges were removed from the dataset using an in-house algorithm. The data were normalized using unit length normalization to correct the variation in urine concentration. Parent ion masses of the aromatic lactic acids (ILA, PLA and 4-OH-PLA) were searched in the cleaned dataset with 0.02 Da m/z and 0.02 s retention time tolerance. A linear regression model was used feature-wise to correct for batch differences and instrumental sensitivity drifts 100 . The aromatic lactic acids were confirmed at level 1 (ref. 97 ) by comparison to authentic standards and by tandem mass spectrometry using the same experimental conditions (Supplementary Figs. 1-3).
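Unit length normalization, used above to correct for variation in urine concentration, rescales each sample's feature vector so that its length equals 1. The sketch below assumes the Euclidean (L2) norm, which is the usual meaning of the term but is not stated explicitly in the text; the function name and example intensities are hypothetical.

```python
import math


def unit_length_normalize(intensities):
    """Divide each feature intensity by the Euclidean norm of the sample."""
    norm = math.sqrt(sum(x * x for x in intensities))
    if norm == 0:
        # An all-zero sample cannot be rescaled; return it unchanged.
        return list(intensities)
    return [x / norm for x in intensities]


# Two hypothetical samples with the same profile at different dilutions.
concentrated = unit_length_normalize([30.0, 40.0])
dilute = unit_length_normalize([3.0, 4.0])
```

After normalization the two samples are identical, which illustrates why the procedure removes overall concentration differences while preserving relative feature intensities.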
Lactic acid production by B. longum ssp. longum 105-A strains using GC-MS. The lactic acid production of the B. longum ssp. longum 105-A WT, type 4 ldh mutant and type 4 ldh complemented strains was assessed in supernatants obtained after 13 h of growth (early stationary phase) by gas chromatography-mass spectrometry (GC-MS) after methyl chloroformate derivatization using a slightly modified version of the protocol previously described 101 . All samples were analysed in a randomized order. Analysis was performed using GC (7890B, Agilent Technologies) coupled with a quadrupole detector (59977B, Agilent Technologies). The system was controlled by ChemStation (Agilent Technologies). Raw data were converted to netCDF format using ChemStation, before the data were imported and processed in MATLAB R2014b (MathWorks) using the PARADISe software 102 .
Rat AhR reporter gene assay. Rat hepatoma cells (H4IIE) stably transfected with a luciferase reporter gene under the control of AhR (pGudLuc1.1) were used. The cells were kindly provided by M. S. Denison (University of California, Davis). The assay was conducted as previously described 103 , where cells were incubated for ∼22 h in Minimum Essential Medium (MEM) α with 1% fetal bovine serum (FBS) and 1% penicillin/streptomycin/fungizone. Chemical exposure was performed for 24 h and luminescence was subsequently measured. Cell viability was analysed by measuring ATP levels with the CellTiter-Glo Luminescent Assay according to the manufacturer's instruction (Promega). As a positive control, 2,3,7,8-tetrachlorodibenzo-p-dioxin was used. Three experiments in triplicates were conducted with five twofold dilutions of ILA and IAld ranging from 12.5 to 200 µM with a constant vehicle concentration in all wells. Further, sterile-filtered faecal water (10 mg faeces per ml MQ water) obtained from all samples (n = 119) of 11 selected CIG infants (Fig. 4d-f) was run in technical triplicates in the assay. Only mild toxicity that did not correlate with AhR-induced luminescence signal was observed for some faecal water samples.
Human AhR reporter gene assay. ILA and IAld (positive control) 15 were tested for activation of the human AhR. AhR Reporter Cells from Indigo Biosciences (catalogue no. IB06001) that include a luciferase reporter gene functionally linked to an AhR-responsive promoter were used. The assay was run according to the instructions of the manufacturer (technical manual v.6.0) with the reference agonist MeBIO as the positive control. Three experiments in triplicates were conducted with five twofold dilutions of ILA and IAld ranging from 12.5 to 200 µM with a constant vehicle concentration in all wells. No cytotoxicity was observed for any of the tests as determined by a resazurin toxicity assay.
Human HCA 3 receptor assay. The aromatic lactic acids (ILA, PLA and 4-OH-PLA) were tested for activation of the HCA 3 receptor, which is a Gα i -coupled receptor (GPCR). The cAMP Hunter eXpress GPR109B CHO-K1 GPCR Assay for chemiluminescence detection of cAMP was used (DiscoveRx Corporation, catalogue no. 95-0141E2CP2M). Following ligand stimulation of cells overexpressing the HCA 3 receptor, the functional status of the receptor was monitored by measuring cellular cAMP levels using a homogeneous, competitive immunoassay based on Enzyme Fragment Complementation technology. The assay was run in agonist mode in a 96-well plate format according to the instructions of the manufacturer (DiscoveRx Corporation) in the presence of 15 µM forskolin. Eleven threefold dilutions of ILA ranging from 0.03 to 1,574 µM and of PLA and 4-OH-PLA ranging from 0.02 to 1,000 µM were tested twice in duplicates.
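The concentration ranges quoted for the receptor assays follow directly from the stated dilution schemes. As a short worked check, the sketch below generates the series from the top concentration, the fold factor and the number of points; the function name is hypothetical and the code is not part of the published analysis.

```python
def dilution_series(top_concentration, fold, n_points):
    """Concentrations obtained by repeatedly diluting the top
    concentration by a constant fold factor."""
    return [top_concentration / fold ** i for i in range(n_points)]


# HCA 3 assay: eleven threefold dilutions (concentrations in µM).
ila_series = dilution_series(1574, 3, 11)
pla_series = dilution_series(1000, 3, 11)

# AhR reporter assays: five twofold dilutions (concentrations in µM).
ahr_series = dilution_series(200, 2, 5)
```

The lowest points of the threefold series are 1,574/3^10 ≈ 0.027 µM and 1,000/3^10 ≈ 0.017 µM, which round to the reported 0.03 and 0.02 µM, and the twofold series spans 200 to 12.5 µM as stated for the AhR assays.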

Ex vivo stimulation of human immune cells.
Human buffy coats were acquired from the Copenhagen University Hospital (Rigshospitalet) from healthy anonymous donors. Use of the buffy coat material from healthy anonymous donors was approved by the Blood bank at Rigshospitalet, Copenhagen, under the jurisdiction of Region H. Prior written informed consent was obtained according to the Declaration of Helsinki. Blood samples were handled in accordance with guidelines put forward in the 'Transfusion Medicine Standards' by the Danish Society for Clinical Immunology (www.dski.dk).

Isolation, cell culture and stimulation of T cells. Peripheral blood mononuclear cells (PBMCs) were isolated from whole blood by density centrifugation on Lymphoprep and cryopreserved at −150 °C in FBS with 10% DMSO until the day of cell culture. For cultivation, PBMCs were thawed and CD4 + T cells isolated using EasySep Human CD4 + T Cell Isolation Kit (Stemcell, 17952) following the manufacturer's protocol. In short, ~2.5 × 10 7 PBMCs were incubated for 5 min at room temperature in 500 µl of IMDM-medium containing 50 µl of CD4 + T cell isolation cocktail, followed by the addition of 50 µl of RapidSpheres. Subsequently, the volume was topped up to 2.5 ml with IMDM-medium, the cells placed in an EasySep magnet (Stemcell) and incubated at room temperature for 3 min. The pure CD4 + T cell fraction was obtained by pouring the enriched non-bound cell fraction into a new tube. Enriched CD4 + T cells were cultured in T H 17-polarizing culture medium (IMDM supplemented with 10% FCS, 20 mM HEPES (pH 7.4), 50 µM 2-mercaptoethanol, 2 mM l-glutamine and penicillin-streptomycin (10,000 U ml -1 ), 30 ng ml -1 IL-6, 10 ng ml -1 IL-1β, 0.5 ng ml -1 TGFβ-1, 10 ng ml -1 IL-23, 25 µl ml -1 ImmunoCult Human CD3/CD28 T cell activator) for 3 d at 37 °C and 5% CO 2 in Falcon polystyrene 48-well plates (Thermo Fisher, 10059110). Each culture condition contained 0.2% DMSO with or without the indicated amounts of ILA and/or the AhR-inhibitor CH-223191. After 3 d of culture, supernatants were collected for ELISA and frozen down until further use. The ELISA to detect IL-22 was performed in technical duplicates using the ELISA MAX Deluxe Set Human IL-22 kit (Biolegend, 434504) following the supplied manufacturer's protocol. In short, a Nunc MaxiSorb flat-bottom 96-well plate (Thermo Fisher, 44-2404-21) was coated for 12 h at 4 °C with IL-22 coating antibody followed by four rounds of washing with PBS + 0.05% Tween-20. 
The washed plate was blocked with supplied assay diluent A buffer for 1 h at room temperature and 400 r.p.m., washed four more times with PBS + 0.05% Tween-20 and incubated with cell culture supernatants for 2 h at room temperature and 400 r.p.m. Serially diluted standard controls and a blank control were included as a reference. To detect bound IL-22, the plate was washed four times with PBS + 0.05% Tween-20 and incubated with IL-22 detection antibody for 1 h at room temperature and 400 r.p.m. After four further washing steps with PBS + 0.05% Tween-20, Avidin-HRP was added for 30 min at room temperature and 400 r.p.m. To detect HRP activity, the plate was washed five times with PBS + 0.05% Tween-20 followed by an incubation with Solution F substrate solution in the dark at room temperature. HRP activity was stopped after 20 min using 1 M H 2 SO 4 and the optical density recorded (absorption at 450 nm) using a PowerWave HT Microplate Spectrophotometer (BioTek Instruments). Values below limit of detection (16 pg ml -1 ) of the kit were set to LOD/2. Sources and identifiers of all reagents used are given in Supplementary  Table 10.
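The handling of ELISA values below the limit of detection (set to LOD/2, with an LOD of 16 pg ml -1 ) can be written as a one-line substitution. The sketch below is illustrative; the function name and the example readings are hypothetical.

```python
IL22_LOD_PG_PER_ML = 16.0  # limit of detection stated for the kit


def impute_below_lod(values, lod=IL22_LOD_PG_PER_ML):
    """Replace every value strictly below the LOD with LOD/2."""
    return [lod / 2 if value < lod else value for value in values]


# Hypothetical IL-22 readings in pg per ml.
readings = [5.0, 16.0, 120.0]
imputed = impute_below_lod(readings)
```

A reading exactly at the LOD is kept unchanged in this sketch, since only values below the limit are described as being replaced.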

Statistics and reproducibility.
All experiments were performed with full factorial (biological and technical) replication. Data collection and analysis were not performed blind to the conditions of the experiments. No data were excluded from the analyses, with the following exceptions. In the CIG cohort, six samples were omitted from the PCoA due to low read counts (<8,000), and 12 samples with no reported breastfeeding were excluded from the correlation analyses between Bifidobacterium species and HMO residuals in faeces. One donor was excluded from the 200 µM indolelactate (ILA) stimulation of monocytes because of an unspecified error during the stimulation. Statistical analyses were performed using QIIME v.1.9 (ref. 79 ), R v.3.1 (ref. 105 ) and GraphPad Prism v.8.1 (GraphPad Software). If data were normally distributed (evaluated by visual inspection and D'Agostino-Pearson test), parametric statistical tests were used, whereas non-parametric tests were used with non-normally distributed data. OTU distance/dissimilarity matrices were generated from OTU tables with rarefied read counts (2,000 sequences per sample for SKOT and 8,000 sequences per sample for CIG) and ordination (PCoA, beta_diversity.py script, default settings) and statistical tests (ADONIS and PERMDISP tests, compare_categories.py script, permutations = 999, default settings) of OTU distance/dissimilarity matrices were performed in QIIME. PCoA plots were illustrated (ggplot function) in R using the ggplot2 (v.3.3.3) package 106 . PCA of metabolite concentrations was performed in R using the prcomp function with zero-centring and autoscaling and illustrated using the ggbiplot function 107 within ggplot2. Spearman's rank correlations were performed in GraphPad Prism, whereas repeated measures correlation analyses and linear mixed models were performed in R using the packages rmcorr (v.0.4.3) 32 (rmcorr function with default settings) and maaslin2 (v.1.0.0) 31 , respectively. 
In MaAsLin2 linear mixed models (Maaslin2 function) subject and age were included as random effects and the individual faecal aromatic lactic acids or HMOs as fixed effects, with default settings, except that total sum scaling normalization was not performed when including absolute abundances of bacterial taxa. The faecal HMO data were log 10 -transformed. Heatmaps and hierarchical clustering of correlation coefficients were generated in R using the heatmap.2 function (default settings) within the gplots (v.3.1.1) package 108 and visualized in GraphPad Prism. Longitudinal metabolite and taxonomic abundances were modelled using LOESS regression and associations between taxa, metabolites and in vitro AhR activity (log 10 -transformed) were modelled using linear regression in R using the ggplot function (method = 'loess' or 'lm') within the ggplot2 package 106 . For enzyme kinetics, the parameters (k cat , K 0.5 and Hill coefficient n H ) were calculated by curve-fitting the experimental data to the Hill equation, using GraphPad Prism. Two-tailed paired or unpaired Student's t-test or two-tailed non-parametric Mann-Whitney U-test were performed when comparing two groups. For comparison of more than two groups, statistical significance was evaluated by one-way analysis of variance (ANOVA) or the non-parametric Kruskal-Wallis test. P values <0.05 were considered statistically significant. When applicable, P values were corrected for multiple testing by the Benjamini-Hochberg false discovery rate (FDR) 109 using a cutoff of 0.1.
Reporting Summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Code availability
No custom code was used in the analyses. R scripts are available on request.