Describing the fecal metabolome in cryogenically collected samples from healthy participants

The chemical composition of feces plays an important role in human metabolism. Metabolomics and lipidomics are valuable tools for screening the metabolite composition of feces. Here we set out to describe fecal metabolite composition in frozen stools from healthy participants. Frozen stool samples were collected from 10 healthy volunteers and cryogenically drilled in four areas along the specimen. Polar metabolites were analyzed using derivatization followed by two-dimensional gas chromatography and time-of-flight mass spectrometry. Lipids were detected using ultra high-performance liquid chromatography coupled with quadrupole time-of-flight mass spectrometry. In total, 2326 metabolic features were detected. Out of 298 annotated metabolites, we report here the 185 that showed a technical variation below 30%. These metabolites included amino acids, fatty acid derivatives, carboxylic acids and phenolic compounds. Lipids predominantly belonged to the groups of diacylglycerols, triacylglycerols and ceramides. Metabolites varied between sampling areas: some were broadly homogeneous, while others varied by 80%. A LASSO-computed network using metabolites present in all areas showed two main clusters describing the system, centered on DAG lipids and phenyllactic acid. In feces from healthy participants, the main groups detected were phenolic compounds, ceramides, diacylglycerols and triacylglycerols.


Results and Discussion
The metabolome is composed of molecules belonging to a myriad of chemical groups. Untargeted analytical methods are usually tailored to detect a broad chemical group, such as lipids or polar metabolites, based on abundance, purity, polarity, volatility and the availability of bioactive groups for derivatization 31. In this study, preprocessing of the data resulted in the detection of 2326 features in the polar metabolome and lipidome. Among these features, 182 polar and 116 lipid species were putatively identified. Metabolites with relative standard deviations (%RSDs) higher than 30% in the pooled samples were excluded from further statistical evaluation, leaving 185 metabolites that fulfilled this requirement (Figs. 1 and 2). Annotated features with pool RSD < 30% are presented in Supplementary Tables S1 and S2 for polar metabolites and lipids, respectively.
To study the feasibility of cryogenically collected samples, we applied additional parameters and calculated metabolite variation in four sampling areas per specimen. As an example of the data collected in this study, the first metabolite in Supplementary Table S1 is 3-hydroxyphenylacetic acid. The variation between participants for this metabolite exceeded 107% and the variation across the four sampling areas was also high at 64%, meaning that it is not distributed homogeneously in the stool specimens. Since the technical variation of the measurement is negligible at 4%, we can be confident of the technical results. We therefore propose that this benzene derivative is found in a wide range of concentrations in the feces of the healthy population. The other benzene derivatives that were detected followed the same trend, apart from 3-phenyllactic acid, 3-phenylpropionic acid and 4-hydroxyphenyllactic acid, which were present in similar concentrations in the four areas. The most homogeneous molecule was uracil, which varied by 11% between participants, 7% technically and 18% between drills. However, the interindividual metabolite variation is affected by differences in water content between participants, and it is difficult to normalize for this with a limited number of samples.
Figure 1. Variation between healthy individuals and sampling areas in feces. Each dot represents a metabolite colored by its class. Most metabolites vary by more than 30% between collection areas and between volunteers, making the fecal metabolome highly heterogeneous.
The fecal metabolome and lipidome. This study allowed for the reporting of 185 molecules, adding to previous research 32. These metabolites included amino acids, fatty acid derivatives, carboxylic acids, benzene compounds and indole compounds (Fig. 1). Fatty acids were mainly present as medium chain fatty acids (e.g. hydroxylated and dicarboxylic species).
The group of small carboxylic acids included metabolites from the citric acid cycle as well as hydroxybutyric and propionic acids. Among the amino acids, both regular amino acids and branched chain amino acids (BCAAs) were present in the feces. In addition, some amines, purine derivatives and pyrimidine derivatives were annotated. The compounds with the highest variation between participants were benzene derivatives (e.g. hydroxyphenyllactic acid, 3-phenyllactic acid and 3-phenylpropionic acid), polyamines, fatty acid derivatives (e.g. methylsuccinic acid and 2-hydroxyisocaproic acid) and some of the small carboxylic acids (e.g. tricarballylic acid).
Many of the polar metabolites found in healthy individuals can be associated with bacterial metabolism. A detailed review of microbiota metabolism is beyond the scope of this report, but some examples follow. Hydroxyphenyllactic acid has been found in most anaerobic bacteria and is associated with tyrosine metabolism in humans 33. 3-Phenyllactic acid is thought to display antimicrobial properties and to be associated with phenylalanine metabolism 34. In addition, the occurrence of 3-phenylpropionic acid is associated with microbially transformed plant polyphenols such as flavanols, flavanones and tannins 35. Putrescine is a polyamine produced by the breakdown of amino acids, mainly by the microbiota 36. Cellobiose, a breakdown product of cellulose and a microbe-produced metabolite, has not previously been reported in feces in metabolomics analyses.
Feces are also rich in BCAAs, and these can be precursors of hydroxyisocaproic acid, a bacterial end product of leucine degradation 37. BCAAs were in the low physiological variation group in this cohort; for example, isoleucine (technical variation 6%) varied by less than 30% both with area of collection and between participants. In comparison, other molecules such as fumaric acid (technical variation 8%) varied biologically by more than 70%. With this information for 185 molecules, it is possible to ascertain whether a molecule that might be physiologically relevant in healthy individuals was excreted homogeneously.
Clinical studies using the lipidome in fecal samples are sparse in the literature. One study showed its clinical potential in five prematurely born infants 16. Van Meulebroek et al. also reported the analysis of the fecal lipidomic profile in a small cohort of healthy volunteers and type 2 diabetic patients 15. Using adult lyophilized feces, the team identified 127 lipids, out of which 54 were frequently present in feces.
In this study we observed that the fecal samples mainly consisted of ceramides, diglycerides and triglycerides (Fig. 2). Lipids that are commonly detected in plasma (e.g. lysophosphocholines, glycerophosphocholines and sphingomyelins) were also detected in the fecal samples, but with remarkably lower coverage 38. In fact, only seven phosphatidylcholines could be annotated in the fecal samples, while as many as 78 different phosphatidylcholines were identified in plasma 38. Similarly, we detected two sphingomyelins, while 30 different sphingomyelins were detected in plasma 38.
Lipids perform many functions, from storing energy to forming cell and organelle membranes or acting as inflammation markers. Lipids are also a source of nutrition for the bacteria living in the gut, and a fatty diet can be a modifying factor of the intestinal microbiota 39. Interestingly, the lipid families that have been linked to metabolic syndrome were abundant in the feces of healthy volunteers. Ceramides, which are metabolized from sphingomyelins in eukaryotic organisms, were abundant in these participants' feces 40. Ceramides are found in large quantities in cell membranes and contribute to cellular signaling mechanisms, particularly cell death, partially reproducing the effect of palmitic acid on insulin signaling 41. Triacylglycerols and diacylglycerols are usually thought of as energy metabolites, although they can also be signaling molecules and have long been implicated in the occurrence of metabolic syndrome 42.
Correlation between fecal lipids and polar metabolites. Two main clusters were assigned using the partial correlation network inferred with the graphical LASSO algorithm (Fig. 3). This machine learning algorithm selects, out of the 185 metabolites, those that best describe the system, together with the correlations between them. One cluster was composed mostly of diacylglycerides; the other was composed mainly of amino acids and 3-phenyllactic acid. 3-Phenyllactic acid was the metabolite with the most connections to others. Both clusters showed correlations to metabolites from different platforms. The phenyllactic acid cluster correlated with certain ceramide species, and the diacylglyceride cluster correlated with 3-hydroxybutyric acid and 3-phenylpropionic acid.
In the current screening it is not possible to determine the source of the metabolites. They can derive from food or from the microbiota converting food constituents, or they can be primary bacterial or host metabolites. According to the literature, phenyllactic acids are mostly associated with lactic acid bacteria and have antifungal and antimicrobial properties 34. Additionally, with a view to integration with metagenomic analysis, the high variability seen in these ten healthy subjects is probably due to variations in both food intake and microbiota composition 43. On the other hand, ceramides are known to play a role in metabolic dysfunction 44. The same is true for diacylglycerides, although their accumulation is thought to be less detrimental than that of triglycerides 45.

Scientific Reports | (2020) 10:885 | https://doi.org/10.1038/s41598-020-57888-w | www.nature.com/scientificreports
There are several limitations in this study. First, only metabolites with good technical reproducibility were reported and analyzed further for correlations; those classified as analytically inconsistent were not reported. Second, differences in metabolites by area of collection were reported, but the participants were not controlled for diet or time of food ingestion with traceable metabolites. Third, the number of healthy participants was too small to be representative of a healthy population, so the origin of the wide range of metabolite levels is unknown at this stage, and this work will need replication in a larger cohort.
To conclude, the fecal metabolome composition was studied by applying two analytical platforms. The variation of metabolites was assessed for quality control, per area of collection and between healthy participants. In the feces of healthy individuals, the annotated metabolites covered a myriad of metabolic pathways and included molecules synthesized by the microbiome (phenol derivatives) and three main classes of lipids: ceramides, triacylglycerides and diacylglycerides.

Materials and Methods
Samples. The samples analyzed were from healthy participants, recruited from the general population after a community call for volunteers from Odense University Hospital, under ethical approval from the regional ethics committee of the Region of Southern Denmark (S-20160006G). All experiments were performed in accordance with relevant guidelines and regulations. We confirm that informed consent was obtained from all participants and that all methods involving human participants were carried out in accordance with the ethical principles of the Declaration of Helsinki.
Our inclusion criterion was age 40-75 years. We excluded volunteers for any of the following: (1) being on any medication, prescription or otherwise, at the time of inclusion, (2) having a chronic disease, whether medicated or not, (3) any use of antibiotics within the six months leading up to inclusion, (4) reported alcohol intake above the low-risk limit of 7 units of alcohol/week for females and 14 units/week for males, or binge drinking (≥5 units at one event). Additionally, volunteers underwent sigmoidoscopy, abdominal ultrasonography and liver elastography to screen for existing gastrointestinal disease. We also performed routine blood tests to rule out diabetes, thyroid disease, dyslipidemia and liver disease. Finally, we checked whether any medical events had occurred during the six months following inclusion. In case of a positive finding, we excluded the participant.
From February 2016 to December 2016 we included nine men and one woman with a mean age of 51 years (range 42-72 years) and a mean BMI of 28 kg/m² (range 25-33). None of the volunteers smoked. One participant had a pacemaker due to arrhythmia, one participant took an antihistamine for seasonal rhinitis, two took a daily vitamin supplement and two took fish oil. Volunteers sampled stool in their own home, within 24 hours of a scheduled visit to the research clinic. Each volunteer stored the sample in his/her own freezer at −20 °C immediately after sampling. We instructed them to keep the sample on ice, in a cooling bag, during transportation. Upon arrival at the research clinic, we transferred the stool to a −80 °C storage facility. For the sampling analyses, each original fecal sample was cryogenically drilled four times at different positions along the specimen. Once the cryogenic drilling was finalized, each drilled sample (200 mg) was mixed with an equal amount of water (50:50, w/w), homogenized and distributed into 20 mg aliquots for further extraction procedures.
The chemicals, sample preparation and instrumental analyses for GC × GC-MS and lipidomics used in this study are available as Supplementary information.
Data pre-processing GC × GC-MS. The detected and potentially identified peaks were aligned using the Guineu software 46. The peak count filter was set to 5 in order to filter out features that did not occur in all samples. Additionally, retention indexes were assigned using the NIST14 and GMD libraries 47. The Golm grouping tool was also used for assessment of groups based on characteristic ions. All features that had a similarity score below 850 or a retention index difference of more than 35 units were annotated as unknowns. The resulting peak table was exported. Thereafter the data were post-processed and analyzed in R 55 as described later.
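The annotation rule described above can be illustrated with a minimal sketch. It is written in Python for illustration only (the study itself used Guineu and R), and the names `LibraryMatch`, `annotate`, `MIN_SIMILARITY` and `MAX_RI_DIFFERENCE` are hypothetical; only the two thresholds (850 and 35) come from the text.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; the constant names are illustrative.
MIN_SIMILARITY = 850      # minimum library similarity score
MAX_RI_DIFFERENCE = 35    # maximum allowed retention index difference


@dataclass
class LibraryMatch:
    """One candidate library hit for a detected feature (hypothetical structure)."""
    name: str
    similarity: float     # spectral similarity score from the library search
    ri_observed: float    # retention index measured for the feature
    ri_library: float     # retention index stored in the library


def annotate(match: LibraryMatch) -> str:
    """Return the library name, or 'unknown' if either criterion fails."""
    ri_difference = abs(match.ri_observed - match.ri_library)
    if match.similarity < MIN_SIMILARITY or ri_difference > MAX_RI_DIFFERENCE:
        return "unknown"
    return match.name
```

For instance, a hit with a similarity score of 900 and a retention index difference of 10 units would keep its library name, whereas a hit scoring 849 would be labeled as unknown regardless of its retention index.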
Lipidomics data pre-processing. Data processing was performed using MZmine 2.28 48.
Data post-processing. A schematic figure of the post-processing steps can be seen in Fig. S2. The data from each of the platforms were post-processed in the same way in R 55: exported peak lists were imported and each feature was normalized against the most-correlated internal standard. Annotated metabolites were assigned with a level 3 49. The annotation included features which had equivalent standards injected during the sequence (Level 1) and structure information acquired previously with MS2 fragmentation (Level 2). Annotations were made using our in-house database, the Human Metabolome Database 50 and Lipid Maps 51.
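As an illustration of normalization against the most-correlated internal standard, the following minimal Python sketch selects the standard with the highest Pearson correlation to a feature and divides the feature by it sample by sample. This is an assumption-laden sketch, not the authors' R code: the text does not state the correlation measure or the normalization formula, so both Pearson correlation and division are assumptions, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long intensity vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Assumes neither vector is constant (otherwise the denominator is zero).
    return sxy / math.sqrt(sxx * syy)


def normalize_feature(feature, standards):
    """Divide a feature by its most-correlated internal standard.

    feature:   list of intensities, one per sample
    standards: dict mapping internal-standard name -> list of intensities
    Returns the chosen standard's name and the normalized intensities.
    """
    best = max(standards, key=lambda name: pearson(feature, standards[name]))
    normalized = [f / s for f, s in zip(feature, standards[best])]
    return best, normalized
```

In this sketch, a feature that rises and falls with one internal standard across samples is flattened by the division, which is the intended effect of removing shared technical drift.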
Features at level 4 (unknowns) and features with a coefficient of variation (CV; i.e. relative standard deviation) above 30% in pooled samples were discarded. Further, features with a missing value in more than 20% of samples were discarded, and remaining missing values were imputed with the k-nearest neighbor algorithm using the impute package 52.
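The two filtering criteria can be sketched as follows. This is a minimal Python illustration rather than the R code used in the study; the function names are hypothetical, missing values are represented as `None`, intensities are assumed to be positive, and the k-nearest neighbor imputation step is not reproduced here.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation (sample SD / mean) expressed in percent."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def missing_fraction(values):
    """Fraction of samples in which the feature has no value (None)."""
    return sum(v is None for v in values) / len(values)


def keep_feature(pooled, samples, max_cv=30.0, max_missing=0.20):
    """Apply the pooled-sample CV filter and the missing-value filter.

    pooled:  intensities of the feature in the pooled quality-control samples
    samples: intensities of the feature in the study samples (None = missing)
    """
    pooled_present = [v for v in pooled if v is not None]
    if len(pooled_present) < 2:
        return False  # CV cannot be estimated from fewer than two values
    if cv_percent(pooled_present) > max_cv:
        return False  # technical variation above 30%
    return missing_fraction(samples) <= max_missing
```

A feature is retained only if both checks pass, so a well-reproduced feature that is absent from, say, three of ten samples would still be discarded.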
Statistical analysis. Statistical analysis was done in R 55 (version 3.4.2, https://www.r-project.org/) with the package limma 53. Each feature's variation was compared with a feature-wise F-test, where the feature is the dependent variable and the categorical variable representing the individual is the independent variable. The F-statistic, the associated p-value, and the multiple-testing corrected p-values (Benjamini-Hochberg) were reported.
Coefficients of variation (CV) were computed for each feature at three levels: between pooled samples for technical variation, between sampling areas for specimen heterogeneity, and between participants. These feature-wise value pairs were visualized in two bubble plots for the lipidomics and metabolomics data using the ggplot2 package 54. The metabolite/lipid category of each feature was shown by the color of the data point to give an overview of the overall variation in the categories. In the integration step, the data were auto-scaled and averaged over replicates of the same individual as well as over features in each compound category. The two resulting individuals-by-compound-categories data sets were then combined into a single data matrix, which was visualized as a row-scaled heatmap using the ggplot2 package 54 to show the relative abundances of each compound category between the individuals 55.
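The three levels of variation can be illustrated for a single feature with a short Python sketch. The exact aggregation used in the study is not specified in the text, so the choices below are assumptions: the between-area CV is taken as the mean of the per-participant CVs across the four drills, and the between-participant CV is computed on the per-participant means. The function names are hypothetical.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation (sample SD / mean) expressed in percent."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def variation_summary(pooled, by_participant):
    """Summarize variation of one feature at three levels.

    pooled:         intensities in the pooled quality-control samples
    by_participant: dict mapping participant id -> intensities in the
                    sampling areas (four drills per specimen in the study)
    """
    # Technical variation: spread across repeated pooled samples.
    technical = cv_percent(pooled)
    # Specimen heterogeneity (assumption): average of within-specimen CVs.
    area_cvs = [cv_percent(areas) for areas in by_participant.values()]
    between_area = statistics.mean(area_cvs)
    # Interindividual variation (assumption): CV of the participant means.
    participant_means = [statistics.mean(a) for a in by_participant.values()]
    between_participant = cv_percent(participant_means)
    return {
        "technical": technical,
        "between_area": between_area,
        "between_participant": between_participant,
    }
```

Comparing the three values for a feature shows whether its spread is dominated by the measurement, by where the specimen was drilled, or by differences between individuals.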
A data-driven partial correlation network of lipids and polar metabolites was computed and visualized with the R package qgraph 56 using the graphical LASSO algorithm and RIC (Rotation Information Criterion). Data were imputed and auto-scaled prior to model fitting. In the visualization, lipids were shown as circular nodes and polar metabolites as rectangular nodes. Positive and inverse associations between nodes were shown as brown and blue lines, respectively. The strength of the association was shown as the width of the line. 3-Phenyllactic acid was selected as the metabolite with the most connections and highlighted in purple. Other nodes were colored by their respective Spearman correlation to 3-phenyllactic acid.
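The graphical LASSO estimates a sparse precision (inverse covariance) matrix, and the edges of a partial correlation network follow from that matrix through the standard relation ρ_ij = −p_ij / √(p_ii · p_jj). The Python sketch below illustrates only this last conversion step for an already estimated precision matrix; it does not perform the LASSO estimation or the RIC-based selection, it is not the qgraph implementation, and the function names and the example matrix are hypothetical.

```python
import math


def partial_correlations(precision):
    """Convert a precision matrix (list of lists) to partial correlations."""
    n = len(precision)
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                result[i][j] = 1.0
            else:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                result[i][j] = -precision[i][j] / scale
    return result


def network_edges(partial, labels, tolerance=1e-12):
    """List the non-zero partial correlations as (node, node, weight) edges."""
    n = len(labels)
    return [
        (labels[i], labels[j], partial[i][j])
        for i in range(n)
        for j in range(i + 1, n)
        if abs(partial[i][j]) > tolerance
    ]


# Hypothetical 3-node precision matrix: A and B are conditionally dependent,
# C is conditionally independent of both (zero off-diagonal entries).
example_precision = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
]
example_edges = network_edges(partial_correlations(example_precision), ["A", "B", "C"])
```

Zero entries in the precision matrix translate into absent edges, which is why the sparsity imposed by the LASSO penalty yields a readable network.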

Data availability
The datasets generated and analyzed during the current study are available from the corresponding authors on reasonable request and according to Danish legislation for clinical trials.