Toward an improved definition of a healthy microbiome for healthy aging

The gut microbiome is a modifier of disease risk because it interacts with nutrition, metabolism, immunity and infection. Aging-related health loss has been correlated with transition to different microbiome states. Microbiome summary indices including alpha diversity are apparently useful to describe these states but belie taxonomic differences that determine biological importance. We analyzed 21,000 fecal microbiomes from seven data repositories, across five continents spanning participant ages 18–107 years, revealing that microbiome diversity and uniqueness correlate with aging, but not healthy aging. Among summary statistics tested, only Kendall uniqueness accurately reflects loss of the core microbiome and the abundance and ranking of disease-associated and health-associated taxa. Increased abundance of these disease-associated taxa and depletion of a coabundant subset of health-associated taxa are a generic feature of aging. These alterations are stronger correlates of unhealthy aging than most microbiome summary statistics and thus help identify better targets for therapeutic modulation of the microbiome.

originating from US- and UK-based individuals. These included 1,184 samples from older individuals (aged over 60 years) with self-reported information on 13 diseases, namely autism spectrum disorder (ASD), Alzheimer's disease, cardiovascular disease (CVD), cancer, diabetes, CDI, IBD, irritable bowel syndrome (IBS), kidney disease, liver disease, lung disease, migraine and small intestinal bacterial overgrowth (SIBO). No clinical measures of health status were available for this cohort.
f/ "He et al" Cohort (henceforth referred to as 'He') (42): This is a Chinese population cohort previously analyzed by He et al. The cohort contained 7,009 16S rRNA gene amplicon-based gut microbiomes. Participant ages ranged from 18 to 97 years, and 2,434 gut microbiomes were from individuals aged >= 60 years. A total of 3,559 gut microbiome profiles originated from apparently non-diseased individuals (including 1,024 gut microbiomes from subjects aged >= 60 years). Among the remaining 3,450 gut microbiomes, the following 12 clinical complications/diseases each contributed at least 10 patient-derived profiles: atherosclerosis, cholecystitis, colitis, constipation, diarrhoea, fatty liver, gastritis, IBS, kidney stones, rheumatoid arthritis, metabolic syndrome and T2D.
g/ LogMPie Cohort (referred to as 'LogMPie') (43): This is a pan-Indian population cohort consisting of 1,004 16S rRNA gene amplicon-based gut microbiome profiles, previously investigated by Dubey et al (43). Subject ages ranged from 18 to 65 years.

Description of the approach for computing CLR transformations of microbiome data
The clr transformations were performed using the clr function of the compositions package in R.
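The clr transform replaces each abundance with the log of its ratio to the geometric mean of all abundances in the same sample. The plain-Python sketch below illustrates that formula; it is not the compositions::clr implementation. The pseudocount used to keep zeros finite (hypothetical default 0.5) is an assumption, because the text does not state how zeros were handled.

```python
import math

def clr(abundances, pseudocount=0.5):
    """Centered log-ratio transform of a single sample.

    clr(x)_i = log(x_i) - mean_j log(x_j), i.e. each log-abundance relative
    to the log of the sample's geometric mean. The pseudocount added to every
    value (hypothetical default 0.5) is an assumption used to avoid log(0).
    """
    logs = [math.log(x + pseudocount) for x in abundances]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]

# The clr values of any sample sum to zero
print(clr([12, 0, 30, 5]))
```

Because clr only subtracts a per-sample constant from the log-abundances, it preserves the within-sample ranking of features.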

Computation of diversity and uniqueness measures
Shannon diversity: Shannon diversities at both the taxonomic and pathway levels were computed using the 'diversity' function of the vegan package in R [47].
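Shannon diversity is H = -Σ p_i ln p_i, summed over the features detected in a sample, where p_i are the relative abundances; vegan's diversity function uses the natural logarithm by default. A minimal plain-Python sketch of the same formula (not the vegan code):

```python
import math

def shannon(counts, base=math.e):
    """Shannon diversity H = -sum(p_i * log(p_i)) of one sample.

    counts may be raw counts or relative abundances; they are rescaled to
    proportions. Zero entries are skipped (0 * log 0 is taken as 0).
    """
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h

# Two equally abundant features and one absent feature give ln(2)
print(shannon([10, 10, 0]))
```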

Bray-Curtis and Jaccard Uniqueness:
Bray-Curtis uniqueness was computed using the same strategy as in the original study by Wilmanski et al [6]. For each of the three kinds of profile (genus, species or pathway abundance), and within the samples belonging to a given study, we first computed the all-versus-all Bray-Curtis distance matrix using the vegdist function of the vegan R package with method="bray". Version 2.5.7 of the vegan package was used for all analyses in the current study.
For each sample, the other sample in the same study cohort with the minimum Bray-Curtis distance to it was identified, and that distance was assigned as the Bray-Curtis uniqueness of the sample (for that particular profile, namely genus, species or pathway). Jaccard uniqueness values (for the three kinds of profile) were computed with the same procedure, except that the method parameter of vegdist was set to "jaccard". Both the raw counts and the total-sum-scaled abundances were used for computing the Bray-Curtis and Jaccard uniqueness measures.
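The nearest-neighbour uniqueness described above can be sketched in plain Python on toy data. One hedge: vegan's vegdist with method = "jaccard" returns the quantitative form 2B/(1 + B), where B is the Bray-Curtis dissimilarity, unless binary = TRUE is also set. Both forms are shown because the text does not state which setting was used; the toy counts are hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    den = sum(a + b for a, b in zip(u, v))
    return sum(abs(a - b) for a, b in zip(u, v)) / den if den > 0 else 0.0

def jaccard_binary(u, v):
    """Presence/absence Jaccard distance: 1 - |shared| / |union| of detected features."""
    a = {i for i, x in enumerate(u) if x > 0}
    b = {i for i, x in enumerate(v) if x > 0}
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def jaccard_quantitative(u, v):
    """Quantitative Jaccard 2B / (1 + B), with B the Bray-Curtis dissimilarity."""
    b = bray_curtis(u, v)
    return 2 * b / (1 + b)

def uniqueness(cohort, dist):
    """Distance from each sample to its nearest neighbour in the same cohort."""
    n = len(cohort)
    return [min(dist(cohort[i], cohort[j]) for j in range(n) if j != i)
            for i in range(n)]

cohort = [[10, 0, 5], [8, 2, 5], [0, 10, 0]]  # toy genus counts for 3 samples
print(uniqueness(cohort, bray_curtis))
print(uniqueness(cohort, jaccard_binary))
```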
Aitchison Uniqueness: The Aitchison distance between two samples (or microbiomes) is the Euclidean distance between their clr-transformed abundance vectors. We therefore first computed an all-versus-all Aitchison distance matrix by providing the clr-transformed abundance matrix of all samples to the vegdist function of the vegan R package with method="euclidean". As for Bray-Curtis uniqueness, the Aitchison uniqueness of a microbiome was computed as its minimum distance to all other samples from the same study cohort.
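A plain-Python sketch of Aitchison uniqueness, assuming a pseudocount for zeros (the text does not specify one): each sample is clr-transformed once, and the Euclidean distances between clr vectors are then compared within the cohort.

```python
import math

def clr(x, pseudocount=0.5):
    """Centered log-ratio transform; the pseudocount for zeros is an assumption."""
    logs = [math.log(v + pseudocount) for v in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_uniqueness(cohort, pseudocount=0.5):
    """Per-sample minimum Euclidean distance between clr vectors within a cohort."""
    z = [clr(s, pseudocount) for s in cohort]  # transform each sample once
    n = len(z)
    return [min(math.dist(z[i], z[j]) for j in range(n) if j != i)
            for i in range(n)]

# Samples 0 and 1 have identical proportions, so their Aitchison distance is 0
cohort = [[1, 2, 4], [2, 4, 8], [4, 2, 1]]
print(aitchison_uniqueness(cohort, pseudocount=0.0))
```

The toy example shows the scale invariance that makes the measure suited to compositional data: doubling every count leaves the clr vector unchanged, whereas Bray-Curtis on the same raw counts is nonzero.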
Kendall Uniqueness: The Kendall uniqueness measure (for the three kinds of features) was computed as follows. As for the other uniqueness measures, we first computed all-versus-all sample-to-sample Kendall distance matrices from the clr-transformed microbiome profiles (species, genus or pathway) of the samples belonging to each study. For this purpose, the profile matrix (features in rows and samples in columns) of all samples belonging to a given study was provided as input to the cor.fk function of the pcaPP package (version 1.9.74) in R. The result was a Kendall's tau matrix, each cell of which contained the Kendall's tau correlation between two samples based on the similarity of their feature abundance patterns. We used Kendall's tau rather than Spearman or Pearson correlation because it is relatively more robust for sparse datasets, such as those typically observed for microbiome profiles.
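The text specifies the Kendall's tau matrix but not the exact conversion from tau to a distance. The sketch below assumes the distance 1 - tau and, by analogy with the other uniqueness measures, takes each sample's minimum distance to the other samples in its cohort; both choices are assumptions. It computes tau-b (the tie-corrected variant) by direct pair counting, whereas cor.fk uses a faster algorithm.

```python
import math

def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length vectors, by counting pairs."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n0 = n * (n - 1) // 2
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    return (concordant - discordant) / denom if denom > 0 else float("nan")

def kendall_uniqueness(cohort):
    """Per-sample minimum of (1 - tau) to the other samples (assumed definition).

    Each sample is a vector of clr-transformed feature abundances; tau is
    computed between two samples across features, as cor.fk does for columns.
    """
    n = len(cohort)
    return [min(1.0 - kendall_tau_b(cohort[i], cohort[j]) for j in range(n) if j != i)
            for i in range(n)]

cohort = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
print(kendall_uniqueness(cohort))
```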

Different Distance Measures convey distinct aspects of gut microbiome variation
Bray-Curtis distance measures the variation in the counts (or relative abundances) of the different microbial features (species, genera or pathways) between two microbiomes. Jaccard distance, on the other hand, primarily measures the variation in the detection (rather than the abundance) of the features. The Aitchison distance is similar to the Bray-Curtis distance but is specifically suited to compositional data (the data type to which most microbiome data belong). It is defined as the Euclidean distance (the square root of the sum of squared differences) between the centered log-ratio (clr) transformed abundances of the microbiome features, a transformation suited to compositional datasets. Each of these three uniqueness measures reflects specific aspects of the variation in the abundance and detection of individual species, genera or pathways, so higher values of any of them indicate greater variation in the presence or abundance of taxa (or pathways) across samples (Extended Data Figure 1). In contrast, Kendall uniqueness measures the variation in the relative ranks of taxa within an individual sample compared with other samples from the same cohort. A low Kendall distance between two microbiomes A and B indicates that features highly abundant in A are also highly abundant in B and vice versa, so the relative hierarchy of the microbiome composition is retained. Conversely, a high Kendall distance between microbiomes A and C indicates that features highly abundant in A may not be highly abundant in C, that is, the relative hierarchy of the composition has changed.
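The contrast between abundance-based and rank-based dissimilarity can be illustrated with three hypothetical samples of three taxa. Sample B keeps A's rank order but changes the abundances (Bray-Curtis ≈ 0.30, Kendall distance 0), whereas C reverses the order (Bray-Curtis ≈ 0.60, Kendall distance 2). The Kendall distance is taken here as 1 - tau, an assumed conversion, and tau-a is used for brevity because the toy data contain no ties.

```python
from itertools import combinations

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(a + b for a, b in zip(u, v))

def kendall_tau_a(u, v):
    """Kendall's tau-a (no tie correction; assumes no tied abundances)."""
    pairs = list(combinations(range(len(u)), 2))
    score = 0
    for i, j in pairs:
        s = (u[i] - u[j]) * (v[i] - v[j])
        score += (s > 0) - (s < 0)  # +1 concordant pair, -1 discordant pair
    return score / len(pairs)

# Toy relative abundances of three taxa (hypothetical data)
a = [0.70, 0.20, 0.10]
b = [0.40, 0.35, 0.25]  # same rank order as a, different abundances
c = [0.10, 0.20, 0.70]  # rank order reversed

for name, other in (("B", b), ("C", c)):
    print(name, round(bray_curtis(a, other), 2), 1 - kendall_tau_a(a, other))
```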

SUPPLEMENTARY NOTE S6
Details of the two-step procedure adopted for computing associations between different properties
In the first step, given any pair of properties (including age), we computed the association between the properties individually within each study using robust linear regression (RLM) models. The significance of the association in each model was computed using two-sided robust F-tests (or Wald tests for multiple coefficients, as described previously) (50).
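To illustrate the first step, the sketch below fits a single-predictor robust regression by iteratively reweighted least squares with Huber weights (tuning constant 1.345) and a residual scale of median(|r|)/0.6745. This is a simplified stand-in written for illustration: the text does not name the RLM implementation or its settings, and the robust F/Wald tests are not reproduced. The data are hypothetical.

```python
import statistics

def huber_rlm(x, y, k=1.345, max_iter=50, tol=1e-8):
    """Robust simple linear regression y ~ b0 + b1 * x via Huber IRLS.

    The residual scale is re-estimated each iteration as median(|r|) / 0.6745.
    Returns (intercept, slope).
    """
    w = [1.0] * len(x)                  # first pass is ordinary least squares
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Weighted least-squares fit of a straight line
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        new_b1 = sxy / sxx
        new_b0 = my - new_b1 * mx
        converged = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
        # Huber weights: 1 within k * scale, k * scale / |r| beyond it
        resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        scale = statistics.median([abs(r) for r in resid]) / 0.6745
        if scale == 0:
            break                       # exact fit, nothing to down-weight
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
    return b0, b1

x = list(range(10))
y = [1.0 + 2.0 * v for v in x]
y[9] = 100.0                            # one gross outlier (toy data)
print(huber_rlm(x, y))
```

On this toy series, ordinary least squares gives a slope of about 6.4, whereas the Huber fit stays close to the true slope of 2.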
In the second step, the RLM estimates obtained within each study (now treated as study-specific effect sizes) were examined for consistency and significance using meta-analytic random effects models, either across all studies or within groups of studies from European/North American, East Asian and Other (South Asian, South American/Pacific Islands and African) geographies. The random effects models were computed using the rma function of the metafor package (version 3.0.2) in R. For certain investigations that required a combined analysis of merged studies, the study name was encoded as a dummy variable and associations were computed using simple linear regression (the lm function of base R, version 4.1.0).
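The second step pools the study-specific effect sizes with a random effects model. metafor's rma fits this model by REML by default; the plain-Python sketch below swaps in the closed-form DerSimonian-Laird estimator of between-study variance (available in rma as method = "DL"), so its numbers can differ slightly from a REML fit. The per-study estimates and standard errors in the example are hypothetical.

```python
import math

def random_effects_dl(estimates, std_errors):
    """DerSimonian-Laird random-effects pooling of per-study effect sizes.

    Needs at least two studies. Returns (pooled estimate, standard error,
    two-sided p-value, between-study variance tau^2).
    """
    w = [1.0 / se ** 2 for se in std_errors]                # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)         # between-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]    # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    z = pooled / se_pooled
    p = math.erfc(abs(z) / math.sqrt(2))                    # two-sided normal p-value
    return pooled, se_pooled, p, tau2

# Hypothetical per-study RLM slopes and their standard errors
print(random_effects_dl([0.12, 0.08, 0.15, 0.02], [0.04, 0.05, 0.06, 0.05]))
```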

Details of the gut microbiome co-occurrence network computation
Within each of the 12 study cohorts, separate microbial co-abundance networks were computed for the selected taxa using the gut microbiome profiles from older individuals (age >= 60 years) and from younger individuals (age < 60 years). For both types of co-abundance network, we adopted the two-step compositionality-aware strategy described below. First, we computed Kendall's tau between the clr-transformed abundances of each pair of the 107 species-level taxa using the cor.fk function of the pcaPP package. The p-values and FDR-corrected p-values (Q-values) of the associations were computed using the corr.p function of the psych package (wherein the p-values were corrected on a per-taxon basis). A species-level co-abundance network was then created by adding an edge between each pair of species-level taxa with Kendall's tau greater than 0 and FDR <= 0.1, using the graph_from_adjacency_matrix function of the igraph package (version 1.2.8) in R. Three centrality measures, namely betweenness, degree and hub score, were then computed for the older-specific and young-specific networks of each of the 12 studies using the corresponding igraph functions (betweenness, degree and hub_score). Each centrality measure captures a different aspect of the centrality of a given node (taxon) in the network. For example, degree is the number of nodes to which a given node is connected (that is, the number of taxa with which a given taxon has co-abundance relationships), whereas betweenness reflects the number of shortest paths between pairs of other nodes that pass through the given node, relative to all shortest paths between those pairs (in other words, the number of taxon pairs whose co-abundance relationships may be mediated by the given taxon).
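The edge rule and the degree measure can be sketched in plain Python, given precomputed tau and p-value matrices. One assumption differs from the text: the sketch applies the Benjamini-Hochberg correction once over all taxon pairs, whereas the study corrected p-values per taxon with corr.p. Betweenness is left to a graph library such as igraph. The taxon names and values are hypothetical.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (Q-values), in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):        # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q

def coabundance_edges(taxa, tau, pval, max_q=0.1):
    """Undirected edges between taxa with tau > 0 and Q-value <= max_q.

    tau and pval are symmetric matrices (lists of lists) indexed like taxa.
    Assumption: BH is applied once over all taxon pairs, not per taxon.
    """
    pairs = [(i, j) for i in range(len(taxa)) for j in range(i + 1, len(taxa))]
    q = benjamini_hochberg([pval[i][j] for i, j in pairs])
    return {(taxa[i], taxa[j]) for (i, j), qv in zip(pairs, q)
            if tau[i][j] > 0 and qv <= max_q}

def degree(taxa, edges):
    """Number of co-abundance partners of each taxon."""
    deg = {t: 0 for t in taxa}
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

taxa = ["Taxon_A", "Taxon_B", "Taxon_C"]  # hypothetical taxa
tau = [[1.0, 0.4, 0.3], [0.4, 1.0, -0.2], [0.3, -0.2, 1.0]]
pval = [[0.0, 0.001, 0.02], [0.001, 0.0, 0.001], [0.02, 0.001, 0.0]]
edges = coabundance_edges(taxa, tau, pval)
print(sorted(edges), degree(taxa, edges))
```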
The hub score, on the other hand, is equivalent to the authority score of the taxon (and is a generalization of eigenvector centrality); it primarily measures the connectedness of the nodes (taxa) linked to the given taxon, so that taxa connected to other highly connected taxa receive higher scores. The centrality measures obtained for each taxon (node) were then ranked across all taxa constituting the nodes of a given network.
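The hub score and the subsequent ranking can be sketched with power iteration on a small hypothetical network (a triangle with one pendant node). This plain-Python version is an illustration, not igraph's hub_score (which uses ARPACK); like igraph's default, it scales the scores so the maximum is 1.

```python
def hub_scores(adj, n_iter=500, tol=1e-12):
    """Kleinberg hub scores by power iteration, scaled so the maximum is 1.

    Hub scores are the leading eigenvector of A A^T. For a symmetric
    (undirected) adjacency matrix, A A^T = A^2, so hub and authority scores
    coincide and, for connected non-bipartite graphs, match eigenvector centrality.
    """
    n = len(adj)
    v = [1.0] * n
    for _ in range(n_iter):
        authority = [sum(adj[j][i] * v[j] for j in range(n)) for i in range(n)]  # A^T v
        new = [sum(adj[i][j] * authority[j] for j in range(n)) for i in range(n)]  # A (A^T v)
        top = max(new)
        if top == 0:
            return new                  # graph without edges
        new = [x / top for x in new]
        if max(abs(a - b) for a, b in zip(new, v)) < tol:
            return new
        v = new
    return v

def rank_descending(scores):
    """Rank nodes by score, 1 = most central (ties broken by node order)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

# Triangle 0-1-2 plus node 3 attached to node 0 (hypothetical network)
adj = [[0, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]
scores = hub_scores(adj)
print(scores, rank_descending(scores))
```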
The two consensus networks (one for older-subject-specific and one for young-specific gut microbiomes), each merging the patterns across the 12 study cohorts, were obtained as follows. For each pair of species, we computed the overall association (model estimate and p-value) by combining the Kendall's tau values obtained for the individual study cohorts (from either the older-subject-specific or the young-specific microbiome subsets of the 12 cohorts) using a random effects model. For each species-level taxon, the random effects p-values for its associations with each of the other species-level taxa (among the 107) were corrected using the Benjamini-Hochberg approach to obtain the FDR (Q-value). Species pairs with an overall random effects estimate greater than 0 and FDR <= 0.1 (and a positive Kendall's tau in at least 70% of the study cohorts) were considered to have a significant co-abundance relationship, and were therefore connected by a co-abundance edge. The same strategy was used to generate both the older-subject-specific and the young-specific consensus co-abundance networks.
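The consensus-edge criteria can be sketched for a single focal taxon, assuming the random effects estimate and p-value for each partner have already been computed (for example, with a model like the one in the second step of Supplementary Note S6). The per-taxon Benjamini-Hochberg correction follows the text; the record layout, taxon names and values are hypothetical.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (Q-values), in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q

def consensus_partners(records, max_q=0.1, min_positive_fraction=0.7):
    """Partners of one focal taxon that pass the consensus-edge criteria.

    records: list of (partner, re_estimate, re_pvalue, cohort_taus), one per
    other taxon; cohort_taus holds that pair's Kendall tau in each cohort.
    """
    q = benjamini_hochberg([r[2] for r in records])  # per-taxon BH correction
    kept = []
    for (partner, estimate, _, taus), qv in zip(records, q):
        positive_fraction = sum(t > 0 for t in taus) / len(taus)
        if estimate > 0 and qv <= max_q and positive_fraction >= min_positive_fraction:
            kept.append(partner)
    return kept

records = [  # hypothetical random-effects results for one focal taxon
    ("Taxon_B", 0.25, 0.001, [0.3] * 9 + [-0.1]),
    ("Taxon_C", 0.10, 0.004, [0.2] * 6 + [-0.1] * 4),
    ("Taxon_D", -0.15, 0.0001, [-0.2] * 10),
    ("Taxon_E", 0.05, 0.40, [0.1] * 10),
]
print(consensus_partners(records))
```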
For the specific co-abundant hub of putatively beneficial species-level taxa, we also checked whether the same edge was reproduced in at least 50% of the other cohorts. The prevalence of each taxon in this combined network was computed as the percentage of all older-subject-specific gut microbiomes, across all five data repositories, in which it was detected.
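The prevalence calculation is a simple percentage over the pooled older-subject samples. The sketch below assumes that "detected" means a nonzero abundance, which the text does not state explicitly; the sample matrix is hypothetical.

```python
def prevalence(samples, taxon_index):
    """Percentage of samples in which a taxon is detected (assumed: abundance > 0)."""
    detected = sum(1 for s in samples if s[taxon_index] > 0)
    return 100.0 * detected / len(samples)

# Pooled older-subject samples (rows) by taxa (columns); hypothetical values
pooled = [[0, 1], [2, 0], [3, 4], [0, 0]]
print(prevalence(pooled, 0), prevalence(pooled, 1))
```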

Supplementary Figures
Supplementary Figure S1. Pictorial representation of the two-step meta-analytic framework used for investigating different microbiome properties and age.
Figure S2. Network showing the relationships between different microbiome summary indices at the level of A. taxonomy and B. pathways, obtained using random effects models. An edge between two properties indicates a significant summarized association between them with Q <= 0.05. Positive associations are shown in orange and negative associations in dark blue.
Figure S3. Boxplots comparing the total number of samples, the total number of samples from older subjects (age >= 60 years), and the minimum and maximum subject ages in cohorts from Europe/North America and those from other geographical regions. The p-values of two-sided Mann-Whitney tests comparing the distributions of these values between the two major cohort groups are also computed. Boxes indicate the interquartile range (with the median in bold), and the whiskers extend to 1.5 × the interquartile range above the third quartile (upper whisker) or below the first quartile (lower whisker).
Figure S4. Boxplots showing the distribution of the correlation values between the clr-transformed abundances and the (total-sum-scaled) relative abundances of the same taxa, computed within each individual microbiome belonging to the 28 individual studies. A. Correlations between the sample-specific taxon abundances at the genus level. B. Correlations between the sample-specific taxon abundances at the species level. The clr-transformed taxon abundances had a Spearman correlation greater than 0.50 with the relative abundances in more than 75% of the samples in all 28 studies at the genus level, and in 27 of the 28 studies when taxonomic abundances were profiled at the species level. Boxes indicate the interquartile range (with the median in bold), and the whiskers extend to 1.5 × the interquartile range above the third quartile (upper whisker) or below the first quartile (lower whisker).
Figure S5. Boxplots showing the distribution of the correlation values between the clr-transformed abundances and the (total-sum-scaled) relative abundances of each taxon across the gut microbiomes belonging to each of the 28 individual studies. A. Correlations at the genus level. B. Correlations at the species level. The clr-transformed abundances had a Spearman correlation greater than 0.50 with the relative abundances for more than 75% of the taxa in all 28 studies at the genus level, and in 27 of the 28 studies (for all taxa) when taxonomic abundances were profiled at the species level. Boxes indicate the interquartile range (with the median in bold), and the whiskers extend to 1.5 × the interquartile range above the third quartile (upper whisker) or below the first quartile (lower whisker). The numbers of samples (gut microbiome profiles) investigated in each study cohort are: HMP_2019_ibdmdb: 846, SankaranarayananK_2015: 37, CosteaPI_2017: 201, AsnicarF_2021: 1098
Figure S6. Heatmap showing the individual associations of each of the 100 species-level taxa with each of the five microbiome summary statistics in the random effects model-based meta-analysis across all 28 studies. These 100 taxa are the subset of the 107 taxa identified in Supplementary Figure S9 that had significant associations with at least one of the five microbiome summary statistics.