Multigroup analysis of compositions of microbiomes with covariate adjustments and repeated measures

Microbiome differential abundance analysis methods for two groups are well-established in the literature. However, many microbiome studies involve more than two groups, sometimes even ordered groups such as stages of a disease, and require different types of comparison. Standard pairwise comparisons are inefficient in terms of power and false discovery rates. In this Article, we propose a general framework, ANCOM-BC2, for performing a wide range of multigroup analyses with covariate adjustments and repeated measures. We illustrate our methodology through two real datasets. The first example explores the effects of aridity on the soil microbiome, and the second example investigates the effects of surgical interventions on the microbiome of patients with inflammatory bowel disease.

Conversely, the ANCOM-BC2 (SS Filter) variant also uses the ANCOM-BC2 methodology to identify significant taxa, but it additionally excludes those that fail the sensitivity score filter criteria. Here, a taxon is declared significant only if it is significant without the addition of a pseudo-count (in line with the ANCOM-BC2 default setting) and if it remains significant across a range of pseudo-count additions.

To control the false discovery rate due to multiple testing, we favored the Holm-Bonferroni method [3] over the Benjamini-Hochberg (BH) procedure [4] for all DA methods. The Holm-Bonferroni method makes no assumptions about the dependence structure among the underlying p-values and is recognized as more robust when dealing with inaccurate p-values [5].

For data preprocessing, we used a protocol tailored to the DA methods employed. For ANCOM-BC2, ANCOM-BC, LinDA, and LOCOM, a taxon filter was applied to exclude taxa with less than 10% prevalence across samples, and samples with a library size (total counts across taxa) of 1000 or fewer were discarded from the study. For CORNCOB, a method specializing in differential relative abundance analysis, we applied the same library-size criterion, discarding samples with 1000 counts or fewer; a taxon filter was not feasible because the current implementation of the method does not provide this feature. To improve the computational efficiency of CORNCOB, the bootstrap function was disabled and the Wald test statistic was employed. Beyond these specific modifications, default parameters were retained for all DA methods unless otherwise stated, ensuring consistency across analyses.
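As a concrete illustration of this protocol, the following is a minimal Python sketch of the two filters and the Holm-Bonferroni adjustment. All names (`counts`, `min_prevalence`, `min_library_size`, `holm`) are ours for illustration, not part of any package's API.

```python
import numpy as np

def preprocess(counts, min_prevalence=0.10, min_library_size=1000):
    """counts: samples-by-taxa integer matrix."""
    counts = np.asarray(counts)
    # Discard samples whose library size (total counts across taxa) is <= 1000.
    keep_samples = counts.sum(axis=1) > min_library_size
    counts = counts[keep_samples]
    # Exclude taxa present (nonzero) in fewer than 10% of the remaining samples.
    prevalence = (counts > 0).mean(axis=0)
    return counts[:, prevalence >= min_prevalence], keep_samples

def holm(pvals):
    """Holm-Bonferroni adjusted p-values; valid under arbitrary dependence."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    adj = np.empty(m)
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Step-down factor (m - rank) with monotonicity enforced.
        running_max = max(running_max, (m - rank) * p[idx])
        adj[idx] = min(1.0, running_max)
    return adj
```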
Continuous exposure. We generated a continuous exposure variable (cont) from a standard normal distribution (cont ∼ N(0, 1)) and introduced an adjusting binary covariate that divided the samples equally into two categories. The log fold-changes for the adjusting covariate, drawn from [0, 1], were applied to a randomly chosen subset of taxa. The sample-specific bias (S), reflecting differences in sampling fractions, was set to correlate strongly with the continuous exposure (S = softmax(cont)/10) to simulate batch effects. The log-transformed sample-specific bias (s = log S) and feature-specific bias (c = log C) were added to the log absolute abundances of samples and taxa, respectively. Each (n, p) combination was iterated 100 times, with a significance level of 0.05.

Binary exposure. In our simulation study concerning a binary exposure with a continuous adjusting covariate, we modified the settings used in the continuous exposure scenario. We evaluated a sequence of per-group sample sizes (10, 20, 30, 50, and 100), resulting in total sample sizes of 20, 40, 60, 100, and 200 for the binary exposure. To simulate batch effects, the sample-specific bias (S) was strongly correlated with the binary exposure, set between 1e − 4 and 1e − 3 (S ∼ U[1e − 4, 1e − 3]) for the first half of samples and between 1e − 2 and 1e − 1 (S ∼ U[1e − 2, 1e − 1]) for the second half. An adjusting continuous covariate was incorporated, with its log fold-changes drawn from [0, 1] and applied to a randomly selected subset of taxa with non-zero log fold-changes.
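A minimal Python sketch of the continuous-exposure setting follows; the binary setting differs only as described above. The baseline abundance model, the affected-taxa fraction, the range of C, and the count-generating step are our assumptions, not the paper's exact generative code.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 50, 200
cont = rng.standard_normal(n)                  # continuous exposure, N(0, 1)
covariate = np.repeat([0.0, 1.0], n // 2)      # binary adjusting covariate

# Log fold-changes for the adjusting covariate, drawn from [0, 1] and
# applied to a randomly chosen subset of taxa (subset size is illustrative).
beta_cov = np.zeros(p)
affected = rng.choice(p, p // 10, replace=False)
beta_cov[affected] = rng.uniform(0.0, 1.0, size=p // 10)

# Sample-specific bias strongly correlated with the exposure:
# S = softmax(cont) / 10.
S = np.exp(cont) / np.exp(cont).sum() / 10
s = np.log(S)                                  # log sample-specific bias
c = np.log(rng.uniform(0.1, 1.0, size=p))      # log feature-specific bias
                                               # (this range is assumed)

log_abn = rng.normal(5.0, 1.0, size=(n, p)) + covariate[:, None] * beta_cov
log_obs = log_abn + s[:, None] + c[None, :]    # add biases per sample / taxon
counts = rng.poisson(np.exp(log_obs))          # observed counts (assumed link)
```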
Multiple pairwise comparisons (against a reference group). In our simulation study involving multiple pairwise comparisons against the reference group, we used the same settings as in the binary exposure scenario, but with a three-level categorical variable as the exposure of interest and a continuous variable as the adjusting covariate. The sample sizes per group ranged from 10 to 100, resulting in total sample sizes for the categorical exposure between 30 and 300. To simulate batch effects, the sample-specific bias (S) was strongly correlated with the categorical exposure, with S ranging from 1e − 4 to 1e − 3 (S ∼ U[1e − 4, 1e − 3]) for the first third of samples, 1e − 3 to 1e − 2 (S ∼ U[1e − 3, 1e − 2]) for the middle third, and 1e − 2 to 1e − 1 (S ∼ U[1e − 2, 1e − 1]) for the last third. The log fold-changes for DA (non-null) taxa in the second group compared to the reference group, and in the third group compared to the reference group, were randomly drawn from the set {−2, −1, 1, 2}. The selection of DA taxa for the two comparisons was performed randomly and could differ between the comparisons.
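The stratified bias and the two independently chosen DA taxa sets can be sketched as follows; the number of DA taxa (`n_da`) is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

n_per_group, p = 30, 200
group = np.repeat([0, 1, 2], n_per_group)      # three-level exposure

# S ~ U[1e-4, 1e-3], U[1e-3, 1e-2], U[1e-2, 1e-1] by third of the samples.
lows, highs = [1e-4, 1e-3, 1e-2], [1e-3, 1e-2, 1e-1]
S = np.concatenate([rng.uniform(lo, hi, n_per_group)
                    for lo, hi in zip(lows, highs)])

# Non-null log fold-changes for groups 2 and 3 vs the reference, drawn from
# {-2, -1, 1, 2}; the DA taxa sets are chosen independently per comparison.
n_da = p // 10
lfc2, lfc3 = np.zeros(p), np.zeros(p)
da2 = rng.choice(p, n_da, replace=False)
da3 = rng.choice(p, n_da, replace=False)
lfc2[da2] = rng.choice([-2, -1, 1, 2], n_da)
lfc3[da3] = rng.choice([-2, -1, 1, 2], n_da)
```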
Multiple pairwise comparisons. In our simulation of multiple pairwise comparisons, we used the same settings as in the scenario of multiple pairwise comparisons against the reference group, but expanded the analysis to include comparisons between the second group and the third group. Specifically, the log fold-change between the second group and the third group was calculated as the difference between the log fold-change of "the second group compared to the reference group" and that of "the third group compared to the reference group".
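Continuing the sketch above, the second-vs-third contrast is derived rather than re-simulated (the values below are illustrative):

```python
import numpy as np

# lfc2 and lfc3 are the log fold-changes of groups 2 and 3 vs the reference.
lfc2 = np.array([2.0, 0.0, -1.0])
lfc3 = np.array([1.0, 1.0, 0.0])
lfc_2_vs_3 = lfc2 - lfc3          # [1.0, -1.0, -1.0]
```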
Pattern analysis. In our simulation study for pattern analysis, we largely replicated the settings from the multiple pairwise comparisons study, except for the log fold-changes for the categorical exposure, which were tailored to benchmark a scenario featuring a monotonically increasing pattern. Here, the log fold-change between the second group and the reference group, denoted δ, was randomly drawn from the set {0.5, 1.0, 1.5, 2.0}, and the log fold-change of the third group relative to the reference group was fixed at δ + 1.
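A short sketch of this monotone construction (names illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

n_da = 20
delta = rng.choice([0.5, 1.0, 1.5, 2.0], size=n_da)
lfc2 = delta            # group 2 vs reference
lfc3 = delta + 1.0      # group 3 vs reference; always exceeds lfc2,
                        # enforcing the monotonically increasing pattern
```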
Correlated samples. In our simulation study of correlated samples, we considered two scenarios: one with only a random intercept effect and another with both random intercept and random slope effects. Both random effects had a mean of zero, with standard deviations of 1 and 1.5 for the random intercept and random slope, respectively. When both random effects were present, they had a correlation coefficient of 0.5. In these scenarios, the exposure variable had three levels, defining three experimental groups, and a continuous adjusting covariate was also incorporated. The remaining simulation settings were maintained as described in the preceding sections.
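The correlated random effects can be drawn from a bivariate normal distribution as sketched below; the subject/time layout and the use of these effects in a linear predictor are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_subjects, n_times = 30, 4
sd_int, sd_slope, rho = 1.0, 1.5, 0.5          # sds and correlation from text

# Covariance of (random intercept, random slope).
cov = np.array([[sd_int**2,               rho * sd_int * sd_slope],
                [rho * sd_int * sd_slope, sd_slope**2            ]])
b = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n_subjects)

time = np.tile(np.arange(n_times), n_subjects)
subject = np.repeat(np.arange(n_subjects), n_times)
# Subject-level contribution to the log abundance of a given taxon:
random_part = b[subject, 0] + b[subject, 1] * time
```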
Normalization and transformation used for different DA methods

We provide additional details regarding the preprocessing steps for each DA method discussed in our simulation studies. For ANCOM-BC2 and ANCOM-BC, no external normalization or transformation was applied to the input data. These methods internally estimate and correct biases (sample- and taxon-specific biases in ANCOM-BC2, and sample-specific bias in ANCOM-BC) before conducting statistical inference; they therefore operate on internally "normalized" or "bias-corrected" absolute abundances. LinDA applies the centered log-ratio (CLR) transformation to the input data and incorporates an internal normalization procedure as part of its bias-correction process, akin to ANCOM-BC2 and ANCOM-BC; no further external transformation or normalization was performed for LinDA. LOCOM, on the other hand, accepts relative abundances (proportions) as input, which can be viewed as data already subjected to total-sum scaling normalization; it infers changes in absolute abundance through a transformation similar to the additive log-ratio (ALR) transformation, so no additional external transformation or normalization is required. CORNCOB, designed specifically for analyzing relative abundances, includes an internal total-sum scaling normalization procedure; no transformation was performed on the data for CORNCOB.
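For reference, minimal implementations of the three transformations named above are sketched here; the pseudo-count used to avoid log(0) is our choice, not any package's default.

```python
import numpy as np

def tss(counts):
    """Total-sum scaling: counts -> per-sample proportions."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def clr(counts, pseudo=0.5):
    """Centered log-ratio: log counts centered within each sample."""
    x = np.log(np.asarray(counts, dtype=float) + pseudo)
    return x - x.mean(axis=1, keepdims=True)

def alr(counts, ref=0, pseudo=0.5):
    """Additive log-ratio: log ratios to a reference taxon (column `ref`)."""
    x = np.log(np.asarray(counts, dtype=float) + pseudo)
    return np.delete(x - x[:, [ref]], ref, axis=1)
```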

FDR Adjusted Power (FAP)
We introduced a novel concept, the "False Discovery Rate Adjusted Power (FAP)", defined as FAP = ln(power/FDR). Consequently, a method with a lower FDR and higher power achieves a higher FAP value, indicative of a good power/FDR trade-off. The FAP was computed using the power and FDR values derived from the simulation studies for the continuous and binary exposure scenarios discussed in Fig. 1 of the main paper. Note that the FAP is an average over 100 simulation runs at each setting. Note also that methods with very low power but equally low FDR may yield a high FAP; hence, the selection of a method should not rely solely on a high FAP value, and the user should compare FAP only among methods meeting a minimum power requirement, such as 0.6 or 0.8.
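A worked example of the definition and its caveat, with illustrative per-run values (not results from the paper):

```python
import numpy as np

def fap(power, fdr):
    """FAP = ln(power / FDR), averaged over simulation runs."""
    power, fdr = np.asarray(power, float), np.asarray(fdr, float)
    return np.mean(np.log(power / fdr))   # undefined if FDR = 0 in a run

print(fap([0.80, 0.75], [0.05, 0.04]))        # high power, low FDR: ~ 2.9
print(fap([0.01, 0.01], [0.0005, 0.0005]))    # very low power but very low
                                              # FDR also scores ~ 3.0, hence
                                              # the minimum-power requirement
```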

Computational Efficiency and Performance Benchmarking of Various DA Methods
We evaluated the computational efficiency of the methods described in this paper using the "atlas1006" data [6]. Because not all methods are tailored for multi-group hypothesis testing, we limited our comparison to the "lean" and "obese" subjects; and because not all methods are suitable for repeated measurements, we confined our comparisons to the baseline data only. This resulted in 130 genera across 630 samples. The CPU time for each method is summarized as follows: LOCOM and CORNCOB were the most computationally intensive, each requiring close to a minute (62.3 seconds and 59.34 seconds, respectively), whereas ANCOM-BC2 (SS Filter) was faster than these two methods, clocking in at 29.47 seconds. ANCOM-BC2 (No Filter) required only 8.58 seconds, and ANCOM-BC took 6.02 seconds. Although LinDA was the fastest algorithm, taking only 0.05 seconds, it may not be preferable owing to the potentially high FDR reported in the simulation studies. Relative to the time required to conduct microbiome studies and generate data, the CPU time of the above DA methods is trivial; hence, when choosing a DA method, characteristics such as FDR control and power should outweigh CPU time.
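For completeness, a generic timing harness of the kind one might use to reproduce such CPU-time measurements is sketched below; `run_method` is a placeholder for a call to any of the DA methods, not an API from the paper.

```python
import time

def cpu_time(run_method, *args, **kwargs):
    """Return (result, seconds of CPU time) for a single method call."""
    start = time.process_time()              # CPU time, not wall-clock time
    result = run_method(*args, **kwargs)
    return result, time.process_time() - start

# Example with a stand-in workload:
_, seconds = cpu_time(sum, range(10_000_000))
print(f"{seconds:.2f} s of CPU time")
```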
2 Supplementary Tables

Supplementary Table 1: Presence/Absence Test on the Soil Microbiome Data