Main

The differential abundance (DA) analysis of microbial taxa between two study groups is well-studied in the literature. Often two types of parameter are considered, namely the relative abundance and the absolute abundance of a taxon in a unit volume of an ecosystem. There exist several methods in the literature that can be used for performing differential relative abundance analysis between two groups such as count regression for correlated observations with the beta-binomial (CORNCOB)1. While relative abundance (same as relative proportions) is a natural measure to consider, the DA of relative abundances has an important limitation. Specifically, differences in the absolute abundance of a single taxon between two groups may result in differential relative abundances of all taxa between the two groups2,3. While this is mathematically correct, it does not help the researcher to discover the specific taxon that was DA between the two groups.

As an alternative to differential relative abundance analysis, several methods proposed in the literature can be used for differential absolute abundance analysis (hereafter referred to as DA analysis), which is the focus of this Article. Some examples include analysis of composition of microbiomes (ANCOM)2, analysis of compositions of microbiomes with bias correction (ANCOM-BC)3, linear models for DA analysis (LinDA)4 and logistic compositional analysis (LOCOM)5. However, the methodology for multigroup DA analysis is not well-developed in the literature. Some researchers perform a series of pairwise tests with a false discovery rate (FDR) control within each pairwise comparison and pool the results from all such pairwise comparisons to interpret the data. Such a strategy does not account for the fact that multiple tests and multiple pairwise comparisons are being performed and hence the overall FDR is not controlled.

Standard procedures, such as the Benjamini–Hochberg procedure6, are designed for testing multiple hypotheses between two groups. When there are more than two groups, the standard concept of FDR, and methods controlling the corresponding error rates, need to be modified according to the study design and type of analyses to be performed7,8,9. Some examples of interest include the following. (1) Multiple pairwise comparisons, in which a dietitian may be interested in making all pairwise comparisons of the gut microbial compositions among participants receiving diets D1, D2 or D3. Furthermore, for each pairwise comparison, the goal is often to identify taxa whose abundance increased (or decreased). (2) Multiple pairwise comparisons against a specific reference group, the same as in scenario (1), but the investigator is only interested in comparing groups D2 and D3 against D1, the reference group. (3) Pattern analysis over ordered study groups, where, in some instances, an investigator may be interested in discovering trends or patterns in abundances of taxa over ordered groups, such as the health of participants, changes in climate, doses of a drug and so on. For instance, during normal pregnancy, women experience major changes in their gut and vaginal microbiome10. These changes are necessary for maternal metabolism, immune response and hormonal changes to support pregnancy and to provide healthy flora for babies at birth11,12. Thus, as the pregnancy progresses from the first to the third trimester, a researcher may be interested in discovering temporal changes in microbiota. Thus, in many scientific investigations, researchers are interested in studying changes in the microbiome over ordered conditions. The patterns of microbial abundance may not always be monotonic. They may display other shapes, such as an umbrella or an inverted umbrella with the location of the peak or trough unknown a priori. Additionally, depending on the scientific question of interest, repeated measures are taken on the same participant. Although the pattern analyses mentioned here could be accomplished by conducting a sequence of pairwise tests over adjacent ordered groups, such a strategy may have lower power than a test designed for pattern analysis, as will be demonstrated in the analysis of soil aridity data described later in this Article.

The objective of this Article is to develop methodologies for performing multigroup DA analyses. A formal methodology for performing such analyses does not appear to be available in the literature, with a few exceptions, such as ANCOM-II (ref. 13). While ANCOM-II considered the above testing problems, it does not develop a formal framework for bias correction. The more recent methodology LinDA4, which uses a model similar to the one developed in ANCOM-II, does not address the above multigroup testing problems. Thus, there is a major gap in the literature for analyzing multigroup microbiome studies, which will be filled by the methodology developed in this Article called analysis of compositions of microbiomes with bias correction 2 (ANCOM-BC2).

Although the ANCOM-BC methodology accounted for sample-specific bias, for better control of FDR, ANCOM-BC2 also accounts for taxon-specific bias. This is important because sequencing efficiencies can vary across taxa, leading to a taxon-specific bias when some taxa are preferentially measured over others during sequencing. For example, gram-positive bacteria have stronger cell walls than gram-negative bacteria, making them harder to extract during the data preparation step. Consequently, gram-positive bacteria may be underrepresented in the observed counts, leading to biased results if taxon-specific biases are not properly accounted for in the analysis14. Also, it is well-known that small effect sizes are associated with small variances in high throughput data15. Consequently, in such cases, the value of the test statistics is inflated, resulting in a highly significant P value. Inspired by the significance analysis of microarrays (SAM)15 methodology, we regularize the variance to avoid inflated values for the test statistics and hence moderate the P values for a better control of FDR. Lastly, zeros are a common problem for log-abundance based DA methods, including ANCOM-BC. Often such methods use pseudo-counts to deal with zero before taking logarithms. However, the choice of pseudo-count can affect the results for rare taxa containing excess zeros, which potentially leads to an inflated FDR13,16,17. To mitigate this issue, we conduct a sensitivity analysis to filter a DA taxon that potentially is a false positive. Details of the procedure are provided in the Methods section.

Using constrained statistical inference-based methods7 and mixed directional FDR (mdFDR) methods for multiple pairwise comparisons8,9, along with the above-noted modifications to ANCOM-BC, in this Article we develop ANCOM-BC2 for multigroup microbiome studies. ANCOM-BC2 allows modeling covariates as well as repeated measures. The performance of ANCOM-BC2 is evaluated using extensive simulation studies under a variety of settings. ANCOM-BC2 is also illustrated using two publicly available data, namely soil microbiome data and irritable bowel disease data.

Results

Simulations: settings

Inspired by applications, we conducted simulation studies under various scenarios incorporating different exposure types and covariate adjustments. We compared the performance of ANCOM-BC2, with ANCOM-BC (ref. 3), as well as state-of-the-art DA methods for absolute abundances: (1) LinDA4 and (2) LOCOM5. Although designed for relative abundances, CORNCOB, a DA method based on beta-binomial regression model, was also included in the simulation studies.

The absolute abundances were simulated using the Poisson log-normal (PLN) model as done in linear decomposition model framework18. The PLN model postulates that absolute abundance follows a Poisson distribution with a multivariate log-normal distribution for the mean. The population mean and covariance matrix for absolute abundance in the PLN model were derived from the upper respiratory tract (URT) microbiome data, featuring 60 samples and 382 operational taxonomic units (OTUs), extracted from the original 856-OTU dataset19. OTUs present in less than 5% of samples were omitted. It is important to note that ANCOM-BC2 is not based on PLN model and thus, this simulation set-up does not inherently favor ANCOM-BC2 over the competing methods described in this Article.

Motivated by the limitations of ANCOM-BC identified through our experience and in the literature, we conducted an exhaustive simulation study that includes edge cases where ANCOM-BC performs poorly. Additional details regarding the simulation design are provided in Extended Data Fig. 1. Many DA methods implicitly assume that many taxa (for example, more than 50%) are not DA. To understand the breakdown point of various methods, we varied the proportion of DA taxa from 5 to 90%. Our evaluation of pseudo-count effects on zeros led to two ANCOM-BC2 versions: ANCOM-BC2 (no filter) and ANCOM-BC2 (SS filter, where SS denotes sensitivity score), detailed in the Methods section. Notably, ANCOM-BC2 (SS filter) is intrinsically more conservative. For the control of FDR due to multiple testing, we favored the Holm–Bonferroni method20 over the Benjamini–Hochberg procedure6 for all DA methods. The Holm–Bonferroni method, which allows arbitrary dependence structure among the underlying P values, is recognized to be robust to some extent for inaccurate P values21, a common problem with all DA methods. Further information regarding the simulation study set-up is provided in the Supplementary Methods.

Simulations: continuous and binary exposures

Figure 1a presents the simulation results when the exposure variable is continuous. Both versions of ANCOM-BC2 had smaller FDR compared to other methods. ANCOM-BC2 (SS filter) consistently controlled FDR below the nominal level of 0.05. By contrast, the FDR of ANCOM-BC2 (no filter) increased with sample size, a consequence of excess zeros across the distribution of the exposure variable, which is more likely to generate false positives with a larger sample size. Both versions of ANCOM-BC2 generally outperformed all other methods, with ANCOM-BC2 (no filter) achieving the highest power. Conversely, all competing methods had considerably higher FDR than both versions of ANCOM-BC2. For instance, the FDR of LOCOM ranged from 5 to 40%. Similarly, LinDA and ANCOM-BC had FDRs ranging from 5 to 70%. LOCOM experienced a substantial decrease in power for small sample sizes. For example, the power was as low as 20% for n = 10. Although ANCOM-BC and LinDA had larger powers, they suffered from high FDR, exceeding the nominal level in most scenarios. We further note that as the sample size increased, the FDR of ANCOM-BC, LinDA and LOCOM increased. This suggests a systematic bias within these test statistics. The FDR of CORNCOB, a method designed for DA of relative abundances, consistently exceeded the nominal level and reached its maximum when a large number of taxa were differentially abundant (between 20 and 50%). This is attributed to the fact that differential absolute abundance in a single taxon could induce differential relative abundance of many null taxa2,22.

Fig. 1: FDR and power comparisons for continuous and binary exposures.
figure 1

a,b, The FDR and power of various DA methods for continuous (a) and binary exposures (b) are summarized. Synthetic datasets were generated using the PLN model18 based on the mean vector and covariance matrix estimated from the URT dataset19. The x axis represents the sample size (or sample size per group for the binary exposure), and the y axis shows the FDR or power. The dashed lines denote the nominal level of FDR (FDR = 0.05). The proportion of true DA taxa are provided in the top of each panel. The mean estimated FDR (or power) ± standard errors (indicated by error bars) derived from 100 simulation runs are provided in each panel.

Figure 1b presents the simulation results for DA analysis for a binary exposure. These results are generally consistent with those presented in Fig. 1a. The FDRs of competing methods were substantially inflated compared to the two versions of ANCOM-BC2, and those FDRs monotonically increased with sample size. The two versions of ANCOM-BC2 consistently maintained lower FDR than all competing methods. Similar to the continuous exposure variable case, ANCOM-BC2 (SS filter) always controlled the FDR at the nominal level, whereas ANCOM-BC2 (no filter) controlled FDR at the nominal for small to moderate sample sizes. For large sample sizes (for example, more than 50), it failed to control FDR within the nominal level but still had substantially lower FDR than LOCOM, LinDA, ANCOM-BC and CORNCOB. However, ANCOM-BC2 (no filter) had the highest power among all the methods. On the other hand, ANCOM-BC2 (SS filter) sacrificed about 10% of power, a concession that enables the control of FDR across all simulation settings.

To evaluate the power and FDR trade-off across the diverse DA methods, we computed the FDR adjusted power (FAP), as detailed in the Supplementary Methods. This measure (not a probability) is represented in relation to power in Extended Data Fig. 2. An elevated FAP indicates a superior power and FDR trade-off for a given power. Extended Data Fig. 2a corresponds to the continuous exposure case and Extended Data Fig. 2b pertains to the binary exposure case. From the cumulative distribution plots, we see that for any given power, both versions of ANCOM-BC2 have stochastically larger FAP values than all other methods (that is, their cumulative distribution functions are more to the right), with ANCOM-BC2 (SS filter) being stochastically the largest. Since, in practice not all methods have the same FDR, hence to account for the power and FDR trade-off, we advocate the use of FAP as a measure for comparing DA methods.

Simulations: multiple groups

The simulation settings for multigroup comparisons mimic those outlined in the previous section.

Multiple pairwise comparisons against a reference group

We assessed the performance of ANCOM-BC2 (SS filter) and ANCOM-BC2 (no filter), ANCOM-BC and LinDA across three experimental groups with covariate adjustments. LOCOM and CORNCOB were not included because they are not designed for multiple groups. As illustrated in Fig. 2a, both versions of ANCOM-BC2 yielded smaller mixed directional FDR (mdFDR)8,9, compared to other methods. Note that mdFDR accounts for errors due to multiple testing, multiple comparisons and directional errors. Specifically, ANCOM-BC2 (SS filter) effectively controlled mdFDR below the nominal level of 0.05. Although in some cases it results in a loss of about 10–20% power, it ensures more stringent mdFDR control. Even with this power reduction, ANCOM-BC2 (SS filter) maintains a robust power (more than 0.8) in most scenarios. Without the filter, ANCOM-BC2 (no filter) remains to be the most powerful DA method of all. Despite its mdFDR occasionally surpassing 0.05 for larger sample sizes (more than 50), it was still markedly better than both LinDA and ANCOM-BC, which struggled to control mdFDR efficiently.

Fig. 2: FDR (mdFDR) and power comparisons for multiple exposure groups.
figure 2

ac, The FDR (mdFDR) and power of various DA methods for multiple pairwise comparisons against a reference group (a), multiple pairwise comparisons (b) and pattern analysis (c) are summarized. Synthetic datasets were generated using the PLN model18 based on the mean vector and covariance matrix estimated from the URT dataset19. The x axis represents the sample size per group, and the y axis shows the FDR (mdFDR) or power. The dashed lines denote the nominal level of FDR (FDR = 0.05) or mdFDR (mdFDR = 0.05). The proportion of true DA taxa are provided in the top of each panel. The mean estimated FDR (or power) ± standard errors (indicated by error bars) derived from 100 simulation runs are provided in each panel. Within the context of multiple pairwise comparisons, ANCOM-BC2 (SS filter) effectively controlled FDR (mdFDR) while maintaining power similar to ANCOM-BC2 (no filter).

Multiple pairwise comparisons

We assessed ANCOM-BC2’s performance when making all possible pairwise comparisons instead of comparing against a specific reference group as done above. Since the competing methods considered in this Article are not currently designed for multiple pairwise comparisons, they are excluded. As depicted in Fig. 2b, ANCOM-BC2 (SS filter) effectively controlled the mdFDR below the nominal level of 0.05 while maintaining substantial power (more than 0.8) in most scenarios. However, as seen above, ANCOM-BC2 (no filter) controlled mdFDR within the nominal level for small sample sizes or when a large proportion of taxa are differentially abundant. However, when the sample sizes are large (for example, more than 50), it had an inflated mdFDR exceeding the nominal level.

Pattern analysis

Pattern analysis is another unique feature of ANCOM-BC2. In this simulation study, we modeled a scenario demonstrating a monotonically increasing pattern. Here, the log fold-change (denoted by δ) among the DA (or nonnull) taxa between the second group and the reference group ranged from 0.5 to 2.0, and the log fold-change of the third group relative to the first group was taken to be δ + 1. In this setting, a ‘discovery’ in pattern analysis refers to the identification of a taxon that displays a monotonically increasing pattern across all three groups. As described in Fig. 2c, both versions of ANCOM-BC2 controlled the FDR while maintaining high power exceeding 0.8 in most scenarios. Nonetheless, under the most extreme scenario where 90% of taxa were truly differentially abundant, ANCOM-BC2 encountered a power loss. The observed power loss is largely due to ANCOM-BC2’s built-in bias correction, which assumes that there is a sufficient number of null taxa.

Simulations: correlated samples

In this section, we evaluated the performance of ANCOM-BC2 in comparison to LinDA when the samples across experimental groups were correlated, such as in a repeated measurement design. We also considered linear mixed model (LMM) on CLR-transformed data (LMM-CLR), a method commonly used for repeated measurements. The interpretation of LMM-CLR results differs from the previously mentioned DA methods. According to LMM-CLR, a taxon is nonnull if it is differentially abundant relative to the geometric mean of all taxa, not its absolute. We included this method in our simulation study due to its frequent application in repeated measures analyses of microbiome data. ANCOM-BC, LOCOM and CORNCOB were excluded in this simulation as none of them are equipped to handle correlated experimental groups. We considered mixed-effects models with: (1) a random intercept and (2) a random intercept and a random slope. The random intercept had a standard deviation of 1 and the random slope had a standard deviation of 1.5, and both had mean zero. If both random effects were present, the correlation coefficient between them was set to 0.5. In each of these scenarios, the exposure variable consisted of three levels (that is, three experimental groups). The simulation study also included a continuous covariate. The remaining simulation settings adhered to those described in the previous sections (details in Supplementary Methods section). The simulation results for both scenarios are provided in Fig. 3. In each case, as in all previous settings, ANCOM-BC2 (SS filter) effectively controlled the mdFDR at or below the nominal level of 0.05, while maintaining substantial power (more than 0.8) in most of the simulation settings. On the other hand, ANCOM-BC2 (no filter) consistently exceeded the nominal mdFDR level of 0.05. Despite this, it had a larger power and smaller mdFDR than LinDA and LMM-CLR across all settings. LMM-CLR, generally exhibited the lowest power among all methods while having inflated mdFDR across all simulation scenarios. Notably, LMM-CLR’s rate of mdFDR rise was the most rapid with increasing sample size relative to the other methodologies.

Fig. 3: mdFDR and power comparisons for correlated samples.
figure 3

a,b, The mdFDR and power of various DA methods in a random intercept model (a) and a random coefficients model (b) are summarized. Synthetic datasets were generated using the PLN model18 based on the mean and covariance estimated from the URT dataset19. The x axis represents the sample size per group, and the y axis shows the mdFDR or power. The dashed lines denote the nominal level of FDR (FDR = 0.05) or mdFDR (mdFDR = 0.05). The proportion of true DA taxa are provided in the top of each panel. The mean estimated mdFDR (or power) ± standard errors (indicated by error bars) derived from 100 simulation runs are provided in each panel.

Additional simulation studies

In addition to the URT data, we also analyzed a subset from the Quantitative Microbiome Project23, comprising 106 samples and 91 OTUs. The findings paralleled those from the URT dataset (Extended Data Figs. 35).

Soil microbiome and aridity

Recently, Neilson et al.24 investigated the differences in soil microbiomes according to soil aridity in the Atacama Desert in Chile. They classified soil samples into three ordered categories based on aridity, namely, arid, margin and hyper-arid, and sequenced data from 63 sample pits from 18 sites in the desert. Since they did not perform DA analyses of those data, we reanalyzed those data using the ANCOM-BC2 methodology. To begin with, we conducted a pattern analysis of richness with respect to the ordered aridity categories (arid to hyper-arid) (Fig. 4a). Using a constrained inference-based trend test7, executed using ORIOGEN25 with 10,000 bootstraps, we discovered a significant loss of richness with the increase in aridity (P = 0.0001). This finding is consistent with Neilson et al.24.

Fig. 4: DA analysis of desert soil microbial genera with increasing aridity.
figure 4

a, Violin plot illustrating the relationship between aridity and microbial richness. Samples encompass 63 biologically independent pits obtained from 18 distinct Atacama Desert sites in Chile24. Each violin’s median value is signified by a central black dot, while the interquartile range is represented by a black bar. The violin’s width mirrors the density of data points at each richness value. Individual data points are also displayed as jittered dots. A trend test using the constrained inference-based approach7 suggests a significant decline in richness with increase in soil aridity (P = 0.0001). b, ANCOM-BC2 (no filter) pattern analysis heatmap in relation to aridity. Monotonically increasing and decreasing trends were evaluated across ordered soil categories, with arid soil as the reference. The columns denote soil categories and the significant genera identified by ANCOM-BC2 pattern analysis are provided in the rows. Each cell color represents abundance change: blue indicates reduction and red signifies increase. The log fold-changes relative to the reference group (arid group) are noted in each cell. The Holm–Bonferroni method was used for multiple testing correction. Genera represented in black are significant without a multiple testing correction, whereas those highlighted in green are significant after multiple testing correction. Additionally, genera marked with an asterisk are also significant after applying the ANCOM-BC2 (SS filter).

Next, we conducted a pattern analysis using ANCOM-BC2 (no filter) to identify trends in microbial abundance across the ordered soil categories, with arid soil serving as the reference group. Significant genera are presented in Fig. 4b. Genera in green were determined to be significant after adjusting for multiple testing. Additionally, genera denoted by an asterisk were also identified as significant when the conservative ANCOM-BC2 (SS filter) was applied. Blastococcus, Rubrobacter and Thermobaculum increased in mean absolute abundance with soil aridity (P < 0.05). The trend in Blastococcus was significant even after adjusting for multiple testing (adjusted P < 0.05) (Fig. 4b). Thermobaculum is known for its thermophilic properties, with some species thriving in temperatures up to 90 °C (ref. 26). It has also been documented to possess antimicrobial-resistant genes27,28. Similarly, the two Actinobacteria genera, Blastococcus and Rubrobacter, are also known for their antibacterial resistance29,30. Thus, using ANCOM-BC2, we discovered genera that increased in abundance with aridity and may be antibacterial-resistant.

Elevated aridity in desert ecosystems has profound implications on soil health. For instance, increasing aridity in desert soils has been found to significantly diminish nitrogen-cycling microbes. Notable among the affected microbial taxa are Nitrobacter, a common contributor to nitrification, and potential widespread nitrogen fixers such as Sinorhizobium, Rhizobium and Azospirillum. These taxa were not detected in samples obtained from hyper-arid environments based on the results of the presence and absence test (Supplementary Table 1). In agreement with these findings, the ANCOM-BC2 (no filter) pattern analysis also revealed that increasing aridity correlates with significant reductions in beneficiary genera (Fig. 4b). The ANCOM-BC2 trend analysis revealed a significant decrease in the mean absolute abundance of Jiangella, Kaistobacter, Planctomyces and Pseudonocardia in relation to soil aridity (P < 0.05). Among them, Kaistobacter and Pseudonocardia remained significant after adjusting for multiple testing, and the result for Pseudonocardia did not change when the conservative ANCOM-BC2 (SS filter) was used. Pseudonocardia has been recognized for its nitrogen-fixing properties31 and its significance to biotechnology stems from its ability to synthesize secondary metabolites with antibacterial, antifungal and antitumor properties32. Likewise, Kaistobacter is known to foster homeostasis within soil microbial communities and acts as a suppressor of soil-borne pathogens33. Moreover, Jiangella, a halotolerant actinobacterium, is distinguished by its association with nitrate solution, sulfonate transport systems, nitrite reductase and nitrogen fixation34.

Gut microbial composition of patients with IBD

We illustrate ANCOM-BC2 using a longitudinal inflammatory bowel disease (IBD) dataset obtained from Fang et al.35 to investigate the changes in the gut microbiome following gastrointestinal surgery in patients with IBD. The data in this study are based on 322 stool samples collected from 125 patients. Of these, 46 patients were diagnosed with ulcerative colitis and 79 with Crohn’s disease. Stool samples were obtained from each participant at approximately 6-month intervals, beginning at the baseline time point. Specifically, 21 patients provided one sample, 38 patients provided two samples, 41 patients provided three samples, 23 patients provided four samples and two patients provided five samples. Of the total patient population, 87 (70.0%) had no history of intestinal surgery, while 22 patients with Crohn’s disease had undergone ileocolonic resection and 13 patients with Crohn’s disease and three patients with ulcerative colitis had undergone different types of colectomy. These surgeries occurred before the collection of the baseline stool sample. For the purposes of this study, we focused on comparing the microbial compositions between patients who had not undergone gastrointestinal surgery, those who had undergone ileocolonic resection and those who had undergone colectomies. We adjusted the ANCOM-BC2 model for IBD disease type (ulcerative colitis versus Crohn’s disease) and two potential confounders, namely disease state (inactive versus active) and antibiotic use (absent versus present).

We performed multiple pairwise comparisons among the three groups controlling the overall mdFDR at 0.05 using ANCOM-BC2 (no filter). The results are depicted in Fig. 5. The log fold-changes emphasized in green are significant after adjusting for mdFDR. Further, changes marked with an asterisk were also significant by ANCOM-BC2 (SS filter) method. Ileocolonic section is the surgical removal of the diseased section of the ileum, which is the junction area between the small and last intestines. By contrast, colectomy is the surgical removal of most or all of the large intestine. Our analysis revealed that almost no microbial species were differentially abundant between the two surgical groups of patients, except for F. prausnitzii, which is more abundant in the colectomy group.

Fig. 5: Heatmap of ANCOM-BC2 (no filter) pairwise analysis evaluating the impact of surgical resection on microbial species.
figure 5

In a cohort of patients with IBD35, the analysis entailed multiple pairwise comparisons among three distinct groups: ileocolonic resection, colectomy and no intestinal surgery, while maintaining an overall mdFDR at 0.05. The columns denote the specific comparisons: ileocolonic resection versus no intestinal surgery, colectomy versus no intestinal surgery and ileocolonic resection versus colectomy. The rows list significant species as identified by ANCOM-BC2. Each cell is color-coded to represent significant changes in absolute abundance: blue represents reduced abundance and red indicates increased abundance. Multiple testing corrections were performed using the Holm–Bonferroni method. The text within each cell represents the log fold-change value. The log fold-change values displayed in black represent significant changes without adjustment for mdFDR, whereas those in green are significant after applying mdFDR control. Furthermore, values with an asterisk are significant following the application of the ANCOM-BC2 (SS filter).

We observed marked reductions in the absolute abundance of several commensal gut bacterial species in patients who had undergone either ileocolonic resection or colectomy, in comparison to patients without any history of intestinal surgery. The affected species included Bacteroides spp. (ovatus and uniformis), Faecalibacterium prausnitzii and Roseburia faecis. Of particular note is the significant decrease in Faecalibacterium prausnitzii in patients subjected to ileocolonic resection. This reduction remained noteworthy even after using the conservative ANCOM-BC2 (SS filter) together with multiple testing corrections. A crucial aspect to consider is that most of these bacterial species are intrinsically involved in the production of short-chain fatty acids such as acetate, propionate and butyrate36,37,38,39,40,41,42. These short-chain fatty acids are essential for maintaining gut health, bolstering gut barrier function, exhibiting anti-inflammatory properties and serving as energy sources for colonocytes. Thus, the surgical intervention on these patients, which was necessary, may have unintended effects on the host’s immune response and overall health due to the reduction of some important gut microbiota.

Discussion

In this article, we introduced a general framework called ANCOM-BC2 for performing DA analysis when the exposure variable is continuous, binary or (ordered) categorical. The proposed methodology allows for adjusting for covariates and repeated measures (longitudinal measures) while controlling for FDR, or mdFDR when the exposure variable has more than two groups and the researcher is interested in inferring whether the absolute abundance of a taxon increased or decreased within each pairwise comparison. Furthermore, using the theory of constrained statistical inference, ANCOM-BC2 allows researchers to infer patterns in microbial absolute abundance over ordered categories of exposure variables. For example, it allows a researcher to test whether a particular microbe increased (or decreased) in absolute abundance over ordered disease categories (very healthy to least healthy). This is a unique feature of ANCOM-BC2.

Driven by observed shortcomings of ANCOM-BC in specific edge cases, highlighted in our work and recent literature, we tailored our simulation study to evaluate ANCOM-BC2’s performance in these scenarios as well. The results of our simulation study demonstrate that ANCOM-BC2 provides a better FDR control over competing methods tested here while maintaining high power. In particular, ANCOM-BC2 (SS filter) consistently controlled the FDR or mdFDR below the nominal level in all simulation settings considered in this Article while maintaining high power. By contrast, ANCOM-BC2 (no filter) emerged as the DA method with the highest power, displaying a smaller FDR or mdFDR when compared with competing methods other than ANCOM-BC2 (SS filter). According to the FAP score introduced in this Article, ANCOM-BC2 (SS filter) and ANCOM-BC2 (no filter) had stochastically larger FAP scores than competitors with ANCOM-BC2 (SS filter) having the highest score. In terms of practical application, we endorse the use of ANCOM-BC2 (no filter) for small to moderate sample sizes (for example, n ≤ 50) when repeated measurements are absent. For larger sample sizes (for example, n > 50) or in cases of repeated measures, ANCOM-BC2 (SS filter) is recommended due to its superior FDR control. In pattern analyses, both ANCOM-BC2 (no filter) and ANCOM-BC2 (SS filter) perform equally well in terms of FDR control within the nominal level, although ANCOM-BC2 (no filter) demonstrates a marginally superior power.

The power of ANCOM-BC2’s pattern analysis was demonstrated in the soil microbiome data analyzed in this Article. When standard pairwise analyses were performed, only Pseudonocardia was differentially abundant across different groups (data not shown). However, using the pattern analysis, we discovered several taxa display increasing or decreasing trends over the ordered soil aridity groups. This is because, unlike pairwise comparisons, pattern analysis uses constrained inference methods, which ‘borrow’ information from ordered groups, thus increasing the effective sample size and the power7,43,44.

The ileocolonic section and colectomy are procedures that surgically remove different regions of the intestines, and yet based on our analysis of the IBD data, there were no significant differences in the absolute abundance of most of the gut bacteria in these two groups. Furthermore, the two groups of patients have similarly reduced absolute abundances of certain bacteria relative to those who did not undergo either of the two surgeries. Based on these findings, it may be reasonable to hypothesize that most species of gut microbiota are spatially uniformly distributed in the ileum and large intestines.

Methods

Notation

Notations used in the ANCOM-BC2 methodology are summarized in Table 1. The overall procedure of the ANCOM-BC2 methodology is summarized in Extended Data Fig. 6.

Table 1 Summary of notation

ANCOM-BC2 for fixed-effects models

Model assumptions

Assumption 1

Multiplicative model for observed counts:

$${O}_{ij}={S}_{i}{C}_{j}{A}_{ij}{E}_{ij}.$$

Assumption 1 indicates that, in expectation, the observed counts of a taxon in a random sample is in constant proportion to the true absolute abundance in a unit volume of the ecosystem of the sample. This proportion can be decomposed into two parts: (1) sample-specific sampling fraction and (2) taxon-specific sequencing efficiency.

According to Assumption 1, for nonzero observed count, the above multiplicative model can be transformed into an additive model by log transformation

$${o}_{ij}={s}_{i}+{c}_{j}+{a}_{ij}+{e}_{ij}^{(o)}.$$

Assumption 2

Linear model for log true absolute abundances: for each taxon j, aij, i = 1, …, n are independently distributed, and

$${a}_{ij}={{{{{\bf{b}}}}}_{{{{{j}}}}}}^{T}{{{{\bf{x}}}}}_{{{{{i}}}}}+{{e}}_{ij}^{(a)},$$

where

  1. (1)

    \({{{{\bf{x}}}}}_{{{i}}}={(1,{x}_{i1},{x}_{i2},\ldots ,{x}_{ip})}^{T}\) are the covariates of interest (including the intercept) for the ith sample,

  2. (2)

    \({{{{\bf{b}}}}}_{{{{{j}}}}}={({b}_{j0},{b}_{j1},{b}_{j2},\ldots ,{b}_{jp})}^{T}\) are the corresponding coefficients for xi.

  3. (3)

    \({e}_{ij}^{(a)},i=1,\ldots ,n\) are independently distributed random errors for log true absolute abundances with \(E({e}_{ij}^{(a)})=0,{\mathrm{Var}}({e}_{ij}^{(a)})={\sigma }_{jj}^{(a)}\).

Assumption 3

(Independent random error for log observed counts): assume there are random errors, \({e}_{ij}^{(o)},i=1,\ldots ,n,j=1,\ldots ,d\), for log observed counts oij, which are independently distributed with heteroskedasticity:

$$E(e_{ij}^{(o)}) = 0, \, {\mathrm{Var}}(e_{ij}^{(o)}) = \sigma_{ij}^{(o)}, \, e_{ij}^{(o)} {\perp\!\!\!\perp} e_{ij}^{(a)}.$$

Regression framework

Based on the Assumptions 2 and 3, oij can be modeled as:

$${o}_{ij}={s}_{i}+{c}_{j}+{{{{{\bf{b}}}}}_{{{{{j}}}}}}^{T}{{{{\bf{x}}}}}_{{{{{i}}}}}+{e}_{ij}^{(a)}+{e}_{ij}^{(o)}:= {s}_{i}+{c}_{j}+{{{{{\bf{b}}}}}_{{{{{j}}}}}}^{T}{{{{\bf{x}}}}}_{{{{{i}}}}}+{e}_{ij},$$
(1)

with

$$E({o}_{ij})={s}_{i}+{c}_{j}+{{{{{\bf{b}}}}}_{{{{{j}}}}}}^{T}{{{{\bf{x}}}}}_{{{{{i}}}}},\,{\mathrm{Var}}({o}_{ij})={\mathrm{Var}}({e}_{ij})={\sigma }_{jj}^{(a)}+{\sigma }_{ij}^{(o)}:= {\sigma }_{ij}^{(t)}.$$

where \({\sigma }_{ij}^{(t)}\) denotes the total variance.

Equation (1) can also be written in a vector notation as follows:

$${{{{\bf{o}}}}}_{{{{{j}}}}}={{{\bf{s}}}}+{\bf{c}}_{j}{{{\it{1}}}}+{{{{X}}}}{{{\,{\bf{b}}}}}_{{{{{j}}}}}+{{{{\bf{e}}}}}_{{{{{j}}}}},$$
(2)

with

$$\begin{array}{ll}&E({{{{\bf{e}}}}}_{{{{{j}}}}})={(0,\ldots ,0)}^{T},\\ &E({{{{\bf{o}}}}}_{{{{{j}}}}})={{{\bf{s}}}}+{\bf{c}}_{j}{{{\it{1}}}}+{{{{X}}}}{{{\,{\bf{b}}}}}_{{{{{j}}}}},\\ &{\mathrm{Cov}}({{{{\bf{o}}}}}_{{{{{j}}}}})=\left[\begin{array}{llll}{\sigma }_{1j}^{(t)}&0&\ldots &0\\ 0&{\sigma }_{2j}^{(t)}&\ldots &0\\ \vdots &\vdots &\ddots &\vdots \\ 0&0&\ldots &{\sigma }_{nj}^{(t)}\end{array}\right].\end{array}$$

where

  1. (1)

    1 = (1, 1, …, 1)T,

  2. (2)

    \({{{{\bf{o}}}}}_{{{{{j}}}}}={({o}_{1j},{o}_{2j},\ldots ,{o}_{nj})}^{T}\),

  3. (3)

    \({{{\bf{s}}}}={({s}_{1},{s}_{2},\ldots ,{s}_{n})}^{T}\),

  4. (4)

    \({{{{\bf{b}}}}}_{{{{{j}}}}}={({b}_{j0},{b}_{j1},{b}_{j2},\ldots ,{b}_{jp})}^{T}\),

  5. (5)

    \({{{{\bf{e}}}}}_{{{{{j}}}}}={({e}_{1j},{e}_{2j},\ldots ,{e}_{nj})}^{T}\),

  6. (6)

    \({{{{X}}}}=\left[\begin{array}{ccccc}1&{x}_{11}&{x}_{12}&\ldots &{x}_{1p}\\ 1&{x}_{21}&{x}_{22}&\ldots &{x}_{2p}\\ \vdots &\vdots &\vdots &\ddots &\vdots \\ 1&{x}_{n1}&{x}_{n2}&\ldots &{x}_{np}\end{array}\right]\).

It is important to note that within each sample i, for taxa l ≠ m, oil and oim are not necessarily independent due to correlations between ail and aim. Thus vectors ol and om are not independent random vectors.

Remove the effect of taxon-specific sequencing efficiency

To eliminate the effect of cj, we first center the log observed counts across samples, that is

$$\begin{array}{rc}{y}_{ij}:= {o}_{ij}-{\bar{o}}_{\cdot j}&=({s}_{i}-\bar{s})+{{{{{\bf{b}}}}}_{{{{{j}}}}}}^{T}({{{{\bf{x}}}}}_{{{{{i}}}}}-\bar{{{{\bf{x}}}}})+({e}_{ij}-{\bar{e}}_{\cdot j}),\\ &:= {\theta}_{i}+{{{{\bf{\upbeta }}}}}_{j}^{T}{{{{\bf{x}}}}}_{{{{{i}}}}}+{\epsilon }_{ij},\end{array}$$
(3)

where

  1. (1)

    βjk = bjk for k = 1, …, p, and \({\beta }_{j0}={{{{{\bf{b}}}}}_{{{{{j}}}}}}^{T}\bar{{{{\bf{x}}}}}\),

  2. (2)

    \({\mathrm{Var}}({\epsilon }_{ij})=\frac{{(n-1)}^{2}}{{n}^{2}}{\sigma }_{ij}^{(t)}+\frac{1}{{n}^{2}}{\sum }_{{i}^{{\prime} }\ne i}{\sigma }_{{i}^{{\prime} }j}^{(t)}:= {\sigma }_{ij}\).

Estimation of sample-specific bias

As can be seen from equation (3), βj are not identifiable without determining the nuisance parameter θi. We define bias-corrected log absolute abundance \({y}_{ij}^{({\mathrm{crt}})}={y}_{ij}-{\theta }_{i}\), then the ordinary least squares estimators of θi and βj can be obtained by iteratively solving the following equations. For ease of exposition, the algorithm is described in the vector form, that is \({{{{\bf{y}}}}}_{{{{{j}}}}}={({y}_{1j},{y}_{2j},\ldots ,{y}_{nj})}^{T},{{{\bf{\uptheta }}}}={({{{\theta }}}_{1},{\theta }_{2},\ldots ,{\theta }_{n})}^{T}\) and so on.

Algorithm 1. Iterative maximum likelihood estimation

Initialize:

  For j = 1, …, d

  θ ← 0

  \({{{{\bf{y}}}}}_{{{{{j}}}}}^{({{{\mathrm{crt}}}})}\leftarrow {{{{\bf{y}}}}}_{{{{{j}}}}}-{{{\bf{\uptheta }}}}={{{{\bf{y}}}}}_{{{{{j}}}}}\)

  \({{{{\mathbf{\upbeta }}}}}_{j}\leftarrow {({{{{{X}}}}}^{T}{{{{X}}}})}^{-1}{{{{{X}}}}}^{T}{{{{\bf{y}}}}}_{{{{{j}}}}}^{({{{\mathrm{crt}}}})}={({{{{{X}}}}}^{T}{{{{X}}}}\;)}^{-1}{{{{{X}}}}}^{T}{{{{\bf{y}}}}}_{{{{{j}}}}}\)

While not converge do

  \({\mathbf {{{\uptheta }}}}\leftarrow \frac{1}{d}\mathop{\sum }\nolimits_{j = 1}^{d}({{{{\bf{y}}}}}_{{{{{j}}}}}-{{{{X}}}}{{{{\mathbf{\upbeta }}}}}_{j})\)

  \({{{{\bf{y}}}}}_{{{{{j}}}}}^{({{{\mathrm{crt}}}})}\leftarrow {{{{\bf{y}}}}}_{{{{{j}}}}}-{\bf {{{\uptheta }}}}\)

  \({\bf {{{{\upbeta }}}}}_{j}\leftarrow {({{{{{X}}}}}^{T}{{{{X}}}})}^{-1}{{{{{X}}}}}^{T}{{{{\bf{y}}}}}_{{{{{j}}}}}^{({{{\mathrm{crt}}}})}\)

end while

On convergence,

$${{{{\bf {\uptheta }}}}}^{* }=\frac{1}{d}\mathop{\sum }\limits_{j=1}^{d}({{{{\bf{y}}}}}_{{{{{j}}}}}-{{{{X}}}}{{{{\mathbf{\upbeta }}}}}_{j}^{* }),\,{{{{{\bf{y}}}}}_{{{{{j}}}}}^{({{{\mathrm{crt}}}})}}^{* }={{{{\bf{y}}}}}_{{{{{j}}}}}-{{{{\bf {\uptheta }}}}}^{* },\,{{{{\mathbf{\upbeta }}}}}_{j}^{* }={({{{{{X}}}}}^{T}{{{{X}}}})}^{-1}{{{{{X}}}}}^{T}{{{{{\bf{y}}}}}_{{{{{j}}}}}^{({{{\mathrm{crt}}}})}}^{* }.$$
(4)

Therefore

$$\begin{array}{ll}{{{{\bf {\uptheta }}}}}^{* }&=\frac{1}{d}\mathop{\sum }\limits_{j=1}^{d}({{{{\bf{y}}}}}_{{{{{j}}}}}-{{{{X}}}}{{{{\mathbf{\upbeta }}}}}_{j}^{* })=\frac{1}{d}\mathop{\sum }\limits_{j=1}^{d}({{{{\bf{y}}}}}_{{{{{j}}}}}-{{{{P}}}}{{{{\,{\bf{y}}}}}_{{{{{j}}}}}^{({{{\mathrm{crt}}}})}}^{* })\\ &=\frac{1}{d}\mathop{\sum }\limits_{j=1}^{d}({{{{\bf{y}}}}}_{{{{{j}}}}}-{{{{P}}}}{{{\,{\bf{y}}}}}_{{{{{j}}}}}+{{{{P}}}}{{{{\bf {\uptheta }}}}}^{* })=\frac{1}{d}\mathop{\sum }\limits_{j=1}^{d}\left[{{{{\bf{y}}}}}_{{{{{j}}}}}^{({{{\mathrm{crt}}}})}+{{{\bf {\uptheta }}}}-{{{{P}}}}({{{{\bf{y}}}}}_{{{{{j}}}}}^{({{{\mathrm{crt}}}})}+{{{\bf {\uptheta }}}})+{{{{P}}}}{{{{\bf {\uptheta }}}}}^{* }\right]\\ &=({{{{I}}}}-{{{{P}}}}){{{\bf {\uptheta }}}}+{{{{P}}}}{{{{\bf{\uptheta }}}}}^{* }+\frac{1}{d}\mathop{\sum }\limits_{j=1}^{d}({{{{I}}}}-{{{{P}}}}){{{{\bf{y}}}}}_{{{{{j}}}}}^{({{{\mathrm{crt}}}})}\\ &=({{{{I}}}}-{{{{P}}}}){{{\bf {\uptheta }}}}+{{{{P}}}}{{{{\bf {\uptheta }}}}}^{* }+\frac{1}{d}\mathop{\sum }\limits_{j=1}^{d}{{{{\mathbf{\upepsilon }}}}}_{j},\end{array}$$
(5)

where

  1. (1)

    \({{{{P}}}}={{{{X}}}}{({{{{{X}}}}}^{T}{{{{X}}}})}^{-1}{{{{{X}}}}}^{T}\) is the projection matrix onto \({{{\mathscr{C}}}}({{{{X}}}})\), the column space of X,

  2. (2)

    \({{{{\bf {\upepsilon }}}}}_{j}=({{{{I}}}}-{{{{P}}}}){{{{\bf{y}}}}}_{{{{{j}}}}}^{({{{\mathrm{crt}}}})}\) with E(εj) = 0.

Rearranging equation (5), we see that

$$({{{{I}}}}-{{{{P}}}}){{{{\bf {\uptheta }}}}}^{* }=({{{{I}}}}-{{{{P}}}}){{{\bf {\uptheta }}}}+\frac{1}{d}\mathop{\sum }\limits_{j=1}^{d}{{{{\bf {\upepsilon }}}}}_{j}.$$

Taking expectations on both sides leads to

$$({{{{I}}}}-{{{{P}}}})[E({{{{\bf {\uptheta }}}}}^{* })-{{{\bf {\uptheta }}}}]={{{\bf {0}}}}.$$

As I − P is an orthogonal projector onto \({{{\mathscr{C}}}}({{{{X}}}})\), the above equation holds as long as either of the following is valid:

  1. (1)

    E(θ*) − θ = 0,

  2. (2)

    \(E({{{{\bf {\uptheta }}}}}^{* })-{{{\bf {\uptheta }}}}\in {{{\mathscr{C}}}}({{{{X}}}})\).

It is sufficient to consider (2) because (1) is the trivial case. If (1) were true then from (4) we deduce that there is no sample-specific effect and that \(E(\;{{{{\bf {\upbeta }}}}}_{j}^{* })={{{{\bf {\upbeta }}}}}_{j}\). Suppose (2) is true, then there exists a vector \({{{\mathbf{\updelta }}}}\ne {{{{0}}}}\in {{\mathbb{R}}}^{p}\), such that

$$E({{{{\bf {\uptheta }}}}}^{* })={{{\bf {\uptheta }}}}-{{{{X}}}}{{{\bf {\updelta }}}}.$$
(6)

Then by combining with equation (4), we have

$$E(\;{{{{\bf {\upbeta }}}}}_{j}^{* })={{{\bf {\updelta }}}}+{{{{\bf {\upbeta }}}}}_{j}.$$
(7)

We shall denote θ* and \({{{{\bf {\upbeta }}}}}_{j}^{* }\) obtained from the above iterative algorithm as preliminary estimators of θ and βj, respectively. Without loss of generality, throughout this Article we assume XTX is a full rank matrix. If it is not a full rank matrix, then we may use any generalized inverse of XTX because \({{{{X}}}}{{{{{{\bf {\upbeta }}}}}}}_{j}^{* }\) in equation (5) is invariant of the choice of generalized inverse \({({{{{{X}}}}}^{T}{{{{X}}}}\;)}^{g}\) used in \({{{{\bf{\upbeta }}}}}_{j}^{* }={({{{{{X}}}}}^{T}{{{{X}}}}\;)}^{g}{{{{{X}}}}}^{T}{{{{\bf{y}}}}}_{{{{{j}}}}}^{({{{\mathrm{crt}}}})}\). Thus the preliminary estimator θ* provided above is invariant of the choice of generalized inverse used in deriving \({{{{\bf{\upbeta }}}}}_{j}^{* }\). Furthermore, throughout this Article, we are interested in testing a hypothesis regarding linearly estimable parameters Aβj, that is \({{{\mathscr{C}}}}({{{{{A}}}}}^{T})\subset {{{\mathscr{C}}}}({{{{{X}}}}}^{T})\) (ref. 45). Consequently, the estimator \({{{{A}}}}{{{{\bf{\upbeta }}}}}_{j}^{* }\) is invariant of the generalized inverse used in the estimation of \({{{{\bf{\upbeta }}}}}_{j}^{* }\). Hence, throughout this text, for simplicity of exposition, we shall assume XTX is of full rank.

For each taxon j = 1, …, d, by equation (7), \({{{{\bf{\upbeta }}}}}_{j}^{* }\) is a biased estimator if δ ≠ 0. Suppose we wish to test the following hypothesis

$$\begin{array}{ll}&{H}_{0}:{{{{A}}}}{{{{\bf {\upbeta }}}}}_{j}={{{{A}}}}{{{{\bf{\upbeta }}}}}_{j}^{0},\\ &{H}_{1}:{{{{A}}}}{{{{\bf{\upbeta }}}}}_{j}\ne {{{{A}}}}{{{{\bf{\upbeta }}}}}_{j}^{0}.\end{array}$$

Under the null hypothesis, \(E({{{{A}}}}{{{{\bf{\upbeta }}}}}_{j}^{* })-{{{{A}}}}{{{{\bf{\upbeta }}}}}_{j}^{0}={{{{A}}}}{{{\bf{\updelta }}}}\ne {{{\bf{0}}}}\) and hence biased. The next step is to estimate this bias δ and accordingly modify the estimator \({{{{A}}}}{{{{\bf{\upbeta }}}}}_{j}^{* }\) so that the resulting estimator is asymptotically centered at \({{{{A}}}}{{{{\bf{\upbeta }}}}}_{j}^{0}\) under the null hypothesis and hence the test statistic is asymptotically centered at zero.

First we make the following observations. As \(E(\;{{{{\bf{\upbeta }}}}}_{j}^{* })={{{\bf{\updelta }}}}+{{{{\bf{\upbeta }}}}}_{j}\), we note that as n → , for finite dimension d,

$${{{{{{\varSigma}}}}}_{j}}^{-\frac{1}{2}}(\;{{{{\bf{\upbeta }}}}}_{j}^{* }-({{{\bf{\updelta }}}}+{{{{\bf{\upbeta }}}}}_{j})){\to }_{d}{N}_{P}({{{\bf{0}}}},{{{{I}}}}\;),$$
(8)

where

$${{{{{\Sigma }}}}}_{j}=\mathop{\lim }\limits_{n\to \infty }{({{{{{X}}}}}^{T}{{{{X}}}})}^{-1}(\mathop{\sum }\limits_{i=1}^{n}{\sigma }_{ij}^{2}{{{{{x}}}}}_{{{{{i}}}}}{{{{{{x}}}}}_{{{{{i}}}}}}^{T}){({{{{{X}}}}}^{T}{{{{X}}}}\;)}^{-1}.$$
(9)

As

$$E({{{{\bf{\uptheta }}}}}^{* }+{{{{X}}}}{{{{\bf{\upbeta }}}}}_{j}^{* })={{{\bf{\uptheta }}}}-{{{{X}}}}{{{\bf{\updelta }}}}+{{{{X}}}}({{{\bf{\updelta }}}}+{{{{\bf{\upbeta }}}}}_{j})={{{\bf{\uptheta }}}}+{{{{X}}}}{{{{\bf{\upbeta }}}}}_{j},$$

that is \({{{{\bf{\uptheta }}}}}^{* }+{{{{X}}}}{{{{\bf{\upbeta }}}}}_{j}^{* }\) is an unbiased estimator of θ + Xβj, hence a possible estimator of Σj is given by

$${\hat{\Sigma}}_{j}={({X}^{T}{X}\,)}^{-1}\left(\mathop{\sum}\limits_{i=1}^{n}{(\;{y}_{ij}-{\theta}_{i}^{*}-{{\bf{\upbeta}}_{j}^{*}}^{T}{x}_{i})}^{2}{x}_{i}{{x}_{i}}^{T}\right){({X}^{T}{X})}^{-1}.$$
(10)

Under some mild regularity conditions46, with finite d, we have the following consistency result

$$n({\hat{\varSigma}}_{j}-{{\varSigma}}_{j}){\to}_{P}{0},\,{\rm{as}}\,n\to \infty.$$
(11)

Therefore, replacing Σj with \({\hat{\varSigma}}_{j}\) in equation (8) and appealing to Slutsky’s theorem, we have

$${{\hat{\varSigma}}_{j}}^{-\frac{1}{2}}(\;{{{{\bf{\upbeta }}}}}_{j}^{* }-({{{\bf{\updelta }}}}+{{{{\bf{\upbeta }}}}}_{j})){\to }_{d}{N}_{P}({{{\bf{0}}}},{{{{I}}}}\;),\,{{{\rm{as}}}}\,n\to \infty.$$

By equations (9) and (11), under some mild regularity conditions, for finite d, we obtain

$${\hat{\varSigma}}_{j}{\to }_{p}{{{{0}}}},\,{{{\rm{as}}}}\,n\to \infty.$$

Consequently,

$${{{{\bf{\upbeta }}}}}_{j}^{* }{\to }_{P}{{{\bf{\updelta }}}}+{{{{\bf{\upbeta }}}}}_{j},{{{\rm{as}}}}\,n\,\to \infty.$$
(12)

The above observation regarding the convergence of \({{{{\bf{\upbeta }}}}}_{j}^{* }\) plays a critical role in the following. Since the sampling fraction is constant for all taxa within a sample, we pool information across taxa within each sample when estimating δ. We model each taxon abundance using the following Gaussian mixture model. For the jth taxon and the kth covariate, let C0 denote the set of taxa that are not differentially abundant with respect to xik, that is, C0 = {j (1, 2, …, d): βjk = 0}; let C1 denote the set of taxa whose abundance decreases with xik, that is, C1 = {j (1, 2, …, d): βjk < 0}, and let C2 denote the set of taxa whose abundance increases with xik, that is, C2 = {j (1, 2, …, d): βjk > 0}. Let πr denote the probability that a taxon belongs to set Cr, r = 0, 1, 2. For simplicity of estimation of parameters, similar to generalized estimating equations, we shall assume that \({\beta }_{jk}^{* },j=1,2,\ldots ,d\), are independently distributed. As commonly done in the analyses of various omics data, we ignore the underlying correlation structure when estimating δ. Thus, we model the distribution of \({\beta }_{jk}^{* }\) by Gaussian mixture model as follows:

$$f({\beta }_{jk}^{* })={\pi }_{0}\phi \left(\frac{{\beta }_{jk}^{* }-{\delta }_{k}}{{\nu }_{j0}}\right)+{\pi }_{1}\phi \left(\frac{{\beta }_{jk}^{* }-({\delta }_{k}+{l}_{1})}{{\nu }_{j1}}\right)+{\pi }_{2}\phi \left(\frac{{\beta }_{jk}^{* }-({\delta }_{k}+{l}_{2})}{{\nu }_{j2}}\right),$$
(13)

where

  1. (1)

    ϕ is the standard normal density function,

  2. (2)

    δk, δk + l1 and δk + l2 are means for \({\beta }_{jk}^{* }| {C}_{0},{\beta }_{jk}^{* }| {C}_{1}\) and \({\beta }_{jk}^{* }| {C}_{2}\), respectively. l1 < 0, l2 > 0,

  3. (3)

    νj0, νj1 and νj2 are variances of \({\beta }_{jk}^{* }| {C}_{0},{\beta }_{jk}^{* }| {C}_{1}\) and \({\beta }_{jk}^{* }| {C}_{2}\), respectively.

Note that instead of fitting a multivariate Gaussian mixture model for all covariates together, we choose to fit a univariate Gaussian mixture model repeatedly for every single covariate. This repetition is simply because the sets of taxa {C0, C1, C2} are not necessarily the same for different covariates. Also, note that for a categorical covariate of s + 1 levels, this contains s coefficients, for example βj1, …, βjs, and we shall fit the Gaussian mixture model for these s coefficients separately.

For computational simplicity, we assume that νj1 > νj0, νj2 > νj0. Thus, without loss of generality for κ1, κ2 > 0, let νj1 = νj0 + κ1 and νj2 = νj0 + κ2. While this assumption is not a requirement for our method, it is reasonable to assume that variability among differentially abundant taxa is larger than that among the null taxa. By making this assumption, we simplify the computation.

Assuming samples are independent, we begin by first estimating \({\nu }_{j0}^{2}={\mathrm{Var}}(\;{\beta }_{jk}^{* })\). Note that \({\nu }_{j0}^{2}\) is the function of heteroscedastic variances, a consistent estimator of \({\nu }_{j0}^{2}\), which we refer to as \({\hat{\nu }}_{j0}^{2}\), is the kth diagonal element of \({\hat{\varSigma}}_{j}\) stated in equation (10). In all future calculations, we plug in \({\hat{\nu }}_{j0}^{2}\) for \({\nu }_{j0}^{2}\). This is similar in spirit to many statistical procedures involving nuisance parameters. The following lemma47 is useful in the sequel.

Lemma 1

Introducing the latent variable in calculating log-likelihood:

$$\log f(x| \theta )={E}_{f(z| x,\theta )}[\log f(z| \theta )+\log f(x| z,\theta )].$$

Let \({{{\it{\Theta }}}}={({\delta }_{k},{\pi }_{1},{\pi }_{2},{\pi }_{3},{l}_{1},{l}_{2},{\kappa }_{1},{\kappa }_{2})}^{T}\) denote the set of unknown parameters, then for each taxon the log-likelihood can be reformulated using Lemma 1, as follows:

$${{{{\Theta }}}}\leftarrow \arg \mathop{\max }\limits_{{{{{\Theta }}}}}\mathop{\sum }\limits_{j=1}^{d}\mathop{\sum }\limits_{r=0}^{2}{P}_{r,\,j}[\log {\mathrm{Pr}}(\;j\in {C}_{r})+\log f(\;{\beta }_{jk}|\; j\in {C}_{r})].$$
(14)

Then the EM algorithm is described as follows:

  • E step: compute conditional probabilities of latent variables. Define \({P}_{r,\,j}={\mathrm{Pr}}(\;j\in {C}_{r}| {\beta }_{jk},{{{\it{\Theta }}}})=\frac{{\pi }_{r}\phi (\frac{{\beta }_{jk}-({\delta }_{k}+{l}_{r})}{{\nu }_{jr}})}{{\sum }_{r}{\pi }_{r}\phi (\frac{{\beta }_{jk}-({\delta }_{k}+{l}_{r})}{{\nu }_{jr}})},r=0,1,2;j=1,\ldots ,d\), which are conditional probabilities representing the probability that an observed value follows each distribution. Note that l0 = 0.

  • M step: maximize the likelihood function with respect to the parameters, given the conditional probabilities.

We shall denote the resulting estimator of δk on convergence of the algorithm by \({\hat{\delta }}_{k}^{\mathrm{{EM}}}\).

As stated in Lin and Peddada3, compared to \({\hat{\nu }}_{j0}^{2}\), the variance and covariance contributed by \({\hat{\delta }}_{k}^{{\mathrm{EM}}}\) is negligible when the number of nondifferentially abundant taxa is large, such as when analyzing the microbiome data at the OTU, amplicon sequence variant (ASV) or species level of the phylogenetic tree.

The above procedure is applied to every βjk, k = 1, …, p, eventually, we obtain the estimator of δ as

$${\hat{\bf{\updelta }}}^{{\mathrm{EM}}}={({\hat{\delta }}_{1}^{{\mathrm{EM}}},{\hat{\delta }}_{2}^{{\mathrm{EM}}},\ldots ,{\hat{\delta }}_{P}^{{\mathrm{EM}}})}^{T}.$$
(15)

Therefore, the final estimator of βj is defined as

$${\hat{\bf {\upbeta }}}_{j}={{{{{{\bf{\upbeta }}}}}}}_{j}^{* }-{\hat{\bf{\updelta }}}^{{\mathrm{EM}}},$$
(16)

with

$${\hat{\bf{\upbeta }}}_{j}{\to }_{P}{{{{\bf{\upbeta }}}}}_{j},\,{{{\rm{as}}}}\,n\to \infty ,$$
(17)

given that \({\hat{\bf{\updelta} }}^{{\mathrm{EM}}}\) is a good approximation of δ.

The estimation procedure is summarized in Algorithm 2.

Algorithm 2. EM algorithm

(1) input:

  \({{{{\bf{\upbeta }}}}}_{j}^{* },{{{{{\varSigma}}}}}_{j},j=1,\ldots ,d\)

(2) procedure EM \(({{{{\bf{\upbeta }}}}}_{j}^{* },{{{{{\varSigma}}}}}_{j})\)

(3)  return \({\hat{\bf{\updelta} }}_{k}^{{\mathrm{EM}}},k=1,\ldots ,P\)

(4) end procedure

(5) for k = 1, …, p do

(6)  \({\hat{\beta }}_{jk}\leftarrow {\beta }_{jk}^{* }-{\hat{\delta }}_{k}^{{\mathrm{EM}}}\)

(7) end for

For taxon j, we now describe our methodology for testing the following hypotheses

$$\begin{array}{rc}&{H}_{0}:{{{{A}}}}{{{{\bf{\upbeta }}}}}_{j}={{{{A}}}}{{{{\bf{\upbeta }}}}}_{j}^{0},\\ &{H}_{1}:{{{{A}}}}{{{{\bf{\upbeta }}}}}_{j}\ne {{{{A}}}}{{{{\bf{\upbeta }}}}}_{j}^{0}.\end{array}$$

From Slutsky’s theorem, as n → , the following test statistic is approximately central chi-square distributed under the null hypothesis

$$\begin{array}{l}{W}_{j}={({{{{A}}}}{\hat{\bf{\upbeta} }}_{j}-{{{{A}}}}{{{{\bf{\upbeta }}}}}_{j}^{0})}^{T}{({{{{A}}}}{\hat{\varSigma}}_{j}{{{{{A}}}}}^{T})}^{-1}({{{{A}}}}{\hat{\bf{\upbeta} }}_{j}-{{{{A}}}}{{{{\bf{\upbeta }}}}}_{j}^{0})\\\quad={({{{{A}}}}{{{{\bf{\upbeta }}}}}_{j}^{* }-{{{{A}}}}{\hat{\bf{\updelta} }}^{\mathrm{EM}}-{{{{A}}}}{{{{\bf{\upbeta }}}}}_{j}^{0})}^{T}{({{{{A}}}}{\hat{\varSigma}}_{j}{{{{{A}}}}}^{T})}^{-1}({{{{A}}}}{{{{\bf{\upbeta }}}}}_{j}^{* }-{{{{A}}}}{\hat{\bf{\updelta} }}^{\mathrm{EM}}-{{{{A}}}}{{{{\bf{\upbeta }}}}}_{j}^{0})\\\quad{\to }_{d}{\chi }_{q}^{2},\end{array}$$

where q = rank(A).

To control the FDR due to multiple testing, we recommend applying Holm–Bonferroni method20 instead of Benjamini–Hochberg procedure6 because the Holm–Bonferroni method does not require any assumptions regarding the dependence structure in the underlying P values, and is also known to be a better method to control FDR when P values are not accurate21.

Sample-specific biases estimation

After obtaining \({\hat{\bf{\updelta} }}^{{\mathrm{EM}}}\), the estimator of sample-specific biases θ is defined as follows:

$${\hat{\bf{\uptheta} }}=\frac{1}{d}\mathop{\sum }\limits_{j=1}^{d}({{{{\bf{y}}}}}_{{{{{j}}}}}-{{{{X}}}}{\hat{\bf{\upbeta} }}_{j}).$$
(18)

Let \({{{{{\varSigma}}}}}^{(i)}={[{\sigma }_{lm}^{(i)}]}_{l,m = 1,\ldots ,d}\) denote the d × d covariance matrix of \({{{{\bf{\upepsilon }}}^{(i)}}}={({\epsilon }_{1}^{(i)},{\epsilon }_{2}^{(i)},\ldots ,{\epsilon }_{d}^{(i)})}{\,}^{T}\), where \({\sigma }_{lm}^{(i)}\) is the (l, m)th element of Σ(i) and \({\sigma }_{jj}^{(i)}\) is the jth diagonal element of Σ(i). Furthermore, suppose

From Assumption 4, we have

$$0\le {{{{\bf{1}}}}}^{T}{{{{{\varSigma}}}}}^{(i)}{{{\bf{1}}}}=\mathop{\sum }\limits_{l=1}^{d}\mathop{\sum }\limits_{m=1}^{d}{\sigma }_{lm}^{(i)}=\mathop{\sum }\limits_{j=1}^{d}{\sigma }_{jj}^{(i)}+\mathop{\sum }\limits_{l\ne m}^{d}{\sigma }_{lm}^{(i)}\le dK+\mathop{\sum }\limits_{l\ne m}^{d}{\sigma }_{lm}^{(i)}.$$

Hence

$$0\le \frac{{{{{\bf{1}}}}}^{T}{{{{{\varSigma}}}}}^{(i)}{{{\bf{1}}}}}{{d}^{2}}\le \frac{K}{d}+\frac{\mathop{\sum }\nolimits_{l\ne m}^{d}{\sigma }_{lm}^{(i)}}{{d}^{2}}=o(1).$$

Thus, for each taxon j = 1, 2, …, d, we have

$$\frac{1}{d}\mathop{\sum }\limits_{j=1}^{d}({{{{\bf{y}}}}}_{{{{{j}}}}}-({{{\bf{\uptheta }}}}+{{{{X}}}}{{{{\bf{\upbeta }}}}}_{j})){\to }_{P}\bf{0},\,{{{\rm{as}}}}\,d\to \infty.$$
(19)

Therefore, according to equations (17) and (19), as both n, d → ,

$${\hat{\bf{\uptheta} }}\to \bf{\uptheta}.$$
(20)

Assumption 4

Sparse correlations among taxa:

$$\begin{array}{ll}&{\sigma }_{jj}^{(i)} < K < \infty ,\\ &\frac{\mathop{\sum }\nolimits_{l\ne m}^{d}{\sigma }_{lm}^{(i)}}{{d}^{2}}=o(1).\end{array}$$

Remark 1

Regularization of variance: to avoid the spurious detection of significance due to extremely small standard errors, particularly for rare taxa, we incorporated a small positive constant in the denominator of the ANCOM-BC2 test statistic for each taxon. This approach was inspired by the significance analysis of microarray methodology15. Specifically, the regularization factor was set as the fifth percentile of the distribution of standard errors for each fixed effect, unless otherwise specified.

Remark 2

Sensitivity analysis for the pseudo-count addition: to mitigate the risk of inflated false-positive rates resulting from the choice of pseudo-count in ANCOM-BC2, we conducted a sensitivity analysis to assess the impact of varying pseudo-count values on DA results. This is particularly important, as several studies have shown that the choice of pseudo-count can significantly influence the results of DA analysis methods16,17. For details regarding the sensitivity analysis and the definitions of the two version of ANCOM-BC2, refer to the section ‘Strategies implemented in ANCOM-BC2 to handle zeros’ below.

Multigroup comparison

In some applications, for a given taxon, researchers are interested in drawing inferences regarding DA among different pairs of experimental groups. We refer to this kind of problem as a multigroup comparison problem, and extra caution needs to be exercised to correct P values due to multiple comparisons. For simplicity, we drop the subscript j (taxon index) in the following discussions.

Global test

For a given taxon and a total of g + 1 experimental groups (including the reference group), researchers may want to test whether there exists at least one group that is significantly different from others. For ease of exposition, we split the covariates X into two parts, where X1 stands for the group assignment and X2 denotes the remaining covariates. Note that the difference of group effects against the reference group is estimable, while the individual group effect is not. For simplicity, in the discussions of multigroup comparisons among group 0 to group g, we assume group 0 is the reference group. We use βk, k = 1, …, g to denote the group effect, but notice that it actually estimates βk − β0. We rewrite the model stated in equation (3) as

$${{{\bf{y}}}}={{{\bf{\uptheta }}}}+{{{{{X}}}}}_{{{{{1}}}}}{{{\mathbf{\upbeta }}}}+{{{{{X}}}}}_{{{{{2}}}}}{{{\mathbf{\upgamma }}}}+{{{\bf{\upepsilon }}}},$$
(21)

where

  1. (1)

    θ is the sample-specific bias,

  2. (2)

    β is the vector of group effects (as compared to group 0) of the order g × 1,

  3. (3)

    X1 is the design matrix of the order n × g consisting of 0s and 1s,

  4. (4)

    X2 is the known matrix of other covariates (including the intercept) of the order n × (p − g + 1) with the corresponding regression parameter vector γ of the order (p − g + 1) × 1.

The global test intends to test

$$\begin{array}{ll}&{H}_{0}:{\cap }_{k\in \{1,\ldots ,g\}}{\beta }_{k}=0,\\ &{H}_{1}:{\cup }_{k\in \{1,\ldots ,g\}}{\beta }_{k}\ne 0,\end{array}$$

which can be reformulated as

$$\begin{array}{ll}&{H}_{0}:{{{{A}}}}{{{\bf{\upbeta }}}}={{{\bf{0}}}},\\ &{H}_{1}:{{{{A}}}}{{{\bf{\upbeta }}}}\ne {{{\bf{0}}}},\end{array}$$

where

$${{{{A}}}}={{{{{I}}}}}_{g}=\left[\begin{array}{ccccc}1&0&0&\ldots &0\\ 0&1&0&\ldots &0\\ \vdots &\vdots &\ddots &\vdots &\vdots \\ 0&0&\ldots &0&1\end{array}\right]$$

with the test statistic

$$W={({{{{A}}}}{\hat{\bf{\upbeta} }})}{\,}^{T}{({{{{A}}}}{\hat{\varSigma}}^{(g)}{{{{{A}}}}}^{T})}{\,}^{-1}({{{{A}}}}{\hat{\bf{\upbeta} }}){\to }_{d}{\chi }_{g}^{2},\,{{{\rm{as}}}}\,n\to \,\infty ,$$

where \({\hat{\varSigma}}^{(g)}\) is the corresponding submatrix of \({\hat{\varSigma}}\) defined in equation (10).

Similarly, to control the FDR due to multiple testing, we recommend applying Holm–Bonferroni method20 instead of the Benjamini–Hochberg procedure6 due to the underlying complex dependence structure between taxa.

Example 1

Suppose there are three groups, namely, groups 0 (reference), 1 and 2, and no other covariates. For each sample i, i = 1, …, n, we have:

$${y}_{i}={\theta }_{i}+\mu +{\beta }_{1}I\{{\mathrm{group}}=1\}+{\beta }_{2}I\{{\mathrm{group}}=2\}+{\epsilon }_{i}.$$

To test whether there is at least one group among 0, 1 and 2, that is significantly different from others, we test:

$$\begin{array}{rc}&{H}_{0}:{\beta }_{1}={\beta }_{2}=0,\\ &{H}_{1}:{\beta }_{1}\ne 0\cup {\beta }_{2}\ne 0,\end{array}$$

which is the same as testing:

$$\begin{array}{rc}&{H}_{0}:{{{{A}}}}{{{\bf{\upbeta }}}}=0,\\ &{H}_{1}:{{{{A}}}}{{{\bf{\upbeta }}}}\ne {{{{0}}}},\end{array}$$

where \({{{{A}}}}=\left[\begin{array}{cc}1&0\\ 0&1\end{array}\right]\), and \({{{\bf{\upbeta }}}}={({\beta }_{1},{\beta }_{2})}^{T}\).

Multiple pairwise comparisons

If we are interested in knowing whether the abundance increased or decreased between various pairs of groups, then it amounts to testing the following hypotheses:

$$\begin{array}{rc}&{H}_{0,k,{k}^{{\prime} }}:{\beta }_{k}={\beta }_{{k}^{{\prime} }}\\ &{H}_{1,k,{k}^{{\prime} }}:\{{\beta }_{k} < {\beta }_{{k}^{{\prime} }}\}\cup \{{\beta }_{k} > {\beta }_{{k}^{{\prime} }}\},\end{array}$$

where \(k\ne {k}^{{\prime} }\in \{1,\,\ldots ,\,g\}\). Denote the test statistic for a given pairwise comparison as

$${W}_{k{k}^{{\prime} }}=\frac{{\hat{\beta }}_{k}-{\hat{\beta }}_{{k}^{{\prime} }}}{\sqrt{\widehat{{\mathrm{Var}}}(\;{\hat{\beta }}_{k})+\widehat{{\mathrm{Var}}}(\;{\hat{\beta }}_{{k}^{{\prime} }})}}{\to }_{d}N(0,1),\,{{{\rm{as}}}}\,n\to \infty ,$$

where \(\widehat{{\mathrm{Var}}}(\;{\hat{\beta }}_{k})\), \(\widehat{{\mathrm{Var}}}(\;{\hat{\beta }}_{{k}^{{\prime} }})\) are the kth and \({{k}^{{\prime} }}{{\mathrm{th}}}\) diagonal elements of \({\hat{\varSigma}}^{(g)}\), respectively. Thus, the raw P value for comparing group k and group \({k}^{{\prime} }\) is defined as:

$${P}_{k{k}^{{\prime} }}=2[1-\phi (| {W}_{k{k}^{{\prime} }}| )].$$

For comparing with the reference group (group 0), the hypotheses become:

$$\begin{array}{rc}&{H}_{0,k}:{\beta }_{k}=0\\ &{H}_{1,k}:\{{\beta }_{k} < 0\}\cup \{{\beta }_{k} > 0\}.\end{array}$$

We also replace \({\hat{\beta }}_{{k}^{{\prime} }}\) and \(\widehat{{\mathrm{Var}}}(\;{\hat{\beta }}_{{k}^{{\prime} }})\) with 0s in the test statistic.

Note that the null and alternative hypotheses for the global test are denoted as H0 and H1, a Type I error might occur due to wrongly rejecting H0 or correctly rejecting H0 but wrongly rejecting \({H}_{0,k,{k}^{{\prime} }}\). A directional error might occur due to correctly rejecting H0 but wrong assignment of the direction between βk and \({\beta }_{{k}^{{\prime} }}\) while correctly rejecting \({H}_{0,k,{k}^{{\prime} }}\). In this case, we need to control the error rate combining both type I and the directional errors in the FDR framework, which is referred to as mixed directional FDR (mdFDR)8,9.

Definition 1

mdFDR: let V(j) denote the indicator function of at least one type I error or directional error committed, that is

$$V(\;j)=\left\{\begin{array}{ll}1\quad &{{{\rm{if}}}}\,{{{\rm{Type}}}}\,{{{\rm{I}}}}\,{{{\rm{or}}}}\,{{{\rm{directional}}}}\,{{{\rm{error}}}}\,{{{\rm{occurs}}}},\\ 0\quad &{{{\rm{otherwise.}}}}\end{array}\right.$$

Then, mdFDR is defined as the expected proportion of Type I and directional errors among all discovered taxa.

$${\mathrm{mdFDR}}=E\left(\frac{\mathop{\sum }\nolimits_{j = 1}^{d}V(\;j)}{\max (R,1)}\right),$$

where R denotes the number of taxa discovered.

To control the mdFDR for all pairwise tests, we adopt the general mdFDR controlling procedure9, and do the following:

  1. (1)

    Apply the global test method stated above to obtain the P value for each taxon. We denote these P values as screening P values. Apply the Benjamini–Hochberg procedure to identify taxa that are differentially abundant in at least one pairwise comparison. Let R denote the number of taxa discovered.

  2. (2)

    For each taxon discovered in step (1), apply any mixed directional family wise error controlling procedure, such as Holm–Bonferroni (default), Hochberg and so on, to the pairwise P values (\({P}_{k{k}^{{\prime} }}\)) at level Rα/d.

  3. (3)

    For a given taxon discovered in step 1, if a pairwise hypothesis is rejected in step (2), then we declare \({\beta }_{k} < {\beta }_{{k}^{{\prime} }}\) or \({\beta }_{k} > {\beta }_{{k}^{{\prime} }}\) according to \({W}_{k{k}^{{\prime} }} < 0\) or more than 0.

It has been proved that under the assumption of independence of P values obtained from the global test, the mdFDR of the above procedure is strongly controlled at level α (ref. 9).

Example 2

Suppose there are three groups, namely, groups 0 (reference), 1 and 2, and no other covariates. For each sample i, i = 1, …, n, we have:

$${y}_{i}={\theta }_{i}+\mu +{\beta }_{1}I\{{\mathrm{group}}=1\}+{\beta }_{2}I\{{\mathrm{group}}=2\}+{\epsilon }_{i}.$$

To test whether the taxon is differentially abundant between group 1 and 0 (reference), we test:

$$\begin{array}{ll}&{H}_{0}:{\beta }_{1}=0,\\ &{H}_{1}:\{{\beta }_{1} < 0\}\cup \{{\beta }_{1} > 0\},\end{array}$$

with the test statistic:

$${W}_{10}=\frac{{\hat{\beta }}_{1}}{\sqrt{\widehat{{\mathrm{Var}}}(\;{\hat{\beta }}_{1})}}.$$

Additionally, if we want to test whether the taxon is differentially abundant between group 1 and 2:

$$\begin{array}{rc}&{H}_{0}:{\beta }_{1}={\beta }_{2},\\ &{H}_{1}:\{{\beta }_{1} < {\beta }_{2}\}\cup \{{\beta }_{1} > {\beta }_{2}\}.\end{array}$$

The test statistic is:

$${W}_{12}=\frac{{\hat{\beta }}_{1}-{\hat{\beta }}_{2}}{\sqrt{\widehat{{\mathrm{Var}}}(\;{\hat{\beta }}_{1})+\widehat{{\mathrm{Var}}}(\;{\hat{\beta }}_{2})}}.$$

Test against a specific group

Often, researchers are interested in knowing whether the abundance increased or decreased in an ecosystem relative a prespecified group, say the control group. Again, assume group 0 is the reference group and β0 = 0, then one may be interested in testing the following hypotheses:

$$\begin{array}{ll}&{H}_{0,k}:{\beta }_{k}=0,\\ &{H}_{1,k}:\{{\beta }_{k} < 0\}\cup \{{\beta }_{k} > 0\},\end{array}$$

where k {1, …, g}.

As before, the pairwise test statistic is defined as follows:

$${W}_{k}=\frac{{\hat{\beta }}_{k}}{\sqrt{\widehat{{\mathrm{Var}}}(\;{\hat{\beta }}_{k})}}{\to }_{d}N(0,1),\,{{{\rm{as}}}}\,n\to \infty ,$$

where \(\widehat{{\mathrm{Var}}}(\;{\hat{\beta }}_{k})\) is the kth diagonal elements of \({\hat{\varSigma}}^{(g)}\). Thus, the raw P value for comparing group k and group 1 is defined as

$${P}_{k}=2[1-\phi (| {W}_{k}| )].$$

Likewise, we apply the mdFDR controlling procedure for all pairwise tests. To improve power, we modify the global test mentioned earlier to a Dunnet-based test48,49,50 as described below:

  1. (1)

    The test statistic \(W=\mathop{\max }\limits_{k\in \{1,\,\ldots ,\,g\}}| {W}_{k}|\),

  2. (2)

    Generate \({W}_{k}^{\;(b)} \approx N(0,1),k=1,\ldots ,g\).

  3. (3)

    Compute \({W}^{\;(b)}=\mathop{\max }\limits_{k\in \{1,\,\ldots ,\,g\}}| {W}_{k}^{\;(b)}|\).

  4. (4)

    Repeat the above steps B times, we get the null distribution of W.

The screening P value is calculated as:

$$P=\frac{1}{B}\mathop{\sum }\limits_{b=1}^{B}I({W}^{\;(b)} > W\;).$$

Pattern analysis

When the experimental groups are ordered naturally, such as doses of exposure or duration of exposure or stages of a disease and so on, for a given taxon, researchers may be interested in testing whether the abundance of the taxon is changing with the ordered experimental groups according to some specific pattern. Thus, the null and alternative hypotheses one wants to test become (assume group 0 is the reference group):

$$\begin{array}{rc}&{H}_{0}:{\beta }_{1}={\beta }_{2}=\ldots ={\beta }_{g}=0,\\ &{H}_{1}:{{{\mathbf{\upbeta }}}}={(\;{\beta }_{1},\ldots ,{\beta }_{g})}^{T}\in {\mathbb{C}},\end{array}$$

where \({\mathbb{C}}\) is one or a collection of patterns. Examples of patterns are given below.

Example 3

Simple order

$${{\mathbb{C}}}_{1}=\{0\le {\beta }_{1}\le {\beta }_{2}\le \ldots \le {\beta }_{g}\}\,{{{\rm{with}}}}\,{{{\rm{at}}}}\,{{{\rm{least}}}}\,{{{\rm{one}}}}\,{{{\rm{strict}}}}\,{{{\rm{inequality.}}}}$$
(22)

Example 4

Tree order

$${{\mathbb{C}}}_{2}=\{{\beta }_{k}\ge 0,k=1,\ldots ,g\}\,{{{\rm{with}}}}\,{{{\rm{at}}}}\,{{{\rm{least}}}}\,{{{\rm{one}}}}\,{{{\rm{strict}}}}\,{{{\rm{inequality.}}}}$$
(23)

Example 5

Umbrella order

$$\begin{array}{l}{{\mathbb{C}}}_{4}=\{0\le {\beta }_{1}\le \ldots \le {\beta }_{k-1}\le {\beta }_{k}\ge {\beta }_{k+1}\ldots \ge {\beta }_{g}\}\,\\{{{\rm{with}}}}\,{{{\rm{at}}}}\,{{{\rm{least}}}}\,{{{\rm{one}}}}\,{{{\rm{strict}}}}\,{{{\rm{inequality.}}}}\end{array}$$
(24)

Estimation of β under a certain pattern (constraint) can be obtained by solving the following convex optimization (opt) problem51:

$${\hat{\bf{\upbeta}}}^{\mathrm{opt}}={\mathrm{arg}}\, {\mathop{\mathrm{min}}\limits_{\bf{\upbeta}\in {\mathbb{C}}}}{\left({\hat{\bf{\upbeta}}}-{\bf{\upbeta}}\right)}^{T}{\hat{\varSigma}}^{{(g)}^{-1}}(\;{\hat{\bf{\upbeta}}}-{\bf{\upbeta}}),$$
(25)

where \({\hat{\varSigma}}^{(g)}\) is the corresponding submatrix of \({\hat{\varSigma}}\) defined in equation (10). The solution to equation (25) can be numerically obtained by using a suitable convex optimization algorithm, such as CVXR (ref. 52).

Example 6

Suppose there are three groups, namely, groups 0 (reference), 1 and 2, and no other covariates. For each sample i, i = 1, …, n, we have:

$${y}_{i}={\theta }_{i}+\mu +{\beta }_{1}I\{{\mathrm{group}}=1\}+{\beta }_{2}I\{{\mathrm{group}}=2\}+{\epsilon }_{i}.$$

To test whether the group effect is monotonically increasing, we test:

$$\begin{array}{ll}&{H}_{0}:{\beta }_{1}={\beta }_{2}=0,\\ &{H}_{1}:{{{\mathbf{\beta }}}}\in {\mathbb{C}}=\{0\le {\beta }_{1}\le {\beta }_{2}\},\,{{{\rm{with}}}}\,{{{\rm{at}}}}\,{{{\rm{least}}}}\,{{{\rm{one}}}}\,{{{\rm{strict}}}}\,{{{\rm{inequality.}}}}\end{array}$$

The estimation of β under \({\mathbb{C}}\) can be obtained by solving:

$$\begin{array}{ll}&{\hat{\bf{\upbeta} }}^{{\mathrm{opt}}}=\arg \mathop{\min }\limits_{{{{{\bf{\upbeta} }}}}\in {{\mathbb{R}}}^{2}}{\left({\hat{\bf{\upbeta} }}-{{{{\bf{\upbeta} }}}}\right)}^{T}{\hat{\varSigma}}^{{(g)}^{-1}}(\;{\hat{\bf{\upbeta} }}-{{{{\bf{\upbeta} }}}}),\\ &{{{\rm{s.t.}}}}{{{{A}}}}{{{{\bf{\upbeta} }}}}\ge \bf{0},\end{array}$$

where \({{{{A}}}}=\left[\begin{array}{ll}1&0\\ -1&1\end{array}\right]\), and \({{{{\bf{\upbeta} }}}}={(\;{\beta }_{1},{\beta }_{2})}^{T}\).

Once the constrained estimator is obtained, there exist a variety of options to test the above hypotheses. For example, one may consider William’s type of statistic53. We adopt the following definitions from Peddada et al.7 to facilitate the construction of the test statistic.

Definition 2

Linked parameters: two parameters in a given pattern are said to be linked if the inequality between them is specified a priori.

Definition 3

Nodal parameter: for a given pattern, a parameter is said to be nodal if it is linked with every other parameter in the profile.

For example, every parameter is a nodal parameter in \({{\mathbb{C}}}_{1}\); no nodal parameter in \({{\mathbb{C}}}_{2}\) and βk is the only nodal parameter in \({{\mathbb{C}}}_{3}\).

Definition 4

Norm of maximum difference: define the norm \({l}_{\infty }({\mathbb{C}})\) of pattern \({\mathbb{C}}\) as the maximum difference between the estimates of two linked parameters.

For example, \({l}_{\infty}({\mathbb{C}}_{3})={\mathrm{max}} \{ {\hat{\beta}}_{k},{\hat{\beta}}_{k}-{\hat{\beta}}_{g}\}\).

Given a collection of potential patterns, \({{\mathbb{C}}}_{1},{{\mathbb{C}}}_{2},\ldots ,{{\mathbb{C}}}_{T}\), the William’s type of test statistic is defined as:

$$\begin{array}{ll}&W=\max \{{l}_{\infty }({{\mathbb{C}}}_{t}),t=1,\ldots ,T\;\},\\ &{{{\rm{with}}}}\,{t}^{\;{\mathrm{opt}}}=\arg \max \{{l}_{\infty }({{\mathbb{C}}}_{t}),t=1,\ldots ,T\;\},\end{array}$$

where topt is regarded as the optimal pattern for the microbial abundance of a specific taxon.

Under null hypothesis, the expectations for \({\hat{\beta }}_{k},k=1,\ldots ,g\) are 0s; thus, we can construct the null distribution of W as follows:

  1. (1)

    Generate \({\hat{\beta }}_{k}^{(b)} \approx \sqrt{\widehat{{\mathrm{Var}}}(\;{\hat{\beta }}_{k})}N(0,1),k=1,\ldots ,g\).

  2. (2)

    Obtain constrained regression estimators for \({\hat{\beta }}_{k}^{{\mathrm{opt}},(b)}\) using the convex optimization problem described above.

  3. (3)

    Compute \({W}^{\;(b)}=\max \{{l}_{\infty }({{\mathbb{C}}}_{t}),t=1,\ldots ,T\;\}\) using the simulated data under prespecified patterns.

  4. (4)

    Repeat the above steps B times, and we get the null distribution of W.

The raw P value is calculated as

$$P=\frac{1}{B}\mathop{\sum }\limits_{b=1}^{B}I({W}^{\;(b)} > W\;).$$

We then apply the Holm–Bonferroni correction or Benjamini–Hochberg procedure on raw P values to control the FDR.

ANCOM-BC2 for mixed-effects models

Similar to the fixed-effects model stated in equation (3), for each taxon j, j = 1, …, d, and each sample i, i = 1, …, n, suppose each sample has ni observations and ∑ini = n. The offset-based mixed-effects log-linear model is set up as

$${{{{\bf{y}}}}}_{{{{{ij}}}}}={\theta }_{i}{{{{\bf{1}}}}}_{{{{{{n}}}}}_{{{{{i}}}}}}+{{{{{X}}}}}_{{{{{i}}}}}{{{{\mathbf{\upbeta }}}}}_{j}+{{{{{Z}}}}}_{{{{{i}}}}}{{{{\mathbf{\upalpha }}}}}_{i}+{{{{\mathbf{\upepsilon }}}}}_{ij},$$
(26)

where

  1. (1)

    yij is the ni vector-centered observed counts,

  2. (2)

    \({{{{\bf{1}}}}}_{{{{{{n}}}}}_{{{{{i}}}}}}={(1,\ldots ,1)}^{T}\in {{\mathbb{R}}}^{{n}_{i}}\) is a vector of 1s,

  3. (3)

    Xi is the ni × p design matrix for fixed effects,

  4. (4)

    βj is the p vector of fixed-effects regression coefficients to be estimated,

  5. (5)

    Zi is the ni × q design matrix for the random effects,

  6. (6)

    αi is the q vector random effects,

  7. (7)

    ϵij is the ni vector residuals.

The following distributional assumptions are made

$$\begin{array}{rcl} &{\mathbf{\upalpha}}_{i} \sim N({\mathbf{0}}, {{D}}_{{q \times q}}), \\ &{\mathbf{\upepsilon}}_{ij} \sim N({{0}}, \sigma_j^2 {\mathbf{1}}_{{{n}}_{{i}}}), \\ &{\mathbf{\upalpha}}_{i} {\perp\!\!\!\perp} {\mathbf{\upepsilon}}_{ij} \, {\rm{for }}\, i = 1, \ldots, n. \end{array}$$

Thus, for each taxon j, j = 1, …, d, and each sample i, i = 1, …, n, we have

$${{{{\bf{y}}}}}_{{{{{ij}}}}} \sim N({\theta }_{i}{{{{\bf{1}}}}}_{{{{{{n}}}}}_{{{{{i}}}}}}+{{{{{X}}}}}_{{{{{i}}}}}{{{{\bf{\upbeta }}}}}_{j},{{{{{H}}}}}_{{{{{ij}}}}}({{{{\tau }}}})),$$

where \({{{{{H}}}}}_{{{{{ij}}}}}({{{\boldsymbol{\tau }}}})={{{{{Z}}}}}_{{{{{i}}}}}{{{{D}}}}{{{{{{Z}}}}}_{{{{{i}}}}}}^{T}+{\sigma }_{j}^{2}{{{{{I}}}}}_{{{{{{n}}}}}_{{{{{i}}}}}}\) (or Hij for short) denotes a general covariance matrix parametrized by τ.

Stack up observations across samples, we have:

$${{{{\bf{y}}}}}_{{{{{j}}}}}={{{\bf{\uptheta }}}}+{{{{X}}}}{{{{\bf{\upbeta }}}}}_{j}+{{{{Z}}}}{{{\bf{\upalpha }}}}+{{{{\bf{\upepsilon }}}}}_{j},$$
(27)

where

$$\begin{array}{ll}&{{{{\bf{y}}}}}_{{{{{j}}}}}=\left[\begin{array}{l}{{{{{y}}}}}_{{{{{1}}}}{{{{j}}}}}\\ {{{{{y}}}}}_{{{{{2}}}}{{{{j}}}}}\\ \vdots \\ {{{{{y}}}}}_{{{{{nj}}}}}\end{array}\right],\,{{{{\bf{\uptheta} }}}}=\left[\begin{array}{c}{\theta }_{1}{{{{{\bf{1}}}}}}_{{{{{{n}}}}}_{{{{{1}}}}}}\\ {\theta }_{2}{{{{{\bf{1}}}}}}_{{{{{{n}}}}}_{{{{{2}}}}}}\\ \vdots \\ {\theta }_{n}{{{{{\bf{1}}}}}}_{{{{{{n}}}}}_{{{{{n}}}}}}\end{array}\right],\,{{{{X}}}}=\left[\begin{array}{c}{{{{{X}}}}}_{{{{{1}}}}}\\ {{{{{X}}}}}_{{{{{2}}}}}\\ \vdots \\ {{{{{X}}}}}_{{{{{n}}}}}\end{array}\right],\,{{{{{\bf{\upbeta} }}}}}_{j}=\left[\begin{array}{c}{\beta }_{j1}\\ {\beta }_{j2}\\ \vdots \\ {\beta }_{jp}\end{array}\right],\\ &{{{{Z}}}}=\left[\begin{array}{cccc}{{{{{Z}}}}}_{{{{{1}}}}}&{{{{0}}}}&\ldots &{{{{0}}}}\\ {{{{0}}}}&{{{{{Z}}}}}_{{{{{1}}}}}&{{{{0}}}}&{{{{0}}}}\\ \vdots &\vdots &\ddots &\vdots \\ {{{{0}}}}&{{{{0}}}}&\ldots &{{{{{Z}}}}}_{{{{{1}}}}}\end{array}\right],\,{{{{\bf{\upalpha} }}}}=\left[\begin{array}{c}{{{{{\alpha }}}_{1}}}\\ {{{{{\alpha }}}_{2}}}\\ \vdots \\ {{{{{\alpha }}}_{n}}}\end{array}\right],\,{{{{{\bf{\upepsilon} }}}}}_{j}=\left[\begin{array}{c}{{{{{\epsilon }}}}}_{1j}\\ {{{{{\epsilon }}}}}_{2j}\\ \vdots \\ {{{{{\epsilon }}}}}_{nj}\end{array}\right].\end{array}$$

That is,

$${{{{\bf{y}}}}}_{{{{{j}}}}} \sim N\left({{{{\bf{\uptheta} }}}}+{{{{X}}}}{{{{{\bf{\upbeta} }}}}}_{j},{{{{{H}}}}}_{{{{{j}}}}}({{{{\tau }}}})=\left[\begin{array}{cccc}{{{{{H}}}}}_{{{{{1}}}}{{{{j}}}}}({{{{\tau }}}})&{{{{0}}}}&\ldots &{{{{0}}}}\\ {{{{0}}}}&{{{{{H}}}}}_{{{{{2}}}}{{{{j}}}}}({{{{\tau }}}})&{{{{0}}}}&{{{{0}}}}\\ \vdots &\vdots &\ddots &\vdots \\ {{{{0}}}}&{{{{0}}}}&\ldots &{{{{{H}}}}}_{{{{{nj}}}}}({{{{\tau }}}})\end{array}\right]\right),$$

where Hj(τ) (or Hj for short) is a block diagonal matrix.

Similarly, we estimate θ and βj iteratively to obtain the corresponding preliminary estimators. Compared to Algorithm 1, the maximum likelihood is replaced with restricted maximum likelihood (ReML)54,55.

Algorithm 3. Iterative ReML estimation

1: Initialize:

  For j = 1, …, d

  θ ← 0

  \({{{{\bf{y}}}}}_{{{{{j}}}}}^{({{{\mathrm{crt}}}})}\leftarrow {{{{\bf{y}}}}}_{{{{{j}}}}}-{{{\bf{\uptheta }}}}={{{{\bf{y}}}}}_{{{{{j}}}}}\)

  \({{{{\bf{\upbeta }}}}}_{j}\leftarrow {{{\rm{ReML}}}}({{{{\bf{y}}}}}_{{{{{j}}}}}^{({{{\mathrm{crt}}}})})={{{\rm{ReML}}}}({{{{\bf{y}}}}}_{{{{{j}}}}})\)

(2) While not converge do

(3)  \({{{\bf{\uptheta }}}}\leftarrow \frac{1}{d}\mathop{\sum }\nolimits_{j = 1}^{d}({{{{\bf{y}}}}}_{{{{{j}}}}}-{{{{X}}}}{{{{\bf{\upbeta }}}}}_{j})\)

(4)  \({{{{\bf{y}}}}}_{{{{{j}}}}}^{({{{\mathrm{crt}}}})}\leftarrow{{{{\mathbf{y}}}}}_{{{{{j}}}}}-{{{\bf{\uptheta }}}}\)

(5)  \({{{{\bf{\upbeta }}}}}_{j}\leftarrow {{{\rm{ReML}}}}({{{{\bf{y}}}}}_{{{{{j}}}}}^{({{{\mathrm{crt}}}})})\)

(6) end while

Note that the estimators for regression coefficients βj and variance components τ are obtained iteratively by maximizing the following log-likelihood function:

$${{{\mathrm{L}}}}({{{{\tau }}}}| {{{{\bf{y}}}}}_{{{{{j}}}}})=-\mathop{\sum }\limits_{i=1}^{n}\log | {{{{{H}}}}}_{{{{{ij}}}}}| -\mathop{\sum }\limits_{i=1}^{n}\log | {{{{{{X}}}}}_{{{{{i}}}}}}^{T}{{{{{H}}}}}_{{{{{ij}}}}}^{-1}{{{{{X}}}}}_{{{{{i}}}}}| -\mathop{\sum }\limits_{i=1}^{n}{({{{{\bf{y}}}}}_{{{{{ij}}}}}-{{{{{X}}}}}_{{{{{i}}}}}{{{{\bf{\upbeta }}}}}_{j})}^{T}{{{{{H}}}}}_{{{{{ij}}}}}^{-1}({{{{\bf{y}}}}}_{{{{{ij}}}}}-{{{{{X}}}}}_{{{{{i}}}}}{{{{\bf{\upbeta }}}}}_{j}),$$
(28)

where \({{{{\bf{\upbeta }}}}}_{j}\leftarrow {({{{{{X}}}}}^{T}{{{{{{H}}}}}_{{{{{j}}}}}}^{-1}{{{{X}}}}\;)}^{-1}{{{{{X}}}}}^{T}{{{{{{H}}}}}_{{{{{j}}}}}}^{-1}{{{{\bf{y}}}}}_{{{{{j}}}}}\). As close-form solutions of equation (28) do not exist, the Newton–Raphson method56 is usually used.

Suppose on convergence, \({{{\bf{\uptheta }}}}\leftarrow {{{{\bf{\uptheta }}}}}^{* },{{{{\bf{y}}}}}_{{{{{j}}}}}^{({{{\mathrm{crt}}}})}\leftarrow {{{{{\bf{y}}}}}_{{{{{j}}}}}^{({{{\mathrm{crt}}}})}}^{* },{{{{H}}}}\leftarrow {{{{{H}}}}}^{* },{{{{\mathbf{\upbeta }}}}}_{j}\leftarrow {{{{\bf{\upbeta }}}}}_{j}^{* }\), we have

$$\begin{array}{rc}&{{{{\bf{\uptheta }}}}}^{* }=\frac{1}{d}\mathop{\sum }\limits_{j=1}^{d}({{{{\bf{y}}}}}_{{{{{j}}}}}-{{{{X}}}}{{{{\bf{\upbeta }}}}}_{j}^{* }),\\ &{{{{{\bf{y}}}}}_{{{{{j}}}}}^{({{{\mathrm{crt}}}})}}^{* }={{{{\bf{y}}}}}_{{{{{j}}}}}-{{{{\bf{\uptheta }}}}}^{* },\\ &{{{{\bf{\upbeta }}}}}_{j}^{* }={\left({{{{{X}}}}}^{T}{{{{{{H}}}}}_{{{{{j}}}}}^{* }}^{-1}{{{{X}}}}\right)}^{-1}{{{{{X}}}}}^{T}{{{{{{H}}}}}_{{{{{j}}}}}^{* }}^{-1}{{{{\bf{y}}}}}_{{{{{j}}}}}^{{\left({{{\mathrm{crt}}}}\right)}^{* }}.\end{array}$$

It is easy to show that there exists a vector \({{{\mathbf{\updelta }}}}\in {{\mathbb{R}}}^{P}\), such that

$$\begin{array}{rc}E({{{{\bf{\uptheta }}}}}^{* })&={{{\bf{\uptheta }}}}-{{{{X}}}}{{{\bf{\updelta }}}},\\ E(\;{{{{\bf{\upbeta }}}}}_{j}^{* })&={{{\bf{\updelta }}}}+{{{{\bf{\upbeta }}}}}_{j}.\end{array}$$

that is, \({{{{\bf{\upbeta }}}}}_{j}^{* }\) is a biased estimator for βj.

Similar to the case of fixed-effects model, we fit the Gaussian mixture model to each βjk, k = 1, …, p separately, to correct the bias δ, and final estimators for βj and θ are given by

$$\begin{array}{ll}&{\hat{\bf{\upbeta}}}_{j}={\bf{\upbeta}}_{j}^{*}-{\hat{\bf{\updelta}}}^{\mathrm{EM}},\\ &{\hat{\bf{\uptheta}}}=\frac{1}{d}\mathop{\sum}\limits_{j=1}^{d}({{\bf{y}}}_{j}-{X}{\hat{\bf{\upbeta}}}_{j}).\end{array}$$

The statistical inference, including multi-group comparisons, for mixed-effects models, aligns with those outlined in previous sections for fixed-effects models, and therefore, it is not repeated here.

Strategies implemented in ANCOM-BC2 to handle zeros

ANCOM-BC2 deals with zero-related challenges in microbiome data as follows. (1) Structural zero identification: taxa that are exclusively present in one ecosystem but absent in another, result in structural zeros. For example, some taxa are exclusive to desert regions but entirely absent in rainforests. Hence, they are structural zeros in rainforests. Those zeros should not be imputed or ignored, and such taxa are DA between the two regions. As the first step, using ANCOM-II (ref. 13), ANCOM-BC2 identifies all DA taxa that are due to structural zeros, and no further analysis is performed on such taxa and they are cataloged separately in the software output. (2) Prevalence-based filtration: after filtering structural zeros, ANCOM-BC2 applies a prevalence-based filtration, akin to other DA methods. By default, taxa that feature in less than 10% of all samples are removed from further analysis. (3) Sensitivity analysis for pseudo-count addition to zeros: for the remaining taxa with some zeros, we perform a sensitivity analysis to assess their robustness to pseudo-counts as follows. Much like many DA analysis methodologies, since ANCOM-BC2 log transforms the observed counts, the counts need to be positive. Often pseudo-counts are added to deal with zeros. However, it is well-known that the choice of the pseudo-count can considerably influence the false-positive as well as false-negative rates13,16,17. To mitigate this concern, we conduct a sensitivity analysis to evaluate the effect of varying pseudo-counts on zeros for each taxon. This procedure incorporates the addition of an array of pseudo-counts (ranging from 0.01 to 0.5 in increments of 0.01) to the zero counts for each taxon. Corresponding to each pseudo-count, ANCOM-BC2 is used for each taxon and P values for DA analysis are derived. The sensitivity score for each taxon is the proportion of instances where the P values exceed the specified significance level. If the proportion of significant (or non-significant) results is 1 and the significance (or non-significance) aligns with significance (or non-significance) using complete data (excluding zeros), then the taxon is regarded as insensitive to the pseudo-count addition. Otherwise, it is deemed sensitive. This step remains a recommendation and is at the discretion of the users. We offer two versions of ANCOM-BC2 for flexibility: (1) ANCOM-BC2 (no filter): this version only uses the first two steps for handling zeros and uses complete data (that is, excludes zeros by treating them as missing completely at random) for bias correction and inference. While it has larger power, it might display an inflated FDR, especially with larger sample sizes or repeated measures. (2) ANCOM-BC2 (SS filter): this version uses all three aforementioned steps for dealing with zeros and also uses complete data for both bias correction and inference. Specifically, if a taxon is found to be sensitive to pseudo-counts then it is declared as non-significant taxon. While more conservative, it provides rigorous control of FDR, albeit with a possible decrease in power.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.