An adaptive microbiome α-diversity-based association analysis method

To relate microbial diversity with various host traits of interest (e.g., phenotypes, clinical interventions, environmental factors) is a critical step for generic assessments about the disparity in human microbiota among different populations. The performance of the current item-by-item α-diversity-based association tests is sensitive to the choice of α-diversity metric and unpredictable due to the unknown nature of the true association. The approach of cherry-picking a test for the smallest p-value or the largest effect size among multiple item-by-item analyses is not even statistically valid due to the inherent multiplicity issue. Investigators have recently introduced microbial community-level association tests while blustering statistical power increase of their proposed methods. However, they are purely a test for significance which does not provide any estimation facilities on the effect direction and size of a microbial community; hence, they are not in practical use. Here, I introduce a novel microbial diversity association test, namely, adaptive microbiome α-diversity-based association analysis (aMiAD). aMiAD simultaneously tests the significance and estimates the effect score of the microbial diversity on a host trait, while robustly maintaining high statistical power and accurate estimation with no issues in validity.

essential to better understand microbial communities (e.g., lower microbial diversity may indicate higher morbidity) and make plans (e.g., plans to recover microbial diversity to normality). In ecology, α-diversity has also been widely used as a guideline for community ecologists and conservation biologists to make plans to preserve natural ecosystems or restore perturbed communities [14][15][16] .
Notably, a variety of α-diversity metrics can be considered in the analysis. Different α-diversity metrics reflect different views on the true diversity and they perform differently. For example, Richness (also known as Observed), Shannon 17 and Simpson 18 indices are non-phylogenetic metrics (i.e., based solely on abundance information) which weight relatively rare, mid-abundant and abundant species, respectively. Accordingly, they are suitable when associated species are rare, mid-abundant and abundant species, respectively. In contrast, phylogenetic diversity (PD) 19 , phylogenetic entropy (PE) 20 and phylogenetic quadratic entropy (PQE) 21,22 are phylogenetic metrics (i.e., based on both abundance and phylogenetic information) which weight relatively rare, mid-abundant and abundant species, respectively. The phylogenetic metrics are suitable when associated species have disparity in both abundance and phylogeny, where PD, PE and PQE are suitable when associated species are rare, mid-abundant and abundant species, respectively. In reality, associated species can be rare or abundant, or they can have disparity in phylogeny rather than abundance or vice versa. However, it is highly difficult to predict which situation among such various possible association patterns is the one for our study and to choose a single optimal α-diversity metric to use. This is because of the unknown nature of the true association. The approach of cherry-picking a test which has the smallest p-value or the largest effect size after running multiple item-by-item α-diversity-based association analyses is not statistically valid (e.g., do not correctly control type I error) because the multiplicity (i.e., multiple testing) issue is not properly accounted for 23 . Therefore, a valid statistical method which robustly suits various unknown association patterns is needed.
In this paper, I introduce a novel adaptive microbial diversity association test, namely, adaptive microbiome α-diversity-based association analysis (aMiAD), which robustly maintains high statistical power and accurate microbial diversity effect score estimation throughout various association patterns while satisfying the requisite validity issue. aMiAD employs the minimum p-value from multiple candidate item-by-item α-diversity-based association analyses as its test statistic and estimate its own p-value and microbial diversity effect score based on a residual-based permutation method. The use of minimum p-value statistic is to adaptively approach the highest power and the most accurate microbial diversity effect score estimation among multiple candidate analyses, while the residual-based permutation method based on the minimum p-value statistic is to robustly satisfy the validity issue (e.g., correctly controlling type I error) with no distributional assumption to be satisfied. Three non-phylogenetic metrics, Richness, Shannon, Simpson indices and three phylogenetic metrics, PD, PE and PQE are selected as the candidate α-diversity metrics for aMiAD because of their distinguished features which properly modulate abundance and phylogenetic information.
The rest of the paper is organized as follows. The methodological details for aMiAD can be found in the following Methods section. Then, extensive simulations and real data applications are addressed in the Results section. I finally discuss possible extensions for the use of aMiAD in the Discussion section.

Methods
I first organize related notations and models. Then, I address details on the six candidate α-diversity metrics, Richness, Shannon 17 , Simpson 18 , PD 19 , PE 20 and PQE 21,22 . Finally, I delineate the test statistic and microbial diversity effect score of aMiAD and the residual permutation-based computational algorithm. While the application of aMiAD can be much broader (e.g., extendable to generalized linear models), I describe aMiAD to relate microbial diversity with a continuous (e.g., BMI) or a binary (e.g., disease/treatment status) trait.
Here, I notify that the α-diversity referred in this paper considers different types of operational taxonomic units (OTUs) in the bacterial kingdom per biological sample (e.g., human, mouse), indicating within-sample diversity of OTUs in the bacterial kingdom. However, in practice, any subunits (e.g., species or other lower-level microbial taxa) in a different microbial assemblage (e.g., kingdom of archaea, fungi, protists or viruses, phylum of firmicutes or bacteroidetes) can be considered.

Models and notations.
Suppose that there are n samples, p OTUs in a microbial community (e.g., bacterial kingdom) and q covariates (e.g., age, gender). Let Y i denote a continuous (e.g., BMI) or a binary (e.g., disease/ treatment status) trait, Z ij denote OTUs, and X ik denote covariates for i = 1, …, n, j = 1, …, p and k = 1, …, q. To relate OTUs in a community with a host trait while adjusting for covariate effects, I consider a multiple linear regression model equation (1) for a continuous trait and a multiple logistic regression model equation (2) for a binary trait.
where β 0 is a regression coefficient for the intercept, α k 's are regression coefficients for the effect of q covariates (e.g., age, gender), h (Z i ) is a function which characterizes the relationship between OTUs and a host trait, and ∈ i is an error term which is independently and identically distributed with a mean zero and a variance of σ 2 . Here, we are particularly interested in testing the null hypothesis, H 0 : h (Z i ) = 0; that is, no association between OTUs and a host trait. Notably, we can flexibly specify h (Z i ) to reflect different patterns of the relationship. For example, the linear relationship between OTUs and a host trait can be surveyed by setting h (Z i ) = ∑ β = Z j 1 p j ij , while diverse non-linear relationships can be surveyed by the use of non-linear transformations of OTUs (e.g., polynomials or splines) 24,25 . Furthermore, any positive semi-definite kernel function can be used for h (Z i ), where MiRKAT 11 has especially been credited with establishing a kernel machine regression framework for distance-based community-level association analysis. Among diverse alternatives, I formulate h (Z i ) as a function of α-diversity metric equation (3) for the ultimate goal of inferring the effect of microbial diversity on a host trait.
where γ is an index for a chosen α-diversity metric (e.g., Richness, Shannon, Simpson, PD, PE, PQE), β(γ) is a regression coefficient for the α-diversity metric and D (γ) i 's are the values of the α-diversity metric for i = 1, …, n.
α-diversity indices. α-diversity is an intuitive and natural index which summarizes the extent of microbial diversity in a community. A variety of α-diversity metrics have been proposed, and they are classified into non-phylogenetic and phylogenetic metrics. The non-phylogenetic metrics are constructed based solely on microbial abundance information, while the phylogenetic metrics further utilize phylogenetic tree information. I here survey three non-phylogenetic metrics, Richness, Shannon 17 and Simpson 18 indices, and three phylogenetic metrics, PD 19 , PE 20 and PQE 21,22 .
To begin with non-phylogenetic metrics, Richness, Shannon and Simpson indices are weighted variants based on the generalized diversity framework, known as the effective number of types (or Hill number), which quantifies how many effective types of interest exist in a community [26][27][28] . Here, the effective number of types (D w ) equation (4) is defined as the inverse of the mean weighted proportional abundance 26,27 .
where p is the total number of OTU types present in a community, r j is the relative abundance (i.e., proportion) of the j-th OTU for j = 1, …, p and w (  ∈ ) is the weight for the proportions (also known as the order of the diversity) which needs to be pre-specified.
Notably, with different pre-specifications for the order of the diversity (w) equation (4), different α-diversity metrics can be derived. In particular, when w = 0, D 0 equals to p (i.e., the total number of OTU types present in a community) which is known as Richness (D Richness ) equation (5).
Richness 0 where p is the total number of OTU types present in a community. When w = 1, D 1 cannot be defined; hence, the mathematical limit of  where p is the total number of OTU types present in a community and r j is the proportion of the j-th OTU for j = 1, …, p. When w = 2, D 2 equals to ∑ = − ( r ) j 1 p j 2 1 , which is the weighted arithmetic mean proportional abundance known as Inverse Simpson index 26,27 . Then, Simpson index (D Simpson ) equation (7) is derived by taking the minus of the inverse of D 2 , −D 2 where p is the total number of OTU types present in a community and r j is the proportion of the j-th OTU for j = 1, …, p. Importantly, by the formula equation (4), we can infer that as the value of w increases, relatively abundant OTUs are weighted, but it is vice versa as the value of w decreases 27 . Therefore, Richness, Shannon and Simpson indices weight relatively rare, mid-abundant and abundant OTUs, respectively; hence, they are also suitable when associated OTUs are rare, mid-abundant and abundant, respectively.
In contrast, the phylogenetic metric, PD 19 , utilizes phylogenetic tree information while considering only the incidence (i.e., presence/absence) information of OTUs. Specifically, PD (D PD ) is defined as the sum of the lengths of the branches for the OTUs present in a community equation (8).
where p is the total number of OTU types present in a community and 1 j is the length of all the branches that belong to the j-th OTU for j = 1, …, p. Therefore, PD is suitable when associated OTUs have high disparity in phylogeny rather than in abundance. Given that prevalent OTUs are likely to be present in all samples, PD is also suitable especially for rare OTUs which have high disparity in the classification of presence/absence.  (10) are phylogenetic generalizations of the Shannon and Simpson indices, which incorporate all differing microbial abundance information (i.e., beyond the incidence (presence/ absence) information for PD) while weighting relatively mid-abundant and abundant OTUs.
where p is the total number of OTU types present in a community, 1 j is the length of all the branches that belong to the j-th OTU and r j is the proportion of the j-th OTU for j = 1, …, p. Therefore, PE and PQE are suitable when associated OTUs have high disparity in phylogeny, where they are relatively mid-abundant and abundant, respectively. The above α-diversity metrics are the most fundamental and widely used, and they were sufficient in my simulations and real data analyses. Yet, the potential extension to other α-diversity metrics is addressed later in Discussion.
aMiAD. aMiAD is constructed based on the score test 29 of the linear equation (1) or logistic equation (2) regression model, which surveys the association between each of the α-diversity metrics and a host trait while adjusting for covariates. Here, the unstandardized score statistic (U (γ) ) is formulated with equation (11).
where γ is an index for a chosen α-diversity metric (e.g., Richness, Shannon, Simpson, PD, PE, PQE) and μ i,0 is the fitted value under the null hypothesis, which is estimated as X 0 k 1 q ik k β′ + ∑ α′ = for the linear regression model equation (1) ik k for the logistic regression model equation (2), where β′  0 and α′  k are maximum likelihood estimates (MLEs) under the null hypothesis. This unstandardized score statistic (U (γ) ) is sufficient to estimate the p-value (P (γ) ) based on my residual permutation-based method (see Computational algorithm) because its mean and standard error are evaluated under the null hypothesis equivalently for both the observed and null (i.e., permuted) statistic values resulting in no change in their relative comparison 25 . Yet, the mean and standard error under the null hypothesis are also estimated to derive the standardized score statistic ( γ ⁎ U ( ) ). The standardized score statistic ( γ ⁎ U ( ) ) is asymptotically related to the regression coefficient (β (γ) ) equation (3) and tells effect direction and size of a chosen α-diversity metric 29,30 . I denote γ ⁎ U ( ) as MiDivES (γ) and use it as the effect score of a chosen α-diversity metric.
Here, the score test equation (11) with its resulting p-value (P (γ) ) and effect score (MiDivES (γ) ) handles α-diversity metrics one-by-one. Yet, as described above, the performance differs according to the choice of α-diversity metric and the true underlying association pattern. Because of the unknown nature of the true association pattern, we cannot predict which α-diversity index is the optimal choice to our study in advance. Therefore, in order to robustly suit various association patterns, I propose a data-driven adaptive test, aMiAD. The test statistic of aMiAD (T aMiAD ) is the minimum p-value from multiple item-by-item α-diversity-based association analyses equation (12).
where γ is an index for a metric in a set of multiple candidate α-diversity metrics (Γ), where Γ = {Richness, Shannon, Simpson, PD, PE, PQE}, and P (γ) is the estimated p-value for the use of each α-diversity metric (γ ∈ Γ). Here again, T aMiAD equation (12) is the test statistic of aMiAD, and this minimum p-value (i.e., = (12)) itself is not the p-value I report for aMiAD. The approach of cherry-picking the minimum p-value among multiple candidate analyses (i.e., = (12)) and reporting it (i.e., = (12)) as it is cannot correctly control type I error rates because of the inherent multiplicity (i.e., multiple testing) issue 23 . I use a residual permutation-based method (see Computational algorithm) based on the minimum p-value statistic equation (12) to estimate the p-value for aMiAD (denoted as P aMiAD ).
The estimated microbial diversity effect score of aMiAD, namely, adaptive microbial diversity effect score (aMiDivES) equation (13), is the standardized score statistic value based on the α-diversity metric which results in the minimum p-value among multiple candidate analyses, which is then further standardized by its mean and standard error under the null hypothesis. of the test reaching the minimum p-value (i.e., MiDivES (γm) ) is not the microbial diversity effect score I report for aMiAD. I use a residual permutation-based method (see Computational algorithm) to estimate the mean (E(MiDivES (γm), 0 )) and standard error (SE(MiDivES (γm), 0 )).

Computational algorithm.
The computational algorithm to estimate the p-value (P aMiAD ) and the effect score (aMiDivES) of aMiAD is based on a residual-based permutation method which randomly shuffles the residuals estimated from the null model, which reflects the null situation of no association. It is constructed based on the score statistic equation (11) and its derivatives equations (12) and (13) which do not require MLE; hence, we can avoid heavy computation and no convergence error in the iterative algorithm for MLE. It is non-parametric; hence, the outcomes are robustly valid with no underlying distributional assumption to be satisfied. The approach based on the minimum p-value statistic and a residual-based permutation method has also been widely used in prior studies [11][12][13]25,31 , where the validity issue was robustly satisfied. Detailed procedures can be found in (Supplementary S1 Text).
Ethics approval and consent to participate. Not applicable. This study involves only secondary analyses. All utilized microbiome datasets are publicly and freely available which do not require any ethics approval and consent to participate.

Results
Simulations. I conducted simulation experiments under a wide range of scenarios in order to evaluate and compare item-by-item α-diversity-based association tests and aMiAD in terms of hypothesis testing (i.e., type I error and power) and effect score estimation (i.e., central tendency, dispersion and accuracy). I also evaluate the approach of cherry-picking a test which has the smallest p-value (denote it as Minimum P) or the largest effect size (i.e., the largest deviation from zero effect) (denote it as Largest ES) among multiple item-by-item α-diversity-based association analyses in terms of the validity issues of properly controlled type I error and the central tendency and dispersion of microbial diversity effect scores under the null hypothesis. I also evaluate other existing adaptive community-level association tests (i.e., Optimal MiRKAT (OMiRKAT) 11 , adaptive MiSPU (aMiSPU) 12 and OMiAT  Simulation design. I simulated microbiome data according to prior studies 11,13,25 which reflect real OTUs' proportions and dispersion on the basis of the Dirichlet-multinomial distribution 32 . In particular, I used real gut microbiome data 33 from 35 fecal samples (collected from non-obese diabetic (NOD) mice at 6 weeks of age in the control group with no antibiotic treatment) for 353 OTUs (after removing OTUs with proportional mean abundance ≤10 −4 ) to estimate the proportions and dispersion parameter. Then, simulation data were iteratively generated from the Dirichlet-multinomial distribution with the pre-specified values of the estimated proportions and dispersion parameter and the total reads per sample of 1,000 for small (n = 50) and large (n = 100) sample sizes, respectively 11,13,25 . Then, binary outcomes were generated based on the logistic regression model equation (14) i 1i 2i j i i j where X 1i and X 2i are two covariates (e.g., age and gender) simulated from the normal distribution with mean 50 and standard deviation (SD) 5 and the Bernoulli distribution with success probability 0.5, respectively, β is a scalar value (  ∈ ) which determines the effect direction and size of the associated OTUs in a set Λ, where Z ij is an OTU count and w i is a weight for the phylogenetic disparity defined as the sum of the branch lengths for present OTUs divided by the sum of the branch lengths for absent OTUs, and 'scale' is the standardization function to have mean 0 and SD 1 11,13,25 . To estimate empirical type I error rate and the mean (as a measure of central tendency) and variance (as a measure of dispersion) of microbial diversity effect scores under the null hypothesis, I set β = 0. To estimate statistical power and the accuracy of effect scores, I set β from the uniform distribution between −3 and 3 (i.e., Unif(−3, 3)). Here, the R 2 value between β values randomly generated from Unif(−3, 3) and microbial diversity effect scores estimated from each method was used as a measure of estimation accuracy. The set of associated OTUs in the community (Λ) was selected with four different scenarios: (1) Λ = {OTUs in bottom 20% in abundance}, (2) Λ = {A random 20% of OTUs}, (3) Λ = {OTUs in top 20% in abundance}, (4) Λ = {OTUs in a cluster among 7 clusters partitioned by partitioning-around-medoids (PAM) algorithm}, respectively. The first three scenarios mimic the situations when rare, mid-abundant and abundant OTUs, respectively, are associated. For the fourth scenario, I used PAM algorithm 34 to partition all OTUs in the community into 7 clusters based on their cophenetic distances. Here, the number of clusters, 7, was selected by maximizing the average silhouette width from 5 to 10 candidate numbers of clusters 35,36 . I randomized the choice of an associated cluster among the 7 clusters to avoid arbitrary choice 13,25 , whereas the outcomes for each of the 7 clusters can be found in Supporting Information (Fig. S1). The fourth scenario mimics the situation when phylogenetically close OTUs are associated.
Simulation results. Type I error. I estimate that the empirical type I error rates are well-controlled at the significance level of 0.05 for aMiAD, as well as all item-by-item α-diversity-based association tests and adaptive community-level association tests (OMiRKAT, aMiSPU and OMiAT), for both small (n = 50) and large (n = 100) sample sizes (Table 1). However, the cherry-picking approaches (i.e., Minimum P and Largest ES) show overly inflated empirical type I error rates for both small (n = 50) and large (n = 100) sample sizes ( Central tendency and dispersion of effect scores under the null hypothesis. I estimate that the means of microbial diversity effect scores under the null hypothesis are around zero, indicating no bias in the estimation, for all surveyed tests and for both small (n = 50) and large (n = 100) sample sizes ( Table 2). I also estimate that the variances of microbial diversity effect scores under the null hypothesis are around one for aMiAD, as well as all the item-by-item α-diversity-based association tests, for both small (n = 50) and large (n = 100) sample sizes (Table 2). However, the cherry-picking approaches (i.e., Minimum P and Largest ES) show overly inflated variance estimates for both small (n = 50) and large (n = 100) sample sizes (Table 2), indicating over-estimation of effect size. To be more detailed, the performance also varies by which cluster among the 7 clusters partitioned by PAM algorithm is selected (see Supporting Information (Fig. S1)). That is, the Shannon index estimates the greatest power and R 2 values when OTUs in the first cluster are associated ( Fig. S1A-D(C1)), PE estimates the greatest power and R 2 values when OTUs in the second, third, fifth and  Table 2. Estimated means and variances of microbial diversity effect scores under the null hypothesis (Unit: %). Minimum P and Largest ES represent the cherry-picking approaches for the smallest p-value and the largest effect size, respectively, among multiple item-by-item α-diversity-based association analyses. sixth clusters are associated (Fig. S1A-D(C2, C3, C5, C6)), and PQE estimates the greatest power and R 2 values when OTUs in the fourth cluster are associated (Fig. S1A-D(C4, C7)).
Although it may not be feasible to reflect all possible true association patterns in the natural world to our simulations, the most meaningful observation here is that aMiAD adaptively approaches the greatest power and R 2 values among different item-by-item analyses throughout all surveyed scenarios (Figs 1A-D and S1A-D), while the performance for each α-diversity metric considerably fluctuates (Figs 1A-D and S1A-D). In reality, the true association scenario is mostly unknown, while a variety of scenarios are also likely to exist. Thus, aMiAD is attractive due to its high adaptivity and robustness to better cope with the unknown nature.
To compare aMiAD with the three adaptive community-level association tests (OMiRKAT, aMiSPU and OMiAT) (Figs 1E,F and S1E,F), OMiAT estimates the greatest power values for most of the scenarios except that aMiAD estimates the greatest power values for small sample size (n = 50) when abundant OTUs (Figs 1E and (S3)) and OTUs in the second cluster among the 7 clusters partitioned by the PAM algorithm are associated (Fig. S1E(C2)), aMiSPU estimates the greatest power values when OTUs in the fourth cluster are associated for both small (n = 50) (Fig. S1E(C4)) and large (n = 100) (Fig. S1F(C4)) sample sizes and OMiRKAT estimates the greatest power values when OTUs in the seventh cluster are associated for both small (n = 50) (Fig. S1E(C7)) and large (n = 100) (Fig. S1F(C7)) sample sizes. To summarize, we may conclude that OMiAT is most robustly powerful. However, once again, OMiAT, as well as OMiRKAT and aMiSPU, does not provide any effect estimation facilities; hence, its interpretability and usability are limited.
Real data applications. The disparity in microbial diversity between control and antibiotic treatment groups. Cox et al. (2013) performed microbiota-profiling studies to survey if the gut microbiota affected during maturity by antibiotic treatment leads to continued metabolic consequences 37 . To demonstrate the use of aMiAD, I analyzed a part of the original data, which surveys the effect of antibiotic treatment with low-dose penicillin (LDP) on microbial diversity of the gut microbiota. In particular, I compared microbial diversity of the bacterial kingdom between two groups of mice, 8 control and 7 antibiotic treatment mice. To summarize the sampling and profiling procedures while details are found in the original literature 37 , the 8 control mice are 8 germ-free mice to whom cecal microbiota from mice with no treatment were transferred and the 7 antibiotic treatment mice are 7 germ-free mice to whom cecal microbiota from LDP-treated mice were transferred. Fecal samples from the 8 control and 7 antibiotic treatment mice were collected after 23 days of the transfer, and the V4 region of the bacterial 16S rRNA gene was targeted in the amplicon sequencing with barcoded fusion primers 38 . Then, the QIIME pipeline 2 was used to quantify OTUs and construct their phylogenetic tree. The OTUs were rarefied using the software package, phyloseq 39 due to the varying total reads per sample 40 . 59 OTUs were included in the analysis after removing OTUs which are not present in any sample after random subsampling of the rarefaction 39 . Here, only a few OTUs (i.e., 59 OTUs), which may not represent the entire ecosystem, were analyzed because of some data quality issues (e.g., small sample size, low sequencing depth and the antibiotic treatment effect which can substantially reduce microbial abundance/diversity).
We can first visually observe in the box-plots ( Fig. 2A) that all the α-diversity metrics are lower for the antibiotic treatment group than the control group, while PD and then Richness show the greatest disparity. Correspondingly, we can observe negative estimated effect scores for all α-diversity metrics, indicating microbial diversity is lower for the antibiotic treatment group than the control group, where the disparity is especially significant for PD (p-value: <0.001) and Richness (p-value: <0.001) indices (Fig. 2B). aMiAD estimates that microbial diversity is significantly different between the two groups (p-value: 0.001), where the microbial diversity is lower for the antibiotic treatment group than the control group (aMiDivES: −2.028 < 0) (Fig. 2B).
The disparity in microbial diversity between non-diseased and diseased groups. Environmental exposures (e.g., antibiotic use) during maturation have been associated with immunological and metabolic development through the mechanisms involved in the interaction between microbiota and host 41 . Type 1 diabetes (T1D) is one of the most common autoimmune diseases, which is caused by pancreatic β-cell destruction. T1D often appears in the pediatric age, and its incidence rate is globally increasing 42 . Livanos et al., (2016) performed microbiota-profiling studies to survey if the gut microbiota mediates the effect of antibiotic treatment on T1D onset 33 . To demonstrate the use of aMiAD, I analyzed a part of the original data, which surveys if the microbial diversity of gut microbiota altered by antibiotic treatment is differential by T1D status. To summarize the sampling and profiling procedures 33 , 19 NOD mice were exposed to the antibiotic (specifically, therapeutic-dose pulsed antibiotic) treatment, then, their fecal samples were collected after 6 weeks of the exposure. The V4 region of the bacterial 16S rRNA gene was targeted in the amplicon sequencing with barcoded fusion primers 38 and the QIIME pipeline 2 was used to quantify OTUs and construct their phylogenetic tree. The OTUs were rarefied using the software package, phyloseq 39 due to the varying total reads per sample 40 . 390 OTUs were included in the analysis after removing OTUs which are not present in any sample after random subsampling of the rarefaction 39 .
We can first visually observe in the box-plots (Fig. 3A) that the phylogenetic metrics (PD, PE and PQE) show a greater disparity than the non-phylogenetic metrics (Richness, Shannon and Simpson), where PQE and then PE show the greatest disparity. Here, we can also observe that the microbial diversity is lower for the T1D group than the non-diseased group for all α-diversity metrics but the Shannon index (Fig. 3A). Correspondingly, PQE (p-value: 0.012) and PE (p-value: 0.015) estimate significant p-values with negative effect direction (Fig. 3B). The Shannon index is the only metric which estimates positive effect direction (Fig. 3B). This indicates that item-by-item analyses are substantially sensitive to (e.g., the decision on significance and/or effect direction can even be reversed by) the choice of α-diversity metric. aMiAD estimates that microbial diversity is significantly different between the two groups (p-value: 0.048), where the microbial diversity is lower for the T1D group than the non-diseased group (aMiDivES: −1.619 < 0) (Fig. 3B).

Discussion
The recent microbial community-level association tests might be more powerful, where we, especially, observed in Simulations that OMiAT is most robustly powerful (Figs 1E,F and S1E,F). However, they do not provide any effect estimation facilities; hence, any further information about the disparity in microbial community composition is not accessible. Instead, aMiAD additionally estimates microbial diversity effect score, which can further enhance the interpretability. Here, I briefly discuss that other ANOVA-based methods (e.g., mvabund 43 ) cannot directly adjust potential confounding effects (e.g., age, gender), while the regression-based methods (e.g., MiRKAT, MiSPU, OMiAT, aMiAD) can easily adjust them. I chose the six α-diversity metrics, Richness, Shannon 17 , Simpson 18 , PD 19 , PE 20 and PQE 21,22 , as the candidate α-diversity metrics for aMiAD because of their distinguished features 44 . However, we are not restricted to these metrics, and other α-diversity metrics might be considered. For example, Chao1 45 and ACE 46 , can be used to further modulate the extent of the rarity of association OTUs. Chao1 and ACE utilize abundance information as "≥2 or <2 reads" and "≥10 or <10 reads", respectively, while Richness utilizes it as presence (i.e., ≥1 reads) or absence (i.e., 0 read). Thus, we may expect that Chao1 might be suitable when the extent of the rarity is relatively lower than the one for Richness, but relatively higher than the one for ACE. The Inverse Simpson index can also be considered by replacing the original Simpson index. Yet, I heuristically determined to use the original Simpson index as the Inverse Simpson index did not show any better performance. Notably, novel statistical estimates for α-diversity have still been proposed while further addressing the issues of missing species, sampling noise, experimental noise and so forth [47][48][49][50][51][52] . Any α-diversity metrics can be easily employed in my software package, aMiAD, through user options. In this paper, I introduced aMiAD which adaptively approaches to the highest power and the most accurate microbial diversity effect score estimation among multiple item-by-item α-diversity-based association analyses. aMiAD also robustly satisfies the requisite validity issues in hypothesis testing and effect score estimation. Although I proposed aMiAD to relate microbial diversity with a continuous (e.g., BMI) or binary (e.g., disease/ treatment status) trait of interest, it would be extendable to different types of trait (e.g., survival, multinomial trait) 25,[53][54][55] . Moreover, an extension to the linear mixed effect model 56 /generalized linear mixed effect model 57 is needed for correlated (e.g., family-based or longitudinal) study designs.