Abstract
In human microbiome studies, mediation analysis has recently been spotlighted as a practical and powerful analytic tool to survey the causal roles of the microbiome as a mediator to explain the observed relationships between a medical treatment/environmental exposure and a human disease. We also note that, in clinical research, investigators often trace disease progression sequentially in time; as such, time-to-event (e.g., time-to-disease, time-to-cure) responses, known as survival responses, are prevalent as a surrogate variable for human health or disease. In this paper, we introduce a web cloud computing platform, named microbiome mediation analysis with survival responses (MiMedSurv), for comprehensive microbiome mediation analysis with survival responses in a user-friendly web environment. MiMedSurv extends our prior web cloud computing platform, microbiome mediation analysis (MiMed), to survival responses. Its two main distinguishing features are as follows. First, MiMedSurv conducts baseline exploratory non-mediational survival analysis, not involving the microbiome, to survey the disparity in survival response between medical treatments/environmental exposures. Second, MiMedSurv identifies the mediating roles of the microbiome in various aspects: (i) as a microbial ecosystem using ecological indices (e.g., alpha and beta diversity indices) and (ii) as individual microbial taxa in various hierarchies (e.g., phyla, classes, orders, families, genera, species). To illustrate its use, we survey the mediating roles of the gut microbiome between antibiotic treatment and time-to-type 1 diabetes. MiMedSurv is freely available on our web server (http://mimedsurv.micloud.kr).
Introduction
The human microbiome is the entire ecosystem of all microbes that reside in and on the human body. The field of human microbiome research has been growing rapidly in both academia and industry due to recent advances in next-generation sequencing technologies (i.e., 16S ribosomal RNA gene amplicon sequencing1,2 and shotgun metagenomic3 sequencing). Investigators are now actively probing the roles of the microbiome in human health and disease.
There have been many clinical and epidemiologic clues on the relationships between medical treatments/environmental exposures (e.g., diet, residence, smoking, preterm birth, delivery mode, antibiotic/probiotic use) and human diseases (e.g., obesity, intestinal disease, cancers, diabetes, brain disorders). The causal roles of the microbiome, as a mediator, between them are gaining increasing recognition4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19. That is, investigators seek to understand whether the treatment/exposure alters the microbiome and whether the altered microbiome, in turn, influences the human disease. For this, we have recently introduced a web cloud computing platform, named microbiome mediation analysis (MiMed) (http://mimed.micloud.kr)20. MiMed enables comprehensive microbiome causal mediation analysis in a user-friendly web environment20; as such, even non-professional programmers (e.g., microbiologists, medical doctors, public health practitioners) can easily use it. However, MiMed is limited to continuous (e.g., body mass index) or binary (e.g., normal vs. diseased) responses20.
We noted that, in clinical research, investigators often trace disease progression sequentially in time; as such, time-to-event (e.g., time-to-disease, time-to-cure) responses, known as survival responses, are in practice commonly available as a surrogate variable for human health or disease. Typically, survival responses are right-censored, meaning that the time-to-event is known to exceed a certain value but not by how much; as such, only specialized statistical methods can properly handle them. Thus, in this paper, we introduce a web cloud computing platform, named microbiome mediation analysis with survival responses (MiMedSurv), that extends MiMed20 to survival responses. Its two main distinguishing features are as follows. First, MiMedSurv can conduct baseline exploratory non-mediational survival analysis, not involving the microbiome, to survey the disparity in survival time between medical treatments (e.g., treatment vs. placebo, new treatment vs. old treatment, and so forth)/environmental exposures (e.g., rural vs. urban, smoking vs. non-smoking, calorie restriction vs. ad libitum diet, and so forth). Second, MiMedSurv can identify the mediating roles of the microbiome in various aspects: (i) as a microbial ecosystem using ecological indices (e.g., alpha and beta diversity indices) and (ii) as individual microbial taxa in various hierarchies (e.g., phyla, classes, orders, families, genera, species). We also stress that covariate-adjusted analysis is supported to control for potential confounding factors (e.g., age, sex) to improve causal inference, which is particularly essential for observational studies. MiMedSurv also provides user-friendly step-by-step data preprocessing and analytic modules together with graphical facilities. In addition, MiMedSurv automatically lists all related references for each run of the data preprocessing/analytic modules for user convenience.
There have been many other web-based platforms for microbiome data analysis, for example, MiCloud21 for cross-sectional/longitudinal association analysis, MiPair22 for paired data analysis, MiSurv23 for survival analysis, MiTree24 for classification and regression modelling, MiMultiCat25 for classification modelling and association analysis with multi-categorical responses, and MiMed20 for causal mediation analysis with binary and continuous responses, yet MiMedSurv is the first for microbiome causal mediation analysis with survival responses (Table 1). Recently, there have also been many other statistical methods proposed for microbiome causal mediation analysis26,27,28,29,30, yet we excluded them because they are not web-based and cannot handle survival responses. We organized the overall workflow of MiMedSurv in Fig. 1 as an overview, where the data processing module is the same as the one for MiMed20 while the non-mediational analysis and mediational analysis modules are unique to MiMedSurv.
The rest of the paper is organized as follows. In the Materials and Methods section, we describe the methodological aspects and web server architecture of MiMedSurv. Then, in the Results section, we illustrate the use of MiMedSurv through an example data analysis to survey whether the gut microbiome mediates the effect of antibiotic treatment on the onset of type 1 diabetes (T1D)11. Finally, in the Discussion section, we summarize and finish with concluding remarks.
Materials and methods
Mediation analysis begins with the observed relationship between a treatment/exposure and a response. Then, it surveys whether there is any underlying mediating mechanism between them. In this research, we refer to the former as ‘non-mediational analysis’, and to the latter as ‘mediational analysis’. Although the non-mediational analysis does not involve the microbiome at all, it is an essential procedure for baseline exploratory purposes. The mediational analysis, however, is the main analytic module of MiMedSurv.
Non-mediational analysis
The non-mediational analysis module is for some baseline exploratory survival analysis to survey the disparity in survival time between medical treatments (e.g., treatment vs. placebo, new treatment vs. old treatment, and so forth)/environmental exposures (e.g., rural vs. urban, smoking vs. non-smoking, calorie restriction vs. ad libitum diet, and so forth). For this, MiMedSurv (i) performs the Kaplan–Meier analysis31 coupled with the log-rank test32 or the Wilcoxon test33 for univariate analysis or (ii) fits the Cox proportional hazards model34 for both univariate and covariate-adjusted analyses. MiMedSurv employs the Kaplan–Meier or covariate-adjusted survival curve for graphics.
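To make the Kaplan–Meier step concrete, the following is a minimal Python sketch of the product-limit estimator on right-censored data. It is not MiMedSurv's implementation (MiMedSurv is written in R); the function name `kaplan_meier` and the toy follow-up times are hypothetical and serve only to illustrate how censored subjects leave the risk set without counting as events.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : follow-up times
    events : 1 if the event was observed, 0 if right-censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all subjects that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            # Product-limit update: multiply by the conditional survival at t.
            surv *= 1.0 - n_events / at_risk
            curve.append((t, surv))
        # Both events and censored subjects leave the risk set after t.
        at_risk -= n_removed
    return curve


# Toy data (hypothetical): follow-up in weeks, 1 = event, 0 = censored.
weeks = [5, 8, 8, 12, 15, 20]
status = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(weeks, status)
for t, s in km_curve:
    print(f"week {t}: S(t) = {s:.3f}")
```

Computing one such curve per treatment group gives the curves that the log-rank or Wilcoxon test then compares.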
Mediational analysis
The mediational analysis module surveys the mediating roles of the microbiome (i) as a microbial ecosystem using ecological indices (e.g., alpha and beta diversity indices) and (ii) as individual microbial taxa in various hierarchies (e.g., phyla, classes, orders, families, genera, species). In this research, we refer to the former as ‘community-level analysis’, and to the latter as ‘taxonomy-level analysis’. We also categorize the community-level analysis further into alpha diversity and beta diversity analyses.
First, for the alpha diversity analysis, as in MiMed20, MiMedSurv employs the Imai method for causal mediation analysis35,36. The only difference is in the outcome model35,36. That is, while MiMed employs the ordinary linear regression model for continuous responses and the logistic regression model for binary responses, MiMedSurv employs the Weibull regression model for survival responses35,36,37. MiMedSurv employs the forest plot for graphics.
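As a sketch of what a Weibull outcome model looks like, assuming the accelerated failure time parameterisation used by common implementations such as `survreg` in the R survival package (the coefficient symbols \(\gamma\) and \(\sigma\) are notation introduced here for illustration, and covariates are omitted for brevity):

\[ \log {Y}_{i} = {\gamma }_{0} + {\gamma }_{1}{T}_{i} + {\gamma }_{2}{M}_{i} + \sigma {W}_{i}, \]

where \({Y}_{i}\) is the survival time, \({T}_{i}\) is the treatment/exposure status, \({M}_{i}\) is the mediator (here, an alpha diversity index), and \({W}_{i}\) follows the standard (minimum) extreme value distribution. Under this form, a positive \({\gamma }_{2}\) means that a higher value of the mediator is associated with a longer time-to-event.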
Second, for the beta diversity analysis, MiMedSurv first surveys the disparity in beta diversity between treatment groups (e.g., treatment vs. placebo, new treatment vs. old treatment, and so forth) using MiRKAT38,39 (treatment model) and the relationship between beta diversity and survival responses adjusting for treatment status using MiRKAT-S39,40 (outcome model). Then, MiMedSurv combines the P-values from the treatment and outcome models using the Divide-Aggregate Composite-null Test (DACT)41, which treats the null hypothesis of no mediation effect as a composite hypothesis to enhance statistical power. That is, based on the two regression models in Eqs. (1) and (2), the null hypothesis of no mediation effect is typically formulated as \({H}_{0}\): \({\alpha }_{1}{\beta }_{1}\) = 0, which indicates no effect of the treatment/exposure on the mediator (\({\alpha }_{1}\) = 0) or no effect of the mediator on the response conditioned on the treatment/exposure status (\({\beta }_{1}\) = 0).
Here, \({Y}_{i}\) is a health or disease response, \({T}_{i}\) is a treatment/exposure status (e.g., treatment (t = 1) vs. placebo (t = 0), new treatment (t = 1) vs. old treatment (t = 0), and so forth), and \({M}_{i}\) is a mediator, for each individual i = 1, …, n. However, DACT41 formulates it as a composite hypothesis that \({H}_{0}\): (1) \({\alpha }_{1}\) = 0 & \({\beta }_{1}\) ≠ 0; (2) \({\alpha }_{1}\) ≠ 0 & \({\beta }_{1}\) = 0; or (3) \({\alpha }_{1}\) = 0 & \({\beta }_{1}\) = 0. Then, DACT41 can powerfully detect a mediation effect by rejecting all three disjoint null statements. More details can be found in41. MiMedSurv employs the principal coordinate analysis (PCoA) plot for graphics42.
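The two regression models referred to above as Eqs. (1) and (2) can be sketched as follows. This is an assumed generic linear form, chosen only to be consistent with the stated roles of \({\alpha }_{1}\) and \({\beta }_{1}\); the intercepts \({\alpha }_{0}\) and \({\beta }_{0}\), the coefficient \({\beta }_{2}\), and the error terms are notation introduced here, and in MiMedSurv the corresponding treatment and outcome models are the kernel-based MiRKAT and MiRKAT-S rather than ordinary regressions.

\[ {M}_{i} = {\alpha }_{0} + {\alpha }_{1}{T}_{i} + {\epsilon }_{Mi} \qquad (1) \]

\[ {Y}_{i} = {\beta }_{0} + {\beta }_{1}{M}_{i} + {\beta }_{2}{T}_{i} + {\epsilon }_{Yi} \qquad (2) \]

Under this form, the product \({\alpha }_{1}{\beta }_{1}\) is non-zero only if the treatment/exposure shifts the mediator and the mediator, in turn, shifts the response given the treatment/exposure status.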
Third, for the taxonomy-level analysis, as for the alpha diversity analysis, MiMedSurv employs the Imai method for causal mediation analysis coupled with the Weibull regression model for survival responses35,36,37. MiMedSurv applies the Imai method35,36,37 to each taxon and then the Benjamini–Hochberg (BH) procedure43 to control the false discovery rate (FDR) within each taxonomic rank separately. MiMedSurv employs the forest plot and dendrogram for graphics.
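The BH adjustment applied within each taxonomic rank can be sketched in a few lines of Python. The function name `benjamini_hochberg` and the P-values below are hypothetical; the sketch only illustrates the step-up adjustment, not MiMedSurv's R code.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted P-values (Q-values) in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that Q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical mediation P-values for five genera (one taxonomic rank).
genus_p = [0.001, 0.008, 0.039, 0.041, 0.60]
genus_q = benjamini_hochberg(genus_p)
print(genus_q)
```

Because the adjustment is run separately per rank, the number of tests `m` is the number of taxa at that rank rather than the total number of taxa across all ranks.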
Some more methodological details and terminologies for the Imai method35,36 are as follows. Again, for the alpha diversity and taxonomy-level analyses, MiMedSurv employs the Imai method35,36, which is based on the potential outcomes framework of causal inference44, that is \({Y}_{i}({T}_{i}\), \({M}_{i}\)(\({T}_{i}\))), where \({Y}_{i}\) is a health or disease response, \({T}_{i}\) is a treatment/exposure status (e.g., treatment (t = 1) vs. placebo (t = 0) , new treatment (t = 1) vs. old treatment (t = 0), and so forth), and \({M}_{i}\) is a mediator, for each individual i = 1, …, n. Then, for each individual, the Imai method35,36 defines (i) the total treatment effect as \({\tau }_{i}\) in Eq. (3); (ii) the direct effect as \({\zeta }_{i}(t)\) in Eq. (4); and (iii) the indirect effect that represents the ‘causal mediation effect’ as \({\delta }_{i}(t)\) in Eq. (5).
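In the potential outcomes notation above, the three individual-level effects referred to as Eqs. (3) to (5) take the following standard form in the causal mediation literature, written here for t = 0, 1:

\[ {\tau }_{i} = {Y}_{i}(1, {M}_{i}(1)) - {Y}_{i}(0, {M}_{i}(0)) \qquad (3) \]

\[ {\zeta }_{i}(t) = {Y}_{i}(1, {M}_{i}(t)) - {Y}_{i}(0, {M}_{i}(t)) \qquad (4) \]

\[ {\delta }_{i}(t) = {Y}_{i}(t, {M}_{i}(1)) - {Y}_{i}(t, {M}_{i}(0)) \qquad (5) \]

These definitions imply \({\tau }_{i} = {\delta }_{i}(t) + {\zeta }_{i}(1 - t)\), so the total effect splits into an indirect and a direct part; when the effects do not depend on t, this is the familiar statement that the total effect is the sum of the direct and indirect effects.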
Then, the Imai method35,36 averages the total treatment effects, the direct effects, and the indirect effects, respectively, across all individuals, and then returns the average total effect (ATE), average direct effect (ADE), and average causal mediation effect (ACME) as its final outcomes, with ATE = ADE + ACME. Since we are here especially interested in the roles of the microbiome as a mediator, ACME is the main analytic outcome. Further, the Imai papers35,36 introduce both parametric and non-parametric approaches to calculate the P-value and confidence interval, and the R package, mediation 4.5.0 (https://cran.r-project.org/web/packages/mediation), supports both of them. We employed the non-parametric approach based on a bootstrap method45 for robust statistical inference, given the possibly high skewness and zero-inflation of microbiome data, especially for lower-level taxa (e.g., genera or species)46.
In more detail, the Imai method35,36 first generates B bootstrap resamples through random sampling with replacement. Then, for each resample b, the Imai method35,36 fits the mediator model \({\widehat{f}}_{{M}_{b}}({M}_{i} | {T}_{i})\) based on the linear regression model using the method of least squares and the outcome model \({\widehat{f}}_{{Y}_{b}}({Y}_{i} | {T}_{i}, {M}_{i})\) based on the Weibull regression model; and then calculates (i) the total treatment effect in Eq. (3) as \({\widehat{\tau }}_{bi} = {\widehat{f}}_{{Y}_{b}}({Y}_{i} | 1, {\widehat{f}}_{{M}_{b}}({M}_{i} | 1)) - {\widehat{f}}_{{Y}_{b}}({Y}_{i} | 0, {\widehat{f}}_{{M}_{b}}({M}_{i} | 0))\); (ii) the direct effect in Eq. (4) as \({\widehat{\zeta }}_{bi}(t) = {\widehat{f}}_{{Y}_{b}}({Y}_{i} | 1, {\widehat{f}}_{{M}_{b}}({M}_{i} | t)) - {\widehat{f}}_{{Y}_{b}}({Y}_{i} | 0, {\widehat{f}}_{{M}_{b}}({M}_{i} | t))\); and (iii) the indirect effect in Eq. (5) as \({\widehat{\delta }}_{bi}(t) = {\widehat{f}}_{{Y}_{b}}({Y}_{i} | t, {\widehat{f}}_{{M}_{b}}({M}_{i} | 1)) - {\widehat{f}}_{{Y}_{b}}({Y}_{i} | t, {\widehat{f}}_{{M}_{b}}({M}_{i} | 0))\) for each individual, and also their average effects \({\text{ATE}}_{b}\), \({\text{ADE}}_{b}\), and \({\text{ACME}}_{b}\) for b = 1, …, B. Statistical inferences (e.g., P-values and confidence intervals) are then made based on the bootstrap sampling distributions of ATE, ADE and ACME35,36.
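The bootstrap loop described above can be sketched in Python as follows. To keep the example self-contained, it deliberately swaps the Weibull outcome model for an ordinary linear outcome model and ignores censoring and covariates, so it illustrates only the resample, refit, and predict logic, not the survival-specific method used by MiMedSurv. All function names, the simulated data, and the percentile interval are assumptions made for illustration.

```python
import random


def fit_mediator(T, M):
    """Least-squares fit of M = a0 + a1*T; returns (a0, a1)."""
    n = len(T)
    t_bar, m_bar = sum(T) / n, sum(M) / n
    s_tt = sum((t - t_bar) ** 2 for t in T)
    s_tm = sum((t - t_bar) * (m - m_bar) for t, m in zip(T, M))
    a1 = s_tm / s_tt
    return m_bar - a1 * t_bar, a1


def fit_outcome(T, M, Y):
    """Least-squares fit of Y = b0 + b1*T + b2*M; returns (b0, b1, b2).

    NOTE: a linear stand-in for the Weibull outcome model (assumption).
    """
    n = len(T)
    t_bar, m_bar, y_bar = sum(T) / n, sum(M) / n, sum(Y) / n
    s_tt = sum((t - t_bar) ** 2 for t in T)
    s_mm = sum((m - m_bar) ** 2 for m in M)
    s_tm = sum((t - t_bar) * (m - m_bar) for t, m in zip(T, M))
    s_ty = sum((t - t_bar) * (y - y_bar) for t, y in zip(T, Y))
    s_my = sum((m - m_bar) * (y - y_bar) for m, y in zip(M, Y))
    det = s_tt * s_mm - s_tm ** 2
    b1 = (s_mm * s_ty - s_tm * s_my) / det
    b2 = (s_tt * s_my - s_tm * s_ty) / det
    return y_bar - b1 * t_bar - b2 * m_bar, b1, b2


def bootstrap_acme(T, M, Y, B=200, seed=1):
    """Return (mean of bootstrap ACME draws, 2.5th and 97.5th percentiles)."""
    rng = random.Random(seed)
    n = len(T)
    draws = []
    while len(draws) < B:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        Tb = [T[i] for i in idx]
        Mb = [M[i] for i in idx]
        Yb = [Y[i] for i in idx]
        if len(set(Tb)) < 2:
            continue  # resample lacks one treatment arm; draw again
        a0, a1 = fit_mediator(Tb, Mb)
        b0, b1, b2 = fit_outcome(Tb, Mb, Yb)
        m1, m0 = a0 + a1, a0  # predicted mediator under t = 1 and t = 0
        # Indirect effect at t = 1: Y(1, M(1)) - Y(1, M(0)). With linear
        # models it is identical for every individual, so it is the average.
        delta = (b0 + b1 + b2 * m1) - (b0 + b1 + b2 * m0)
        draws.append(delta)
    draws.sort()
    lower = draws[int(0.025 * B)]
    upper = draws[int(0.975 * B) - 1]
    return sum(draws) / B, lower, upper


# Simulated data (hypothetical): the true indirect effect is 2.0 * 1.5 = 3.0.
sim = random.Random(0)
T = [i % 2 for i in range(200)]
M = [2.0 * t + sim.gauss(0, 0.5) for t in T]
Y = [1.0 + 0.5 * t + 1.5 * m + sim.gauss(0, 0.2) for t, m in zip(T, M)]
acme, lower, upper = bootstrap_acme(T, M, Y)
print(f"ACME ~ {acme:.2f}, 95% percentile interval ({lower:.2f}, {upper:.2f})")
```

With a Weibull outcome model, the prediction step would use the fitted survival model in place of the linear predictor, and the indirect effect would vary across individuals before being averaged.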
Web server architecture
MiMedSurv is written in R 4.1.1 and released under the GNU General Public License version 3 (GPL-3). In particular, its app interfaces were developed using the shiny 1.7.5 package (https://shiny.rstudio.com). A web server is any computer software, together with its underlying hardware, that can interact with an external client through a network protocol. MiMedSurv is based on a typical client–server architecture, for which we employed ShinyProxy 2.6.1 (https://shinyproxy.io) and Apache 2 (https://httpd.apache.org) to deploy it to the web, and a computing device with an Intel i9-12900 (16-core) processor (Intel, Santa Clara, CA, USA) and 64 GB DDR4 memory (Samsung, Seoul, South Korea). The web server is freely available at http://mimedsurv.micloud.kr and does not require users to log in or register. In case the web server is busy or under repair, we also provide a GitHub repository (https://github.com/yj7599/mimedsurvgit) so that users can run MiMedSurv on their local computer. On the GitHub repository, users can also find related instructions, references, prerequisites, and troubleshooting tips.
Results
Example microbiome data
To illustrate MiMedSurv, we employed example microbiome data to survey the mediating roles of the gut microbiome between antibiotic treatment and T1D onset11. T1D is an autoimmune disease, in which the immune system is overly active and hence attacks even non-pathogenic normal cells. Unfortunately, T1D is increasing in incidence while decreasing in its onset age worldwide. Zhang et al.11 revealed that early-life antibiotic (known as tylosin) administration can accelerate T1D development through gut microbiome dysbiosis. For illustration purposes, we reanalyze the data using (i) the antibiotic (tylosin) treatment as a treatment variable, (ii) the gut microbiome as a mediator, (iii) the time-to-T1D as a survival variable, and (iv) sex and elapsed time after the antibiotic (tylosin) treatment as two covariates (Fig. 2). We expected sex and the elapsed time after the antibiotic (tylosin) treatment to be potential confounders between the antibiotic (tylosin) treatment and the onset of T1D. In particular, as the elapsed time increases, the effect of the antibiotic on the gut microbiome can decrease, and the elapsed time can also be linked to the onset of T1D. Here, the time-to-T1D survival variable consists of typical right-censored survival responses with two components: (i) follow-up time (in weeks) and (ii) a censored/event indicator, with 0 for censored (no event, T1D-free) and 1 for event (T1D onset).
The 16S ribosomal RNA gene amplicon sequence data are publicly available in the European Nucleotide Archive (ENA) (https://www.ebi.ac.uk/ena/browser/view/PRJEB14696, accession number: PRJEB14696). We processed them using a bioinformatic pipeline, QIIME 2 (https://qiime2.org)47, and a database, Greengenes (https://greengenes.secondgenome.com), to construct a feature table, taxonomic annotations, and a phylogenetic tree. We stored the final processed data as example data in the Data Input module of MiMedSurv for users to easily confirm data formatting requirements.
In the following sections, we describe all the preprocessing and analytic modules of MiMedSurv step-by-step using this example microbiome data (see Application Note).
Data processing: data input
Users first need to upload their microbiome data with (i) four components: a feature (i.e., operational taxonomic unit (OTU) or amplicon sequence variant (ASV)) table, a taxonomic table, a phylogenetic tree, and metadata or (ii) three components: a feature (i.e., OTU or ASV) table, a taxonomic table, and metadata. Here, the taxonomic table must contain seven taxonomic ranks (kingdom, phylum, class, order, family, genus, and species), while the phylogenetic tree must be a rooted tree that reflects evolutionary relationships across features (i.e., OTUs or ASVs). If users upload their microbiome dataset with three components without a phylogenetic tree, only non-phylogenetic community-level (alpha- and beta-diversity) analyses will later be performed.
Users can start by downloading the example microbiome data from the Example Data section. The data are stored in a widely used unified format, called phyloseq48, that can efficiently combine all essential microbiome data components. Alternatively, users can also upload the components individually. In this module, we also provide instructions with example code to check compatible data formats so that users can prepare microbiome data easily.
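As an illustration of the kind of consistency checks implied by these requirements, the sketch below validates three components held as plain Python dictionaries. The data layout, the rank column names in `REQUIRED_RANKS`, and the function `check_inputs` are all hypothetical; MiMedSurv itself works with R objects (e.g., phyloseq), and its own checks may differ.

```python
# Hypothetical rank names; the actual column names expected by the platform
# should be taken from its Data Input instructions.
REQUIRED_RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]


def check_inputs(feature_table, tax_table, metadata):
    """Return a list of human-readable problems (empty if none are found).

    feature_table : dict feature -> dict subject -> read count
    tax_table     : dict feature -> dict rank -> taxon name
    metadata      : dict subject -> dict of variables
    """
    problems = []
    # Every feature in the taxonomic table needs all seven ranks.
    for feature, ranks in tax_table.items():
        missing = [r for r in REQUIRED_RANKS if r not in ranks]
        if missing:
            problems.append(f"{feature}: missing ranks {missing}")
    # Every feature in the feature table needs a taxonomic annotation.
    no_taxonomy = sorted(set(feature_table) - set(tax_table))
    if no_taxonomy:
        problems.append(f"features without taxonomy: {no_taxonomy}")
    # Every subject in the feature table needs a metadata row.
    subjects = {s for row in feature_table.values() for s in row}
    no_metadata = sorted(subjects - set(metadata))
    if no_metadata:
        problems.append(f"subjects without metadata: {no_metadata}")
    return problems


# Toy inputs (hypothetical) with two deliberate defects.
feature_table = {"ASV1": {"S1": 10, "S2": 3}, "ASV2": {"S1": 0, "S2": 7}}
tax_table = {
    "ASV1": {rank: "x" for rank in REQUIRED_RANKS},
    "ASV2": {rank: "x" for rank in REQUIRED_RANKS[:5]},  # lacks Genus, Species
}
metadata = {"S1": {"Antibiotic": 1}}  # S2 is missing
problems = check_inputs(feature_table, tax_table, metadata)
for line in problems:
    print(line)
```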
Application Note: We uploaded the example microbiome data using the Browse and Upload buttons.
Data processing: quality control
As in MiMed20, users can perform quality controls with respect to (i) a microbial kingdom of interest (default: Bacteria), (ii) a minimum library size (i.e., total read count) for the study subjects to be retained (default: 3000), (iii) a minimum mean proportion for the features (i.e., OTUs or ASVs) to be retained (default: 0.02%), and (iv) errors in taxonomic names to be removed. For reference, users can download the resulting microbiome data after quality controls.
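The two numeric filters can be sketched in Python as below. The defaults mirror the values stated above (3000 reads; 0.02%, i.e., 0.0002 as a proportion), but the function `quality_control`, the dictionary layout, the order of the two filters, and the use of inclusive thresholds are assumptions made for illustration rather than a description of MiMedSurv's R code.

```python
def quality_control(counts, min_library_size=3000, min_mean_prop=0.0002):
    """Filter subjects by library size, then features by mean proportion.

    counts : dict subject -> dict feature -> read count
    """
    # Keep subjects whose total read count reaches the minimum library size.
    kept = {s: f for s, f in counts.items() if sum(f.values()) >= min_library_size}
    features = sorted({name for f in kept.values() for name in f})
    # Mean relative abundance of each feature across the retained subjects.
    mean_prop = {}
    for name in features:
        props = [f.get(name, 0) / sum(f.values()) for f in kept.values()]
        mean_prop[name] = sum(props) / len(props)
    keep_features = [n for n in features if mean_prop[n] >= min_mean_prop]
    return {s: {n: f.get(n, 0) for n in keep_features} for s, f in kept.items()}


# Toy feature table (hypothetical).
toy = {
    "S1": {"ASV1": 4000, "ASV2": 999, "ASV3": 1},
    "S2": {"ASV1": 2500, "ASV2": 2500, "ASV3": 0},
    "S3": {"ASV1": 100, "ASV2": 50, "ASV3": 0},  # library size 150: dropped
}
clean = quality_control(toy)
print(clean)
```

In this toy table, ASV3 has a mean proportion of 0.01% across the two retained subjects and is therefore removed.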
Application Note: We performed quality controls using the default settings by simply clicking the Run button. Then, we retained 519 study subjects with 224 features, 7 phyla, 12 classes, 15 orders, 19 families, 25 genera, and 9 species (Fig. 3). We can see that the library sizes (i.e., total read counts) vary dramatically across study subjects, and the mean proportions are highly skewed to the right (Fig. 3).
Data processing: data transformation
As in MiMed20, for the community-level analysis, users can compute nine alpha diversity indices (i.e., Observed, Shannon49, Simpson50, Inverse Simpson50, Fisher51, Chao152, abundance-based coverage estimator (ACE)53, incidence-based coverage estimator (ICE)54, phylogenetic diversity (PD)55) and five beta diversity indices (i.e., Jaccard dissimilarity56, Bray–Curtis dissimilarity57, Unweighted UniFrac distance58, Generalized UniFrac distance59, Weighted UniFrac distance60). For the taxonomy-level analysis, users can normalize the data using the widely used centered log-ratio (CLR) transformation method61 to relax the compositional constraint of the data, yet three other normalization methods of arcsine-root, rarefied count62 and proportion are also available. For reference, users can download all the resulting alpha and beta diversity indices and normalized taxonomic data.
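A few of these quantities are simple enough to sketch directly. The Python functions below compute the Shannon and Simpson indices, the Bray–Curtis dissimilarity, and the CLR transformation for count vectors. The conventions used (Simpson as one minus the sum of squared proportions, Bray–Curtis on raw counts, and a pseudocount of 0.5 before taking logarithms) are assumptions for illustration and may differ from the exact choices made in MiMedSurv.

```python
import math


def shannon(counts):
    """Shannon index: -sum(p * log(p)) over features with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson index in the 1 - sum(p^2) convention (assumed here)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(x, y)) / (sum(x) + sum(y))


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform; the pseudocount handles zeros (assumed)."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Two toy samples (hypothetical): one perfectly even, one dominated by a taxon.
even = [10, 10, 10, 10]
skewed = [40, 0, 0, 0]
print(shannon(even), simpson(even))      # maximal evenness for four features
print(shannon(skewed), simpson(skewed))  # no diversity
print(bray_curtis(even, skewed))
print(clr(skewed))
```

The CLR-transformed values of a sample always sum to zero, which is how the transformation removes the unit-sum constraint of proportions.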
Application Note: We computed all the alpha and beta diversity indices and normalized taxonomic data by simply clicking the Run button.
Data analysis: non-mediational analysis
In this module, users can perform some baseline exploratory survival analysis to survey the disparity in survival time between medical treatments (e.g., treatment vs. placebo, new treatment vs. old treatment, and so forth) or environmental exposures (e.g., rural vs. urban, smoking vs. non-smoking, calorie restriction vs. ad libitum diet, and so forth). For this, users need to select (i) a survival time variable, (ii) a censored/event indicator variable, (iii) a treatment variable, (iv) covariate(s), and (v) an analytic method as we described in the previous section, Materials and Methods: Non-Mediational Analysis.
Application Note: We selected (i) ‘T1Dweek’ as a survival time variable, (ii) ‘T1D’ as a censored/event indicator variable, (iii) ‘Antibiotic’ as a treatment variable, (iv) ‘Sex’ and ‘SampleTime’ as covariates, and (v) the Cox model34 as an analytic method. Then, we found a significant disparity in survival (against T1D) rate between the normal control and antibiotic (tylosin) groups adjusting for sex and elapsed time after the antibiotic (tylosin) treatment (P-value: < 0.001) (Fig. 4). We also estimated that the hazard rate (toward T1D) is higher for the antibiotic (tylosin) group than for the normal control group (hazard ratio (HR): 1.758 > 1), indicating that the antibiotic (tylosin) treatment is harmful in that it accelerates T1D onset (Fig. 4).
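For readers less familiar with the Cox model, the hazard ratio reported above is the exponentiated treatment coefficient. Writing that coefficient as \(\theta\) and the covariates as \(X\) (both symbols introduced here for illustration), with the covariates held fixed:

\[ \text{HR} = \frac{h(t | T = 1, X)}{h(t | T = 0, X)} = \exp (\theta ), \qquad \theta = \log (1.758) \approx 0.564. \]

An HR of 1.758 therefore corresponds to an estimated hazard of T1D onset about 76% higher in the antibiotic (tylosin) group than in the normal control group at any given time, under the proportional hazards assumption.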
Data analysis: mediational analysis
In the previous section, Results: Data Analysis: Non-Mediational Analysis, we observed a significant disparity in survival rate between the normal control and antibiotic (tylosin) groups adjusting for sex and elapsed time after the antibiotic (tylosin) treatment. In the following sections, we examine whether the gut microbiome plays a mediating role between the antibiotic (tylosin) treatment and T1D onset with respect to (i) alpha diversity, (ii) beta diversity and (iii) microbial taxa at different hierarchies (i.e., phyla, classes, orders, families, and genera).
Community-level analysis: alpha diversity
In this module, users can perform the microbiome causal mediation analysis to test jointly if the treatment alters the microbial alpha diversity, and then the altered microbial alpha diversity, in turn, influences the survival responses. For this, users need to select (i) a survival time variable, (ii) a censored/event indicator variable, (iii) a treatment variable, and (iv) covariate(s) for covariate-adjusted analysis or not for univariate analysis. The available analytic method here is the Imai method35,36 coupled with the Weibull regression model37 as we described in the previous section, Materials and Methods: Mediational Analysis.
Application Note: We selected (i) ‘T1Dweek’ as a survival time variable, (ii) ‘T1D’ as a censored/event indicator variable, (iii) ‘Antibiotic’ as a treatment variable, (iv) ‘Sex’ and ‘SampleTime’ as covariates, and (v) the Imai method35,36 as an analytic method. Then, we found a significant mediation effect of microbial alpha diversity with respect to the Simpson index50 between the antibiotic (tylosin) treatment and T1D onset adjusting for sex and elapsed time after the antibiotic (tylosin) treatment (P-value: 0.045 < 0.05) (Fig. 5), whereas none of the other alpha diversity indices was statistically significant. We also estimated that as the Simpson index50 level decreases, the T1D onset tends to be accelerated (Est. − 1.720 < 0), indicating that a lower Simpson index level is harmful with respect to T1D (Fig. 5).
Community-level analysis: beta diversity
In this module, users can perform the microbiome causal mediation analysis to test jointly if the treatment alters the microbial beta diversity, and then the altered microbial beta diversity, in turn, influences the survival responses. For this, users need to select (i) a survival time variable, (ii) a censored/event indicator variable, (iii) a treatment variable, and (iv) covariate(s) for covariate-adjusted analysis or not for univariate analysis. The available analytic method here is DACT41 coupled with MiRKAT38,39 for the treatment model and MiRKAT-S39,40 for the outcome model as we described in the previous section, Materials and Methods: Mediational Analysis.
Application Note: We selected (i) ‘T1Dweek’ as a survival time variable, (ii) ‘T1D’ as a censored/event indicator variable, (iii) ‘Antibiotic’ as a treatment variable, (iv) ‘Sex’ and ‘SampleTime’ as covariates, and (v) DACT41 as an analytic method. Then, we could not find any significant mediation effect of microbial beta diversity between the antibiotic (tylosin) treatment and T1D onset adjusting for sex and elapsed time after the antibiotic (tylosin) treatment (Fig. 6).
Taxonomy-level analysis
In this module, users can perform the microbiome causal mediation analysis to test jointly if the treatment alters microbial taxa, and then the altered microbial taxa, in turn, influence the survival responses. For this, users need to select (i) a data format (default: CLR61), (ii) a survival time variable, (iii) a censored/event indicator variable, (iv) a treatment variable, and (v) covariate(s) for covariate-adjusted analysis or not for univariate analysis, and (vi) the taxonomic ranks to be analyzed from phylum to genus for 16S ribosomal RNA gene amplicon sequencing1,2 or from phylum to species for shotgun metagenomics3. The available analytic method here is the Imai method35,36 coupled with the Weibull regression model37 as we described in the previous section, Materials and Methods: Mediational Analysis.
Application Note: We selected (i) ‘CLR’61 as a data format, (ii) ‘T1Dweek’ as a survival time variable, (iii) ‘T1D’ as a censored/event indicator variable, (iv) ‘Antibiotic’ as a treatment variable, (v) ‘Sex’ and ‘SampleTime’ as covariates, (vi) ‘Phylum-Genus’ as taxonomic ranks to be analyzed, and (vii) the Imai method35,36 as an analytic method. Then, we found a significant mediation effect of two phyla (Actinobacteria and Verrucomicrobia), three classes (Erysipelotrichi, Actinobacteria, and Verrucomicrobiae), three orders (Erysipelotrichales, Bifidobacteriales, and Verrucomicrobiales), three families (Erysipelotrichaceae, Bifidobacteriaceae, and Verrucomicrobiaceae), and three genera (Allobaculum, Bifidobacterium, and Akkermansia) between the antibiotic (tylosin) treatment and T1D onset adjusting for sex and elapsed time after the antibiotic (tylosin) treatment (Q-value: < 0.001) (Fig. 7). We also estimated that as their relative abundance level decreases, the T1D onset tends to be accelerated (Est. < 0), indicating that they might be beneficial microbes that help prevent T1D onset (Fig. 7).
Discussion
In this paper, we introduced a unified cloud computing platform, MiMedSurv, for comprehensive microbiome causal mediation analysis with survival responses. MiMedSurv is distinguished by analytic procedures designed specifically for microbiome data with survival responses. That is, MiMedSurv conducts baseline exploratory non-mediational survival analysis, not involving the microbiome, as an initial check of the disparity in survival time between medical treatments/environmental or behavioral exposures (e.g., treatment vs. placebo, new treatment vs. old treatment, rural vs. urban, smoking vs. non-smoking, calorie restriction vs. ad libitum diet, and so forth) based on the Kaplan–Meier analysis31 or the Cox proportional hazards model34. Then, MiMedSurv identifies the mediating roles of the microbiome in various aspects: (i) as a microbial ecosystem using alpha diversity indices based on the Imai method35,36 and using beta diversity indices based on DACT41 and (ii) as individual microbial taxa in different hierarchies (e.g., phyla, classes, orders, families, genera, species) based on the Imai method35,36. In addition, covariate-adjusted analysis is supported for both non-mediational and mediational analyses to strengthen the causal interpretation of the results, which is crucial especially for observational studies. Moreover, MiMedSurv offers graphical user interfaces and automatically organizes all related references for user convenience. Overall, it is user-friendly; as such, even non-professional programmers (e.g., microbiologists, medical doctors, public health practitioners) can easily perform microbiome causal mediation analysis with survival responses.
We illustrated the use of MiMedSurv step-by-step through an example mediation analysis to see whether the gut microbiome mediates the effect of antibiotic treatment on T1D onset11. We also uploaded the example data in the Data Input module of MiMedSurv so that users can easily confirm compatible data formats. Users can also find all related instructions, references, prerequisites, and troubleshooting tips on our GitHub page (https://github.com/yj7599/mimedsurvgit).
The mediation analysis has recently been spotlighted as a practical and powerful analytic tool to survey the causal roles of the microbiome as a mediator to explain the observed relationships between a medical treatment/environmental exposure and a human disease. In clinical studies, researchers often trace disease progression sequentially in time; hence, survival data are also one of the most common types of the data. Therefore, MiMedSurv can be widely used as a practical and powerful analytic tool for many researchers in various disciplines (e.g., microbiology, medical science, public health).
However, MiMedSurv has the following limitations. First, its underlying statistical methods, such as the Kaplan–Meier analysis31, Cox proportional hazards model34, Imai method35,36, MiRKAT38,39, MiRKAT-S39,40 and DACT41, are all model-based methods; as such, any model misspecification can lead to spurious outcomes. This means that all the treatment, mediator, response, and covariate variables need to be correctly specified by users. Second, MiMedSurv performs downstream data analysis for many alpha and beta diversity indices and data normalization methods; as such, users can encounter inconsistent outcomes and interpretations across different diversity indices or data normalization methods. This issue is difficult to resolve because there is no consensus in the microbiome research community on which diversity index or normalization method is best. Finally, MiMedSurv does not perform gene- or strain-level analyses. We could not incorporate such analytic approaches because of their different data structures and specialized requirements (e.g., software architecture, analytic protocols). We could not satisfy all such demands, which is why further development is needed.
Data availability
The 16S ribosomal RNA gene amplicon sequence data are publicly available in the European Nucleotide Archive (ENA) (https://www.ebi.ac.uk/ena/browser/view/PRJEB14696, accession number: PRJEB14696). The final processed data are available as example data in the Data Input module of MiMedSurv (http://mimedsurv.micloud.kr).
References
Woese, C. R. & Fox, G. E. Phylogenetic structure of the prokaryotic domain: The primary kingdoms. Proc. Natl. Acad. Sci. 74(11), 5088–5090. https://doi.org/10.1073/pnas.74.11.5088 (1977).
Woese, C. R., Kandler, O. & Wheelis, M. L. Towards a natural system of organisms: Proposal for the domains Archaea, Bacteria, and Eucarya. Proc. Natl. Acad. Sci. 87(12), 4576–4579. https://doi.org/10.1073/pnas.87.12.4576 (1990).
Thomas, T., Gilbert, J. & Meyer, F. Metagenomics: A guide from sampling to data analysis. Microb. Inform. Exp. 2(1), 3. https://doi.org/10.1186/2042-5783-2-3 (2012).
Yanai, H. et al. Short-term periodic restricted feeding elicits metabolome-microbiome signatures with sex dimorphic persistence in primate intervention. Nat. Commun. 15, 1088. https://doi.org/10.1038/s41467-024-45359-z (2024).
Oduaran, O. H. et al. Gut microbiome profiling of a rural and urban South African cohort reveals biomarkers of a population in lifestyle transition. BMC Microbiol. 20(1), 330. https://doi.org/10.1186/s12866-020-02017-w (2020).
Park, B. et al. The mediating roles of the oral microbiome in saliva and subgingival sites between E-cigarette smoking and gingival inflammation. BMC Microbiol. 23(1), 35. https://doi.org/10.1186/s12866-023-02779-z (2023).
Fettweis, J. M. et al. The vaginal microbiome and preterm birth. Nat. Med. 25(6), 1012–1021. https://doi.org/10.1038/s41591-019-0450-2 (2019).
Reyman, M. et al. Impact of delivery mode-associated gut microbiota dynamics on health in the first year of life. Nat. Commun. 10, 4997. https://doi.org/10.1038/s41467-019-13014-7 (2019).
Zhang, C. et al. The effects of delivery mode on the gut microbiota and health: State of art. Front. Microbiol. 12, 724449. https://doi.org/10.3389/fmicb.2021.724449 (2021).
Hemarajata, P. & Versalovic, J. Effects of probiotics on gut microbiota: Mechanisms of intestinal immunomodulation and neuromodulation. Therap. Adv. Gastroenterol. 6(1), 39–51. https://doi.org/10.1177/1756283X12459294 (2013).
Zhang, X. S. et al. Antibiotic-induced acceleration of type 1 diabetes alters maturation of innate intestinal immunity. Elife 7, e37816. https://doi.org/10.7554/eLife.37816 (2018).
Turnbaugh, P. J. et al. An obesity-associated gut microbiome with increased capacity for energy harvest. Nature 444(7122), 1027–1031. https://doi.org/10.1038/nature05414 (2006).
Ridaura, V. K. et al. Gut microbiota from twins discordant for obesity modulate metabolism in mice. Science 341(6150), 1241214. https://doi.org/10.1126/science.1241214 (2013).
De Palma, G. et al. Transplantation of fecal microbiota from patients with irritable bowel syndrome alters gut function and behavior in recipient mice. Sci. Transl. Med. 9(379), eaaf6397. https://doi.org/10.1126/scitranslmed.aaf6397 (2017).
Johnsen, P. H. et al. Faecal microbiota transplantation versus placebo for moderate-to-severe irritable bowel syndrome: A double-blind, randomised, placebo-controlled, parallel-group, single-centre trial. Lancet Gastroenterol. Hepatol. 3(1), 17–24. https://doi.org/10.1016/S2468-1253(17)30338-2 (2018).
Gopalakrishnan, V. et al. Gut microbiome modulates response to anti–PD-1 immunotherapy in melanoma patients. Science 359(6371), 97–103. https://doi.org/10.1126/science.aan4236 (2018).
Matson, V. et al. The commensal microbiome is associated with anti–PD-1 efficacy in metastatic melanoma patients. Science 359(6371), 104–108. https://doi.org/10.1126/science.aao3290 (2018).
Livanos, A. E. et al. Antibiotic-mediated gut microbiome perturbation accelerates development of type 1 diabetes in mice. Nat. Microbiol. 1(11), 1–13. https://doi.org/10.1038/nmicrobiol.2016.140 (2016).
Sampson, T. R. et al. Gut microbiota regulate motor deficits and neuroinflammation in a model of Parkinson’s disease. Cell 167(6), 1469–1480. https://doi.org/10.1016/j.cell.2016.11.018 (2016).
Jang, H., Park, S. & Koh, H. Comprehensive microbiome causal mediation analysis using MiMed on user-friendly web interfaces. Biol. Methods Protoc. 8(1), bpad023. https://doi.org/10.1093/biomethods/bpad023 (2023).
Gu, W. et al. MiCloud: A unified web platform for comprehensive microbiome data analysis. PLoS One 17(8), e0272354. https://doi.org/10.1371/journal.pone.0272354 (2022).
Jang, H., Koh, H., Gu, W. & Kang, B. Integrative web cloud computing and analytics using MiPair for design-based comparative analysis with paired microbiome data. Sci. Rep. 12, 20465. https://doi.org/10.1038/s41598-022-25093-6 (2022).
Gu, W., Koh, H., Jang, H., Lee, B. & Kang, B. MiSurv: An integrative web cloud platform for user-friendly microbiome data analysis with survival responses. Microbiol. Spectr. 11(3), e05059-22. https://doi.org/10.1128/spectrum.05059-22 (2023).
Kim, J. & Koh, H. MiTree: A unified web cloud analytic platform for user-friendly and interpretable microbiome data mining using tree-based methods. Microorganisms 11(11), 2816. https://doi.org/10.3390/microorganisms11112816 (2023).
Kim, J., Jang, H. & Koh, H. MiMultiCat: A unified cloud platform for the analysis of microbiome data with multi-categorical responses. Bioengineering 11(1), 60. https://doi.org/10.3390/bioengineering11010060 (2024).
Sohn, M. B. & Li, H. Compositional mediation analysis for microbiome studies. Ann. Appl. Stat. 13(1), 661–681. https://doi.org/10.1214/18-AOAS1210 (2019).
Sohn, M. B., Lu, J. & Li, H. A compositional mediation model for a binary outcome: Application to microbiome studies. Bioinformatics 38(1), 16–21. https://doi.org/10.1093/bioinformatics/btab605 (2022).
Wang, C., Hu, J., Blaser, M. J. & Li, H. Estimating and testing the microbial causal mediation effect with high-dimensional and compositional microbiome data. Bioinformatics 36(2), 347–355. https://doi.org/10.1093/bioinformatics/btz565 (2020).
Zhang, H. et al. Mediation effect selection in high-dimensional and compositional microbiome data. Stat. Med. 40(4), 885–896. https://doi.org/10.1002/sim.8808 (2021).
Wang, C. et al. A microbial causal mediation analytic tool for health disparity and applications in body mass index. Microbiome https://doi.org/10.1186/s40168-023-01608-9 (2023).
Kaplan, E. L. & Meier, P. Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 53(282), 457–481. https://doi.org/10.2307/2281868 (1958).
Mantel, N. & Haenszel, W. Statistical aspects of the analysis of data from retrospective studies of disease. J. Natl. Cancer. Inst. 22(4), 719–748. https://doi.org/10.1093/jnci/22.4.719 (1959).
Peto, R. & Peto, J. Asymptotically efficient rank invariant test procedures. J. R. Stat. Soc. Ser. A. 135(2), 185–207. https://doi.org/10.2307/2344317 (1972).
Cox, D. R. Regression models and life-tables. J. R. Stat. Soc. Ser. B. 34(2), 187–220. https://doi.org/10.1111/j.2517-6161.1972.tb00899.x (1972).
Imai, K., Keele, L. & Tingley, D. A general approach to causal mediation analysis. Psychol. Methods 15(4), 309–334. https://doi.org/10.1037/a0020761 (2010).
Imai, K., Keele, L. & Tingley, D. Identification, inference and sensitivity analysis for causal mediation effects. Stat. Sci. 25(1), 51–71. https://doi.org/10.1214/10-STS321 (2010).
Carroll, K. J. On the use and utility of the Weibull model in the analysis of survival data. Control. Clin. Trials. 24(6), 682–701. https://doi.org/10.1016/s0197-2456(03)00072-2 (2003).
Zhao, N. et al. Testing in microbiome-profiling studies with MiRKAT, the microbiome regression-based kernel association test. Am. J. Hum. Genet. 96(5), 797–807. https://doi.org/10.1016/j.ajhg.2015.04.003 (2015).
Wilson, N. et al. MiRKAT: Kernel machine regression-based global association tests for the microbiome. Bioinformatics 37(11), 1595–1597. https://doi.org/10.1093/bioinformatics/btaa951 (2021).
Plantinga, A. et al. MiRKAT-S: A community-level test of association between the microbiota and survival times. Microbiome 5, 17. https://doi.org/10.1186/s40168-017-0239-9 (2017).
Liu, Z. et al. Large-scale hypothesis testing for causal mediation effects with applications in genome-wide epigenetic studies. J. Am. Stat. Assoc. 117(437), 67–81. https://doi.org/10.1080/01621459.2021.1914634 (2022).
Torgerson, W. S. Multidimensional scaling: I. Theory and method. Psychometrika 17, 401–419. https://doi.org/10.1007/BF02288916 (1952).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B. 57, 289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x (1995).
Rubin, D. B. Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educ. Psychol. 66, 688–701 (1974).
Efron, B. Bootstrap methods: Another look at the Jackknife. Ann. Stat. 7(1), 1–26 (1979).
Li, H. Microbiome, metagenomics, and high-dimensional compositional data analysis. Ann. Rev. Stat. Appl. 2, 73–94. https://doi.org/10.1146/annurev-statistics-010814-020351 (2015).
Bolyen, E. et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat. Biotechnol. 37(8), 852–857. https://doi.org/10.1038/s41587-019-0209-9 (2019).
McMurdie, P. J. & Holmes, S. Phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data. PLoS One 8(4), e61217. https://doi.org/10.1371/journal.pone.0061217 (2013).
Shannon, C. E. A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x (1948).
Simpson, E. H. Measurement of diversity. Nature 163, 688. https://doi.org/10.1038/163688a0 (1949).
Fisher, R. A., Corbet, A. S. & Williams, C. B. The relation between the number of species and the number of individuals in a random sample of an animal population. J. Anim. Ecol. 12(1), 42–58. https://doi.org/10.2307/1411 (1943).
Chao, A. Nonparametric estimation of the number of classes in a population. Scand. J. Stat. 11(4), 265–270 (1984).
Chao, A. & Lee, S. M. Estimating the number of classes via sample coverage. J. Am. Stat. Assoc. 87(417), 210–217. https://doi.org/10.1080/01621459.1992.10475194 (1992).
Lee, S. M. & Chao, A. Estimating population size via sample coverage for closed capture-recapture models. Biometrics 50, 88–97. https://doi.org/10.2307/2533199 (1994).
Faith, D. P. Conservation evaluation and phylogenetic diversity. Biol. Conserv. 61(1), 1–10. https://doi.org/10.1016/0006-3207(92)91201-3 (1992).
Jaccard, P. The distribution of the flora in the alpine zone. New Phytol. 11(2), 37–50. https://doi.org/10.1111/j.1469-8137.1912.tb05611.x (1912).
Bray, J. R. & Curtis, J. T. An ordination of the upland forest communities of Southern Wisconsin. Ecol. Monogr. 27(4), 326–349. https://doi.org/10.2307/1942268 (1957).
Lozupone, C. & Knight, R. UniFrac: A new phylogenetic method for comparing microbial communities. Appl. Environ. Microbiol. 71(12), 8228–8235. https://doi.org/10.1128/AEM.71.12.8228-8235.2005 (2005).
Chen, J. et al. Associating microbiome composition with environmental covariates using generalized UniFrac distances. Bioinformatics 28(16), 2106–2113. https://doi.org/10.1093/bioinformatics/bts342 (2012).
Lozupone, C. A., Hamady, M., Kelley, S. T. & Knight, R. Quantitative and qualitative β diversity measures lead to different insights into factors that structure microbial communities. Appl. Environ. Microbiol. 73(5), 1576–1585. https://doi.org/10.1128/AEM.01996-06 (2007).
Aitchison, J. The statistical analysis of compositional data. J. R. Stat. Soc. Ser. B. 44(2), 139–160. https://doi.org/10.1111/j.2517-6161.1982.tb01195.x (1982).
Sanders, H. L. Marine benthic diversity: A comparison study. Am. Nat. 102, 243–282 (1968).
Acknowledgements
The authors are grateful to the anonymous reviewers for their careful observations and insightful comments.
Funding
This research was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (2021R1C1C1013861).
Author information
Contributions
H.J. wrote the programs, developed the web server and GitHub repository, performed data analysis, and contributed to visualization and design. H.K. conceptualized and initiated the study, curated the data, contributed to the methodological aspects and overall architecture, and wrote the manuscript. All authors have read and agreed to the submission of the manuscript. H.J. and H.K. contributed equally to this work as first authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Jang, H., Koh, H. A unified web cloud computing platform MiMedSurv for microbiome causal mediation analysis with survival responses. Sci Rep 14, 20650 (2024). https://doi.org/10.1038/s41598-024-71852-y
DOI: https://doi.org/10.1038/s41598-024-71852-y