Series Editors’ Note

When we do a clinical trial in which we randomize for one variable, say adding pretransplant anti-thymocyte globulin (ATG), and we see a benefit, say less graft-versus-host disease (GvHD), most people assume receiving ATG caused the benefit. This leap from association to causation is common but wrong. The reasons why are described in the accompanying typescript. What we observe is an association or correlation between ATG and less GvHD, not necessarily cause and effect. This incorrect reasoning is referred to as the association-causation fallacy. A good example is the correlation between US per capita cheese consumption and deaths from becoming tangled in bedsheets, with a Pearson correlation coefficient of 0.95 (see below). This and other problems of human cognition are discussed in Thinking, Fast and Slow by Daniel Kahneman.

How can we reconcile this discordance between the goal of the clinical trialist, who wants to know why GvHD is decreased, and the rigor of the statistician? In the following typescript Zheng and colleagues describe the difference between causality and association. They describe statistical methods by which we can plausibly infer causality from the results of a randomized clinical trial. We hope this typescript and others will prompt a dialogue between readers and statisticians interested in analyses of data from clinical trials of haematopoietic cell transplants. We welcome comments at #BMTStats.

Introduction

Correct interpretation of statistical data requires caution in implying causality [1,2,3], a caution that contrasts with the purpose of most clinical trials, whose major objective is the opposite: to assign causality. A typical example showing that association does not imply causation is given in Fig. 1. How can we reconcile these opposing considerations? In this brief review we provide basic definitions of causal inference and discuss why a treatment effect, a common clinical trial endpoint after an intervention, should not automatically be interpreted as implying causation. We provide a concise guide on how to conduct statistical analyses whose results may reasonably be given a causal interpretation. We also introduce classical causal methods for randomized trials, discuss methods that use covariate information to improve efficiency, and discuss methods to deal with non-compliance. Lastly, we introduce recent advanced research in causal inference for survival effects, including methods for time-varying treatment experiments and for high-dimensional covariate information.

Fig. 1
Per capita consumption of cheese in the US plotted against the number of people who died by becoming tangled in their bedsheets.

Causal inference

We begin with some basic concepts in causal inference, illustrating why it is wrong to draw conclusions regarding causality from the treatment effect in a simple group comparison. Under the stable unit treatment value assumption (SUTVA), causal effects (or causal estimands) are defined by comparing certain functionals of the distributions of potential outcomes under two different actions (treatment or control) applied to the same object or group of objects (for example, subjects in a clinical trial) [4, 5]. In the survival setting we are interested in quantities such as the potential survival function, cumulative hazard function (for example, cumulative incidence of relapse), restricted mean survival time (RMST) [6] and/or residual life-time [6,7,8,9,10,11]. Because the treatment effect from a simple group comparison contrasts observed outcomes in two different groups of subjects rather than potential outcomes for the same subjects, it does not by itself convey causal information. Obviously, one can never observe both potential outcomes, since only one action, such as treatment or placebo, can be taken for each subject. This exclusivity is called the fundamental problem of causal inference [1, 12]. Identifying causal effects in the face of these unavoidably missing data requires assumptions regarding the mechanism by which treatment and control are assigned. Under these assumptions, methods beyond the simple group comparison are required to accurately estimate causal effects. We expand on this point below.
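
In symbols, a minimal sketch of this framework (notation ours, following standard potential outcome conventions rather than formulas from the typescript) is:

```latex
% Potential outcome framework for a binary treatment A \in \{0,1\}.
% Each subject has two potential outcomes; only one is ever observed.
\text{ACE} = E[Y(1)] - E[Y(0)]
% In the survival setting, with potential event times T(1) and T(0),
% the potential survival function and RMST up to horizon \tau are
S_a(t) = P\{T(a) > t\}, \qquad
\text{RMST}_a(\tau) = \int_0^{\tau} S_a(t)\,dt
% so one causal contrast of survival is
\Delta(\tau) = \text{RMST}_1(\tau) - \text{RMST}_0(\tau)
```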

Workflow for causal inference analyses

To obtain a causal interpretation, we need to define the causal estimand through the potential outcome framework (introduced in section 'Causal inference') and find an estimator (a functional of the observed data) that identifies this causal estimand. The choice of estimator depends on the type of data we have: (1) whether covariate information is available and whether the covariates are balanced; (2) whether there is a non-compliance issue; and (3) whether treatment is given at a single time-point or is time-varying. If treatment is given at a single time-point and the data come from a perfect randomized trial with no non-compliance, the methods in section 'Randomization methods' can be used to make causal conclusions. However, if covariate information is also available and some covariates are unbalanced, we can consider the methods in section 'Improving efficiency with covariate balance and adjustment'. Next, we need to consider whether there is non-compliance in the trial. If so, it may be necessary to use the methods we discuss in section 'Non-compliance'. When there is time-varying treatment or there are high-dimensional covariates, the methods we discuss in section 'Advanced topics' should be used.
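
Purely as a mnemonic, the routing logic of this workflow can be sketched in code; the function and flag names below are hypothetical, and the sketch is no substitute for statistical judgment:

```python
def choose_analysis(covariates_available: bool,
                    covariates_balanced: bool,
                    non_compliance: bool,
                    time_varying_treatment: bool) -> str:
    """Schematic decision flow mirroring the workflow described above."""
    if time_varying_treatment:
        return "marginal structural model or nested structural mean model"
    if non_compliance:
        return ("principal stratification, instrumental variables, "
                "or sequential ignorability methods")
    if covariates_available and not covariates_balanced:
        return "covariate adjustment or propensity score methods"
    return "classic randomization methods (Fisher exact P-value or Neyman)"
```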

Randomization methods

Therapy assignment conforming to the individualistic, probabilistic and unconfounded assumptions is defined as a classic randomized experiment [12]. If one further assumes a constant effect, we can use the Fisher exact P-value method under random censoring [13]. However, if we are only interested in the average causal effect (ACE), we do not need the strong constant-effect assumption implied by the Fisher exact P-value and can instead use the Neyman approach, subtracting the average outcome of the untreated group from that of the treated group [12, 14]. Inverse probability of censoring weighting (IPCW) can be used to deal with censoring [15]. Note that in a classic randomized trial the Neyman approach provides a consistent estimator of the ACE and coincides with the simple group comparison. However, this approach ignores potentially important covariate data. We discuss how to improve on this next.
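
As a minimal numerical sketch of the Neyman approach (simulated, uncensored data; all variable names ours), the ACE estimate and its conservative variance can be computed as:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated outcomes from a classic randomized trial. No censoring here;
# with censoring, each term would be reweighted by the inverse probability
# of remaining uncensored (IPCW), as noted in the text.
y_treated = rng.normal(loc=1.2, scale=1.0, size=100)   # treated arm
y_control = rng.normal(loc=1.0, scale=1.0, size=100)   # control arm

# Neyman (difference-in-means) estimator of the average causal effect.
ace_hat = y_treated.mean() - y_control.mean()

# Conservative variance estimator: s1^2 / n1 + s0^2 / n0.
var_hat = (y_treated.var(ddof=1) / len(y_treated)
           + y_control.var(ddof=1) / len(y_control))
se_hat = np.sqrt(var_hat)

print(f"ACE estimate: {ace_hat:.3f} (SE {se_hat:.3f})")
print(f"95% CI: [{ace_hat - 1.96 * se_hat:.3f}, {ace_hat + 1.96 * se_hat:.3f}]")
```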

Improving efficiency with covariate balance and adjustment

Analysis of a randomized clinical trial with covariate information (for example, age, sex etc.) can be improved by regression adjustment and model-based imputation methods [12]. A key point in using these methods is that, when there are treatment-covariate interactions, the interaction terms must be included to obtain an unbiased estimator of the super-population ACE. If the model form is non-collapsible (for example, a Cox model), proper integration over the covariate distribution is needed to compute the correct ACE [8, 16]. When the sample size is small and the number of covariates large (often the case in haematopoietic cell transplant trials), the propensity score (the probability that a unit is assigned treatment given all covariates) can be used to reduce finite-sample bias and increase efficiency [17]. Several propensity score-based methods are available, including propensity score matching and sub-classification [18,19,20,21,22], propensity score adjustment [23], trimming based on the propensity score [24] and variants combining several techniques [15]. In practice, the propensity score is unknown and is commonly fitted with a logistic regression model. To deal with issues arising from potential model misspecification, the multiply robust estimator [15] can be used to analyze the data, at the price of a potential loss of efficiency.
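
A hedged sketch of one propensity score workflow, fitting the score by logistic regression and then using inverse probability weighting, might look as follows (simulated data; the weighting estimator shown is one standard choice, not necessarily the variant in the cited work):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 500

# Simulated covariates (think age, a lab value) and a treatment whose
# assignment probability depends on them, so the arms are unbalanced.
X = rng.normal(size=(n, 2))
p_true = 1 / (1 + np.exp(-(0.5 * X[:, 0] - 0.3 * X[:, 1])))
A = rng.binomial(1, p_true)
Y = 1.0 * A + X[:, 0] + rng.normal(size=n)   # true ACE = 1.0

# Fit the propensity score with logistic regression, as described above.
ps = LogisticRegression().fit(X, A).predict_proba(X)[:, 1]

# Hajek-type inverse probability weighted estimator of the ACE.
w1, w0 = A / ps, (1 - A) / (1 - ps)
ace_ipw = (w1 * Y).sum() / w1.sum() - (w0 * Y).sum() / w0.sum()
print(f"IPW estimate of the ACE: {ace_ipw:.3f}")
```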

Non-compliance

Non-compliance is common in clinical trials and makes causal inference even more difficult. A common practice is to analyze the treatment effect by intent-to-treat, ignoring compliance, so that the randomization assumptions still hold. In this case the methods in sections 'Workflow for causal inference analyses' and 'Randomization methods' remain valid for estimating the ACE of the intent-to-treat effect. This estimand is obviously different from the ACE of the treatment actually received, because it includes subjects who did not receive the assigned treatment but are analyzed as if they had. A less valid approach is to analyze only data from subjects who were both assigned to and received the therapy. However, this approach violates the randomization assumptions. Consequently, the methods we describe in sections 'Workflow for causal inference analyses' and 'Randomization methods' cannot lead to an unbiased estimator of the ACE.

A proposed solution to this problem is the principal stratification method [25]. Subjects are classified into four latent groups based on their potential compliance state under the different treatment assignments: (1) complier; (2) always-taker; (3) never-taker; and (4) defier. A subject's group membership can only be partially identified from their observed compliance state (for example, a subject who complied with the treatment is either a complier or an always-taker). Treatment efficacy is usually taken to be the ACE in the complier group, identifiable under different combinations of assumptions such as the exclusion restriction, monotonicity and/or parametric model assumptions [26,27,28]. Alternatively, we can obtain bounds for the causal effect under weaker assumptions [29,30,31]. Another way to handle non-compliance is to consider the compliance state a mediator and use an instrumental variable approach to handle potential unmeasured confounders between compliance and outcome [32,33,34,35,36]. Yet another way is to assume sequential ignorability. Under this assumption the treatment actually received (whether the subject received the therapy or not) can be analyzed as conditionally randomized, and therefore propensity score methods or methods specifically designed for non-compliance [37,38,39,40] can be used to estimate causal effects, followed by sensitivity analyses [41,42,43,44,45] to evaluate the robustness of the estimator under violations of the assumption.
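
To illustrate the instrumental variable idea, here is a sketch of the classical Wald estimator of the complier average causal effect under the exclusion restriction and monotonicity (simulated data with compliers and never-takers only; names ours):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

Z = rng.binomial(1, 0.5, size=n)      # randomized assignment (the instrument)
compliance = rng.random(n) < 0.7      # 70% compliers; the rest never-takers
A = Z * compliance                    # treatment actually received
Y = 2.0 * A + rng.normal(size=n)      # true effect of received treatment = 2.0

# Intent-to-treat effects of assignment on the outcome and on receipt.
itt_y = Y[Z == 1].mean() - Y[Z == 0].mean()
itt_a = A[Z == 1].mean() - A[Z == 0].mean()

# Wald estimator: complier average causal effect = ITT on Y / ITT on A.
cace = itt_y / itt_a
print(f"ITT: {itt_y:.3f}, compliance difference: {itt_a:.3f}, CACE: {cace:.3f}")
```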

Advanced topics

There is considerable recent research on estimating causal effects in survival data analyses. One important direction is how to deal with time-varying treatment studies. These are studies where the treatment is not a one-time binary choice (for example, a transplant versus chemotherapy) [46] but is assigned over time and possibly adjusted based on prior outcomes (for example, giving azacitidine to subjects with a positive posttransplant measurable residual disease (MRD) test). Cox models using time-dependent covariates face the problem of time-dependent confounding (for example, by prior therapy) and therefore cannot support causal conclusions. The marginal structural model [47,48,49] and the nested structural mean model [50, 51] should be considered for these types of data. Another important direction is how to take advantage of the rapidly increasing availability of covariate data from clinical trials to increase the efficiency of ACE estimation. High-dimensional methods using novel machine learning techniques have been developed for this purpose [52,53,54].
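
As a schematic sketch of how stabilized weights for a marginal structural model might be computed over two treatment periods (the setup, names and model choices are ours and deliberately minimal):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 2000

# Period 1: baseline confounder L1 influences treatment A1.
L1 = rng.normal(size=n)
A1 = rng.binomial(1, 1 / (1 + np.exp(-L1)))
# Period 2: time-varying confounder L2 is affected by A1 (think posttransplant
# MRD status) and influences the second treatment decision A2.
L2 = 0.5 * A1 + rng.normal(size=n)
A2 = rng.binomial(1, 1 / (1 + np.exp(-(L2 + 0.5 * A1))))

def prob_of_received(features, A):
    """P(A_t = observed value | features), via logistic regression."""
    p1 = LogisticRegression().fit(features, A).predict_proba(features)[:, 1]
    return np.where(A == 1, p1, 1 - p1)

# Stabilized weight: product over periods of
#   P(A_t | past treatment) / P(A_t | past treatment, confounder history).
num = (prob_of_received(np.ones((n, 1)), A1)
       * prob_of_received(A1.reshape(-1, 1), A2))
den = (prob_of_received(L1.reshape(-1, 1), A1)
       * prob_of_received(np.column_stack([A1, L1, L2]), A2))
sw = num / den

# These weights would then be used in a weighted outcome model (for example,
# a weighted Cox or pooled logistic regression) to estimate the causal
# effect of the treatment sequence.
print(f"stabilized weights: mean {sw.mean():.3f}, "
      f"min {sw.min():.3f}, max {sw.max():.3f}")
```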

Conclusion

In this brief review, we introduced the potential outcome framework and described the complex challenge of inferring causality in survival analyses of randomized clinical trials. We discussed the limitations of estimating causality under these conditions and suggested statistical techniques that can help estimate causal effects with greater accuracy.