Introduction

Genetic research has advanced enormously over the last two decades, and a wealth of genetic data is now available for a wide variety of human phenotypes [1]. Besides providing ever-increasing insight into the genetic etiology of these phenotypes, this data may also provide an opportunity to study causal relations between them.

Although causal inference is generally considered the domain of experimental methods like randomized controlled trials (RCT), some nonexperimental methods can be applied to estimate causal relations indirectly [2]. Though less robust, these can be used when RCTs are not a viable option. Mendelian Randomization (MR), a form of instrumental variable analysis that uses genetic variants as instruments to investigate causal relations between phenotypes, is one such method [3]. MR has become very popular in recent years, with thousands of methodological and applied MR studies published to date [4, 5], and with the continued growth of available genetic data this trend will likely persist.

MR relies on strong assumptions however, yielding biased and misleading results if those assumptions fail [6, 7]. Given the widespread popularity of MR, it is therefore imperative that these assumptions are clearly understood by the researchers using it, to allow them to properly evaluate the validity of these assumptions in the context of their own data and analyses [8,9,10].

The aim of this Perspective is to outline the assumptions that are needed to perform MR, what role those assumptions play in the analysis and its interpretation, and what information different elements of input data contribute to the support of these assumptions. Our aim is not to give an exhaustive overview of individual methods, but rather to elucidate the underlying logic of MR in its different forms. As such, we will also abstract away from issues pertaining to estimation, assuming an idealized scenario in which all associations between observed variables are fully known, examining what challenges remain even when estimation uncertainty is entirely eliminated.

Core principle

The aim of an MR analysis is to estimate and test the causal effect of a putative causal phenotype X, the exposure, on another phenotype Y, the outcome. It uses the principles of instrumental variable analysis to do so, with the genotype Gj of a genetic variant j serving as the instrument [8, 11].

To serve as a valid instrument for the causal effect of exposure on outcome, there must be an association between Gj and the exposure. Moreover, it must be the case that any association of Gj with the outcome is mediated by the exposure, as depicted in Fig. 1A. In other words, associations of Gj directly with the outcome, or with a variable C that acts as a confounder of exposure and outcome cannot be present (Fig. 1B). There is no requirement that Gj itself has a causal effect (see also Supplementary Information—Relevance assumption); if variant j is in LD with causal variants that are valid instruments, then Gj is a valid instrumental variable as well (Fig. 1C). For ease of notation however, the graphs used throughout the paper will assume the selected variants used are causal.

Fig. 1: Graphical representation of valid instrument causal scenarios, for a variant j.

These causal graphs depict the genetic instrumental variable assumptions on which MR is based, with a genetic variant with genotype \(G_j\) causally affecting the exposure \(X\) which in turn (potentially) causally affects \(Y\), while allowing for the presence of confounders \(C\) of the exposure and outcome. In this and subsequent figures, variables are shown as rectangles or ovals, with ovals denoting that the variable is not (necessarily) observed, and causal effects are indicated using one-sided arrows in the direction of the causal effect, with an accompanying effect size parameter shown next to it. Two-sided arrows denote correlations between variables caused by other variables external to the model. For simplicity of notation throughout the paper, all variables are assumed to be standardized, with mean of zero and unit variance. Shown in (A) is the basic valid instrument scenario, with in (B) the same graph emphasizing the causal paths explicitly ruled out by the independence and exclusion restriction assumptions. The graph in (C) shows an alternative valid instrument scenario where the variant \(j\) used is not causal, but is in LD with another variant \(k\) that is.

If we assume the effect sizes of all associations and causal effects to be constant (i.e., simple linear relations), we can easily see how this can provide the parameter βXY of the causal effect of the exposure on the outcome. Denoting the marginal associations of Gj with exposure and outcome as γXj and γYj respectively, for the assumed scenario in Fig. 1A we can express these as \(\gamma _{Xj} \,=\, \alpha _{Xj}\) and \(\gamma _{Yj} \,=\, \alpha _{Xj}\beta _{XY}\). Because the association γYj between Gj and the outcome is fully mediated by the exposure, it equals the causal effect βXY scaled by the causal effect \(\alpha _{Xj}\) of Gj on the exposure.

Thus, defining the ratio of marginal effects \(\beta _j \,=\, \frac{{\gamma _{Yj}}}{{\gamma _{Xj}}}\), it follows that if variant j is a valid instrument then \(\beta _j \,=\, \frac{{\alpha _{Xj}\beta _{XY}}}{{\alpha _{Xj}}} \,=\, \beta _{XY}\) [11]. In other words, the variant-specific causal effect αXj cancels out in the ratio of the marginal genetic effects, making βj equal to the causal effect parameter βXY for every variant that is a valid instrument. Although not every MR method is explicitly defined in terms of βj, they all ultimately depend on this property. To examine the impact of different causal scenarios, we will thus focus on the functional form βj takes in those scenarios, and whether it still equals βXY.
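
The cancellation described above can be checked numerically. The following is a minimal sketch, assuming hypothetical effect sizes: the values of `beta_xy` and `alpha_x` are invented for illustration, and `ratio_estimate` is a hypothetical helper rather than a function from any MR package. It builds the marginal associations implied by Fig. 1A and confirms that every valid instrument yields the same ratio.

```python
# Hypothetical values, chosen only for illustration (not taken from the paper).
beta_xy = 0.3                   # causal effect of exposure X on outcome Y
alpha_x = [0.05, 0.10, 0.20]    # variant-specific effects alpha_Xj on X


def ratio_estimate(gamma_y, gamma_x):
    """Ratio of marginal effects, beta_j = gamma_Yj / gamma_Xj."""
    return gamma_y / gamma_x


# Marginal associations implied by Fig. 1A (all variants are valid instruments).
gamma_x = list(alpha_x)                     # gamma_Xj = alpha_Xj
gamma_y = [a * beta_xy for a in alpha_x]    # gamma_Yj = alpha_Xj * beta_XY

beta_j = [ratio_estimate(gy, gx) for gy, gx in zip(gamma_y, gamma_x)]
print(beta_j)  # every entry equals beta_xy up to floating-point rounding
```

Although the three variants differ fourfold in their effect on the exposure, the ratio is identical for each of them, which is the property that all MR methods ultimately rely on.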

We can thus obtain βXY using any genetic variant for which the instrumental variable assumptions hold [12], since all such variants provide the same causal parameter. However, the a priori plausibility of these assumptions varies greatly, depending particularly on the exposure being studied, and establishing that the variants used are indeed valid instruments requires further analysis and data. As such it is crucial that active steps are taken to ensure that all assumptions are met, since reliable interpretation of MR results is otherwise impossible.

MR also generally depends on some additional assumptions [8, 13], which are listed in Table 1. Different methods may relax these additional assumptions in various ways, so they are not always all required. In the next two sections, we will examine causal scenarios that violate the instrumental variable assumptions, and various strategies to deal with such violations, either by direct modeling and testing or by leveraging constrained data. Following that, we discuss the role of the additional assumptions and what can happen if they do not hold. Throughout, we will use the simplest causal scenario that can illustrate the particular issue being discussed, rather than providing an exhaustive list of such scenarios. Additional discussion and mathematical details for these issues are found in the Supplementary Information. An overview of the main methods referenced is given in Table 2.

Table 1 Instrumental variable and other assumptions relevant for MR.
Table 2 Overview of referenced methods.

Evaluating instrumental variable assumptions

Heterogeneity of causal estimates

One common way in which the exclusion restriction can be violated is by a direct causal effect of the genetic variant on the outcome (Fig. 2A). The reason why this is a problem can be readily discerned when considering how this changes the functional form of the marginal association γYj of the variant with the outcome, which becomes \(\gamma _{Yj} \,=\, \alpha _{Xj}\beta _{XY} \,+\, \alpha _{Yj}\). This means that the ratio parameter βj now equals \(\beta _j \,=\, \frac{{\alpha _{Xj}\beta _{XY} \,+\, \alpha _{Yj}}}{{\alpha _{Xj}}} \,=\, \beta _{XY} \,+\, \frac{{\alpha _{Yj}}}{{\alpha _{Xj}}}\). The same thing happens in a scenario where there is LD between Gj and another variant Gk that has a causal effect on the outcome (Fig. 2B).

Fig. 2: Graphical representation of several violations of instrumental variable assumptions, for a variant j.

Shown in (A, B) are two similar violations of the exclusion restriction, with causal effects directly on the outcome either from variant \(j\) itself or from another variant \(k\) in LD with it. (C) shows a reverse causation scenario, another violation of the exclusion restriction, with a causal effect of variant \(j\) directly on the outcome, which is then mediated onto the exposure by the causal effect of outcome on exposure. (D, E) show two mediated confounding scenarios which violate the independence assumption, with the confounder \(C\) mediating the genetic effect of variant \(j\) onto both the exposure and outcome, with in (F) a further variation on (E) with additional direct causal effects of the variant on the outcome.

In other words, βj becomes offset from the value of the true causal effect βXY by a bias term specific to that variant. Although in this case we can no longer directly obtain the causal effect from βj, the way this type of violation manifests itself makes it relatively straightforward to detect. Because this bias term is variant-specific it will tend to differ across (independent) variants, resulting in a heterogeneity of their βj values (see also Supplementary Information—heterogeneity of estimated causal effects). By contrast, for a set of variants that are all valid instruments, their βj will be the same, because as noted above they will all equal the causal effect parameter βXY.

Given this, if we have multiple variants available as potential genetic instruments, an obvious and commonly used way to leverage this is therefore to test for heterogeneity of the βj. Then, if such heterogeneity is found to be present, we can prune away variants from the selection until we retain a subset of variants with homogeneous βj. In this way we can rule out violations of the exclusion restriction of the kind depicted in Fig. 2A, B, and under the assumption that the remaining variants are valid instruments we can use those variants to obtain βXY as before [14,15,16].

An alternative to explicit heterogeneity testing and pruning is to use “robust” models for multivariant MR analysis, which do not require that all variants used for their input are valid instruments (see also Supplementary Information—robust methods). These subdivide into two main types. The first type assumes that only a subset of the variants used are valid instruments, and take either a median- or mode-based approach. Median-based methods only require that more than half of the variants are valid instruments, which guarantees that the median of the βj equals βXY [17]. Mode-based methods make an even weaker assumption, only requiring that the largest subset of variants with homogeneous βj consists of valid instruments, in which case the mode of the βj will equal βXY [18,19,20].
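
The effect of variant-specific bias terms, and the robustness of the median to them, can be illustrated with a small numerical sketch. All effect sizes below are hypothetical and chosen only for illustration: three variants are valid instruments and two have a direct effect `alpha_y` on the outcome as in Fig. 2A, so that a majority of the variants is valid.

```python
import statistics

# Hypothetical values, chosen only for illustration.
beta_xy = 0.3
alpha_x = [0.1, 0.2, 0.1, 0.2, 0.1]       # effects on the exposure
alpha_y = [0.0, 0.0, 0.0, 0.04, -0.05]    # direct effects on the outcome (last two invalid)

# gamma_Xj = alpha_Xj and gamma_Yj = alpha_Xj * beta_XY + alpha_Yj (Fig. 2A).
gamma_x = list(alpha_x)
gamma_y = [ax * beta_xy + ay for ax, ay in zip(alpha_x, alpha_y)]

# beta_j = beta_XY + alpha_Yj / alpha_Xj: heterogeneous once invalid variants are included.
beta_j = [gy / gx for gy, gx in zip(gamma_y, gamma_x)]

median_beta = statistics.median(beta_j)   # robust when more than half of the variants are valid
mean_beta = statistics.mean(beta_j)       # pulled away from beta_XY by the bias terms

print(beta_j)
print(median_beta, mean_beta)
```

In this constructed example the median recovers `beta_xy` while the plain mean does not. The sketch works with known rather than estimated associations, so it says nothing about how these estimators behave under sampling error.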

The second type of robust model does not require that any variant is a valid instrument. Instead, it models the marginal association of each variant with the outcome as \(\gamma _{Yj} \,=\, \gamma _{Xj}\beta _{XY} \,+\, \delta _j\) with a heterogeneity term δj, and then makes an assumption about the distribution of these δj. The most prominent example of this second type is the MR-Egger model [21], which is based on the so-called InSIDE (Instrument Strength Independent of Direct Effect) assumption. This assumption states that these δj terms are independent of the marginal associations γXj of the variant with the exposure, and based on this the MR-Egger model can estimate \(\beta _{XY}\) using essentially a linear regression of \(\gamma _{Yj}\) on γXj. For valid instruments this assumption is automatically true, since \(\delta _j\) is zero, and for a scenario such as in Fig. 2A it is very plausible as well: in that case, \(\gamma _{Xj} \,=\, \alpha _{Xj}\) and \(\delta _j \,=\, \alpha _{Yj}\), and since \(\alpha _{Xj}\) and \(\alpha _{Yj}\) represent two distinct causal paths that share no mediating variables there is no clear mechanism by which they would become correlated.
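
The regression logic can be sketched as follows. This is a minimal illustration, not the published MR-Egger estimator: it uses an ordinary unweighted least-squares fit with an intercept, works with known rather than estimated associations, and all numerical values are hypothetical. The heterogeneity terms `delta` were constructed to be exactly uncorrelated with `gamma_x` across the four variants, so that InSIDE holds by design.

```python
def ols_with_intercept(x, y):
    """Simple least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical values, chosen only for illustration.
beta_xy = 0.3
gamma_x = [0.1, 0.2, 0.3, 0.4]
delta = [0.02, 0.04, 0.04, 0.02]   # nonzero mean, zero covariance with gamma_x

# gamma_Yj = gamma_Xj * beta_XY + delta_j
gamma_y = [gx * beta_xy + d for gx, d in zip(gamma_x, delta)]

ratios = [gy / gx for gy, gx in zip(gamma_y, gamma_x)]   # every ratio is biased here
intercept, slope = ols_with_intercept(gamma_x, gamma_y)

print(ratios)
print(intercept, slope)   # slope recovers beta_xy, intercept absorbs the mean of delta
```

Here no variant is a valid instrument and every individual ratio overstates the causal effect, yet the slope equals `beta_xy` because the average of the `delta` terms is absorbed by the intercept. If `delta` were correlated with `gamma_x`, the slope would be biased as well.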

Robust methods can thus in principle directly estimate the causal effect from a mixture of valid and invalid instruments, but this requires specific assumptions about the degree or structure of the heterogeneity, which are not directly testable. Even when using such robust methods, it is therefore still imperative that the heterogeneity, and the validity of the assumptions made about it (with specific valid subsets of variants present in the data for median- and mode-based methods, or the independence specified by InSIDE for MR-Egger), are explicitly considered.

Moreover, homogeneity of the \(\beta _j\) does not imply that the instrumental variable assumptions (or the InSIDE assumption) do hold, since there are other causal scenarios that violate the assumptions without resulting in heterogeneity. For the remainder of the paper, we will therefore generally assume that heterogeneity has been dealt with, and focus on scenarios where all variants used correspond to the same homogeneous causal graph, and with \(\beta _j\) equal to the same value \(\beta\).

Reverse causation

The “reverse causation” scenario is illustrated in Fig. 2C, the mirror image of Fig. 1A, with the genetic variant now exerting a direct causal effect on the outcome, which in turn has a causal effect on the exposure. This is also a violation of the exclusion restriction, but unlike in Fig. 2A, B this does not result in heterogeneity. This is because the marginal genetic associations of the variant are \(\gamma _{Xj} \,=\, \alpha _{Yj}\beta _{YX}\) and \(\gamma _{Yj} \,=\, \alpha _{Yj}\), which means that \(\beta \,=\, \frac{{\alpha _{Yj}}}{{\alpha _{Yj}\beta _{YX}}} \,=\, \frac{1}{{\beta _{YX}}}\), the inverse of the causal effect of the outcome on the exposure. As such, the value of \(\beta\) we would get in this scenario is completely different from the \(\beta _{XY}\) we are attempting to estimate, which in this case is simply zero. The InSIDE assumption also does not hold here, since the heterogeneity term \(\delta _j \,=\, \alpha _{Yj}\), meaning that both \(\delta _j\) and \(\gamma _{Xj}\) are dependent on the same parameter \(\alpha _{Yj}\).

When the genetic effect on the outcome is fully mediated by the exposure as in Fig. 1A, it follows that the correlations between the variant and the outcome are weaker than those between the variant and the exposure, unless the exposure fully determines the outcome, in which case the correlations are equal. In case of reverse causation, as in Fig. 2C, the opposite is true, with the correlations between variant and exposure being weaker than those between variant and outcome. For Fig. 1A, since in our notation all variables are standardized, the correlations of the variant with the exposure and outcome equal the genetic associations \(\gamma _{Xj}\) and \(\gamma _{Yj}\) respectively, and the standardization also means that the absolute value of all causal parameters is at most one as well, including \(\beta _{XY}\). Since as previously noted \(\gamma _{Yj} \,=\, \gamma _{Xj}\beta _{XY}\), the absolute value of \(\gamma _{Yj}\) must therefore be smaller than (or at most equal to) that of \(\gamma _{Xj}\).

It is therefore generally possible to infer direction from the relative size of these correlations, or more directly from the causal estimate itself. In case of reverse causation \(\beta _j \,=\, \frac{1}{{\beta _{YX}}}\), which (since \(|\beta _{YX}|\) is at most 1) will have an absolute value greater than or equal to 1. As such we can decide between forward and reverse causation by determining whether the absolute value \(|\beta _j|\) is smaller or greater than 1. This can be assessed manually by running MR analyses in both directions or using a model that incorporates both [22, 23]. Moreover, depending on the choice of exposure and outcome we will often already have strong a priori information about the causal direction, and in some cases reverse causation is inherently impossible because the exposure is known to occur before the outcome. In this regard, resolving the order of causation is often relatively straightforward in practice.
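
As a minimal sketch of this decision rule, assuming standardized variables and that either Fig. 1A or Fig. 2C is the true scenario, the following constructs the marginal associations under reverse causation and classifies the resulting ratio. The effect sizes are hypothetical and `infer_direction` is an illustrative helper, not part of any published method.

```python
def infer_direction(beta):
    """Classify a ratio estimate by its absolute value (standardized variables assumed)."""
    if abs(beta) < 1:
        return "forward"
    if abs(beta) > 1:
        return "reverse"
    return "undetermined"


# Hypothetical values, chosen only for illustration.
beta_yx = 0.4               # causal effect of the outcome Y on the exposure X
alpha_y = [0.1, 0.2]        # direct variant effects alpha_Yj on Y

# Reverse causation (Fig. 2C): gamma_Yj = alpha_Yj and gamma_Xj = alpha_Yj * beta_YX.
gamma_y = list(alpha_y)
gamma_x = [a * beta_yx for a in alpha_y]

beta_j = [gy / gx for gy, gx in zip(gamma_y, gamma_x)]   # each equals 1 / beta_YX
print(beta_j, [infer_direction(b) for b in beta_j])
```

Both variants give the same ratio, so no heterogeneity arises, but its absolute value exceeds 1 and therefore points to reverse causation.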

However, these methods and a priori information can only help to decide between forward and reverse causation as long as the independence assumption holds, and it is thus presumed that one of these two scenarios is correct. This therefore still requires ruling out the possibility of genetic effects on exposure and outcome being mediated by one of their confounders.

Analysing potential confounders

Two variations of what we will refer to as “mediated confounding” are depicted in Fig. 2D, E, with a causal effect \(\alpha _{Cj}\) of the variant on a confounder \(C\), violating the independence assumption. These scenarios result in a \(\beta\) value of \(\beta _{XY} \,+\, \frac{{\beta _{CY}}}{{\beta _{CX}}}\) (with \(\beta _{XY} \,=\, 0\) for Fig. 2D), demonstrating a bias away from the true causal effect of the exposure on the outcome. The InSIDE assumption is violated here as well, with both \(\gamma _{Xj} \,=\, \alpha _{Cj}\beta _{CX}\) and \(\delta _j \,=\, \alpha _{Cj}\beta _{CY}\) dependent on \(\alpha _{Cj}\). Note that these scenarios are specific to the particular confounder \(C\), and there may be other sets of variants operating on different confounder variables, with correspondingly different biases.
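
The form of this bias can be verified with a short numerical sketch of the Fig. 2E scenario. The parameter values are hypothetical, and the expression used for the outcome association (the path through \(C\) and \(X\) plus the path through \(C\) alone) is our reading of the graph rather than a formula quoted from the text.

```python
# Hypothetical values, chosen only for illustration.
beta_xy = 0.3                 # true causal effect of X on Y
beta_cx = 0.5                 # effect of confounder C on X
beta_cy = 0.2                 # effect of confounder C on Y
alpha_c = [0.1, 0.2, 0.4]     # variant effects alpha_Cj on the confounder

# Marginal associations for Fig. 2E:
# gamma_Xj = alpha_Cj * beta_CX
# gamma_Yj = alpha_Cj * beta_CX * beta_XY + alpha_Cj * beta_CY
gamma_x = [a * beta_cx for a in alpha_c]
gamma_y = [a * beta_cx * beta_xy + a * beta_cy for a in alpha_c]

beta_j = [gy / gx for gy, gx in zip(gamma_y, gamma_x)]
expected = beta_xy + beta_cy / beta_cx

print(beta_j, expected)   # the same biased value for every variant
```

Because \(\alpha _{Cj}\) cancels out of the ratio just as \(\alpha _{Xj}\) does for valid instruments, all variants acting through the same confounder agree on the same biased value, and heterogeneity testing cannot flag them.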

Because the \(\beta _{XY} \,+\, \frac{{\beta _{CY}}}{{\beta _{CX}}}\) term can take any value that \(\beta _{XY}\) itself can take, it is impossible to rule out mediated confounding scenarios using just the genetic associations with exposure and outcome. Some methods have been developed that use a mixture model approach to explicitly include a mediated confounding component in their model, such as CAUSE [24] which assumes that the variants used are a mixture of ones conforming to Fig. 2A and others conforming to Fig. 2F. LHC-MR [23] offers an even more general model also allowing for reverse causation. However, the problem remains that for any forward causation scenario as in Fig. 2A, it is possible to formulate parameter values for the mediated confounding scenario like in Fig. 2F that result in an identical pattern of genetic associations. As such, the components of these mixture models that are assumed to capture forward causation may still be capturing mediated confounding instead (see also Supplementary Information—whole-genome methods).

Additional data is therefore required to resolve the issue of mediated confounding. If genetic associations conditioning on a putative confounder variable \(C\) are available for both exposure and outcome, evaluating and correcting for that particular \(C\) is relatively straightforward. If this \(C\) is indeed mediating (part of) the effect of the variants on the exposure and outcome, adding \(C\) as a covariate to compute the conditional associations will remove this confounding effect from a subsequent MR analysis based on them. Similarly, if separate GWAS results for a possible confounder \(C\) are available, these can be used to obtain corrected MR estimates. This can be accomplished by either first correcting the \(\gamma _{Xj}\) and \(\gamma _{Yj}\) and then performing a regular MR analysis [25], or by using an MR-Egger style regression approach, essentially regressing \(\gamma _{Yj}\) on both \(\gamma _{Xj}\) and \(\gamma _{Cj}\) (the genetic associations with the possible confounder) simultaneously. The latter approach can be considered a form of multiple-exposure model, treating \(C\) as a second exposure potentially correlated with \(X\) [26]. Note that both correction using \(C\) directly and correction based on the \(\gamma _{Cj}\) are susceptible to collider bias when \(C\) is not a confounder [27], which therefore needs to be considered when using such methods (see also Supplementary Information—mediated confounding).
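
The regression-based correction can be sketched as follows. This is an illustrative simplification under stated assumptions: the fit is an unweighted least-squares regression without an intercept, the associations are treated as known, and the variants are assumed to affect the outcome only through \(X\) and \(C\). All numerical values and the helper `fit_two_predictors` are hypothetical.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = b1 * x1 + b2 * x2 (no intercept) via the normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2


# Hypothetical values, chosen only for illustration.
beta_xy, beta_cx, beta_cy = 0.3, 0.5, 0.2
alpha_x = [0.10, 0.05, 0.20, 0.00]   # direct variant effects on the exposure
alpha_c = [0.00, 0.10, 0.05, 0.20]   # variant effects on the confounder

gamma_c = list(alpha_c)
gamma_x = [ax + ac * beta_cx for ax, ac in zip(alpha_x, alpha_c)]
gamma_y = [gx * beta_xy + gc * beta_cy for gx, gc in zip(gamma_x, gamma_c)]

naive = [gy / gx for gy, gx in zip(gamma_y, gamma_x)]            # biased where alpha_Cj != 0
est_xy, est_cy = fit_two_predictors(gamma_x, gamma_c, gamma_y)   # corrected estimates

print(naive)
print(est_xy, est_cy)
```

In this constructed example the joint regression separates the two paths and returns both `beta_xy` and `beta_cy`, whereas the per-variant ratios are biased for every variant that acts on the confounder.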

Although approaches like these can be effective in detecting and correcting for effects mediated by confounders, the obvious limiting factor is that this requires the potential confounders to be explicitly tested. If no data is available for a particular confounder, or if it was simply not considered as a potential confounder in the analysis, its effects will not have been accounted for. This poses a major challenge, since any confounder of the exposure and outcome is itself almost certainly heritable, and any variant directly associated with that confounder will also have associations with the exposure and outcome mediated by that confounder.

This implies that in practice all (potential) confounders of the exposure and outcome would need to be considered and evaluated in an MR context. This is particularly problematic with confounding endophenotypes such as those involved in specific biological pathways and processes, as their causal effects on exposure and outcome may be specific to a particular context such as a tissue or developmental time period, and measurements of such confounders would therefore need to be specific to that context as well.

Leveraging constrained data

Negative control populations

MR has sometimes been compared to RCTs, drawing a parallel between the random inheritance of alleles from parents to offspring and the randomized assignment of study participants to treatment groups, with the exposure taking the role that the actual treatment has in RCT [28]. However, this analogy is problematic, because although part of the inferential strength of RCT comes from random assignment of individuals to groups, such randomization only deals with pre-existing differences between individuals in the trial. Potential confounding that occurs after assignment remains a constant challenge even in RCT and must be accounted for in the experimental design, by using well-designed control groups and strictly controlling other experimental and background variables. This level of control does not exist in the MR context, and since the exposure occurs at an unknown time possibly many years after the “randomized assignment” (and measurement of the exposure and outcome typically happens even later still), there is ample opportunity for confounding to arise.

An MR approach that more closely mimics the structure of RCT, however, is the use of negative control populations [13, 29]. A negative control population is one where the exposure is constrained to a particular value, but that in other respects matches the population from which the main MR data was derived (i.e., the relations between all relevant variables are the same). An example of this is alcohol consumption as the exposure, using a population where people do not drink alcohol due to religious or cultural taboo as control [30]. A negative control population does need to have an actual constraint on the exposure; simply selecting a subset of a population for whom the exposure is zero does not work, as this would lead to collider bias (see Supplementary Information—negative control populations).

Because in such a control population the exposure does not vary, causal effects involving that exposure are essentially blocked. The constraint on the exposure stops other variables from affecting the exposure, and stops the exposure from affecting other variables. Genetic association between a variant and the outcome in this control population therefore only consists of effects not mediated by the exposure, and thus should be zero for valid genetic instruments like in Fig. 1A. Testing the genetic association between variants and the outcome can thus serve to validate them as instruments, provided the control sample is sufficiently well-powered.

This approach can be further extended to determine how much of the genetic association with the outcome \(\gamma _{Yj}\) is not mediated by the exposure (with some restrictions, see Supplementary Information—negative control populations) [31]. Modeling this genetic association as \(\gamma _{Yj} \,=\, \gamma _{Xj}\beta _{XY} \,+\, \delta _j\), similar to MR-Egger, this can essentially provide a direct estimate of the heterogeneity term \(\delta _j\) for each individual variant \(j\). With that, it becomes possible to obtain a corrected genetic association \(\gamma _{Yj} \,-\, \delta _j\), by subtracting out the heterogeneity from the overall association, and then using this corrected \(\gamma _{Yj}\) to perform MR analysis. However, although potentially quite powerful, using negative control populations in this way is also vulnerable to bias, since this will create a hidden bias if the assumptions of the negative control population fail. This is in contrast to using negative control populations to determine validity of variants as an instrument, which will instead only tend to generate false negatives (rejecting valid instruments as invalid) if the negative control population assumptions do not hold.
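
A minimal sketch of this correction is given below, under the strong assumption that the negative control population assumptions hold exactly, so that the variant-outcome association observed in the control population equals \(\delta _j\). The numerical values and variable names are hypothetical, and the restrictions noted in the Supplementary Information are not modeled.

```python
# Hypothetical values, chosen only for illustration.
beta_xy = 0.3
gamma_x = [0.1, 0.2, 0.4]        # variant-exposure associations, main population
delta = [0.02, -0.03, 0.0]       # outcome effects not mediated by the exposure

# Main population: gamma_Yj = gamma_Xj * beta_XY + delta_j.
gamma_y = [gx * beta_xy + d for gx, d in zip(gamma_x, delta)]

# Control population: the exposure is constrained, so only delta_j remains
# (assumes the control population matches the main one in all other respects).
gamma_y_control = list(delta)

# Use 1: validate instruments by testing for a zero association in the controls.
is_valid = [gc == 0.0 for gc in gamma_y_control]

# Use 2: subtract the control association before taking the ratio.
uncorrected = [gy / gx for gy, gx in zip(gamma_y, gamma_x)]
corrected = [(gy - gc) / gx for gy, gc, gx in zip(gamma_y, gamma_y_control, gamma_x)]

print(is_valid)
print(uncorrected)
print(corrected)   # all equal beta_xy up to rounding
```

If the control population differed from the main population in its \(\delta _j\), the subtraction in the second use would silently introduce bias, which is the vulnerability described above.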

Other forms of constrained data

Using negative control populations leverages natural constraints on data to provide a means of validating the instrumental variable assumptions that does not require explicit testing of individual confounders. Other approaches that utilize such constraints can be employed as well, and a prime example of this is the use of longitudinal data, for either exposure, outcome, or both. Use of such data allows the timing of the causally relevant exposure and of the causal effects to be narrowed down.

If for example we have two measurements of the exposure, as in Richardson et al. [32], there are three main scenarios to consider: a direct causal effect on the outcome only by the early exposure \(X_1\) (Fig. 3A), only by the late exposure \(X_2\) (Fig. 3B), or by both (Fig. 3C). This can be resolved by a set of three MR analyses, including one that has \(X_2\) as the exposure with a set of variants such as in Fig. 3D that only affect the later exposure. Here, the early exposure essentially functions as a baseline value, allowing us to identify variants that only affect the change in exposure that occurred since the first time point (see also Supplementary Information—longitudinal data).

Fig. 3: Graphical representation of scenarios involving longitudinal data and imperfect measurement of variables, for a variant j.

Different longitudinal data scenarios are shown in (A) through (D), with \(X_1\) and \(X_2\) corresponding to an earlier and later measurement of the exposure. Causal effects on the outcome occur either (A) at the earlier time point, (B) the later time point, or (C) at both time points, with in (D) an additional scenario where variant \(j\) directly affects only the later measurement of the exposure and not the earlier one. In (E) is shown a scenario where the observed exposure \(X_{obs}\) does not fully represent the causally relevant instance \(X\) of the exposure, and the same in (F) for an observed outcome \(Y_{obs}\) that does not fully represent the causally relevant outcome \(Y\).

This process can be generalized to more than two time points, allowing for better determination of the likely timing of the causal effects. If longitudinal measurements of the outcome are available, these can be used in the same way to narrow down the timing. Moreover, for later time points these models can be interpreted as conditioning on the value of the exposure or outcome at an earlier time point, which would block any confounder-mediated genetic effects that occurred prior to that time point from affecting the estimate of \(\beta _{XY2}\) [33]. Although confounders may still be present for the later time points (acting e.g., on \(X_2\) and \(Y\) in Fig. 3A), this is restricted to a more limited time window, making it easier to identify likely confounders and correct for them.

Another way of leveraging known constraints on data is the use of positive and negative control outcomes: outcomes which already have strong evidence that they respectively are or are not causally influenced by the exposure, which can be used to evaluate the validity of candidate genetic instruments [8, 34]. Positive control outcomes are subject to a causal effect of the exposure, and as such any variants causally acting on the exposure must be affecting such control outcomes as well. As such, if the variants used in our MR analysis show no association with this positive control outcome, beyond what could be explained by possible lack of statistical power, this suggests that the variants used do not in fact have such a causal effect on the exposure. Similarly, if we perform an MR analysis with a negative control outcome that should not be causally affected by the exposure, and the analysis suggests that there actually is a causal effect on that negative control outcome, this casts doubt on the validity of the variants used as genetic instruments.

Relaxing the additional assumptions

The causal graph in Fig. 1A is a common way of depicting the instrumental variable assumptions central to MR, clearly showing the causal paths that need to be either present or absent for the standard analysis to work. Less explicit in this graph are some of the additional assumptions implied by it, listed in Table 1, that the analysis depends on as well. These assumptions can be condensed to two general constraints: first, that the causal graph applies in the same way to every individual used in the analysis, both in its structure and in the value of the causal effect sizes; and second, that the variables as we have measured them in our data correspond to the true causal variables depicted in the graph without bias or error. In this section we will discuss scenarios in which these assumptions may not hold, and the implications of this for the MR analysis.

Variable effect sizes across samples

In the commonly used two-sample approach to MR analysis, variable effect sizes can potentially occur and pose a problem when the genetic associations \(\gamma _{Xj}\) and \(\gamma _{Yj}\) are obtained from samples derived from different populations with different values for the causal parameters in Fig. 1A. As described, MR works on the core premise that \(\gamma _{Xj} \,=\, \alpha _{Xj}\) and \(\gamma _{Yj} \,=\, \alpha _{Xj}\beta _{XY}\), and that therefore the variant-specific part \(\alpha _{Xj}\) will cancel out when we take their ratio \(\beta _j \,=\, \frac{{\gamma _{Yj}}}{{\gamma _{Xj}}}\), leaving only \(\beta _{XY}\). But this will fail if the value of \(\alpha _{Xj}\) in the population from which the exposure GWAS was drawn differs from the value of \(\alpha _{Xj}\) in the population that the outcome GWAS was based on, resulting in \(\beta _j\) being biased away from \(\beta _{XY}\).
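
The failure of the cancellation can be made concrete with a small sketch, assuming hypothetical values for the variant effects in the two populations and the same \(\beta _{XY}\) in both. The names `alpha_x_exposure_pop` and `alpha_x_outcome_pop` are introduced here for illustration only.

```python
# Hypothetical values, chosen only for illustration.
beta_xy = 0.3
alpha_x_exposure_pop = [0.10, 0.20, 0.10]   # alpha_Xj in the exposure GWAS population
alpha_x_outcome_pop = [0.10, 0.10, 0.15]    # alpha_Xj in the outcome GWAS population

# gamma_Xj comes from the exposure GWAS, gamma_Yj from the outcome GWAS.
gamma_x = list(alpha_x_exposure_pop)
gamma_y = [a * beta_xy for a in alpha_x_outcome_pop]

# beta_j = (alpha_Xj in outcome population / alpha_Xj in exposure population) * beta_XY
beta_j = [gy / gx for gy, gx in zip(gamma_y, gamma_x)]
print(beta_j)   # only the first variant, with equal alpha_Xj in both populations, is unbiased
```

Each ratio is scaled by a variant-specific factor, so the bias differs between variants and shows up as heterogeneity of the \(\beta _j\).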

The extent to which this is a problem will depend on the way the MR analysis is conducted. The biases produced by this scenario will usually cause heterogeneity of the \(\beta _j\), and as such it should be possible to detect and remove the affected variants (see also Supplementary Information—variable effect sizes). The MR-Egger style models are more susceptible to this issue, as the average bias will tend to end up in their estimate of \(\beta _{XY}\), which may go unnoticed unless these are used in conjunction with other types of models. Differences in \(\alpha _{Cj}\) across the populations from which GWAS data was drawn will pose similar problems when using additional GWAS data with a putative confounder \(C\) as outcome to correct for confounding.

A similar issue can arise even when all data is taken from the same population, if the GWAS samples are subject to explicit or implicit selection criteria. If these criteria differ between the exposure and outcome GWAS, this can lead to the same kind of issue as between different populations described above, if the \(\alpha _{Xj}\) differ between the selected subpopulations. Moreover, selection effects occurring in the GWAS sample for the outcome also have the potential to result in collider bias, because selection implicitly conditions on the variables being selected on [27, 35]. For example, the outcome may be measured specifically in older individuals, thus selecting for individuals who have survived to that age [36] and resulting in collider bias if the exposure causally affects life expectancy and there are any confounders of the relation between the exposure and outcome [37] (see also Supplementary Information—variable effect sizes). This sort of bias will not generally result in any heterogeneity in the \(\beta _j\), as it will affect every variant in proportionally the same way. Addressing it will therefore often require identifying relevant selection processes and evaluating whether the specific variables involved may be causing collider bias.

Variable effect sizes within samples

Effect sizes may also vary across individuals within a population, due to, for example, interactions of causal variants with other variables. In this case, different individuals in the population have a different value of \(\alpha _{Xj}\), depending on their score on the interactor variable. In practice, the genetic associations \(\gamma _{Xj}\) would reflect an average of these different \(\alpha _{Xj}\) values across the levels of the interactor variable. The \(\gamma _{Yj}\) are based on this average \(\alpha _{Xj}\) as well, and thus as long as the distribution of the interactor variable is the same in both samples this will still cancel out in the ratio \(\beta _j \,=\, \frac{{\gamma _{Yj}}}{{\gamma _{Xj}}}\). On the other hand, if for example the mean of the interactor is greater in one of the samples, this no longer holds. In that case, however, as with the differences in \(\alpha _{Xj}\) across samples described above, it should result in heterogeneous \(\beta _j\), and can therefore be addressed by careful application of heterogeneity testing and modeling.
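
A minimal numerical sketch of this cancellation follows. It assumes, purely for illustration, a linear interaction in which the variant effect is base + slope * z for interactor value z, so that its population average is base + slope * E[z], and it assumes no pleiotropy. The variant values and interactor means are hypothetical.

```python
def mean_alpha(base, slope, interactor_mean):
    """Average variant effect on the exposure under a linear interaction.

    Assumption: alpha_Xj(z) = base + slope * z, so averaging over the
    interactor gives base + slope * E[z].
    """
    return base + slope * interactor_mean


def ratio_estimates(variants, beta_xy, mean_z_exposure, mean_z_outcome):
    """Per-variant ratios gamma_Yj / gamma_Xj, assuming no pleiotropy."""
    ratios = []
    for base, slope in variants:
        # gamma_Xj reflects the average alpha in the exposure sample
        gamma_x = mean_alpha(base, slope, mean_z_exposure)
        # gamma_Yj reflects the average alpha in the outcome sample
        gamma_y = mean_alpha(base, slope, mean_z_outcome) * beta_xy
        ratios.append(gamma_y / gamma_x)
    return ratios


# Hypothetical variants as (base, slope) pairs; the true effect is 0.5.
variants = [(0.10, 0.05), (0.20, 0.00), (0.15, -0.04)]
BETA_XY = 0.5

same_distribution = ratio_estimates(variants, BETA_XY, 1.0, 1.0)
shifted_mean = ratio_estimates(variants, BETA_XY, 1.0, 2.0)
print(same_distribution)
print(shifted_mean)
```

When the interactor has the same mean in both samples, every ratio equals the true effect. When the mean differs, the ratios diverge across variants according to how strongly each variant interacts, which is the heterogeneity that the testing and modeling approaches would aim to pick up.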

It is possible for the \(\beta _{XY}\) parameter itself to vary across individuals as well, with different causal effect sizes for different individuals in the population. This can arise as an interaction effect with another variable but also as a non-linear effect of the exposure, which can be seen as essentially an interaction of the exposure with itself. In effect, the value of \(\beta _{XY}\) that MR would estimate in this case is an average of the different \(\beta _{XY}\) values across the levels of the interactor variable. This does not substantially affect the MR analysis, since such an average causal effect is still generally interpretable and informative of the relation between exposure and outcome. It can make generalization somewhat more difficult, however, since this average \(\beta _{XY}\) could be quite different in other populations if the distribution of the interactor variable in those populations differs substantially from that in the population from which the outcome GWAS sample was drawn.
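
This averaging can be written out explicitly for the interaction case. As a sketch, assume (in notation introduced here only for illustration) that the causal effect depends on an interactor \(Z\) through a function \(\beta_{XY}(Z)\), that \(Z\) is independent of the genetic variants, that \(\alpha_{Xj}\) is constant across individuals, and that there is no pleiotropy. Then

\[
\gamma_{Yj} \,=\, \alpha_{Xj}\, E_Z\!\left[\beta_{XY}(Z)\right], \qquad \beta_j \,=\, \frac{\gamma_{Yj}}{\gamma_{Xj}} \,=\, E_Z\!\left[\beta_{XY}(Z)\right],
\]

where the expectation is taken over the distribution of \(Z\) in the population underlying the outcome GWAS. A population with a different distribution of \(Z\) would therefore generally yield a different average. The non-linear case, where the exposure interacts with itself, does not satisfy the independence assumption and is not covered by this simple expression.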

Imperfectly observed variables

In the graphs in Figs. 1 and 2 it is implicitly assumed that the observed variables we use in the GWAS, the exposure and outcome, as well as any putative confounder variables we may be trying to evaluate, are sufficiently good proxies for the causally relevant variables. Yet this can fail to be the case for a variety of reasons [38, 39]. There could be simple measurement or diagnostic error, where the observed variables in the data are a noisy representation of the variables of interest. The causal graph in Fig. 3E depicts a scenario like this, with the true exposure of interest \(X\) now unobserved, and with a noisy observed exposure variable \(X_{obs}\) from which the genetic associations \(\gamma _{Xj}\) are estimated. Such situations often also arise when using binary variables, such as a medical diagnosis or a dichotomized continuous variable (e.g., hypertension as dichotomized blood pressure) [40], where the relevant causal effects are likely related to the underlying biological state rather than to the diagnosis or dichotomized value.

Imperfect observation can arise from more systematic causes as well. It is possible that the context in which the variable was observed does not sufficiently match that of its causally relevant instance: if for instance we use gene expression as our exposure, it may well be that the tissue in which that gene’s expression causally affects the outcome is different from the tissue in which the exposure variable we are using in our analysis is measured. Similarly, there may be differences in timing and developmental period, or environmental triggers, or the observed variable may have a complex internal structure, with the causal effect only pertaining to a subtype or subscale of that variable. In case of large differences between the developmental timing of the causal effect of the exposure and when the exposure was measured, processes such as canalization and behavioral adaptive responses may also have amplified or dampened the changes induced by earlier causal effects [10, 41].

Regardless of the underlying mechanism, in a scenario such as in Fig. 3E where the “true” exposure \(X\) is imperfectly represented by the observed exposure \(X_{obs}\), the causal effect we would estimate becomes biased away from \(\beta _{XY}\). For the exposure the genetic effect changes to \(\gamma _{Xj} \,=\, \alpha _{Xj}\beta _{XO}\), and as such the ratio \(\beta _j \,=\, \frac{{\gamma _{Yj}}}{{\gamma _{Xj}}}\) becomes \(\frac{{\beta _{XY}}}{{\beta _{XO}}}\). Depending on the nature of the relation between the “true” and observed variables, the value we get may therefore differ considerably from the true value of \(\beta _{XY}\) (see also Supplementary Information—imperfectly observed variables). Note that this issue of imperfectly observed variables is not unique to MR, and would pose a problem even in the context of RCT.

All these same mechanisms can operate on the outcome as well, as depicted in Fig. 3F, in which case \(\beta _j\) will be \(\beta _{XY}\beta _{YO}\). Although this does affect interpretation, the value we are estimating does still represent a legitimate causal effect, in contrast to Fig. 3E where the causal structure would be misspecified. If for example our intended outcome is true schizophrenia status, and the \(Y_{obs}\) we use is diagnosis of schizophrenia, the causal effect we would obtain is that of our exposure on schizophrenia diagnosis, and as such does have a meaningful interpretation, even if it does not give us an estimate of the causal effect on true schizophrenia status. In this regard, full observation of the exposure is considerably more crucial than full observation of the outcome.
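
The contrast between an imperfectly observed exposure and an imperfectly observed outcome can be made concrete with a short numerical sketch. It assumes the simple linear structure of Fig. 3E and 3F with no pleiotropy; the function name and all parameter values are hypothetical and chosen only for illustration.

```python
def observed_ratios(alphas, beta_xy, beta_xo=1.0, beta_yo=1.0):
    """Per-variant ratios gamma_Yj / gamma_Xj computed from observed variables.

    Assumptions: beta_xo scales the true exposure X into X_obs, and
    beta_yo scales the true outcome Y into Y_obs. A value of 1.0 means
    the variable is observed perfectly.
    """
    ratios = []
    for alpha in alphas:
        gamma_x = alpha * beta_xo            # association with observed exposure
        gamma_y = alpha * beta_xy * beta_yo  # association with observed outcome
        ratios.append(gamma_y / gamma_x)
    return ratios


alphas = [0.1, 0.2, 0.4]  # hypothetical variant effects on the true exposure
BETA_XY = 0.5             # hypothetical true causal effect

perfect = observed_ratios(alphas, BETA_XY)
imperfect_exposure = observed_ratios(alphas, BETA_XY, beta_xo=0.8)
imperfect_outcome = observed_ratios(alphas, BETA_XY, beta_yo=0.6)
print(perfect, imperfect_exposure, imperfect_outcome)
```

In this sketch the imperfect exposure yields beta_XY / beta_XO and the imperfect outcome yields beta_XY * beta_YO for every variant alike. Because all variants are shifted by the same factor, neither situation would show up as heterogeneity in the per-variant ratios.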

It should also be noted that a further consequence of such issues is that it may no longer be possible to distinguish forward and reverse causation in the way described above [39], since the parameter constraints upon which this would be based would no longer apply in the same way. Similarly, imperfect observation of a putative confounder \(C\) will also tend to render corrections of confounding effects only partially effective, not fully removing the confounding effect. Other approaches for evaluating these alternative causal scenarios would therefore need to be employed.

A somewhat related issue is that even if the observed exposure is in fact a good proxy for the causally relevant exposure, it may also be a good proxy for any number of other instances of the exposure. For example, if the expression of a particular gene is relatively stable across various tissues, the expression in a specific tissue will likely be a good proxy for expression in other tissues. As such, even if we use expression in that tissue as the exposure, we cannot know if the causal effect \(\beta _{XY}\) is indeed specific to that tissue. Similarly, we also generally do not know other aspects of the exposure such as the dosage, duration and frequency, also limiting the specificity of our conclusions [10, 41, 42].

Conclusion

In this Perspective we have outlined how the different assumptions and elements of the data figure into an MR analysis. This outline is not exhaustive, but should provide further insight into how the different components of MR fit together, on both a mathematical and conceptual level. Throughout this paper we have entertained the hypothetical that we know all true associations, focusing specifically on the challenges that remain even in such an idealized scenario. These challenges become substantially harder when the uncertainty in the estimates has to be dealt with as well.

As we have shown, causal inference with MR strongly depends on its assumptions. When performing an MR study, it is thus crucial that the validity of these assumptions is examined for each specific analysis, so that all alternative scenarios can be carefully considered and ruled out as much as possible. Consequently, performing a reliable MR study requires a considerable investment of time and effort, and access to high quality data for both exposures and outcomes. Despite all its complications, however, a well-executed MR study can be a valuable tool in providing greater insight into the relations between our phenotypes. Moreover, the data we have available continues to improve, with more detailed measurements of phenotypes in ever larger biobanks, and rapid innovation in new data and technologies in molecular genetics. With this growth of our data, and our understanding of phenotypes, opportunities for well-designed MR studies will continue to improve.