Abstract
Mendelian randomization (MR) harnesses genetic variants as instrumental variables (IVs) to study the causal effect of exposure on outcome using summary statistics from genome-wide association studies. Classic MR assumptions are violated when IVs are associated with unmeasured confounders, i.e., when correlated horizontal pleiotropy (CHP) arises. Such confounders could be a shared gene or interconnected pathways underlying exposure and outcome. We propose MRCUE (MR with Correlated horizontal pleiotropy Unraveling shared Etiology and confounding) for estimating the causal effect while identifying IVs with CHP and accounting for estimation uncertainty. For those IVs, we map their cis-associated genes and enriched pathways to inform the shared genetic etiology underlying exposure and outcome. We apply MRCUE to study the effects of interleukin 6 on multiple traits/diseases and identify several S100 genes involved in shared genetic etiology. We also assess the effects of multiple exposures on type 2 diabetes across European and East Asian populations.
Introduction
In the post-genome-wide association study (GWAS) era, many efforts have been made to step beyond genetic associations towards causation and mechanistic examinations. Mendelian randomization (MR) assesses the causal effect of potential risk exposures on outcome traits and diseases by leveraging genetic variants as instrumental variables (IVs) and integrating existing GWAS summary statistics^{1}. MR has been widely applied to study the relationships among complex traits and diseases, and has achieved numerous successes in providing causal evaluations and suggesting disease prevention and therapeutic strategies^{2}.
Two-sample MR methods take as input two sets of summary statistics, IV-to-exposure and IV-to-outcome association statistics, to estimate the causal effect of exposure on outcome. Since genotypes are ‘Mendelian randomized’ during meiosis, they are generally not correlated with external unmeasured confounding factors. Classic MR methods impose strong assumptions on the validity of IVs. They assume IVs to be associated with the exposure (“relevance”); to affect the outcome only through the exposure (“exclusion restriction”); and to be unconfounded (“exchangeability”). Figure 1a illustrates the classic assumptions. However, those assumptions are often challenged by pervasive horizontal pleiotropy, in which genetic variants affect the outcome via pathways other than the exposure. The presence of horizontal pleiotropy can bias the estimation and confound the causal inference if not properly handled. Specifically, ‘uncorrelated horizontal pleiotropy (UHP)’ is a phenomenon where a genetic variant affects the outcome via pathways that do not involve the exposure (see Fig. 1b left panel for an illustration), and ‘correlated horizontal pleiotropy (CHP)’ is a phenomenon where a genetic variant affects both exposure and outcome through a heritable shared factor, i.e., an IV is associated with unmeasured confounders (see Fig. 1b right panel). In the recent literature, many robust MR methods have been proposed to relax IV assumptions and allow for IVs with UHP, either by treating those IVs as outliers^{3,4} or by accounting for UHP effects in a model of mixture distributions^{5,6,7,8,9,10,11}. Some MR methods^{12,13,14,15} were developed to estimate and adjust for both UHP and CHP. MRMix^{12} uses a four-component mixture model to identify and estimate the causal effect using the group of IVs estimated to be valid, without distinguishing the mechanisms (UHP/CHP) of the invalid IVs.
CAUSE^{13} identifies the IVs with CHP effects, and estimates the causal effect of exposure on outcome using IVs estimated to be unaffected by CHP. The method cMLMA^{14} uses constrained maximum likelihood to draw causal inference by excluding IVs with either UHP or CHP. Similar to CAUSE and cMLMA, GRAPPLE^{15} assumes that the CHP effects (i.e., IV-to-outcome via confounders) are proportional to IV strengths (i.e., IV-to-exposure via confounders). The assumption implies that all IVs perturb the whole confounder set and further affect the outcome under the same mechanism, differing only in IV strength.
Correlated horizontal pleiotropy is a challenging and frequently occurring issue in MR analyses. When there is only one confounder, all IVs with CHP affect the same confounder and the CHP effects of different IVs are proportional to IV strengths. Existing methods^{13,14,15} consider and model the shared CHP effect for all IVs. For complex traits and diseases, however, many genes and pathways (e.g., metabolism, immune pathways) may often affect both exposure and outcome. In this work, we propose an MR method, MRCUE (MR with Correlated horizontal pleiotropy Unraveling shared Etiology and confounding). MRCUE accounts for more complex and realistic CHP effects in the presence of multiple confounders, and leverages correlated IVs to boost power. As illustrated in Fig. 1b right panel, for IVs affected by CHP, we set the IV-to-confounder effect to be 1^{13}, the confounder-to-exposure effect to be γ_{k}, and the confounder-to-outcome effect to be α_{k}. When estimating the causal effect of exposure on outcome, CHP induces a bias equal to the shared CHP effect parameter on outcome, \(\delta=E(\frac{{\alpha }_{k}}{{\gamma }_{k}})\). If unbalanced CHP is present (δ ≠ 0) and unadjusted, false positives may arise or power may be reduced. We propose that the effect of the confounder set on outcome can be decomposed into two parts, \({\alpha }_{k}=\delta {\gamma }_{k}+{\widetilde{\alpha }}_{k}\). The first part is the shared confounding effect across all IVs with CHP and is proportional to the confounders’ effect on exposure (γ_{k}) induced by each IV; the second part (\({\widetilde{\alpha }}_{k}\)) captures how the IV-specific perturbation to the confounder set may affect outcome, and is orthogonal to the first part. When there exist multiple confounders (Fig. 1c), different IVs may be associated with multiple confounders at different strengths, and those IVs perturb the confounder set differently.
For each IV, the ratio α_{k}/γ_{k} is a weighted average among all confounders, and the ratio is not constant across IVs. Additionally, the inclusion of correlated IVs in MR analyses increases the number of instruments and may boost power^{8}. When there are multiple correlated IVs, even if there is only one confounder (Fig. 1d), the correlations among IVs with different mechanisms may induce IV-specific CHP effects. This issue is insufficiently addressed in the existing literature on correlated IVs. Figure 1e, f shows a real data example in which CHP is present between body mass index (BMI) and triglycerides (TG). Without properly identifying and accounting for complicated CHP effects, the effect of exposure on outcome could be confounded and the estimated causal effect of outcome on exposure may also be nonzero, i.e., spurious reverse causation may arise. By modeling both the shared CHP and IV-specific CHP effects, MRCUE estimates the causal effect and distinguishes it from reverse causation. Moreover, the modeling of IV-specific CHP effects alleviates the potential bias in the presence of many weak instruments.
Another feature of MRCUE is that we propose to further study sets of IVs estimated to have CHP and examine their cis-associated genes and involved pathways. In contrast to an existing method^{15}, MRCUE allows for overlapping genes/pathways. It quantifies the estimation uncertainty in identifying IVs with CHP and allows us to further study the sets of IVs estimated to have CHP at different levels of confidence. Through two examples, we illustrate that the estimated IVs/variants with CHP can suggest genes and pathways that are suspected sources of IV-associated confounders. Those genes and pathways may shed light on the shared genetic etiology for traits and diseases affected by a common exposure, or may reveal relevant pathways and mechanisms underlying different causal exposures for a complex disease outcome. Those disease-relevant common confounders and pathways could inform concerted mechanisms and etiologies across populations and ethnic groups.
Results
MRCUE examines causal effects by delineating correlated and uncorrelated horizontal pleiotropic effects
We propose MRCUE to estimate the causal effect of exposure (X) on outcome (Y) while accounting for both UHP and CHP. As illustrated in Fig. 1b, we model the IV-to-outcome effect of the kth IV (k = 1, …, p), Γ_{k}, as a function of the IV-to-exposure effect, γ_{k}, and pleiotropic effects: \({\Gamma }_{k}={\beta }_{1}{\gamma }_{k}+{\alpha }_{k}+{\theta }_{k}\),
where β_{1} is the causal effect of exposure on outcome; θ_{k} is the UHP effect, and α_{k} is the CHP effect of the kth IV; and both the IV-to-outcome and IV-to-exposure effects, Γ_{k} and γ_{k}, respectively, can be obtained from GWASs. We assume that all IVs may have UHP effects, θ_{k}, while only a proportion of IVs may also have CHP effects. Following existing literature^{13}, we rescale the IV-to-confounder effect to be 1 and the effect of confounders on exposure is then γ_{k}. In Fig. 1b (right panel), the line representing the direct effect from IV to exposure is omitted to avoid overparameterization, since it is assumed to change proportionally with the IV-to-confounder effect. As discussed before, we decompose the CHP effect into two components, \({\alpha }_{k}=\delta {\gamma }_{k}+{\widetilde{\alpha }}_{k}\), representing IV-shared and IV-specific CHP effects. We reparameterize our model as \({\Gamma }_{k}={\beta }_{1}{\gamma }_{k}+{\theta }_{k}\) for IVs unaffected by CHP (Set 1), and \({\Gamma }_{k}={\beta }_{2}{\gamma }_{k}+{\widetilde{\alpha }}_{k}+{\theta }_{k}\) for IVs affected by CHP (Set 2),
where β_{2} = β_{1} + δ is a nuisance parameter capturing both β_{1} and δ, and δ is the IV-shared confounding parameter due to CHP. For IVs in Set 2, the IV-specific CHP effect, \({\widetilde{\alpha }}_{k}\), is assumed to have a Gaussian prior. By accounting for IV-specific CHP effects (i.e., IV-specific perturbations to the confounder set), our model is robust to the presence of multiple confounders without explicitly modeling the effect of each confounder. MRCUE is built on a Bayesian hierarchical model that estimates the parameters from the above model and obtains inference via Gibbs sampling. In Fig. 1e, we illustrate our model using a real data example to assess the causal effect of BMI on TG. When plotting IV-to-BMI effects against IV-to-TG effects in Fig. 1e, there is a positive causal relationship for some IVs (blue), while a few other IVs show a different pattern with an opposite slope (red). The proposed MRCUE model identifies the IVs affected by CHP (red dots), and estimates the causal effect of BMI on TG using IVs not affected by CHP (blue dots). The unconfounded causal effect is estimated to be significant and positive, \({\hat{\beta }}_{1({\rm{BMI}}\to {\rm{TG}})}=0.262\). For IVs affected by CHP, the estimated effect is significant and negative, \({\hat{\beta }}_{2({\rm{BMI}}\to {\rm{TG}})}=-0.655\), due to the large and negative confounding bias δ. As further illustrated in Fig. 1f, MRCUE reduces false positive findings due to reverse causation by identifying the IVs affected by CHP and quantifying the uncertainty in the estimation/identification. Without properly handling CHP, one may obtain a crude sum of effect estimates combining the unconfounded and the confounded effects. In the BMI-TG example, we observe that the combined effects (red), \({\hat{\beta }}_{2}\)’s, for both BMI-to-TG and TG-to-BMI are significant and negative, due to the shared confounding.
By contrast, the unconfounded effect is significant only from BMI to TG, and not in the reverse direction. In the presence of unadjusted CHP, one may suffer from reduced power or an inflated type I error rate, depending on the direction of the confounding effect.
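The two-slope pattern described for the BMI-TG example can be reproduced in a small simulation. The following is a minimal sketch and not the MRCUE implementation: it works with true effects rather than noisy GWAS estimates, and the values of `beta1`, `delta`, the effect variances, and the 20% share of IVs with CHP are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5000                     # number of IVs (illustrative)
beta1, delta = 0.3, -0.9     # assumed causal effect and shared CHP effect

gamma = rng.normal(0.0, 0.05, p)         # IV-to-exposure effects
theta = rng.normal(0.0, 0.005, p)        # UHP effects, present for all IVs
in_set2 = rng.random(p) < 0.2            # IVs affected by CHP
alpha_tilde = rng.normal(0.0, 0.005, p)  # IV-specific CHP effects
alpha = np.where(in_set2, delta * gamma + alpha_tilde, 0.0)

Gamma = beta1 * gamma + alpha + theta    # IV-to-outcome effects


def slope(x, y):
    """Least-squares slope through the origin."""
    return float(x @ y / (x @ x))


slope_set1 = slope(gamma[~in_set2], Gamma[~in_set2])  # tracks beta1
slope_set2 = slope(gamma[in_set2], Gamma[in_set2])    # tracks beta1 + delta
slope_all = slope(gamma, Gamma)                       # mixes the two groups
```

Under these assumptions, IVs without CHP recover a slope near `beta1`, IVs with CHP recover a slope near `beta1 + delta`, and pooling all IVs yields a value between the two, which is the crude combined effect the text warns about.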
In practice, there is often no clear-cut distinction between IVs unaffected and affected by CHP, owing to trait polygenicity and LD. The uncertainty in whether each variant belongs to IV Set 1 (unaffected by CHP) or Set 2 (affected by CHP) can be accounted for by modeling a latent variable, η_{k}. MRCUE imposes a spike-slab prior^{16,17} on \({\widetilde{\alpha }}_{k}\), with a spike (point mass) at zero and a slab spreading over a wide range of plausible values. MRCUE quantifies the probability of each variant being affected by CHP. Unlike existing clustering-based methods or methods that select IVs estimated to be valid, MRCUE provides the estimated probabilities of IVs belonging to Set 1 or Set 2. MRCUE obtains the causal effect estimate as a weighted estimator from all IVs, weighted by the posterior probabilities of IVs being from Set 1. With the estimated probabilities of IVs being from Set 2, MRCUE also serves as a useful tool for further examining the potential shared genetic components underlying exposure and outcome. The IVs estimated to have CHP and their cis-associated genes may imply common genes and genetic pathways associated with both exposure and outcome. To further allow IVs in LD, MRCUE partitions the whole genome into independent blocks and introduces a group latent variable, η_{l}, for IVs in the same block (see Methods).
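The idea of weighting IVs by their posterior probability of being unaffected by CHP can be illustrated with a simple weighted regression through the origin. This is a simplified sketch under stated assumptions: MRCUE obtains its estimate within a Gibbs sampler, whereas the hypothetical function `weighted_slope` below applies fixed, made-up posterior probabilities (`pr_chp`) after the fact, and all summary statistics are invented for illustration.

```python
import numpy as np


def weighted_slope(gamma_hat, Gamma_hat, se_Gamma, pr_chp):
    """Slope through the origin, down-weighting IVs likely affected by CHP."""
    w = (1.0 - pr_chp) / se_Gamma**2
    return float(np.sum(w * gamma_hat * Gamma_hat) / np.sum(w * gamma_hat**2))


# Invented summary statistics: the first three IVs follow a slope of 0.5,
# the last two follow a slope of -0.5 (as if affected by CHP).
gamma_hat = np.array([0.10, 0.08, -0.05, 0.12, 0.09])
Gamma_hat = np.array([0.05, 0.04, -0.025, -0.06, -0.045])
se_Gamma = np.full(5, 0.01)
pr_chp = np.array([0.02, 0.05, 0.01, 0.97, 0.95])  # hypothetical Pr(CHP | data)

est_weighted = weighted_slope(gamma_hat, Gamma_hat, se_Gamma, pr_chp)
est_naive = weighted_slope(gamma_hat, Gamma_hat, se_Gamma, np.zeros(5))
```

Because no IV is discarded outright, IVs with intermediate probabilities still contribute in proportion to how likely they are to belong to Set 1, which is the contrast with hard selection of valid IVs drawn in the text.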
MRCUE identifies IVs with CHP effects, estimates the causal effects and reduces false positives
We conducted simulation studies to evaluate the performance of MRCUE and compare it with existing MR methods in a variety of scenarios. We first generated genotype matrices from different LD patterns (Methods section). Both exposure and outcome were simulated based on a polygenic architecture as shown in Eq. (13). In simulations, we considered both single and multiple confounders (Methods section and Supplementary Materials). All IVs (p = 1000 or 2000) contributed a total heritability of 0.1 to exposure, while the heritability for outcome can be decomposed into variation through the causal effect (β_{1}), variation contributed by UHP (θ), and variation attributable to CHP (α). We controlled the combinatorial values for heritability due to UHP and CHP, denoted as \({h}_{\theta }^{2}\) and \({h}_{\alpha }^{2}\), respectively. As discussed earlier, we assumed that CHP is due to shared genetic components between exposure and outcome traits and only a proportion of IVs have nonzero CHP effects. We performed single-variant association tests to obtain the summary statistics for both IV-to-exposure and IV-to-outcome associations as input for MR analyses.
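The step from individual-level data to summary statistics can be sketched as follows. This is a deliberately reduced illustration, not the simulation design of the paper: SNPs are drawn independently (no LD), the outcome has a causal effect only (no UHP or CHP terms), and the sample size, number of SNPs, allele frequencies, and `beta1` are assumed values. The function names `simulate_sample` and `marginal_assoc` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 2000, 200            # sample size per GWAS and number of SNPs (illustrative)
h2_gamma, beta1 = 0.1, 0.2  # exposure heritability as in the text; beta1 is assumed

maf = rng.uniform(0.05, 0.5, p)  # hypothetical minor allele frequencies
gamma = rng.normal(0.0, 1.0, p)  # effects on the standardized genotype scale
scale = np.sqrt(h2_gamma / np.sum(gamma**2))


def simulate_sample():
    """Draw genotypes (independent SNPs), exposure, and outcome for one sample."""
    G = rng.binomial(2, maf, size=(n, p)).astype(float)
    G_std = (G - 2 * maf) / np.sqrt(2 * maf * (1 - maf))
    x = scale * (G_std @ gamma) + rng.normal(0.0, np.sqrt(1 - h2_gamma), n)
    y = beta1 * x + rng.normal(0.0, 1.0, n)
    return G, x, y


def marginal_assoc(G, trait):
    """Per-SNP simple linear regression: slope and standard error."""
    Gc = G - G.mean(axis=0)
    tc = trait - trait.mean()
    sxx = np.sum(Gc**2, axis=0)
    beta = Gc.T @ tc / sxx
    resid_ss = np.sum(tc**2) - beta**2 * sxx
    se = np.sqrt(resid_ss / (len(trait) - 2) / sxx)
    return beta, se


# Two-sample design: exposure and outcome GWAS use independent samples.
G1, x1, _ = simulate_sample()
G2, _, y2 = simulate_sample()
gamma_hat, se_gamma = marginal_assoc(G1, x1)
Gamma_hat, se_Gamma = marginal_assoc(G2, y2)
```

The resulting `gamma_hat`, `se_gamma`, `Gamma_hat`, and `se_Gamma` have the shape of the inputs that summary-statistics MR methods consume.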
We compared MRCUE with nine other methods: CAUSE^{13}, GRAPPLE^{15}, cMLMA^{14}, RAPS^{6}, IVW^{18}, MREgger^{5}, MRMix^{12}, MRClust^{10}, and MRLDP^{8}. In the existing literature, other methods including BESIDEMR^{19}, JAMMR^{20}, Berzuini’s method^{21}, and MRCorr^{2}^{22} have also been proposed to account for either UHP or CHP. For cMLMA, we evaluated its performance using its default setting, cMLMABICDP. Among those methods, MRLDP, RAPS, IVW, MREgger, and MRClust assumed that no IV/variant is affected by CHP, but allowed IVs to have UHP effects. The proposed MRCUE and four other methods, i.e., CAUSE, GRAPPLE, cMLMA, and MRMix, allowed IVs to have both UHP and CHP. Among all competing methods, MRCUE and MRLDP can handle variants in moderate-to-strong LD, and CAUSE allowed for variants in weak LD.
First, we evaluated type I error rate control (Fig. 2a, b) for all competing methods in the scenarios of both single and multiple confounders. In both scenarios, MRCUE tightly controlled type I error rates in all settings, while CAUSE, GRAPPLE and cMLMABICDP had reasonable control of the type I error rates. CAUSE and GRAPPLE were conservative in many settings, while cMLMABICDP controlled the type I error rate at the expense of reduced power (Fig. 2c, d). Since MRClust and MRLDP did not account for CHP effects, their type I error rates were inflated. We also observed inflated type I error rates for MRMix. Simulations for all methods except MRCUE and MRLDP were based on independent IVs after SNP clumping, since those methods were initially proposed using IVs in weak-to-moderate LD. With independent IVs, RAPS, IVW and MREgger could generally control the type I error rates, up to some slight inflation. We also performed simulations with a larger number of IVs (p = 2000) and a stronger correlation between IV-to-exposure and CHP effects, ρ_{αγ}. The results were largely similar, and additional details are provided in Supplementary Fig. 1. When the correlation in CHP (ρ_{αγ}) was stronger, RAPS, IVW and MREgger suffered from increased inflation in the type I error rates. Supplementary Figure 2 compares the estimation biases of MRCUE with other methods and shows the boxplots of point estimates for competing methods.
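Why a method that ignores CHP can show an inflated type I error rate may be seen in a toy experiment with the standard fixed-effect IVW estimator. The sketch below is an illustration under simple assumptions, not a reproduction of the simulations above: summary statistics are generated directly (no genotypes, no LD, no UHP), all standard errors are equal, and the CHP fraction and `delta` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
reps, p = 2000, 100  # replicates and IVs per replicate (illustrative)
se = 0.01            # common standard error of all summary statistics (assumed)


def ivw(gamma_hat, Gamma_hat, se_Gamma):
    """Fixed-effect inverse-variance weighted estimate and its standard error."""
    w = 1.0 / se_Gamma**2
    denom = np.sum(w * gamma_hat**2, axis=-1)
    est = np.sum(w * gamma_hat * Gamma_hat, axis=-1) / denom
    return est, 1.0 / np.sqrt(denom)


def rejection_rate(delta, chp_fraction):
    """Share of replicates with |z| > 1.96 when the true causal effect is zero."""
    gamma = rng.normal(0.0, 0.05, (reps, p))
    has_chp = rng.random((reps, p)) < chp_fraction
    Gamma = np.where(has_chp, delta * gamma, 0.0)  # outcome effects arise from CHP only
    gamma_hat = gamma + rng.normal(0.0, se, (reps, p))
    Gamma_hat = Gamma + rng.normal(0.0, se, (reps, p))
    est, est_se = ivw(gamma_hat, Gamma_hat, se)
    return float(np.mean(np.abs(est / est_se) > 1.96))


rate_null = rejection_rate(delta=0.0, chp_fraction=0.0)  # no pleiotropy
rate_chp = rejection_rate(delta=0.5, chp_fraction=0.3)   # unbalanced CHP
```

Without pleiotropy the rejection rate stays near the nominal 5%, whereas with unbalanced CHP the shared effect `delta` is absorbed into the slope and the null is rejected far more often.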
We compared the power of each method by varying \({h}_{\gamma }^{2}\) while fixing \({h}_{\theta }^{2}=0.1\), \({h}_{\alpha }^{2}=0.05\), r = 0.4, and ρ_{αγ} = 0.2, with single or multiple confounders (Fig. 2c, d). MRCUE achieved the highest power among the methods that could control the type I error rates. CAUSE, as a conservative method, was underpowered^{13} and cMLMABICDP was less powerful than MRCUE. We also considered other simulation settings with different \({h}_{\theta }^{2}\), \({h}_{\alpha }^{2}\), autoregressive coefficient r for LD, and correlation ρ_{αγ} in CHP. Results were similar, and additional details were provided in Supplementary Figs. 3–6.
Next, we evaluated the performance of MRCUE in selection/identification of IVs with CHP effects. MRCUE provided a quantitative metric for this purpose. We considered two prior distributions, i.e., the default prior (a Beta distribution with shape parameters being 2 and L, the number of LD blocks) and the noninformative prior, Beta(1,1). Here, we considered \({h}_{\theta }^{2}\) = 0.02 or 0.05, \({h}_{\alpha }^{2}\) = 0.05 or 0.1, the correlation between α_{k} and γ_{k} being ρ_{αγ} = 0.2 or 0.8, and causal effect β_{1} = 0 or 0.1 with p = 1000 or 2000. Note that when ρ_{αγ} = 0, only UHP is present. We also considered moderate and strong LD structure (r = 0.4, 0.8) with autoregressive correlation. Supplementary Figure 7 shows the false discovery rate (FDR) for identifying IVs with CHP effects and Supplementary Fig. 8 shows the corresponding area under the curve (AUC) of the receiver operating characteristic (ROC) curve. MRCUE with the default prior can control the FDR at the nominal level of 0.1 while achieving a high level of AUC.
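The two metrics used here, FDR and AUC, can be computed from posterior probabilities when the truth is known, as in a simulation. The following is a small sketch with invented numbers; the 0.8 calling threshold is borrowed from the real-data analyses later in the text, and this empirical FDR is not necessarily the quantity MRCUE uses to control FDR at the nominal level.

```python
import numpy as np


def empirical_fdr(truth, prob, threshold):
    """Fraction of called IVs (prob > threshold) that truly have no CHP."""
    called = prob > threshold
    if not called.any():
        return 0.0
    return float(np.mean(truth[called] == 0))


def auc(truth, prob):
    """Area under the ROC curve via pairwise comparisons (ties count one half)."""
    pos, neg = prob[truth == 1], prob[truth == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


# Invented example: 1 marks an IV simulated with a CHP effect.
truth = np.array([1, 1, 1, 0, 0, 0, 0, 0])
prob = np.array([0.95, 0.90, 0.40, 0.85, 0.30, 0.20, 0.10, 0.05])

fdr_at_08 = empirical_fdr(truth, prob, threshold=0.8)
auc_value = auc(truth, prob)
```

In this toy input, three IVs exceed the threshold and one of them is a false discovery, while 14 of the 15 positive-negative pairs are ranked correctly.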
We evaluated the performance of MRCUE and other methods using real data with negative and positive controls^{23}, with varying IV selection thresholds. In the analyses of negative control outcomes, we used self-reported tanning ability and hair color as outcomes, since both traits were largely determined at birth and were unlikely to be affected by the other traits we considered^{24}. We considered 16 complex traits and diseases (Supplementary Data 1a) as exposures to evaluate the control of type I error rates for MRCUE and other MR methods. For each method, we applied five different IV selection thresholds to evaluate the sensitivity of different methods to IV selection criteria. Figure 2e shows the quantile-quantile (QQ) plot of the negative log base 10 of p-values for MRCUE and other methods when the IV selection threshold was 5 × 10^{−4}. MRCUE and some existing MR methods, including GRAPPLE, cMLMABICDP, RAPS, IVW and MREgger, controlled type I error rates well, with p-values falling within the 95% confidence band of the null distribution. Note that in the analyses of negative control outcomes, some MR methods that do not consider CHP performed well. This was probably because the outcomes considered were not polygenic and there were no CHP effects. On the other hand, MRLDP had slightly inflated p-values, while CAUSE, MRMix, and MRClust had deflated p-values. In the analyses of positive controls, we selected 100 established pairs of traits and diseases with causal relationships supported by existing literature. The pairs of exposure and outcome are listed in Supplementary Data 1b. We also applied different IV selection thresholds to evaluate the sensitivity of results to IV selection. Figure 2f shows the QQ plots of the negative log base 10 of p-values using 5 × 10^{−4} as the IV selection threshold. The QQ plots using other thresholds and only independent IVs are provided in Supplementary Figs. 13–15. In all scenarios, MRCUE had the highest power.
MRLDP also had high power but suffered from inflated type I error rates, as shown in both simulations and negative control analyses. Figure 2g shows the QQ plots of positive controls for MRCUE using correlated and independent IVs, respectively. We observed a substantial power gain for the proposed MRCUE with correlated IVs and with relaxed IV selection thresholds. Last, we evaluated whether MRCUE could distinguish a causal relationship from reverse causality. Reverse causality occurs when there exist IVs affecting the exposure and outcome traits through some shared confounding factors. Since MRCUE is capable of identifying IVs with CHP effects, it is expected to identify the direction of the true causal effect and reduce false positive findings due to reverse causality. To examine this, we simulated data with a causal effect of a trait A on a trait B (β_{A→B} ≠ 0), and tested for a reverse causal effect of B on A (B → A) using MRCUE and other methods. The simulation details are provided in the Methods section. In all scenarios, we fixed the heritability for exposure and outcome at 0.3 and 0.25, respectively. For each simulation replicate, we applied the above MR methods to assess the causal effects in both directions. We evaluated and compared the power for detecting the true causal effect of exposure A on outcome B, and compared the type I error rates for the reverse causal effect of outcome B on exposure A. Figure 2h shows the ROC curves using 100 simulated replicates at varying significance thresholds. MRCUE, CAUSE, GRAPPLE, and MRClust could distinguish causal effects from reverse causation in all simulations, while the other methods could not.
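The QQ plots used for the negative and positive controls compare sorted observed p-values with their expectation under the null. The sketch below shows one common way to compute the plotted points and a pointwise 95% band by Monte Carlo; it is not taken from the paper's code, the functions `qq_points` and `null_band` are hypothetical, and the 32 uniform p-values stand in for real results.

```python
import numpy as np

rng = np.random.default_rng(3)


def qq_points(pvals):
    """Expected and observed -log10 p-values, both ordered from largest to smallest."""
    n = len(pvals)
    expected = -np.log10(np.arange(1, n + 1) / (n + 1))
    observed = -np.log10(np.sort(pvals))
    return expected, observed


def null_band(n, n_sim=5000, level=0.95):
    """Pointwise Monte Carlo band for sorted null p-values on the -log10 scale."""
    sims = -np.log10(np.sort(1.0 - rng.random((n_sim, n)), axis=1))
    tail = 100 * (1 - level) / 2
    lower, upper = np.percentile(sims, [tail, 100 - tail], axis=0)
    return lower, upper


pvals = 1.0 - rng.random(32)  # placeholder p-values drawn under the null
expected, observed = qq_points(pvals)
lower, upper = null_band(len(pvals))
inside = (observed >= lower) & (observed <= upper)
```

Points above the band indicate inflated test statistics and points below it indicate deflation, which is how the inflated and deflated p-values reported above would appear in such a plot.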
Results from other considerations, including nonlinear confounding effects, binary outcome, the impact of different proportions in IVs with CHP effects, and a sparse vector for UHP in reverse causation, showed similar conclusions and can be found in Supplementary Figs. 9–12.
Examining the effects of interleukin 6 on multiple traits/diseases implies shared genes and pathways as sources of CHP
Interleukin 6 (IL6) is a key inflammatory cytokine with both pro- and anti-inflammatory properties. It plays an important role in immune-related processes and pathways^{25}. Here we applied MRCUE and other MR methods to evaluate the causal effects of IL6 on 27 complex traits and diseases (Supplementary Data 1c). The soluble IL6 receptor (sIL6R), a negative regulator of IL6 signaling, has been suggested to affect many complex traits and diseases, including lipid levels (e.g., high-density lipoprotein cholesterol, HDLc), both severity and susceptibility of COVID-19, heart diseases (e.g., atrial fibrillation, AF), autoimmune diseases (e.g., Crohn’s disease, CD), and others^{25,26}. We analyzed those complex traits/diseases and other diseases that may not be affected by IL6. Supplementary Table 3 and Supplementary Data 2a summarize the p-values and the estimated causal effects for MRCUE and other methods.
IL6 is a multifunctional cytokine and is highly polygenic, with a heritability estimate of up to 61%^{27}. In addition to estimating the causal effects of IL6, we further obtained the posterior probabilities of IVs having CHP effects on each of the 27 outcomes, Pr(η_{l} = 1∣data), for the LD blocks on each chromosome. In Fig. 3a right panel, we plotted the strengths of CHP effects for IVs across all chromosomes for the 27 outcomes, with estimated causal effects shown in the rightmost column. In Fig. 3a left panel, we also plotted the genetic correlations among the 27 outcome traits estimated by LDSC^{28}. From the heatmap, we observed that traits with high genetic correlations tend to have similar or dependent estimated causal effects of IL6, e.g., COVID-19 severity and susceptibility; any stroke (AS), any ischemic stroke (AIS), and cardioembolic stroke (CES). Those outcomes also presented similar patterns of CHP effects. Note that the strong correlation between COVID-19 susceptibility and severity may be artificial due to selection bias, since people with more severe COVID-19 infection are also more likely to be diagnosed with COVID-19. On the other hand, traits with mild-to-moderate genetic correlations, e.g., bone mass density (BMD), blood urea nitrogen (BUN), major depressive disorder (MDD), bipolar disorder (BIP), and schizophrenia (SCZ), may not share causal effect estimates but could still share CHP effect patterns. CHP effects could be present even when there are no causal effects.
We further identified the IVs with significant CHP effects, Pr(η_{l} = 1∣data) > 0.8, and examined the genes that are in cis (within 1 Mb) and associated with those IVs (p-value < 0.05). The identified genes and gene sets may shed light on the shared pathways between IL6 and the examined complex outcomes. In Fig. 3b, we plotted the heatmap of selected cis-genes associated with at least one IV affected by CHP across multiple outcomes, with color indicating the strength of the most significant association between the gene and its cis-IVs with CHP. Many genes were involved in the same pathways and were identified as IV-associated shared factors across multiple outcomes. Those shared genes may partially explain the observed genetic correlations among the 27 traits/diseases in Fig. 3a (left panel). Specifically, MRCUE identified 13 S100 genes encoding S100 proteins located in the chromosome 1q21 region. The S100 proteins belong to a family of calcium-binding cytosolic proteins and have a broad range of intracellular and extracellular functions. The extracellular S100 proteins play a crucial role in the regulation of immune homeostasis, post-traumatic injury, and inflammation^{29}. S100 proteins trigger inflammation through their interactions with the receptors RAGE and TLR4^{30}. S100A12 has been shown to induce the production of the pro-inflammatory cytokines IL6 and IL8 in both a dose-dependent and time-dependent manner^{29}. Additionally, S100 proteins play a significant role in the development of chronic inflammatory and autoinflammatory diseases^{31,32}. MRCUE also identified some genes in the cornified envelope pathway, including the SPRR family and IVL. These genes, together with the S100 genes, constitute the epidermal differentiation complex, which is essential for epidermal differentiation, building the first-line defense against external assaults and protecting our bodies from dehydration^{33}. Genes in the ATPase complex were also identified as playing a shared role.
Existing literature^{34} reported that overexpression of the KAT5 gene potentiated transcription of downstream antiviral genes including IL6. Other work^{35} reported that the histone methyltransferase ASH1L suppresses TLR-induced IL6 production.
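The mapping from IVs with CHP to candidate shared genes combines three filters: a posterior probability above 0.8, a cis window of 1 Mb, and an IV-gene association p-value below 0.05. The sketch below shows that logic on placeholder data. All gene names, coordinates, probabilities, and p-values are hypothetical, and measuring the window from the gene boundaries is an assumption, since the text does not specify how the distance is defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int
    end: int


def cis_genes(chrom, pos, genes, window=1_000_000):
    """Names of genes whose body lies within `window` bp of the variant."""
    return [g.name for g in genes
            if g.chrom == chrom and g.start - window <= pos <= g.end + window]


# Hypothetical inputs: placeholder names and coordinates, not real annotations.
genes = [Gene("GENE_A", "1", 1_500_000, 1_520_000),
         Gene("GENE_B", "1", 4_000_000, 4_050_000),
         Gene("GENE_C", "2", 1_400_000, 1_450_000)]
ivs = {"iv1": ("1", 1_000_000, 0.93),  # (chromosome, position, Pr(CHP | data))
       "iv2": ("1", 3_900_000, 0.40),
       "iv3": ("2", 1_000_000, 0.85)}
assoc_p = {("iv1", "GENE_A"): 0.001, ("iv3", "GENE_C"): 0.20}  # IV-gene p-values

shared = {}
for iv, (chrom, pos, pr_chp) in ivs.items():
    if pr_chp <= 0.8:  # keep only IVs confidently assigned to the CHP set
        continue
    hits = [g for g in cis_genes(chrom, pos, genes)
            if assoc_p.get((iv, g), 1.0) < 0.05]
    if hits:
        shared[iv] = hits
```

Here only `iv1` passes all three filters: `iv2` falls below the probability threshold, and `iv3` has a cis gene whose association is not significant.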
The above analysis also showed that different IVs with CHP effects may be involved in multiple pathways entailed by multiple sources of IV-associated confounders. The confounding effect on outcome could be IV-specific. MRCUE allows the estimation of an overall CHP effect while accounting for IV-specific variation/perturbation to confounders, and improves the estimation of CHP. By closely examining the IVs with CHP effects and their cis-associated genes, we identified genes and gene sets that were highly interconnected as suggested sources of IV-associated confounders, which further informed potential shared genetic etiology among the traits examined.
MRCUE informs type 2 diabetesrelated pathways for multiple risk factors across two populations
Type 2 diabetes (T2D) is a form of diabetes characterized by high blood sugar, insulin resistance, and relative lack of insulin^{36}. T2D is highly polygenic and has a complex etiology^{37,38}. Examining multiple potential risk exposures for T2D may reveal common patterns in the etiology for related factors while also presenting unique characteristics for different types of factors. We applied MRCUE to each exposure-T2D trait pair and separately estimated the causal effect of each exposure on T2D risk in the European and East Asian populations. Established risk factors for T2D include both lifestyle factors, such as overweight and obesity, and medical conditions^{39}. We also considered other exposure traits, including lipid levels, e.g., TG and high-density lipoprotein cholesterol (HDLc); blood cell parameters, e.g., counts for red blood cells (RBC) and white blood cells (WBC); insulin-resistance-related factors, e.g., fasting insulin (FI), fasting glucose (FG) and HbA1c; and others. We examined 29 and 14 exposures for T2D in European and East Asian populations, respectively. The full list of exposure traits/diseases is provided in Supplementary Data 1d, e. Supplementary Tables 4 and 5 and Supplementary Data 2b, c summarize the p-values and the estimated causal effects for MRCUE and other methods. We further pooled the results from MRCUE and the estimated sets of IVs with CHP across analyses of different exposures to examine shared confounding and mechanisms in both populations. Some exposures for T2D were significant in both populations, such as obesity and blood cell parameters. Obesity is a well-known risk factor for T2D, and the associations between blood cell parameters and T2D have also been reported in many studies^{40,41,42}. HbA1c was also identified by MRCUE in both populations, and its association with hypoglycemia was reported in a previous study^{43}.
Some established T2D risk factors, including insulin resistance, insulin-resistance-related factors, and other obesity factors, have genetic-association summary statistics in only the European population, and thus the cross-population comparison is not presented. MRCUE reported significant causal effects for those factors in the European population. Cross-population analyses using summary statistics from different populations and ethnic groups still present many challenges due to substantially varying LD patterns, difficulties in data harmonization, study heterogeneity and others. Moreover, only a proportion of the causal variants and genes for complex traits/diseases might be shared across populations, and the risk exposures for a complex disease could also differ by population. MRCUE is robust in cross-population analyses because it offers two layers of inference: it estimates the causal effect using IVs not affected by IV-associated confounders, and it maps the underlying genes and pathways for IVs affected by confounding.
To further investigate the shared genetic pathways for the 29 and 14 traits in the European and East Asian populations, we obtained the IVs with significant CHP effects, Pr(η_{l} = 1∣data) > 0.8. In Fig. 4a, b, we plotted the strengths of CHP effects for IVs across chromosomes in the European and East Asian populations, respectively. In general, exposures with higher polygenicity tend to have more IVs with CHP. We further performed pathway analysis based on those IVs using SNPnexus^{44} and obtained their enriched pathways, shown in Fig. 4c, d for the European and East Asian populations, respectively. The significant causal risk factors identified by MRCUE are similar in both populations, and the enriched pathways presented some cross-population similarity as well. MRCUE identified both metabolism and immune response pathways for multiple exposures and T2D in both populations. T2D itself is an inflammatory disease triggered by disordered metabolism^{45}. MRCUE identified many metabolism-related factors, including glycine, fasting glucose, and fasting insulin, as having shared genetic components with T2D in the metabolism pathway. Dysregulation of lipid metabolism triggers NLRP3 activation, leading to obesity-induced inflammation and insulin resistance^{46,47}. Moreover, HbA1c, hemoglobin that is chemically linked to a sugar, has been used as a screening tool to detect early T2D^{48}. Fasting glucose and HbA1c shared many common pathways in the European population (Fig. 4c), while pathways for HbA1c were similar in both populations. A recent work^{49} reported that genetic variants in glutamate cysteine ligase conferred protection against T2D, while glycine was considered a promising amino acid for improving metabolic health^{50}. Glutamate and glycine are both metabolites, and they play critical roles in the metabolism pathway.
Glycine was reported to improve immunity and treat metabolic disorders in diabetes^{51}, while glutamate was found to be a key immunomodulator in the initiation and development of T-cell-mediated immunity^{52}. We also observed that many exposures share the signal transduction pathway with T2D in both populations. The signal transduction pathway plays an important role in both red blood cells^{53} and T2D^{54,55}. Biologically, signal transduction includes the insulin receptor signaling pathway, which may mediate the development of T2D through endoplasmic reticulum stress^{56}. MRCUE assessed the causal effect of each risk exposure on T2D risk, while other T2D-related exposures are potential confounders and may contribute to the CHP effect. An alternative and complementary analysis would be to use a multivariable MR method to jointly examine the effects of multiple exposures. However, most existing multivariable MR methods assume no CHP, i.e., that all IV-associated confounders are accounted for, so we did not pursue this direction.
Discussion
In this work, we propose MRCUE for causal inference that accounts for both UHP and CHP in complex and realistic settings. When multiple confounding genes affect both exposure and outcome, different IVs may be associated with more than one confounder at varying strengths, resulting in both IV-shared and IV-specific CHP effects. In contrast to existing methods that focus on IV-shared CHP effects, MRCUE also models IV-specific CHP effects when estimating the causal effect of exposure on outcome. Moreover, MRCUE allows moderately correlated IVs, which boosts power in MR analyses. When correlated IVs are included, IV-specific CHP effects may also arise. Existing methods insufficiently address this issue, while MRCUE can obtain unbiased and efficient estimates in the presence of multiple confounders and/or correlated IVs. MRCUE simultaneously quantifies the probability that each IV has CHP, and further examines the cis-associated genes of those IVs for potential shared genes/pathways/mechanisms underlying exposure and outcome. With simulation studies and analyses of negative control outcomes and positive controls, we demonstrated that MRCUE reduces false positives due to reverse causation; controls the type I error rate in the presence of multiple confounders and correlated IVs; improves the power of MR analyses by including correlated IVs; is insensitive to the IV selection threshold; and identifies IVs with CHP at the desired confidence levels. To minimize potential bias due to the winner’s curse, we recommend first selecting the IVs using a third independent sample^{57}, if possible.
We studied the causal effects of IL-6 on multiple outcomes. By further examining the IVs with significant CHP effects and their cis-associated genes, we highlighted multiple genes that may be shared between IL-6 and some of the examined traits/diseases and may thus act as confounders. Those suggested genes included multiple S100 genes and genes in the cornified envelope pathway, shedding light on the shared genetic etiology. In another analysis, we applied MRCUE to study the effects of multiple putative exposures on T2D risk in both European and East Asian populations. A cross-population analysis and comparison of multiple risk exposures showed consistent causal effect estimates in both populations. We further examined the IVs with CHP effects and their enriched pathways. In both populations, the results suggested that metabolism and immune response pathways play a central role in the shared etiologies among multiple putative exposures and T2D.
MRCUE paves the way for future cross-population MR analyses aimed at reducing disparities. Cross-population MR analyses using summary statistics from different populations remain challenging due to varying LD patterns, difficulties in data harmonization, study heterogeneity, and other factors. MRCUE is robust in cross-population analyses because it provides two layers of inference for cross-population comparisons: it estimates the causal effect of exposure using IVs not affected by IV-associated confounders, and it also maps the underlying genes and pathways for IVs affected by confounding.
MRCUE has some caveats that may require further exploration. First, MRCUE assumes that all IVs could have potential UHP effects while only a sparse proportion of IVs have CHP effects. When the proportion of IVs with CHP is not sparse, the identification condition may not hold and the estimation may be biased. Second, MRCUE works for a single exposure and a single outcome. When the exposure is known to be highly correlated with other exposures, or when multiple outcomes often co-occur, multivariable MR methods accounting for both CHP and UHP may be considered. Third, MRCUE requires multiple (at least dozens of) IVs to identify and delineate CHP effects and is not suitable for analyzing molecular risk exposures such as gene expression levels. Last, MRCUE identifies the IVs with significant CHP effects, but the mapping of cis-associated genes/pathways from those identified IVs is not yet an automated process. We are working on improving the automation of this step.
When using MR to infer causation, caution should always be exercised. MR analyses gain power by leveraging GWAS summary statistics from large genetic consortia or biobank-sized studies. On the other hand, insights are still limited regarding potential subgroup effects, indirect effects through different mediators between exposure and outcome, and potential exposure-mediator interactions. Further integration of MR with mediation analyses could be valuable for the development of prevention and treatment strategies towards precision medicine.
Methods
MRCUE model for independent IVs
To estimate the causal effect in the MRCUE model, we use the marginal effect size and standard error estimates from GWASs for exposure (X) and outcome (Y) diseases/traits as input. Let \(\{\widehat{\gamma}_{k},\widehat{\mathbf{s}}_{\gamma_{k}}\}\) denote the IV-to-exposure effect size and its standard error for IV k. Let \(\{\widehat{\Gamma}_{k},\widehat{\mathbf{s}}_{\Gamma_{k}}\}\) denote the IV-to-outcome effect size and its standard error. Let γ_{k} and Γ_{k} be the true marginal effect sizes of IV k for traits X and Y, respectively. For independent IVs, we model the distribution of the estimated effect sizes for both exposure and outcome diseases/traits using the following independently and identically distributed (i.i.d.) normal distributions,
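Under the notation just defined, the sampling model for the summary statistics takes the standard two-sample MR form shown here. This display is a reconstruction from the surrounding definitions, and its label as Eq. (1) is an assumption; it is not a verbatim copy of the original equation.

```latex
\begin{equation}
\widehat{\gamma}_{k} \mid \gamma_{k} \sim \mathcal{N}\!\left(\gamma_{k},\ \widehat{\mathbf{s}}_{\gamma_{k}}^{2}\right),
\qquad
\widehat{\Gamma}_{k} \mid \Gamma_{k} \sim \mathcal{N}\!\left(\Gamma_{k},\ \widehat{\mathbf{s}}_{\Gamma_{k}}^{2}\right),
\qquad k = 1, \ldots, p. \tag{1}
\end{equation}
```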
The proposed MRCUE models the IV-to-outcome effect as a function of the IV-to-exposure effect and the UHP and CHP effects using Eq. (2), with UHP effects i.i.d. as \(\theta_{k} \sim \mathcal{N}(0,\sigma_{\theta}^{2})\). The IV-to-exposure effect (γ_{k}) and the CHP effect (α_{k}) are correlated and i.i.d. with a bivariate normal distribution:
where ρ_{αγ} is the correlation between γ_{k} and α_{k}.
The decomposition of CHP effects
From Eq. (4), we reparameterize γ_{k} and α_{k} as follows
where Z_{k} follows a standard normal distribution, \(Z_{k} \sim \mathcal{N}(0,1)\), with Z_{k}⫫γ_{k}, and \(\delta=\rho_{\alpha\gamma}\frac{\sigma_{\alpha 0}}{\sigma_{\gamma}}\). Equation (5) decomposes the CHP effect α_{k} into two parts, one proportional to γ_{k} and the other independent of γ_{k}, i.e., \(\widetilde{\alpha}_{k}\)⫫γ_{k}. The decomposition in Eq. (5) can also be viewed as a linear regression of α_{k} on γ_{k}, with \(\widetilde{\alpha}_{k}\) being the residual. Let \(\widetilde{\alpha}_{k} \sim \mathcal{N}(0,\sigma_{\alpha}^{2})\). We call \(\widetilde{\alpha}_{k}\) the orthogonal projection of CHP. We can further parameterize the IV-to-outcome effect size for IV k as in Eq. (2). Therefore, identifying the IVs with CHP effects in Eq. (2) is equivalent to identifying the IVs with nonzero projected CHP, namely \(\widetilde{\alpha}_{k}\ne 0\). The estimation of the causal effect β_{1} is based on IVs with \(\widetilde{\alpha}_{k}=0\).
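The regression view of the decomposition can be checked numerically. The sketch below is illustrative only: it draws (γ_{k}, α_{k}) from a bivariate normal distribution with hypothetical values for σ_γ, σ_{α0}, and ρ_{αγ} (none taken from the paper), regresses α on γ, and compares the slope with δ = ρ_{αγ}σ_{α0}/σ_γ. The function name `decompose_chp` is made up for this example. For a bivariate normal pair, the residual variance equals (1 − ρ_{αγ}^2)σ_{α0}^2.

```python
import numpy as np


def decompose_chp(gamma, alpha):
    """Split alpha into delta * gamma plus a residual uncorrelated with gamma."""
    # Least-squares slope of alpha on gamma (with intercept).
    delta_hat = np.cov(alpha, gamma)[0, 1] / np.var(gamma, ddof=1)
    alpha_tilde = alpha - delta_hat * gamma
    return delta_hat, alpha_tilde


# Hypothetical parameter values, chosen only for illustration.
sigma_gamma, sigma_alpha0, rho = 0.5, 0.3, 0.4
cov = np.array([
    [sigma_gamma**2, rho * sigma_gamma * sigma_alpha0],
    [rho * sigma_gamma * sigma_alpha0, sigma_alpha0**2],
])

rng = np.random.default_rng(0)
draws = rng.multivariate_normal(np.zeros(2), cov, size=200_000)
gamma, alpha = draws[:, 0], draws[:, 1]

delta_hat, alpha_tilde = decompose_chp(gamma, alpha)
delta_true = rho * sigma_alpha0 / sigma_gamma          # slope implied by the model
resid_var_true = (1.0 - rho**2) * sigma_alpha0**2      # variance of the orthogonal part
resid_corr = np.corrcoef(alpha_tilde, gamma)[0, 1]     # zero up to rounding error
```

By construction the residual is uncorrelated with γ in the sample, which is the finite-sample counterpart of \(\widetilde{\alpha}_{k}\)⫫γ_{k} under normality.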
We further introduce a latent indicator η_{k} for each IV k, with η_{k} = 1 for IVs with nonzero CHP effects. We impose the following spike-slab prior^{16,58} on \(\widetilde{\alpha}_{k}\):
where δ_{0} denotes the Dirac delta function at zero, and η_{k} follows a Bernoulli distribution with \(\eta_{k} \sim \omega^{\eta_{k}}(1-\omega)^{1-\eta_{k}}\). Then, Eq. (2) can be written as
where \({\tau }_{1}^{2}={\sigma }_{\theta }^{2}\) for IVs with potential UHP only and \({\tau }_{2}^{2}={\sigma }_{\theta }^{2}+{\sigma }_{\alpha }^{2}\) with both potential UHP and CHP. Following existing literature^{13,14}, our model also assumes that all IVs could have potential UHP while only a sparse proportion of IVs have CHP. As a consequence of the assumption, the variability of Γ_{k} is larger for the β_{2} group of IVs than the β_{1} group because of the existence of \({\widetilde{\alpha }}_{k}\). Thus, in Eq. (6), \({\tau }_{2}^{2} \, > \, {\tau }_{1}^{2}\). Since both \({\tau }_{1}^{2}\) and \({\tau }_{2}^{2}\) are model parameters, we can obtain their estimates using MCMC and use them to identify \({\hat{\beta }}_{1}\) (see Supplementary Materials).
To improve computational efficiency in the low signal-to-noise-ratio regime, we expand the original distribution in Eq. (6) as follows^{59,60}:
where ξ^{2} is an expanded parameter with a non-informative prior. By combining Eqs. (3) and (7), we build the Bayesian hierarchical model with conjugate priors for the hyperparameters, \(\sigma_{\gamma}^{2} \sim \mathcal{IG}(a_{\gamma},b_{\gamma})\), \(\tau_{1}^{2} \sim \mathcal{IG}(a_{\tau 1},b_{\tau 1})\), \(\tau_{2}^{2} \sim \mathcal{IG}(a_{\tau 2},b_{\tau 2})\), and ω ~ Beta(a, b).
Accounting for LD
We expand the MRCUE model to allow for correlated IVs by modeling their LD structure. We model the estimated effect sizes in both exposure and outcome diseases/traits with approximated multivariate normal distributions^{61} as follows,
where \(\widehat{\boldsymbol{\gamma}}=[\widehat{\gamma}_{1},\ldots,\widehat{\gamma}_{p}]^{T}\) and \(\widehat{\boldsymbol{\Gamma}}=[\widehat{\Gamma}_{1},\ldots,\widehat{\Gamma}_{p}]^{T}\) are vectors of the marginal effect sizes for the exposure and outcome diseases/traits, respectively; \(\widehat{\mathbf{S}}_{\gamma}=\mathrm{diag}([\widehat{\mathbf{s}}_{\gamma_{1}},\ldots,\widehat{\mathbf{s}}_{\gamma_{p}}])\) and \(\widehat{\mathbf{S}}_{\Gamma}=\mathrm{diag}([\widehat{\mathbf{s}}_{\Gamma_{1}},\ldots,\widehat{\mathbf{s}}_{\Gamma_{p}}])\) are the corresponding diagonal matrices of standard errors; and \(\widehat{\mathbf{R}}\in\mathbb{R}^{p\times p}\) is the estimated correlation matrix among all selected IVs. In the approximated distributions in Eq. (8), all quantities except \(\widehat{\mathbf{R}}\) can be obtained from summary-level GWAS results, while \(\widehat{\mathbf{R}}\) is estimated using independent reference panel data.
Estimating LD matrix from a reference panel
To estimate the LD matrix, we used independent reference panel data from the following sources: the UK10K Project (Avon Longitudinal Study of Parents and Children, ALSPAC^{62}, and TwinsUK^{63}) merged with European-ancestry samples from the 1000 Genomes Project Phase 3^{64}. There are 4284 individuals in total. We conducted strict quality control on the reference data using PLINK^{65} and GCTA^{66}. We removed the individuals with genotype missing rates greater than 5%, and further removed one pair of individuals that had genetic relatedness larger than 0.05. Since both the ALSPAC and TwinsUK cohorts contain non-European samples, we further performed principal components analysis (PCA)^{67} followed by hierarchical clustering on principal components (HCPC)^{68} to extract and restrict the analysis to samples of European ancestry. After data preprocessing, roughly 3700 samples were retained as the reference panel data.
Often it is useful to define approximately independent LD blocks a priori. Here we used LDetect^{69} based on an efficient signal processing approach for choosing segment boundaries between blocks. Consequently, LDetect partitioned the entire genome into 1703 and 1445 independent blocks for European and Asian populations, respectively (http://bitbucket.org/nygcresearch/ldetectdata). For each LD block, we calculated the empirical correlation matrix and further applied a simple shrinkage correlation estimator^{70} to obtain
where \({\widehat{{{{{{{{\bf{R}}}}}}}}}}_{{{{{{{{\rm{emp}}}}}}}}}^{(l)}\in {{\mathbb{R}}}^{{p}_{l}\times {p}_{l}}\) was the empirical correlation matrix for the lth block in the panel data and λ ≥ 0 was a shrinkage parameter. By obtaining all \({\widehat{{{{{{{{\bf{R}}}}}}}}}}^{(l)}\)s, l = 1, …, L, we could further obtain \(\widehat{{{{{{{{\bf{R}}}}}}}}}=\,{{\mbox{diag}}}\,({\widehat{{{{{{{{\bf{R}}}}}}}}}}^{(l)})\in {{\mathbb{R}}}^{p\times p}\) with \(\mathop{\sum }\nolimits_{l=1}^{L}{p}_{l}=p\). Here we fixed the shrinkage parameter λ at 0.85^{8}.
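The block-wise shrinkage step can be sketched as follows. This is a minimal illustration, not the package's code: it assumes the shrinkage estimator has the convex-combination form λ·R_emp + (1 − λ)·I, which is one common convention (the opposite convention, with λ as the weight on the identity target, also appears in the literature), and the names `shrink_ld_block` and `blockwise_ld` as well as the toy genotype matrix are hypothetical.

```python
import numpy as np


def shrink_ld_block(genotypes, lam=0.85):
    """Shrink the empirical correlation matrix of one LD block toward identity.

    Assumed form: lam * R_emp + (1 - lam) * I.
    """
    r_emp = np.corrcoef(genotypes, rowvar=False)  # columns are variants
    return lam * r_emp + (1.0 - lam) * np.eye(r_emp.shape[0])


def blockwise_ld(genotypes, block_sizes, lam=0.85):
    """Assemble the block-diagonal LD estimate from consecutive blocks."""
    p = sum(block_sizes)
    r_hat = np.zeros((p, p))
    start = 0
    for size in block_sizes:
        stop = start + size
        r_hat[start:stop, start:stop] = shrink_ld_block(genotypes[:, start:stop], lam)
        start = stop
    return r_hat


# Toy reference panel: 500 individuals, 6 variants split into blocks of 2 and 4.
rng = np.random.default_rng(1)
G_ref = rng.binomial(2, 0.3, size=(500, 6)).astype(float)
R_hat = blockwise_ld(G_ref, block_sizes=[2, 4])
```

Under this convention every eigenvalue of the result is at least 1 − λ, so the estimate stays well conditioned even when a block contains more variants than the panel can resolve.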
A group spike-slab prior
For IVs in moderate-to-strong LD, if a single variant k has a nonzero CHP effect, the CHP effects of other nearby variants in the block would also be nonzero. In our analyses, genetic variants across the genome can be partitioned into independent blocks, and IVs from different blocks can be treated as approximately independent. Thus, the projected \(\widetilde{\alpha}_{k}\) is estimated in a group manner. We introduce a group-level latent status η_{l}, indicating whether the IVs within the lth block have nonzero CHP effects, and assign a group-level spike-slab prior as follows:
where η_{l} = 1 implies that the IVs within the lth block have nonzero projected CHP effects and η_{l} = 0 means that the projected CHP effects are all zero for IVs in the block. Here, η_{l} is a Bernoulli random variable that equals 1 with probability ω, \(\eta_{l} \sim \omega^{\eta_{l}}(1-\omega)^{1-\eta_{l}}\).
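To make the group-level prior concrete, the sketch below draws from it under the description above: one Bernoulli(ω) indicator per block, with all projected CHP effects in a block drawn from the normal slab when the indicator is 1 and set exactly to zero otherwise. The function name `sample_group_spike_slab` and the values of ω and σ_α are hypothetical, chosen only for illustration.

```python
import numpy as np


def sample_group_spike_slab(block_sizes, omega, sigma_alpha, rng):
    """Draw block indicators eta_l and projected CHP effects per block.

    eta_l ~ Bernoulli(omega); effects in block l are N(0, sigma_alpha^2)
    draws if eta_l == 1 and exactly zero if eta_l == 0.
    """
    eta = rng.binomial(1, omega, size=len(block_sizes))
    alpha_tilde = [
        rng.normal(0.0, sigma_alpha, size=m) if on else np.zeros(m)
        for m, on in zip(block_sizes, eta)
    ]
    return eta, alpha_tilde


rng = np.random.default_rng(3)
eta, alpha_tilde = sample_group_spike_slab(
    block_sizes=[10] * 100, omega=0.1, sigma_alpha=0.05, rng=rng
)
n_chp_blocks = int(eta.sum())  # number of blocks flagged as carrying CHP
```

Sharing one indicator across a block encodes the idea that variants in LD with a CHP variant cannot be cleanly separated from it.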
Considering IVs in LD, we have the following mixture distribution for Γ_{lk} that is similar to Eq. (7):
Accounting for sample overlap
When IV-to-exposure and IV-to-outcome summary statistics are taken from biobank-sized or consortia-based GWASs with potentially overlapping samples, we need to account for the additional correlations that may arise. To allow overlapping samples in the GWASs for both diseases/traits, we rewrite the distribution of summary statistics in Eq. (8) as a joint distribution and propose the following Bayesian hierarchical model for correlated IVs with overlapping samples,
where ⊗ denotes the Kronecker product and \(\mathbf{R}_{e}=\left[\begin{array}{cc}1&\rho_{e}\\ \rho_{e}&1\end{array}\right]\) is the correlation matrix that accounts for sample overlap. Here, the correlation due to sample overlap, ρ_{e}, can be estimated using summary statistics among independent variants that are associated with neither the exposure nor the outcome diseases/traits.
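One simple way to estimate ρ_{e} along these lines is the Pearson correlation of the two sets of z-scores across null variants. The sketch below assumes the inputs have already been restricted to approximately independent variants with no association to either trait; how that set is chosen is left to the analyst. The helper names `estimate_rho_e` and `overlap_correlation` are hypothetical, and stacking the exposure block before the outcome block in the Kronecker product is an assumption about ordering.

```python
import numpy as np


def estimate_rho_e(beta_x, se_x, beta_y, se_y):
    """Correlation of exposure and outcome z-scores across null variants."""
    z_x = np.asarray(beta_x, dtype=float) / np.asarray(se_x, dtype=float)
    z_y = np.asarray(beta_y, dtype=float) / np.asarray(se_y, dtype=float)
    return float(np.corrcoef(z_x, z_y)[0, 1])


def overlap_correlation(rho_e, ld_matrix):
    """Kronecker product R_e (x) R for exposure and outcome blocks stacked."""
    r_e = np.array([[1.0, rho_e], [rho_e, 1.0]])
    return np.kron(r_e, ld_matrix)


# Toy null variants whose estimation errors are correlated at 0.3.
rng = np.random.default_rng(4)
n_null, rho_true = 20_000, 0.3
noise = rng.multivariate_normal(
    [0.0, 0.0], [[1.0, rho_true], [rho_true, 1.0]], size=n_null
)
se = np.full(n_null, 0.01)
rho_hat = estimate_rho_e(noise[:, 0] * se, se, noise[:, 1] * se, se)

R = np.array([[1.0, 0.5], [0.5, 1.0]])  # toy LD matrix for two IVs
joint_corr = overlap_correlation(rho_hat, R)
```

Selecting null variants by thresholding the same z-scores would truncate their distribution and attenuate the estimate, so the null set is better defined before the correlation is computed.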
Since the estimated LD matrix is block-diagonal, the resulting Gibbs sampler can be run in parallel across blocks. The algorithmic details are given in the Supplementary Materials.
Generation of summary statistics in the simulation studies
We generated the summary statistics using simulated individual-level data. We first simulated genotypes \(\mathbf{G}_{x}\in\mathbb{R}^{n_{x}\times p}\), \(\mathbf{G}_{y}\in\mathbb{R}^{n_{y}\times p}\) and \(\mathbf{G}_{r}\in\mathbb{R}^{n_{r}\times p}\) for the exposure sample, the outcome sample, and an independent reference panel, respectively, where n_{x}, n_{y}, and n_{r} were the corresponding sample sizes and p was the total number of IVs. We set the number of blocks L to be 100 or 200, and the number of IVs within a block to be 10. Correspondingly, the number of IVs was either 1000 or 2000. For all simulations, we considered n_{x} = 50,000, n_{y} = 50,000 and n_{r} = 4000.
We then generated a data matrix from a multivariate normal distribution \(\mathcal{N}(\mathbf{0},\boldsymbol{\Sigma}(r))\), where r ∈ {0.4, 0.8} represented the autoregressive correlation among IVs. We simulated the genotype matrices by discretizing the data matrices into dosage values {0, 1, 2} according to minor allele frequencies drawn uniformly from [0.05, 0.5]. We then considered the following structural model to generate individual-level data
where \(\mathbf{U}_{x}\in\mathbb{R}^{n_{x}\times q}\) and \(\mathbf{U}_{y}\in\mathbb{R}^{n_{y}\times q}\) are the matrices of q confounders in the samples for IV-to-exposure and IV-to-outcome, respectively; \(\boldsymbol{\psi}_{x}\in\mathbb{R}^{q\times 1}\) and \(\boldsymbol{\psi}_{y}\in\mathbb{R}^{q\times 1}\) are the corresponding vectors of coefficients; x_{x} and x_{y} are the exposure traits in the two samples; \(\boldsymbol{\epsilon}_{x_{x}}\in\mathbb{R}^{n_{x}\times 1}\), \(\boldsymbol{\epsilon}_{x_{y}}\in\mathbb{R}^{n_{y}\times 1}\), and \(\boldsymbol{\epsilon}_{y}\in\mathbb{R}^{n_{y}\times 1}\) are the random errors; and β_{1} is the causal effect of interest. In all simulations, we considered q = 50, and each column of U_{x} and U_{y} was sampled from a standard normal distribution. The coefficients of these confounders, ψ_{x} and ψ_{y}, were sampled from a bivariate normal distribution \(\mathcal{N}(\mathbf{0},\boldsymbol{\Sigma}_{\psi})\), where Σ_{ψ} was a two-by-two matrix with diagonal elements of 1 and off-diagonal elements of 0.8. For CHP effects, we assumed that γ_{k} and α_{k} follow a bivariate normal distribution \(\mathcal{N}(\mathbf{0},\boldsymbol{\Sigma}(\rho_{\alpha\gamma}))\). We considered α_{k} to be sparse, i.e., only 10% of the α_{k} were sampled from the bivariate normal distribution and the others were set to zero. For UHP, we assumed θ_{k} to be dense and to follow independent normal distributions, \(\mathcal{N}(0,\sigma_{\theta}^{2})\).
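The genotype-simulation step (latent normal data with autoregressive correlation, then discretization into dosages) can be sketched as below for a single block. The text does not say how the cut points are chosen; this sketch assumes they are set so that the genotype frequencies follow Hardy-Weinberg proportions for each variant's minor allele frequency. The function names `ar1_cov` and `simulate_genotypes` are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def ar1_cov(p, r):
    """AR(1) correlation matrix with entries r^|i - j|."""
    idx = np.arange(p)
    return r ** np.abs(idx[:, None] - idx[None, :])


def simulate_genotypes(n, p, r, rng):
    """Latent AR(1) normal data discretized to dosages in {0, 1, 2}.

    Assumption: thresholds follow Hardy-Weinberg proportions, i.e.
    P(0) = (1 - f)^2, P(1) = 2 f (1 - f), P(2) = f^2 for MAF f.
    """
    latent = rng.multivariate_normal(np.zeros(p), ar1_cov(p, r), size=n)
    maf = rng.uniform(0.05, 0.5, size=p)
    std_normal = NormalDist()
    lower = np.array([std_normal.inv_cdf((1.0 - f) ** 2) for f in maf])
    upper = np.array([std_normal.inv_cdf(1.0 - f ** 2) for f in maf])
    dosage = (latent > lower).astype(int) + (latent > upper).astype(int)
    return dosage, maf


rng = np.random.default_rng(2)
G, maf = simulate_genotypes(n=20_000, p=10, r=0.4, rng=rng)
observed_maf = G.mean(axis=0) / 2.0  # allele frequency recovered from dosages
```

Discretization weakens the correlation relative to the latent r, so the LD between simulated variants is somewhat lower than the nominal autoregressive value.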
We further performed single-variant analyses to obtain the summary statistics, \(\{\widehat{\gamma}_{k},\widehat{\mathbf{s}}_{\gamma_{k}}\}\) and \(\{\widehat{\Gamma}_{k},\widehat{\mathbf{s}}_{\Gamma_{k}}\}\), ∀ k = 1, …, p, for exposure and outcome, respectively. In the simulation study, we controlled the magnitudes of γ, α and θ using \(h_{\gamma}^{2}=\frac{\mathrm{var}(\beta_{1}\mathbf{G}_{y}\boldsymbol{\gamma})}{\mathrm{var}(\mathbf{y})}\), \(h_{\alpha}^{2}=\frac{\mathrm{var}(\mathbf{G}_{y}\boldsymbol{\alpha})}{\mathrm{var}(\mathbf{y})}\) and \(h_{\theta}^{2}=\frac{\mathrm{var}(\mathbf{G}_{y}\boldsymbol{\theta})}{\mathrm{var}(\mathbf{y})}\), respectively. We considered \(h_{\gamma}^{2}=0.1\) and varied \(h_{\theta}^{2}\in\{0.02,0.05\}\) and \(h_{\alpha}^{2}\in\{0.05,0.1\}\) to evaluate the performance of MRCUE in identifying IVs with CHP effects and in controlling type I error rates. To further examine the power, we varied \(h_{\gamma}^{2}\) over a sequence of values from 0 to 0.1 while fixing the other parameters.
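The single-variant analysis amounts to one simple linear regression per variant. A minimal vectorized sketch, assuming an intercept-only adjustment and a quantitative trait (the function name `single_variant_stats` and the toy data are hypothetical), is:

```python
import numpy as np


def single_variant_stats(genotypes, trait):
    """Marginal effect size and standard error for each variant.

    Fits trait ~ intercept + genotype separately for every column.
    """
    G = np.asarray(genotypes, dtype=float)
    y = np.asarray(trait, dtype=float)
    n = G.shape[0]
    Gc = G - G.mean(axis=0)                 # centering absorbs the intercept
    yc = y - y.mean()
    sxx = (Gc ** 2).sum(axis=0)
    beta = Gc.T @ yc / sxx
    rss = (yc ** 2).sum() - beta ** 2 * sxx  # residual sum of squares per variant
    se = np.sqrt(rss / (n - 2) / sxx)
    return beta, se


# Toy data: 3 variants, only the first has a true effect (0.2).
rng = np.random.default_rng(5)
G = rng.binomial(2, 0.3, size=(5000, 3)).astype(float)
y = 0.2 * G[:, 0] + rng.normal(size=5000)
gamma_hat, se_hat = single_variant_stats(G, y)
```

Applying the same function to the exposure sample and to the outcome sample yields the two sets of summary statistics that serve as MR input.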
Generation of summary statistics for reverse causation analysis
We considered the following structural model, similar to that in existing work^{13}, to generate individual-level data:
where γ and θ are from two independent normal distributions. In this simulation, we first controlled the heritability of exposure and outcome, denoted as \({h}_{x}^{2}\) and \({h}_{y}^{2}\), respectively. We further assumed that 20% of the outcome heritability, \({h}_{y}^{2}\), can be explained by the causal effect (β_{1}) of exposure on outcome. Thus, we have three quantities below
We set \({h}_{y}^{2}=0.25\), \({h}_{x}^{2}=0.3\), and only 5% of γ being nonzero. We fixed r = 0.4, p = 2000, and ρ_{αγ} = 0.2. To examine reverse causality, we applied MRCUE and other methods to assess the causal effects in both directions for 100 simulated replicates. By varying significance thresholds, we obtained the ROC curves for true positives vs. false positives averaged over the 100 replicates.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability
The reference panel is the merged genotype data from UK10K and the 1000 Genomes Project Phase 3, available for download from the European Genome-phenome Archive (https://www.ebi.ac.uk/ega/) with ID EGAD00001000776. The LD estimates using UK10K genotype data for the list of SNPs from HapMap Project Phase 3 (HapMap3) can be downloaded at https://zenodo.org/record/7152063. All GWAS summary statistics used in this study are publicly available. GWAS summary statistics for IL-6 are available at http://www.phpc.cam.ac.uk/ceu/proteins/. GWAS summary statistics for T2D in the European population can be obtained at http://diagramconsortium.org/downloads.html. GWAS summary statistics for T2D in the East Asian population can be accessed at https://blog.nus.edu.sg/agen/summarystatistics/t2d2020/. Other summary statistics are publicly available from the studies referenced in Supplementary Data 1.
Code availability
The MRCUE method is implemented in an open-source R package that is publicly available at https://github.com/QingCheng0218/MR.CUE^{71}. The code to reproduce the analysis can be found at https://github.com/QingCheng0218/MR.CUE/tree/main/simulation.
References
Smith, G. D. & Ebrahim, S. Mendelian randomization: prospects, potentials, and limitations. Int. J. Epidemiol. 33, 30–42 (2004).
Ference, B. A. et al. Effect of long-term exposure to lower low-density lipoprotein cholesterol beginning early in life on the risk of coronary heart disease: a mendelian randomization analysis. J. Am. Coll. Cardiol. 60, 2631–2639 (2012).
Zhu, Z. et al. Causal associations between risk factors and common diseases inferred from GWAS summary data. Nat. Commun. 9, 1–12 (2018).
Verbanck, M., Chen, C.-Y., Neale, B. & Do, R. Detection of widespread horizontal pleiotropy in causal relationships inferred from mendelian randomization between complex traits and diseases. Nat. Genet. 50, 693–698 (2018).
Bowden, J., Davey Smith, G. & Burgess, S. Mendelian randomization with invalid instruments: effect estimation and bias detection through egger regression. Int. J. Epidemiol. 44, 512–525 (2015).
Zhao, Q. et al. Statistical inference in two-sample summary-data mendelian randomization using robust adjusted profile score. Ann. Stat. 48, 1742–1769 (2020).
Zhao, J. et al. Bayesian weighted mendelian randomization for causal inference based on summary statistics. Bioinformatics 36, 1501–1508 (2020).
Cheng, Q. et al. MR-LDP: a two-sample mendelian randomization for GWAS summary statistics accounting for linkage disequilibrium and horizontal pleiotropy. NAR Genomics Bioinform. 2, lqaa028 (2020).
Burgess, S., Foley, C. N., Allara, E., Staley, J. R. & Howson, J. M. A robust and efficient method for mendelian randomization with hundreds of genetic variants. Nat. Commun. 11, 1–11 (2020).
Foley, C. N., Mason, A. M., Kirk, P. D. & Burgess, S. MR-Clust: clustering of genetic variants in mendelian randomization with similar causal estimates. Bioinformatics 37, 531–541 (2021).
Iong, D., Zhao, Q. & Chen, Y. A. Latent mixture model for heterogeneous causal mechanisms in mendelian randomization. arXiv preprint arXiv:2007.06476 (2020).
Qi, G. & Chatterjee, N. Mendelian randomization analysis using mixture models for robust and efficient estimation of causal effects. Nat. Commun. 10, 1–10 (2019).
Morrison, J., Knoblauch, N., Marcus, J. H., Stephens, M. & He, X. Mendelian randomization accounting for correlated and uncorrelated pleiotropic effects using genome-wide summary statistics. Nat. Genet. 52, 740–747 (2020).
Xue, H., Shen, X. & Pan, W. Constrained maximum likelihood-based mendelian randomization robust to both correlated and uncorrelated pleiotropic effects. Am. J. Hum. Genet. 108, 1251–1269 (2021).
Wang, J. et al. Causal inference for heritable phenotypic risk factors using heterogeneous genetic instruments. PLoS Genet. 17, e1009575 (2021).
Ishwaran, H. & Rao, J. S. et al. Spike and slab variable selection: frequentist and Bayesian strategies. Ann. Stat. 33, 730–773 (2005).
Malsiner-Walli, G. & Wagner, H. Comparing spike and slab priors for Bayesian variable selection. Austrian Journal of Statistics. 40, 241–264 (2011).
Burgess, S., Butterworth, A. & Thompson, S. Mendelian randomization analysis with multiple genetic variants using summarized data. Genet Epidemiol. 37, 658–665 (2013).
Shapland, C. Y., Zhao, Q. & Bowden, J. Profile-likelihood Bayesian model averaging for two-sample summary data mendelian randomization in the presence of horizontal pleiotropy. Stat. Med. 41, 1100–1119 (2022).
Gkatzionis, A., Burgess, S., Conti, D. V. & Newcombe, P. J. Bayesian variable selection with a pleiotropic loss function in mendelian randomization. Stat. Med. 40, 5025–5045 (2021).
Berzuini, C., Guo, H., Burgess, S. & Bernardinelli, L. A Bayesian approach to mendelian randomization with multiple pleiotropic variants. Biostatistics 21, 86–101 (2020).
Cheng, Q. et al. MR-Corr2: a two-sample Mendelian randomization method that accounts for correlated horizontal pleiotropy using correlated instrumental variants. Bioinformatics 38, 303–310 (2022).
Burgess, S. et al. Guidelines for performing mendelian randomization investigations. Wellcome Open Res. 4, 1–28 (2019).
Sanderson, E., Richardson, T., Hemani, G. & Smith, G. D. The use of negative control outcomes in mendelian randomisation to detect potential population stratification or selection bias. Int. J. Epidemiol. 50, 1350–1361 (2021).
Tanaka, T., Narazaki, M. & Kishimoto, T. IL-6 in inflammation, immunity, and disease. Cold Spring Harb. Perspect. Biol. 6, a016295 (2014).
McElvaney, O. J., Curley, G. F., Rose-John, S. & McElvaney, N. G. Interleukin-6: obstacles to targeting a complex cytokine in critical illness. Lancet Respir. Med. 9, 643–654 (2021).
Ahluwalia, T. S. et al. Genome-wide association study of circulating interleukin 6 levels identifies novel loci. Hum. Mol. Genet. 30, 393–409 (2021).
Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
Xia, C., Braunstein, Z., Toomey, A. C., Zhong, J. & Rao, X. S100 proteins as an important regulator of macrophage inflammation. Front. Immunol. 8, 1908 (2018).
Vogl, T. et al. Mrp8 and mrp14 are endogenous activators of toll-like receptor 4, promoting lethal, endotoxin-induced shock. Nat. Med. 13, 1042–1049 (2007).
Perera, C., McNeil, H. P. & Geczy, C. L. S100 calgranulins in inflammatory arthritis. Immunol. Cell Biol. 88, 41–49 (2010).
Heizmann, C. W. The multifunctional S100 protein family. Calcium-Binding Protein Protocols. 172, 69–80 (2002).
Kypriotou, M., Huber, M. & Hohl, D. The human epidermal differentiation complex: cornified envelope precursors, s100 proteins and the ‘fused genes’ family. Exp. Dermatol. 21, 643–649 (2012).
Song, Z.M. et al. KAT5 acetylates cgas to promote innate immune response to dna virus. Proc. Natl Acad. Sci. USA 117, 21568–21575 (2020).
Xia, M. et al. Histone methyltransferase ash1l suppresses interleukin-6 production and inflammatory autoimmune diseases by inducing the ubiquitin-editing enzyme a20. Immunity 39, 470–481 (2013).
The National Institute of Diabetes and Digestive and Kidney Diseases. Symptoms & Causes of Diabetes. https://www.niddk.nih.gov/healthinformation/diabetes/overview/symptomscauses?dkrd=hispt0015. Accessed: 20160210.
Langenberg, C. & Lotta, L. A. Genomic insights into the causes of type 2 diabetes. Lancet 391, 2463–2474 (2018).
Xue, A. et al. Genome-wide association analyses identify 143 risk variants and putative regulatory mechanisms for type 2 diabetes. Nat. Commun. 9, 1–14 (2018).
Funnell, M. M. & Anderson, R. M. Type 2 Diabetes Mellitus, p. 455–466 (Springer, 2008).
Tong, P. C. et al. White blood cell count is associated with macro- and microvascular complications in Chinese patients with type 2 diabetes. Diabetes Care 27, 216–222 (2004).
Demirtunc, R. et al. The relationship between glycemic control and platelet activity in type 2 diabetes mellitus. J. Diabetes Complications 23, 89–94 (2009).
Magri, C. J. & Fava, S. Red blood cell distribution width and diabetes-associated complications. Diabetes Metab. Syndrome Clin. Res. Rev. 8, 13–17 (2014).
Lipska, K. J. et al. HbA1c and risk of severe hypoglycemia in type 2 diabetes: the diabetes and aging study. Diabetes Care 36, 3535–3542 (2013).
Oscanoa, J. et al. SNPnexus: a web server for functional annotation of human genome sequence variation (2020 update). Nucleic Acids Res. 48, W185–W192 (2020).
Donath, M. Y. & Shoelson, S. E. Type 2 diabetes as an inflammatory disease. Nat. Rev. Immunol. 11, 98–107 (2011).
Vandanmagsar, B. et al. The NLRP3 inflammasome instigates obesity-induced inflammation and insulin resistance. Nat. Med. 17, 179–188 (2011).
Hameed, I. et al. Type 2 diabetes mellitus: from a metabolic disorder to an inflammatory condition. World J. Diabetes 6, 598 (2015).
Bennett, C., Guo, M. & Dharmage, S. HbA1c as a screening tool for detection of type 2 diabetes: a systematic review. Diabet. Med. 24, 333–343 (2007).
Azarova, I., Klyosova, E., Lazarenko, V., Konoplya, A. & Polonikov, A. Genetic variants in glutamate cysteine ligase confer protection against type 2 diabetes. Mol. Biol. Rep. 47, 5793–5805 (2020).
Alves, A., Bassot, A., Bulteau, A.L., Pirola, L. & Morio, B. Glycine metabolism and its alterations in obesity and metabolic diseases. Nutrients 11, 1356 (2019).
Wang, W. et al. Glycine metabolism in animals and humans: implications for nutrition and health. Amino Acids 45, 463–477 (2013).
Pacheco, R., Gallart, T., Lluis, C. & Franco, R. Role of glutamate on T-cell mediated immunity. J. Neuroimmunol. 185, 9–19 (2007).
Richmond, T. D., Chohan, M. & Barber, D. L. Turning cells red: signal transduction mediated by erythropoietin. Trends Cell Biol. 15, 146–155 (2005).
Mandrup-Poulsen, T. Apoptotic signal transduction pathways in diabetes. Biochem. Pharmacol. 66, 1433–1440 (2003).
Björnholm, M. & Zierath, J. Insulin signal transduction in human skeletal muscle: identifying the defects in type ii diabetes. Biochem. Soc. Trans. 33, 354–357 (2005).
Özcan, U. et al. Endoplasmic reticulum stress links obesity, insulin action, and type 2 diabetes. Science 306, 457–461 (2004).
Zhao, Q., Chen, Y., Wang, J. & Small, D. S. Powerful three-sample genome-wide design and robust statistical inference in summary-data mendelian randomization. Int. J. Epidemiol. 48, 1478–1492 (2019).
Shi, X. et al. VIMCO: variational inference for multiple correlated outcomes in genome-wide association studies. Bioinformatics 35, 3693–3700 (2019).
Gelman, A. et al. Bayesian data analysis (CRC press, 2013).
Gelman, A. et al. Prior distributions for variance parameters in hierarchical models (comment on article by browne and draper). Bayesian Anal. 1, 515–534 (2006).
Zhu, X. & Stephens, M. Bayesian large-scale multiple regression with summary statistics from genome-wide association studies. Ann. Appl. Stat. 11, 1561 (2017).
Boyd, A. et al. Data resource profile: The alspac birth cohort as a platform to study the relationship of environment and health and social factors. Int. J. Epidemiol. 48, 1038–1039k (2019).
Moayyeri, A., Hammond, C. J., Valdes, A. M. & Spector, T. D. Cohort profile: Twinsuk and healthy ageing twin study. Int. J. Epidemiol. 42, 76–85 (2013).
Fairley, S., LowyGallego, E., Perry, E. & Flicek, P. The international genome sample resource (igsr) collection of open human genomic variation resources. Nucleic Acids Res. 48, D941–D947 (2020).
Purcell, S. et al. PLINK: a tool set for wholegenome association and populationbased linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genomewide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
Turner, S. et al. Quality control procedures for genome-wide association studies. Curr. Protoc. Hum. Genet. 68, 1–19 (2011).
Husson, F., Josse, J. & Pages, J. Principal component methods - hierarchical clustering - partitional clustering: why would we need to choose for visualizing data. Technical Report, Rennes, France: Agrocampus Ouest. 1–17 (2010).
Berisa, T. & Pickrell, J. K. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics 32, 283 (2016).
Schäfer, J. & Strimmer, K. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat. Appl. Genet. Mol. Biol. 4, 1–32 (2005).
Cheng, Q. MR.CUE. GitHub https://doi.org/10.5281/zenodo.7134872 (2022).
Acknowledgements
The computational work for this article was partially performed on resources of the National Supercomputing Centre, Singapore (https://www.nscc.sg). The research of L.C. was supported by NIH 2R01GM108711, R35ES028379, and 1R01CA229618. The research of J.L. was supported by an AcRF Tier 2 grant (MOET2EP202200009) from the Ministry of Education, Singapore, and a Duke-NUS/Khoo Bridge Funding Award (DukeNUSKBrFA/2020/0034).
Author information
Contributions
L.C. and J.L. conceived the design of the study and provided funding support. Q.C. undertook all the statistical and computational analyses and developed the software tool with assistance from X.Z.; L.C. and J.L. wrote the first draft of the manuscript. Q.C., X.Z., L.C., and J.L. provided comments to refine the manuscript and approved the final version.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Jingshu Wang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Cheng, Q., Zhang, X., Chen, L.S. et al. Mendelian randomization accounting for complex correlated horizontal pleiotropy while elucidating shared genetic etiology. Nat Commun 13, 6490 (2022). https://doi.org/10.1038/s41467-022-34164-1
DOI: https://doi.org/10.1038/s41467-022-34164-1