Testing and controlling for horizontal pleiotropy with probabilistic Mendelian randomization in transcriptome-wide association studies

Integrating results from genome-wide association studies (GWASs) and gene expression studies through transcriptome-wide association study (TWAS) has the potential to shed light on the causal molecular mechanisms underlying disease etiology. Here, we present a probabilistic Mendelian randomization (MR) method, PMR-Egger, for TWAS applications. PMR-Egger relies on an MR likelihood framework that unifies many existing TWAS and MR methods, accommodates multiple correlated instruments, tests the causal effect of gene on trait in the presence of horizontal pleiotropy, and is scalable to hundreds of thousands of individuals. In simulations, PMR-Egger provides calibrated type I error control for causal effect testing in the presence of horizontal pleiotropic effects, is reasonably robust under various types of model misspecification, is more powerful than existing TWAS/MR approaches, and can directly test for horizontal pleiotropy. We illustrate the benefits of PMR-Egger in applications to 39 diseases and complex traits obtained from three GWASs, including the UK Biobank.
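To make concrete why horizontal pleiotropy matters for causal effect testing, the following minimal simulation sketch may help. It is not the PMR-Egger likelihood. It is a simplified two-stage regression under assumptions that are ours, not the abstract's: independent standardized SNPs, a pleiotropic effect `gamma` shared equally by all SNPs (an Egger-style burden term), and hypothetical function names and parameter values (`simulate_study`, `run_demo`, `beta`, `alpha`, `gamma`). It compares a naive estimate of the gene-on-trait effect with one that adjusts for the burden term.

```python
import random


def simulate_study(n, beta, alpha, gamma, rng):
    """Simulate n individuals: genotypes Z, expression x, and trait y."""
    Z, x, y = [], [], []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in beta]    # independent, standardized SNPs (assumption)
        g = sum(b * zj for b, zj in zip(beta, z))   # genetically regulated expression
        burden = sum(z)                             # every SNP shares the same direct effect (assumption)
        Z.append(z)
        x.append(g + rng.gauss(0.0, 1.0))
        y.append(alpha * g + gamma * burden + rng.gauss(0.0, 1.0))
    return Z, x, y


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def run_demo(seed=1, n=20000, alpha=0.5, gamma=0.2):
    """Return (naive alpha, burden-adjusted alpha, estimated gamma)."""
    rng = random.Random(seed)
    beta = [0.1, 0.2, 0.3, 0.4, 0.5]  # illustrative SNP-on-expression effects

    # Two independent samples: an expression study and a GWAS.
    Z1, x1, _ = simulate_study(n, beta, alpha, gamma, rng)
    Z2, _, y2 = simulate_study(n, beta, alpha, gamma, rng)

    # Stage 1: marginal SNP-on-expression effects from the expression study.
    # Marginal estimates suffice only because the SNPs are simulated as independent.
    beta_hat = []
    for j in range(len(beta)):
        col = [row[j] for row in Z1]
        beta_hat.append(dot(col, x1) / dot(col, col))

    # Stage 2 predictors in the GWAS sample.
    g_hat = [dot(beta_hat, row) for row in Z2]  # predicted expression
    burden = [sum(row) for row in Z2]           # pleiotropy burden term

    # Naive estimate: regress the trait on predicted expression only.
    # All variables have mean zero by construction, so intercepts are omitted.
    alpha_naive = dot(g_hat, y2) / dot(g_hat, g_hat)

    # Adjusted estimate: regress the trait on predicted expression and burden
    # by solving the 2x2 normal equations with Cramer's rule.
    gg, gb, bb = dot(g_hat, g_hat), dot(g_hat, burden), dot(burden, burden)
    gy, by = dot(g_hat, y2), dot(burden, y2)
    det = gg * bb - gb * gb
    alpha_adjusted = (gy * bb - gb * by) / det
    gamma_hat = (gg * by - gb * gy) / det
    return alpha_naive, alpha_adjusted, gamma_hat


if __name__ == "__main__":
    naive, adjusted, gamma_hat = run_demo()
    print(f"true alpha = 0.5, naive = {naive:.3f}, adjusted = {adjusted:.3f}")
    print(f"true gamma = 0.2, estimated = {gamma_hat:.3f}")
```

In this toy setting the naive estimate absorbs part of the pleiotropic effect because predicted expression is correlated with the burden term, while the adjusted estimate stays close to the simulated causal effect. A likelihood-based method that handles correlated instruments would replace both stages.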



Software and code
Policy information about availability of computer code

For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

Xiang Zhou
Jun 25, 2020

Data collection
The present study does not involve data collection. We used publicly available data sets to validate the effectiveness of our proposed method. No software was used for data collection.

Data analysis
We used the newly developed R package PMR for data analysis. PMR is described in the Methods section and deposited at [http://www.xzlab.org/software.html] and on GitHub [https://github.com/yuanzhongshang/PMR]. In addition, we used the following software for comparative analysis in simulations.
glmnet [https://cran.r-project.org/web/packages/glmnet/index.html] (R version 3.6.3): an extremely efficient procedure for fitting the elastic-net regularization path for linear regression.
GEMMA [http://xzlab.org/software.html] (version 0.96): a genome-wide efficient mixed model association algorithm for a linear mixed model and some of its close relatives for GWAS.
CoMM [https://github.com/gordonliu810822/CoMM] (version 1.0): a collaborative mixed model for dissecting genetic contributions to complex traits by leveraging regulatory information.
MR-PRESSO [https://github.com/rondolab/MR-PRESSO] (version 1.0): a method that allows for the evaluation of horizontal pleiotropy in multi-instrument Mendelian randomization utilizing genome-wide summary association statistics.
Minimac3 [https://genome.sph.umich.edu/wiki/Minimac3] (version 2.0.1): a lower-memory and more computationally efficient implementation of genotype imputation algorithms, designed to handle very large reference panels with no loss of accuracy.
IMPUTE2 [https://mathgen.stats.ox.ac.uk/impute/impute_v2.html] (version 2): a flexible and accurate genotype imputation method for the next generation of genome-wide association studies.
SHAPEIT [https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html] (version v2.r900): a fast and accurate method for estimation of haplotypes from genotype or sequencing data.