Model-based assessment of replicability for genome-wide association meta-analysis

Genome-wide association meta-analysis (GWAMA) is an effective approach to enlarge sample sizes and empower the discovery of novel associations between genotype and phenotype. Independent replication has been used as a gold standard for validating genetic associations. However, as current GWAMA often seeks to aggregate all available datasets, it becomes impossible to find a large enough independent dataset to replicate new discoveries. Here we introduce a method, MAMBA (Meta-Analysis Model-based Assessment of replicability), for assessing the "posterior probability of replicability" of identified associations by leveraging the strength and consistency of association signals between contributing studies. We demonstrate using simulations that MAMBA is more powerful and robust than existing methods, and produces more accurate genetic effect estimates. We apply MAMBA to a large-scale meta-analysis of addiction phenotypes with 1.2 million individuals. In addition to accurately identifying replicable common variant associations, MAMBA also pinpoints novel replicable rare variant associations from imputation-based GWAMA and hence greatly expands the set of analyzable variants.


Supplementary Figure: Three exemplary instances of non-replicable common variants from the discovery cohort (MAF > 0.1%). Cross marks indicate the inverse-variance weighted meta-analysis z-scores from two-sided hypothesis tests, without any adjustment for multiple comparisons. Sizes of the dots are proportional to the sample sizes of the cohorts. Orange color indicates the estimated posterior probability that a particular z-score is an "outlier" given that the association is non-replicable. Source data are provided as a Source Data file.

Supplementary Note: Methods Descriptions and Simulation Procedure for Summary Statistics
We simulate datasets assuming a particular data-generating process (DGP) for the summary statistics. Datasets of 50,000 markers are generated in two steps:

1. Simulate the true effect sizes from a spike-and-slab distribution, where the probability of a real-associated SNP is $\pi = 0.01$ and the variance of real-associated SNP effects is $\sigma^2 = 2.5 \times 10^{-4}$:
$$\mu_j \mid R_j = \begin{cases} N(0, \sigma^2), & R_j = 1 \\ 0, & R_j = 0 \end{cases}$$
2. Simulate the effect size estimates based upon the MAMBA, FE, RE, RE2, or BE DGP, conditional on the simulated true effects.

The effect size estimate variances $s_{ij}^2$ are generated in our simulation by sampling with replacement from the variances of the observed genetic effects from the GSCAN studies.
In each simulated dataset, the total sample size across cohorts is 500,000. We consider two settings for the number of cohorts being meta-analysed (5 or 10), and two settings for the sample size distribution (either equal or unequal sample sizes across the participating studies). We also vary the heterogeneity ($\tau^2$), the effect size variance ($\sigma^2$), and the variance inflation factor ($\alpha$), depending on the DGP, as described below.
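The two-step procedure above can be sketched in Python. This is an illustrative sketch, not the authors' code; in particular, the lognormal draw below is only a placeholder for resampling the observed GSCAN variances with replacement:

```python
import numpy as np

rng = np.random.default_rng(1)

M = 50_000        # markers per simulated dataset
K = 5             # number of cohorts (5 or 10 in the text)
pi = 0.01         # probability that a SNP is real-associated
sigma2 = 2.5e-4   # variance of real-associated SNP effects

# Step 1: spike-and-slab true effects mu_j
R = rng.binomial(1, pi, size=M)  # indicator of a real (replicable) effect
mu = np.where(R == 1, rng.normal(0.0, np.sqrt(sigma2), size=M), 0.0)

# Step 2 input: per-study estimate variances s2_ij.  The text resamples
# these from observed GSCAN variances; the lognormal here is a stand-in.
s2 = rng.lognormal(mean=np.log(1e-4), sigma=0.5, size=(K, M))
```

Step 2 then generates the effect size estimates conditional on `mu` and `s2` under whichever DGP is being simulated.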

MAMBA Model
When summary statistics are generated according to the MAMBA model,
$$\hat\beta_{ij} \mid \mu_j, R_j, O_{ij} \sim \begin{cases} N(\mu_j,\, s_{ij}^2), & R_j = 1 \\ N(0,\, \alpha\, s_{ij}^2), & R_j = 0,\; O_{ij} = 1 \\ N(0,\, s_{ij}^2), & R_j = 0,\; O_{ij} = 0, \end{cases}$$
where $P(O_{ij} = 1) = 0.025$, indicating a small probability of an outlier. When a summary statistic is an outlier, the variance of the effect estimate is inflated by a factor of $\alpha$. We set $\alpha$ to be either 2, 5, or 15 for each simulation dataset, representing increasing severity of the outliers.
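A minimal Python sketch of this MAMBA DGP, regenerating stand-ins for the step-1 quantities (`R`, `mu`, `s2`) so the snippet is self-contained; the lognormal variances are again a placeholder for the resampled GSCAN variances:

```python
import numpy as np

rng = np.random.default_rng(2)

K, M = 5, 50_000
lam = 0.025   # P(O_ij = 1): outlier probability for non-replicable SNPs
alpha = 5.0   # variance inflation factor (2, 5, or 15 in the text)

# Stand-ins for the step-1 output (true effects and estimate variances)
R = rng.binomial(1, 0.01, size=M)
mu = np.where(R == 1, rng.normal(0.0, np.sqrt(2.5e-4), size=M), 0.0)
s2 = rng.lognormal(np.log(1e-4), 0.5, size=(K, M))

# Outlier indicators apply only to non-replicable SNPs (R_j = 0)
O = rng.binomial(1, lam, size=(K, M)) * (R == 0)

# Effect estimates: variance inflated by alpha wherever O_ij = 1
beta_hat = rng.normal(mu, np.sqrt(s2 * np.where(O == 1, alpha, 1.0)))
```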

Fixed Effect Meta-Analysis Method
Fixed effects meta-analysis assumes that the genetic effects are constant across the participating studies. The fixed effects meta-analysis statistic combines study-level genetic effect estimates using weights that are inversely proportional to the variances of the genetic effect estimates, i.e.
$$Z_{\mathrm{FE},j} = \frac{\sum_{i=1}^{K} w_{ij}\,\hat\beta_{ij}}{\sqrt{\sum_{i=1}^{K} w_{ij}}}, \qquad w_{ij} = \frac{1}{s_{ij}^2}.$$
Under the FE DGP, summary statistics are generated according to the fixed effects model, $\hat\beta_{ij} \mid \mu_j \sim N(\mu_j, s_{ij}^2)$. The underlying genetic effect sizes are simulated as $\mu_j \sim N(0, \sigma^2)$. We set $\sigma^2$ to be either $5 \times 10^{-5}$, $2.5 \times 10^{-4}$, or $5 \times 10^{-4}$ for simulation datasets generated according to the fixed effects DGP.
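The inverse-variance weighted statistic can be computed in a few lines; `fe_meta` below is our illustrative naming, not part of any package:

```python
import math
import numpy as np

def fe_meta(beta_hat, s2):
    """Inverse-variance weighted fixed effects meta-analysis for one SNP.

    beta_hat, s2: per-study effect estimates and their variances, shape (K,).
    Returns the pooled estimate, its z-score, and a two-sided p-value.
    """
    w = 1.0 / s2                                  # inverse-variance weights
    beta_fe = np.sum(w * beta_hat) / np.sum(w)    # pooled effect estimate
    z = beta_fe * math.sqrt(np.sum(w))            # z-score of pooled estimate
    p = math.erfc(abs(z) / math.sqrt(2.0))        # two-sided normal p-value
    return beta_fe, z, p
```

For example, three studies that each estimate an effect of 0.1 with variance 0.01 pool to 0.1 with z = 0.1·√300.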

Random Effects Meta-Analysis
Random effects methods assume that the underlying genetic effect varies across studies and follows a normal distribution. This is often written as a multi-level model, for example
$$\hat\beta_{ij} \mid \beta_{ij} \sim N(\beta_{ij}, s_{ij}^2), \qquad \beta_{ij} \sim N(\mu_j, \tau^2),$$
where $\beta_{ij}$ is the study-specific genetic effect in cohort $i$, $\mu_j$ is the overall population mean genetic effect, and $\tau^2$ is the variance of the study-specific effects, characterizing heterogeneity across study cohorts. In addition to estimating the degree of heterogeneity through $\tau^2$, random effects meta-analysis methods typically calculate a p-value testing the null hypothesis that the population mean effect is zero, i.e. $H_0: \mu_j = 0$. Under the multi-level random effects structure above, the meta-analysis test statistic can be calculated as
$$Z_{\mathrm{RE},j} = \frac{\sum_{i=1}^{K} w_{ij}^{*}\,\hat\beta_{ij}}{\sqrt{\sum_{i=1}^{K} w_{ij}^{*}}}, \qquad w_{ij}^{*} = \frac{1}{s_{ij}^2 + \hat\tau^2}.$$
The DerSimonian and Laird method (1986) is one of the most popular random effects methods, as it is easily calculated in closed form. The variance of the study-specific effects is estimated as
$$\hat\tau^2 = \max\left(0,\; \frac{Q - (K-1)}{\sum_{i=1}^{K} w_{ij} - \sum_{i=1}^{K} w_{ij}^2 \big/ \sum_{i=1}^{K} w_{ij}}\right),$$
where $Q$ is Cochran's $Q$ statistic$^1$. We control the level of heterogeneity in simulation by setting the random effect variance $\tau^2$ such that $I^2$ is either 0.05, 0.1, or 0.3 for all SNPs in a dataset, where $I^2 = 100\% \times (Q - (K-1))/Q$ and larger values of $I^2$ indicate higher levels of heterogeneity.
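The DerSimonian-Laird estimator and the resulting random effects z-score can be sketched as follows (illustrative naming; truncating the variance estimate at zero follows the standard method):

```python
import math
import numpy as np

def dl_random_effects(beta_hat, s2):
    """DerSimonian-Laird random effects meta-analysis for one SNP (sketch)."""
    w = 1.0 / s2
    beta_fe = np.sum(w * beta_hat) / np.sum(w)
    K = len(beta_hat)
    Q = np.sum(w * (beta_hat - beta_fe) ** 2)           # Cochran's Q
    # Method-of-moments estimate of between-study variance, truncated at 0
    tau2 = max(0.0, (Q - (K - 1)) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))
    I2 = max(0.0, (Q - (K - 1)) / Q) if Q > 0 else 0.0  # heterogeneity fraction
    w_star = 1.0 / (s2 + tau2)                          # random effects weights
    beta_re = np.sum(w_star * beta_hat) / np.sum(w_star)
    z = beta_re * math.sqrt(np.sum(w_star))
    return tau2, I2, z
```

With perfectly homogeneous inputs, $Q = 0$, so $\hat\tau^2$ truncates to zero and the statistic reduces to the fixed effects z-score.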

Han and Eskin's Random Effects Model (RE2)
Han and Eskin developed the RE2 test statistic as a likelihood ratio test$^2$ in which effect heterogeneity exists only under the alternative hypothesis, i.e. when $\mu_j \neq 0$. In RE2, p-values are calculated from an empirical distribution of test statistics obtained using simulations.
Similar to RE, we specify heterogeneity in our simulation through $I^2$: the random effect variance $\tau^2$ is set such that $I^2$ is either 0.05, 0.1, or 0.3 for all SNPs in a dataset.

Han and Eskin's Binary Effects Model (BE)
Han and Eskin developed the Binary Effects (BE) model$^3$ as a meta-analysis model to interpret study-specific effect heterogeneity. In the BE model, for each SNP with a real non-zero effect, we randomly sample a subset of the studies to share a common fixed effect size, while the rest of the studies have no effect. In simulations, both the RE2 and BE analyses were performed using METASOFT by the same authors with default parameters.
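A sketch of this BE-style simulation; the 50% study-inclusion probability and the common effect size used here are illustrative assumptions, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(3)

K, M = 10, 50_000
pi = 0.01       # probability of a real non-zero effect, as in step 1
effect = 0.02   # common fixed effect size (illustrative)

R = rng.binomial(1, pi, size=M)              # SNPs with real non-zero effects
# For each real SNP, randomly choose which studies carry the effect;
# the 0.5 inclusion probability is our assumption.
on = rng.binomial(1, 0.5, size=(K, M)) * (R == 1)
mu = on * effect                             # study-specific true effects
s2 = rng.lognormal(np.log(1e-4), 0.5, size=(K, M))
beta_hat = rng.normal(mu, np.sqrt(s2))
```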

SCREEN Method
SCREEN$^4$ is a method developed to identify the number of studies in which a given SNP has replicable effects. SCREEN takes the p-values from each participating study as input and calculates the posterior probability that a SNP is non-null in at least $k$ studies, for $k = 2, 3, \ldots, K$. In our simulation studies, we used the authors' implementation of SCREEN available in the Supplement of their paper, and used the estimated FDR for $k = 2$ as the primary significance metric for a SNP. This calculates the FDR for "minimal replicability" in at least 2 studies, and thus maximizes the "power" of SCREEN in our comparisons.

EM Algorithm
Let $\theta = (\pi, \lambda, \sigma^2, \alpha)$ denote the parameters we need to estimate in our model. $R_j$ and $(O_{1j}, \ldots, O_{Kj})$ are the latent variables for SNP $j$, where $R_j = 1$ denotes, at the SNP level, the presence of a non-zero replicable effect. For non-replicable SNPs with zero-mean effect ($R_j = 0$), we let $O_{ij}$ denote, at the cohort level, whether a particular study's effect estimate is a misrepresentative outlier ($O_{ij} = 1$) or a well-behaved estimate ($O_{ij} = 0$). The complete data log-likelihood is then given by
$$\ell_c(\theta) = \sum_{j} \Bigg\{ R_j \Big[\log\pi + \log \phi_K\big(\hat{\boldsymbol\beta}_j;\, \mathbf{0},\, \sigma^2 \mathbf{1}\mathbf{1}^\top + \mathbf{S}_j\big)\Big] + (1-R_j)\Big[\log(1-\pi) + \sum_{i} \Big( O_{ij}\big[\log\lambda + \log\phi\big(\hat\beta_{ij}; 0, \alpha s_{ij}^2\big)\big] + (1-O_{ij})\big[\log(1-\lambda) + \log\phi\big(\hat\beta_{ij}; 0, s_{ij}^2\big)\big]\Big)\Big]\Bigg\},$$
where $\phi$ denotes the univariate normal density, $\phi_K$ the $K$-dimensional multivariate normal density, and $\mathbf{S}_j = \mathrm{diag}(s_{1j}^2, \ldots, s_{Kj}^2)$. At iteration $t$, the E-step computes $Q(\theta, \theta^{(t)})$, the expectation of the complete data log-likelihood over the latent variables given the current estimates $\theta^{(t)}$. Now we maximize $Q(\theta, \theta^{(t)})$. These solutions can be found by taking the derivative of $Q(\theta, \theta^{(t)})$ with respect to each parameter and solving for the root.
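As an illustration of the E-step, the posterior probability of replicability for a single SNP can be computed by Bayes' rule, marginalizing $\mu_j$ so that the replicable-model covariance is compound-symmetric and can be handled via the Sherman-Morrison and matrix determinant identities. `posterior_replicable` is a hypothetical helper written against our reading of the model, not the MAMBA package API:

```python
import math
import numpy as np

def normal_pdf(x, var):
    """Univariate normal density with mean zero."""
    return np.exp(-x ** 2 / (2.0 * var)) / np.sqrt(2.0 * math.pi * var)

def posterior_replicable(beta_hat, s2, pi=0.01, lam=0.025,
                         sigma2=2.5e-4, alpha=5.0):
    """Posterior P(R_j = 1 | data) for one SNP, parameters assumed known.

    beta_hat, s2: per-study estimates and variances, shape (K,).
    """
    w = 1.0 / s2
    # Replicable model: MVN(0, sigma2 * 11' + diag(s2)) density at beta_hat
    a = np.sum(w * beta_hat)
    c = 1.0 + sigma2 * np.sum(w)
    quad = np.sum(w * beta_hat ** 2) - sigma2 * a ** 2 / c
    logdet = np.sum(np.log(s2)) + math.log(c)
    log_lik1 = -0.5 * (len(s2) * math.log(2.0 * math.pi) + logdet + quad)
    # Non-replicable model: independent mixture of inflated / ordinary nulls
    mix = lam * normal_pdf(beta_hat, alpha * s2) \
        + (1.0 - lam) * normal_pdf(beta_hat, s2)
    log_lik0 = np.sum(np.log(mix))
    # Bayes' rule on the log scale for numerical stability
    num = math.log(pi) + log_lik1
    den = np.logaddexp(num, math.log(1.0 - pi) + log_lik0)
    return float(np.exp(num - den))
```

Concordant, strong effects across studies push the posterior toward 1, while null-looking estimates push it toward 0.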