## Introduction

A large portion of the signal discovered in genome-wide associations studies (GWAS) has been localized to non-coding regions1. In light of this, researchers have developed post-GWAS approaches to elucidate the functional consequences of variants and their impact on the etiology of traits2. One notable approach has been to generate genetic predictors of gene expression and leverage these predictors with GWAS data to associate genes with traits of interest3,4. These transcriptome-wide association studies (TWAS) have not only shown great promise in terms of discovery and interpretation of association signals but have also helped prioritize potentially causal genes for complex diseases5. Nonetheless, methods like TWAS are limited by the accuracy and power of the genetic predictors generated in training datasets6,7,8,9,10,11.

The original TWAS methodology builds genetic predictors of expression on a context-by-context basis. For example, in a study with RNA-seq and genotypes collected across multiple tissues, the expression of each tissue would be modeled independently3,4. More recent methods model multiple contexts simultaneously and leverage the sharing of genetic effects across contexts8,9,10,12. However, these approaches do not maximize predictive power because they ignore the intra-individual correlation of gene expression across contexts inherent to studies with repeated sampling, e.g., the Genotype-Tissue Expression (GTEx) project13 (Supplementary Fig. 1) or single-cell RNA-Sequencing (scRNA-Seq) experiments (Supplementary Fig. 2). Moreover, they build predictors which are mixtures of both context-specific and context-shared (pleiotropic) genetic effects, making it difficult to distinguish the relevant contexts for a disease gene, and are often computationally inefficient9. A recent approach by Wheeler et al.14 does model correlated intra-individual noise with a linear-mixed model, but does not produce combined predictions of expression, reducing overall power. Finally, existing methods, with the goal of maximizing the number of discoveries made, may employ multiple testing strategies that either fail to control for all tests performed, (e.g., by controlling the false discovery rate (FDR) within each context separately4,15), or limit their discoveries as they are based on conservative FWER control (e.g., by using Bonferroni adjustment across all contexts15). Together, these shortcomings reduce power and interpretability of TWAS.

Here, we introduce CONTENT—CONtexT spEcific geNeTics—a method that leverages the correlation structure of multi-context studies to efficiently and powerfully generate genetic predictors of gene expression. Briefly, CONTENT decomposes the gene expression of each individual across contexts into context-shared and context-specific components16, builds genetic predictors for each component separately, and creates a final predictor using both components. To identify genes with significant disease associations, CONTENT employs a hierarchical testing procedure (termed “hFDR”; see Supplementary Fig. 3)17,18. CONTENT has several advantages over existing methods. First, it explicitly accounts for intra-individual correlation across contexts, boosting prediction performance. Second, by building specific and shared predictors, it can distinguish context-shared from context-specific genetic components of gene expression and disease. Third, it employs a recently developed hierarchical testing procedure18 to not only adequately control the FDR across and within contexts, but boost power in cases where a gene has a significant association to disease in multiple contexts. Fourth, this adjustment procedure allows for inclusion of other TWAS predictors3,4,8,9,10,12, enabling approaches to be complementary in discovering associations. Finally, CONTENT is orders of magnitude more computationally efficient than several previous approaches.

We evaluate the performance of CONTENT over simulated data sets, GTEx2,11,13, and a single-cell RNA-Seq data set19,20,21. We show in simulations that CONTENT captures a greater proportion of the heritable component of expression than previous methods (at minimum over 22% more), and that CONTENT successfully distinguishes the specific and shared components of genetic variability on expression. In applications to GTEx, CONTENT improves over previous context-by-context methods both in the number of genes with a significant heritable component (average 42% increase in significant gene-tissue pairs discovered) as well as the proportion of variability explained by the heritable component (average increase of 28%)3,4. Consistent with complex cell-type heterogeneity within tissues22,23,24,25, we find that in applications to the single-cell data, genetic predictors at the cell-type level have substantially more context-specific heritability than the tissue-level models. We perform TWAS across 22 phenotypes using weights trained on GTEx and scRNA and find that CONTENT discovers over 51% additional independent, significantly associated loci relative to previous approaches. We provide CONTENT gene expression weights for both GTEx and the single-cell dataset at the TWAS/FUSION repository (http://gusevlab.org/projects/fusion/).

## Results

### Methods overview

We developed CONTENT, a method for generating genetic predictors of gene expression across contexts for use in downstream applications such as TWAS. Briefly, for each individual, CONTENT leverages our recently developed FastGxC method16 to decompose the gene expression across C contexts into one context-shared component and C context-specific components (Fig. 1). Next, CONTENT builds genetic predictors for the shared component and each of the C context-specific components of expression using penalized regression. We refer to these predictors as the CONTENT(Shared) and CONTENT(Specific) models. In addition, CONTENT generates genetic predictors of the total expression in each context by combining the context-shared and context-specific genetic predictors with linear regression. We refer to these predictors as the CONTENT(Full) models. A given gene may have CONTENT(Specific), CONTENT(Shared), and/or CONTENT(Full) models depending on the architecture of genetic effects.

We residualized the expression of each gene in each context over their corresponding covariates (e.g. PEER factors, age, sex, batch information) prior to decomposing and then fitting an elastic net with double ten-fold cross-validation for both CONTENT(Shared) and CONTENT(Specific). We examined the number of significantly predicted genes as well as the prediction accuracy (in terms of adjusted R2) between the cross-validation-predicted and true gene expression per gene-context pair. To properly control the FDR for each method across contexts and genes, we employed a hierarchical FDR correction17,18 (Supplementary Fig. 3 and Methods). We note that groups of contexts may comprise additional sources of pleiotropy (e.g., in GTEx the group of brain tissues may have their own shared effects in addition to the overall tissue-shared effects). The decomposition of CONTENT is flexible and can account for both levels of pleiotropy among contexts (see Supplementary Methods).

### CONTENT is powerful and well-calibrated in simulated data

We evaluate the prediction accuracy of CONTENT in a series of simulations and compare its performance to the context-by-context approach3,4, which builds predictors by fitting an elastic net in each context separately, as well as UTMOST9, which builds predictors over all contexts simultaneously using a group LASSO penalty. Implicitly, we compare to the method from14 which decomposes expression into orthogonal context-shared and context-specific components, as the CONTENT(Shared) and CONTENT(Specific) models are related to these components (See Methods). We omit comparison to other TWAS methods as many of them are built on the same framework as the context-by-context approach, or require external data, such as curated DNase I hypersensitivity measurements8,10,12.

We used simulation parameters from GTEx, the largest multi-context eQTL study to-date, as a guideline. Specifically, we generated gene expression and genotype data such that context-specific genetic effects mostly lie on the same loci as context-shared eQTLs, and context-specific eQTLs without context-shared effects are rare2,16. Intuitively, this framework assumes that, most often, SNPs affect expression of a gene in all contexts, but to a different extent in each context (rather than, for example, acting as an eQTL in only a single context). We varied the proportion of contexts with context-specific heritability, the number of context-specific eQTLs without a context-shared effect, the number of causal SNPs, and the intra-individual residual correlation while keeping the number of genes (1000), contexts (20), cis-SNPs (500) and the proportion of context-shared and context-specific heritability constant (0.3 and 0.1 respectively).

Throughout our simulations, CONTENT significantly outperformed the context-by-context and UTMOST approaches in terms of prediction accuracy of the total genetic contribution to expression variability (Fig. 2A, Supplementary Fig. 4). The average increase in adjusted R2 between the true genetic component of expression and the CONTENT(Full) predictor was 0.22 over UTMOST (p < 2e-16 paired two-way t-test) and 0.48 over the context-by-context approach (p < 2e-16 paired two-way t-test). Across nearly the entirety of parameter settings, CONTENT generated context-specific components that were uncorrelated with the true context-shared components (mean adjusted R2 = 0.023, and vice versa 0.026; Fig. 2B, C). This property is central to the objective as it reduces confounding from pleiotropy in downstream applications such as context fine-mapping. As expected, the previous methods failed to disentangle the context-specific and context-shared components (Fig. 2B, C), since they were not developed with this property in mind. Our results were consistent under different values of the simulation parameters (Supplementary Figs. 58).

### CONTENT improves prediction accuracy over previous methods in the GTEx and CLUES datasets

We next evaluated CONTENT, the context-by-context approach, and UTMOST in terms of prediction accuracy and power across 22,447 genes measured in 48 tissues of 519 European individuals in the bulk RNA-seq GTEx data set2,11,13. Due to computational issues (Supplementary Fig. 9), UTMOST was examined only on 22,307 genes rather than the entire data set of 22,447 genes. We show a comparison on this smaller set of genes in Supplementary Fig. 10. We also examined, for the first time in a large-scale TWAS context, a single-cell RNAseq data set from the California Lupus Epidemiology Study (CLUES)19,20. The CLUES data set contained 9592 genes measured in 8 cell types in peripheral blood from 90 individuals.

In GTEx, CONTENT identified more gene-tissue pairs with a significantly predictable genetic component of expression (278,101 over 20,506 genes) than the context-by-context approach (195,607 over 17,723 genes) and UTMOST (167,865 over 11,442 genes) at an hFDR of 5% for all approaches. These results also held when using the traditional FDR approach within each context separately for all approaches (Supplementary Table 1 and Supplementary Fig. 11). We also compared the performance of each method on the union of genes that were significantly predicted (hFDR ≤ 5%) by at least one method. As CONTENT can generate up to three models (Shared, Specific, Full) for a given gene-tissue pair, and because each gene may have its own unique architecture (i.e. different proportions of specific or shared heritability), we selected the model that achieved the greatest cross-validated adjusted R2. CONTENT greatly outperformed the context-by-context and UTMOST approaches across all tissues (average 28% and 22% increase in adjusted R2 across tissues and genes; Fig. 3; Supplementary Figs. 10, 12). Further, for genes with significant CONTENT(Shared), CONTENT(Specific), and CONTENT(Full) predictors, prediction accuracy increases substantially with the addition of the context-specific component to the context-shared component (average gain of CONTENT(Full) over CONTENT(Shared) adj. R2 of 55.92%), emphasizing the need to extend previous approaches14 with CONTENT(Full) to build a powerful predictor.

Within the single-cell CLUES data set, CONTENT again outperformed the context-by-context (in this case, cell type-by-cell type) and UTMOST approaches, discovering 9080 heritable gene-cell type pairs (5067 genes) whereas the context-by-context model and UTMOST found 4314 (2355 genes) and 804 (288 genes) respectively. The average improvement in adjusted R2 of CONTENT over the context-by-context model was 13.6%. In gene-cell type pairs with significant CONTENT(Full), CONTENT(Specific), and CONTENT(Shared) models, CONTENT(Full) improved the adjusted R2 over CONTENT(Shared) by 104.09%. Once more, the improvement in variability explained when including both the cell type-specific and cell type-shared components highlights the need to consider both components simultaneously when building a predictor.

### CONTENT discovers significant context-specific components of expression in bulk multi-tissue and single-cell datasets

Given the ability of CONTENT to disentangle context-shared and context-specific variability, we examined the context-specific components of expression in GTEx and CLUES. In GTEx, CONTENT discovered 128,985 gene-tissue pairs (19,765 genes) with a significant context-specific genetic component of expression (Fig. 4; Supplementary Fig. 13). As with previous reports16,26, we found that testis was the tissue with the greatest number of tissue-specific genetic components. Nonetheless, we observe that the tissues with larger sample sizes more frequently had significant context-specific components. Consistent with previous works that have discovered extensive eQTL sharing across tissues2,26,27, we found that in gene-tissue pairs with a CONTENT(Full) model, the variability explained was dominated by CONTENT(Shared) model—across tissues, the context-shared component explained on average 70% of the variability explained by CONTENT(Full).

In the CLUES data set, CONTENT discovered 7466 gene-cell type pairs (4658 genes) with a significant cell type-specific component of expression (hFDR ≤ 5%). We found that all cell types had a similar number of cell type-specific components, and emphasize that the sample size across all cell types was equivalent. In genes with a CONTENT(Full) model, the variability was often dominated by the cell type-specific variability (average 75% of the explained variability), unlike GTEx, in which the average tissue-specific variability explained only 30% of total variance. Consequently, we found that within the 20,433 genes in GTEx with any genetic component, 51.50% (10,522) had a significant shared component, whereas of the 5067 genes in CLUES with a genetic component, only 14.25% (722) had a shared component. This is consistent with complex celltype heterogeneity in bulk tissues28 since there is more power to discover eQTLs with pleiotropy across the underlying cell types.

### CONTENT more accurately distinguishes disease-relevant genes than traditional TWAS approaches in simulated data

We performed a simulation study in which we evaluated the sensitivity, specificity, and power of CONTENT, UTMOST, and context-by-context to implicate the correct gene in TWAS. In our experiments, we simulated a phenotype in which 20% of the variability was composed of the genetically regulated expression of 300 randomly selected gene-context pairs (100 genes and 3 contexts each). We simulated gene expression for 1000 genes across 20 contexts as before, however, to capture a range of genetic architectures in the simulation, for each gene, we sampled from a standard uniform distribution to determine the proportion of shared variability. We varied the heritability of gene expression and considered power as a method’s ability to discover the correct genes associated with a phenotype. To compare power, we calculated the area under receiver-operating curve (AUC) using the maximum association statistic for a given gene across contexts.

Across simulations, CONTENT(Full) was the highest powered in terms of gene discovery (Fig. 5). CONTENT(Shared) performed very similarly to CONTENT(Full) in the setting with the lowest heritability, however, our simulations show the necessity for CONTENT(Full) as it substantially outperforms both CONTENT(Specific) and CONTENT(Shared) across a range of heritabilities. Moreover, CONTENT(Full) significantly outperformed both the context-by-context approach and UTMOST. Specifically, the range of percent change in AUC of CONTENT(Full) over previous methods was as follows: CONTENT(Shared) 1.9–9.9%; CONTENT(Specific) 13.6–22.4%; UTMOST 2.2–8.6%; context-by-context 1.2–10.6%. Generally, we observed that CONTENT(Full) was its most powerful for genes in which there was both shared and specific effects, UTMOST was its most powerful in settings with high sharing, and the context-by-context approach was its most powerful in settings with low sharing and high specificity of genetic effects within contexts.

### Application of CONTENT to TWAS yields additional discoveries over previous methods

We performed TWAS across 22 complex traits and diseases collected from a variety of GWAS29,30,31,32,33,34,35,36,37,38,39,40,41,42 using weights trained by CONTENT, UTMOST and the context-by-context approach on GTEx and CLUES. We passed forward weights to FUSION-TWAS3—a software that performs TWAS using GWAS summary statistics, user-specified gene expression weights, and an LD reference panel—for a gene-context pair if the pair’s expression was predicted at a nominal p-value less than 0.1 (See Methods; Supplementary Fig. 16).

Across all traits at an hFDR of 5%, CONTENT discovered a median of 51% (range of 5–178%) and 135% (51–400%) more associations (unique TWAS loci) than the context-by-context approach and UTMOST respectively with GTEx weights, and 62% (0–289%) and 101% (47–600%) more loci than the context-by-context approach and UTMOST respectively with weights built from the CLUES dataset (Table 1). We find that, with GTEx weights, the associations implicated by the context-by-context approach had more overlap with the associations implicated by CONTENT(Specific) (median Jaccard similarity (JS) across traits = 0.419) than CONTENT(Shared) (JS = 0.234). This is consistent with our simulation results in which the context-by-context approach was most powerful in cases of high context-specificity and low context-sharing (Supplementary Figs. 14, 15). Conversely, the associations discovered by UTMOST, which leverages pleiotropy, had slightly higher overlap with CONTENT(Shared) (JS = 0.221) than CONTENT(Specific) (JS = 0.177). With CLUES weights, the context-by-context approach again had greater similarity with CONTENT(Specific) (JS = 0.291) than CONTENT(Shared) (JS = 0.098), however UTMOST discovered TWAS genes that had similar overlap between CONTENT(Shared) (JS = 0.119) and CONTENT(Specific) (JS = 0.135). As UTMOST, CONTENT, and the context-by-context approach discovered both overlapping and unique associations, we suggest that the approaches complement—rather than replace—one another.

We next compared the different CONTENT models to understand their properties in real data. With GTEx weights, CONTENT(Full) replicated an average of 98.3% and 67.3% of the associations discovered by CONTENT(Shared) and CONTENT(Specific) respectively (hFDR ≤ 5%). CONTENT(Full) replicated an average of 81.2% and 61.6% of the associations discovered by CONTENT(Shared) and CONTENT(Specific) respectively with the CLUES weights. Notably, CONTENT(Full) is the best predictor out of all the CONTENT models on average, particularly when there exist both shared and specific effects. Consequently, across all traits, the inclusion of CONTENT(Full) with CONTENT(Shared) and CONTENT(Specific) led to an average increase of 12% and 21% in the number of genes with significant TWAS associations with GTEx weights and CLUES weights respectively.

We investigated the genes implicated by CONTENT(Full) that were not significant in CONTENT(Shared) or CONTENT(Specific) and found that many of the discoveries replicated known gene-trait associations. For example, CONTENT(Full) discovered a significant association of fasting glucose levels and CAMK2 (p = 2.44e-23, brain cortex), a gene responsible for calcium signaling and regulation of hepatic glucose production43, as well as BLVRA (4.21e-06, CD8 T cell), a gene involved in insulin signaling and likely metabolic syndrome44. Furthermore, CCL2, which is thought to be involved in HDL internalization and cholesterol efflux45, was not implicated by either CONTENT(Shared) or CONTENT(Specific), but was implicated in the TWAS of HDL with CONTENT(Full) (p = 2.30e-08, small intestine terminal ileum). CONTENT(Full) also discovered a significant association of F2 (prothrombin) and primary biliary cirrhosis (PBC) (1.47e-07, liver), whereas CONTENT(Shared) and CONTENT(Specific) did not; PBC patients have been shown to have higher prothrombin times than controls46. Moreover, CONTENT(Full) discovered an association of GIT1—a gene involved with synaptic transmission and plasticity47,48—with bipolar disorder (BIP; B cell, p = 3.20e-06) as well as an association of GSDMB—a gene involved with airway remodeling and airway-hyperresponsiveness49—and asthma (CD4 T cell, p = 1.25e-20).

Moreover, the genes implicated by CONTENT but neither UTMOST nor the context-by-context approach (at an hFDR of 5%) replicated previously associated gene-trait pairs, several of which with known biological relationships to the trait of interest. Within Alzheimer’s disease, these genes included VGF50, FZD451, and TRPV6 (a transient receptor potential channel)52,53 with the GTEx weights, as well as IRF754 and GANC55 with CLUES weights. Additionally, in Crohn’s disease, CONTENT implicated the following genes, whereas previous methods did not: STAT356 and CTBP257 with GTEx weights, as well as ATG16L58 and PKAR2A59 using CLUES weights. For major depression disorder (MDD), CONTENT implicated SYN2M60 and CYB56AD161 using GTEx weights, and GAB162, TLR461 and ARL363 using CLUES weights.

As the individuals comprising the GTEx and CLUES datasets are disjoint, we also investigated whether using both datasets could highlight relevant biological genes (akin to a replication study). We first examined LDL genes and found SORT1, which alters plasma LDL levels (GTEx min. p = 2.15e-251, CLUES min. p = 2.41e-19)64,65,66. We next found an association between S100A4, S100A8, S100A10, S100A11 as well as S100A12 (part of the epidermal differentiation complex) and Eczema using both datasets (S100A10 p = 2.78e-41, p = 2.90e-11)67,68. Additionally, when we looked at discoveries made with GTEx and CLUES weights for Alzheimer’s disease, we found MARK4 (p = 8.72e-20, p = 6.39e-63), a gene associated with tau phosphyrlation in granulovacuolar degeneration bodies69. Finally, both sets of weights produced a significant association of immune checkpoint gene CTLA4 (p = 1.71e-11, p = 2.28e-21) with Rheumatoid Arthritis70.

While CONTENT discovered substantially more loci and genes than previous approaches, we also wished to verify that it does not enrich for false positives. To do so, we performed an analysis similar to one carried out by Ndungu et al.71. Briefly, Ndungu et al. evaluated the extent to which TWAS associations may be driven by horizontal pleiotropy or linkage disequilibrium by examining TWAS associations for a set of genes with a known causal relationship to a set of metabolites. In our analyses, we examined the within-locus (±1 Mb) rank of the causal TWAS gene with its suspected metabolite when using weights built by CONTENT and the context-by-context approach on the GTEx dataset. To order genes within a method, we first filtered for statistically significant gene-context-metabolite associations, then sorted genes by their maximum absolute TWAS association statistic between a given metabolite across contexts (and models for CONTENT). In line with our applications of TWAS to GTEx and CLUES, CONTENT discovered additional loci that were not discovered by the context-by-context approach (39 compared to 36 of 58 known gene-metabolite pairs; Supplementary Table 3). Moreover, despite having more models built per locus, CONTENT ranked the known causal gene similarly to the context-by-context approach on the intersection of gene-metabolite pairs discovered by both methods (CONTENT average rank of 2.257 compared to context-by-context rank of 2.371, where a ranking of 1 is ideal).

## Discussion

In this work, we introduce CONTENT, a computationally efficient and powerful method to estimate the genetic contribution to expression in multi-context studies. CONTENT can distinguish the context-shared and context-specific components of genetic variability and can account for correlated intra-individual noise across contexts. Using a range of simulation and real studies, we showed that CONTENT outperforms previous methods in terms of prediction accuracy of the total genetic contribution to expression variability in each context. We also found that when there exists a gene with a genetic component of expression, the heritability is often dominated by the context-specific effects at the single-cell level, but at the tissue level, the heritability is dominated by the context-shared effects. Finally, CONTENT was more powerful, specific, and sensitive than previous approaches in applications to TWAS.

Using weights trained by CONTENT, UTMOST and the context-by-context approach, we discovered 12,150 unique gene-trait associations through TWAS. To our knowledge, we present the first application of TWAS trained on a single-cell RNAseq dataset for a collection of 90 individuals’ PBMCs. For both the weights generated by GTEx and CLUES, CONTENT was largely more powerful than UTMOST and the context-by-context approach in TWAS. However, we emphasize that the approaches often capture genes unique to each approach. Each method may therefore complement each other and may be combined in TWAS to maximize the number of discoveries made as different methods are likely favorable under different genetic architectures. Though we show that CONTENT may be useful in fine-mapping the specific tissue relevant for a TWAS association in simulations, we note that fine-mapping to the correct tissue in real data is a particularly difficult task. For example, throughout this manuscript, we assume that the causal tissue is included in the measured tissues, however, when this is not the case, CONTENT and all TWAS approaches may associate an incorrect, correlated tissue. For example, in the case of chronic kidney disease, CONTENT implicated GATM—a gene thought to be involved with kidney disease and GFR levels72,73,74—however, the significant association was within the thyroid. This may be due to the fact that kidney expression is not measured in this version of the GTEx dataset. Future work may explore using the CONTENT-trained weights and jointly fitting all TWAS Z scores, or otherwise accounting for missingness.

We also leveraged recently developed methodology for controlling the false discovery rate when summarizing significantly predicted genes, gene-contexts, and TWAS associations17,18. This approach has been shown to effectively control the FDR across contexts in eQTL studies, and to our knowledge, it is the first time such an approach has been used to effectively control the FDR when predicting expression values and when making discoveries using TWAS. While our analyses focused on the comparison of CONTENT, UTMOST, and the context-by-context approach, we emphasize that by using this type of false discovery correction, all methods can be used in combination with one another, rather than in replacement of one another. For downstream analyses, combining all prediction methods is crucial, as certain genes or gene-context pairs may be (better) predicted by one method and not others. In the GTEx data for example, when we included models built by UTMOST and the context-by-context approach to the correction scheme for CONTENT, the number of genes for which there was a significant model for a given tissue increased on average by 7.56%.

Importantly, neither UTMOST nor the context-by-context method distinguishes the context-specific and context-shared components of genetic effects on expression. Implicitly, by modeling all contexts independently, the context-by-context fit is best-suited for cases in which there is no effect-sharing across contexts (Supplementary Fig. 12). As UTMOST considers all contexts simultaneously, its power is maximized in cases where the genetic effects are mostly shared (Supplementary Fig. 12). Additionally, these methods do not account for the shared correlated residuals between samples, thus they do not maximize their predictive power.

While a previous approach proposed by Wheeler et al.14 does model the correlated intra-individual noise, CONTENT offers several advantages. The previous decomposition does not include an option to leverage both the context-shared and context-specific components of expression to form a final predictor of the observed expression for a given context. We show that this is especially crucial in the context of single-cell data wherein the prediction accuracy for a given gene-context increases drastically when using both components (Fig. 3). Further, without properly combining both components (e.g. via regression), the context-specific genotype-expression weights produced by the previous decomposition may have the incorrect sign, as they are considered residuals of the context-shared component and are not properly re-calibrated to the observed expression. Unlike the decomposition proposed by CONTENT, this previous approach also does not intuitively allow for additional sources of pleiotropy or effects-sharing (see Supplementary Information for discussion of brain-level sharing in GTEx; Supplementary Figs. 1720 and Table 4). Finally, the decomposition used in the previous method is based on a linear mixed model fit on a per-gene basis, and is therefore less computationally efficient.

Notably, a limitation of TWAS methods in general is interpretability, as associations may be confounded by linkage disequilibrium or horizontal pleiotropy71,75. We emphasize that CONTENT discovered substantially more independent loci than previous methods, however, since CONTENT is more powerful than previous methods, it may build more models within a given locus relative to previous approaches. We performed a brief set of analyses in line with Ndungu et al.71, in which we evaluated the ability of TWAS approaches to associate the suspected causal gene to a collection of metabolites. Despite CONTENT building more models than the context-by-context approach, it prioritized suspected genes the same as or better than the context-by-context approach in addition to discovering several more loci than did the context-by-context approach (Supplementary Table 3). We therefore conclude that, similarly to GWAS fine-mapping studies, resolution of downstream TWAS fine-mapping methods (e.g., FOCUS75) should increase with the use of our models, as our gain in performance is akin to that expected from an increase in sample size. Moreover, since CONTENT discovers additional loci over previous approaches, it undoubtedly will present additional useful information for such studies.

In this manuscript we focused on prediction of the total genetic contribution to expression as well as the context-shared and context-specific components of expression. Nonetheless, future work using the methodology presented here can be extended to a wide variety of problems. Primarily, the decomposition can be used to efficiently estimate Gene × Context heritability using existing software for heritability estimation, e.g., GCTA76, on the decomposed components offering computational speed up over existing methods for cross-context heritability estimation27. Additionally, the decomposed components from CONTENT may also be included in previous approaches, e.g. UTMOST, to gain further power. Further, by training each method on the single-cell level data, we offer researchers the means to pursue their own association analyses at a lower level of granularity than was previously available.

Notably, we found that single-cell data may have lower levels of effects-sharing than tissue-level data. While this may be due to genuine biological differences in genetic regulation, this finding is also consistent with a large degree of sharing of cell types across contexts. For example, endothelial cells can be found in tissues such as breast, endometrium, esophagus, eye, heart muscle, liver, lung, ovary, pancreas, placenta, prostate, skeletal muscle, and skin and often make up a substantial fraction of the collected tissue77,78. We believe our work is consistent with this observation: Primarily, the proportion of eGenes (genes with a heritable component of expression) that also have a shared component is substantially lower at the single cell level compared to the proportion at the bulk, tissue level. What’s more is that the ability to discover context-specific components of expression is indeed related to sample size in the GTEx dataset. Despite the above, and having a lower number of individuals in the single-cell data, we discover a greater proportion of genes with a context-specific component than in GTEx. Further, when there exists a CONTENT(Full) model, it is dominated by the specific variability at the single-cell level, whereas it is dominated by the shared variability at the tissue level. Nonetheless, as this finding, to our knowledge, was previously unappreciated, it warrants further investigation.

In summary, we present an approach for generating context-shared and context-specific predictors that is much simpler than previous approaches14,16. Moreover, unlike previous methods, we offer a way to combine both predictors, as well as extend the decomposition to additional levels of pleiotropy. Finally, we show utility of existing hierarchical FDR correction methods to properly adjust for analyses that take advantage of multiple methods as well as investigate genes in the space of multiple contexts. The increased prediction accuracy, specificity, computational speed, and hierarchical testing framework of CONTENT will be paramount to unveiling context-specific effects on disease as well as uncovering the mechanisms of context-specific genetic regulation.

## Methods

### An overview of the CONTENT model

In this section, we detail the assumed generative model and objectives of CONTENT. CONTENT is based on the methodology and decomposition of a previous work by Lu et al. FastGxC16. In brief, like FastGxC, we assume that the expression of an individual in a given gene and context is a combination of a context-shared genetic component that is shared across different contexts and a context-specific genetic component that is specific to a context, that is

$${E}_{c}={E}_{G}^{{{{{{{{\rm{Shared}}}}}}}}}+{E}_{G,c}^{{{{{{{{\rm{Specific}}}}}}}}}+{\varepsilon }_{c}$$
(1)
$${E}_{G}^{{{{{{{{\rm{Shared}}}}}}}}}={{{{{{{\bf{g}}}}}}}}{{{{{{{\boldsymbol{\beta }}}}}}}}$$
(2)
$${E}_{G,c}^{{{{{{{{\rm{Specific}}}}}}}}}={{{{{{{\bf{g}}}}}}}}{{{{{{{{\boldsymbol{\gamma }}}}}}}}}_{c}$$
(3)

where Ec denotes the expression of the individual at the gene in context c, $${E}_{G}^{{{{{{{{\rm{Shared}}}}}}}}}$$ and $${E}_{G,c}^{{{{{{{{\rm{Shared}}}}}}}}}$$ denote the components of the expression due to context-shared and context-specific genetic effects respectively, β and γc represent the context-shared and context-specific cis-genetic effects respectively, g the individual’s cis-genotypes and $${\varepsilon }_{c} \sim N(0,{\sigma }_{c}^{2})$$ represents the environmental effects (and non-cis-genetic effects) on the individual’s gene expression.

The objective of CONTENT is to build a genetic predictor of context-specific phenotypes. While previous work has focused on building powerful genetic models for Ec, we aim to build unbiased models that partition and estimate the context-shared gβ and context-specific terms gγc. Specifically, we aim to maximize the power to detect the context-specific terms, allowing some leniency in the accuracy of context-shared terms, as we are interested in context-specific effects. Moreover, as a context-specific predictor can be used in downstream analyses to identify the specific context(s) through which genetic variation manifests its effect on the phenotype and disease risk, we also aim to minimize the correlation between the predicted context-specific component and the true context-shared component. Finally, our method must account for the correlated intra-individual noise across contexts, and do so in a computationally efficient manner.

### Decomposing multilevel data

Many genomic datasets, such as those of GTEx, have a multilevel nature; first the individuals are sampled, and second an individual is measured in each context. To take the multilevel structure of the data into account, the observed expression on gene j can be decomposed into an offset term, a between-individual component and a within-individual component79. That is, if Eijc denotes the observed expression level for individual i (i = 1, …, I) on gene j (j = 1, …, J) and context c (c = 1, …, C), Eijc can be decomposed as

$${E}_{ijc}={E}_{.j.}+({E}_{ij.}-{E}_{.j.})+({E}_{ijc}-{E}_{ij.})$$
(4)

where $${E}_{.j.}=\frac{1}{I\times C}\mathop{\sum }\nolimits_{i=1}^{I}\mathop{\sum }\nolimits_{c=1}^{C}{E}_{ijc}$$ the mean expression of gene j computed over all (I) individuals and all (C) contexts, and $${E}_{ij.}=\frac{1}{C}\mathop{\sum }\nolimits_{c=1}^{C}{E}_{ijc}$$ the mean expression of individual i on gene j, computed over all contexts. In (4), E.j. is a term that is constant across individuals and contexts for each gene, (Eij. − E.j.) is the between-individuals deviation, and (Eijc − Eij.) is the within-individual deviation of the expression on gene j in context c.

Variables that differ between but not within individuals, e.g. sex and genotype, will have an effect on (Eij. − E.j.) but not on (Eijc − Eij.). On the other hand, variables that change within individuals but are the same between individuals, e.g. the genetic effect on a specific context, will have an effect on (Eijc − Eij.) but not on (Eij. − E.j.).

In the context of estimation, we first center and scale the expression of gene j in each context c, i.e., $$\frac{1}{I}\mathop{\sum }\nolimits_{i=1}^{I}{E}_{ijc}=0$$ and $$\frac{1}{I}\mathop{\sum }\nolimits_{i=1}^{I}{E}_{ijc}^{2}=1$$. Therefore, $${E}_{.j.}=\frac{1}{I\times C}\mathop{\sum }\nolimits_{i=1}^{I}\mathop{\sum }\nolimits_{c=1}^{C}{E}_{ijc}=0$$, and equation (4) simplifies to:

$${E}_{ijc}=\underbrace{{E}_{ij.}}_{{E}_{ij}^{{{{{{{{\rm{Shared}}}}}}}}}}+\underbrace{\left({E}_{ijc}-{E}_{ij.}\right)}_{{E}_{ijc}^{{{{{{{{\rm{Specific}}}}}}}}}}$$
(5)

### A formal description of CONTENT

We use the simplified decomposition in equation (5) to build genetic predictors of context-specific effects while accounting for the correlated intra-individual noise across contexts. Intuitively, the between-individuals variability serves as the component of expression that is shared across contexts, EShared, and the deviance from this shared component (i.e. the within-individual variability) serves as the context-specific component of expression, ESpecific. Moreover, treating the context-specific component as a deviance from the context-shared component leads the decomposition to have the property that as the correlation of intra-individual noise across contexts increases, the power to detect context-specificity also increases. In addition, the decomposition generates context-shared and context-specific components of expression that are orthogonal to each other. Further rationale for using the decomposed expression is included Supplementary Note 1 and the text by Lu et al.16. Lu et al. also include a description of the decomposition’s equivalence to a linear mixed model.

For a single gene j, CONTENT takes as input centered, scaled, and residualized (over a set of covariates) expression measured across I individuals in C contexts and an I × m genotype matrix Gj with m measured cis-SNPs for gene j. CONTENT then decomposes the expression vectors into C context-specific components and a single context-shared component by simply calculating the mean of expression for each individual across contexts, and setting the context-specific expression for context c as the difference between the observed expression of context c and the calculated context-shared (mean) expression. As it has been observed that cis-genetic effects may be sparse and that the elastic net may perform best relative to other penalized linear models in the context of genetically regulated gene-expression4,14, CONTENT fits C + 1 penalized linear models for the C + 1 expression components using an elastic net. Lastly, CONTENT generates a final genetic predictor of expression by combining the context-shared and context-specific components. Importantly, as the context-specific component is a deviance from the context-shared component, the sign of the context-specific component must be properly realigned when combining both components of expression to make a final predictor. We refer to this linear combination of expression components as the “full” model of CONTENT and fit it using a simple linear regression:

1. 1.

Obtain $${{{{{{{{\bf{E}}}}}}}}}_{j}^{{{{{{{{\rm{Shared}}}}}}}}}$$ and $${{{{{{{{\bf{E}}}}}}}}}_{jc}^{{{{{{{{\rm{Specific}}}}}}}}}$$ from the decomposition across all individuals.

2. 2.

Generate cis-genetic predictors of each component using cross-validated elastic net:

1. (a)

Fit cross-validated elastic net regressions for the shared and specific components:

$${{{{{{{{\bf{E}}}}}}}}}_{j}^{{{{{{{{\rm{Shared}}}}}}}}}={{{{{{{{\boldsymbol{\alpha }}}}}}}}}^{{{{{{{{\rm{Shared}}}}}}}}}+{{{{{{{{\bf{G}}}}}}}}}_{j}{{{{{{{\boldsymbol{\beta }}}}}}}}+{{{{{{{{\boldsymbol{\varepsilon }}}}}}}}}^{{{{{{{{\rm{Shared}}}}}}}}}$$
(6)
$${{{{{{{{\bf{E}}}}}}}}}_{jc}^{{{{{{{{\rm{Specific}}}}}}}}}={{{{{{{{\boldsymbol{\alpha }}}}}}}}}_{c}^{{{{{{{{\rm{Specific}}}}}}}}}+{{{{{{{{\bf{G}}}}}}}}}_{j}{{{{{{{{\boldsymbol{\gamma }}}}}}}}}_{c}+{{{{{{{{\boldsymbol{\varepsilon }}}}}}}}}_{c}^{{{{{{{{\rm{Specific}}}}}}}}}$$
(7)
1. (a)

Use the estimates to generate genetic predictors of each component:

$${\hat{{{{{{{{\bf{E}}}}}}}}}}_{jG}^{{{{{{{{\rm{Shared}}}}}}}}}={\hat{{{{{{{{\boldsymbol{\alpha }}}}}}}}}}^{{{{{{{{\rm{Shared}}}}}}}}}+{{{{{{{{\bf{G}}}}}}}}}_{j}\hat{{{{{{{{\boldsymbol{\beta }}}}}}}}}$$
(8)
$${\hat{{{{{{{{\bf{E}}}}}}}}}}_{jcG}^{{{{{{{{\rm{Specific}}}}}}}}}={\hat{{{{{{{{\boldsymbol{\alpha }}}}}}}}}}_{c}^{{{{{{{{\rm{Specific}}}}}}}}}+{{{{{{{{\bf{G}}}}}}}}}_{j}{\hat{{{{{{{{\boldsymbol{\gamma }}}}}}}}}}_{c}$$
(9)
1. 3.

Regress the expression of context c onto the context-shared and context-specific components:

$${{{{{{{{\bf{E}}}}}}}}}_{jc}={{{{{{{{\boldsymbol{\alpha }}}}}}}}}_{c}^{{{{{{{{\rm{Full}}}}}}}}}+{\hat{{{{{{{{\bf{E}}}}}}}}}}_{jG}^{{{{{{{{\rm{Sh.}}}}}}}}}{w}_{jc}^{{{{{{{{\rm{Sh.}}}}}}}}}+{\hat{{{{{{{{\bf{E}}}}}}}}}}_{jcG}^{{{{{{{{\rm{Sp.}}}}}}}}}{w}_{jc}^{{{{{{{{\rm{Sp.}}}}}}}}}+{{{{{{{{\boldsymbol{\varepsilon }}}}}}}}}_{jc}$$
(10)

Where “Sh.” and “Sp.” indicate “shared” and “specific” respectively, α represents the offset within each regression„ and all ε are assumed to be from a normal distribution with mean 0 and standard deviation that is a function of the given outcome.

We save for each gene the set of estimated regression weights $${\hat{w}}_{jc}^{{{{{{{{\rm{Shared}}}}}}}}}$$ and $${\hat{w}}_{jc}^{{{{{{{{\rm{Specific}}}}}}}}}$$ from equation (10) for use in downstream analyses. Namely, in TWAS, each context receives a single vector of weights, and to test the association of a gene-context’s full model to a trait, we simply use a weighted sum of the predictors learned from equation (3), $${\hat{w}}_{jc}^{{{{{{{{\rm{Sh.}}}}}}}}}\hat{{{{{{{{\boldsymbol{\beta }}}}}}}}}+{\hat{w}}_{jc}^{{{{{{{{\rm{Sp.}}}}}}}}}{\hat{{{{{{{{\boldsymbol{\gamma }}}}}}}}}}_{c}$$. We also use the same procedure for the context-specific weight to ensure the correct directionality. To test for significance of genetic effects (i.e. to call an eGene or eAssociation), we correlate each component of expression—the context-shared, context-specific, and full—to its corresponding genetically predicted value. More concretely, we perform a likelihood ratio test with one degree of freedom for the specific and shared models, and a likelihood ratio test with two degrees of freedom for the full model (the null model for all tests contains no genetic predictors of expression).

### Controlling the false discovery rate across contexts

Generally, methods for building genetic predictors of expression or TWAS predictors leverage either Bonferroni correction or false discovery rate (FDR). Nonetheless, using a Bonferroni correction may be too stringent (for example, as tests across contexts may be correlated), and using FDR within each context or across all contexts simultaneously may lead to an inflation or deflation to the false disovery proportion within certain contexts17. To simultaneously control the FDR across all contexts at once, a hierarchical false discovery correction—treeQTL—was developed17. The treeQTL procedure leverages the hierarchical structure of a collection of tests (e.g. gene level and gene-context level) to properly control the FDR across an arbitrary number of contexts and levels in the hierarchy as well as boost power in cases where a gene has a significant association in multiple contexts6,17,18. (See Supplementary Methods for further intuition.)

Notably, using CONTENT, our testing hierarchy contains 3 levels; (1) at the level of the gene, (2) at the level of the context, and (3) at the level of the method or model (Supplementary Fig. 3). Intuitively, a gene may contain a genetic component that is shared across all contexts, or a given context may have its own genetic architecture. In CONTENT, a given context may have its own genetic predictor from either the context-specific component or the full model. Using treeQTL with this structure is robust across multiple contexts, and since the tree is structured such that a specific method/model is at the final level of testing for a context, it enables incorporation of additional models trained from other approaches (such as those fit on a context-by-context basis or by UTMOST). Moreover, we can add to the shared leaf an additional level of tests to account for additional components of effects-sharing, such as a brain tissue-shared component.

### Comparison to other methods

We compared the prediction accuracy of CONTENT to a context-by-context TWAS model3,4 in which the expression of each context is modeled separately, and to UTMOST9, a method that jointly learns the genetic effects on all contexts simultaneously. Specifically the model based on TWAS fits a penalized linear model for each context. UTMOST, on the other hand, employs a group LASSO penalty across all contexts simultaneously, allowing it to gain power over the context-by-context approach by considering all individuals and contexts in a study at once. As we were we able to use a fast R package for penalized regression80, we used 10-fold cross-validation to fit the context-by-context model. Owing to UTMOST’s computational intensity, we used its default value of 5 folds for cross-validation.

We also compared CONTENT to a previous approach by Wheeler et al., orthogonal tissue decomposition (OTD)14. OTD is a direct correlate of CONTENT(Shared) and CONTENT(Specific), and is generated by fitting a mixed effects model across all contexts for a given individual. Namely, a mixed effects model is fit as follows: an individual’s expression across all tissues is set as the outcome, the shared expression is modeled as a random individual-level intercept and is estimated using the posterior mean, and the specific expression is treated as the residuals from the fit model (after adjusting for covariates). Under infinite sample sizes, the components of OTD are equivalent to CONTENT(Shared) and CONTENT(Specific).

### Evaluations on GTEx and CLUES

We residualized the expression of each gene in each context over their corresponding covariates (e.g., PEER factors, age, sex, batch information) prior to fitting UTMOST and an elastic-net model for each context in the context-by-context approach. We did the same residualization before decomposing and then fitting the context-shared and context-specific components with an elastic net for CONTENT. After generating cross-validated predictors for each method, we examined the number of significantly predicted genes as well as the prediction accuracy (in terms of adjusted R2) between the cross-validation-predicted and true gene expression per gene-context pair.

To properly control the false discovery proportion at .05 across-contexts and within-methods, we employed a hierarchical FDR correction17,18 separately for CONTENT, UTMOST, and the context-by-context approaches. Notably, using this correction for all methods provides a generous comparison to previous methods, as when there exists at least one significantly heritable gene-context association for a given gene, there is a relative gain in power over the context-by-context FDR for other contexts tested within this gene17,18.

### Application to TWAS

We performed transcription-wide association studies across 24 phenotypes using FUSION-TWAS3. FUSION-TWAS uses GWAS summary statistics and user-specified gene expression weights with an LD reference panel to perform the test of association between genetically predicted gene expression and a phenotype of interest. We tested a gene-context pair for association if the pair’s expression was predicted at a nominal p-value of .1, and note that this threshold does not substantially alter the number of TWAS discoveries (Supplementary Fig. 16). Notably, previous methods may use their own test of gene-context-trait association or leverage set tests (e.g. Berk Jones9) to combine their associations across all contexts for a given gene and therefore increase power. In this comparison, we report the association as output by FUSION (a single gene-context-trait association) and corrected by hierarchical false discovery without any sort of set test for the sake of equality in the comparison. We ran FUSION-TWAS using the default recommended settings, with reference data from the 1000 genomes project81. TWAS weights were trained on the GTEx v7 dataset2 as well as the CLUES20 single-cell RNAseq dataset of PBMCs. For a given gene-context-trio, we ran up to 5 TWAS—1) context-by-context, 2) UTMOST, 3) CONTENT(Shared), 4) CONTENT(Specific), and 5) CONTENT(Full). Notably, we re-trained each methods’ predictors on genetic variants that are present in the LDREF cohort as well as GTEx or CLUES to ensure selected expression weights had overlap with the reference panel (LDREF).

### Simulations to evaluate prediction accuracy

To evaluate the properties of our method relative to other methods we perform a series of simulation experiments. We first simulate genotypes for each individual, where each individual i and each locus m (m = 1: M) is independent, and there are no rare SNPs:

$${{{{{{{{\bf{G}}}}}}}}}_{im} \sim \,{{\mbox{Bin}}}\,(2,\,{{\mbox{Unif}}}\,[0.05,0.50])$$
(11)

We then draw both context-shared (βj.) and context-specific (βjc) effect sizes for each SNP from a normal distribution with a Bernoulli random variable Im controlling the probability that the mth SNP is causal (i.e. induce sparsity of genetic effects).:

$${I}^{m} \sim \,{{\mbox{Bernoulli}}}\,(.05),\,\,{\beta }_{j.}^{m} \sim \,{{\mbox{N}}}\,\left(0,\frac{{h}^{2}}{M*\pi }\right)\times {I}^{m},\,\,{{\mbox{and}}}\,\,\,{\beta }_{jc}^{m} \sim \,{{\mbox{N}}}\,\left(0,\frac{{h}_{c}^{2}}{\lambda*M*\pi }\right)\times {I}_{\lambda }^{m}$$
(12)

Here, h2 and $${h}_{c}^{2}$$ are the context-shared and context-specific heritabilities of expression on gene j. In general, the SNPs with nonzero context-specific effect sizes were subsampled from SNPs with nonzero context-shared effect sizes. We additionally simulate for a subset of contexts some number of truly context-specific eQTLs drawn from Poisson(λ = 1) for randomly selected SNPs that were not eQTLs for the context-shared effects. Finally, we simulate the expression of gene j as follows:

$${{{{{{{{\bf{E}}}}}}}}}_{jc}={{{{{{{{\bf{G}}}}}}}}}_{j}{{{{{{{{\boldsymbol{\beta }}}}}}}}}_{j.}+{{{{{{{{\bf{G}}}}}}}}}_{j}{{{{{{{{\boldsymbol{\beta }}}}}}}}}_{jc}+{{{{{{{{\boldsymbol{\varepsilon }}}}}}}}}_{jc}$$
(13)
$${{{{{{{\boldsymbol{\varepsilon }}}}}}}} \sim {{{{{{{\mathcal{N}}}}}}}}({{{{{{{\bf{0}}}}}}}},\;{{{{{{{\boldsymbol{\Sigma }}}}}}}}),\,\,\,{{{{{{{\boldsymbol{\Sigma }}}}}}}}\in {{\mathbb{R}}}^{C\times C}=\left[\begin{array}{ccc}{\sigma }_{1}^{2}&\ldots &{\sigma }_{1,C}\\ \vdots &\ddots &\vdots \\ {\sigma }_{C,1}&\ldots &{\sigma }_{C}^{2}\end{array}\right]$$
(14)

where $${{{{{{{\boldsymbol{\varepsilon }}}}}}}}\in {{\mathbb{R}}}^{I}$$, represents the correlation of environment or intra-individual noise across contexts, $${\sigma }_{c}^{2}=1-{h}^{2}-{h}_{c}^{2}$$ is the variances of each context c, and $${\sigma }_{{c}_{1},{c}_{2}}={\rho }_{{c}_{1},{c}_{2}}{\sigma }_{{c}_{1}}{\sigma }_{{c}_{2}}$$ is the covariance of context c1 and c2. We generated data under varying levels of context-specific heritability, truly context-specific eQTLs, causal SNPs, and correlation of intra-individual noise across contexts. The number of contexts was set to 20, and to replicate a setting similar to GTEx, the corresponding sample sizes of each ranged from 75 to 410 where individuals were not necessarily measured in every context. In our simulations, we generated one train and one test data set using the above framework. We evaluated the performance of each method by comparing the true and predicted expression in the test data set, using the predictor learned from the training data set.

To assess the effect of additional sharing on a subset of contexts, we also set up a simulation framework using the same generative process as above, only that a subset of contexts also received additional genetic effects. More rigorously, for this subset of contexts (acting as brain contexts in GTEx, for example), expression was generated as in equation (6) with an additional term:

$${{{{{{{{\bf{E}}}}}}}}}_{jc}={{{{{{{{\bf{G}}}}}}}}}_{j}{{{{{{{{\boldsymbol{\beta }}}}}}}}}_{j.}+{{{{{{{{\bf{G}}}}}}}}}_{j}{{{{{{{{\boldsymbol{\beta }}}}}}}}}_{jc}+{{{{{{{{\bf{G}}}}}}}}}_{j}{{{{{{{{\boldsymbol{\beta }}}}}}}}}_{j\dot{b}}+{{{{{{{{\boldsymbol{\varepsilon }}}}}}}}}_{jc},\quad {\beta }_{j\dot{b}}^{m} \sim \,{{\mbox{N}}}\,\left(0,\frac{{h}_{\dot{b}}^{2}}{\lambda*M*\pi }\right)\times {I}_{\lambda }^{m}$$
(15)

where each variable is simulated as before, $${\beta }_{j\dot{b}}^{m}$$ corresponds to additional genetic effects that are subsampled from SNPs that have a context-shared effect, and $${h}_{\dot{b}}^{2}$$ is the brain-shared heritability.

### Simulations of TWAS performance

Using the above generated genotypes and gene expression, we simulated phenotypes to evaluate the performance of each method under the assumed model in TWAS. For a given phenotype, we randomly selected 300 gene-context pairs (100 genes, 3 contexts each) whose expression would comprise a portion of a phenotype. Explicitly, we generated a phenotype as follows:

$${y}_{i}={E}_{i}\delta+\varepsilon \quad \delta \sim N\left(0,\frac{{\sigma }_{ge}^{2}}{300}\right),\,\,{\varepsilon }_{i} \sim N\left(0,1-\frac{{\sigma }_{ge}^{2}}{300}\right)$$
(16)

Where Ei is the standardized genetic expression of the 300 gene-context pairs for individual i, δ is the length-300 vector of effect sizes for each gene-contexts’ expression, $${\sigma }_{ge}^{2}$$ is the variance in the phenotype yi due to cis-genetic gene expression, and εi corresponds to environmental effects (or noise) as well as trans-genetic effects for individual i. In our simulations, we varied the heritability of gene expression and fixed variability in the phenotype due to genetic gene expression to 0.2. To simulate a wide range of genetic architectures, the proportion of heritability of gene expression due to the context-shared effects was sampled from a standard uniform distribution, and the proportion of heritability due to context-specific effects was (1- the context-shared proportion). Once we generated a phenotype, we performed a TWAS using weights output from each method by imputing expression into a simulated external, independent set of 10000 genotypes that followed the same generation process as in the previous subsection.

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.