Genetic effects on gene expression across human tissues

Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.


Study design
The GTEx project has created a reference resource of gene expression levels from 'normal' , non-diseased tissues. Every tissue sample was examined histologically; the sample was accepted for the project if the tissue was non-diseased and in the normal range for the age of the donor. RNA was isolated from postmortem samples in an ongoing manner as donors were enrolled into the study. For this data release, 44 sampled regions or cell lines were considered, each from at least 70 donors, and thereby considered suitable for eQTL analysis: 31 solidorgan tissues, 10 brain subregions including duplicates of two regions Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.
(cortex and cerebellum), whole blood, and two cell lines derived from donor blood and skin samples. We hereafter refer to these tissues, regions, and cell lines as the 'tissues' used in eQTL analysis. A total of 7,051 samples from 449 donors represent the GTEx v6p analysis freeze ( Fig. 1a; Supplementary Information 1-5; Supplementary Figs 1-6; Supplementary Tables 1-10). RNA sequencing (RNA-seq) samples were sequenced to a median depth of 78 million reads. This is 4.3 times more samples than reported in the GTEx pilot phase 10 . DNA was genotyped at 2.2 million sites and imputed to 12.5 million sites (11.5 million autosomal and 1 million X chromosome sites) using the multi-ethnic reference panel from 1000 Genomes Project Phase 1 v3 11 . Sampled donors were 83.7% European American and 15.1% African American. Whole-genome sequencing was performed for 148 donors to a mean coverage greater than 30× , and all donors were exome-sequenced to a mean coverage over captured exons of 80× . The resulting data provide the deepest survey of individual-and tissuespecific gene expression to date, enabling a comprehensive view of the impact of genetic variation on gene expression levels. All data are available from dbGaP (accession phs000424.v6.p1) with multiple data views publicly available from the GTEx Portal (www.gtexportal.org).
likely to be expressed at low levels or loss-of-function intolerant and were enriched for functions related to development and environmental response, indicating specific contexts in which additional eQTLs may be identified (Extended Data Fig. 3). To identify cis-eGenes affected by more than one functional regulatory variant, we applied forward-backward stepwise regression (see Methods). This approach identified an additional 24,886 secondary cis-eQTLs, with 41.2% of protein-coding genes and 24.8% of lincRNAs having multiple, conditionally independent eVariants in at least one tissue ( Supplementary Fig. 11).
To identify trans-eQTLs, we tested for association between every protein-coding or lincRNA gene and all autosomal variants where the gene and variant were on different chromosomes. To minimize false positives in trans-eQTL detection, we controlled for the same observed and inferred confounders as in the cis-eQTL analysis, and further removed genes with poor mappability, variants in repetitive regions, and trans-eQTLs between pairs of genomic loci with evidence of RNA-seq read cross-mapping due to sequence similarity. Applying this approach, we identified 673 trans-eQTLs at a 10% genome-wide FDR. This includes 112 distinct loci (R 2 ≤ 0.2) and 93 unique genes (94 total gene associations, including a trans-eGene detected in both testis and thyroid) in 16 tissues (Extended Data Table 1, Extended Data  Table 11). An alternative approach to quantifying FDR at the gene level (Supplementary Information 8 and Supplementary  Table 12) identified 46 genes at 10% FDR, with estimated q values of less than 0.4 for all 94 gene associations identified using the genomewide FDR 16 . By investigating long-range intra-chromosomal eQTLs (≥ 5 Mb from the TSS), we discovered an additional 33 eGenes (10% FDR; Extended Data Table 2 and Supplementary Information 9). We found decaying support for cis-regulation (or interaction between cisand trans-effects) over increased genomic distances based on evidence of allelic effects (Extended Data Fig. 4). Evidence of cis-regulation fell below background levels between 0.85 and 1.3 Mb from the TSS, empirically supporting the conventional distance threshold of 1 Mb for cis-eQTL detection.
As expected, sample size greatly affects eQTL mapping. Discovery of eGenes increased with sample size with no evidence of saturation at the full sample size for each tissue (Fig. 1c). The tissue with the highest number of identified cis-eGenes was tibial nerve, with 8,087 eGenes in 256 samples. Testis had the most trans-eGenes, with 35 eGenes in 157 samples (Fig. 1d), consistent with the elevated number of expressed genes (16,853 protein-coding genes and 4,362 lincRNA genes) and cis-eGenes (6,796 genes). Continued discovery of eGenes with increasing Not sun-exposed skin  Figure 1 | Sample size and eGene discovery in the GTEx v6p study. a, Illustration of the 44 tissues and cell lines included in the GTEx v6p project with the associated number of cis-(left) and trans-eGenes (right) and sample sizes. Each tissue has a unique colour code (defined in Supplementary Fig. 5). b, Fraction of genes that are eGenes across all tissues by transcript class. The three tissues highlighted are: testis, which has the highest proportion of trans-eGenes; skeletal muscle, which has the largest sample size; and fibroblasts, which have the highest proportion of cis-eGenes. Dark bars depict the fraction of all curated human genes in GENCODE v19. Light bars depict the fraction of genes expressed in one or more tissues. c, Proportion of expressed genes that are cis-eGenes (y-axis) as a function of tissue sample size (x-axis). Colours represent tissues, as in a. d, Number of trans-eQTLs (x-axis) per tissue (y-axis), with sample size indicated by point size.

Article reSeArcH
sample size suggests that the expression of nearly all genes is influenced by genetic variation (Extended Data Fig. 5a, b). We further observed that, for sub-sampled data ranging from 70 to 250 donors, sample size was a more significant contributor than additional tissues to the discovery of novel cis-eGenes (Extended Data Fig. 5c). For trans-eQTL mapping, we used informed subsets of variants to reduce the number of tests by one to three orders of magnitude (Supplementary  Information 10 and Supplementary Table 13). We found statistical power to detect additional associations in these restricted tests, such as the test restricted to cis-eVariants. Our results indicate that ongoing increases in sample size will continue to yield additional eQTLs, both in the cis-eQTL setting, where smaller and conditionally independent effects will be identified, and in the trans-eQTL setting, where statistical power is the main limitation.

Allele-specific expression across human tissues
The effect of cis-regulatory variation can also be quantified by allelespecific expression (ASE) analyses obtained by measuring the allelic imbalance of RNA-seq reads at transcribed heterozygous sites. A largescale, multi-tissue resource of ASE estimates complements eQTL mapping by providing access to individual-specific effects, which assists in the interpretation of rare variants, somatic mutations and patterns of imprinting 8,13,14 . We measured ASE at more than 135 million sites across tissues and donors, with a median of over 10,000 genes quantified per donor (Supplementary Information 11 and Supplementary  Fig. 16). In total, 63% of all protein-coding genes could be tested for ASE in at least one donor and tissue, with 54% exhibiting significant allelic imbalance (binomial test, 5% FDR, |effect size| ≥ 1, Extended Data Fig. 6a). Overall, 88% of testable genes had significant allelic imbalance in at least one donor (binomial test, 5% FDR), demonstrating an abundance of cis-linked regulatory effects. Per donor, a median of 1,963 genes had significant allelic imbalance in at least one tissue, with a median of 570 genes where the donor was not heterozygous for a top eVariant, suggesting more complex or rarer regulatory effects at these loci.

Tissue-sharing and specificity of eQTLs
The extensive and diverse tissue sampling allowed us to develop a global view of how genetic effects vary between tissues of the human body by evaluating the sharing of eQTLs across tissues. We performed a meta-analysis across all 44 tissues for both cis-and trans-eQTLs to assess eQTL sharing between tissues. To do so, we applied Meta-Tissue 15 , a linear mixed model that allows for heterogeneity in effect sizes across tissues and controls for correlated expression measurements that result from collecting multiple tissues from the same donors. For each eQTL, we estimated the posterior probability that the effect is shared in each tissue (m value). For both cis-and trans-eQTLs, we observed patterns that reflected relationships between related tissues and concordance between cis and trans in estimates of tissue similarity (Fig. 2a, Supplementary Information 12 and Supplementary  Fig. 17). The strongest broad pattern observed was the high correlation among brain tissues (median Spearman's ρ of 0.584 (cis) and 0.241 (trans)) and among non-brain tissues (median Spearman's ρ of 0.606 (cis) and 0.165 (trans)), with much lower correlation observed between these two groups (median Spearman's ρ of 0.499 (cis) and 0.096 (trans)). Within non-brain tissues, we observed strong correlation among closely related tissues, such as arterial tissues (median Spearman's ρ of 0.743 (cis) and 0.264 (trans)), skeletal muscle and heart tissues (median Spearman's ρ of 0.672 (cis) and 0.184 (trans)), and skin tissues (Spearman's ρ of 0.804 (cis) and 0.365 (trans)). Overall, the median pairwise correlation between tissues was 0.547 (cis) and 0.138 (trans).
The patterns of sharing were also supported by replication between single-tissue cis-eQTLs, estimated by π 1 (the proportion of true positives 16 ) among the eQTLs identified in one tissue and then tested for replication in a second tissue (Extended Data Fig. 7a, median π 1 = 0.740). The patterns held even when accounting for variable number of overlapping donors among pairs of tissues in the GTEx study design (Extended Data Fig. 7b-e). cis-eQTLs exhibited a distinctly bimodal pattern of tissue sharing-they were likely to be either shared across most of the 44 tissues or specific to a small subset of tissues For each tissue, we estimated the degree of sharing (number of tissues with m > 0.9) for all eQTLs identified in that tissue at a 5% FDR. Tissues were then binned into quartiles on the basis of sample size. A higher proportion of eQTLs identified in tissues with small sample sizes have shared effects across multiple tissues compared with more deeply sampled tissues. This pattern inverts at higher sample sizes where more of the effects are tissue-specific. The median number of shared tissues is plotted for each quartile as a horizontal black line. c, Distribution of the number of tissues having Meta-Tissue m > 0.5 for the top variant for each trans-eGene at 50% FDR, and FDR-matched, randomly selected cis-eGenes (also 50% FDR). cis-eGenes were matched for discovery tissue to the trans-eGenes.
( Fig. 2b). This bimodality was further supported by three different methods: simple overlap of the single tissue eQTLs, a hierarchical multiple comparison procedure (treeQTL 17 ), and an empirical Bayes model (MT-eQTL 18 ; Extended Data Fig. 8a-c). Each method also demonstrated that cis-eQTLs discovered in tissues with larger sample sizes were more often tissue-specific; however, estimates of tissue-specificity for large sample-size tissues can be influenced by difficulty in replicating small effect-size eQTLs in tissues with fewer samples. Overall, we observed much greater tissue specificity for trans-eQTLs than a set of FDR-matched cis-eQTLs (Fig. 2c); this observation was robust to choices of m value threshold and selection criteria for matching cis-eQTLs (Extended Data Fig. 8d-g). While 3.8% of trans-eQTLs were shared across three or more tissues at m > 0.9, 25.3% of FDRmatched cis-eQTLs were shared. The extensive tissue-specificity of trans-eQTLs was also supported by a hierarchical approach for FDR control 17 , where we found no trans-eQTLs shared across more than one tissue (Extended Data Table 3). Our estimate of increased tissue specificity for trans-eQTLs agreed with the minimal sharing of trans effects reported in previous eQTL studies with fewer tissues 4,19 , and greatly exceeds what would be expected on the basis of replication between tissues for cis-eQTLs of matched minor allele frequency (MAF) and effect size (Wilcoxon rank sum test; P ≤ 2.2 × 10 −16 for all choices of replication FDR; Extended Data Fig. 8h). Given the greater tissue-specificity of trans-eQTLs, we note that heterogeneity in cellular composition of bulk tissue samples is one important confounder that may reduce power to detect trans-eQTLs, or even lead to false positive associations 6 . Despite the high tissue-specificity, we did observe a small number of tissue-shared trans-eQTLs, including rs7683255, which was moderately associated in trans with NUDT13 across most tested GTEx tissues with a consistent direction of effect (Extended Data Fig. 9a). We also found examples of trans-eQTLs shared across a subset of related tissues, such as an association between rs60413914 and RMDN3, a gene with increased expression levels in brain regions as compared to other tissue types, and for which the trans-eQTL had moderate effects in all tested brain regions but no strong effect in other tissues (Extended Data Fig. 9b, c).
Multi-tissue cis-eQTL analyses have been shown to increase power by explicitly modelling sharing patterns across tissues 15,18,20 . We did not observe an improvement in power for trans-eQTL discovery, consistent with the limited sharing observed across tissues (Extended Data Table 3). However, we did observe improvements for cis-eQTL discovery, particularly among tissues with smaller sample sizes (Extended Data Fig. 10). To ensure that these findings did not depend on the modelling assumptions of Meta-Tissue, we analysed the P values for all genes and all tissues with treeQTL, which controls the FDR of eGene discoveries across tissues 17 . This procedure identified 17,411 cis-eGenes at 5% FDR, 2,314 fewer eGenes than with the single-tissue analysis. Although this analysis is more conservative overall than the tissue-by-tissue analysis, we observed an increase in the number of eGenes detected in the tissues with the smallest sample sizes, including several brain regions, as well as an increase in the average number of tissues in which an eGene was detected (from 7.8 for single-tissue analysis to 8.3; Extended Data Fig. 10). Additional cis-eGenes identified through meta-analysis were more likely to be significant as sample size increased compared to similar numbers of eGenes identified using a less stringent singletissue FDR (Extended Data Fig. 10). This suggests that one strategy for increasing power in studies of inaccessible or sample-limited cell types would be to analyse them jointly with data from GTEx tissues.

Functional characterization of cis-eQTLs
To characterize the biological properties of multi-tissue cis-eQTLs, we annotated discovered eVariants using chromatin state predictions from 128 cell types sampled by the Roadmap Epigenomics project 2 . eVariants were enriched in predicted promoter and enhancer states across all Roadmap cell types (Fig. 3a). However, the eVariants exhibited signifi cantly greater enrichment in promoters and enhancers from matched tissues (Wilcoxon rank sum test, P ≤ 9.3 × 10 −4 , Extended Data Table 4), illustrating consistent patterns of tissue specificity for cis-regulatory elements and eQTLs (Fig. 3a). Furthermore, eQTL activity was significantly more likely to be shared across pairs of tissues when the eVariant overlaps the same chromatin state in both tissues (Wilcoxon rank sum test, P ≤ 5.0 × 10 −5 , Fig. 3b).
Integration of genomic annotations such as chromatin state has been demonstrated to improve power for eQTL discovery 8,[21][22][23] . For 26 GTEx tissues matched with cell-type specific annotations from the Roadmap Epigenomics project, we applied a Bayesian hierarchical model for eQTL discovery by incorporating variant-level genomic annotations 24 that provided a substantial boost to discovery power. Inclusion of genomic annotations (enhancers, promoters and distance to the TSS) increased the total number of cis-eQTL discoveries by an average of 43% (or 1,200 genes) per tissue (Extended Data Fig. 10f), demonstrating the considerable advantage of integrating genomic annotations into eQTL mapping models.
Conditionally independent (secondary) cis-eQTL signals were located further from the TSS (median distance 50.1 kb from the TSS, compared to 28.9 kb for primary eQTLs; Wilcoxon rank sum test, P ≤ 2.2 × 10 −16 ). However, similar to primary eVariant associations, secondary eVariants were enriched for chromosomal contact with target eGene promoters, as determined through Hi-C, compared to background variant-TSS pairs (Supplementary Information 6). This suggests that, despite their sequence-based distance from the TSS, primary and secondary eVariants are in close physical contact with their target gene promoters via chromatin looping interactions. While primary eVariants were significantly more enriched in promoters Enrichment estimated by comparing to random MAF-and distancematched variants. Stronger enrichment was observed in matched tissues (coloured dots) than in unmatched tissues (box plots). b, Proportion of eQTLs shared between two tissues (m > 0.9) if the eVariant overlaps the same Roadmap annotation in both tissues (y-axis) or different annotations (x-axis). Points represent the mean across all tissues, coloured by the discovery tissue. c, Enrichment of eVariants (y-axis) in tissue-matched enhancers (black) and promoters (grey) for the first four conditionally independent eQTLs discovered for each eGene (x-axis). d, Proportion of eVariants overlapping tissue-matched DNase I hypersensitive sites (DHS; y-axis) as a function of the probability that a variant is causal (x-axis), coloured by the eQTL discovery tissue. e, Normalized absolute eQTL effect size (x-axis) for each eVariant annotation class (y-axis). f, Median (line) and interquartile range (shading) of normalized absolute eQTL effect size (y-axis), as a function of the number of tissues in which the eGene is expressed (x-axis). Box plots depict the interquartile range (IQR), whiskers depict 1.5× IQR. OR, odds ratio.
Article reSeArcH than enhancers, secondary associations exhibit increased enhancer enrichment, consistent with their increased distance from the TSS and tissue-restricted activity (Wilcoxon rank sum test, P ≤ 2.2 × 10 −16 , Fig. 3c).
To identify causal variants that are likely to underlie cis-eQTLs, we applied two computational fine-mapping strategies 25,26 (Supplementary  Information 13 and Supplementary Figs 11, 18). First, we identified 90% credible sets (that is, the collection of variants with 90% probability of containing all causal variants) for each eGene in each tissue using CAVIAR 25 . Across all tissues, the mean credible set size was 29 variants (per tissue means ranged from 25 to 31). Second, we estimated the probability that each eVariant is a causal variant using CaVEMaN 26 . Across tissues, between 3.5% and 11.7% of top eVariants were predicted to be causal variants (causal probability P > 0.8). Consistent with variants with high causal probabilities being functional regulatory variants (as opposed to linkage disequilibrium proxies), 24.3% of eVariants with causal probabilities in the top tenth percentile (0.77 < P < 1) lay in open chromatin regions, while only 6.56% of eVariants in the lowest tenth percentile (0.0266 < P < 0.189) lay in such regions (Fig. 3d).
To determine the effect sizes of cis-eQTLs, we used allelic fold change, a method that assumes an additive model of eQTL alleles on total gene expression, allowing for interpretation of effect sizes as a fold change between alleles 27 (Supplementary Information 14). 17.4% of eGenes had cis-eQTLs with median effect sizes of at least twofold across tissues (Extended Data Fig. 11a, c). The prevalence of many ≥ twofold effects highlights the large impacts that common regulatory variants can have on gene dosage. cis-eVariants at canonical splice sites exhibited the strongest effects, followed by variants in noncoding transcripts, while variants in the 3′ UTR had the weakest effects (Fig. 3e).
Analysis of eQTL effect sizes around the TSS demonstrated that, as a group, upstream variants had the strongest effects, while those within transcripts had the weakest effects (Extended Data Fig. 11b). This suggests that eVariants that are likely to affect transcription have stronger effects on gene expression levels than variants that are likely to affect post-transcriptional regulation of mRNA levels. A notable exception is splice site and stop-gained variants, which make up a small number of total eQTLs but have large effects on expression levels (presumably owing to nonsense-mediated decay). When genes are stratified by the number of tissues in which they are expressed, the average effect size decreases as the number of tissues increases, indicating that genes expressed in greater numbers of tissues are less likely to have eQTLs with large regulatory effects (Spearman's ρ = − 0.29, P ≤ 2.2 × 10 −16 ; Fig. 3f).
ASE provides an independent measure of a cis-eVariant's effect size. We estimated the effects of the primary eVariant for each eGene by applying allelic fold change to ASE measurements (see Methods). Effect size estimates from both total and allele-specific expression approaches were highly correlated (mean Spearman's ρ = 0.82, s.d. = 2%) with an average ratio of eQTL effect sizes to ASE effect sizes of 0.937 (s.d. = 6%; Extended Data Fig. 6b, c). This observation suggests that cis-eQTLs and ASE capture the same regulatory effects.

Functional characterization of trans-eQTLs
To better understand the cellular mechanisms of trans-eQTLs, we characterized several of their functional properties. Of the 673 trans-eQTLs from the genome-wide analysis, 161 also had a cis-association (at a cis P value threshold of P ≤ 1.0 × 10 −5 ) with 113 unique variants, yielding the set of 296 unique trios of an eVariant, a cis-eGene and a trans-eGene. This suggests a common mechanism for trans regulation in which the eVariant directly regulates expression of a nearby gene whose protein product then affects other genes downstream. Considering this observation, we ran a restricted test for trans-associations, limiting variants to the set of significant cis-eVariants (Extended Data Fig. 12a). From this, we identified a total of 33 trans-eGenes (10% FDR) among this subset of tests, 14 of which were not discovered in the genome-wide analysis ( Supplementary Information 10). There were substantially more trans-eQTLs at 50% FDR from this cis-eVariant restricted test than random variants matched for MAF and distance to TSS and stratified by tissue (Cochran-Mantel-Haenszel test, P ≤ 2.2 × 10 −16 ).
We performed Mendelian randomization on the full set of 296 trans-eQTLs matched with a unique corresponding cis-eGene, measuring the causal impact of the cis-eGene on the trans-eGene, using the eVariant as the randomized instrumental variable ( Supplementary Information 15). For trans-eQTLs with a cis-eGene, we observed strong evidence for regulation of the trans-eGene expression via the cis-eGene ( Fig. 4a; P values ranging from P ≤ 3.0 × 10 −5 to P ≤ 2.2 × 10 −16 ). trans-eVariants with no cis-eGene may alter protein function, may reflect false negatives in the cis association test, or may arise from unmeasured regulatory mechanisms. Protein-coding loci were not enriched among our trans-eVariants (odds ratio 0.94; Fisher's exact test, P ≤ 0.80), suggesting that modification of protein function is not the dominant mecha nism for trans-eQTL effects.
We investigated whether trans-eVariants were each associated with numerous target genes, which may reflect broad effects of regulatory loci, as have been reported in model organisms 5,28 . Disambiguating true broad regulatory effects from artefacts remains an important challenge 29 . In our primary analysis, we applied aggressive correction of potential confounders, controlling for 15-35 probabilistic estimation of expression residuals (PEER) factors 12 capturing 59-78% of total variance in gene expression levels (Supplementary Information 5). However, PEER and related approaches 30 may also remove variance in gene expression levels arising from regulatory pathways and broad trans effects. Indeed, several loci with numerous associations were found in uncorrected data, but disappeared after controlling for PEER factors ( Supplementary Fig. 13). Associations found in uncorrected data are likely to include many false positives for three reasons: 1) the PEER factors were strongly associated with known technical confounders (Extended Data Fig. 1 and Supplementary Figs 8, 9); 2) trans-eVariants identified from raw data and lost after correction were enriched for association with technical covariates (Supplementary Fig. 14); and 3) no other parameter setting clearly optimized trans-eQTL discovery ( Supplementary Fig. 12). Even after PEER correction, we observed evidence of eVariants with multiple targets; at genome-wide significance, four separate loci were associated with more than one trans-eGene each (Supplementary Table 14).
We quantified the enrichment of trans-eVariants in promoter and enhancer regions using the same tissue-specific annotations from the Roadmap Epigenomics project 1,2 used for cis-eQTL analysis (Extended Data Table 4). trans-eVariants (10% FDR) were enriched in cell-type matched enhancers (median Fisher's exact test, P ≤ 2.2 × 10 −3 ) but not strongly enriched for promoters (median P ≤ 0.22), compared to randomly selected variants matched by distance to nearest TSS, MAF and chromosome (Fig. 4b). trans-eVariants were more enriched than cis-eVariants at matched FDR (Wilcoxon rank sum test, promoter: P ≤ 4.6 × 10 −7 ; enhancer: P ≤ 2.2 × 10 −16 ). Stronger effect sizes are needed to detect trans-eVariants at the same FDR, but even comparing to a matched number of the strongest cis-eVariants, we observed greater enrichment in enhancer (but not promoter) regions among trans-eVariants, consistent with greater tissue-specificity of enhancer activity and trans-eVariants 31 (Fig. 2c).
Given the large number of trans-eQTLs detected in testis, we investigated their possible regulatory mechanisms in more detail. Piwiinteracting RNAs (piRNAs) are small 24-31-bp RNAs that bind to piwi-class proteins and silence mobile elements by RNA degradation and DNA methylation. PiRNAs are strongly expressed in testis and may regulate gene expression and play a role in protection against transposable elements in germ line cells 32 . We tested for enrichment of trans-eVariants in piRNA clusters identified in testis 33 . We found that 38.6% of testis trans-eVariants, corresponding to 12 independent loci (R 2 ≤ 0.2), directly overlapped piRNA clusters, a significant enrichment compared to the 2.5% of the genome covered by these regions (permutation test, P ≤ 1.0 × 10 −4 ). In aggregate, eVariants from Article reSeArcH all tissues were enriched in piRNA clusters (permutation test, P ≤ 1.0 × 10 −4 ), but this appeared to be almost entirely driven by testis eQTLs (Fig. 4c). This suggests a testis-specific functional effect of genetic variation in piRNA clusters, consistent with their biological role.

Replication of eQTLs
To assess the replicability of the identified cis-eQTLs, we compared our results to four matched tissues from the TwinsUK project 34 (Supplementary Information 16). The vast majority of GTEx cis-eQTLs replicated at 5% FDR (Extended Data Fig. 13a; 84% in whole blood, 87% in subcutaneous adipose, 94% in lymphoblastoid cell lines (LCLs), and 93% in sun-protected skin). trans-eQTLs have not replicated consistently in human studies, compared to cis-eQTLs 21,35-37 , owing in part to insufficient statistical power and a limited number of studies with comparable tissues and cohorts, but also reflecting potential false positive associations. We tested trans-eQTLs discovered at 10% FDR in GTEx for replication in the TwinsUK data. Five hundred and sixty GTEx trans-eQTLs were testable in the four TwinsUK tissues and, of these, three trans-eQTLs replicated at 10% FDR (Supplementary Table 15). Despite the small number that individually replicated, in aggregate, the full set of trans-eQTLs demonstrated significantly greater replication than expected by chance (Wilcoxon rank sum test on association P values compared to uniform; P ≤ 3.05 × 10 −5 for 16 tests in matched tissues; P ≤ 2.2 × 10 −16 for 2,176 tests across all four TwinsUK tissues). In addition, aggregate replication of trans-eQTLs was significantly stronger for matched tissue types than for unmatched tissue types (Wilcoxon rank sum test; P ≤ 1.54 × 10 −4 ; Extended Data Fig. 13b).

Expression QTLs and complex disease associations
Overlaps between genome-wide association study (GWAS) associations and eQTLs have provided important insights into regulatory genes and variants for a wide range of complex traits and diseases 5 . As both the presence and extent of overlap between GWAS and eQTLs can be tissuespecific, the current phase of GTEx overcomes a major limitation in interpretation of disease variants by enabling analysis across a broad range of tissues.
We observed that the degree of tissue sharing of an eQTL is associated with several indicators of phenotypic impact. Tissue-shared eGenes are depleted from loss-of-function mutation-intolerant genes (as curated by ExAC 41 ) (Fig. 5a), consistent with purifying selection removing large-effect regulatory variants that involve many tissues. Tissue-shared eGenes were also less likely than tissue-specific eGenes to be annotated disease genes (Fisher's exact test, nominal P ≤ 10 −6 for GWAS, Online Mendelian Inheritance in Man (OMIM), and loss-of-function-intolerant gene sets; Fig. 5a, Extended Data Fig. 14), highlighting the importance of broad tissue sampling for GWAS interpretation.
This broad tissue sampling affects the use of eQTL data for the interpretation of GWAS variants. We observed that a GWAS variant of interest is likely to be a cis-eQTL by chance. Of all common variants assayed within GTEx, 92.7% are nominally associated with the expression of one or more genes in one or more tissues (P < 0.05) and nearly   Article reSeArcH 50% are significant when correcting for the number of tissues tested (Fig. 5b). Furthermore, linking an eQTL signal to a specific gene becomes increasingly complicated with abundant eQTL data. Some variants are associated with more than 30 different nearby genes (Extended Data Fig. 15a). Furthermore, even restricting to strong associations (P < 10 −10 in each tissue), for over 10% of eVariants, the gene with the strongest association varies between tissues ( Fig. 5c; Extended Data Fig. 15b, c). These results reinforce the need for caution when using eQTL data to interpret the function of GWAS variants.
To assess the extent to which GTEx cis-eQTLs are responsible for common phenotypic variation, we applied co-localization analysis to examine local linkage disequilibrium and sharing of association signals using GWAS summary statistics across 21 traits 42-44 (Supplementary Table 16). Among tested loci, 52% of trait-associated variants colocalized with an eQTL in one or more tissues (Fig. 5d, e). Importantly, no single tissue explained the majority of trait-associated loci, but the breadth of GTEx tissue sampling identified more co-localizations than any single tissue alone. Seven per cent and 93% of co-localizations are with lincRNA and protein-coding eGenes, respectively, suggesting that lincRNAs have a limited role in common disease pathogenesis. However, several findings complicate the interpretation of GWAS-eQTL overlaps. First, 26% of GWAS loci co-localize with more than one distinct eGene (that is, half of all co-localized loci). Second, GWAS co-localized eGenes are shared across an average of four tissues. Third, similar to lead eVariants, only 40% of GWAS signals co-localize with their nearest expressed gene, a finding that has important implications for the functional characterization of GWAS results.
Genetic variants associated with complex traits have been suggested to be enriched for trans-eQTLs 6,44-47 . Accordingly, we performed trans-eQTL mapping, restricting it to variants associated with a complex trait in a GWAS (Extended Data Fig. 12b). In this analysis, across the 44 tissues, we found 29 trans-eQTL associations involving 24 unique variants and 25 unique genes (10% FDR; Fig. 4a), each specific to a single tissue. There were more trans-eVariants at 50% FDR with association in at least one tissue when testing was restricted to trait-associated variants compared with random variants matched by MAF and distance to TSS (Fisher's exact test, P ≤ 1.3 × 10 −3 ).
Among trait-associated variants with trans-eQTL effects, we found two genome-wide significant trans-eVariants at the 9q22 locus (rs7037324 and rs1867277, R 2 = 0.74) with thyroid-specific associations in trans with TMEM253 and ARFGEF3 (P ≤ 2.2 × 10 −16 for both with rs1867277; Fig. 6a and Extended Data Fig. 16). The 9q22 locus has previously been linked to multiple thyroid-specific diseases including goitre, hypothyroidism and thyroid cancer 48,49 , and loss-offunction mutations in a thyroid-specific transcription factor at this locus, FOXE1, manifest as ectopic thyroid tissue or cleft palate in developing mice. However, the mechanism of any cis-effects of these trans-eVariants remains uncertain from the GTEx data; a post hoc analysis demonstrated that PEER correction removed broad regulatory signals from the 9q22 locus, and particularly from cis-and trans-eQTL signals for FOXE1 ( Supplementary Information 17). In PEER-corrected data, cis-and trans-eQTL signals co-localized for another cis-eGene in 9q22, TRMO, for both trans-eGenes 43 (posterior probability > 0.99). Mendelian randomization analysis of the PEER-corrected data supported the idea that TRMO regulates TMEM253 (P ≤ 1.3 × 10 −9 ) and ARFGEF3 (P ≤ 2.1 × 10 −11 ) based on trans-eVariant rs1867277. By contrast, FOXE1 had weak Mendelian randomization support in the PEERcorrected data. Despite the ambiguity of cis-mediation, the locus is one of the strongest trans-eQTL signals in GTEx. We further replicated both the broad regulatory effect and specific target genes of this locus in 498 primary thyroid cancer RNA-seq samples from The Cancer Genome Atlas 50 (TCGA; Fig. 6b, Supplementary Information 18).

Discussion
Since the initial sequencing of the human genome, extensive effort has been devoted to the characterization of genome function and phenotypic consequences of genetic variation. Describing the effects of genetic variation on gene expression levels across tissues is a critical but challenging component of this goal. Here, we describe advances enabled by the GTEx project v6p data, which provide a comprehensive survey of gene expression and the impact of genetic variation on gene expression across diverse human tissues. We report widespread cis-eQTLs in 44 tissues and trans-eQTLs in 18 tissues. cis-acting genetic variants tend to affect either most tissues or a small number of tissues. By contrast, identified trans-eQTL effects tend to be tissue-specific and correspondingly show greater enrichment in enhancer regions. By integrating GTEx data with summary statistics from diverse GWAS, we observed that half of complex traitassociated loci co-localize with a GTEx eQTL. GTEx data have already served as a valuable community resource for the identification of the tissue-specific regulatory effects underlying variants associated with human disease phenotypes 58-61 .
Additional papers from the GTEx consortium for the v6p data describe the impact of rare genetic variation on gene expression 62 , methods for analysis of transcriptome data 27 , the discovery and characterization of regulatory networks across tissues 63 , and analyses of diverse regulatory processes such as RNA editing 64 and X-inactivation 65 . To enable ongoing use of the GTEx data, summary-level expression data and eQTLs across all tissues are available from the GTEx Portal (www.gtexportal.org), while all individual-level raw data have been deposited in dbGaP (accession phs000424.v6.p1).
There are both opportunities and challenges as efforts to characterize genome function grow in scope and scale. The discovery and characterization of eQTLs in these data required careful data modelling to account for confounders and to characterize statistical discovery. We anticipate that complementary analyses with novel methods, enabled by the public availability of these data, may reveal additional insights. Despite the scope of these data, we remain underpowered to detect trans-eQTLs. Larger cohorts of individuals with a smaller number of tissues have yielded hundreds of trans-eGenes 4,6,8,9 , and we similarly expect trans-eQTL discoveries to increase with additional samples in the final phase of GTEx. Furthermore, some genetic effects may manifest only within a specific cell type, rather than an entire heterogeneous tissue. Both computational and experimental methods, such as deconvolution methods and single-cell sequencing as part of the proposed Human Cell Atlas and related projects, promise to improve resolution to identify precise cell type-specific regulatory effects 66 . Future aims of the GTEx project include increased sample size, with cis-eQTLs from 53 tissues across 714 donors, now available in the v7 release, and plans to include approximately 1,000 donors in the final data release. Additional plans include the collection of complementary molecular data on subsets of samples, including epigenetic and protein data, with the Enhanced GTEx (eGTEx) project, enabling an increasingly complete picture of epigenetic and regulatory variant diversity across human tissues 67 . We expect that the continued expansion of the GTEx resource, and its integration with other efforts capturing diverse data types, will be an essential asset for the study of gene regulatory mechanisms and how these contribute to human traits and diseases.
Online Content Methods, along with any additional Extended Data display items and Source Data, are available in the online version of the paper; references unique to these sections appear only in the online paper.  Supplementary Information is available in the online version of the paper.

METHODS
No statistical methods were used to predetermine sample size. The experiments were not randomized, and investigators were not blinded to allocation during experiments and outcome assessment. Sample procurement. All human donors were deceased. Informed consent was obtained for all donors via next-of-kin consent to permit the collection and banking of de-identified tissue samples for scientific research. The research protocol was reviewed by Chesapeake Research Review Inc., Roswell Park Cancer Institute's Office of Research Subject Protection, and the institutional review board of the University of Pennsylvania. Complete descriptions of the donor enrolment and consent process, as well as biospecimen procurement, methods, sample fixation, and histo-pathological review procedures, have been described previously 10,67 . In brief, whole blood was collected from each donor, along with fresh skin samples, for DNA genotyping, RNA expression, and culturing of lymphoblastoid and fibroblast cells, and shipped overnight to the GTEx Laboratory Data Analysis and Coordination Center (LDACC) at the Broad Institute. Two adjacent aliquots were then prepared from each sampled tissue and preserved in PAXgene tissue kits. One of each paired sample was embedded in paraffin (PFPE) for histopathological review and the second was shipped to the LDACC for processing and molecular analysis. Brains were collected from approximately one-third of the donors, and were shipped on ice to the brain bank at the University of Miami, where eleven brain sub-regions were sampled and flash-frozen. These samples were also shipped to the LDACC for processing and analysis.
All DNA genotyping was performed on blood-derived DNA samples, unless these were unavailable, in which case a tissue-derived DNA sample was substituted. RNA was extracted from all tissues and RNA sequencing was performed on all samples with an RNA integrity number (RIN) of 5.7 or higher and with at least 500 ng total RNA. Nucleic acid isolation protocols and sample QC metrics applied are as described 10 ( Supplementary Information 1-5). Data production. Non-strand specific, polyA + selected RNA-seq libraries were generated using the Illumina TruSeq protocol. Libraries were sequenced to a median depth of 78 million 76-bp paired-end reads. RNA-seq reads were aligned to the human genome (hg19/GRCh37) using TopHat (v1.4) based on GENCODE v19 annotations. This annotation is available on the GTEx Portal (gencode.v19. genes.v6p_model.patched_contigs.gtf.gz, available at https://www.gtexportal.org/ home/datasets). Gene-level expression was estimated as reads per kilobase of transcript per million mapped reads (RPKM) using RNA-SeQC on uniquely mapped, properly paired reads fully contained within exon boundaries and with alignment distances ≤ 6. Samples with fewer than 10 million mapped reads or with outlier expression measurements based on the D statistic were removed 10 .
DNA from 450 donors was genotyped using Illumina Human Omni 2.5M and 5M Beadchips. Genotypes were phased and imputed with SHAPEIT2 68 and IMPUTE2 69 , respectively, using multi-ethnic panel reference from 1000 Genomes Project Phase 3 70 . Variants were excluded from analysis if they: 1) had a call rate < 95%; 2) had minor allele frequencies < 1%; 3) deviated from Hardy-Weinberg equilibrium (P < 1.0 × 10 −6 ); or 4) had an imputation info score < 0.4. The final genotyped and imputed array VCF (file format v4.1) for autosomal variants contains genotype posterior probabilities for each of the three possible genotypes for 11,552,519 variants across 450 GTEx donors. The dosages of the alternative alleles relative to the human reference genome hg19 were used as the genotype measure for subsequent eQTL analysis. In addition to array-based genotyping, 148 and 524 donors were whole-genome and exome sequenced, respectively. Additional details on genotyping, imputation and sequencing can be found in the Supplementary Information. cis-eQTL mapping. We conducted cis-eQTL mapping within the 44 tissues with at least 70 samples each. Only genes with ten or more donors with expression estimates > 0.1 RPKM and an aligned read count of six or more within each tissue were considered significantly expressed and used for cis-eQTL mapping. Within each tissue, the distribution of RPKMs in each sample was quantile-transformed using the average empirical distribution observed across all samples. Expression measurements for each gene in each tissue were subsequently transformed to the quantiles of the standard normal distribution. The effects of unobserved confounding variables on gene expression were quantified with PEER 12 , run independently for each tissue. Fifteen PEER factors were identified for tissues with fewer than 150 samples; 30 for tissues with sample sizes between 150 and 250; and 35 for tissues with more than 250 tissues. The covariates that were most consistently associated with PEER factors include factors related to parameters of donor death, ischaemic time, RIN and sequencing quality control metrics. In addition, we have observed that little, if any, genetic signal is present in the PEER factors ( Supplementary  Information 6).
Within each tissue, cis-eQTLs were identified by linear regression, as implemented in FastQTL 71 , adjusting for PEER factors, sex, genotyping platform, and three genotype-based principal components (PCs). We restricted our search to variants within 1 Mb of the TSS of each gene and, in the tissue of analysis, minor allele frequencies ≥ 0.01 with the minor allele observed in at least 10 samples.
Nominal P values for each variant-gene pair were estimated using a two-tailed t-test. The significance of the most highly associated variant per gene was determined from empirical P values, extrapolated from a Beta distribution fitted to adaptive permutations with the setting -permute 1000 10000. These empirical P values were subsequently corrected for multiple testing across genes using Storey's q value method 16 . To identify the list of all significant variant-gene pairs associated with eGenes, variants with a nominal P value below the gene-level threshold were considered significant and included in the final list of variant-gene pairs. trans-eQTL mapping. Matrix eQTL 72 was used to test all autosomal variants (MAF > 0.05) using the same expression filters as cis-eQTL mapping, but restricted to variants and genes lying on different chromosomes, in each tissue independently using an additive linear model. For trans-eQTL mapping, we tested variants for association with expression of only protein coding or lincRNA genes. We included as covariates the three genotype PCs, genotyping platform, sex, and PEER factors estimated from expression data in Matrix eQTL when performing association testing. The correlation between variant and gene expression levels was evaluated using the estimated t statistic from this model, and corresponding FDR was estimated using Benjamini-Hochberg FDR correction 72,73 separately within each tissue and also using permutation analysis. We performed restricted trans-eQTL association tests by filtering the set of variants considered in three ways. First, we filtered the final VCF files using linkage disequilibrium pruning (R 2 > 0.5, plink parametersindep 50 5 2), removing approximately 90% of variants. Second, from the original VCF file, we performed association mapping using only the most significant GTEx cis-eQTL per eGene per tissue. Third, from the original VCF file, we performed association mapping using only variants that had been found to have a trait association in a genome-wide association study 47 (P ≤ 2.0 × 10 −5 ). The three association mapping analyses and FDR estimation were performed in each tissue separately. For all trans association tests, we applied stringent quality control to account for potential false positives due to RNA-seq read mapping errors, repeat elements, and population stratification ( Supplementary Information 7). Multi-tissue eQTL mapping. We quantified the tissue-specificity and tissue-sharing of cis-and trans-eQTLs using Meta-Tissue 15 . This tool extends Metasoft 74 , a meta-analysis package, by using a mixed effects model for eQTL sharing that accounts for correlation of expression between tissues driven by overlapping donors. All genotypes and gene expression quantification estimates were adjusted for covariates in accordance to the single tissue analysis as described in the previous sections. For each variant-gene pair, we calculated mixed model effect size estimates in each expressed tissue, thereby adjusting for partial sharing of signal between tissues. These effect size estimates were used in meta-analysis using Metasoft 74 to assess the tissue-specificity of each variant-gene pair. For each variant-gene pair tested, Meta-Tissue estimates a global P value of association and the posterior probability that an effect exists in a tissue (m value). For computational feasibility, the Markov chain Monte Carlo (MCMC) method was used to approximate the exact solution.
Hierarchical agglomerative clustering was performed on trans-eGenes (50% FDR) and cis-eGenes (5% FDR) using distance metric (1 -Spearman's ρ) of Meta-Tissue effect sizes across all observed genes between tissue pairs. To supplement this analysis, we also performed multi-tissue analysis using 1) replication analysis (Extended Data Fig. 7); 2) hierarchical FDR control 17 for both cis and trans analysis (Supplementary Information 8); and 3) an empirical Bayes approach 18 . Allele-specific expression. For each sample, allele-specific RNA-seq read counts were generated at all heterozygous variants with the GATK ASEReadCounter tool 75 . Only uniquely mapping reads with a base quality ≥ 10 at the variant were counted, and only those variants with coverage of at least eight reads were reported. Variants that met any of the following criteria were flagged and removed from downstream analyses: 1) UCSC 50-mer mappability of < 1; 2) simulation-based evidence of mapping bias 76 ; and 3) heterozygous genotype not supported by RNA-seq data across all samples for that donor and no significant (FDR > 1%) evidence that the variant is monoallelic in expression data 75 . Gene level measurements of haplotype expression were calculated by aggregating counts per sample across all heterozygous variants with ASE data within the gene using population phasing. Full ASE data are available through dbGaP. Functional enrichment. We annotated discovered eVariants using chromatin state predictions from 128 cell types or cell lines sampled by the Roadmap Epigenomics project 2 . Genome segmentation was performed for each cell type or cell line using a 15-state hidden Markov model (HMM) over 400 bp windows. Several of the learned states are labelled as enhancers, promoters, and repressed regions. For the standard 15-state Roadmap segmentations, regulatory elements are labelled independently for each cell type. For enrichment analyses, we constructed background variants sets that matched eVariants to randomly selected variants based on chromosome, distance to nearest TSS, and MAF.
trans-eQTL analysis was restricted to protein-coding genes and to GTEx tissues that are composed of at least one Roadmap Epigenomics cell type (26 tissues), which included 85 eVariants and 23 eGenes (10% FDR). We quantified enrichment of the trans variants relative to random variants in both enhancer and promoter elements in the GTEx discovery tissue's matched Roadmap cell type (Extended Data  Table 4). We then performed the same analysis with randomly matched cis-eGenes. Matching cis-eGenes were selected as follows: for each of the 23 trans-eGenes g, each having N g associated eVariants (10% FDR), we randomly selected a cis-eGene that also had at least N g associated variants (10% FDR). We then selected the top N g variants associated with this gene based on P value. We then performed the same analysis using random sets of the strongest cis-eGenes, rather than random eGenes. Matching the strongest cis-eGenes was performed as follows: for each of the 23 trans-eGenes g, each having N g associated eVariants (10% FDR), we randomly selected a cis-eGene amongst the ten strongest cis-eGenes in that tissue, based on the P value of the strongest associated variant that also had at least N g associated variants (10% FDR). We then selected the top N g associated variants with this gene based on P value. Selecting 23 random cis-eGenes a single time yields unstable results, so we ran cis-eGene selection and enrichment 70 times with different selections. This was done for both random cis-eGenes and random selections amongst the strongest cis-eGenes. We rank-ordered the 70 trials for both promoters and enhancers based on average odds ratio enrichment relative to background. We then used the trial that was closest to median rank for plotting both promoter and enhancer enrichment results.
For piRNA enrichment analysis, we obtained a list of 6,250 piRNA clusters that were experimentally determined from RNA sequencing of human testis 34 . When considering all unique trans-eVariants identified in all tissues, we identified an enrichment of trans-eQTLs overlapping a piRNA cluster (17.8%) compared to the null expectation that trans-eVariants are randomly distributed relative to piRNA clusters (2.5%). To further establish the statistical significance of this observation, we generated a null distribution of piRNA-eVariant overlap by permutation. Using bedtools2 77 , we permuted the location of piRNA clusters on the human genome 10,000 times, requiring the piRNA clusters to be excluded from centromeres and sex chromosomes. We also evaluated the proportion of trans-eVariants located within 10 kb of a piRNA cluster, and estimated the significance of this enrichment using the same permutation scheme. Co-localization of GWAS and eQTL associations. In order to assess the probability that molecular traits as estimated by cis-and trans-eQTLs and physio logical traits as estimated by GWAS share the same causal variant, we applied the coloc R package 43 . For each GWAS, we approximated the number of independent loci by extracting variants with at least genome-wide significance (P < 5 × 10 −8 ) and farther than 1 Mb away from all other variants of higher statistical significance. For each genome-wide significant variant, we extracted the list of all eGenes (q < 0.05 for cis-eGene) within 1 Mb for coloc analyses. For each eGene, we excluded any variants without either eQTL or GWAS association statistics (effect size estimate, standard error and P value). We obtained reference information such as MAF, sample size and case-to-control proportions (in case of binary traits) for each variant whenever available; otherwise, study-wide estimate was used as a proxy. We defined a region or an eGene as having evidence of co-localization when region-or gene-based posterior probability of co-localization > . . Data and biospecimen availability. Genotype data from the GTEx v6p release are available in dbGaP (study accession phs000424.v6.p1; https://www.ncbi.nlm. nih.gov/projects/gap/cgi-bin/study.cgi?study_id= phs000424.v6.p1). The VCFs for the imputed array data are available through dgGAP, in phg000520.v2.GTEx MidPoint Imputation.genotype-calls-vcf.c1.GRU.tar (the archive contains a VCF for chromosomes 1-22 and a VCF for chromosome X). Allelic expression data are also available in dbGaP. Expression data (read counts and RPKM) and eQTL input files (normalized expression data and covariates for 44 the tissues) from the GTEx v6p release are available from the GTEx Portal (http://gtexportal.org). Expression QTL results are available from the GTEx Portal. In addition to results tables for the 44 tissues in this study (eGenes, significant variant-gene pairs, and all variant-gene pairs tested), the portal provides multiple interactive visualization and data exploration features for eQTLs, including: • eQTL box plot: displays variant-gene associations • Gene eQTL visualizer: displays all significant associations for a gene across tissues and linkage disequilibrium information • Multi-tissue eQTL plot: displays multi-tissue posterior probabilities from meta-analysis against single-tissue association results • IGV browser: displays eQTL, across tissues and GWAS catalogue results for a selected genomic region Residual biospecimens are available to all researchers according to the Genotype-Tissue Expression (GTEx) project biospecimens access policy. The policy and related forms can be found on the GTEx Portal under the Biobank tab. Code availability. Software used to process the RNA-seq, genotypes, cis-eQTLs, and ASE is available at: https://github.com/broadinstitute/gtex-pipeline.

Article reSeArcH
Extended Data Figure 8 | Tissue-specificity of cis-and trans-eQTLs. a, Sharing of independently identified cis-eGenes across the 44 GTEx tissues (cis-eGenes are independently identified in each of the 44 tissues and then binned by the number of tissues in which they appear). b, Sharing of cis-eGenes across 44 GTEx tissues that were identified using the hierarchical multi-tissue analysis. c, The prior probabilities of having significant variant-gene association in different numbers of tissues, calculated using an empirical Bayes model. The prior probabilities are summed up on the basis of the Hamming weights of the corresponding cis-eQTL configurations. d-g, Meta-analysis performed using Meta-Tissue for trans-eGenes (50% FDR), randomly selected cis-eGenes (50 % FDR), and an equal number of the top cis-eGenes by P value. Distribution of the number of tissues that have Meta-Tissue m values greater than a given threshold (d, 0.5; e, 0.6; f, 0.9) across variant-gene pairs that have an effect (based on m value thresholding) in at least one tissue. g, The same distribution as d except that variant-gene pairs with predicted effect in zero tissues (based on the number of m values > 0.5) are included. Meta-Tissue predicts that many cis-eGenes (50% FDR) and trans-eGenes (50% FDR) will have an effect in zero tissues. The number of zero predictions is largely reduced for the top most significant cis-eGenes. h, Distribution of observed replication rate between pairs of tissues for trans-eQTLs (10% FDR) versus the predicted replication rate for trans-eQTLs (10% FDR) based on a negative binomial generalized linear model trained on cis-eQTLs (10% FDR0.1). This model directly accounts for effect size and minor allele frequency. Replication rates shown for a range of FDR thresholds in replication tissue. Box plots depict the IQR, whiskers depict 1.5× IQR.