The impact of rare variation on gene expression across tissues

Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles, but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues, but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release. We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.


Introduction 33
The recent and rapid expansion of human populations has led to an abundance of rare genetic variants 34 some of which are expected to contribute to an individual's genetic risk of disease 1-4 . However, 35 prioritizing the subset of rare variants most likely to cause cellular and phenotypic changes from the tens 36 of thousands of rare variants within each individual's genome remains a major challenge. While genetic 37 association analyses have successfully identified many common genetic risk factors for non-Mendelian 38 traits, rare variants are private or at such low frequency that association studies become infeasible 1,5 . To 39 overcome this challenge, multiple approaches for distinguishing pathogenic from benign rare variants 40 have leveraged the genetic code to identify nonsense or other deleterious protein coding alleles 1,6-8 . Such 41 variants not only inform individual genetic risk but are valuable natural gene knockouts that underlie 42 extreme phenotypes and help predict potential drug targets. Unfortunately, no analogous code exists for 43 identifying non-coding variants with functional consequences. 44 Promising models have been developed to predict variant impact from diverse genomic features, 45 including cis-regulatory element annotation and conservation status 9-13 . We hypothesized that 46 incorporating each individual's gene expression data would improve prioritization of functional rare 47 variants. Indeed, for rare loss-of-function variants in protein-coding regions, allele-specific effects across 48 multiple tissues have characterized the systemic impact of nonsense-mediated decay 14,15 . In single-tissue 49 studies, rare non-coding variants, in aggregate, have been associated with outlier gene expression levels, 50 suggesting their potential to drastically alter gene expression [16][17][18][19] . However, it remains unknown which 51 categories of rare variation have the strongest impact on gene expression and how their consequences are 52 reflected across multiple tissues. As whole genome sequencing becomes more prevalent, new means to 53 understand rare variant biology and to prioritize the variants with important individual consequences will 54 be essential to personal genomics and its integration in precision medicine. 55

Extreme expression is shared across tissues 56
To assess the impact of rare genetic variation on gene expression in diverse human tissues, we analyzed 57 data from the Genotype Tissue Expression project (GTEx V6p), which includes 7,051 RNA-sequencing 58 samples from 44 tissues in 449 individuals (median of 126 individuals per tissue and 16 tissues sampled 59 per individual). We restricted rare variant analysis to the 123 individuals of European ancestry, but used 60 the entire cohort for all other analyses (Extended Data Fig. 1). We defined rare variants as those with 61 minor allele frequency below 1% within GTEx as well as within the European panel of the 1000 62 Genomes project for single nucleotide variants (SNVs) and short insertions and deletions (indels) 20 . Each 63 individual had a median of 43,739 rare SNVs, 4,835 rare indels and 59 rare structural variants (SVs) 64 (Extended Data Fig. 2). 65 Our analysis focused on individuals with extremely high or low expression of a particular gene compared 66 with the rest of the cohort. We refer to these individuals as gene expression outliers. The GTEx data 67 affords the ability to identify both single-tissue and multi-tissue expression outliers, with the latter 68 showing consistent extreme expression for a gene across many tissues. To account for broad 69 environmental or technical confounders, we removed hidden factors estimated by PEER 21 from each 70 tissue, which increased the predictive power of outlier expression across tissues (Extended Data Fig. 3). 71 After confounder removal and data normalization, we identified both single-tissue and multi-tissue 72 outliers among the entire cohort of 449 individuals. For each tissue, an individual was called a single-73 tissue outlier for a particular gene if that individual had the largest absolute Z-score and the absolute 74 value was at least two. For each gene, the individual with the most extreme median Z-score taken across 75 tissues was identified as a multi-tissue outlier for that gene provided the absolute median Z-score was at 76 least two (Fig. 1a). Therefore, each gene had at most one single-tissue outlier per tissue and one multi-77 tissue outlier. Under this definition an individual can be an outlier for multiple genes. 78 Figure 1. Gene expression outliers and sharing between tissues. (a) A multi-tissue outlier. In this example, the individual has extreme expression values for the gene AKR1C4 in multiple tissues (red arrows) and the most extreme median expression value across tissues. (b) Outlier expression sharing between tissues, as measured by the proportion of single-tissue outliers that have |Z-score| ≥ 2 for the corresponding genes in each replication tissue. Tissues are hierarchically clustered by gene expression. (c) Estimated replication rate of multi-tissue outliers in a constant held-out set of tissues for different sets of discovery tissues. We compared outliers identified in the discovery set to the same number of randomly selected individuals (see Online methods). Due to incomplete tissue sampling, the number of tissues supporting each outlier is at least five but less than the size of the discovery set.
We identified a single-tissue expression outlier for almost all expressed genes (≥ 99%) in each tissue and 79 a multi-tissue outlier for 4,919 of 18,380 tested genes (27%). Each individual was a single-tissue outlier 80 for a median of 1,653 genes (83 per tissue) compared with a median of 10 genes as a multi-tissue outlier. 81 We confirmed that known environmental factors of race, sex, and BMI were uncorrelated with the 82 number of genes for which an individual was a multi-tissue outlier (Extended Data Fig. 4). We did 83 observe a weak but statistically significant, positive correlation with ischemic time (Spearman ρ = 0.175, 84 nominal P = 0.00022) and age (Spearman ρ = 0.101, nominal P = 0.033). Single-tissue outliers discovered 85 in one tissue replicated in other tissues at rates up to 33%, with stronger replication rates among related 86 tissues, such as the two skin tissues as well as the left ventricle and atrial appendage of the heart (Fig. 1b).

87
Replication estimates were underestimated for tissues with smaller sample sizes but biased upward for 88 Number of tissues in discovery set Replication rate pairs of tissues with many overlapping individuals sampled (Extended Data Fig. 5). However, we 89 confirmed that the overall sharing patterns were maintained when we accounted for sampling differences, 90 using pairs of tissues with enough overlapping samples to assess the inflation directly. Single-tissue 91 outliers were also detected as multi-tissue outliers at rates from 1.2% to 5.6%, with more overlap for 92 tissues with more samples (Extended Data Fig. 6, Pearson r = 0.79, P = 1.4 x 10 -10 ). While tissue-specific 93 expression may partially explain the small overlap, the trend is most likely due to the inherent noise in the 94 single-tissue analyses. Indeed, the replication rate for multi-tissue outliers was much higher than for 95 single-tissue outliers and increased with the number of tissues used for discovery, highlighting the value 96 of multiple tissue data for robust outlier detection (Fig. 1c). The difference in replication rate between 97 outliers and randomly selected individuals was greater than could be explained by the bias from 98 overlapping individuals in the discovery and replication sets. 99 Functional rare variants underlie multi-tissue outliers 100 We investigated the extent to which extreme expression could be explained by genetic variation. Here, we 101 focused on the 123 individuals of European descent with whole genome sequencing (average coverage 102 30X), among whom we identified 1,144 multi-tissue outliers. We evaluated the proportion of outliers with 103 variants at different frequencies within 10 kb of the transcription start site (TSS) compared to 104 corresponding genes in non-outliers to identify the effects of variants acting in cis. Multi-tissue outliers 105 were more enriched for rare variants than common ones (Fig. 2a). This enrichment was most pronounced 106 for structural variants (SVs), and larger for short insertions and deletions (indels) than for single 107 nucleotide variants (SNVs). The enrichment for rare variants was markedly stronger for multi-tissue 108 outliers compared to single-tissue outliers (Fig. 2b, Extended Data Fig. 7), a trend that became more 109 striking at larger Z-score thresholds (Fig. 2b). 110 As rare variants are often heterozygous, expression outliers driven by rare variants in cis should exhibit 111 allele-specific expression (ASE). At multiple Z-score thresholds, both single-tissue and multi-tissue 112 outliers were significantly enriched for ASE, as compared to non-outliers (two-sided Wilcoxon rank sum 113 tests, each nominal P < 2.2 x 10 -16 ). ASE was stronger for multi-tissue outliers than for single-tissue 114 outliers, and increased with the Z-score threshold (Fig. 2c). This, along with the stronger rare variant 115 enrichments for multi-tissue outliers, suggests that single-tissue outliers are less robust to non-genetic 116 confounders. 117 Figure 2. Enrichment of rare variants and ASE in outliers. (a) Enrichment of SNVs, indels, and SVs within 10 kb of the TSS of genes with outliers in the corresponding outlier individuals, as compared with the same genes in non-outlier individuals. For each frequency stratum, we calculated enrichment as a ratio of proportions. The numerator is the proportion of outliers with a variant whose frequency lies within the range, and the denominator is the corresponding proportion for non-outliers. Bars indicate 95% Wald confidence intervals. (b) Rare SNV enrichment for multi-tissue and single-tissue outliers at increasing Z-score thresholds. This threshold applies to the median absolute Z-score for multi-tissue outliers and the absolute Z-score for single-tissue ones. Text labels indicate the number of outliers at each threshold. (c) ASE, measured as the magnitude of the difference between the reference-allele ratio and the null expectation of 0.5. The non-outlier category is defined in the Online methods section.
We aimed to identify the specific properties of rare genetic variants that induce large changes in gene 118 expression. We evaluated the enrichment of diverse variant classes (Extended Data Table 1) in outliers 119 compared with non-outliers. To capture both coding and non-coding variant classes, we evaluated 120 variants in the gene body and up to 10 kb (200 kb for SVs and variants in enhancers) from the 121 transcription start or end sites of genes with outliers. SVs, taken together, had the strongest enrichment, 122 and their impact on gene expression across tissues is well characterized 22 . We also observed, in order of 123 significance, enrichments for variants near splice sites, introducing frameshifts, at start or stop codons, 124 near the TSS, outside of coding regions and among the top 1% of CADD or vertebrate PhyloP scores, and 125 with other coding annotations (Fig. 3a). These results suggest that variants in coding regions contribute 126 disproportionately to outlier expression. Indeed, we observed weakened enrichments for all variants types 127 (SNVs, indels, and SVs) when excluding exonic regions (Extended Data Fig. 8).   To identify the relationship between outlier expression and genomic annotation, we tested whether rare 129 variants near genes with outliers had high conservation or CADD scores 9 and whether they occurred in 130 known regulatory regions. Multi-tissue outliers were strongly enriched for variants in promoter or CpG 131 sites, and they had variants with higher conservation and CADD scores than non-outliers. We observed a 132 weaker enrichment for variants in enhancers and transcription factor binding sites (Fig. 3b, Extended Data 133 Fig. 9). By jointly considering major classes of variation, we observed that 58% of underexpression and 134 28% of overexpression outliers had rare variants near the relevant gene, compared with 9% for non-135 outliers (Fig. 3c). These results confirmed that rare variation is more likely to decrease expression 23-25 and 136 that overexpression outliers may more often be due to environmental factors. Some variant classes had 137 strong directionality in their effect: duplications caused overexpression outliers, while deletions, start and 138 stop codon variants, and frameshifts led to underexpression outliers (Fig. 3d). This directionality agrees 139 with the expected regulatory effect of these variant types and offers further evidence for the role of 140 genetic variation in outlier expression. There was also strong ASE for outliers carrying all categories of 141 variants except those with only non-conserved variants or without any rare variants near the gene (Fig.  142 3e), which suggests that common variants or non-genetic factors likely caused the extreme expression in 143 those cases. 144

Constrained genes rarely have multi-tissue outliers 145
We hypothesized that rare functional variants and extreme expression in essential genes would be subject 146 to selective pressure. Consistent with ongoing purifying selection against large, multi-tissue effects, rare 147 promoter variants in outliers exhibited significantly lower allele frequencies in the UK10K cohort of 148 3,781 individuals 3 than those in non-outliers for the same genes ( Fig. 4a, two-sided Wilcoxon rank sum 149 test, P = 0.0013). Genes intolerant to loss-of-function mutations as curated by the Exome Aggregation 150 Consortium 26 were depleted of multi-tissue outliers and multi-tissue eQTLs (Fisher's exact test, both P < 151 2.2 x 10 -16 ; Fig. 4b), which supports our hypothesis that altering expression levels of critical genes can be 152 deleterious. We observed a similar depletion in genes resistant to missense variation (for genes with 153 outliers P = 1.676 x 10 -15 and for multi-tissue eGenes P < 2.2 x 10 -16 ; Extended Data Fig. 10a). Genes 154 with a multi-tissue outlier were enriched for multi-tissue eQTLs (two-sided Wilcoxon rank sum test P < 155 2.2 x 10 -16 , Extended Data Fig. 10c,d). However, we found some evidence that genes with outliers were 156 more constrained for missense and loss-of-function variation than genes with multi-tissue eQTLs 157 (Tukey's range test, missense Z-score P = 0.0044, probability of loss-of-function intolerance score P = 158 0.086; Fig. 4b, Extended Data Fig. 10a). 159 We expected disease genes to be depleted of multi-tissue expression outliers in the general population 160 since extreme expression at critical genes may have severe health consequences. We confirmed this 161 among the GTEx individuals for two well curated disease gene lists: a list of genes involved in heritable 162 cardiovascular disease (Cardio) and genes in the ACMG guidelines for incidental findings (Fig. 4c). For 163 broader lists like the GWAS and OMIM catalogs, we found no significant evidence of depletion or 164 enrichment. We observed a similar pattern for multi-tissue eQTL genes (Extended Data Fig. 10b). 165 Nonetheless, outlier expression affected some important and actionable disease genes. We observed 166 multi-tissue outliers for five ACMG genes, five high-risk cardiovascular disease genes, and 14 cancer 167 genes (Extended Data

Expression data improves variant prioritization 181
In addition to characterizing the regulatory impact of rare variation across the GTEx cohort in aggregate, 182 we sought to prioritize candidate regulatory variants from each individual genome. Existing methods for 183 predicting rare variant impact use epigenetic data and other genomic annotations derived from external 184 studies 9-13 . We hypothesized that by integrating gene expression data from the same individual whose 185 genome we seek to analyze, along with these external annotations, we could significantly improve our 186 Regulation), a probabilistic modeling framework that jointly analyzes personal genome and transcriptome 188 data to estimate the probability that a variant has regulatory impact in that individual 189 (https://github.com/ipw012/RIVER, see Online methods). RIVER is based on a generative model that 190 assumes that genomic annotations (Extended Data Table 3), such as the location of a variant with respect 191 to regulatory elements, determine the prior probability that variant is a functional regulatory variant, 192 which is an unobserved variable. The functional regulatory variant status then influences whether nearby 193 genes are likely to display outlier levels of gene expression in that person (Fig. 5a). RIVER is trained in 194 an unsupervised manner. It does not require a labeled set of functional/non-functional variants; rather it 195 derives its predictive power from identifying expression patterns that tend to coincide with particular rare 196 variant annotations. 197  genome and RNA sequencing data) from one individual, and assessed the accuracy with respect to the 205 second individual's held out expression levels (see Online methods). Using this labeled test data, we 206 evaluated the predictive accuracy of RIVER compared with a L2-regularized multivariate logistic 207 regression model that uses genomic annotations alone, and observed a significant improvement by 208 incorporating expression data (Fig. 5b, AUC for RIVER and the genomic annotation model were 0.638 209 and 0.541, respectively, P = 3.5 x 10 -4 ). Allele-specific expression was also enriched among the top 210 RIVER instances compared with genome annotation models (Extended Data Fig. 11). Although RIVER 211 was trained in an unsupervised manner, the learned model prioritized variants that were supported by both 212 extreme expression levels for a nearby gene and genomic annotations suggestive of potential impact (Fig.  213 5c). Rather than using a heuristic or manual approach, RIVER automatically learns the relationship 214 between genomic annotations and changes in gene expression from data to provide a coherent estimate of 215 the probability of regulatory impact. For instance, multi-tissue outliers with a large proportion of single-216 tissue outliers were more likely to have high RIVER scores (Extended Data Fig. 12). Using a simplified 217 supervised model, we estimated that even after accounting for the most informative genomic annotations 218 or summary scores from state-of-the-art models including CADD and DANN, an individual was more 219 likely to be an expression outlier if another individual with matched rare variants was an outlier (average 220 log-odds ratio 2.76, Extended Data Table 4). This simplified approach supported the benefit of integrating 221 gene expression data into variant prioritization. 222

To investigate how RIVER might inform disease variant analysis, we intersected rare variants in the 223
GTEx individuals with variants from ClinVar 30 (Extended Data Table 5). We identified 27 pathogenic or 224 risk variants present in 21 individuals, and evaluated the RIVER score of each (Fig. 5c). Overall, 225 pathogenic variants scored higher than background variants (two-sided Wilcoxon rank sum test, P = 3.25 226 x 10 -9 , Extended Data Fig. 13). We note that rare indels and SVs were not found nearby the genes in the 227 individuals carrying these pathogenic variants. Considering that ClinVar is biased toward protein-coding 228 variants, we observed that six of the 27 variants were annotated as nonsense, splice site, or synonymous 229 variants, with the rest being missense. These likely regulatory variants had RIVER scores of 0.980 on 230 average, putting them in the top 99.9 th percentile. Among these, three individuals harbored the minor 231 allele at two distinct pathogenic variants (rs113993991 and rs113993993) near SBDS, each associated 232 with Shwachman-Diamond syndrome. This recessive syndrome causes systemic symptoms including 233 pancreatic, neurological, and hematologic abnormalities 31 and can disrupt fibroblast function 32 . The GTEx 234 individuals were heterozygous for these variants and thus lacked the disease phenotype. Nonetheless, we 235 saw extreme underexpression of SBDS across almost all tissues in these individuals, including brain 236 tissues, fibroblasts, and pancreas (Extended Data Fig. 14). In another case, an individual harbored the 237 minor allele of rs80338735, which is associated with cerebral creatine deficiency syndrome 2, shown to 238 cause neurological deficiencies and also lead to low body fat 33 . The nearby gene GAMT showed the most 239 extreme underexpression (Z-score < -4) in adipose (subcutaneous), although unfortunately no brain tissue 240 was available for evaluation in this individual (Extended Data Fig. 14). These cases demonstrate that 241 RIVER can provide an important and novel ability to prioritize disease-causing regulatory variants by 242 integrating population-scale patterns of gene expression. 243

244
Using whole genome sequences and RNA-seq from 44 human tissues from the GTEx project, we 245 identified high-confidence gene expression outliers and completed the largest study to date of rare 246 variants impacting gene expression. Outliers were better explained by genetic variation when we 247 combined expression data from multiple tissues. We found that rare structural variants, frameshift indels, 248 coding variants, and variants near the transcription start site were most likely to have large effects on 249 expression. These effects were often directional; for example, we saw duplications tended to cause 250 overexpression Single-tissue and multi-tissue outlier discovery 291 Single-tissue and multi-tissue outlier calling was restricted to autosomal lincRNA and protein coding 292 genes. In addition to the constraints described in the main text, we only tested for multi-tissue outliers 293 among individuals with expression measurements for the gene in at least five tissues. To reduce cases 294 where non-genetic factors may cause widespread outliers, we removed eight individuals that were multi-295 tissue outliers for 50 or more genes from all downstream analyses. These individuals were also removed 296 before single-tissue outlier discovery. 297

Replication of expression outliers 298
We evaluated the replication of single-tissue outliers between pairs of tissues. We calculated the 299 proportion of outliers discovered in one tissue that had a |Z-score| ≥ 2 for the same gene in the replication 300 tissue. We also required that the replication Z-score have the same sign as the Z-score in the discovery 301 tissue. Since each tissue had a different number of samples and certain groups of tissues were sampled in 302 a specific subset of individuals, we evaluated the extent to which replication was influenced by the size 303 and the overlap of the discovery and replication sets. To make pairs of tissues comparable, we repeated 304 the replication analysis with the discovery and replication in exactly 70 individuals for each pair of tissues 305 with enough overlapping samples. We compared the replication patterns in this subsetted analysis to those 306 obtained by using all individuals for discovery and replication. To estimate the extent to which individual 307 overlap biased replication estimates, for each pair of tissues with a sufficient number of samples, we 308 defined three disjoint group of individuals: 70 individuals with data for both tissues, 69 distinct 309 individuals with data in the first tissue, and 69 distinct individuals with data in the second tissue. We 310 discovered outliers in the first tissue using the shared set of individuals. Then we tested the replication of 311 these outliers in the discovery individuals in the second tissue. Finally, for each gene, we added the 312 identified outlier to distinct set of individuals and tested the replication again in the second tissue. We 313 repeated the process running the discovery in the second tissue and the replication in the first one. We 314 compared the replication rates when using the same or different individuals for the discovery and 315 replication. 316 We assessed the confidence of our multi-tissue outliers using cross-validation. Specifically, we separated 317 the tissue expression data randomly into two groups: a discovery set of 34 tissues and a replication set of 318 10 tissues. For t = 10, 15, 20, 25, and 30, we randomly sampled t tissues from the discovery set and 319 performed outlier calling as described above. To assess the replication rate, we computed the proportion 320 of outliers in the discovery set with |median Z-score| ≥ 1 or 2 in the replication set. We set no restriction 321 on the number of tissues required for testing in the replication set. To calculate the expected replication 322 rate, we randomly selected individuals in the discovery set requiring that each individual show expression 323 in at least five tissues for the gene. We then computed the replication rate for this background using the 324 procedure described above. We repeated this process 10 times for each discovery set size. 325

Quality control of genotypes and rare variant definition 326
We restricted our rare variant analyses to individuals of European descent, as they constituted the largest 327 homogenous population within our dataset. We considered only autosomal variants that passed all filters 328 in the VCF (those marked as PASS in the Filter column). Minor allele frequencies (MAF) within the 329 GTEx data were calculated from the 123 individuals of European ancestry with whole genome 330 sequencing data. The MAF was the minimum of the reference and the alternate allele frequency where the 331 allele frequencies of all alternate alleles were summed together. Rare variants were defined as having 332 MAF ≤ 0.01 in GTEx, and for SNVs and indels we also required MAF ≤ 0.01 in the European population 333 of the 1000 Genomes Project Phase 3 data 20 . We also sought to ensure that population structure among 334 the individuals of European descent was unlikely to confound our results. Therefore, we verified that the 335 allele frequency distribution of rare variants included in our analysis (within 10 kb of a protein coding or 336 lincRNA gene, see below) was similar for the five European populations in the 1000 Genomes project 337 (Extended Data Fig. 1b). 338

Enrichment of rare and common variants near outlier genes 339
We assessed the enrichment of rare SNVs, indels, and SVs near outlier genes. Proximity was defined as 340 within 10 kb of the TSS for all analyses, with the exception of Fig. 3 where we included all variants 341 within 10 kb of the gene, including the gene body, (200 kb for enhancers and SVs) to also capture coding 342 variants. For each gene with a multi-tissue outlier, we chose the remaining set of individuals tested for 343 multi-tissue outliers at the same gene as our set of non-outlier controls. We only included genes that had 344 both a multi-tissue outlier and at least one control. We stratified variants of each class into four minor 345 allele frequency bins (0-1%, 1-5%, 5-10%, 10-25%) to compare the relative enrichments of rare and 346 common variants. We also assessed the enrichment of SNVs at different Z-score cutoffs. Enrichment was 347 defined as the ratio of the proportion of outliers with a rare variant within 10 kb of the transcription start 348 site (TSS) to the proportion of non-outliers with a rare variant in the same window. This enrichment 349 metric is equivalent to the relative risk of having a nearby rare variant given outlier status. We used the 350 asymptotic distribution of the log relative risk to obtain 95% Wald confidence intervals. Within our set of 351 European individuals, we observed some individuals with minor admixture that had relatively more rare 352 variants than the rest (Extended Data Fig. 2b). We confirmed that inclusion of these admixed individuals 353 did not substantially affect our results (Extended Data Fig. 2c). We also calculated rare variant 354 enrichments when restricting to variants outside protein-coding and lincRNA exons in Gencode v19 355 annotation (extending internal exons by 5 bp to capture canonical splice regions). 356 To measure the informativeness of variant annotations (Extended Data Table 1), we used logistic 357 regression to model outlier status as a function of the feature of interest, which yielded log odds ratios 358 with 95% Wald confidence intervals. Note that for the feature enrichment analysis in Fig. 3b and  359 Extended Data Fig. 9, we required that outliers and their gene-matched non-outlier controls have at least 360 one rare variant near the gene. We scaled all features, including binary features, to have mean 0 and 361 variance 1 to facilitate comparison between features of different scale. We also calculated the proportion 362 of overexpression outliers, underexpression outliers and non-outliers with a rare variant near the gene 363 TSS (within 10 kb for SNVs and indels and 200kb for SVs). To each outlier instance, we assigned at most 364 one of the 12 rare variant classes we considered, which are listed in Fig. 3. If an outlier had rare variants 365 from multiple classes near the relevant genes, we selected the class that was most significantly enriched 366 among outliers. 367

Annotation of variants 368
We obtained annotations for SV categories from Chiang et al. 22 . We computed features for rare SNVs and 369 indels using three primary data sources: Epigenomics Roadmap 39 , CADD v1. protein-coding and transcription-related annotations from VEP. This information was provided in the 379 GTEx V6 VCF file. Using the pipeline described above, we generated features at the site-level for all 123 380 European individuals with WGS data. We then collapsed these features to generate gene-level features. 381 The collapsed features are described in Extended Data Tables 1 and 3. 382

Allele-specific expression (ASE) 383
We only considered sites with at least 30 total reads and at least five reads supporting each of the 384 reference and alternate alleles. To minimize the effect of mapping bias, we filtered out sites that showed 385 mapping bias in simulations 41 , that were in low mappability regions 386 (ftp://hgdownload.cse.ucsc.edu/gbdb/hg19/bbi/wgEncodeCrgMapabilityAlign50mer.bw), or that were 387 rare variants or within 1 kb of a rare variant in the given individual (the variants were extracted from 388 GTEx exome sequencing data). The first two filters were provided in the GTEx ASE data release (Aguet 389 et al., GTEx cis-eQTL paper, co-submitted). The third filter was applied to eliminate potential mapping 390 artefacts that mimic genetic effects from rare variants. We measured ASE effect size at each testable site 391 as the absolute deviation of the reference allele ratio from 0.5. For each gene, all testable sites in all 392 tissues were included. We compared ASE in single-tissue and multi-tissue outliers at different Z-score 393 thresholds to non-outliers using a two-sided Wilcoxon rank sum test. To obtain a matched background, 394 we only included a gene in the comparison when ASE data existed for both the outlier individual and at 395 least one non-outlier. In the case of single-tissue outliers, we also required the tissue to match between the 396 outlier and the non-outlier. All individuals that were neither multi-tissue outliers for the given gene nor 397 single-tissue outliers for the gene in the corresponding tissue were included as non-outliers. 398 were assigned a count of 0. 405

Enrichment of genes with multi-tissue outliers as eGenes 406
We defined multi-tissue eGenes using two approaches. The  which the gene appeared as a significant eGene (approach 1) or had a shared eQTL effect (approach 2). 418 We compared this value for genes with and without a multi-tissue outlier with a two-sided Wilcoxon rank 419 sum test. 420 Finally, we wanted to show that this enrichment of outlier genes as multi-tissue eGenes was not 421 confounded by gene expression level. To this end, using the Metasoft results, we stratified genes tested 422 for multi-tissue outliers into RPKM deciles and repeated the comparison between genes with and without 423 a multi-tissue outlier. 424

Evolutionary constraint of genes with multi-tissue outliers 425
We obtained gene level estimates of evolutionary constraint from the Exome Aggregation Consortium 26 426 (http://exac.broadinstitute.org/, ExAC release 0.3). By jointly analyzing the patterns of exonic variation in 427 over 60,000 exomes, the ExAC database can be used to rank evolutionary constraint of genes based on 428 their tolerance for synonymous, missense, and loss-of-function variation. We intersected the 17,351 429 autosomal lincRNA and protein coding genes with constraint data from ExAC with the 18,380 genes 430 tested for multi-tissue outliers from GTEx, yielding 14,379 genes for further analysis (3,897 and 10,482 431 genes with and without an outlier, respectively). We examined three functional constraint scores from the 432 ExAC database: synonymous Z-score, missense Z-score, and probability of loss-of-function intolerance 433 (pLI). We defined sets of synonymous and missense intolerant genes as genes with a synonymous or 434 missense Z-score above the 90 th percentile. We defined loss-of-function intolerant genes as those with a 435 pLI score above 0.9 following the guidelines provided by the ExAC consortium. We then tested for the 436 enrichment of genes with multi-tissue outliers in the lists of synonymous, missense, and loss-of- The top 3,879 multi-tissue eGenes were classified as shared eGenes, while the remaining 11,386 genes 443 were considered as a background. The number of shared eGenes was chosen to match the number of 444 multi-tissue outlier genes in the intersection with the ExAC database. 445 We tested for a difference in the mean constraint for genes with multi-tissue outliers and genes with 446 multi-tissue eQTLs using ANOVA. For each of the three constraint scores in ExAC, we treated the score 447 for each gene as the response and the status of the gene as having a multi-tissue outlier and/or a multi-448 tissue eQTL as a categorical predictor with four classes. After fitting the model, we performed Tukey's 449 range test to determine whether there was a significant difference in the mean constraint between genes 450 In the M-step, at the ith iteration, given the current estimates ω (i) , the parameters (β (i + 1) *) are estimated as 483 where λ is an L2 penalty hyper-parameter derived from the Gaussian prior on β. 485 The parameters θ get updated as: 486 !" where I is an indicator operator, t is the binary value of expression E n , s is the possible binary values of 487 FR n , and C is a pseudo count derived from the Beta prior on θ. The  any individual with |median Z-score| ≥ 1.5 as an outlier if the expression was observed in at least five 503 tissues; the remaining individuals were labeled as non-outliers for the gene. In total, we extracted 48,575 504 instances where an individual had at least one rare variant within 10 kb of TSS of a gene. We then 505 incorporated standardized genomic features (G nodes in Fig. 5a) and multi-tissue outlier states (E node in 506 Fig. 5a) as input to RIVER. 507 To train and evaluate RIVER on the GTEx cohort, we first identified 3,766 instances of individual and 508 gene pairs where two individuals had the same rare SNVs near a particular gene. We used these instances 509 for evaluation as described below. We held out those instances and trained RIVER parameters with the 510 remaining instances. RIVER requires two hyper-parameters λ and C. To select λ, we first applied a 511 multivariate logistic regression with features G and response variable E, selecting lambda with the 512 minimum squared error via 10-fold cross-validation (we selected λ = 0.01). We selected C = 50, informed 513 simply by the total number of training instances available, as validation data was not available for 514 extensive cross-validation. Initial parameters for EM were set to θ = (P(E = 0 | FR = 0), P(E = 1 | FR = 0), 515 P(E = 0 | FR = 1), P(E = 1 | FR = 1)) = (0.99, 0.01, 0.3, 0.7) and β from the multivariate logistic 516 regression above, although different initializations did not significantly change the final parameters 517 (Extended Data Table 6). 518 The 3,766 held out pairs of instances from individuals with an identical rare variant were used to create a 519 labeled evaluation set. For one of the two individuals from each pair, we estimated the posterior 520 probability of a functional rare variant P (FR | G, E, β, θ). The outlier status of the second individual, 521 whose data was not observed either during training or prediction, was then treated as a "label" of the true 522 status of functional effect FR. Using this labeled set, we compared the RIVER score to the posterior P(FR 523 | G, β) estimated from the plain multivariate logistic regression model with genomic annotations alone. 524 We produced ROC curves and computed AUC for both models, testing for significant differences using 525 DeLong's method 29 . This metric relies on outlier status reflecting the consequences of rare variants-526 pairs of individuals who share rare variants tend to have highly similar outlier status even after regressing 527 out effects of common variants (Kendall's tau rank correlation, P < 2.2 x 10 -16 ). As a second metric, we 528 also evaluated performance of both the genomic annotation model and RIVER by assessing ASE. We 529 tested the association between ASE and model predictions using Fisher's Exact Test. High allelic 530 imbalance, defined by a top 10% threshold on median absolute deviation of the reference-to-total allele 531 ratio from an expected ratio (0.5) across 44 tissues, was compared to posterior probabilities of rare 532 variants being functional from both models with four different thresholds (top 10% -40%). 533

Supervised model integrating expression and genomic annotation 534
To assess the information gained by incorporating gene expression data in the prediction of functional 535 rare variants, we applied a simplified supervised approach to a limited dataset. We used the instances 536 where two individuals had same rare variants to create a labeled training set where the outlier status of the 537 second individual was used as the response variable. We then trained a logistic regression model with just 538 two features: 1) the outlier status of the first individual and 2) a single genomic feature value such as 539 CADD or DANN. We estimated parameters from the entire set of rare-variant-matched pairs using 540 logistic regression to determine the log odds ratio and corresponding P-value of expression status as a 541 predictor. While this approach was not amenable to training a full predictive model over all genomic 542 annotations jointly, given the limited number of instances, it provided a consistent estimate of the log 543 odds ratio of outlier status. We tested five genomic predictors: CADD, DANN, transcription factor 544 binding site annotations, PhyloP scores, and one aggregated feature, posterior probability from a 545 multivariate logistic regression model learned with all genomic annotations (Logistic) (Extended Data 546 Table 4). 547

RIVER assessment of pathogenic ClinVar variants 548
We downloaded pathogenic variants from the ClinVar database 30 (accessed 04/05/2015). We searched for 549 the presence of any of these disease variants within the set of rare variants segregating in the GTEx 550 cohort. Using the ClinVar database, we then manually curated this set of variants, classifying them as 551 pathogenic only if there was supporting clinical evidence of their role in disease. Specifically, any disease 552 variant reported as pathogenic, likely pathogenic, or a risk factor for disease was considered pathogenic. 553 To explore RIVER scores for those pathogenic variants, all instances were used for training RIVER. We 554 then computed a posterior probability P (FR | G, E, β, θ) for each instance coinciding with a pathogenic 555 ClinVar variant. 556

Stability of estimated parameters with different parameter initializations 557
We tried several different initialization parameters for either β or θ to explore how this affected the 558 estimated parameters. We initialized a noisy β by adding K% Gaussian noise compared to the mean of β 559 with fixed θ (for K = 10, 20, 50 100, 200, 400, 800). For θ, we fixed P(E = 1 | FR = 0) and P(E = 0 | FR = 560 0) as 0.01 and 0.99, respectively, and initialized (P(E = 1 | FR = 1), P(E = 0 | FR = 1)) as (0.1, 0.9), (0.4, 561 0.6), and (0.45, 0.55) instead of (0.3, 0.7) with β fixed. For each parameter initialization, we computed 562 Spearman rank correlations between parameters from RIVER using the original initialization and the 563 alternative initializations. We also investigated how many instances within top 10% of posterior 564 probabilities from RIVER under the original settings were replicated in the top 10% of posterior 565 probabilities under the alternative initializations (Accuracy in Extended Data Table 6). 566