Brain expression quantitative trait locus and network analyses reveal downstream effects and putative drivers for brain-related diseases

Identification of therapeutic targets from genome-wide association studies (GWAS) requires insights into downstream functional consequences. We harmonized 8,613 RNA-sequencing samples from 14 brain datasets to create the MetaBrain resource and performed cis- and trans-expression quantitative trait locus (eQTL) meta-analyses in multiple brain region- and ancestry-specific datasets (n ≤ 2,759). Many of the 16,169 cortex cis-eQTLs were tissue-dependent when compared with blood cis-eQTLs. We inferred brain cell types for 3,549 cis-eQTLs by interaction analysis. We prioritized 186 cis-eQTLs for 31 brain-related traits using Mendelian randomization and co-localization including 40 cis-eQTLs with an inferred cell type, such as a neuron-specific cis-eQTL (CYP24A1) for multiple sclerosis. We further describe 737 trans-eQTLs for 526 unique variants and 108 unique genes. We used brain-specific gene-co-regulation networks to link GWAS loci and prioritize additional genes for five central nervous system diseases. This study represents a valuable resource for post-GWAS research on central nervous system diseases.

Identification of therapeutic targets from genome-wide association studies (GWAS) requires insights into downstream functional consequences. We harmonized 8,613 RNA-sequencing samples from 14 brain datasets to create the MetaBrain resource and performed cis-and trans-expression quantitative trait locus (eQTL) meta-analyses in multiple brain region-and ancestry-specific datasets (n ≤ 2,759). Many of the 16,169 cortex cis-eQTLs were tissue-dependent when compared with blood cis-eQTLs. We inferred brain cell types for 3,549 cis-eQTLs by interaction analysis. We prioritized 186 cis-eQTLs for 31 brain-related traits using Mendelian randomization and co-localization including 40 cis-eQTLs with an inferred cell type, such as a neuron-specific cis-eQTL (CYP24A1) for multiple sclerosis. We further describe 737 trans-eQTLs for 526 unique variants and 108 unique genes. We used brain-specific gene-co-regulation networks to link GWAS loci and prioritize additional genes for five central nervous system diseases. This study represents a valuable resource for post-GWAS research on central nervous system diseases.
Psychiatric and neurological diseases continue to be a massive global health burden: The World Health Organization estimated that in 2019, globally 280 million individuals were affected by depression, 39.5 million by bipolar disorder and 287.4 million by schizophrenia (SCZ) 1 . Similarly, the number of people living with dementia is expected to rise from 50 million today to 152 million by 2050 (ref. 2 ), with similar trajectories for other neurodegenerative diseases. Although substantial progress has been made in uncovering the genetic basis Article https://doi.org/10.1038/s41588-023-01300-6 traits. Our analyses prioritize probable causal genes and reveal cell type-dependent eQTLs that may be associated with disease risk (Fig. 1).
To facilitate future studies, we have made all summary statistics and the co-expression network derived from our resource available at www.metabrain.nl.
To translate from genetic signals to mechanisms, associations with gene expression levels or expression quantitative trait loci (eQTL) have shown great potential. Cis-eQTLs (nearby) and trans-eQTLs (distal) can aid the interpretation of GWAS loci in several ways. Cis-eQTLs provide direct links between genes and phenotypes through causal inference approaches such as Mendelian randomization (MR) and genetic co-localization analyses, whereas trans-eQTLs expose sets of downstream genes and pathways on which the effects of disease variants converge.
Expression quantitative trait loci are dynamic features and vary with tissue, cell type and additional factors such as response to stimulation. Therefore, eQTLs from disease-relevant tissues are desired for optimal interrogation of GWAS loci 3 . Previous brain eQTL meta-analyses by the PsychENCODE 4 (n = 1,866) and AMP-AD 5 (n = 1,433) consortia have been published to help interpret neurodegenerative and psychiatric disease GWAS loci. However, results from statistical approaches such as MR and co-localization are improved by robust effect-size estimates from even larger carefully curated eQTL datasets. In addition, large sample sizes are better suited to decompose eQTL effects to specific cell types.
To maximize the potential of eQTL-based analyses of the brain, we combined and rigorously harmonized brain RNA-sequencing (RNA-seq) and genotype data from 14 different cohorts, including 8,613 RNA-seq samples from all major brain eQTL studies, and publicly available samples from the European Nucleotide Archive (ENA). We created a gene co-regulation network based on 8,544 RNA-seq samples covering different brain regions and performed cis-and trans-eQTL analyses of up to 2,683 individuals of European ancestry (EUR), with replication in up to 319 individuals of African ancestry (AFR). We made inferences on the brain cell types in which eQTLs operate and systematically conducted MR and co-localization analyses to find shared genetic effects between eQTLs and 31 brain-related GWAS We downloaded publicly available RNA-seq and genotype data from 14 different datasets consisting of 8,613 RNA-seq measurements from seven main brain regions and 6,518 genotype samples. We created six eQTL meta-analysis datasets and performed cis-, trans-and interaction-eQTL analyses, built a brain-specific gene co-regulation network and prioritized genes using MR, co-localization and the co-regulation network. Image of sagittal cut of brain created with BioRender.com. This figure summarizes  values from Supplementary Tables 1, 3 Article https://doi.org/10.1038/s41588-023-01300-6 (n = 208) as a repli cation dataset. Cis-eQTLs were not calculated for the amygdala and hypothalamus tissue groups due to the small sample size (n < 100).

54% of cortex cis-eQTLs have independent associations
Within each discovery dataset, we performed a sample size-weighted cis-eQTL meta-analysis on common variants (minor allele frequency (MAF) > 1%), within 1 megabase (Mb) of the transcription start site (TSS) of a protein-coding gene. We identified 1,880 Basal ganglia-EUR, 10 . We next performed conditional analysis to identify independent associations in each cis-eQTL locus. In Cortex-EUR, 8,815 genes had a significant secondary cis-eQTL (54% of cis-eQTL genes identified in this dataset), 4,489 genes had tertiary and 2,065 had quaternary cis-eQTLs. We also identified secondary associations for the other discovery datasets, albeit to a lesser extent (Fig. 3a  and Supplementary Tables 2,3). The properties of the Cortex-EUR cis-eQTLs conform to studies performed in blood 14 and brain 15 : primary lead cis-eQTL single nucleotide polymorphisms (SNPs) were generally located close to the TSS (median distance, 33.6 kilobases (kb)) and cis-eQTL genes had a lower probability for loss-of-function intolerance (pLI > 0.9; χ 2 P = 2.48 × 10 −83 ). Genes with a cis-eQTL generally had a higher median expression (Wilcoxon P = 5.5 × 10 −174 ; Fig. 3b); the other properties of cis-eQTLs were very comparable with earlier reports (Supplementary Note, Supplementary Fig. 11 and Supplementary Table 4).

High eQTL agreement between ancestries and brain regions
We investigated ancestry, brain region, dataset and tissue-type differences in cis-eQTLs. Agreement between ancestries was high: allelic concordance (AC) and correlation of effect-size (R b ) estimates were high when different ancestries were compared (R b > 0.78, AC > 92.95%; Fig. 3c, Supplementary Fig. 12 and Supplementary Table 5). The proportion of estimated true-positives (π 1 ) and correlation of allelic fold change (caFC) estimates between ancestries were lower, potentially due to differences in sample size (for example, Cortex-EUR versus Cortex-EAS, caFC = 0.55 and π 1 = 0.29; conversely, caFC = 0.85 and π 1 = 0.95; Supplementary Fig. 12). Similarly, different brain regions showed high overall agreement (R b > 0.76, caFC > 0.65, and AC > 91%), with π 1 estimates dependent on the sample size (0.39-0.95). Cerebellum was an exception and showed lower agreement with the cerebral brain regions (Fig. 3d,e and Supplementary Fig. 12). Despite the limited sample size, we identified 477 cis-eQTL genes that are significant in Cerebellum-EUR but not in Cortex-EUR ( Supplementary Fig. 13), perhaps due to low expression in the cortex or because they are regulated by transcription factors that are more active in the cerebellum (Supplementary Note  and Supplementary Table 6). Next, we repeated Cortex-EUR eQTL discovery while excluding GTEx and compared the results with cis-eQTLs from different GTEx tissues (Fig. 3e Article https://doi.org/10.1038/s41588-023-01300-6 (n = 31,684; blood-based, majority EUR ancestry), which supported the low agreement observed in GTEx blood. Of the overlapping eQTLs, 25% had an opposite allelic effect (AC = 75%, R b = 0.52 and π 1 = 0.83; Supplementary Fig. 15 and Supplementary Table 7) (ref. 16 ), which represents an increase over GTEx and suggests that many of the eQTLs are tissue-dependent. Combined, these results suggest that additional tissue-or ancestry-specific eQTLs can be identified when sample sizes increase. For instance, opposite effects may happen if two causal variants reside on the same haplotype but are specific for different tissues 17 , requiring large sample sizes for disentanglement. By revealing eQTLs with opposite allelic effects, our results highlight the relevance of tissue-dependent eQTL mapping to accurately assess the directionality of eQTLs 17 .

14% of cortex cis-eQTLs are dependent on the cell-type proportion
We evaluated the extent to which eQTLs are dependent on cell-type proportions by determining cell-type interaction eQTLs (ieQTLs) 3,18,19 .
In the Cortex-EUR subset, we predicted cell-type proportions using single-cell RNA-seq-derived signature profiles 4 (Supplementary Note and Supplementary Fig. 16). The cell-type proportions and reconstruction accuracy of our predictions (87%) were comparable to a previous study that used this reference profile on a subset of the Cortex-EUR samples 4 . We observed low-to-moderate correlations between predicted cell types (0.01 < Pearson's correlation coefficient (r) < 0.55; Fig. 4a) and high positive correlations with immunohistochemistry (IHC) counts from the ROSMAP cohort 20 (overall Pearson's r = 0.89 and per cell-type Pearson's r > 0.1; Fig. 4b). However, we note that the exact proportion for each cell type remains uncertain 21,22 . We used Decon-QTL 19 to identify ieQTLs for the 25,497 independent Cortex-EUR cis-eQTLs: 3,549 cis-eQTLs (13.9%) showed at least one significant ieQTL (4,095 ieQTLs; Benjamini-Hochberg false discovery rate (BH-FDR) < 0.05; Supplementary Table 8). The largest group of interactions were with excitatory, inhibitory and other neurons (1,627; 39.7%), probably because neurons are the most prevalent cell type. The majority of the ieQTLs (3,090; 75.5%) were uniquely mapped to one cell type (Fig. 4c), although we cannot exclude the possibility that these ieQTLs are also present in other cell types.
We replicated these findings in the Cortex-AFR dataset (n = 319) as well as in two independent single-nucleus RNA-seq (snRNA-seq)   For the mean expression and pLI score values, each row compares the eQTL genes for that rank with eQTL genes from the previous rank (for example, for tertiary eQTLs, the non-significant (grey) distribution is from genes that have secondary but lack tertiary eQTLs). P values were calculated using a Wilcoxon test between significant and non-significant genes. Differences in mean gene expression levels (left), the distance between the most significant SNP-gene combination and the TSS (middle), and pLI scores (right) are shown. For primary, secondary and quaternary eQTLs, non-significant eQTLs have higher pLI scores.
Vertical dotted lines indicate median of the distribution for the current rank (coloured) versus the previous rank (black). c,d, Number of overlapping eQTLs along with the R b and standard error (s.e.) values of primary cis-eQTLs between the cortex eQTLs of different ancestries (c) and the different brain regions for the EUR datasets (d). n, sample size of the dataset. e, Correlation of effect sizes and standard error of primary cis-eQTLs of Cortex-EUR (discovery, excluding GTEx) in all of the GTEx tissues (replication). Each dot is a different GTEx tissue; the x axis indicates the number of eQTLs that are significant in both discovery and replication.  Table 9). Across all replication datasets, we observed moderate-to-high rates of agreement, depending on the cell-type frequency and sample size (Bryois et al. 24 : 0.78 <R b < 0.86, median R b = 0.84, 0.43 < π 1 < 0.83, median π 1 = 0.69, 81% <AC < 90%, median AC = 0.9; Supplementary Note). Examples of replicating ieQTLs include the oligodendrocyte ieQTL genes FAM221A, NKAIN1 and STMN4, which were previously identified as oligodendrocyte-specific 25 , and AMPD3 and CD82, of which the SNPs were previously associated with white-matter microstructure 26 , suggesting a role for oligodendrocytes (Fig. 4d,e and Supplementary Fig. 20a-e). The high replication rates indicate that our approach can accurately identify the cell type for a large number of eQTLs. We note that summary statistics were available for only 54% of ieQTLs in a well-powered replication dataset (Bryois et al. 24 ), suggesting that our approach had the power to detect ieQTLs that are not yet identified in snRNA-seq datasets. These ieQTLs can also provide valuable information about the cell types of interest for disease-associated SNPs. For example, the A allele of variant rs4698412, which is associated with increased risk for Parkinson's disease (PD), also increased the expression of CD38, for which we identified a replicating astrocyte ieQTL ( Fig. 4f and Supplementary Fig. 20f). This gene is an immunomodulatory agent and is mainly expressed in neurons, astrocytes and microglia 27 , and increased levels of CD38 are observed with neuroinflammation (Supplementary Note).

Shared genetic effects between cis-eQTLs and central nervous system traits
We next linked Cortex-EUR cis-eQTLs to variants associated with brain-related traits and diseases. We determined the linkage-disequilibrium (LD) overlap between cis-eQTLs and GWAS SNPs, which indicated that primary eQTLs were 2.6-fold more likely to be in LD with a GWAS SNP compared with non-primary eQTLs (Fisher's exact test, P = 7.4 × 10 −125 ; Supplementary Note and Supplementary  Table 10). To more formally test whether there was evidence for sharing the same genetic effect between cis-eQTLs and 31 neurological traits, we conducted MR using the Wald ratio method and co-localization analyses (Supplementary Table 11). Among the 359,763 Wald ratios tested across 11,270 genes, 1,531 Wald ratios for 1,088 genes passed a suggestive P-value threshold (P < 5 × 10 −5 ; Supplementary Table 12). Of the cis-eQTL instruments from these findings, 294 were also cell-type ieQTLs. There were 549 significant Wald ratios that passed Bonferroni's correction (P < 1.43 × 10 −7 ), from which 186 also co-localized between the eQTL and GWAS traits when using coloc 28 (posterior probability for co-localization of significant signals PP4 > 0.7; Fig. 5a and Supplementary Fig. 21), confirming that the two traits shared the same causal SNP.   Note and Supplementary Table 16). In comparisons of the blood and brain expression levels of these genes in GTEx 30 , SLC12A5 and CCDC155 had almost no expression in blood, whereas expression was comparable between tissues for RGS1, SCO2 and MYNN (Supplementary Note and Supplementary Fig. 25). The discrepancy in MR findings observed between Cortex-EUR and eQTLGen suggest the existence of tissue-dependent genetic effects for MS.

Cell type-specific cis-eQTLs linked to MS
Two MS-associated genes, CYP24A1 and CLECL1, showed cell type-specific cis-eQTLs (Fig. 5c,d). Another gene that was previously suggested to be neuron-specific 31 , SLC12A5, did not show a significant ieQTL in our data. In our analysis, we found that higher CYP24A1 expression is associated with increased risk for MS (Wald ratio = 0.13, P = 7.8 × 10 −11 ) and that the eQTL and GWAS signals are co-localized (PP4 = 1.00). Furthermore, ieQTL analyses showed increasing expression of CYP24A1 with increasing excitatory neuron proportions for the risk allele rs2248137-C (interaction β = 1.92, interaction P = 1.98 × 10 −11 ; Fig. 5c), similar to other neurons (Supplementary Table 12). CYP24A1 encodes for a protein that catalyzes the inactivation of 1,25-dihydroxyvitamin D 3 (calcitriol), the active form of vitamin D 32 . Epidemiological studies have proposed vitamin D deficiency as a risk factor for MS 33,34 , which has recently been validated through MR [35][36][37] . Our findings are consistent with a previous report that indicates a shared signal for MS and CYP24A1 cis-eQTL in the frontal cortex 38 .
Decreased expression of CLECL1 was significantly associated with increased MS risk (Wald ratio = −0.16, P = 1.58 × 10 −9 ) and showed clear co-localization (PP4 > 0.87). The ieQTL analysis indicated that rs7306304-A increased expression of CLECL1 with increased proportions of microglia (interaction β = −2.72, interaction P = 5.09 × 10 −37 ; Fig. 5d), confirming a previous finding of a microglia cell type-specific cis-eQTL for CLECL1 at this MS risk locus 29 . This eQTL also replicates in the microglia single-cell analysis by Bryois et al. 24 (eQTL β = −0.62, P = 3.2 × 10 −20 ) with the same direction of effect. The rs7306304 SNP is in strong LD with the MS lead SNP rs7977720 (r 2 = 0.84) 29 . CLECL1 encodes a C-type lectin-like transmembrane protein expressed at high levels in dendritic and B cells, which has been proposed to modulate immune response 39 . It has 20-fold higher expression in a purified microglia dataset 29 than in cortical tissue, suggesting that decreased CLECL1 increases MS susceptibility through microglia-mediated immune processes in the brain.  Article https://doi.org/10.1038/s41588-023-01300-6

MetaBrain allows for the identification of trans-eQTLs
Trans-eQTLs can identify the downstream consequences of disease-associated variants but their effects are usually small 14 . To maximize power, we combined the Cortex-EUR and -AFR datasets (n = 2,759, excluding the ENA). We reduced the multiple-testing burden by focusing on 228,819 unique genetic variants, including GWAS and cis-eQTL variants.
When correcting for an increasing number of principal components (PCs), we observed a decrease in the number of trans-eQTLs Article https://doi.org/10.1038/s41588-023-01300-6 ( Fig. 6a, Supplementary Note, Supplementary Fig. 7 and Supplementary Table 17) as well as in heterogeneity (Supplementary Fig. 26). The majority (85%) of the trans-eQTLs observed without PC correction were located in a 7p21.3 locus previously associated with frontotemporal lobar degeneration 40 , Alzheimer's disease (AD) 41 as well as changes in neuron proportions 42 and gene expression levels 43,44 . We did not find evidence that these trans-eQTLs were dependent on AD status or neuron proportions and they were not significant when correcting for 100 PCs (Supplementary Note, Supplementary Figs. 26-29 and  Supplementary Tables 17-24). We therefore concentrated on 737 trans-eQTLs, detected after correcting for 100 PCs, which reflect 526 unique SNPs and 108 unique genes; 127 SNPs had trans-eQTL effects on multiple genes and 461 (88%) of the trans-eQTL SNPs were associated with a significant cis-eQTL in Cortex-EUR. We observed that 150 (33%) of the 461 significant trans-eQTL SNPs overlapping a cis-eQTL SNP were also the cis-eQTL index SNP, which represents an enrichment (Fisher's exact test, P = 1.2 × 10 −28 ; Supplementary Note); 29 were also cis-eQTL SNPs in tissues other than the cortex (Supplementary Table 17). This indicates that cis-eQTL index SNPs yield trans-eQTL effects more often in the brain in comparison to other cis-eQTL variants.
We observed trans-eQTLs from multiple independent genomic loci for seven genes, suggesting convergent trans-eQTL effects (ARRDC4, HBG2, POP1, COX7A1, RFPL2, ZNF311 and ZNF404; Supplementary Table 17). This includes a convergent trans-eQTL on hemoglobin subunit ɣ-2 (HBG2; 11p15.4) that was previously identified in blood. HBG2 was affected in trans by two independent variants (rs1427407 on 2p16.1 and rs4895441 on 6q23.3; Fig. 6b), which have previously been associated with fetal hemoglobin levels [45][46][47] . We also found converging effects that were not identified in blood. For example, the ZNF311 gene (6p22.1) was affected by the rs1150668 variant in cis and the rs8106871 variant in trans (19q13.2), both of which have been previously associated with smoking 48 and risk tolerance 49 . For both associations, the risk allele also increased ZNF311 expression. Furthermore, the risk allele rs1150668-G increased the expression of S100A5 in trans, and rs8106871-T decreased the expression of POU2F2 and increased expression of DEDD2 in cis (Fig. 6b). ZNF311 has been suggested to be a tumor-suppressor gene 50 potentially involved in gliomas 51 , S100A5 is used as a biomarker for astrocytomas 52 and POU2F2 has previously been associated with glioblastoma 53 . This example shows how multiple variants associated with smoking may alter multiple genes involved in cancer.

Brain co-regulation networks aid in GWAS interpretation
We generated brain region-specific co-regulation networks based on the RNA-seq data from 8,544 samples (Supplementary Note and Supplementary Figs. 31-33) using a similar approach to our previously developed multi-tissue GeneNetwork (n = 31,499) 54,55 . We applied Downstreamer 56 to SCZ 57 , PD 58 , MS 29 , AD 59 and amyotrophic lateral sclerosis (ALS) GWAS summary statistics 60 , using these networks to prioritize genes that are co-regulated with genes in their GWAS loci (Supplementary Note, Supplementary Fig. 34 and Supplementary Tables 25-30). For MS and AD, these were mostly immunity genes, whereas for PD, ALS and SCZ, these were genes that are specifically expressed in the brain (Supplementary Tables 25-30). For ALS and MS, we additionally created smaller networks for the cerebellum (n = 715) and cortex (n = 6,526) to identify brain region-specific effects.
For ALS, we applied Downstreamer to summary statistics from individuals with EUR ancestry (Supplementary Table 30) and a trans-ancestry meta-analysis including individuals with EUR and Asian ancestry 60 (EUR + ASN; Supplementary Table 25). In contrast, whereas Downstreamer did not identify genes using GeneNetwork 55 (n = 31,499), we identified a set of 27 unique co-regulated genes when applied to the smaller brain co-regulation networks (EUR + ASN summary statistics; Fig. 7a and Supplementary Table 25). Of the identified genes, HUWE1 was shared between the results from all brain regions and separate results from cortex, whereas UBR4 was shared between the cortex and cerebellum results. UBR4 encodes a ubiquitin ligase protein expressed throughout the body, which interacts with calmodulin, a protein regulating Ca 2+ -a process which has been linked to ALS disease-associated genes and motor-neuron vulnerability 61 . Furthermore, a previously discovered private mutation in UBR4 implicates its role in muscle coordination 62 . Many of the genes prioritized by Downstreamer are co-regulated with each other (Fig. 7b) and were enriched for genes implicated in causing gait disturbances (Fig. 7c). Our analysis identified genes that show strong co-regulation with positional candidate genes inside ALS-associated loci, suggesting that they must have a shared biological function.
For MS, the GeneNetwork 55 identified 257 unique genes that showed significant co-regulation with genes inside MS-associated loci ( Fig. 7d and Supplementary Table 29), many of which were immunity genes, which is also expected for this disease. However, when we used the brain co-regulation networks, we identified a much smaller set of genes that showed strong enrichment for the neurotrophin signaling pathway (Fig. 7e,f). Neurotrophins are polypeptides secreted by immunological cell types and promote the survival and proliferation of neurons as well as synaptic transmission (Supplementary Note). The identified genes showed high expression in immunity-related tissues when using the GeneNetwork 55 ( Supplementary Fig. 31a) and high expression in the spinal cord but low expression in cortex samples ( Supplementary Fig. 31b). This could implicate specific brain regions with MS development: for instance, the cerebellum is involved in muscle coordination and ataxia occurs in approximately 80% of patients with symptoms of MS 63 . But this could also implicate the 'outside-in hypothesis' that suggests the immune system may be a potential trigger for MS 29,64 .

Discussion
We describe an integrated analysis of the effects of genetic variation on gene expression levels in the brain with a sufficient sample size to identify robust cis-eQTLs and cell-type ieQTLs to compare cis-eQTLs between ancestries and identify brain trans-eQTLs that emanate from SNPs that have been previously linked to brain-related diseases. We have released harmonized results of the individual datasets to help others determine the robustness of eQTL effects.
We showed that eQTL-effect directions are generally shared between datasets, tissues and ancestries, but note that opposite allelic effects exist, which became apparent especially when we compared our results with a large blood eQTL dataset. We also identified non-primary cis-eQTLs, some of which reflect SNPs that are also the index variants for brain-related disorders, making them particularly interesting for subsequent follow-up.
We predicted cell-type proportions in the cortex and identified 3,549 cell-type ieQTLs. We compared the ieQTLs with snRNA-seq eQTLs and observed that the π 1 estimates were low due to the low sample size or other limitations in the snRNA-seq datasets 24,65,66 . As we observed good R b and AC values, we expect that the overlap and replication rates will improve once the sample sizes of snRNA-seq studies increase, highlighting the potential of ieQTL analysis in bulk RNA-seq. This is a well-powered MR and co-localization analysis using brain cis-eQTLs as instruments for bipolar disease, epilepsy, frontotemporal dementia, MS, cognitive function and years of schooling GWAS outcomes. Interestingly, for SCZ, three signals for CILP2, MAU2 and TM6SF2 met our criteria that had not been reported in a recent Article https://doi.org/10.1038/s41588-023-01300-6 psychiatric genomics consortium study 67 , further emphasizing the value of our large eQTL dataset (Supplementary Note). Our results also identify increased CYP24A1 expression to be associated with MS risk and propose excitatory neurons as the cell type most susceptible to CYP24A1 expression changes and probably active vitamin D levels. The potentially novel role of CYP24A1 in the brain could play an important role in MS etiology, as may lowered expression of CLECL1 in microglia.
The analyses identified trans-eQTLs in the brain cortex for many variants, some of which are brain-specific. Similar to blood, the trans-eQTL-effect sizes in the brain were usually small, emphasizing the importance of increasing the sample size of brain eQTL studies. The identified trans-eQTLs allowed us to gain insights into the functional effect of several disease-associated variants. We observed that trans-eQTL SNPs were enriched for cis-eQTL index SNPs, indicating  consequences beyond the gene regulated in cis. Our trans-eQTL analysis focused on a single brain region and was limited to trait-associated and cis-eQTL SNPs. We expect that a genome-wide approach will identify many more trans-eQTLs. Furthermore, we note that true separation of cis-and trans-eQTL effects would require investigation of the allelic function of the associated SNPs (Supplementary Note). We used brain-specific co-regulation networks to study several brain-related GWAS studies and prioritized genes that show significantly enriched co-regulation with genes inside GWAS loci. For ALS, this revealed a limited but significant set of genes located outside of currently known ALS loci, which might lead to a better understanding of the poorly understood pathways that cause ALS.
This study has several limitations. First, our eQTL analyses were limited to single tissues and excluded replicate RNA-seq measurements. A joint analysis with random-effects models 68,69 could increase the effective sample size, which would be especially useful for trans-eQTL identification. Second, our GWAS overlap analysis may have failed to identify previously identified genes due to differences in sample size, effect size, variant density, LD structure and imputation quality. For example, our results did not include the MAPT gene for AD because the H1/H2 haplotype separating SNP rs8070723 had an eQTL P value of 1.8 × 10 −5 due to our alignment strategy (Supplementary Note). This might have been an issue for other genes as well. Graph-based alignment tools or long-read sequencing methods are required to ultimately determine the true effects on such genes. Third, the GWAS overlap methods we used have known limitations (for example, Supplementary Note). For the MR analysis, we opted to perform single-SNP MR instead of multi-SNP MR (such as inverse-variance weighted 70 ), which requires multiple independent associations per gene. As this was the case for only a limited proportion of the tested cis-eQTLs and there were no genes with more than five independent eQTLs, we reasoned this would not provide for reliable inverse-variance-weighted estimation. Inverse-variance-weighted estimation could potentially be applied on a genome-wide trans-eQTL analysis, resulting in many more independent instruments per gene. However, such an approach would be more susceptible to confounding because of horizontal pleiotropy 71 , where a gene is affected by multiple indirect effects. Finally, our co-localization approach was based on the single-causal-variant assumption, which is not applicable to cis-eQTL genes with multiple independent associations (for example, TREM2; Supplementary Note),  Article https://doi.org/10.1038/s41588-023-01300-6 and therefore we may have failed to detect co-localizing signals in such loci. Recently published co-localization methods 72 do not have this assumption, which may improve future co-localization results. Finally, it is possible that bulk RNA-seq eQTL studies generally capture eQTL effects for genes that are not dosage sensitive and do not cause disruptive downstream consequences 73 . Furthermore, many eQTLs can only be detected in certain contexts 74 for which single-cell experiments are best suited. We expect that this resource will prove valuable for post-GWAS brain research. Our dataset can be utilized to disambiguate GWAS loci, point to causal pathways and prioritize targets for drug discovery. We expect that through future integration with single-cell eQTL studies that have higher resolution but still lower power, our results will help to pinpoint transcriptional effects in specific brain cell types for many disease-associated genetic variants.

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41588-023-01300-6.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons. org/licenses/by/4.0/. Article https://doi.org/10.1038/s41588-023-01300-6

Dataset collection and description
We collected published bulk brain RNA-seq samples from the AMP-AD consortium (AMP-AD MAYO, ROSMAP and MSBB) 6 , Braineac 7 and the PsychENCODE consortium 8 (Bipseq 4 , BrainGVEX 4 , CMC 9 , CMC_HBCC and UCLA_ASD 4 ) from Synapse.org using the Python 2.7.5 package synapseclient v1.9.2 (ref. 75 ). The NABEC and GTEx datasets were retrieved from NCBI dbGaP and the TargetALS data were provided directly by the investigators. In addition, we collected brain bulk RNA-seq samples from the ENA and performed rigorous QC, resulting in 1,759 samples (Supplementary Note, Supplementary Fig. 1 and Supplementary Table 28). Combined with the other datasets, this yielded a total of 9,363 samples (Supplementary Table 1).

RNA-seq alignment and QC
RNA-seq data were processed using a pipeline built with molgenis-compute 76 . FASTQ files were aligned against the GRCh38. p13 primary assembly using the GENCODE 77 v32 annotation with STAR 78 (v2.6.1c), while excluding patch sequences (Supplementary Note) with the following parameter settings: outFilterMultimapNmax = 1, twopassMode Basic and outFilterMismatchNmax = 8 for paired-end sequences; and outFilterMismatchNmax = 4 for single-end sequences. Gene quantification was performed using STAR, similar to gene quantification using HTSeq 79 , with default settings. The gene counts were then TMM-normalized 80 per cohort using edgeR 81 Supplementary Fig. 4a), <60% reads aligned ( Supplementary Fig. 4b) or <60% unique mapping. Among the RNA-seq samples, 117 did not pass this filter, mostly from GTEx 81 . The other quality measurements were visually inspected but contained no outliers. To identify outliers that had not been captured by these statistics, we performed a PCA-based filtering approach, after which 8,868 samples remained (Supplementary Note and Supplementary  Fig. 5a-c). To adjust for between-dataset differences observed in the data ( Supplementary Fig. 6a), we correlated the RNA-seq data with 77 covariates from the different QC tools and regressed-out the top-20 correlated covariates using ordinary least squares (OLS; Supplementary Note), after which clustering of datasets in PC1 and PC2 were no longer present ( Supplementary Fig. 6b).
Our collection of RNA-seq samples consisted of 36 different tissue labels, many of which were represented by only a few samples. Therefore, we next defined major brain regions present in our dataset, including samples from the amygdala, basal ganglia, cerebellum, cortex, hippocampus and spinal cord. We noted that some samples (especially from the ENA) were not annotated with a specific major brain region. To resolve this, we performed PCA over the sample correlation matrix and then performed k-nearest neighbors on the first two PCs (k = 7) to classify samples to the major brain regions. Using this approach, we defined a set of 86 amygdala, 574 basal ganglia, 723 cerebellum, 6,601 cortex, 206 hippocampus, 252 hypothalamus and 285 spinal cord samples ( Fig. 2a and Supplementary Table 1).

Genotype QC and definition of eQTL datasets
The genotype data for the included datasets were generated using different platforms, including genotypes called from whole-genome sequencing (AMP-AD, TargetALS 12 and GTEx 3 ), genotyping arrays (NABEC 11 and Braineac 7 ) and haplotype reference consortium 86 -imputed genotypes (PsychENCODE datasets), or were called from RNA-seq directly (ENA dataset; Supplementary Note). A total of 22 different genotyping datasets were available, reflecting 6,658 genotype samples (Supplementary Table 1). We performed QC on each dataset separately, using slightly different approaches per platform (Supplementary Note) and used a PCA-based approach to assign ancestries to each individual sample. Most of the included samples were of EUR ancestry: 5,138 samples had an EUR assignment, 805 samples had an AFR assignment and 573 samples were assigned to the other ancestries ( Fig. 2b and Supplementary Table 1). We next assessed links between RNA-seq samples and genotyped individuals, and were able to identify 7,644 links (Supplementary Table 1). For eQTL discovery, we grouped these links based on brain region and ancestry, in which we required at least 30 samples spanning at least two cohorts. We next removed sample mix-ups and duplicate samples (Supplementary Note), resulting in the following final eQTL discovery datasets: Basal ganglia-EUR (n = 208), Cerebellum-EUR (n = 492), Cortex-EUR (n = 2,683), Cortex-AFR (n = 319), Hippocampus-EUR (n = 208) and Spinal cord-EUR (n = 108; Fig. 2c and Supplementary Table 1).

eQTL analysis
We performed cis-eQTL analysis in each of the eQTL discovery datasets by calculating Spearman correlations within each cohort, followed by a sample size-weighted z-score meta-analysis approach, as described previously 14 . We opted for this approach to minimize the influence of potential heterogeneity between cohorts and showed that it performs comparably to FastQTL/QTLTools 87 (Supplementary Note and Supplementary Fig. 35). To correct for multiple testing, we used an approach similar to FastQTL/QTLTools 87 , where we used 1,000 permutations of the sample labels to fit a β-distribution per gene and, after adjustment using this distribution, calculated the q-values 88 over the top association per gene to determine significance (Supplementary Note). Genes with q-value < 0.05 were deemed significant. We limited these analyses to 19,373 protein-coding genes and to SNPs located within 1 Mb of the TSS, with MAF > 1% and Hardy-Weinberg P > 0.0001. The RNA-seq data were corrected for up to 20 technical covariates, dataset indicator variables and four multidimensional scaling components derived from the genotype data using OLS. In addition, we evaluated the impact of regressing out increasing numbers of PCs and defined the optimal numbers of PCs to remove for each eQTL discovery dataset (Supplementary Note and Supplementary Fig. 7). To identify secondary, tertiary, quaternary and other non-primary cis-eQTLs, we repeated the procedure in an iterative conditional approach, where in each subsequent iteration, we regressed out the cis-eQTL effect of the previous iterations using OLS and identified cis-eQTLs using the residuals, followed by an LD pruning step to circumvent SNP missingness between included cohorts (Supplementary Note).
To identify trans-eQTLs, we performed a limited analysis to reduce the multiple-testing burden by focusing on 228,819 variants with a known interpretation. This set constituted variants that were either previously associated with traits, having a GWAS P < 5 × 10 −8 in the IEU OpenGWAS database 89 or EBI GWAS catalog 90 on 3 May 2020, and additional neurological traits (Supplementary Table 17) or that showed an association with q-value < 0.05 in any of our discovery cis-eQTL analyses (including non-primary associations identified in the iterative conditional analysis). To maximize power, we combined the Cortex-EUR and Cortex-AFR datasets but excluded the ENA cohort due to the potential for genotypes of poorer quality (n = 2,759; Supplementary Note). For this dataset, we also repeated the cis-eQTL analysis (Supplementary Table 2) and normalization approach, including the selection of optimal number of 100 PCs to regress out ( Supplementary Fig. 7). We assessed those combinations of SNPs and genes where the SNP-TSS distance was >5 Mb or where the gene and SNP were on different chromosomes as trans-eQTLs. To determine significance, we employed a previously used FDR estimation method 91 using ten genome-wide permutations (Supplementary Note) and deemed trans-eQTLs with an FDR < 0.05 significant. We finally used an alignment-based approach to detect potential crossmapping artifacts, after which the FDR estimates were recalculated (Supplementary Note).

eQTL agreement
We used four different measurements of agreement of eQTL effects when comparing different brain regions or tissues: AC, π 1 , R b and caFC 92 . Each of these measures evaluates different aspects of replication: AC is an indication of the proportion of effects that have a shared direction of effect within the set of eQTLs that is significant in both the discovery and replication datasets and is expected to be 50% for random eQTL effects, π 1 93 estimates the proportion of eQTL effects that are true positive in the replication cohort but does not take into account the effect direction and can be dependent on the replication dataset sample size, R b 94 effectively estimates the correlation between the eQTL-effect slopes (for example, β values from linear regression) while controlling for potential covariance in standard errors of those slopes and caFC measures the correlation between estimates of the fold change in expression values between alleles 92 . We note that whereas AC, π 1 and R b can be calculated from summary statistics, caFC requires access to genotype and expression data. We therefore limited the caFC analysis to comparisons within the MetaBrain datasets and comparisons with the GTEx tissues. Details on how we calculated these measures of agreement are in the Supplementary Note. For AC, π 1 and R b comparisons with GTEx, we used the summary statistics for GTEx-v8 that were downloaded from the GTEx portal website. Calculations of the aFC per eQTL were performed with the original aFC script by Mohammadi and colleagues 92 using the settings -log_xform 1 and -log_base 2, after which Pearson correlation was used to calculate caFC across shared eQTLs. For all comparisons with GTEx tissues, we performed discovery in Cortex-EUR while excluding the GTEx cohort.

Identification of cell type-dependent eQTL effects in bulk RNA-seq
We predicted cell-type proportions of the MetaBrain Cortex-EUR and -AFR datasets using the method and single-cell profiles previously published by the PsychENCODE consortium 4 (Supplementary Note). We decided to discard the developmental cell types as we expected that these cell types are very rare or not present in adult human brain and because their signatures were obtained from fetal cells. The remaining cell types included all major cell types in the brain: neurons (excitatory, inhibitory and other), oligodendrocytes, astrocytes, microglia and endothelial cells. We then predicted the cell-type proportions as previously described 4 (Supplementary Note). However, to enable the joint analysis of samples, we chose to correct the log 2 -transformed transcript-per-million gene counts for 20 RNA-seq quality metrics using OLS as we observed that this removed dataset biases in the predictions. To maintain the information captured by relative expression differences between genes required for deconvolution, we rescaled the residuals to the original log 2 -transformed mean and standard deviation, and replaced negative values with zero. For the deconvolution step, we used the non-negative least squares 95 implementation in SciPy (v1.4.1) 96 . Given that the average proportions of cell subtypes were often very low (that is, <1%; Supplementary Fig. 16), we opted to sum all the subtypes of cells for excitatory neurons, inhibitory neurons and oligodendrocytes (oligodendrocyte precursor cells and oligodendrocytes). We observed a high correlation between the predicted cell proportions and PCs, indicating that cell-proportion differences contribute to a substantial variance in bulk gene expression levels ( Supplementary Fig. 36).
Using the predicted cell-type proportions, we aimed to identify cell type-dependent eQTLs. To increase the robustness of our results, we excluded 50 samples with a cell-proportion z-score > 4 on one or more cell type and limited the analysis to eQTLs with <95% missingness per dataset, a joint MAF > 5% and a joint Hardy-Weinberg P < 0.0001. With the remaining 25,497 eQTLs and 2,633 samples, we used Decon-QTL 19 , which employs a non-negative least-squares model to identify cell-type interaction effects. For this analysis, we used the steps as described in the Decon-QTL manuscript 19 . For the pre-processing of the TMM expression counts, we corrected for dataset indicator variables, 20 RNA-seq alignment metrics and four genotype multidimensional scaling components using OLS. As an additional step, we forced the data to the normal distribution per gene to reduce outliers. Finally, we evaluated whether the multiple-testing correction applied by Decon-QTL properly reflects the null distribution by comparing a permutation-based method to the default BH-FDR multiple-testing correction. We found that the vast majority (87.76%) of FDR significant interactions were also significant using permutations (Supplementary Note and Supplementary Figs. 37,38).
To confirm cell type-specific eQTL effects identified in Cortex-EUR, we used three replication datasets: Cortex-AFR and the snRNA-seq datasets Bryois et al. 24 and ROSMAP 4 . For the Cortex-AFR replication, we applied the same cell-type prediction and Decon-QTL interaction analysis as for Cortex-EUR. Over the ieQTLs significant in Cortex-EUR, we calculated BH-FDR estimates and deemed ieQTLs with a BH-FDR < 0.05 as significant. Given that Decon-QTL does not return any standard errors, we predicted β and standard errors using the sample size, MAF, interaction β and interaction P value 97 to calculate the R b metrics. For the ROSMAP dataset, encompassing 80,660 single-nucleus transcriptomes from the prefrontal cortex of 48 individuals with varying degrees of AD pathology 23 , we re-processed the expression matrix to create a pseudo-bulk expression matrix for each broad cell type and subsequently mapped cis-eQTLs using the same procedure as the trans-eQTL analysis in bulk data (Supplementary Note and Supplementary Figs. 39,40). To correct for multiple testing, we confined the analysis to only test for primary cis-or trans-eQTLs that had a significant interaction with one or more cell types in MetaBrain Cortex-EUR (BH-FDR < 0.05), while also permuting the sample labels 100 times. Finally, we attempted replication of our findings using the snRNA-seq eQTL summary statistics of the recent preprint by Bryois and colleagues 24 . We overlapped their summary statistics with the Cortex-EUR cis-eQTLs and found (depending on the cell type) that between 9,402 and 13,764 overlapped. We calculated a BH-FDR on the P values of the ieQTLs that were significant in Cortex-EUR in the respective cell type. Given that the summary statistics did not include standard errors or MAF values, we predicted β and standard errors using the MetaBrain Cortex-EUR MAF together with the eQTL sample size, β and P value 97 from Bryois et al. to calculate the R b metrics.

Single-SNP MR analysis
We conducted MR between the Cortex-EUR eQTLs and 31 neurological traits (21 neurological disease outcomes, two quantitative traits and eight brain-volume outcomes; Supplementary Table 11). For this purpose, we used the Wald ratio method, which computes the change in disease risk per s.d. change in gene expression, explained through the cis-eQTL instrument(s) for that gene. To obtain our instruments, Cortex-EUR eQTLs at a genome-wide significant P-value threshold (P < 5 × 10 −8 ) were selected and then LD-clumped using the ld_clump() function in the ieugwasr package v0.1.4 (ref. 98 ) with the default settings (10,000 kb clumping window with r 2 cut-off of 0.001 using the 1000 Genomes EUR reference panel). SNP associations for each of the eQTL instruments were then looked up in the outcome GWAS. If the SNP could not be found in the outcome GWAS using the dbSNP rsid, then a proxy search was performed to extract the next closest SNP available in terms of pairwise LD, providing a minimum r 2 threshold of 0.8 with the eQTL. These steps were performed using the associations() function in the ieugwasr package. To ensure correct orientation of effect alleles between the eQTL and outcome associations, the SNP effects were harmonized using the harmonise_data() function in TwoSampleMR 70 selecting Action 2, which assumes that the alleles are forward-stranded in the GWASs (so no filtering or re-orientation of alleles according to frequency was conducted on the palindromic SNPs). Single-SNP MR was then performed using the mr_singlesnp() function in TwoSampleMR. We reported all of the MR findings that passed a P-value threshold of Article https://doi.org/10.1038/s41588-023-01300-6 5 × 10 −5 but note that the Bonferroni-corrected P = 0.05 threshold for multiple-testing correction is P = 1.43 × 10 −7 .

Co-localization
Following the MR analysis, co-localization analysis was performed on the MR findings that passed the suggestive threshold to determine whether the eQTL and trait shared the same underlying signal. We ran co-localization 28 using both the default parameters (p1 = p2 = 10 −4 and p12 = 10 −5 ) and parameters based on the number of SNPs in the region (p1 = p2 = 1/(number of SNPs in the region) and p12 = p1 / 10). We considered the traits to co-localize if either of the parameter runs yielded PP4 > 0.7. In addition, a systematic co-localization analysis was performed to compare findings between Cortex-EUR eQTLs and other existing cortex eQTL datasets (Supplementary Note).

Ethical compliance
All cohorts included in this study enrolled participants with informed consent and collected and analyzed data in accordance with ethical and institutional regulations. Information about individual institutional review board approvals is available in the original publications for each cohort. Where applicable, data access agreements were signed by the investigators previous to acquisition of the data, either to the UMCG (AMP-AD, CMC, GTEx, CMC_HBCC, BrainSeq, UCLA_ASD, BrainGVEX, BipSeq and NABEC) or Biogen (TargetALS and Braineac), which state the data usage terms. To protect the privacy of the participants, data access was restricted to the investigators of this study, as defined in those data access agreements. Per data use agreements, only summary level data is made publicly available and strictly mentioned in the disclaimer that they cannot be used to re-identify study participants.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.