Genome-wide human brain eQTLs: In-depth analysis and insights using the UKBEC dataset

Understanding the complexity of the human brain transcriptome architecture is one of the most important human genetics study areas. Previous studies have applied expression quantitative trait loci (eQTL) analysis at the genome-wide level of the brain to understand the underlying mechanisms relating to neurodegenerative diseases, primarily at the transcript level. To increase the resolution of our understanding, the current study investigates multi/single-region, transcript/exon-level and cis versus trans-acting eQTL, across 10 regions of the human brain. Some of the key findings of this study are: (i) only a relatively small proportion of eQTLs will be detected, where the sensitivity is under 5%; (ii) when an eQTL is acting in multiple regions (MR-eQTL), it tends to have very similar effects on gene expression in each of these regions, as well as being cis-acting; (iii) trans-acting eQTLs tend to have larger effects on expression compared to cis-acting eQTLs and tend to be specific to a single region (SR-eQTL) of the brain; (iv) the cerebellum has a very large number of eQTLs that function exclusively in this region, compared with other regions of the brain; (v) importantly, an interactive visualisation tool (Shiny app) was developed to visualise the MR/SR-eQTL at transcript and exon levels.

The genotypes were then compiled into a matrix X with dimensions n_SNP.pc × n_sample.
An overall distribution of allele frequencies similar to that produced in the LE simulations, but with correlated SNPs, was also produced. These allele frequencies, along with matrix X, were used to generate expression phenotypes as in the LE simulations.

Table 1. Genotyping error parameters, with each cell reporting the probability of returning the "observed genotype" (column) given the "true genotype" (row). Consequently, each row sums to unity.

Supplementary
The appropriate row of the table, given the "true" SNP genotypes simulated, was used as event probabilities. For example, given that the true genotype is AA, there is a probability of 0.000496583 that the error genotype will be returned as AB. The resulting probability vector was used to generate SNPs with genotype error from multinomial distributions, and these were used as the genotyping data for MatrixEQTL 2, in place of the "true" genotype matrix.
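The multinomial draw described above can be sketched as follows. This is a minimal illustration, not the code used in the study: genotypes are assumed to be coded 0, 1, 2 for AA, AB, BB, the function name `add_genotype_error` is hypothetical, and only the AA to AB probability (0.000496583) is taken from the text. All other table entries are placeholders chosen so that each row sums to one.

```python
import numpy as np

# Rows: true genotype (AA, AB, BB); columns: observed genotype (AA, AB, BB).
# Only the AA -> AB entry (0.000496583) comes from the text; every other
# value is a hypothetical placeholder chosen so that each row sums to 1.
ERROR_TABLE = np.array([
    [0.999500000, 0.000496583, 0.000003417],  # true AA
    [0.000500000, 0.999000000, 0.000500000],  # true AB (hypothetical)
    [0.000003417, 0.000496583, 0.999500000],  # true BB (hypothetical)
])


def add_genotype_error(true_genotypes, error_table, rng):
    """Return observed genotype codes (0, 1, 2) drawn from the table row
    that matches each true genotype code."""
    true_genotypes = np.asarray(true_genotypes)
    observed = np.empty_like(true_genotypes)
    for g in range(error_table.shape[0]):
        mask = true_genotypes == g
        n = int(mask.sum())
        if n == 0:
            continue
        # One multinomial trial per genotype call; argmax recovers the
        # category that was drawn in each trial.
        draws = rng.multinomial(1, error_table[g], size=n)
        observed[mask] = draws.argmax(axis=1)
    return observed


rng = np.random.default_rng(0)
X_true = rng.integers(0, 3, size=(50, 20))  # toy SNP x sample matrix
X_observed = add_genotype_error(X_true, ERROR_TABLE, rng)
```

With error rates this small, most toy runs return a matrix identical to the true one; the sketch is only meant to show how the row of event probabilities is selected by the true genotype.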

S4 Designation of cis- vs trans-eQTLs
The following plots (S4.1, S4.2, S4.3) show the distribution of distances between the transcript and the SNP, for eQTLs where the transcript and the SNP are located on the same chromosome (shown on a logarithm base 10 scale). They show a clear separation in frequency at distances between 10^6 and 10^7 base pairs. Importantly, this same separation is seen across all brain regions and all chromosomes, suggesting that using a common cut-point is appropriate. Thus, a cut-point of 10^6.5 bp (approximately 3.16 Mb) was used to classify eQTLs as cis versus trans. A bell-shaped distribution can be observed when transcripts are closer to the SNP, but there is another, smaller bell-shaped curve past the cut-point of 10^6.5 (i.e. the dotted red line).
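The cut-point rule can be sketched as a small function. This is an illustration under stated assumptions: each SNP and transcript is reduced to a single base-pair coordinate, pairs on different chromosomes are treated as trans, and a distance exactly at the cut-point is counted as cis. The function name `classify_eqtl` is hypothetical.

```python
CIS_CUTOFF_BP = 10 ** 6.5  # about 3.16 Mb, the cut-point stated in the text


def classify_eqtl(snp_chrom, snp_pos, transcript_chrom, transcript_pos,
                  cutoff_bp=CIS_CUTOFF_BP):
    """Label a SNP-transcript pair as 'cis' or 'trans'.

    Assumptions (not stated in the source): pairs on different chromosomes
    are trans, and a distance equal to the cut-point counts as cis.
    """
    if snp_chrom != transcript_chrom:
        return "trans"
    distance = abs(snp_pos - transcript_pos)
    return "cis" if distance <= cutoff_bp else "trans"


# Toy coordinates, for illustration only.
near = classify_eqtl("1", 1_000_000, "1", 4_000_000)   # 3.0 Mb apart
far = classify_eqtl("1", 1_000_000, "1", 5_000_000)    # 4.0 Mb apart
other = classify_eqtl("1", 1_000_000, "2", 1_000_000)  # different chromosomes
```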

S5 Comparison with previous eQTL analysis
The data analysed and presented in this paper are the UKBEC dataset. Previous analyses of the same data set were reported by Ramasamy, et al. 3. As somewhat different approaches have been used (e.g. SNP selection, definition of cis- vs trans-acting eQTLs), the numbers of transcript-level eQTLs identified in common, or by only one of the two analyses, have been tabulated separately for each brain region (Supplementary Table 2). What is considered to be 'the same' eQTL has been evaluated in terms of SNP position within 0.25 Mb, 0.5 Mb, and 1 Mb. In addition, some aspects of the eQTL mapping procedure applied here differed from the original study.

As shown in Supplementary Table 2, very similar numbers of eQTLs were mapped in the original study 3 and the present study, even when considered by region. There are relatively few changes when the definition of the 'same eQTL' is changed from within 0.25 Mb to within 2 Mb but, as expected, more are flagged as the 'same eQTL' as the interval is increased. Of the eQTLs mapped in the present study, the majority (over 90% in some regions) were also found in the original study 3. However, it is apparent that a much smaller proportion of the eQTLs mapped by the original study 3 were detected in the current study. Possibly, this difference relates to the initial filtering of SNPs undertaken prior to eQTL mapping, and to the selection of the SNP representing an eQTL (i.e. the most significant SNP association within an LD block).

Supplementary Table 2. Comparison of the eQTLs mapped by the original study 3 (R) and the present study (S). Note that the eQTLs used for comparison in this table were the total number of eQTLs found at the transcript level. The table also shows the percentage mapped by S of the eQTLs mapped by R, and the percentage mapped by R of the eQTLs mapped by S. These two inclusion percentages are shown for varying distances between SNP locations in the two studies. For example, of the 2,142 eQTLs mapped in the cerebellum (CRBL) by R, 43.5% were mapped in S with SNP locations within 0.25 Mb. As the distance between the SNP locations (i.e. SNP location in R versus SNP location in S) increases, the percentage of eQTLs mapped also increases (whether it is R in S or vice versa).
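The two inclusion percentages can be illustrated with a short sketch. It assumes that an eQTL counts as 'the same' when both studies report the same transcript on the same chromosome with SNP positions no further apart than the chosen window. Matching on transcript is an assumption here, since the text specifies only the SNP-position criterion. The function name and the toy eQTL lists are hypothetical and do not come from either study.

```python
def percent_recovered(reference, other, window_bp):
    """Percentage of eQTLs in `reference` that have a match in `other`.

    Each eQTL is a (transcript, chromosome, snp_position) tuple. A match
    requires the same transcript and chromosome (an assumption) and SNP
    positions at most `window_bp` apart.
    """
    if not reference:
        return 0.0
    hits = 0
    for transcript, chrom, pos in reference:
        if any(t == transcript and c == chrom and abs(p - pos) <= window_bp
               for t, c, p in other):
            hits += 1
    return 100.0 * hits / len(reference)


# Hypothetical toy eQTL lists standing in for studies R and S.
eqtls_r = [("T1", "1", 1_000_000), ("T2", "2", 5_000_000), ("T3", "3", 100)]
eqtls_s = [("T1", "1", 1_200_000), ("T2", "2", 5_600_000)]

for window_mb in (0.25, 0.5, 1.0):
    window_bp = window_mb * 1_000_000
    r_in_s = percent_recovered(eqtls_r, eqtls_s, window_bp)
    s_in_r = percent_recovered(eqtls_s, eqtls_r, window_bp)
    print(f"within {window_mb} Mb: R in S = {r_in_s:.1f}%, S in R = {s_in_r:.1f}%")
```

Because a wider window can only add matches, both percentages are non-decreasing in the window size, which is the pattern described for Supplementary Table 2.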

S6 Simulation to evaluate FDR for cis-acting vs trans-acting eQTLs
A single LE simulation was run as described in the materials and methods, with the MatrixEQTL 2 output retained for all possible associations (i.e. all 400,000,000 possible SNP-transcript combinations). A cis-acting eQTL was defined as a SNP at the same location as a transcript 8, while a trans-acting eQTL was everything else (i.e. 20,000 potential cis-acting eQTLs and 399,980,000 potential trans-acting eQTLs). A new FDR was calculated based on the P-values of cis-acting and trans-acting eQTLs separately.

As both sample size and effect size threshold increase, the sensitivity to detect eQTLs increases. However, sensitivity is very low across all sample sizes and effect size thresholds, with the greatest level of sensitivity (i.e. when n = 300, k = 3) being less than 0.14. At lower effect size thresholds (k ≤ 1), sensitivity to detect eQTLs is comparable across sample sizes (except n = 50), especially when n ≥ 200.

When eQTL expression levels were simulated with lower variance (σ²_β) than model residual error variance (σ²_ε), the sensitivity to detect eQTLs again increases as both sample size and effect size threshold increase. However, sensitivity is very low across all sample sizes and effect size thresholds, with the greatest level of sensitivity (i.e. when n = 300, k = 2.9) being less than 0.16, though overall sensitivity in this scenario is higher than in the previous scenarios. At lower effect size thresholds (k ≤ 1), sensitivity to detect eQTLs is comparable across sample sizes (except n = 50). At larger sample sizes (n ≥ 200), there seems to be an unexpected decrease in sensitivity at higher effect size thresholds (k ≥ 2.9).
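The separate cis and trans FDR calculation in S6 can be sketched as below. The text does not name the FDR procedure, so the Benjamini-Hochberg adjustment used here is an assumption, and the function names and toy P-values are hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values (assumed FDR procedure)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest P-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return q


def fdr_by_class(pvalues, is_cis):
    """Adjust cis and trans P-values separately, so that each class is
    corrected only for its own number of tests."""
    p = np.asarray(pvalues, dtype=float)
    is_cis = np.asarray(is_cis, dtype=bool)
    q = np.empty_like(p)
    q[is_cis] = benjamini_hochberg(p[is_cis])
    q[~is_cis] = benjamini_hochberg(p[~is_cis])
    return q


# Toy example: one cis test and three trans tests (hypothetical P-values).
p_toy = [0.01, 0.01, 0.5, 0.9]
cis_toy = [True, False, False, False]
q_toy = fdr_by_class(p_toy, cis_toy)
```

In the toy example the same raw P-value of 0.01 yields a smaller adjusted value in the cis class than in the trans class, because the cis class contains fewer tests. With 20,000 cis tests against 399,980,000 trans tests, the difference in multiple-testing burden between the two classes is far larger.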
The averages can be seen to stabilise the variability for a smoother and centred curve.

SNP rs5760176 effect on GSTT1 expression for all ten brain regions. Increased expression was associated with the homozygous minor allele (AA) in all ten regions.

Figure S21. Boxplots of the effect of rs1133328 on the EFHB transcript expression levels. SNP rs1133328 effect on EFHB expression for all ten brain regions. Decreased expression was associated with the homozygous genotype (GG) in all ten regions. CRBL shows a different pattern of lower expression.

SNP rs4688690 effect on ZCCHC13 expression for all ten brain regions. The AA homozygous genotype is associated with a decrease in the expression level of ZCCHC13. This decrease is more obvious than that for rs10886711-PPAPDC1A (Figure S22) across all regions; however, it is also only significant in CRBL.