Expression quantitative trait loci-derived scores and white matter microstructure in UK Biobank: a novel approach to integrating genetics and neuroimaging

Expression quantitative trait loci (eQTL) are genetic variants associated with gene expression. Using genome-wide genotype data, it is now possible to impute gene expression using eQTL mapping efforts. This approach can be used to analyse previously unexplored relationships between gene expression and heritable in vivo measures of human brain structural connectivity. Using large-scale eQTL mapping studies, we computed 6457 gene expression scores (eQTL scores) using genome-wide genotype data in UK Biobank, where each score represents a genetic proxy measure of gene expression. These scores were then tested for associations with two diffusion tensor imaging measures, fractional anisotropy (NFA = 14,518) and mean diffusivity (NMD = 14,485), representing white matter structural integrity. We found FDR-corrected significant associations between 8 eQTL scores and structural connectivity phenotypes, including global and regional measures (βabsolute FA = 0.0339–0.0453; MD = 0.0308–0.0381) and individual tracts (βabsolute FA = 0.0320–0.0561; MD = 0.0295–0.0480). The loci within these eQTL scores have been reported to regulate expression of genes involved in various brain-related processes and disorders, such as neurite outgrowth and Parkinson’s disease (DCAKD, SLC35A4, SEC14L4, SRA1, NMT1, CPNE1, PLEKHM1, UBE3C). Our findings indicate that eQTL scores are associated with measures of in vivo brain connectivity and provide novel information not previously found by conventional genome-wide association studies. Although the role of expression of these genes regarding white matter microstructural integrity is not yet clear, these results suggest it may be possible, in future, to map potential trait- and disease-associated eQTL to in vivo brain connectivity and better understand the mechanisms of psychiatric disorders and brain traits, and their associated imaging findings.


Introduction
Expression quantitative trait loci (eQTL) are genetic variants which are proximally (cis) or distally (trans) associated with variation in the expression of genes 1 . Previous animal and human studies have found that changes in gene expression lead to phenotypic variation, including adaptive phenotypic changes and evolutionary developments. In humans, for instance, cis-regulatory mutations lead to differences in lactase (LCT) gene expression, resulting in lactase persistence in adulthood 2 . With respect to psychiatric disorders, major depressive disorder (MDD) and bipolar disorder have been associated with decreased expression of prodynorphin messenger RNA (mRNA), which is involved in regulation of mood and expressed in limbic-related areas within the brain (e.g., amygdala, hippocampus) [3][4][5] . These findings indicate the importance of cis-regulatory mutations to trait variance.
Variation in gene regulation leads to differences in individual phenotypes, indicating that eQTL may play a role in susceptibility to disease 6,7 . To test this hypothesis, methods which combine gene expression data with genome-wide association studies (GWAS) summary statistics have been developed. These approaches may provide further insight into the potential causal pathways and genes involved in specific disorders, or predict the regulatory roles of single nucleotide polymorphisms (SNPs) in linkage disequilibrium (LD) with previously associated variants 8 . Previous studies have found that genetic variation may explain some of the variance in levels of gene expression in human tissues, including post-mortem brain tissue [9][10][11][12] . In one such study, Zou et al. 13 conducted an expression genome-wide association study (eGWAS) on post-mortem brains of individuals with Alzheimer's disease (AD) and other brain pathologies (non-AD; including progressive supranuclear palsy). They found 2980 cisSNPs associated with both AD and non-AD conditions. By investigating brain eQTL in post-mortem tissue therefore, researchers have been able to discover associations between gene expression and disease states in the brain. Using brain tissue in order to investigate gene expression levels is however problematic, due to limitations such as small sample sizes and possible expression level differences in post-mortem versus ante-mortem brains 14 . As such, alternative approaches have therefore been investigated. One such approach is using eQTL measured from whole blood gene expression as a proxy for brain gene expression; an approach supported by important benefits such as greater sample size and easier accessibility 15 . Although it is recommended that wherever possible gene expression levels should be measured in a tissue-specific manner, considerable overlap has been demonstrated between blood and brain eQTL, supporting the validity of the approach 14 .
Neuroimaging measures provide a novel opportunity to investigate whether eQTL are significantly associated with in vivo brain phenotypes, and thereby increasing our knowledge of the role of eQTL in the wider context of psychiatric disorders. White matter microstructure, as measured by diffusion tensor imaging (DTI), is consistently heritable across tracts [16][17][18] and is compromised in several psychiatric disorders. Generally, decreased microstructural integrity of white matter is characterised by lower directionality of water molecule diffusion (reduced fractional anisotropy, FA) and less constrained water molecule diffusion (increased mean diffusivity, MD). Consistent findings across studies have indicated higher MD and lower FA in individuals suffering from MDD, for example 19,20 . Investigating the regulatory loci associated with white matter microstructure in health and disease may aid in the detection of molecular mechanisms influencing disease through aberrant structural brain connectivity.
Within the current study, we derived eQTL scores based on two well-powered whole-blood eQTL studies 21,22 . We then used GENOSCORES, a database of filtered summary statistics of publicly-available GWAS covering multiple phenotypes, including gene expression, to calculate eQTL scores (https://pm2.phs.ed.ac.uk/ genoscores/).
The resultant eQTL-based genetic scores can be considered proxies for the expression of particular genes, which can then be tested for association with traits of interest. Here, we analysed their association with white matter microstructure as measured by FA and MD in UK Biobank. We used participants from the October 2018 UK Biobank neuroimaging release (N FA = 14,518; N MD = 14,485). The purpose of the study was to utilise a novel approach to investigate associations between regulatory SNPs and white matter microstructure. This approach could lead to further specialised investigation into psychiatric and neurological disorders, as well as other brainrelated traits, such as cognition and behaviour.

UK Biobank (UKB)
UK Biobank is a health resource aiming to prevent, diagnose and treat numerous disorders. It is comprised of 502,617 individuals whose genetic and environmental data (e.g., lifestyle, medications) were collected between 2006 and 2010 in the United Kingdom (http://www. ukbiobank.ac.uk/). UKB received ethical approval from the Research Ethics Committee (reference: 11/NW/0382). This study has been approved by the UKB Access Committee (Project #4844). Written informed consent was obtained from all participants.

Study population
In the current study, individuals were excluded if they participated in studies such as the Psychiatric Genomics Consortium ( 51 ; PGC) MDD GWAS or Generation Scotland ( 52 ; Scottish Family Health Study), or if the individuals were biologically related to another participant, to remove overlap of genetic samples. For the brain imaging sample, a quality check performed by UK Biobank ensured that no abnormal scans were included in subsequent analyses 23 . We additionally excluded individuals whose global measures for FA and MD lay more than three standard deviations from the sample mean 19 25 and Bycroft et al. 26 .
From GENOSCORES, we used eQTL analysis summary statistics from two studies of whole-blood eQTL 21,22 . Briefly, Gusev et al. 21 developed a novel approach aimed at identifying associations between gene expression and complex traits in cases where gene expression level is not directly measured. These authors reported eQTL based on a sample of 1414 individuals with whole-blood expression measured using the Illumina HumanHT-12 version 4 Expression BeadChip. Westra et al. 22 performed a large eQTL meta-analysis in 5311 samples across 7 studies from peripheral blood, with gene expression measured using Illumina whole-genome Expression BeadChips (HT12v3, HT12v4 or H8v2 arrays). Their aim was to investigate the magnitude of the effect of cis and trans SNPs on gene expression, as well as to observe whether mapping eQTL in peripheral blood could uncover biological pathways associated with complex traits and disease. Further details of data acquisition and protocols are described in more detail in the two studies 21,22 .
We computed a total of 10,884 eQTL scores (N Gusev study = 3801; N Westra study = 7083) for individuals included in the imaging sample (N FA : 14,518; N MD : 14,485) from the SNPs found in GENOSCORES, using a p-value threshold of 1 × 10 −5 (0.00001). We then excluded overlapping eQTL scores between the two studies (i.e., scores for which SNPs affect expression of the same gene in both studies) by only including the score where a SNP had the lowest p-value, i.e., most significant association. The final eQTL score list was 6457 (N Gusev study = 3286; N Westra study = 3171). These scores were used as input variables in subsequent statistical analyses. Figure S1 in Supplementary Materials provides a summary of the score derivation process.
Briefly, eQTL scores were computed as a sum of the genotypes for an individual (g, scored as 0, 1, 2 copies of the reference allele) weighted by the effect size estimate (βt) for the trait of interest t. In order to adjust for LD, vector βt was premultiplied by the generalised inverse of the SNP-SNP correlation matrix R estimated from the 1000 Genomes reference panel, limited to the individuals with European ancestry.
The formula to compute the eQTL score for trait t for an individual (i) is therefore:

Magnetic resonance imaging (MRI) acquisition
In the current study, imaging-derived phenotypes (IDPs) produced by UKB were used. MRI acquisition and pre-processing procedures for white matter tracts were performed by UKB using standardised protocols (https:// biobank.ctsu.ox.ac.uk/crystal/docs/brain_mri.pdf). Briefly, images were acquired in Manchester (N FA = 12,248; N MD = 12,221) and Newcastle (N FA = 2270; N MD = 2264) on a standard Siemens Skyra 3 T scanner with a 32channel radio-frequency (RF) receive head coil and later pre-processed using the FMRIB Software Library (FSL), and parcellation of white matter tracts was conducted using AutoPtx 23 . Individual white matter tracts belonging to each tract category can be observed in Table S1 in Supplementary Materials.
Owing to the fact that head position and RF coil in the scanner may affect data quality and subsequent pre-processing, three scanner brain position variables were also generated by UKB, with the aim of being used as confounding variables in subsequent analyses. These are lateral brain position-X (http://biobank.ctsu.ox.ac.uk/ crystal/field.cgi?id = 25756), transverse brain position-Y (http://biobank.ctsu.ox.ac.uk/crystal/field.cgi?id = 25757) and longitudinal brain position-Z (http://biobank.ctsu. ox.ac.uk/crystal/field.cgi?id = 25758). The three variables were included as covariates in the statistical analysis described below.

Statistical methods
All analyses were conducted using R (version 3.2.3) in a Linux environment. The R code for the current analyses is available in Supplementary Materials, section 3. For each white matter tract, we used generalized linear mixed models (function "lme" in package "nlme") for bilateral brain regions, which were included as dependent variables. The eQTL scores were included as independent variables separately in each model, with additional covariates: age, age 2 , sex, fifteen genetic principal components to control for population stratification, three MRI head position coordinates, MRI site and genotype array, while hemisphere was included as a within-subject variable. For unilateral tracts, as well as global measures and white matter tract categories of FA and MD, also included in the models as dependent variables, we used a general linear model (function "lm"), using the same covariates as above, without hemisphere included as a separate term, and again including the eQTL scores as independent variables separately in each model.
For global measures and white matter tract categories of FA and MD, we applied principal component analysis (PCA) on the white matter tracts of interest (all 27 for global measures; 12 for association fibres; 6 for thalamic radiations; 9 for projection fibres) in order to extract a latent measure. Scores of the first unrotated component were extracted and set as dependent variables in general linear models. False discovery rate (FDR) correction using the "p.adjust" function in R (q < 0.05) was applied across the eQTL scores and the individual white matter tracts (N tests = 98,855), and separately across eQTL scores and global and tract categories (N tests = 25,828).
In addition, we included quadratic terms for each eQTL score in linear models to test non-linear relationships between the eQTL scores and global and regional measures of white matter microstructure. Analyses included all covariates mentioned above, with the only addition being the quadratic term of each eQTL score included alongside the original score. Results depicting the proportion of variance explained by the quadratic terms are included in Tables S2-S3 and Figs. S2-S3 in the Supplementary Materials.
Finally, for the top significant findings in the manuscript, we ran linear models where there were overlapping genes between the two discovery datasets 21,22 (i.e. eQTL scores that regulate the expression of the same gene in both Gusev et al. and Westra et al.), and report results from both datasets for comparison (Tables S4-S12 in Supplementary Materials).

Results
There were several eQTL scores that showed significant associations with a number of global measures, tract categories, and white matter tracts post FDR correction (q < 0.05); Table 1; Figs. 1 (a, b) and 2 (a, b); Tables S13-S20 in Supplementary Materials). In total, 25 scores were significantly associated with FA values (β absolute = 0.032-0.056) and 24 scores with MD values (β absolute = 0.029-0.048) in several tracts (see Tables  S21-S22 and Figs. S4-S5 in Supplementary Materials). Among these scores, 8 were associated with white matter tracts measured by both FA and MD. The primary findings reported in this manuscript focus on these 8 overlapping scores, as these were considered to provide the most consistent information with regards to gene expression within white matter tracts as measured by two different DTI scalars (see Tables 2 and 3), further findings are presented in Tables S23-S24 in Supplementary Materials. The Allen Brain Atlas 27 was used to investigate gene expression patterns across brain regions in 6 donors. Figs. S6-S8 and Table S25 in Supplementary Materials provide a summary of each donor's individual gene expression level as well as a mean gene expression level across participants in a number of brain structures.  39,40 The effect of the 8 scores on FA measures of white matter microstructure Table 2.
The effect of the 8 scores on MD measures of white matter microstructure Table 3.
Genome-wide associations between score SNPs and white matter tracts Using a previously published GWAS of imaging traits 28 , we next sought to observe the association between the SNPs comprising each of the 8 scores (N total = 53; SNP list can be found in Table S26 in Supplementary Materials) with those found previously for the white matter tracts of interest (i.e., the tracts which showed post-FDR significant associations). This SNP look-up was performed in order to observe whether our analysis of eQTL scores, comprising SNPs which together regulate the expression of a single gene, yielded any novel associations with white matter tracts which were not previously found in conventional GWAS.
We used the Brain Imaging Genetics (BIG) database (http://big.stats.ox.ac.uk/) to extract the effect size and pvalue of each SNP of interest as associated with the white matter tracts, as provided in Elliott et al. 28 . As GWAS for Fig. 1 a, b Indicates nominal p-values between each of the 8 scores (shown in legend entitled "eQTL score") and global and tract category measures (noted on the x-axis; FA = fractional anisotropy (Fig. 1a); MD = mean diffusivity (Fig. 1b)). All values in the figure met FDR correction. Two of the scores with an additional line around the points (CPNE1 and NMT1) had an effect size in the opposite direction to all other scores (also indicated by -β for MD in figure legend). The colours of the plot points indicate the score to which they belong. Magnitude of effect is shown in the legend entitled "Effect size (absolute value)". global and tract category measures were not performed in the original study 28 , we performed these GWAS as part of the current project (i.e., GWAS for global measures, association fibres, thalamic radiations and projection fibres). Our GWAS parameters and quality check procedures are described in more detail in Supplementary Materials, section 11. p-Values and effect size of each SNP for each individual white matter tract of interest (left and right hemispheres separately from Elliott et al. 28 ), as well as for global and tract categories (run locally), are also contained in Figure S9 in Supplementary Materials. Briefly, only one SNP across three eQTL scores was previously found to reach genome-wide significance with forceps minor (FA) 28 , projection fibres (FA) and global FA (GWAS run locally): rs2237077.

Discussion
In this study, we utilised a novel approach to investigate whether eQTL scores corresponding to the expression of specific genes in whole blood were significantly and specifically associated with white matter tracts in N > 14,000 individuals. We found significant associations in white matter microstructure as measured by both FA and MD for a number of scores (FA N scores = 25; MD N scores = 24). Of these, 8 scores were found to be significantly associated with various white matter tracts as measured by   both FA and MD. In particular, the largest effect was seen for the association between forceps minor (FA) and the eQTL score for SLC35A4, and across several tracts measured by MD for the eQTL score for DCAKD. Although these eQTL were derived from whole blood, there is evidence of expression in the brain for some of the genes, outlined in further detail below. These findings also provided novel information not previously found by conventional genome-wide association studies. All 8 scores were associated with white matter microstructural integrity of the forceps minor as measured by FA (7 of which were also associated with MD values). The forceps minor forms the anterior part of the corpus callosum, connecting homologous regions of the prefrontal cortex between hemispheres. It is postulated to be involved in numerous cognitive and behavioural skills, such as decision making, social behaviour, and language 29 . This connection therefore implicates forceps minor in a wide range of cognitive skills, and damage to the tract has been associated with neuropsychiatric and neurological disorders, such as multiple sclerosis and depression 30,31 .

Global and individual tract findings-largest associations
The two genes with the largest associations were DCAKD, globally and across numerous tracts as measured by higher MD, and SCL35A4 across tracts measured by lower FA, with a peak in projection fibres, localised to forceps minor. DCAKD is a protein coding gene which is ubiquitously expressed in brain, among other tissues 32 . Previous evidence using mouse models indicates expression of this gene has a putative role in neurodevelopment 33 , and is associated with a number of psychiatric and neurological disorders, including schizophrenia, autism spectrum disorder, and Parkinson's disease 32,34,35 . Evidence for involvement in autism spectrum disorder comes from Butler et al. 35 , who compiled a list of clinically relevant genes for the disorder, with DCAKD among the participating susceptibility genes. Expression of DCAKD was also found to be implicated in Parkinson's disease 32 , a disorder previously associated with lower white matter integrity in tracts within the temporal, parietal and occipital lobes 36 .
SLC35A4 belongs to the SLC35 family, members of which act as transporters of nucleotide sugars, and is known to be expressed in brain 37 . There is limited knowledge about its specific function, although a recent review investigating the subcellular localisation and topology of SLC35A4 demonstrated that it localizes mainly to the Golgi apparatus 37 .

Disease-linked genes-lower FA and higher MD (decreased white matter integrity)
Four genes (SRA1, UBE3C, SEC14L4, PLEKHM1) were associated with lower FA within several individual tracts pertaining to projection and association fibres, as well as with higher global MD. SRA1 encodes both non-coding and protein-coding RNAs, is implicated in the regulation of numerous nuclear receptor activities, such as metabolism and chromatin organisation, and is known to be expressed in the brain. Kotan et al. 38 posited that SRA1 plays a role in the initiation of puberty in humans by finding that inactivating SRA1 variants were associated with idiopathic hypogonadotropic hypogonadism (IHH) in three independent families. IHH is a rare genetic disorder caused by the inability of the hypothalamus to secrete gonadotropin-releasing hormones (GnRH) or by the inability of GnRH to act on pituitary gonadotropes 39 . These previous results might link the association of SRA1 with projection fibres, which connect the cerebral cortex to the spinal cord and brainstem, as well as to other centres of the brain (e.g., thalamus).
UBE3C contains ubiquitin-protein ligase (E3), an enzyme which accepts ubiquitin from E2 before transferring it to the target lysine; ubiquitin targets proteins for degradation via the proteasome. UBE3C is expressed in numerous tissues, including the brain, and has been previously associated with some neuropsychiatric-related phenotypes. For instance, Garriock et al. 40 performed a GWAS to determine the association between genetic variation and Citalopram response. Although not genome-wide significant, their top finding was a SNP in proximity to UBE3C and was found to be associated with antidepressant response and MDD remission (rs6966038,  41 studied the expression of genes within the ubiquitin-proteasome protein degradation system, which is implicated in Parkinson's disease, in mice with MPTPinduced pre-symptomatic and early symptomatic stages of Parkinson's disease. They found decreased expression in the striatum and the substantia nigra of mice, which may lead to a decrease in performance of the system. This may in turn lead to accumulation of abnormal and toxic proteins which guide neuronal cell death 41 .
The specific function of SEC14L4 has not yet been determined, although the protein encoded by it is similar to a protein encoded by the SEC14 gene in saccharomyces cerevisiae, which is essential to the biogenesis of Golgiderived transport vesicles. Curwin and McMaster 42 found that mutations in several SEC14 domain-containing proteins in humans may be implicated in neurodegeneration, although it is not clear what the role of SEC14L4 is within this context. Lastly, PLEKHM1 is important in bone resorption, may be involved in vesicular transport in the osteoclast, and is weakly expressed in the brain. Although mutations in this gene have been associated with numerous phenotypes 43,44 , none were neuropsychiatricrelated.

Development-linked genes-higher FA and lower MD (increased white matter integrity)
For two of the eight genes (CPNE1, NMT1) we found higher FA and lower MD, indicating increased white matter integrity, associated with increased expression level as quantified by the corresponding eQTL.
CPNE1, which is thought to regulate molecular events at the cell membrane and cytoplasm, has previously been found to mediate several neuronal differentiation processes by interacting with intracellular signalling molecules. CPNE1 has also been found to be highly expressed during brain development, indicating that it might be implicated in earlier developmental stages of neuronal function 45 . Furthermore, C2 domains of CPNE1, calciumdependent phospholipid-binding motives, have been shown to be implicated in neurite outgrowth of hippocampal progenitor HiB5 cells, which are hippocampal cell lines derived from the hippocampal analgen of E16 rat 46,47 . We provide evidence that CPNE1 expression is associated with two tracts within projection fibres (FA) and with regional association fibres (MD), which link the cortex to lower brain areas. In mouse and human models, these findings may be of use when investigating neurite outgrowth from the hippocampus, which is part of the limbic system, an area located beneath the cortex.
NMT1 (N-myristoyltransferase) catalyzes the transfer of myristate (a rare 14-carbon saturated fatty acid) from CoA to proteins, and is expressed in numerous tissues, including ubiquitously in the brain. It has been found that NMT1 is required for early mouse development, mainly due to its role in early embryogenesis 48 . Expression of this gene has also been implicated in human brain tumours 49 and tumour cell proliferation 50 . In our study, we found NMT1 to be associated with tracts within thalamic radiations and projection fibres (FA) and global MD.

General discussion
In our study, we employed a novel strategy of investigating a direct association between eQTL scores and white matter tracts to uncover a relationship between specific regulatory variants and brain connectivity. Together, our findings indicate that increases in expression of these genes may be implicated in several processes which may directly or indirectly alter white matter microstructure, each with localised, pronounced effects in specific tracts. Further, while some of the significant associations had connections with other brain-related traits, such as neurite outgrowth or psychiatric and neurological disorders, others did not. Interestingly, decreased white matter microstructure integrity, as marked by lower FA and higher MD, was associated with eQTL scores which regulate expression of genes implicated in neuropsychiatric and neurological disorders. Conversely, increased white matter integrity, as marked by higher FA and lower MD, was associated with CPNE1 and NMT1, which are important in developmental processes such as neurite outgrowth. In addition, encouragingly, regions of the corpus callosum (i.e., the forceps minor), the largest and arguably most reliably measured white matter tract in the brain, was demonstrated to be associated with all 8 scores for FA, and 7 for MD. These findings together suggest that utilising this approach to associate eQTL scores with white matter microstructure may add to previous research which found associations between genes and these brain-related traits and disorders. These genes or eQTL for them might indirectly implicate brain connectivity through other processes in which they participate.
The current study has several strengths and some potential limitations. First, to our knowledge, this study is the first one to compute eQTL scores for specific gene transcripts and attempt to associate them with white matter tracts. Moreover, our analysis consisted of a population-based sample of N > 14,000 individuals recruited to the UKB, large enough to make our findings robust and generalisable to other samples within the same age range, background and ethnicity. Lastly, our findings revealed novel associations which were not previously found in GWAS ( 28 ; GWAS of g measures run locally), indicating a potential to use such scores for further discovery analyses.
However, a potential limitation in this study is calculation of scores for data taken from whole blood.
Although we note previous evidence indicates that whole blood can be used as a proxy for brain eQTL, important for study of in vivo brain traits 14 .
In summary, our results suggest that expression of the genes discussed above alter white matter microstructure and could facilitate the manifestation of numerous brainrelated traits. Uncovering specific markers leading to the formation, maintenance and pathology of white matter could enable downstream analyses to elucidate links between genetics and neuroimaging in neurological and psychiatric disorders, as well as other brain-related traits.