Novel genetic variants associated with brain functional networks in 18,445 adults from the UK Biobank

Here, we investigated the genetics of weighted functional brain network graph theory measures from 18,445 participants of the UK Biobank (44–80 years). The eighteen measures studied showed low heritability (mean h2SNP = 0.12) and were highly genetically correlated. One genome-wide significant locus was associated with strength of somatomotor and limbic networks. These intergenic variants were located near the PAX8 gene on chromosome 2. Gene-based analyses identified five significantly associated genes for five of the network measures, which have been implicated in sleep duration, neuronal differentiation/development, cancer, and susceptibility to neurodegenerative diseases. Further analysis found that somatomotor network strength was phenotypically associated with sleep duration and insomnia. Single nucleotide polymorphism (SNP) and gene level associations with functional network measures were identified, which may help uncover novel biological pathways relevant to human brain functional network integrity and related disorders that affect it.

. Schematic representation of brain network construction using graph theory analysis. After preprocessing, the brain was divided into different parcels using the Schaefer et al. (2018) parcellation scheme. Subsequently, activity time course was extracted from each region to create the correlation matrix. We used the correlation matrix and removed all the self-connected and negative weights to derive a corresponding weighted undirected brain network matrix and functional brain network graph. Lastly, we used the network matrix to calculate the sets of topological graph theory measures. www.nature.com/scientificreports/ ures, except for strength of the visual network, were significantly heritable (p < 0.05), ranging from h 2 SNP = 0.07 for local efficiency of visual network to h 2 SNP = 0.17 for the strength of limbic network, with a mean h 2 SNP of 0.12 (Supplementary Table 3; Fig. 2A). Notably, higher h 2 SNP estimates (0.11-0.17) were observed for global efficiency, characteristic path length, transitivity, and strength of limbic, somatomotor, default, salience, and frontoparietal networks compared to the other measures. In addition, genetic and environmental correlations were examined. Strong genetic correlations between the network measures were observed. All measures were positively associated with each other with the exception of Louvain modularity and characteristic path length being negatively correlated with all other measures (Supplementary Table 4; Fig. 2B).
Genome-wide association study. GWAS of the rs-fMRI data for each individual graph theory measure (n = 18) were carried out using an additive genetic model adjusted for age, age 2 , sex, age × sex, age 2 × sex, head motion from resting-state fMRI, head position, volumetric scaling factor needed to normalize for head size, genotyping array and 10 genetic principal components.
At the genome-wide significance level of p < 5 × 10 -8 (unadjusted for the number of measures assessed), there were 31 SNPs significantly associated with nine of the 18 graph theory measures namely, global efficiency, characteristic path length, Louvain modularity, local efficiency of default and somatomotor networks, strength of default, limbic, salience, and somatomotor networks (Supplementary Table 5). Supplementary Fig. 2 shows the Manhattan and quantile-quantile plots of all the 18 graph theory measures. However, after adjusting for the number of independent tests using this method by Nyholt and colleagues 16 (n = 6 tests for this study; p-threshold = 5 × 10 -8 /6 = 8.33 × 10 -9 ), only fourteen variants from a single locus remained significant with strength of the somatomotor network (lead SNP: rs12616641), and one of which also was significant for strength of the limbic network (rs62158161) ( Table 1; Fig. 3A,B). All SNPs were located in an intergenic region near PAX8 (Paired box gene 8) on chromosome 2 (BP 114065390-114092549) (Fig. 3C).
The conditional and joint association (COJO) analysis using the GWAS summary data of the network measures did not identify any additional SNPs apart from the top associated within each locus (Supplementary Table 5). There is high linkage disequilibrium in the genomic region (114065390-114110568) on chromosome 2 ( Supplementary Fig. 3).

Multivariate association test.
Since all network measures were highly correlated, to increase power a multivariate analysis using GWAS summary statistics from multiple network measures was performed. Based on their network properties, pooled summary statistics as implemented in metaUSAT 17 were obtained for (i) global efficiency and characteristic path length; (ii) modularity and transitivity; (iii) local efficiency of all networks; and (iv) strength across all networks. Multivariate analysis of the combined strength network measures yielded 31 significant associations at p < 1.25 × 10 -8 (adjusted for four tests). Out of this, 23 were found in GWAS of somatomotor strength measure reported in supplementary table 5 and the remaining 8 SNPs were found for combined strength measure (Table 2). No additional hits were found in the multivariate analysis of the other three grouped measures.

Functional annotations.
We assessed the potential functions of the 39 significant SNPs from the GWAS (n = 31) and multivariate association tests (n = 8) in SNPnexus 19 . The results showed that many of the network associated variants were associated with other phenotypes including sleep patterns, psychiatric disorders, coronary artery disease, cholesterol, and blood pressure (Supplementary Tables 7 and 8). None of the variants were found to be eQTLs.
Gene expression enrichment analysis of the list of significant genes in our results was subsequently performed using the Functional Mapping and Annotation (FUMA) platform 20 . They were expressed in the brain but also across other tissue types ( Supplementary Fig. 4). Moreover, there was no enrichment in brain tissue types (Supplementary Table 9).
Genetic correlations with other traits. Given that the most robust GWAS result was observed for the strength of somatomotor network, we examined its genetic correlations with other traits via linkage disequilibrium (LD) score regression (LDSC) 21 (Supplementary Table 10). Results showed that strength of somatomotor network was genetically correlated with other traits, such as nervous feelings, sleep traits, neuroticism, depressive symptoms, blood pressure, education (unadjusted p ≤ 0.05). However, none of them passed multiple comparisons correction.
Correlations with other associated traits. As the SNPs in our study have been associated with sleep and insomnia in previous GWAS studies (Supplementary Table 7), we explored the phenotypic correlations between the graph theory and sleep-related measures in the UK Biobank data (Supplementary Tables 11 and  12). Self-reported sleep duration and insomnia were significantly associated with the graph theory measures. www.nature.com/scientificreports/  www.nature.com/scientificreports/ Strength of somatomotor network showed the most significant association with sleep duration (p = 8.33 × 10 -11 ) and insomnia (p = 0.0019).

Discussion
Functional graph theory measures reflect the underlying functional topography of the brain. To date, this is the first study investigating the genetics of weighted functional graph theory measures. We present h 2 SNP estimates and results from GWAS of graph theory measures using resting-state fMRI data from 18,445 UK Biobank participants. We identified significant SNPs and gene associations that survived multiple correction testing with six of the 18 graph theory measures including global efficiency, characteristic path length, and strength of default, dorsal attention, limbic, and somatomotor networks. The novel contributions of this paper are the identification of new genetic associations at the variant, locus, and gene levels, providing insights into the genetic architecture of graph theory metrics using resting-state fMRI data.
Similar to the study by Elliot and colleagues 10 who showed that resting-state fMRI connectivity edges had the lowest levels of h 2 SNP , we found low h 2 SNP across all the graph theory measures. This is in contrast to previous classical twin study design studies that have shown moderate to high heritability of 0.52 to 0.64 for global efficiency, 0.47 to 0.61 for mean clustering coefficient, and 0.38 to 0.59 for modularity 8 . One of the plausible explanations is that h 2 SNP typically provides smaller heritability estimates due to uncaptured rare genetic variants compared to those provided by the classic twin study design 10 . High genetic correlations observed between graph theory measures may be due to the high phenotypic correlations between the measures.
The GWAS results found that strength of limbic and somatomotor networks were associated with SNPs located in an intergenic region near PAX8 (Paired box gene 8) on chromosome 2, which is the closest gene to the top SNP. The PAX gene family encodes transcription factors which are essential during development and tissue homeostasis 22 . Specifically, PAX8 protein is considered as a master regulator for key cellular processes in DNA repair, replication, and metabolism 23 . It has also been shown to regulate several genes involved in the production of thyroid hormone 24 , essential for brain development and function such as neuronal differentiation, synaptogenesis, and dendritic proliferation 25,26 . A previous study has also linked reductions in intrinsic functional connectivity in the somatomotor network to participants with subclinical hypothyroidism compared to controls 27 . Interestingly, studies have also found subclinical thyroid dysfunction to be associated with sleep quality 28,29 . Other genes located in this intergenic region near PAX8 include CBWD2 and CHCHD5, which have also been associated with sleep duration in prior studies 30,31 . Considering that we also observed phenotypic correlations between sleep duration, insomnia and graph theory measures, it is possible that variants in or near PAX8 and other genes in this region may play a role in the regulation of genes associated with functional brain network properties and sleep.
Results from multivariate SNP-based analyses from combined network strength measures found associations with SNPs on chromosome 2, which were associated with inflammation and oncogenesis. The majority of these SNPs were located in intergenic regions close to non-coding RNA genes. LINC01826 (long intergenic non-protein coding RNA 1826) has been associated with inflammatory responses of the vascular endothelial cells, which Table 1. GWAS genome-wide significant results for brain functional network measures. Linear regression models were adjusted for age, age 2 , sex, age × sex, age 2 × sex, head motion from resting-state fMRI, head position, volumetric scaling factor needed to normalize for head size, and 10 genetic principal components. P values are two-tailed. Only results that survived Bonferroni correction (p < 5 × 10 -8 /6) were reported here. *rs12616641 was the lead SNP. Abbreviations: SNP, single nucleotide polymorphism; Chr, chromosome; A1, coded allele; A2, non-coded allele; EAF, effect allele frequency; β, beta; SE, standard error. www.nature.com/scientificreports/  www.nature.com/scientificreports/ are important in the development of cardio-cerebrovascular diseases 32 . MIR3679 (microRNA 3679) is a short non-coding RNA involved in post-transcriptional regulation of gene expression, which has been postulated to function as a tumor suppressor due to lower levels observed in patients with diffuse glioma than controls 33 . LINC01101 (long intergenic non-protein coding RNA 1101), on the other hand, has been down-regulated in cervical cancer 34 . The identified genes may contribute to understanding the relationship between strength of brain networks and disease. Gene-based association analysis showed associations with genes involved in neuronal differentiation/ development, cancer, and susceptibility to neurodegenerative diseases. ZEB1 has been implicated in neuronal glioblastoma 35 whereas SLC25A33 has been associated with insulin/insulin-like growth factor 1 (IGF-1) necessary for metabolism, cell growth, and survival 36 and showed higher expression in transformed fibroblasts and cancer cell lines compared to non-transformed cells 36,37 . TMEM201 mice, which undergo accelerated senescence, exhibited an early onset age-related decline in antibody response and have a higher rate of mortality 38 . ATXN2 belongs to a class of genes associated with microsatellite-expansion diseases, where an interrupted CAG repeat expansion has been associated with brain-related diseases including spinocerebellar ataxia type 2 (SCA2), frontotemporal lobar degeneration (FTLD), and amyotrophic lateral sclerosis (ALS) 39,40 . A neighbouring gene SH2B3 to ATXN2 has also been implicated in increased ALS risk 40 . Consistent with previous studies that identified functional graph theory measures that were associated with the most common FTLD (behavioral variant of frontotemporal dementia) 41,42 , we observed similar graph theory measures to be associated with the ATXN2 gene. This implies that ATXN2 may be involved in the relationship between the disruption of brain networks and neurological/neuromuscular disorders.
In addition, we also observed that brain functional networks are associated with self-reported sleep traits. Consistent with a previous study showing that individuals with chronic insomnia also showed disrupted global and local properties of the brain involving networks such as default mode, dorsal attention, and sensory-motor 43 , we found that insomnia was most significantly associated with decreased strength of somatomotor network.  www.nature.com/scientificreports/ Therefore, it seems that changes in sleep quality may have a profound impact on the brain functional networks. Given that we observed genes associated with these networks, it is possible that altered networks may be driving the sleep abnormality. The reverse is also possible, with sleep disturbance influencing the activity in the networks. The directionality of the relationship between brain networks and sleep traits may be determined using longitudinal data in future studies. The strengths of this study include the well characterized sample and its large sample size, and uniform MRI methods. The results, however, should be interpreted with caution. While using weighted undirected matrix circumvents issues surrounding filtering/thresholding the connectivity matrix to maintain significant edge weights represented in a binary matrix, there are inherent difficulties associated with the interpretation of the results. As brain signals recorded from resting-state fMRI are typically noisy, it is possible that edge weights may be affected by non-neural contributions 44 . Despite this, with careful denoising of the resting-state fMRI data 45,46 and covarying for motion, it is possible to minimize the noise in the data. In addition, previous studies have suggested that stronger edge weights make greater contributions in the computation of graph metrics than lower weight connections 47,48 . This implies that when evaluating weighted graphs, false positive connections based on lower correlations may have a less disruptive impact on the network topology 49 . Given that the brain is a complex system with hierarchical network structure, studying weighted networks, as was done in this study, may provide a more holistic representation of the brain functional network. Future studies may benefit from investigating the genetic effects between binarized and weighted graph theory metrics. Moreover, given the high correlations between the graph theory measures suggest that these measures may account for the same phenotype, or at least almost account for the same variability in the population, performing GWAS on each of these measures may not be necessary. However, these measures have shown to affect different disease states in older age. Moreover, this is an exploratory study and the findings presented are purely correlative. Despite so, the findings from the paper provide a new preliminary support for combining resting-state fMRI, graph theory methods, and GWAS to identify genetic variants associated with the various graph theory measures. It may also shed light to direct future studies in determining what graph theory imaging phenotypes should be included. In order to further understand the mechanisms about brain function at a more basic level, it may be useful for future studies to use developmental and adult human brain gene expression data to associate the expression of a single gene or genes with specific graph theory measures and phenotypes. Furthermore, given that the UK Biobank has recently released additional data, replication may be possible in the future.
In summary, this is the first study to investigate the genetics of weighted functional graph theory measures in a large and well characterized cohort. We observed multiple SNPs and genes associated with weighted graph theory measures, which have been observed to be implicated in sleep duration, neuronal differentiation/development, cancer, and susceptibility to neurodegenerative diseases. Our findings may help in the identification of novel biological pathways relevant to human brain functional network integrity and disease. Image processing. Briefly, the UK Biobank structural T1-weighted MRI scans were acquired on three 3 T Siemens Skyra MRI scanners (software platform VD13) at three sites (Reading, Newcastle, and Manchester) using a 32-channel receiving head coil and a 3D MPRAGE protocol (1.0 × 1.0 × 1.0 mm resolution, matrix 208 × 256 × 256, inversion time (TI)/repetition time (TR) = 880/2,000 ms, in-plane acceleration 2). An extensive overview of the data acquisition protocols and image processing carried out on behalf of the UK Biobank can be found elsewhere 51 . Rs-fMRI data previously pre-processed by the UK Biobank 51 were used. Briefly, motion correction, intensity normalization, highpass temporal filtering, echo-planar imaging (EPI) unwarping, and gradient distortion correction were performed. Subsequently, structured artefacts were removed by ICA + FIX processing (Independent component analysis followed by FMRIB's ICA-based X-noiseifier) [52][53][54] . We removed participants with motion of > 2 mm/degrees of translation/rotation. After image quality control and removal of participants with head motion outliers, we excluded 1,626 participants and 18,972 participants remained.

Study population.
Graph theory analyses. The Schaefer atlas 15 was used as it met the following requirements: (1) it integrated both local gradient and global similarity approaches (i.e. Markov Random Field model) for parcellation, which showed greater homogeneity than four other previously published parcellations implying that it will not overestimate local connectivity of the regions; (2) it revealed neurobiologically meaningful features of the brain organization; and (3) it was based of the Yeo 7-parcels atlas 55 and provided a more fine-grained parcellation, which allowed us to look at an average of the parcels across the 7 networks for local efficiency and nodal strength. 3dNetCorr command from Analysis of Functional Neuroimages (AFNI) 56 was used to produce network adjacency matrix for each participant. The mean time-series for each region was correlated with the mean timeseries for all other regions and extracted for each participant. Partial correlation, r, between all pairs of signals was computed to form a 400-by-400 (Schaefer atlas) connectivity matrix, which was then Fisher z-transformed. Self-connections and negative correlations were set to zero. The main network analysis was performed on positive weighted networks. Given that connection weights in brain networks can vary across magnitude, undirected www.nature.com/scientificreports/ weighted connectivity matrices were used instead of binary connectivity matrices. The higher the weight, the stronger the functional connectivity is between the brain regions 44 . All graph theory measures were quantified by using the brain connectivity toolbox (BCT) 2 . Global-level measure included global efficiency, characteristic path length, transitivity, and Louvain modularity. A single value is derived for each of these four measures and quantitatively represent the whole brain network. Network level measures, such as local efficiency and strength, were estimated for each node and averaged across the all parcels within each network. Subsequently, we averaged the left-right hemisphere to derive a value for each node and averaged within each network to derive a value for each of the 7 networks.
To assess integration of information, we calculated global efficiency and characteristic path length 2,57 . Global efficiency represents how effectively the information is transmitted at a global level and is the average inverse shortest path length while the latter measures the integrity of the network and how fast and easily information can flow within the network. To assess network segregation, which characterizes the specialized processing of the brain at a local level, we calculated the Louvain modularity, transitivity, and local efficiency indices 2,57 . Louvain modularity is a community detection method, which iteratively transforms the network into a set of communities, each consisting of a group of nodes. Higher modularity values indicate denser within-modular connections but sparer connections between nodes that are in different modules. Transitivity refers to the sum of all the clustering coefficients around each node in the network and is normalized collectively. Local efficiency is a node-specific measure and is defined relative to the sub-graph comprising of the immediate neighbours of a node. Finally, strength (weighted degree) is described as the sum of all neighbouring edge weights 2 . High connectivity strength indicates stronger connectivity between the regions, which provides an estimation of functional importance of each network.
Genotype data in UK Biobank. Genetic data for approximately 500,000 were available and full details on the genetics data used were described previously 58 . The samples were collected from stored blood samples in the UK Biobank and genotyped either using the UK Bileve or the UK biobank axiom array. Genotyping was performed on 33 batches of ~ 4700 samples by Affymetrix (High Wycombe, UK). Further details on the UK Biobank sample pre-processing are available here http:// bioba nk. ctsu. ox. ac. uk/ cryst al/ refer. cgi? id= 155583.
An imputed data set was made available where the UK Biobank interim release was imputed to a reference set that consisted of > 92 million autosomal variants imputed from the Haplotype Reference Consortium (HRC) 59 and UK10K + 1000 Genomes resources reference panels.. After SNP QC filters (MAF > 0.1% and imputation information score > 0.3), 9926107 SNPs were used in the GWAS analysis.
Further quality control measures were performed. Due to the confounds associated with population structure 60 , only samples reported to have recent British ancestry were used in the GWAS analysis. Outliers, including those with high missingness, relatedness, quality control failure, and sex mismatch, were removed. The final UK Biobank sample, after genotyping quality control and including those with rs-fMRI data, was n = 18,445 participants.
Sleep behavioral data in UK Biobank. Self-reported sleep data, namely sleep duration and frequency of insomnia, were used. Sleep duration was recorded as the number of reported hours of sleep in every 24 h and frequency of insomnia was recorded as never/rarely, sometimes, or usually. More details can be found on http:// bioba nk. ndph. ox. ac. uk/ showc ase/ field. cgi? id= 1160 and http:// bioba nk. ndph. ox. ac. uk/ showc ase/ field. cgi? id= 1200.
Potential confounds. In accordance with previous GWAS of brain imaging phenotypes in the UK Biobank 10 , we controlled for similar confounding variables in this study. In addition to age at scanning and sex (i.e. age, age 2 , age × sex, age 2 × sex), covariates relating to imaging parameters and genetic ancestry were also included. These included: head motion from resting-state fMRI, head position, volumetric scaling factor needed to normalize for head size, and the 10 genetic principal components. SNP heritability. Heritability analysis, which is defined as the proportion of observed phenotypic variance explained by additive genetic factors of all common autosomal variants 61 , estimates the relative contribution of genes and the environment on a phenotype. Using BOLT-REML 62 implemented in BOLT-LMM v2.3 63 , we estimated the heritability accounted by autosomal SNPs among the graph theory measures. BOLT-REML uses multiple component modelling to partition SNP heritability and applies Monte Carlo algorithm for variance component analysis.
Genome-wide association analysis. BOLT-LMM v2.3 63 was used to conduct GWAS for the graph theory measures in the UK Biobank sample and adjusted for potential confounds. To correct for multiple hypothesis testing, we estimated the number of independent tests used based on Nyholt et al. method 16 and derived n = 6 independent tests for this study. The genome-wide significant threshold was set at p < (5 × 10 -8 /6) = 8.33 × 10 -9 . Findings with unadjusted threshold of p < 5 × 10 -8 were also reported in Supplementary Table 5. Quantile and Manhattan plots were also presented for each of the graph theory measure in Supplementary Fig. 2. Manhattan and QQ plots were made using the R-package 64 . Locus Zoom 65 was used for the visualization of the nearest genes within a ± 500-kilobase genomic region for the strength of somatomotor and limbic networks based on the hg19 UCSC Genome Browser assembly (Fig. 3). www.nature.com/scientificreports/ Linkage disequilibrium and independent SNPs. The linkage disequilibrium plot of the associated genomic region was made using the LD plot function in the R package "gaston" 66 . To identify independent SNPs associated with each of the network measures a stepwise model selection method as implemented in the COJO 67 (cojo-slct) procedure of GCTA 68 package with default parameters was used.
Multivariate association analysis. Multivariate association analysis was conducted using metaUSAT 17 .
The method uses summary statistics from individual studies and is suitable for correlated traits. MetaUSAT derives strength from two methods of multivariate association tests (score test and multivariate analysis variance test) as it uses convex linear combination of two test statistics. Data-driven minimum p-value corresponding to the best-linear combination is obtained. The significant threshold was set as p < (5 × 10 -8 /4) = 1.25 × 10 -8 .
Gene-based association analysis and functional mapping. Gene-based association analysis was performed via MAGMA 18 (v1.07, https:// ctg. cncr. nl/ softw are/ magma/). MAGMA uses 1000G reference panel for calculation of LD between the SNPs and gene coordinates based on NCBI build 37. SNPs were mapped to a gene if they were within ± 5 kb of the gene co-ordinates. Gene-based association test statistic was derived using the default option, which is the sum of -log(SNP p-value). The Bonferroni correction was used for significance of the gene-based association tests (p-value of the gene-based tests / (number of independent tests × number of genes tested)). In other words, significant threshold was set as p < 0.05/(6 × 18,319) = 4.56 × 10 -7 .
Functional annotation. We performed lookups for variants that passed the suggestive GWAS threshold of p < 5 × 10 -8 to investigate the previously reported associations with the other traits. NHGRI-EBI GWAS Catalogue 69 included previous GWAS publications (Supplementary Table 7) and SNPnexus 19 included all other publications (Supplementary Table 8). FUMA 20 gene2func online platform (version 1.3.4, http:// fuma. ctglab. nl/) was used to explore the functional consequences of significant genes. All the genes in the GWAS analysis and the gene-based analysis after nominal significance were used as input (a total of 18 genes). Using the FUMA platform, the GTEx v7 30 general tissue types data set was used for tissue specificity analyses. Differentially expressed gene (DEG) sets were pre-calculated by performing two-tailed t-test for any one of the tissue type against all others. Expression values were normalized (zero-mean) following a log 2 transformation of expression values. Using the genes as background gene set, 2 × 2 enrichment sets were performed. Genes with p-value ≤ 0.05 were after Bonferroni correction were defined as differentially expressed and colored in red in Supplementary Fig. 3.
Test of associations with graph theory measures. The graph theory measures were normalized using ranked transformed using the rntransform() function in R from GeneABEL package 70 and age were z-transformed for regression analysis. Regression model similar to GWAS analysis was used to test the association of self-reported sleep duration and frequency of insomnia variables with the graph theory measures. Only results with a two-tailed p < 0.05/18 (number of graph theory measures) were considered significant. Data management, derivation of summary statistics and other statistical analyses, and correlation plots were performed using R (V 4.0.0) software 64 .