Cerebral small vessel disease genomics and its implications across the lifespan

White matter hyperintensities (WMH) are the most common brain-imaging feature of cerebral small vessel disease (SVD), hypertension being the main known risk factor. Here, we identify 27 genome-wide loci for WMH-volume in a cohort of 50,970 older individuals, accounting for modification/confounding by hypertension. Aggregated WMH risk variants were associated with altered white matter integrity (p = 2.5 × 10 −7) in brain images from 1,738 young healthy adults, providing insight into the lifetime impact of SVD genetic risk. Mendelian randomization suggested causal association of increasing WMH-volume with stroke, Alzheimer-type dementia, and of increasing blood pressure (BP) with larger WMH-volume, notably also in persons without clinical hypertension. Transcriptome-wide colocalization analyses showed association of WMH-volume with expression of 39 genes, of which four encode known drug targets. Finally, we provide insight into BP-independent biological pathways underlying SVD and suggest potential for genetic stratification of high-risk individuals and for genetically-informed prioritization of drug targets for prevention trials.

As a leading cause of stroke, cognitive decline, and dementia, cerebral small vessel disease (SVD) represents a major source of morbidity and mortality in aging populations [1][2][3] . Exploring the mechanisms of SVD and their contribution to dementia risk has recently been identified as a priority research area 4,5 , based on its more frequent recognition with magnetic resonance imaging (MRI), its high prevalence in older community persons 3,6 and the demonstration that intensive management of vascular risk factors, especially hypertension, may slow down its progression 7,8 . The biological underpinnings of SVD are poorly understood and no mechanism-based treatments are currently available. White matter hyperintensities of presumed vascular origin (WMH), the most common MRI-marker of SVD, can be measured quantitatively using automated software. They are highly heritable 9 , and confer an increased risk of stroke and dementia 3 , thus making them well-suited to identify potential genetic determinants of SVD and its contribution to stroke and dementia risk. WMH are most often covert, i.e., not associated with a history of clinical stroke. They are highly prevalent in the general population, and much more frequently observed than clinical stroke caused by SVD (which can be both ischemic [small vessel stroke] and hemorrhagic [deep intracerebral hemorrhage]) (Supplementary Fig. 1).
Studying the genomics of SVD also provides a powerful approach to discovery of underlying molecular mechanisms and targets in order to accelerate the development of future therapies, or identify drug repositioning opportunities [10][11][12] . Although genomic studies of WMH have been most fruitful for deciphering SVD risk variants compared with other MRI-features of SVD (lacunes, cerebral microbleeds, dilated perivascular spaces) 13 or small vessel stroke 14 , or deep intracerebral hemorrhage 15 , few risk loci have been identified to date [16][17][18] . This is likely due to limited sample size of populations studied and possibly also the failure to take into account the role of hypertension (HTN), the strongest known risk factor for WMH, in confounding or modifying genetic associations. There is also mounting evidence suggesting that early-life factors play a crucial role in the occurrence of late-life vascular and neurological conditions, including SVD 19 , likely due to both genetic and environmental factors that may intrinsically influence the vascular substrate of SVD or modulate the brain's resilience to SVD [20][21][22] . Identifying these early predictors could have major implications for our understanding of disease mechanisms across the lifespan and for devising effective prevention strategies.
Here, we conduct a large multiancestry meta-analysis of WMH-volume genome-wide association studies (GWAS), accounting for HTN as a potential confounder and effect modifier. We explore association of WMH risk loci with early changes in white matter microstructure on MRI using diffusion tensor imaging (DTI) in young adults in their early twenties. Last, we explore biological pathways underlying the observed genetic associations with SVD and their clinical significance through shared genetic variation and Mendelian randomization (MR) experiments with vascular risk factors and neurological traits, linking them with multiple epigenomic, transcriptomic, and drug-target databases.

Results
Genetic discovery from association analyses. Figure 1 summarizes the overall workflow of our study, which included data from 50,970 participants (N = 48,454 Europeans and 2516 African-Americans) from population-based studies taking part in the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) 23 consortium and from the UK Biobank (Supplementary Data 1). The mean age of participants was 66.0 ± 7.5 years, 53% were women and 52% hypertensive (Methods, Supplementary Methods 1, and Supplementary Data 1 for cohort-specific characteristics). There was no evidence for systematic inflation of SNP-WMH association statistics at the individual cohort or meta-analysis level (Supplementary Data 3 and Supplementary Fig. 2) for the three types of analyses performed (Methods).
In the European-only SNP-main-effects analysis, 22 independent loci harbored common variants associated with WMH volume at genome-wide significance (P < 5 × 10 −8 , Table 1, Fig. 2); lead SNPs for each independent locus were confirmed by both LD clumping and GCTA-COJO 24 . Additionally, the NID2 locus reached genome-wide significance by the joint effect of multiple SNPs (P = 4.87 × 10 −8 , Supplementary Data 4), with P = 5.45 × 10 −8 for lead SNP rs72680374, using GCTA-COJO (Methods, Supplementary Fig. 3). The African-American-only analysis identified a genome-wide significant locus at ECHDC3 (Supplementary Data 4). For loci showing heterogeneity in allelic effects across ancestry groups (PHet < 0.01), using MR-MEGA 25 the ECHDC3 locus and another locus near KCNK2 reached genome-wide significance (Table 1, Supplementary Data 5). In the HTN-adjusted model, two additional loci were associated with WMH volume at P < 5 × 10 −8 (PKN2 and XKR6), while three loci were no longer genome-wide significant (Table 1). The 2-df genome-wide gene-HTN interaction joint meta-analysis (JMA) did not identify any additional locus (Table 1, Supplementary Data 6). Five loci reached genome-wide significance in the small African-American-only JMA, but these were not maintained in the fixed-effects multiancestry JMA (Supplementary Data 7).
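As an illustration of the 2-df joint test mentioned above, the following minimal sketch computes a joint Wald statistic for a SNP main effect and a SNP×HTN interaction effect. It is not the software used in the study: the function name `joint_2df_test` and all numerical inputs are hypothetical, and the sketch assumes that the two effect estimates and their 2×2 covariance matrix are already available. Under the null hypothesis the statistic follows a chi-square distribution with 2 degrees of freedom, whose upper tail probability is exp(−x/2).

```python
import math

GENOME_WIDE_P = 5e-8  # genome-wide significance threshold quoted in the text


def joint_2df_test(b_main, b_int, var_main, var_int, cov):
    """Wald test of H0: SNP main effect = SNP x HTN interaction effect = 0 (2 df)."""
    det = var_main * var_int - cov ** 2
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Quadratic form b' V^-1 b, written out for a 2x2 covariance matrix V
    stat = (b_main ** 2 * var_int
            - 2 * b_main * b_int * cov
            + b_int ** 2 * var_main) / det
    p_value = math.exp(-stat / 2)  # chi-square survival function for 2 df
    return stat, p_value


# Hypothetical effect estimates, not results from the study
stat, p = joint_2df_test(b_main=0.06, b_int=0.02,
                         var_main=1e-4, var_int=4e-4, cov=-5e-5)
print(f"chi2(2 df) = {stat:.2f}, P = {p:.2e}, "
      f"genome-wide significant: {p < GENOME_WIDE_P}")
```

When the covariance term is zero, the statistic reduces to the sum of the two squared z-scores, which is a convenient sanity check.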
In total, 27 loci reached genome-wide significance in association with WMH volume in at least one of the aforementioned analyses, of which 18 have not previously been reported (Table 1, Fig. 2). Associations with WMH volume at these loci were similar in participants with and without HTN and when stratifying on quartiles of genetically predicted SBP and DBP levels (Methods, Supplementary . In aggregate, however, a weighted genetic risk score of independent WMH risk loci (WMH wGRS, Methods) showed significant 1-df interaction with HTN in association with WMH volume (β GRSxHTN = 0.15, P GRSxHTN = 0.009, Supplementary Fig. 4). One previously described risk locus for WMH burden did not reach genome-wide significance in the current analysis (near PMF1, P = 3.9 × 10 −4 ). Of note, one genome-wide significant locus (COL4A2) and one suggestive locus (HTRA1, P < 5 × 10 −6 , Supplementary Data 11) involve genes implicated in monogenic forms of SVD 26,27 .
Additionally, gene-based tests using MAGMA 28 yielded 49 gene-wide significant associations (P < 2.8 × 10 −6 ), of which 13 were outside GWAS loci, including the APOE gene (Methods, Supplementary Data 12). Using the Heritability Estimator from Summary Statistics (HESS) 29 we found that 29 ± 2% of WMH-volume variance is explained by common and low-frequency variants across the genome, the amount of heritability attributable to loci containing GWAS index SNPs being 2.4 ± 0.1%.
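The idea of a weighted genetic risk score can be sketched as a weighted sum of risk-allele dosages. The exact construction of the WMH wGRS is described in the Methods and is not reproduced here; the function name `weighted_grs`, the SNP labels, and the weights below are hypothetical placeholders chosen only to show the arithmetic.

```python
def weighted_grs(dosages, weights):
    """Sum of risk-allele dosages (0 to 2) weighted by per-allele effect size."""
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"missing dosages for: {sorted(missing)}")
    return sum(weights[snp] * dosages[snp] for snp in weights)


# Hypothetical per-allele effects on WMH volume (not values from the study)
weights = {"snp_a": 0.05, "snp_b": 0.03, "snp_c": 0.08}
# Imputed dosages may be fractional
participant = {"snp_a": 2, "snp_b": 1, "snp_c": 0.4}

score = weighted_grs(participant, weights)
print(f"wGRS = {score:.3f}")
```

An interaction such as the one reported between the wGRS and HTN would then be tested by adding a score × HTN product term to the regression model of WMH volume.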
Implications of WMH genes across the lifespan. To examine the lifetime impact of WMH risk variants on brain structure, we explored the association of the WMH wGRS with MRI-markers of white matter microstructure in 1738 young healthy adults participating in the i-Share cohort (mean age 22.1 ± 2.3 years, 72% women). Integrity of the white matter microstructure was measured on diffusion tensor imaging (DTI) using the following metrics: fractional anisotropy (FA), mean diffusivity (MD), radial diffusivity (RD), axial diffusivity (AxD), and the recently described peak width of skeletonized mean diffusivity (PSMD) 30 . These MRI-markers are associated with the maturation and aging process of white matter microstructure 31,32 , and alterations in DTI metrics were shown to precede the occurrence of WMH lesions in older patients with SVD 30,33 . The WMH wGRS showed significant associations with higher MD, RD, and PSMD and lower FA values in i-Share; four WMH risk loci individually showed significant associations with at least one of the DTI parameters (SH3PXD2A, NMT1, KLHL24, and VCAN, Table 2). Increasing values of PSMD (but not other DTI markers) showed a trend towards association with slower information processing speed on the Stroop test in i-Share participants (N = 1,401, effect estimate ± SE: 0.085 ± 0.040, P = 0.031), which did not survive correction for multiple testing (for three independent DTI markers). The WMH wGRS was not associated with the Stroop test in i-Share but showed a trend towards an association with poorer episodic memory performance in older community persons (N = 24,597, effect estimate ± SE: -0.19 ± 0.11, P = 0.08) 34 .
We also examined whether genetically predicted larger WMH volume was associated with increased risk of stroke and Alzheimer-type dementia, the most common age-related neurological diseases, and with lower cognitive performance in older age, using previously reported GWAS data (Supplementary Table 2). Several genome-wide significant WMH risk loci showed significant association with ischemic stroke (three loci), all stroke and small vessel stroke (two loci each), cardioembolic stroke, deep intracerebral hemorrhage, and Alzheimer-type dementia (one locus each) (Supplementary Data 13). Using linkage disequilibrium score regression (LDSR) 35 , we observed significant genetic correlation of WMH volume with all stroke, ischemic stroke, small vessel stroke, and lower general cognitive function, after Bonferroni correction for multiple testing (P < 3.6 × 10 −3 , Methods, Fig. 3, Supplementary Data 14). Using the Bayesian pairwise GWAS (GWAS-PW) approach 36 , significant regional overlap (posterior probability of association for model 3, PPA3 ≥ 0.90, Methods) was observed between WMH volume and general cognitive function and between WMH volume and stroke, especially ischemic and small vessel stroke (Supplementary Data 15). This included regions previously implicated in complex and monogenic forms of stroke (FOXF2/FOXQ1 37,38 , HTRA1 27,37 ) and cardiovascular disease (NOS3) 39 .
Using two-sample MR, which implements the inverse-variance weighting (IVW) method, we observed evidence for significant causal associations after Bonferroni correction for multiple testing (P < 3.6 × 10 −3 ) between WMH volume and increased risk of Alzheimer-type dementia, with no statistical evidence of horizontal pleiotropy using Cochran's Q statistic (Q-PHet ≥ 0.01, Fig. 4, Supplementary Data 16). We also observed evidence for significant causal association of WMH volume with risk of any stroke, ischemic stroke, small vessel stroke, and deep intracerebral hemorrhage. There was some evidence for horizontal pleiotropy Table 2).

Fig. 1 Study workflow and rationale. Ϯ number of GW hits. MRI magnetic resonance imaging, CHARGE cohorts for heart and aging research in genomic epidemiology, EUR European, AFR African-American, GWAS genome-wide association study, WMH white matter hyperintensities, SNP single nucleotide polymorphism, HTN hypertension, JMA joint meta-analysis, MR-MEGA meta-regression of multi-ethnic genetic association, GW genome-wide, LD linkage disequilibrium, GCTA-COJO genome-wide complex trait analysis-conditional and joint analysis, MAGMA multi-marker analysis of genomic annotation, DTI diffusion tensor imaging, iSHARE internet based student health research enterprise, FA fractional anisotropy, MD mean diffusivity, RD radial diffusivity, AxD axial diffusivity, PSMD peak width of skeletonised mean diffusivity, wGRS weighted genetic risk score, GEC genetic type I error calculator, LDSR LDscore regression, GWAS-PW GWAS-pairwise analysis, HESS heritability estimator from summary statistics, EPIGWAS epigenome wide association study, TWAS transcriptome-wide association study, GTEx genotype-tissue expression, ROSMAP religious orders study and the RUSH memory and aging project, CMC common mind consortium, eQTL expression quantitative trait loci, eGene expression-associated genes, COLOC colocalisation, GREP genome for repositioning drugs.
First, we looked up associations of the 27 WMH risk loci with related vascular traits, including the lead WMH risk variants and nearby variants (±250 kb) in moderate to high LD (r² > 0.5). After correcting for the number of independent loci and traits tested (P < 1.3 × 10 −4 , Methods), 20 of the 27 WMH risk loci (74%) showed significant association with at least one other trait and/or vascular risk factors. For 13 of these, associations were found at a genome-wide significant level (Fig. 2, Supplementary Data 13). Blood pressure (BP) traits showed by far the largest number of  , none of which were reported previously as WMH risk loci, appear not to be associated with any of the vascular traits explored, suggesting other underlying biological pathways. Second, we explored the genome-wide and regional genetic overlap between WMH volume and related vascular traits. Mean χ² ranged between 1.06 and 3.99, suggesting strong polygenicity for all investigated traits. The impact of possible sample overlap was estimated to be negligible using LDSR 35 (Supplementary Data 14). We observed significant (P < 3.6 × 10 −3 ) genetic correlation of larger WMH volume with higher SBP, DBP, SMKindex, BMI and increased risk of VTE. Using GWAS-PW 36 and HESS 41 (Methods), we identified 16 genomic regions harboring shared genetic risk variants with at least one other vascular trait, predominantly BP traits, but also BMI, lipid levels and SMKindex (PPA3 ≥ 0.90, Fig. 3, Supplementary Data 15).
Third, we explored the causal relations between the aforementioned vascular traits and WMH volume using two-sample MR 42 (RadialMR 40 , Methods), implementing the IVW method. We observed significant (P < 3.6 × 10 −3 ) association of genetically predicted SBP, DBP, PP, SMKindex and T2D with larger WMH volume and of genetically predicted migraine with smaller WMH volume (Fig. 4, Supplementary Data 17). After removal of potentially pleiotropic outlier variants, the MR-Egger intercept was nonsignificant for SBP, DBP, PP and SMKindex, indicating no residual pleiotropy and suggesting causal association with WMH volume (Methods). For migraine and T2D, in contrast, there was evidence of residual pleiotropic effects (significant MR-Egger intercept, Supplementary Data 17) after removal of potentially pleiotropic outlier variants, and the association became only nominally significant for migraine. Importantly, associations of genetically predicted SBP and DBP with WMH volume remained significant after adjustment for HTN, and in participants with and without HTN (Supplementary Data 17), highlighting that higher levels of BP are likely causally associated with larger WMH volume even below BP thresholds typically used for the definition of hypertension (SBP ≥ 140 mmHg or DBP ≥ 90 mmHg or antihypertensive drug intake) 43 .
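The quantities referred to in the MR analyses (IVW estimate, Cochran's Q, and MR-Egger intercept) can be illustrated with a minimal sketch working from per-variant summary statistics. This is a simplified fixed-effect version with first-order weights, written for illustration only; it is not the RadialMR implementation used in the study, the function names are hypothetical, and the input values are invented so that every variant has the same ratio estimate.

```python
import math


def ivw(bx, by, se_y):
    """Fixed-effect IVW estimate from exposure (bx) and outcome (by) effects."""
    ratios = [y / x for x, y in zip(bx, by)]            # per-variant Wald ratios
    weights = [(x / s) ** 2 for x, s in zip(bx, se_y)]  # first-order weights
    total = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total
    se = math.sqrt(1 / total)
    # Cochran's Q (n - 1 df): heterogeneity of ratio estimates around beta
    q = sum(w * (r - beta) ** 2 for w, r in zip(weights, ratios))
    return beta, se, q


def egger_intercept(bx, by, se_y):
    """Intercept of the weighted regression of by on bx (oriented so bx >= 0)."""
    signs = [1 if x >= 0 else -1 for x in bx]
    x = [s * v for s, v in zip(signs, bx)]
    y = [s * v for s, v in zip(signs, by)]
    w = [1 / s ** 2 for s in se_y]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return ybar - (sxy / sxx) * xbar


# Hypothetical summary statistics: every variant has ratio by/bx = 0.2
bx = [0.10, 0.20, 0.15, 0.30]
by = [0.02, 0.04, 0.03, 0.06]
se_y = [0.01, 0.01, 0.01, 0.01]

beta, se, q = ivw(bx, by, se_y)
intercept = egger_intercept(bx, by, se_y)
print(f"IVW beta = {beta:.3f} (SE {se:.3f}), Q = {q:.3g}, "
      f"Egger intercept = {intercept:.3g}")
```

A large Q relative to its degrees of freedom, or an intercept clearly different from zero, would point to heterogeneity or directional pleiotropy, which is the reasoning behind the outlier filtering described above.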
Biological interpretation of association signals. We used EPIGWAS 44 to test for cell-type enrichment of WMH association signals using chromatin marks previously shown to be cell-type specific and associated with active gene-regulation (Methods). WMH risk loci were significantly enriched in enhancer (H3K4me1) and promoter sites (H3K4me3) in cell-types derived from the brain, neurosphere (developing brain), vascular tissue, digestive, epithelial and muscle tissues, as well as human embryonic stem cells after removing WMH risk loci associated with BP (Supplementary Data 19). Analysis of brain-specific single-cell expression data in mice using MAGMA.celltyping 45 (Methods) revealed significant enrichment of highly cell-type-specific genes in endothelial mural cells and nominally significant enrichment for vascular leptomeningeal cells, oligodendrocytes, oligodendrocyte precursors, and ependymal astrocytes; results were substantially unchanged after removing WMH risk loci associated with BP (Supplementary Data 20).
To functionally characterize and prioritize individual WMH genomic risk loci we performed transcriptome-wide association studies (TWAS) using TWAS-Fusion 46 , WMH-SNP association statistics from the main effects (EUR-only) and weights from 23 gene-expression reference panels from blood, arterial, and brain tissues (Supplementary Methods 2). We also included nonpublicly available gene-expression weights from the dorsolateral prefrontal cortex (DLPFC) of 494 older community-dwelling participants (Methods) 47,48 . TWAS-Fusion identified 201 transcriptome-wide significant associations with WMH, conditionally significant on the predicted expression of a TWAS-associated gene, including 21 with splicing quantitative trait loci (sQTLs) regulating highly tissue-specific gene isoforms in DLPFC (Fig. 5, Supplementary Data 21). To rule out that observed associations reflect the random overlap between expression (eQTLs) and noncausal WMH risk variants, a colocalization analysis (COLOC) 49 was performed at each significant locus, to estimate the posterior probability of a shared causal variant (PP4 ≥ 75%) between the gene expression and trait association (Methods). Colocalization was observed for 96 TWAS significant eQTLs (48%, Fig. 5): of these, 54 mapped to 8 WMH genome-wide risk loci and 16 expression-associated genes (eGenes), while 42 mapped to 12 distinct loci that were not genome-wide significant in the WMH GWAS and 23 eGenes. These additional putative WMH risk loci require confirmation in follow-up studies. Leveraging histone regulatory mark information from blood, arterial, and brain tissues (Methods, Supplementary Data 21), we observed that the majority (89%) of TWAS signals overlapped with enhancer and/or promoter elements, including eQTLs exhibiting weaker colocalization probability (PP4 < 75%).
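To make the PP4 criterion concrete, the sketch below combines per-SNP approximate Bayes factors for two traits into posterior probabilities for the five colocalization hypotheses, following the general scheme of Bayesian colocalization. It is a simplified illustration, not the COLOC software: the function name is hypothetical, the Bayes factors are invented, the prior values `p1`, `p2`, and `p12` are assumptions (the priors used in the study are not stated in this section), and production implementations work on the log scale for numerical stability.

```python
def coloc_posteriors(abf_trait1, abf_trait2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of hypotheses H0 to H4 from per-SNP Bayes factors."""
    s1 = sum(abf_trait1)
    s2 = sum(abf_trait2)
    s12 = sum(a * b for a, b in zip(abf_trait1, abf_trait2))
    unnormalised = [
        1.0,                         # H0: no association with either trait
        p1 * s1,                     # H1: association with trait 1 only
        p2 * s2,                     # H2: association with trait 2 only
        p1 * p2 * (s1 * s2 - s12),   # H3: two distinct causal variants
        p12 * s12,                   # H4: one shared causal variant
    ]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Hypothetical Bayes factors for four SNPs in one region
shared = coloc_posteriors([1.0, 2.0, 5e6, 3.0], [1.5, 1.0, 8e6, 2.0])
distinct = coloc_posteriors([5e6, 1.0], [1.0, 8e6])

print(f"same top SNP:      PP4 = {shared[4]:.3f}")
print(f"different top SNP: PP3 = {distinct[3]:.3f}, PP4 = {distinct[4]:.3f}")
print("colocalized at PP4 >= 0.75:", shared[4] >= 0.75)
```

When both traits peak at the same SNP, the product term dominates and PP4 approaches 1; when the peaks fall on different SNPs, the evidence shifts to H3, which is the situation the colocalization filter is designed to exclude.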
Larger WMH volume was associated with either upregulated or downregulated gene expression, the directionality being mostly consistent across broad tissue categories (Fig. 5). We found evidence for colocalization of WMH risk variants with eQTLs in brain tissues (28 eGenes

Fig. 3 Shared genetic architecture of WMH at genome-wide and regional level. Color coded for the direction of effect (green: positive genetic correlation; red: negative genetic correlation). The LD-score regression (LDSR) axis shows evidence for genome-wide correlations (after Bonferroni correction for multiple testing P < 3.6 × 10 −3 , Methods), with the size of the nodes corresponding to the level of significance of the association. The GWAS-pairwise (PW) axis shows evidence for regional level overlap of association signals between WMH burden and related vascular and neurological traits (PPA3 ≥ 0.90, Methods). For any given region, the nearest gene (in brackets) to the top SNP associated with WMH is shown. Bivariate heritability estimator from summary statistics (ρ-HESS) was used to infer directionality of shared association signals (Methods) and asterisks denote an unexpected directionality of association.

Discussion
This largest genetic study to date on complex SVD 13,14,[16][17][18] , leveraging genetic and brain-imaging information from 50,970 older community persons, triples the number of genetic loci associated with cerebral SVD and shows that this genetic risk results in detectable brain changes among asymptomatic young adults in their twenties. We further demonstrate the importance of higher BP as a risk factor for WMH even below clinical thresholds for HTN. MR analysis provides strong evidence for causal links of genetically determined WMH volume with risk of ischemic stroke, intracerebral haemorrhage, and Alzheimer-type dementia in later life. Importantly, we also provide insight into molecular pathways underlying SVD, highlighting relevant tissue and cell types, and suggest potential for genetic stratification of high-risk individuals and for genetically informed prioritization of drug targets for prevention trials.
Our approach focusing on the most common brain-imaging feature of SVD appears to be more powerful than GWAS of the small vessel stroke subtype to identify risk loci for SVD. Indeed, no new small vessel stroke risk locus was identified in MEGA-STROKE, the largest stroke GWAS meta-analysis to date 14 . We show a strong association between genetically determined WMH burden and risk of stroke in the general population, notably both risk of ischemic stroke and of intracerebral hemorrhage. While corroborating epidemiological observations 3,50 , this has never been demonstrated using genetic instrumental variables, providing evidence for causality. This prompts greater caution with the common empirical prescription of antiplatelet therapy in persons with extensive WMH in the absence of clinical stroke 3 , given the potential detrimental effects on intracerebral hemorrhage risk, and suggests the need for randomized clinical trials to determine the risk/benefit ratio of antiplatelet therapy in this setting.
The significant association we describe between genetically determined WMH burden and Alzheimer-type dementia also has potential important implications for prevention. It strengthens recent epidemiological evidence that WMH is associated not only with an increased risk of all and vascular dementia, but also of neurodegenerative Alzheimer-type dementia 3,51 , providing for the first time evidence for causality using the WMH wGRS as an instrumental variable. Because of the proven ability to treat vascular risk factors, understanding and targeting the biological mechanisms of the vascular contribution to cognitive impairment and dementia, and specifically how cerebral SVD contributes to the molecular pathology of Alzheimer disease, are areas of intense research and clinical interest 52 , in the absence of other efficient therapies. Our results suggest that WMH should be considered a major target for preventative interventions, to mitigate not only the risk of stroke and vascular cognitive impairment but also of Alzheimer-type dementia, and support the rationale of innovative trials using WMH progression as a surrogate or intermediate endpoint for cognitive decline and dementia.
Over half of identified WMH risk loci are associated with higher BP levels. Moreover, using MR we provide evidence for a causal association between higher BP and larger WMH volume, notably even in participants without clinically defined HTN at the time of the MRI. Indeed, associations of genetically predicted SBP and DBP with WMH volume remained significant in participants without HTN, highlighting that higher levels of BP are likely causally associated with larger WMH volume even below BP thresholds typically used for the definition of HTN 43 . Considering the recent conclusions from the SPRINT-MIND trial suggesting that more drastic lowering of BP in persons with HTN is associated with slower progression of WMH volume and a lower risk of developing the combined outcome of mild cognitive impairment or dementia 53,54 , our results suggest that trials to test a similar impact of intensive BP lowering in high-risk individuals who do not meet the current clinical thresholds for HTN could be warranted.

Fig. 4 Mendelian randomization results of vascular risk factors with WMH burden (box A) and WMH burden with neurological traits (box B). Point estimates and confidence intervals (blue) from the inverse-variance weighted (IVW) method are shown along with the point estimates and 95% confidence interval (black) from sensitivity analyses after filtering out potentially pleiotropic outlier variants. The intercept and p-value from the MR-Egger method are displayed on the far right (an intercept term significantly differing from zero at the conservative threshold of P < 0.05 suggests the presence of directional pleiotropy).
We additionally show strong causal association between increased exposure to cigarette smoking over the lifetime (lifetime smoking index) and increased WMH burden, as has recently been described in relation with stroke risk 55 , providing some additional evidence for the relevance of smoking cessation to prevent vascular brain injury and specifically SVD.
Importantly, a quarter of the identified WMH risk loci reflect molecular mechanisms that are not mediated by BP or other known vascular risk factors. Two of these (NID2, VCAN), along with the COL4A2 and EFEMP1 loci, implicate genes encoding matrisome proteins 56 , involved in cell membrane structure and representing core components of the extracellular matrix (ECM). Converging evidence from experimental models for monogenic SVD suggests that perturbations of the matrisome play a central role in disease pathophysiology 57 . Our findings suggest that these could also be relevant for sporadic forms of SVD. NID2 encodes nidogen, an ECM glycoprotein and a major component of basement membranes, and is recognized as having a role in poststroke angiogenesis 58,59 . Mutations in COL4A1/2, which encode collagen, another basement membrane component, are already known to cause monogenic SVD 26 . VCAN, which we also found to be associated with white matter microstructure in young adults, encodes versican, a proteoglycan involved in cell adhesion and ECM assembly 60 . In CARASIL, a monogenic SVD caused by HTRA1 mutations, accumulation of versican in the thickened arterial wall was observed 27 . Versican also can form complexes that inhibit oligodendrocyte maturation and remyelination 61 . EFEMP1 encodes fibulin 3, an ECM glycoprotein localised in the basement membrane, and a proteolytic target of serine protease HTRA1 62 . Other previously unreported WMH risk loci that we have identified include KCNK2, which encodes Twik-related K+ channel (TREK1), a 2-pore-domain background ATP-sensitive potassium channel expressed throughout the central nervous system, more prominently in fetal than in adult brain; ECHDC3, near a distinct locus (r² < 0.01) previously implicated in Alzheimer disease 63 ; MN1, which has previously been causally related to familial meningiomas 64 ; and XKR6, which has been associated with risk of systemic lupus erythematosus 65 .
Our results provide important insight into the lifetime impact of genetic risk for SVD. Indeed, WMH risk variants observed in older adults were already associated with changes in DTI markers of white matter integrity in young adults in their early twenties. Of these, PSMD, a DTI metric recently described to be more strongly correlated with cognitive performance in older persons (patients with sporadic or monogenic SVD and older community persons) than any other MRI-marker of SVD 30 , was already showing nominal association with lower cognitive performance in young adults. This finding requires confirmation in future independent samples.

Fig. 5 Only genes showing significant colocalization between the eQTL and the WMH risk variant in at least one tissue are shown. Susceptibility genes are depicted on the x-axis (blue: known; violet: novel), with tissue types of gene-expression datasets on the y-axis (orange: brain or peripheral nerve tissue; green: arterial/heart; pink: blood). Blue boxes correspond to WMH risk alleles being associated with upregulation (+) of gene expression in the corresponding tissues, while red boxes correspond to WMH risk alleles being associated with downregulation (−) of gene expression (color intensity corresponds to the magnitude of gene-expression effect size). Only significant TWAS associations at P < 1.1 × 10 −5 are shown. Asterisks denote loci harboring a common causal variant associated with WMH and gene expression with high posterior probability using colocalization analyses (Methods; PP4 ≥ 0.75). ROSMAP religious order study and rush memory and aging project, DLPFC dorsolateral prefrontal cortex, CMC common mind consortium, BA Brodmann area, YFS young Finns study, NTR Netherlands twins register.
The association of the WMH wGRS with subtle changes in white matter microstructure in young adults, if confirmed in independent samples, has potential important implications for the timing and paradigm of prediction and prevention of SVD progression and complications. It could reflect that biological pathways contributing to WMH at an older age already have a significant impact on brain microstructure in young adults, possibly reflecting a very early stage of SVD (typically characterized by reduced FA and increased MD and PSMD 33 ). DTI changes and WMH have been suggested to be dependent physiological processes occurring within consecutive temporal windows in older patients with SVD 30,33,66 . Alternatively, observed associations might also reflect pleiotropy between SVD genes and genes influencing brain maturation, as the mean age of i-Share participants corresponds to the peak of white matter maturation 67 . On average FA tends to increase during childhood, adolescence, and early adulthood and then decline in middle-age, while the reverse is observed for MD 32,68 . Hence the association of the WMH wGRS with lower FA and higher MD could also reflect an impaired or delayed maturation or a premature aging process. The significant association of the WMH wGRS with RD but not AxD could potentially suggest that this is predominantly reflecting an impact on myelination of fiber tracts 69 , in line with involvement of oligodendroglial dysfunction in early SVD pathology 70 . Future follow-up studies in a longitudinal setting are warranted to better understand the impact of genetically predicted WMH burden on the progression of white matter microstructural changes observed already in young adults and on their link with SVD and its complications.
Functional characterization revealed enrichment of WMH risk variants in regulatory marks in brain and neurosphere and in single-cell gene-expression levels in endothelial mural cells (as for clinical stroke) 71 . Gene prioritization using TWAS revealed that several WMH risk loci colocalized with eQTL for multiple genes with distinct tissue specificities. This pattern could potentially partly explain why association of such loci with WMH volume remained unchanged after controlling for the presence of HTN, although they were associated at genome-wide significant level with both BP and WMH. Of the 39 eGenes identified by TWAS, four encode known drug targets. MAPT is a drug target under investigation for neurodegenerative disorders: the eQTL colocalizing with the WMH risk variant is an sQTL for the MAPT isoform in DLPFC and TWAS suggests that larger WMH volume is associated with upregulated MAPT expression. CALCRL encodes a component of the calcitonin gene-related peptide receptor. TWAS suggests that lower abundance of the CALCRL transcript in arterial and nerve tissue and higher abundance in blood are associated with larger WMH volume. Monoclonal antibodies against CALCRL have recently been developed for the treatment of migraine 72 .
We acknowledge limitations. We were underpowered for detecting additional risk variants for WMH after accounting for presence of HTN in the 2-df JMA gene-HTN interaction model. Recognizing that blood pressure is also highly variable and that a one-time blood pressure measurement may not reflect the long-term exposure of participants to high blood pressure levels, we conducted secondary analyses stratifying on quartiles of genetically predicted SBP and DBP levels, yielding similar results. In aggregate, a weighted genetic risk score of independent genome-wide significant WMH risk loci showed a significant 1-df interaction with HTN status in association with WMH volume, suggesting that effect modification of genetic associations by HTN may exist, but that discovering it at the level of individual loci will likely require much larger samples. While we were able to use gene-expression data from many tissues for TWAS, such data are lacking for certain tissues that may be relevant for WMH (e.g., small brain vessels, microglia). Finally, our study population is predominantly of European ancestry (95%), limiting our ability to extrapolate our conclusions to other ancestry groups.
In conclusion, we have identified 27 genetic risk loci for WMH volume, of which two thirds have not previously been reported, and provided additional insight into their association with structural brain changes in very young adults, their clinical significance and the importance of high BP as a risk factor below clinical thresholds. Our results also point to molecular pathways underlying SVD that are not mediated by vascular risk factors and suggest potential for genetic stratification of high-risk individuals and genetically informed prioritization of drug targets for prevention trials.

Methods
Study population. The study population comprised 23 population-based studies from the CHARGE consortium comprising a total of 24,182 individuals of European (N = 21,666) and African-American (N = 2516) ancestry, along with 26,788 community participants of European origin from the UK Biobank. In total, 50,970 participants were available for testing main genetic effects and 48,524 participants for the gene-hypertension interaction analysis (information on HTN status was missing in 2446 participants). Individuals with a history of stroke (or MRI-defined brain infarcts involving the cortical gray matter), or other pathologies that may influence the measurement of WMH (e.g., brain tumor, head trauma, etc.), at the time of MRI were excluded from analyses. Study participants in all participating cohorts gave written informed consent for phenotype quantification and use of genetic material (Supplementary Methods 1, Supplementary Table 1).
Phenotypes. MRI scans were obtained from scanners with field strengths ranging mostly from 1.5 to 3.0 Tesla and interpreted using a standardized protocol blinded to clinical or demographic features. In addition to T1 and T2 weighted scans along the axial plane, some cohorts included fluid-attenuated inversion recovery (FLAIR) and/or proton density (PD) sequences for better differentiation of WMH from cerebrospinal fluid. The vast majority of participating cohorts (>92% of all participants) used fully automated software to quantify WMH volume, with two cohorts using validated, visually guided semi-quantitative scales in older study subsets (Supplementary Table 1). WMH volume measures were inverse normal transformed to correct for skewness and account for differences in WMH quantification methods. Blood pressure measurements that are closest to the MRI scan were used to define HTN status. Participants with a SBP ≥ 140 mmHg, a DBP ≥ 90 mmHg, or taking antihypertensive medication were classified as having HTN.
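The two phenotype definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used by the cohorts: the function names, the Blom rank offset (0.375), the tie handling, and the example volumes are all assumptions, since the text only states that an inverse normal transformation and the listed HTN criteria were applied.

```python
from statistics import NormalDist


def has_hypertension(sbp, dbp, on_medication):
    """HTN as defined in the text: SBP >= 140, DBP >= 90, or treated."""
    return sbp >= 140 or dbp >= 90 or on_medication


def inverse_normal_transform(values, offset=0.375):
    """Rank-based inverse normal transform.

    The Blom offset (0.375) is an assumption; ties are ranked in input
    order, which is also an assumption made for simplicity.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    transformed = [0.0] * n
    normal = NormalDist()
    for rank, idx in enumerate(order, start=1):
        quantile = (rank - offset) / (n - 2 * offset + 1)
        transformed[idx] = normal.inv_cdf(quantile)
    return transformed


# Made-up, right-skewed WMH volumes (mL) for five participants.
wmh_ml = [0.4, 1.2, 0.9, 15.0, 3.1]
z = inverse_normal_transform(wmh_ml)
```

The transform keeps the rank order of participants but maps the skewed volumes onto a standard normal scale, which is what makes measurements from different quantification methods comparable.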
Genotyping and imputation. Genome-wide genotyping platforms are described in Supplementary Data 2. Prior to imputation, sample-specific quality control (QC) on heterozygosity, missingness, gender mismatch, cryptic relatedness, and analysis of principal components (PC) for population stratification, as well as SNP-level QC on genotyping call rate and Hardy-Weinberg equilibrium were applied (Supplementary Data 2). Samples and SNPs passing the cohort-specific QC measures were then imputed to the 1000 genomes cosmopolitan panel phase 1 version 3 (1000 G p1v3) for CHARGE cohorts, while for the UK Biobank the dataset version 3 was imputed to the combined UK10K and Haplotype Reference Consortium (HRC) reference panels.
Genome-wide association analyses. Each participating study conducted ancestry-specific analyses using linear regression and assuming additive genetic effects under three models: (1) marginal genetic association test of WMH volume (SNP-main effect); (2) SNP-main effect adjusted for HTN status; and (3) joint association test of both SNP-main and SNP-by-HTN interaction effects in relation with WMH volume. In these models, SNP corresponds to the dosage of a given genetic (G) variant, Env is the dichotomous variable for HTN status, Cov is the vector of covariates, GEnv is the SNP-by-HTN interaction term, β values are the corresponding regression coefficients, and ε is the error term. The joint model (Model 3) provides effect estimates of G and GEnv, their robust standard errors (SE) and robust covariance matrices, and a joint P value from a 2 degree-of-freedom (df) Wald test. All analyses were adjusted for age, sex, PCs of population stratification and intracranial volume (ICV). Adjustment for ICV was not performed for studies using visual grading of WMH burden, as visual grades are inherently normalized for brain size 17 (Supplementary Data 1, Supplementary Methods 1).
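For orientation, the three models can be written out from the term definitions given in this paragraph. The following is a reconstruction under assumptions, not the authors' typeset equations: the coefficient subscripts, the intercept, and the vector notation are ours, and GEnv is taken to be the product SNP × Env.

```latex
\begin{align}
\text{WMH} &= \beta_0 + \beta_G\,\text{SNP} + \boldsymbol{\beta}_C^{\top}\,\text{Cov} + \varepsilon \tag{1}\\
\text{WMH} &= \beta_0 + \beta_G\,\text{SNP} + \beta_E\,\text{Env} + \boldsymbol{\beta}_C^{\top}\,\text{Cov} + \varepsilon \tag{2}\\
\text{WMH} &= \beta_0 + \beta_G\,\text{SNP} + \beta_E\,\text{Env} + \beta_{GE}\,\text{GEnv} + \boldsymbol{\beta}_C^{\top}\,\text{Cov} + \varepsilon \tag{3}
\end{align}
% Joint 2-df Wald test on Model 3, with V the robust 2x2 covariance
% matrix of the two estimated coefficients:
\begin{equation*}
W = \hat{\mathbf{b}}^{\top} \hat{V}^{-1} \hat{\mathbf{b}} \sim \chi^2_{2},
\qquad \hat{\mathbf{b}} = (\hat{\beta}_G,\ \hat{\beta}_{GE})^{\top}
\end{equation*}
```

Under this form, the 2-df test asks whether the SNP has any effect on WMH volume, either directly or through its interaction with HTN status.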
Genome-wide association meta-analyses. A custom harmonization script along with the R package EasyQC 73 was used to perform the QC of cohort-specific GWAS results. SNPs with minor allele frequency (MAF) lower than 1%, poor imputation quality (R² < 0.80), or a product of MAF, R² and sample size less than 10 (or less than 15 for the 2-df interaction analysis) were excluded.
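The variant-level filters above translate directly into a small predicate. The sketch below is illustrative only; the function name and argument names are hypothetical, and it does not reproduce EasyQC or the custom harmonization script.

```python
def passes_qc(maf, r2, n, interaction=False):
    """Apply the variant filters described in the text to one cohort result.

    maf: minor allele frequency; r2: imputation quality (R^2);
    n: cohort sample size; interaction: True for the 2-df analysis.
    """
    min_product = 15 if interaction else 10
    if maf < 0.01:
        return False
    if r2 < 0.80:
        return False
    # Variants with MAF * R^2 * N below the minimum were excluded.
    return maf * r2 * n >= min_product


# A low-frequency variant in a cohort of 1200 people:
# retained for the main-effect GWAS, dropped for the interaction analysis.
main_ok = passes_qc(0.012, 0.85, 1200)
interaction_ok = passes_qc(0.012, 0.85, 1200, interaction=True)
```

The product filter acts as a rough floor on the effective minor allele count, which is why the interaction analysis, with its extra parameter, uses the stricter value.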
Inverse-variance weighted meta-analysis was conducted using METAL 74, first within each ancestry group, followed by a meta-analysis of the ancestry-specific results. A patch implemented in the METAL 75 software was used to perform a 2-df joint meta-analysis (JMA) with inverse-variance weighting. For cohort-specific GWAS results with genomic inflation factors (λ) exceeding 1, genomic control (GC) correction was applied to correct for any residual population stratification. After meta-analysis, only SNPs represented in more than half of participating studies and/or more than half of the total sample size, and with no evidence of between-study heterogeneity (P_Het > 1 × 10⁻⁴), were considered. Quantile-quantile (QQ) plots of the P-values (observed versus expected) in the GWAS for the different models tested are presented along with the genomic inflation factor (λ) (Supplementary Fig. 2, Supplementary Data 3). Since heterogeneity in allelic effects that is specifically due to ancestry differences is not addressed by the traditional fixed-effects meta-analysis, a multiancestry meta-regression was carried out. For each variant, SNP-main allelic effects on WMH volume across GWAS were estimated in a linear regression framework weighting on the inverse of the variance of effect estimates and on the axes of genetic variation derived from pairwise allele frequency differences, as implemented in MR-MEGA 25. MR-MEGA provides two heterogeneity estimates: one that is correlated with ancestry (P_Het-ANC) and accounted for in the meta-regression, and the residual heterogeneity that is not due to population genetic differences (P_Het-RES).
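As a rough illustration of the fixed-effects step described above, the following sketch pools per-cohort estimates for a single SNP with inverse-variance weights and computes Cochran's Q. It is not METAL itself: the function names and example numbers are hypothetical, and applying genomic control by inflating each standard error by √λ is an assumption about how the correction was carried out.

```python
import math


def gc_correct_se(se, lam):
    """Inflate a standard error by sqrt(lambda) when lambda exceeds 1."""
    return se * math.sqrt(lam) if lam > 1 else se


def ivw_meta(betas, ses):
    """Fixed-effects inverse-variance weighted meta-analysis of one SNP.

    Returns the pooled effect, its standard error and Cochran's Q.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, q


# Hypothetical estimates for one SNP from two cohorts with lambda = 1.
betas = [0.10, 0.20]
ses = [gc_correct_se(0.05, 1.00), gc_correct_se(0.05, 1.00)]
beta, se, q = ivw_meta(betas, ses)
```

Cochran's Q is compared against a chi-square distribution with (number of cohorts minus one) degrees of freedom to obtain the heterogeneity P value used as a filter.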
In addition, association of the genome-wide significant WMH risk variants with WMH volume was tested after (i) stratification on hypertension status (all cohorts), and (ii) stratification on quartiles of the SBP and DBP polygenic risk score distributions in the UK Biobank, as described in the supplementary information (Supplementary Methods 2).
Across all association models, the genome-wide (GW) significance threshold for rejecting the null hypothesis of no association was set at P < 5 × 10⁻⁸. Independent SNPs within genome-wide risk loci were determined by performing linkage disequilibrium (LD) based clumping implemented in PLINK using both a physical distance of ±1 megabase (Mb) and an LD threshold of r² > 0.10 from the index SNP of a given locus 76. For constructing the LD matrix, ancestry-specific (European [EUR], African-American ancestry in South-West USA [ASW]) 1000G p1v3 reference panels were used for ancestry-specific results and the merged (EUR + ASW) reference for multiancestry results. Stepwise conditional regression and joint analysis (COJO) implemented in GCTA 24 was performed to further validate the independent signals (based on the main-effects GWAS in Europeans only). GCTA-COJO additionally identifies signals reaching the GW association level (P < 5 × 10⁻⁸) due to the LD-adjusted joint effect of several neighbouring SNPs, selected based on an a priori association threshold of P < 1 × 10⁻⁷. Genotypes of 6489 unrelated European individuals from the Three City (3C) study 77 were used to generate the LD matrix. Finally, gene-based association tests were conducted using MAGMA 28, with P < 2.8 × 10⁻⁶ as a gene-wide significance threshold. Gene regions with SNPs not reaching GW significance for WMH and/or not in LD (r² < 0.10) with the lead WMH SNP were considered as novel.
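The clumping step can be pictured as a greedy procedure: take the most significant remaining SNP as an index SNP, then absorb nearby SNPs in LD with it. The sketch below illustrates that idea with the distance and r² thresholds quoted above. It is a simplified stand-in for PLINK, not a reimplementation; the function name, the tuple layout, the caller-supplied `r2` function, and the toy SNPs are all hypothetical.

```python
def clump(snps, r2, window_bp=1_000_000, r2_threshold=0.10, p_index=5e-8):
    """Greedy LD clumping sketch.

    snps: list of (snp_id, chromosome, position, p_value).
    r2: function (snp_id_a, snp_id_b) -> LD r^2, supplied by the caller.
    Returns {index_snp_id: [clumped snp_ids]}.
    """
    remaining = sorted(snps, key=lambda s: s[3])
    clumps = {}
    assigned = set()
    for snp_id, chrom, pos, p in remaining:
        # Skip SNPs already absorbed by a stronger signal or not GW significant.
        if snp_id in assigned or p >= p_index:
            continue
        assigned.add(snp_id)
        members = []
        for other_id, other_chrom, other_pos, _ in remaining:
            if other_id in assigned:
                continue
            close = other_chrom == chrom and abs(other_pos - pos) <= window_bp
            if close and r2(snp_id, other_id) > r2_threshold:
                members.append(other_id)
                assigned.add(other_id)
        clumps[snp_id] = members
    return clumps


# Toy data: rs2 is in strong LD with rs1; rs4 is nearby but nearly independent.
toy_ld = {frozenset(("rs1", "rs2")): 0.8, frozenset(("rs1", "rs4")): 0.05}
toy_snps = [
    ("rs1", 1, 1000, 1e-12),
    ("rs2", 1, 2000, 1e-9),
    ("rs3", 1, 5_000_000, 1e-10),
    ("rs4", 1, 3000, 1e-8),
]
result = clump(toy_snps, lambda a, b: toy_ld.get(frozenset((a, b)), 0.0))
```

In the toy data, rs1, rs3 and rs4 each come out as independent index SNPs, while rs2 is folded into the rs1 clump.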
WMH heritability estimates. LD-score regression (LDSR) was used to distinguish polygenicity from confounding due to population stratification or cryptic relatedness 35 and to estimate the GW heritability by regressing the LD-score (measure of linked SNPs) against the chi-square association statistics of WMH volume from the European-only analysis. To address the infinitesimal-model assumption used by variance-component methods such as LDSR, we applied the heritability estimator from summary statistics (HESS) 29 to estimate local SNP-level heritability. HESS does not assume any effect size distribution and, by weighted summation of the variant effect sizes and eigenvectors of the LD matrix, provides the variance explained by all SNPs at a given locus. Since the current GWAS sample size for the European-only analysis is smaller than the size required by HESS (>50,000), GW heritability for WMH (h² = 0.54 ± 0.24) from the 3C-Dijon study 9 was used to partition into each locus as suggested 29. GWAS effect sizes were reinflated with the genomic inflation factor obtained from the GWAS summary statistics (λ = 1.09) to reduce potential downward bias of local SNP-level heritability and GW heritability estimates.
Analysis of the lifetime impact of WMH risk variants. We explored the association of the WMH wGRS ( Supplementary Data 4) with MRI-markers of white matter integrity in unrelated young adults participating in the i-Share cohort, the largest ongoing cohort study on student health (www.i-share.fr), using DTI markers. A WMH wGRS was constructed from 25 GW significant SNPs identified in European-only samples. High-quality MRI images and genome-wide genotypes were available in 1738 participants (Supplementary Methods 1, Supplementary Data 1). Briefly, white matter tracts were skeletonized with Tract-Based Spatial Statistics (TBSS) and a diffusion histogram analysis was performed, as described in the supplementary information (Supplementary Table 1), to derive DTI metrics measuring the integrity of the white matter microstructure, including fractional anisotropy (FA) and mean, radial and axial diffusivity (MD, RD, A x D), as well as peak width of skeletonized mean diffusivity (PSMD). PSMD was calculated using a fully automated method via a shell script (www.psmd-marker.com) (Supplementary Table 1). A mixed linear model (MLM) was used to test the association of individual SNPs with each DTI trait, accounting for any sample substructure (admixture) and possible relatedness in the sample by using a genetic relationship matrix (GRM) as a random effect. The GRM was computed by implementing the MLMA-LOCO scheme in GCTA, where the SNP marker tested for association with a given outcome was excluded at each instance. MLMA-LOCO has been shown to better control false-positives over the standard mixed models especially in the presence of geographic population structure and cryptic relatedness 78 . The model was additionally adjusted for age, sex, ICV and the first four PCs of population stratification. The effect allele for each risk variant was defined as the allele associated with larger WMH volume. 
For associations with individual SNPs the significance threshold was set at P < 2 × 10⁻³ (0.05/25). The aggregate effect of 25 WMH risk variants with DTI metrics was estimated by using the GTX package in R 79.
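To make the score and the per-SNP threshold concrete, the sketch below computes a weighted genetic risk score for one hypothetical individual and the Bonferroni threshold for 25 SNPs. The function name, dosages and weights are invented for illustration, and this individual-level sum is not necessarily how the GTX package derives the aggregate effect.

```python
def weighted_grs(dosages, weights):
    """Weighted genetic risk score: sum of risk-allele dosages times weights."""
    if len(dosages) != len(weights):
        raise ValueError("one weight per SNP is required")
    return sum(d * w for d, w in zip(dosages, weights))


n_snps = 25
per_snp_alpha = 0.05 / n_snps  # the 0.05/25 threshold quoted in the text

# Hypothetical individual, with only three SNPs shown for brevity.
# Dosages count the allele associated with larger WMH volume.
score = weighted_grs([0, 1, 2], [0.05, 0.10, 0.02])
```

Because the effect allele is always the WMH-increasing allele, a higher score corresponds to a higher genetically predicted WMH volume.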
The association of FA, MD, RD, A x D, and PSMD with reaction time on the Stroop test, reflecting information processing speed, was examined using linear regression in i-Share participants who underwent both MRI and cognitive testing (N = 1401). Analyses were adjusted for age, sex, ICV, study-curriculum and ethnic origin. The association p value was adjusted for the number of independent comparisons made (n = 3), estimated based on the correlation matrix between the DTI traits from i-Share and by applying the Matrix Spectral Decomposition (matSpDlite) method (http://neurogenetics.qimrberghofer.edu.au/matSpDlite/).
Shared genetic architecture of WMH with related traits. We systematically explored the genetic overlap of WMH SNP-main-effects (in the European-only analysis) with (i) neurological traits (any stroke, ischemic stroke, small vessel stroke, large artery stroke, cardioembolic stroke; any, deep, and lobar intracerebral hemorrhage; general cognitive function and Alzheimer-type dementia); and (ii) vascular risk factors and traits (SBP, DBP, PP, HDL-cholesterol, LDL-cholesterol, TG, BMI, T2D, HbA1c, SMKindex, VTE, and migraine). We acquired summary statistics of European-only analyses for these traits, using the latest largest GWAS, seeking collaboration with the relevant consortia when the data were not publicly available (Supplementary Table 2).
We first explored the association of lead WMH risk variants (n = 27) with related vascular and neurological traits. For each of the related traits, association statistics of SNPs falling in a window of ±250 kb around each lead WMH SNP were retrieved, and SNPs satisfying the multiple testing threshold defined by correcting for the effective number of LD-independent markers per locus, as implemented in the Genetic Type 1 error calculator (GEC) 80, were retained. Only SNPs showing an association with a related vascular or neurological trait at P < 1.3 × 10⁻⁴ (accounting for 14 independent traits and 27 independent loci) and in moderate to high LD with the lead WMH SNP (r² > 0.50) are reported. The correlation matrix estimated between the traits using individual-level data from the 3C study 77 was used to estimate the number of independent traits by applying the Matrix Spectral Decomposition (matSpDlite) method (http://neurogenetics.qimrberghofer.edu.au/matSpDlite/).
Using LDSR 81, genetic correlation estimates between WMH and the aforementioned neurological and vascular traits were obtained. A p value < 3.6 × 10⁻³ correcting for 14 independent phenotypes was considered significant. As genome-wide correlation estimates may miss significant correlations at the regional level (balancing-effect) 41, the Bayesian pairwise GWAS approach (GWAS-PW) was applied 36. GWAS-PW identifies trait pairs with high posterior probability of association (PPA) with a shared genetic variant (model 3, PPA3 ≥ 0.90). To ensure that PPA3 is unbiased by sample overlap, fgwas v.0.3.6 was run on each pair of traits and the correlation estimated from regions with null association evidence (PPA < 0.20) was used as a correction factor 36. Additionally, to estimate the directionality of associations with trait pairs in regions with PPA3 ≥ 0.90, HESS was used to estimate local genetic correlation 41.
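The two multiple-testing thresholds used in this subsection follow from a plain Bonferroni correction of α = 0.05, as the short check below shows. The helper name is hypothetical; the counts (14 independent traits, 27 loci) are taken from the text.

```python
def bonferroni(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


n_traits = 14  # effective number of independent traits (from the text)
n_loci = 27    # lead WMH risk loci (from the text)

# Threshold for genetic correlation tests across traits.
trait_threshold = bonferroni(0.05, n_traits)
# Threshold for the locus-by-trait lookup of lead WMH variants.
lookup_threshold = bonferroni(0.05, n_traits * n_loci)

print(f"{trait_threshold:.1e}", f"{lookup_threshold:.1e}")
```

Rounded to two significant figures, these reproduce the reported values of 3.6 × 10⁻³ and 1.3 × 10⁻⁴.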
Mendelian randomization (MR). For each vascular trait, genetic variant (instrument) details were retrieved from the latest largest GWAS (Supplementary Table 2). Only independent SNPs (r² < 0.10, based on the 1000G EUR panel) reaching genome-wide significance were included, as recommended 82. Similarly, 25 independent WMH risk variants from the SNP-main effects were used as instruments to test the association of genetically predicted WMH volume with neurological traits. The putative causal effect (β_IVW) of an exposure on the outcome was estimated, using the inverse-variance weighting (IVW) method, by the weighted sum of the ratios of beta-coefficients from the SNP-outcome associations for each variant (j) over the corresponding beta-coefficients from the SNP-exposure associations (β_j). The ratio estimates from each genetic variant were averaged after weighting on the inverse variance (W_j) of β_j across L uncorrelated SNPs, as implemented in the R package RadialMR (available through CRAN repositories) 40.
Effect alleles for each risk variant were defined as the allele associated with an increase in the corresponding trait values. A p value < 3.6 × 10⁻³ correcting for 14 independent traits was considered significant. Cochran's Q statistic was used to test for the presence of heterogeneity (P_Het < 0.01) due to horizontal pleiotropy, which occurs when instruments exert an effect on the outcome and exposure through independent pathways 40. Influential outlier SNPs that have the largest contribution to the global Cochran's Q statistic were identified by regressing the predicted causal estimate against the inverse-variance weights. After excluding the influential outlier SNPs, the IVW test was repeated along with MR-Egger regression 83. Relative goodness of fit of the MR-Egger over the IVW approach was quantified using the Q_R statistic, which is the ratio of the statistical heterogeneity around the MR-Egger fitted slope divided by the statistical heterogeneity around the IVW slope. A Q_R close to 1 indicates that MR-Egger is not a better fit to the data and therefore offers no benefit over IVW 40. A nonsignificant MR-Egger intercept was used as an indicator to formally rule out horizontal pleiotropy.
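The IVW estimate and Cochran's Q described in this subsection can be sketched from summary statistics alone. The code below is a minimal illustration and not RadialMR: the function name is hypothetical, and the choice of first-order weights (squared SNP-exposure effect divided by the variance of the SNP-outcome effect) is an assumption, since several weighting schemes exist.

```python
import math


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """IVW causal estimate from per-SNP summary statistics.

    Uses first-order weights W_j = beta_exposure_j^2 / se_outcome_j^2,
    which is an assumption; other weighting schemes exist.
    Returns (estimate, standard error, Cochran's Q).
    """
    ratios = [bo / be for be, bo in zip(beta_exposure, beta_outcome)]
    weights = [be ** 2 / so ** 2 for be, so in zip(beta_exposure, se_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    se = math.sqrt(1.0 / total)
    # Each SNP's contribution to Q; large contributions flag outliers.
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    return estimate, se, q


# Two made-up instruments whose ratio estimates agree exactly.
estimate, se, q = ivw_mr([0.1, 0.2], [0.05, 0.10], [0.01, 0.02])
```

When every instrument gives the same ratio estimate, as in this toy example, Q is zero; heterogeneity among the ratios inflates Q and is what the pleiotropy checks look for.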
Cell and tissue type enrichment analysis. Association statistics from the WMH SNP-main effects (European-only) were used to test cell/tissue-specific enrichment. First, we used the EPIGWAS software 44 and histone marks for promoters (H3K4me3) and enhancers (H3K4me1) from publicly available information on tissue-specific histone regulatory marks (Supplementary Methods 2). EPIGWAS calculates specificity scores for the lead WMH risk variant and its proxies (r² ≥ 0.80, 1000G EUR) based on the distance to the strongest ChIP-seq peak signal, and estimates enrichment significance by comparing the relative proximity and specificity of the test set with 10,000 sets of matched background (using permutation). Bonferroni correction for the number of histone marks tested was applied (P < 2.5 × 10⁻²). Second, we used MAGMA 28 (gene-property analysis) and differentially expressed gene sets from the single-cell transcriptomic (scRNA) data in mouse brain from the Karolinska Institute. MAGMA generates gene-level association statistics by combining SNP p-values in a specified window (10 kb upstream and 1.5 kb downstream of each gene) accounting for LD (1000G EUR) and, under a linear regression framework, performs a one-sided test between the association of genes with WMH volume and cell specificity. Using the MAGMA.celltyping R package 45, scRNA expression values were obtained from five different mouse brain regions (neocortex, hippocampus, hypothalamus, striatum, and midbrain) 84. A gene-expression specificity metric for each cell type was calculated by dividing the expression level in a given cell type by the sum of the expression levels from all cell types (i.e., genes with high expression levels in two or more cell types will get a lower specificity measure than genes with high expression levels in a single cell type), followed by binning the metric value to 40 equally sized bins.
The MAGMA one-sided test was then used to test for enrichment between the top 10 percentile bins (bins with higher cell-specificity) in each cell type. Bonferroni correction for the number of cell types tested was applied (P < 2.1 × 10⁻³).
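The specificity metric described above is a simple proportion and can be illustrated as follows. This is a sketch, not the MAGMA.celltyping implementation: the function names, the cell-type labels, the expression values, and the rank-based way of forming equally sized bins are assumptions.

```python
def specificity(expression_by_cell_type):
    """Share of a gene's total expression found in each cell type."""
    total = sum(expression_by_cell_type.values())
    if total == 0:
        return {cell: 0.0 for cell in expression_by_cell_type}
    return {cell: value / total for cell, value in expression_by_cell_type.items()}


def bin_index(value, all_values, n_bins=40):
    """Assign a value to one of n_bins equally sized rank-based bins (1..n_bins)."""
    rank = sum(1 for v in all_values if v < value)
    return min(n_bins, rank * n_bins // len(all_values) + 1)


# A gene expressed mostly in one (hypothetical) cell type scores high there.
gene_a = specificity({"endothelial": 3.0, "neuron": 1.0})
# A gene expressed equally in two cell types scores lower in both.
gene_b = specificity({"endothelial": 2.0, "neuron": 2.0})
```

This matches the intuition stated in the text: expression spread over several cell types dilutes the specificity of each.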
Transcriptome-wide association study and colocalization. We performed transcriptome-wide association studies (TWAS) using the association statistics from the WMH SNP-main effects (European-only) and weights from 22 publicly available gene-expression reference panels (Supplementary Methods 2) from blood (Netherlands Twin Registry, NTR; Young Finns Study, YFS), arterial (Genotype-Tissue Expression, GTEx), brain (GTEx, CommonMind Consortium, CMC) and peripheral nerve tissues (GTEx). For each gene in the reference panel, precomputed SNP-expression weights in the 1-Mb window were obtained (Supplementary Methods 2), including the highly tissue-specific splicing QTL (sQTL) information on gene isoforms in the dorsolateral prefrontal cortex (DLPFC) derived from the CMC. Additionally, non-publicly available gene-expression weights from the DLPFC of 494 older individuals from two large community-based studies (the Religious Orders Study [ROS] 48 and the Rush Memory and Aging Project [MAP] 47) were obtained. TWAS-Fusion 46 was used to estimate the TWAS Z score (association statistic between predicted expression and WMH), derived from the SNP-expression weights, SNP-WMH effect estimates and the SNP correlation matrix. Transcriptome-wide (TW) significant genes (eGenes) and the corresponding QTLs (eQTLs) were determined using Bonferroni correction in each reference panel, based on the average number of features (4360 genes) tested across all the reference panels 46. eGene regions with eQTLs not reaching genome-wide significance in association with WMH, and not in LD (r² < 0.01) with the lead SNP for genome-wide significant WMH risk loci, were considered as novel. Finally, a colocalization analysis (COLOC) 49 was carried out at each locus to estimate the posterior probability of a shared causal variant (PP4 ≥ 0.75) between the gene expression and trait association, using a prior probability of 1.1 × 10⁻⁵ for the WMH association.
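The TWAS Z score combines the three ingredients named above: SNP-expression weights w, SNP-WMH Z scores Z, and the SNP correlation (LD) matrix Σ, as w′Z / √(w′Σw). The sketch below implements that formula with plain lists. The function name and the toy numbers are hypothetical, and the snippet is not the TWAS-Fusion software.

```python
import math


def twas_z(weights, snp_z, ld):
    """TWAS statistic: w'Z / sqrt(w' Sigma w).

    weights: SNP-expression weights; snp_z: SNP-WMH Z scores;
    ld: SNP correlation matrix given as a list of rows.
    """
    numerator = sum(w * z for w, z in zip(weights, snp_z))
    variance = sum(
        weights[i] * ld[i][j] * weights[j]
        for i in range(len(weights))
        for j in range(len(weights))
    )
    return numerator / math.sqrt(variance)


# Toy gene whose predicted expression depends on the first SNP only.
z_gene = twas_z([1.0, 0.0], [2.0, 3.0], [[1.0, 0.5], [0.5, 1.0]])

# Bonferroni threshold for the average of 4360 genes tested per panel.
tw_threshold = 0.05 / 4360
```

With a single non-zero weight the TWAS Z score reduces to that SNP's own Z score, which is a useful sanity check on the formula.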
Furthermore, functional validation of the eGenes was performed by testing for positional overlap of the best eQTLs from TWAS with enhancer (H3K4me1) and/or promoter (H3K4me3) elements across a broad category of relevant tissue types (blood, brain/neurological, heart/arterial) using HaploReg v4.1 85. A value of 1 was assigned to eQTLs with regulatory epigenome overlap in at least one tissue.
Drug-target enrichment. The Genome for REPositioning drugs (GREP) tool 86 was used to quantify the enrichment of eGenes emerging from the TWAS with high probability of colocalization (PP4 ≥ 0.75) in the curated drug-target list classified based on the International Classification of Diseases 10 (ICD10). GREP provides as an output the names of the drug(s) targeting a given gene set along with the disease category. Moreover, by performing a series of Fisher's exact tests GREP formally tests whether the gene set is enriched in genes targeted by drugs in a specific clinical indication category to treat a certain disease or condition.
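The enrichment test at the heart of this step can be illustrated with a one-sided Fisher's exact test computed from the hypergeometric distribution. This is a sketch under assumptions, not the GREP tool: the function name, the argument layout, the example counts, and the choice of a one-sided (over-representation) test are ours.

```python
from math import comb


def fisher_enrichment_p(overlap, gene_set_size, target_size, universe_size):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Probability of seeing at least `overlap` drug-target genes in a gene set
    of size `gene_set_size`, when `target_size` of the `universe_size` genes
    are targets of drugs in the category of interest.
    """
    upper = min(gene_set_size, target_size)
    total = comb(universe_size, gene_set_size)
    tail = sum(
        comb(target_size, k) * comb(universe_size - target_size, gene_set_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Invented example: 2 of 2 genes in the set are among 2 targets out of 4 genes.
p_small = fisher_enrichment_p(2, 2, 2, 4)
```

A small P value indicates that the gene set contains more targets of drugs in that indication category than expected by chance.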
Reporting summary. Further information on research design is available in the Nature Research Life Sciences Reporting Summary linked to this article.

Data availability
Summary statistics for the GWAS meta-analysis of the CHARGE cohorts and the UK Biobank on WMH burden main-effects generated and analyzed in the downstream analyses are deposited in a public repository (dbGaP: https://www.ncbi.nlm.nih.gov/gap/) under the accession number: phs002227.v1.p1. All other data supporting the findings of this study are available either within the article, the supplementary information and supplementary data files, or from the authors upon reasonable request.

Received: 26 January 2020; Accepted: 10 September 2020