## Introduction

As a leading cause of stroke, cognitive decline, and dementia, cerebral small vessel disease (SVD) represents a major source of morbidity and mortality in aging populations1,2,3. Exploring the mechanisms of SVD and their contribution to dementia risk has recently been identified as a priority research area4,5, based on its more frequent recognition with magnetic resonance imaging (MRI), its high prevalence in older community persons3,6 and the demonstration that intensive management of vascular risk factors, especially hypertension, may slow down its progression7,8. The biological underpinnings of SVD are poorly understood and no mechanism-based treatments currently are available. White matter hyperintensities of presumed vascular origin (WMH), the most common MRI-marker of SVD, can be measured quantitatively using automated software. They are highly heritable9, and confer an increased risk of stroke and dementia3, thus making them well-suited to identify potential genetic determinants of SVD and its contribution to stroke and dementia risk. WMH are most often covert, i.e., not associated with a history of clinical stroke. They are highly prevalent in the general population, and much more frequently observed than clinical stroke caused by SVD (which can be both ischemic [small vessel stroke] and hemorrhagic [deep intracerebral hemorrhage]) (Supplementary Fig. 1).

Studying the genomics of SVD also provides a powerful approach to discovery of underlying molecular mechanisms and targets in order to accelerate the development of future therapies, or identify drug repositioning opportunities10,11,12. Although genomic studies of WMH have been most fruitful for deciphering SVD risk variants compared with other MRI-features of SVD (lacunes, cerebral microbleeds, dilated perivascular spaces)13 or small vessel stroke14, or deep intracerebral hemorrhage15, few risk loci have been identified to date16,17,18. This is likely due to limited sample size of populations studied and possibly also the failure to take into account the role of hypertension (HTN), the strongest known risk factor for WMH, in confounding or modifying genetic associations. There is also mounting evidence suggesting that early-life factors play a crucial role in the occurrence of late-life vascular and neurological conditions, including SVD19, likely due to both genetic and environmental factors that may intrinsically influence the vascular substrate of SVD or modulate the brain’s resilience to SVD20,21,22. Identifying these early predictors could have major implications for our understanding of disease mechanisms across the lifespan and for devising effective prevention strategies.

Here, we conduct a large multiancestry meta-analysis of WMH-volume genome-wide association studies (GWAS), accounting for HTN as a potential confounder and effect modifier. We explore association of WMH risk loci with early changes in white matter microstructure on MRI using diffusion tensor imaging (DTI) in young adults in their early twenties. Last, we explore biological pathways underlying the observed genetic associations with SVD and their clinical significance through shared genetic variation and Mendelian randomization (MR) experiments with vascular risk factors and neurological traits, linking them with multiple epigenomic, transcriptomic, and drug-target databases.

## Results

### Genetic discovery from association analyses

Figure 1 summarizes the overall workflow of our study that included data from 50,970 participants (N = 48,454 Europeans and 2516 African-Americans) from population-based studies taking part in the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE)23 consortium and from the UK Biobank (Supplementary Data 1). The mean age of participants was 66.0 ± 7.5 years, 53% were women and 52% hypertensive (Methods, Supplementary Methods 1, and Supplementary Data 1 for cohort-specific characteristics). There was no evidence for systematic inflation of SNP-WMH association statistics at the individual cohort or meta-analysis level (Supplementary Data 3 and Supplementary Fig. 2) for the three types of analyses performed (Methods).

In the European-only SNP-main-effects analysis, 22 independent loci harbored common variants associated with WMH volume at genome-wide significance (P < 5 × 10−8, Table 1, Fig. 2); lead SNPs for each independent locus were confirmed by both LD clumping and GCTA-COJO24. Additionally, the NID2 locus reached genome-wide significance by the joint effect of multiple SNPs (P = 4.87 × 10−8, Supplementary Data 4), with P = 5.45 × 10−8 for lead SNP rs72680374, using GCTA-COJO (Methods, Supplementary Fig. 3). The African–American-only analysis identified a genome-wide significant locus at ECHDC3 (Supplementary Data 4). Using MR-MEGA25 for loci showing heterogeneity in allelic effects across ancestry groups (PHet < 0.01), the ECHDC3 locus and another locus near KCNK2 reached genome-wide significance (Table 1, Supplementary Data 5). In the HTN-adjusted model, two additional loci were associated with WMH volume at P < 5 × 10−8 (PKN2 and XKR6), while three loci were no longer genome-wide significant (Table 1). The 2-df genome-wide gene-HTN interaction joint meta-analysis (JMA) did not identify any additional locus (Table 1, Supplementary Data 6). Five loci reached genome-wide significance in the small African–American-only JMA, but these were not maintained in the fixed-effects multiancestry JMA (Supplementary Data 7).

In total, 27 loci reached genome-wide significance in association with WMH volume in at least one of the aforementioned analyses, of which 18 have not previously been reported (Table 1, Fig. 2). Associations with WMH volume at these loci were similar in participants with and without HTN and when stratifying on quartiles of genetically predicted SBP and DBP levels (Methods, Supplementary Data 8–10). In aggregate, however, a weighted genetic risk score of independent WMH risk loci (WMH wGRS, Methods) showed significant 1-df interaction with HTN in association with WMH volume (βGRSxHTN = 0.15, PGRSxHTN = 0.009, Supplementary Fig. 4). One previously described risk locus for WMH burden did not reach genome-wide significance in the current analysis (near PMF1, P = 3.9 × 10−4). Of note, one genome-wide significant locus (COL4A2) and one suggestive locus (HTRA1, P < 5 × 10−6, Supplementary Data 11) involve genes implicated in monogenic forms of SVD26,27.

Additionally, gene-based tests using MAGMA28 yielded 49 gene-wide significant associations (P < 2.8 × 10−6), of which 13 were outside GWAS loci, including the APOE gene (Methods, Supplementary Data 12). Using the Heritability Estimator from Summary Statistics (HESS)29, we found that 29 ± 2% of WMH-volume variance is explained by common and low-frequency variants across the genome. The heritability attributable to loci containing GWAS index SNPs was 2.4 ± 0.1%.

### Implications of WMH genes across the lifespan

To examine the lifetime impact of WMH risk variants on brain structure, we explored the association of the WMH wGRS with MRI-markers of white matter microstructure in 1738 young healthy adults participating in the i-Share cohort (mean age 22.1 ± 2.3 years, 72% women). Integrity of the white matter microstructure was measured on diffusion tensor imaging (DTI) using the following metrics: fractional anisotropy (FA), mean diffusivity (MD), radial diffusivity (RD), axial diffusivity (AxD), and the recently described peak width of skeletonized mean diffusivity (PSMD)30. These MRI-markers are associated with the maturation and aging process of white matter microstructure31,32, and alterations in DTI metrics were shown to precede the occurrence of WMH lesions in older patients with SVD30,33. The WMH wGRS showed significant associations with higher MD, RD, and PSMD and lower FA values in i-Share; four WMH risk loci individually showed significant associations with at least one of the DTI parameters (SH3PXD2A, NMT1, KLHL24, and VCAN, Table 2). Increasing values of PSMD (but not other DTI markers) showed a trend towards association with slower information processing speed on the Stroop test in i-Share participants (N = 1401, effect estimate ± SE: 0.085 ± 0.040, P = 0.031), which did not survive correction for multiple testing (for three independent DTI markers). The WMH wGRS was not associated with the Stroop test in i-Share but showed a trend towards an association with poorer episodic memory performance in older community persons (N = 24,597, effect estimate ± SE: -0.19 ± 0.11, P = 0.08)34.

We also examined whether genetically predicted larger WMH volume was associated with increased risk of stroke and Alzheimer-type dementia, the most common age-related neurological diseases, and with lower cognitive performance in older age, using previously reported GWAS data (Supplementary Table 2). Several genome-wide significant WMH risk loci showed significant association with ischemic stroke (three loci), all stroke and small vessel stroke (two loci each), cardioembolic stroke, deep intracerebral hemorrhage, and Alzheimer-type dementia (one locus each) (Supplementary Data 13). Using linkage disequilibrium score regression (LDSR)35, we observed significant genetic correlation of WMH volume with all stroke, ischemic stroke, small vessel stroke, and lower general cognitive function, after Bonferroni correction for multiple testing (P < 3.6 × 10−3, Methods, Fig. 3, Supplementary Data 14). Using the Bayesian pairwise GWAS (GWAS-PW) approach36, significant regional overlap (posterior probability of association for model 3, PPA3 ≥ 0.90, Methods) was observed between WMH volume and general cognitive function and between WMH volume and stroke, especially ischemic and small vessel stroke (Supplementary Data 15). This included regions previously implicated in complex and monogenic forms of stroke (FOXF2/FOXQ137,38, HTRA127,37) and cardiovascular disease (NOS3)39.

Using two-sample MR implementing the inverse-variance weighting (IVW) method, we observed evidence for significant causal associations after Bonferroni correction for multiple testing (P < 3.6 × 10−3) between WMH volume and increased risk of Alzheimer-type dementia, with no statistical evidence of horizontal pleiotropy using Cochran’s Q statistic (Q-PHet≥0.01, Fig. 4, Supplementary Data 16). We also observed evidence for significant causal association of WMH volume with risk of any stroke, ischemic stroke, small vessel stroke, and deep intracerebral hemorrhage. There was some evidence for horizontal pleiotropy (Q-PHet<0.01) for associations with stroke. However, after removing influential outlier variants, associations remained significant and the MR-Egger intercept (indicating average pleiotropic effects) did not significantly differ from zero, with QR values close to 1 indicating goodness of fit of the IVW method (Methods, Supplementary Data 16)40.

### Shared genetic risk with vascular traits

To assess whether genetic associations with WMH reflect known vascular mechanisms we systematically explored the shared genetic variation between WMH burden and related vascular traits, comprising established risk factors for vascular disease (systolic blood pressure [SBP], diastolic blood pressure [DBP], pulse pressure [PP], type 2 diabetes [T2D], low-density lipoprotein [LDL] cholesterol, high-density lipoprotein [HDL] cholesterol, triglycerides [TG], body mass index [BMI], glycated hemoglobin levels [HbA1c], and lifetime smoking index—a composite measure capturing smoking heaviness and duration as well as smoking initiation [SMKindex]), as well as putative risk factors for other disorders including venous thromboembolism (VTE) and migraine, using summary statistics from the most recent GWAS. Some of the latter were made available through collaborations with the relevant research consortia when the data were not publicly available (Supplementary Table 2).

First, we looked up associations of the 27 WMH risk loci with related vascular traits, including the lead WMH risk variants and nearby variants (±250 kb) in moderate to high LD (r2 > 0.5). After correcting for the number of independent loci and traits tested (P < 1.3 × 10−4, Methods), 20 of the 27 WMH risk loci (74%) showed significant association with at least one other trait and/or vascular risk factors. For 13 of these, associations were found at a genome-wide significant level (Fig. 2, Supplementary Data 13). Blood pressure (BP) traits showed by far the largest number of significant associations with WMH risk variants, 16 loci (59.3%) being associated with SBP, DBP, and/or PP. Further significant associations with WMH risk variants were observed for BMI (8 loci), T2D (5 loci), SMKindex (3 loci), and lipid traits (3 loci), one locus (at XKR6) being notably shared with all these risk factors. Seven loci (C16orf95, ECHDC3, MN1, NID2, SALL1, VCAN, KCNK2), none of which were reported previously as WMH risk loci, appear not to be associated with any of the vascular traits explored, suggesting other underlying biological pathways.

Second, we explored the genome-wide and regional genetic overlap between WMH volume and related vascular traits. Mean χ2 ranged between 1.06 and 3.99, suggesting strong polygenicity for all investigated traits. The impact of possible sample overlap was estimated to be negligible using LDSR35 (Supplementary Data 14). We observed significant (P < 3.6 × 10−3) genetic correlation of larger WMH volume with higher SBP, DBP, SMKindex, BMI and increased risk of VTE. Using GWAS-PW36 and HESS41 (Methods), we identified 16 genomic regions harboring shared genetic risk variants with at least one other vascular trait, predominantly BP traits, but also BMI, lipid levels and SMKindex (PPA3 ≥ 0.90, Fig. 3, Supplementary Data 15).

Third, we explored the causal relations between the aforementioned vascular traits and WMH volume using two-sample MR42 (RadialMR40, Methods), implementing the IVW method. We observed significant (P < 3.6 × 10−3) association of genetically predicted SBP, DBP, PP, SMKindex and T2D with larger WMH volume and of genetically predicted migraine with smaller WMH volume (Fig. 4, Supplementary Data 17). After removal of potentially pleiotropic outlier variants, the MR-Egger intercept was nonsignificant for SBP, DBP, PP and SMKindex, indicating no residual pleiotropy and suggesting causal association with WMH volume (Methods). For migraine and T2D, in contrast, there was evidence of residual pleiotropic effects (significant MR-Egger intercept, Supplementary Data 17) after removal of potentially pleiotropic outlier variants, and the association became only nominally significant for migraine. Importantly, associations of genetically predicted SBP and DBP with WMH volume remained significant after adjustment for HTN, and in participants with and without HTN (Supplementary Data 17), highlighting that higher levels of BP are likely causally associated with larger WMH volume even below BP thresholds typically used for the definition of hypertension (SBP ≥ 140 mmHg or DBP ≥ 90 mmHg or antihypertensive drug intake)43.
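
The IVW estimate used in these MR analyses can be illustrated with a minimal sketch. It assumes first-order weights (squared SNP-exposure effect divided by the outcome variance) and a fixed-effects standard error; the function name `ivw_mr` and the input values are hypothetical, and the sketch is not the RadialMR implementation.

```python
import math

def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effects IVW estimate from per-variant summary statistics.

    Assumption: first-order weights w_j = beta_X_j^2 / se_Y_j^2, which
    ignore uncertainty in the SNP-exposure estimates.
    """
    # Per-variant Wald ratios: outcome effect per unit of exposure effect.
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    weights = [bx ** 2 / sy ** 2 for bx, sy in zip(beta_exposure, se_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    se = math.sqrt(1 / total)
    # Two-sided P value from the normal distribution.
    p_value = math.erfc(abs(estimate / se) / math.sqrt(2))
    # Cochran's Q: large values flag heterogeneity across variants,
    # which may indicate horizontal pleiotropy.
    q_stat = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    return estimate, se, p_value, q_stat

# Hypothetical instruments: two variants with identical Wald ratios.
est, se, p, q = ivw_mr([0.1, 0.2], [0.05, 0.10], [0.01, 0.01])
```

Because both hypothetical variants imply the same ratio, Cochran's Q is essentially zero here; in practice, variants contributing disproportionately to Q would be the candidates for outlier removal.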

### Biological interpretation of association signals

We used EPIGWAS44 to test for cell-type enrichment of WMH association signals using chromatin marks previously shown to be cell-type specific and associated with active gene-regulation (Methods). WMH risk loci were significantly enriched in enhancer (H3K4me1) and promoter sites (H3K4me3) in cell-types derived from the brain, neurosphere (developing brain), vascular tissue, digestive, epithelial and muscle tissues, as well as human embryonic stem cells after removing WMH risk loci associated with BP (Supplementary Data 19). Analysis of brain-specific single-cell expression data in mice using MAGMA.celltyping45 (Methods) revealed significant enrichment of highly cell-type-specific genes in endothelial mural cells and nominally significant enrichment for vascular leptomeningeal cells, oligodendrocytes, oligodendrocyte precursors, and ependymal astrocytes; results were substantially unchanged after removing WMH risk loci associated with BP (Supplementary Data 20).

To functionally characterize and prioritize individual WMH genomic risk loci we performed transcriptome-wide association studies (TWAS) using TWAS-Fusion46, WMH-SNP association statistics from the main effects (EUR-only) and weights from 23 gene-expression reference panels from blood, arterial, and brain tissues (Supplementary Methods 2). We also included non-publicly available gene-expression weights from the dorsolateral prefrontal cortex (DLPFC) of 494 older community-dwelling participants (Methods)47,48. TWAS-Fusion identified 201 transcriptome-wide significant associations with WMH, conditionally significant on the predicted expression of a TWAS-associated gene, including 21 with splicing quantitative trait loci (sQTLs) regulating highly tissue-specific gene isoforms in DLPFC (Fig. 5, Supplementary Data 21). To rule out that observed associations reflect the random overlap between expression (eQTLs) and noncausal WMH risk variants, a colocalization analysis (COLOC)49 was performed at each significant locus, to estimate the posterior probability of a shared causal variant (PP4 ≥ 75%) between the gene expression and trait association (Methods). Colocalization was observed for 96 TWAS significant eQTLs (48%, Fig. 5): of these, 54 mapped to 8 WMH genome-wide risk loci and 16 expression-associated genes (eGenes), while 42 mapped to 12 distinct loci that were not genome-wide significant in the WMH GWAS and 23 eGenes. These additional putative WMH risk loci require confirmation in follow-up studies. Leveraging histone regulatory mark information from blood, arterial, and brain tissues (Methods, Supplementary Data 21), we observed that the majority (89%) of TWAS signals overlapped with enhancer and/or promoter elements, including eQTLs exhibiting weaker colocalization probability (PP4 < 75%). Larger WMH volume was associated with either upregulated or downregulated gene expression, the directionality being mostly consistent across broad tissue categories (Fig. 5). 
We found evidence for colocalization of WMH risk variants with eQTLs in brain tissues (28 eGenes), vascular tissues (15 eGenes), and blood (6 eGenes). Some eGenes (KLHL24, NMT1, DCAKD, KANSL1, AMZ2P1) showed evidence for colocalization in multiple tissues, and for some WMH risk loci (chr2q33.2, chr17q25.1, chr17q21.31) colocalized variants associated with multiple eGenes with distinct tissue specificities (Fig. 5, Supplementary Data 21). WMH risk variants at the chr2q33 locus for instance showed evidence for colocalization with eQTLs for NBEAL1 in nerve and arterial tissues, for ICA1L and KRT8P15 in brain tissues, and for CARF1 in right atrial appendage.

Among the 39 WMH eGenes with high colocalization probability (Fig. 5, Supplementary Data 21), 4 (MAPT, CRHR1, CALCRL, NOTCH4) are registered as targets of approved drugs in the DrugBank database and the Therapeutic Target Database (Supplementary Data 22).

## Discussion

This largest genetic study to date on complex SVD13,14,16,17,18, leveraging genetic and brain-imaging information from 50,970 older community persons, triples the number of genetic loci associated with cerebral SVD and shows that this genetic risk results in detectable brain changes among asymptomatic young adults in their twenties. We further demonstrate the importance of higher BP as a risk factor for WMH even below clinical thresholds for HTN. MR analysis provides strong evidence for causal links of genetically determined WMH volume with risk of ischemic stroke, intracerebral hemorrhage, and Alzheimer-type dementia in later life. Importantly, we also provide insight into molecular pathways underlying SVD, highlighting relevant tissue and cell types, and suggest potential for genetic stratification of high-risk individuals and for genetically informed prioritization of drug targets for prevention trials.

Our approach focusing on the most common brain-imaging feature of SVD appears to be more powerful than GWAS of the small vessel stroke subtype to identify risk loci for SVD. Indeed, no new small vessel stroke risk locus was identified in MEGASTROKE, the largest stroke GWAS meta-analysis to date14. We show a strong association between genetically determined WMH burden and risk of stroke in the general population, notably both risk of ischemic stroke and of intracerebral hemorrhage. While corroborating epidemiological observations3,50, this has never been demonstrated using genetic instrumental variables, providing evidence for causality. This prompts greater caution with the common empirical prescription of antiplatelet therapy in persons with extensive WMH in the absence of clinical stroke3, given the potential detrimental effects on intracerebral hemorrhage risk, and suggests the need for randomized clinical trials to determine the risk/benefit ratio of antiplatelet therapy in this setting.

The significant association we describe between genetically determined WMH burden and Alzheimer-type dementia also has potential important implications for prevention. It strengthens recent epidemiological evidence that WMH is associated not only with an increased risk of all and vascular dementia, but also of neurodegenerative Alzheimer-type dementia3,51, providing for the first time evidence for causality using the WMH wGRS as an instrumental variable. Because of the proven ability to treat vascular risk factors, understanding and targeting the biological mechanisms of the vascular contribution to cognitive impairment and dementia, and specifically how cerebral SVD contributes to the molecular pathology of Alzheimer disease, are areas of intense research and clinical interest52, especially given the current absence of other efficient therapies. Our results suggest that WMH should be considered a major target for preventative interventions, to mitigate not only the risk of stroke and vascular cognitive impairment but also of Alzheimer-type dementia, and support the rationale of innovative trials using WMH progression as a surrogate or intermediate endpoint for cognitive decline and dementia.

Over half of identified WMH risk loci are associated with higher BP levels. Moreover, using MR we provide evidence for a causal association between higher BP and larger WMH volume, notably even in participants without clinically defined HTN at the time of the MRI. Indeed, associations of genetically predicted SBP and DBP with WMH volume remained significant in participants without HTN, highlighting that higher levels of BP are likely causally associated with larger WMH volume even below BP thresholds typically used for the definition of HTN43. Considering the recent conclusions from the SPRINT-MIND trial suggesting that more drastic lowering of BP in persons with HTN is associated with slower progression of WMH volume and a lower risk of developing the combined outcome of mild cognitive impairment or dementia53,54, our results suggest that trials to test a similar impact of intensive BP lowering in high-risk individuals who do not meet the current clinical thresholds for HTN could be warranted. We additionally show strong causal association between increased exposure to cigarette smoking over the lifetime (lifetime smoking index) and increased WMH burden, as has recently been described in relation with stroke risk55, providing some additional evidence for the relevance of smoking cessation to prevent vascular brain injury and specifically SVD.

Importantly, a quarter of the identified WMH risk loci reflect molecular mechanisms that are not mediated by BP or other known vascular risk factors. Two of these (NID2, VCAN), along with the COL4A2 and EFEMP1 loci, implicate genes encoding matrisome proteins56, involved in cell membrane structure and representing core components of the extracellular matrix (ECM). Converging evidence from experimental models for monogenic SVD suggests that perturbations of the matrisome play a central role in disease pathophysiology57. Our findings suggest that these could also be relevant for sporadic forms of SVD. NID2 encodes nidogen, an ECM glycoprotein and a major component of basement membranes, and is recognized as having a role in post-stroke angiogenesis58,59. Mutations in COL4A1/2, encoding collagen, another basement membrane component, are already known to cause monogenic SVD26. VCAN, which we also found to be associated with white matter microstructure in young adults, encodes versican, a proteoglycan involved in cell adhesion and ECM assembly60. In CARASIL, a monogenic SVD caused by HTRA1 mutations, accumulation of versican in the thickened arterial wall was observed27. Versican can also form complexes that inhibit oligodendrocyte maturation and remyelination61. EFEMP1 encodes fibulin 3, an ECM glycoprotein localized in the basement membrane and a proteolytic target of serine protease HTRA162. Other previously unreported WMH risk loci that we have identified include KCNK2, which encodes Twik-related K+ channel (TREK1), a 2-pore-domain background ATP-sensitive potassium channel expressed throughout the central nervous system, more prominently in fetal than in adult brain; ECHDC3, near a distinct locus (r2 < 0.01) previously implicated in Alzheimer disease63; MN1, which previously has been causally related to familial meningiomas64; and XKR6, which has been associated with risk of systemic lupus erythematosus65.

Functional characterization revealed enrichment of WMH risk variants in regulatory marks in brain and neurosphere and in single-cell gene-expression levels in endothelial mural cells (as for clinical stroke)71. Gene prioritization using TWAS revealed that several WMH risk loci colocalized with eQTLs for multiple genes with distinct tissue specificities. This pattern could partly explain why the association of such loci with WMH volume remained unchanged after controlling for the presence of HTN, although they were associated at a genome-wide significant level with both BP and WMH. Of the 39 eGenes identified by TWAS, four encode known drug targets. MAPT is a drug target under investigation for neurodegenerative disorders: the eQTL colocalizing with the WMH risk variant is an sQTL for the MAPT isoform in DLPFC, and TWAS suggests that larger WMH volume is associated with upregulated MAPT expression. CALCRL encodes a component of the Calcitonin Gene Related Peptide receptor. TWAS suggests that lower abundance of the CALCRL transcript in arterial and nerve tissue and higher abundance in blood are associated with larger WMH volume. Monoclonal antibodies against CALCRL have recently been developed for the treatment of migraine72.

We acknowledge limitations. We were underpowered for detecting additional risk variants for WMH after accounting for presence of HTN in the 2-df JMA gene-HTN interaction model. Recognizing that blood pressure is also highly variable and that a one-time blood pressure measurement may not reflect the long-term exposure of participants to high blood pressure levels, we conducted secondary analyses stratifying on quartiles of genetically predicted SBP and DBP levels, yielding similar results. In aggregate, a weighted genetic risk score of independent genome-wide significant WMH risk loci showed a significant 1-df interaction with HTN status in association with WMH volume, suggesting that effect modification of genetic associations by HTN may exist, but that detecting it at the level of individual loci will likely require much larger samples. While we were able to use gene-expression data from many tissues for TWAS, such data are lacking for certain tissues that may be relevant for WMH (e.g., small brain vessels, microglia). Finally, our study population is predominantly of European ancestry (95%), limiting our ability to extrapolate our conclusions to other ancestry groups.

In conclusion, we have identified 27 genetic risk loci for WMH volume, of which two thirds have not previously been reported, and provided additional insight into their association with structural brain changes in very young adults, their clinical significance and the importance of high BP as a risk factor below clinical thresholds. Our results also point to molecular pathways underlying SVD that are not mediated by vascular risk factors and suggest potential for genetic stratification of high-risk individuals and genetically informed prioritization of drug targets for prevention trials.

## Methods

### Study population

The study population comprised 23 population-based studies from the CHARGE consortium comprising a total of 24,182 individuals of European (N = 21,666) and African–American (N = 2516) ancestry, along with 26,788 community participants of European origin from the UK Biobank. In total, 50,970 participants were available for testing main genetic effects and 48,524 participants for the gene-hypertension interaction analysis (information on HTN status was missing in 2446 participants). Individuals with a history of stroke (or MRI-defined brain infarcts involving the cortical gray matter), or other pathologies that may influence the measurement of WMH (e.g., brain tumor, head trauma, etc.), at the time of MRI were excluded from analyses. Study participants in all participating cohorts gave written informed consent for phenotype quantification and use of genetic material (Supplementary Methods 1, Supplementary Table 1).

### Phenotypes

MRI scans were obtained from scanners with field strengths ranging mostly from 1.5 to 3.0 Tesla and interpreted using a standardized protocol blinded to clinical or demographic features. In addition to T1 and T2 weighted scans along the axial plane, some cohorts included fluid-attenuated inversion recovery (FLAIR) and/or proton density (PD) sequences for better differentiation of WMH from cerebrospinal fluid. The vast majority of participating cohorts (>92% of all participants) used fully automated software to quantify WMH volume, with two cohorts using validated, visually guided semi-quantitative scales in older study subsets (Supplementary Table 1). WMH volume measures were inverse normal transformed to correct for skewness and account for differences in WMH quantification methods. Blood pressure measurements taken closest to the MRI scan were used to define HTN status. Participants with an SBP ≥ 140 mmHg, a DBP ≥ 90 mmHg, or taking antihypertensive medication were classified as having HTN.
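
As an illustration of the two phenotype definitions above, the following minimal sketch applies a rank-based inverse normal transformation and the HTN classification rule. The rank offset of 0.5 and the average-rank handling of ties are assumptions (the text does not specify them), and the function names and input values are hypothetical.

```python
from statistics import NormalDist

def inverse_normal_transform(values, offset=0.5):
    """Rank-based inverse normal transform; ties get their average rank.

    Assumption: quantiles are computed as (rank - offset) / n.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of tied values starting at position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - offset) / n) for r in ranks]

def has_hypertension(sbp, dbp, on_antihypertensives):
    """HTN if SBP >= 140 mmHg, DBP >= 90 mmHg, or on antihypertensive medication."""
    return sbp >= 140 or dbp >= 90 or on_antihypertensives

# Hypothetical, right-skewed WMH volumes.
z_scores = inverse_normal_transform([0.1, 5.0, 0.3, 40.0])
```

The transform depends only on ranks, so the extreme value 40.0 is pulled in to the same distance from zero as the smallest value, which is how skewness is removed.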

### Genotyping and imputation

Genome-wide genotyping platforms are described in Supplementary Data 2. Prior to imputation, sample-specific quality control (QC) on heterozygosity, missingness, gender mismatch, cryptic relatedness, and analysis of principal components (PC) for population stratification, as well as SNP-level QC on genotyping call rate and Hardy–Weinberg equilibrium were applied (Supplementary Data 2). Samples and SNPs passing the cohort-specific QC measures were then imputed to the 1000 genomes cosmopolitan panel phase 1 version 3 (1000 G p1v3) for CHARGE cohorts, while for the UK Biobank the dataset version 3 was imputed to the combined UK10K and Haplotype Reference Consortium (HRC) reference panels.

### Genome-wide association analyses

Each participating study conducted ancestry-specific analyses using linear regression and assuming additive genetic effects under three models: (1) marginal genetic association test of WMH volume (SNP-main effect); (2) SNP-main effect adjusted for HTN status; and (3) joint association test of both SNP-main and SNP-by-HTN interaction effects in relation with WMH volume:

$$Y = \beta_0 + \beta_{\mathrm{G}}\,\mathrm{SNP} + \beta_{\mathrm{C}}\,\mathrm{Cov} + \varepsilon \tag{1}$$

$$Y = \beta_0 + \beta_{\mathrm{G}}\,\mathrm{SNP} + \beta_{\mathrm{C}}\,\mathrm{Cov} + \beta_{\mathrm{Env}}\,\mathrm{HTN} + \varepsilon \tag{2}$$

$$Y = \beta_0 + \beta_{\mathrm{G}}\,\mathrm{SNP} + \beta_{\mathrm{C}}\,\mathrm{Cov} + \beta_{\mathrm{GEnv}}\,\mathrm{SNP} \times \mathrm{HTN} + \beta_{\mathrm{Env}}\,\mathrm{HTN} + \varepsilon \tag{3}$$

where SNP corresponds to the dosage of a given genetic (G) variant, HTN is the dichotomous environmental (Env) variable for HTN status, Cov is the vector of covariates, SNP × HTN is the SNP-by-HTN interaction term (GEnv), the β values are the corresponding regression coefficients, and ε is the residual error term. The joint model (Model 3) provides effect estimates of G and GEnv, their robust standard errors (SE) and robust covariance matrices, and a joint P value from a 2 degree-of-freedom (df) Wald test.
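The 2-df Wald test of Model 3 can be illustrated with a short sketch. It assumes that the two effect estimates and their robust variances and covariance are already available from the regression; the function and argument names are hypothetical. The statistic is the quadratic form of the estimates in the inverse of their 2 × 2 covariance matrix, and for 2 df the chi-square tail probability reduces to exp(−W/2).

```python
import math


def joint_wald_2df(beta_g, beta_gxe, var_g, var_gxe, cov_g_gxe):
    """Joint 2-df Wald test of the SNP-main and SNP-by-HTN effects.

    Inputs are the two effect estimates and the entries of their
    (robust) covariance matrix. Returns (wald_statistic, p_value).
    """
    det = var_g * var_gxe - cov_g_gxe ** 2
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # b' V^{-1} b, written out with the explicit inverse of a 2x2 matrix.
    wald = (
        beta_g ** 2 * var_gxe
        - 2.0 * beta_g * beta_gxe * cov_g_gxe
        + beta_gxe ** 2 * var_g
    ) / det
    # Survival function of a chi-square distribution with 2 df.
    p_value = math.exp(-wald / 2.0)
    return wald, p_value
```

With uncorrelated estimates the statistic is simply the sum of the two squared Z scores, so a variant with a modest main effect and a modest interaction effect can reach significance jointly even when neither term does alone.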

All analyses were adjusted for age, sex, PCs of population stratification and intracranial volume (ICV). Adjustment for ICV was not performed for studies using visual grading of WMH burden, as visual grades are inherently normalized for brain size17 (Supplementary Data 1, Supplementary Methods 1).

### Genome-wide association meta-analyses

A custom harmonization script along with the R package EasyQC73 was used to perform the QC of cohort-specific GWAS results. SNPs were excluded if they had a minor allele frequency (MAF) lower than 1%, poor imputation quality (R2 < 0.80), or a product of MAF, R2 and sample size lower than 10 (lower than 15 for the 2-df interaction analysis).
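As an illustration of these exclusion criteria, a per-variant filter could look like the sketch below. It is not the EasyQC configuration used by the authors; the function and parameter names are hypothetical, and only the thresholds are taken from the text.

```python
def passes_qc(maf, r2, n, joint_2df=False):
    """Return True if a variant passes the cohort-level QC filters.

    maf: minor allele frequency; r2: imputation quality;
    n: cohort sample size; joint_2df: True for the 2-df interaction analysis.
    """
    min_product = 15 if joint_2df else 10
    if maf < 0.01:                    # MAF lower than 1%
        return False
    if r2 < 0.80:                     # poor imputation quality
        return False
    if maf * r2 * n < min_product:    # too little effective information
        return False
    return True
```

The product filter matters mainly for small cohorts: a variant with MAF = 0.02 and R2 = 0.85 in 700 participants (product 11.9) passes for the main-effect models but is excluded from the 2-df analysis.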

Inverse-variance weighted meta-analysis was conducted using METAL74, first within each ancestry group followed by a meta-analysis of the ancestry-specific results. A patch implemented in the METAL75 software was used to perform a 2-df joint meta-analysis (JMA) with inverse-variance weighting. For cohort-specific GWAS results with genomic inflation factors (λ) exceeding 1, genomic control (GC) correction was applied to correct for any residual population stratification. After meta-analysis, only SNPs represented in more than half of the participating studies and/or more than half of the total sample size and with no evidence of between-study heterogeneity (PHet > 1 × 10−4) were considered. Quantile-Quantile (QQ) plots of the P-values (observed versus expected) in the GWAS for the different models tested are presented along with the genomic inflation factor (λ) (Supplementary Fig. 2, Supplementary Data 3). Since heterogeneity in allelic effects that is specifically due to ancestry differences is not addressed by the traditional fixed-effects meta-analysis, a multiancestry meta-regression was carried out. For each variant, SNP-main allelic effects on WMH volume across GWAS were estimated in a linear regression framework weighting on the inverse of the variance of effect estimates and on the axes of genetic variation derived from pairwise allele frequency differences, as implemented in MR-MEGA25. MR-MEGA provides two heterogeneity estimates, one that is correlated with ancestry (PHet-ANC) and accounted for in the meta-regression, and the residual heterogeneity that is not due to population genetic differences (PHet-RES).
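For a single variant, the fixed-effects inverse-variance weighted estimate can be sketched as follows. This is a minimal illustration rather than the METAL implementation; the function names are hypothetical, and applying genomic control by multiplying each standard error by the square root of λ is a common convention that is assumed here, not a detail stated in the text.

```python
import math


def genomic_control_se(se, lam):
    """Inflate a standard error by sqrt(lambda) when lambda exceeds 1
    (assumed convention for genomic control correction)."""
    return se * math.sqrt(lam) if lam > 1 else se


def ivw_meta(betas, ses):
    """Fixed-effects inverse-variance weighted meta-analysis of one variant.

    betas, ses: per-study effect estimates and standard errors.
    Returns (pooled_beta, pooled_se, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided P value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p
```

Two studies with equal standard errors contribute equally, so `ivw_meta([0.1, 0.3], [0.1, 0.1])` pools to an effect of 0.2 with a standard error smaller than that of either study.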

In addition, association of the genome-wide significant WMH risk variants with WMH volume was tested after (i) stratification on hypertension status (all cohorts), and (ii) stratification on quartiles of SBP and DBP polygenic risk score distribution in the UK Biobank, as described in the supplementary information (Supplementary Methods 2).

Across all association models, the genome-wide (GW) significance threshold for rejecting the null hypothesis of no association was set at P < 5 × 10−8. Independent SNPs within genome-wide risk loci were determined by performing linkage disequilibrium (LD) based clumping implemented in PLINK using both a physical distance of ±1 megabase (Mb) and an LD threshold of r2 > 0.10 from the index SNP of a given locus76. For constructing the LD matrix, ancestry-specific (European [EUR], African–American ancestry in South-West USA [ASW]) 1000 G p1v3 reference panels were used for ancestry-specific results and the merged (EUR+ASW) reference for multiancestry results. Stepwise conditional regression and joint analysis (COJO) implemented in GCTA24 was performed to further validate the independent signals (based on the main-effects GWAS in Europeans only). GCTA-COJO additionally identifies signals reaching GW significance (P < 5 × 10−8) through the LD-adjusted joint effect of several neighbouring SNPs, selected based on an a priori association threshold of P < 1 × 10−7. Genotypes of 6489 unrelated European individuals from the Three-City (3C) study77 were used to generate the LD matrix. Finally, gene-based association tests were conducted using MAGMA28, with P < 2.8 × 10−6 as a gene-wide significance threshold. Gene regions with SNPs not reaching GW significance for WMH and/or not in LD (r2 < 0.10) with the lead WMH SNP were considered as novel.

### WMH heritability estimates

LD-score regression (LDSR) was used to distinguish polygenicity from confounding due to population stratification or cryptic relatedness35 and to estimate the GW heritability by regressing the chi-square association statistics of WMH volume from the European-only analysis on the LD score (a measure of linked SNPs). To address the infinitesimal-model assumption used by variance-component methods such as LDSR, we applied the heritability estimator from summary statistics (HESS)29 to estimate local SNP-level heritability. HESS does not assume any effect size distribution and, through a weighted summation over the variant effect sizes and the eigenvectors of the LD matrix, provides the variance explained by all SNPs at a given locus. Since the current GWAS sample size for the European-only analysis is smaller than the sample size required by HESS (>50,000), the GW heritability for WMH (h2 = 0.54 ± 0.24) from the 3C-Dijon study9 was used and partitioned across loci, as suggested29. GWAS effect sizes were reinflated with the genomic inflation factor obtained from the GWAS summary statistics (λ = 1.09) to reduce potential downward bias of local SNP-level heritability and GW heritability estimates.

### Analysis of the lifetime impact of WMH risk variants

We explored the association of the WMH wGRS (Supplementary Data 4) with MRI-markers of white matter integrity in unrelated young adults participating in the i-Share cohort, the largest ongoing cohort study on student health (www.i-share.fr), using DTI markers. A WMH wGRS was constructed from 25 GW significant SNPs identified in European-only samples. High-quality MRI scans and genome-wide genotypes were available in 1738 participants (Supplementary Methods 1, Supplementary Data 1). Briefly, white matter tracts were skeletonized with Tract-Based Spatial Statistics (TBSS) and a diffusion histogram analysis was performed, as described in the supplementary information (Supplementary Table 1), to derive DTI metrics measuring the integrity of the white matter microstructure, including fractional anisotropy (FA) and mean, radial and axial diffusivity (MD, RD, AxD), as well as peak width of skeletonized mean diffusivity (PSMD). PSMD was calculated using a fully automated method via a shell script (www.psmd-marker.com) (Supplementary Table 1). A mixed linear model (MLM) was used to test the association of individual SNPs with each DTI trait, accounting for any sample substructure (admixture) and possible relatedness in the sample by using a genetic relationship matrix (GRM) as a random effect. The GRM was computed by implementing the MLMA-LOCO scheme in GCTA, where the SNP marker tested for association with a given outcome was excluded at each instance. MLMA-LOCO has been shown to control false positives better than standard mixed models, especially in the presence of geographic population structure and cryptic relatedness78. The model was additionally adjusted for age, sex, ICV and the first four PCs of population stratification. The effect allele for each risk variant was defined as the allele associated with larger WMH volume. For associations with individual SNPs, the significance threshold was set at P < 2 × 10−3 (0.05/25). The aggregate effect of the 25 WMH risk variants on DTI metrics was estimated by using the GTX package in R79.
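A weighted genetic risk score of this kind is a weighted sum of risk-allele dosages. The sketch below illustrates the idea under two assumptions that the text does not spell out: that each weight is the per-allele effect of the variant on WMH volume, and that variants reported with a negative effect are flipped so that the counted allele is the one associated with larger WMH volume. The function names are hypothetical.

```python
def align_to_risk_allele(dosage, beta):
    """Express a variant in terms of the WMH-increasing allele.

    If the reported effect is negative, count the other allele instead
    (dosage becomes 2 - dosage) and flip the sign of the effect.
    """
    if beta < 0:
        return 2.0 - dosage, -beta
    return dosage, beta


def weighted_grs(dosages, weights):
    """Weighted genetic risk score for one participant.

    dosages: risk-allele dosages (0 to 2) for each variant.
    weights: per-allele effect sizes (assumed to be GWAS betas).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * d for w, d in zip(weights, dosages))
```

With 25 variants tested individually, the Bonferroni threshold quoted in the text follows directly as 0.05 / 25 = 2 × 10−3.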

The association of FA, MD, RD, AxD, and PSMD with reaction time on the Stroop test, reflecting information processing speed, was examined using linear regression in i-Share participants who underwent both MRI and cognitive testing (N = 1401). Analyses were adjusted for age, sex, ICV, study-curriculum and ethnic origin. The association p value was adjusted for the number of independent comparisons made (n = 3), estimated based on the correlation matrix between the DTI traits from i-Share and by applying the Matrix Spectral Decomposition (matSpDlite) method (http://neurogenetics.qimrberghofer.edu.au/matSpDlite/).

### Shared genetic architecture of WMH with related traits

We systematically explored the genetic overlap of WMH SNP-main-effects (in the European-only analysis) with (i) neurological traits (any stroke, ischemic stroke, small vessel stroke, large artery stroke, cardioembolic stroke; any, deep, and lobar intracerebral hemorrhage; general cognitive function and Alzheimer-type dementia); and (ii) vascular risk factors and traits (SBP, DBP, PP, HDL-cholesterol, LDL-cholesterol, TG, BMI, T2D, HbA1c, SMKindex, VTE, and migraine). We acquired summary statistics of European-only analyses for these traits, using the latest largest GWAS, seeking collaboration with the relevant consortia when the data were not publicly available (Supplementary Table 2).

We first explored the association of lead WMH risk variants (n = 27) with related vascular and neurological traits. For each of the related traits, association statistics of SNPs falling in a window of ±250 kb around each lead WMH SNP were retrieved, and SNPs satisfying the multiple testing threshold defined by correcting for the effective number of LD independent markers per locus, as implemented in Genetic Type 1 error calculator (GEC)80, were retained. Only SNPs showing an association with a related vascular or neurological trait at P < 1.3 × 10−4 (accounting for 14 independent traits and 27 independent loci) and in moderate to high LD with the lead WMH SNP (r2 > 0.50) are reported. The correlation matrix estimated between the traits using individual-level data from the 3C study77 was used to estimate the number of independent traits by applying the Matrix Spectral Decomposition (matSpDlite) method (http://neurogenetics.qimrberghofer.edu.au/matSpDlite/).

Using LDSR81, genetic correlation estimates between WMH and the aforementioned neurological and vascular traits were obtained. A p value < 3.6 × 10−3 correcting for 14 independent phenotypes was considered significant. As genome-wide correlation estimates may miss significant correlations at the regional level (balancing-effect)41, the Bayesian pairwise GWAS approach (GWAS-PW) was applied36. GWAS-PW identifies trait pairs with high posterior probability of association (PPA) with a shared genetic variant (model 3, PPA3 ≥ 0.90). To ensure that PPA3 is unbiased by sample overlap, fgwas v.0.3.6 was run on each pair of traits and the correlation estimated from regions with null association evidence (PPA < 0.20) was used as a correction factor36. Additionally, to estimate the directionality of associations with trait pairs in regions with PPA3 ≥ 0.90, HESS was used to estimate local genetic correlation41.

### Mendelian randomization (MR)

For each vascular trait, details of the genetic variants used as instruments were retrieved from the latest largest GWAS (Supplementary Table 2). Only independent SNPs (r2 < 0.10, based on 1000 G EUR panel) reaching genome-wide significance were included, as recommended82. Similarly, 25 independent WMH risk variants from the SNP-main effects were used as instruments to test the association of genetically predicted WMH volume with neurological traits. The putative causal effect (βIVW) of an exposure on the outcome was estimated using the inverse-variance weighting (IVW) method. For each variant (j), a ratio estimate (βj) was obtained by dividing the beta-coefficient from the SNP-outcome association by the corresponding beta-coefficient from the SNP-exposure association. The ratio estimates were then averaged across L uncorrelated SNPs after weighting each by the inverse of its variance (Wj), as implemented in the R package RadialMR (available through CRAN repositories)40.

$$\beta _{\mathrm{IVW}} = \frac{\sum\nolimits_{j = 1}^{L} W_j \beta _j}{\sum\nolimits_{j = 1}^{L} W_j}$$

Effect alleles for each risk variant were defined as the allele associated with an increase in the corresponding trait values. A p value < 3.6 × 10−3, correcting for 14 independent traits, was considered significant. Cochran’s Q statistic was used to test for the presence of heterogeneity (PHet < 0.01) due to horizontal pleiotropy, which occurs when instruments exert an effect on the outcome through pathways independent of the exposure40. Influential outlier SNPs that have the largest contribution to the global Cochran’s Q statistic were identified by regressing the predicted causal estimate against the inverse-variance weights. After excluding the influential outlier SNPs, the IVW test was repeated along with MR-Egger regression83. The relative goodness of fit of the MR-Egger over the IVW approach was quantified using the QR statistic, which is the ratio of the statistical heterogeneity around the MR-Egger fitted slope to the statistical heterogeneity around the IVW slope. A QR close to 1 indicates that MR-Egger is not a better fit to the data and therefore offers no benefit over IVW40. A nonsignificant MR-Egger intercept was used as an indicator to formally rule out horizontal pleiotropy.
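The IVW estimate and Cochran's Q can be illustrated from summary statistics alone. The sketch below is not the RadialMR code: the function name is hypothetical, and it assumes first-order weights, in which the variance of each ratio estimate is approximated by the squared SNP-outcome standard error divided by the squared SNP-exposure effect. Other weighting schemes exist and may have been used.

```python
def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """IVW Mendelian randomization estimate and Cochran's Q.

    beta_exposure: SNP-exposure effects for L uncorrelated instruments.
    beta_outcome, se_outcome: SNP-outcome effects and standard errors.
    Returns (beta_ivw, cochran_q, per_snp_q_contributions).
    """
    # Ratio estimate per variant: SNP-outcome effect over SNP-exposure effect.
    ratios = [bo / be for be, bo in zip(beta_exposure, beta_outcome)]
    # First-order inverse-variance weights (assumed weighting scheme).
    weights = [be ** 2 / so ** 2 for be, so in zip(beta_exposure, se_outcome)]
    total = sum(weights)
    beta_ivw = sum(w * r for w, r in zip(weights, ratios)) / total
    # Each variant's contribution to Q flags potential outliers.
    contributions = [w * (r - beta_ivw) ** 2 for w, r in zip(weights, ratios)]
    return beta_ivw, sum(contributions), contributions
```

Under the null hypothesis of no heterogeneity, Q follows a chi-square distribution with L − 1 degrees of freedom, and variants with large individual contributions are the candidates for exclusion before the IVW test is repeated.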

### Cell and tissue type enrichment analysis

Association statistics from the WMH SNP-main effects (European-only) were used to test cell/tissue-specific enrichment. First, we used the EPIGWAS software44 and histone marks for promoters (H3K4me3) and enhancers (H3K4me1) from publicly available information on tissue-specific histone regulatory marks (Supplementary Methods 2). EPIGWAS calculates specificity scores for the lead WMH risk variant and its proxies (r2 ≥ 0.80, 1000 G EUR) based on the distance to the strongest ChIP-seq peak signal, and estimates enrichment significance by comparing the relative proximity and specificity of the test set with 10,000 sets of matched background (using permutation). Bonferroni correction for the number of histone marks tested was applied (P < 2.5 × 10−2). Second, we used MAGMA28 (gene-property analysis) and differentially expressed gene sets from the single-cell transcriptomic (scRNA) data in mouse brain from the Karolinska Institute. MAGMA generates gene-level association statistics by combining SNP p-values in a specified window (10 kb upstream and 1.5 kb downstream of each gene) accounting for LD (1000 G EUR) and under a linear regression framework performs a one-sided test between the association of genes with WMH volume and cell specificity. Using the MAGMA.celltyping R package45, scRNA expression values were obtained from five different mouse brain regions (neocortex, hippocampus, hypothalamus, striatum, and midbrain)84. A gene-expression specificity metric for each cell-type was calculated by dividing the expression level in a given cell type by the sum of the expression levels from all cell types (i.e., genes with high expression levels in two or more cell types will get a lower specificity measure than genes with high expression levels in a single cell type), followed by binning the metric value into 40 equally sized bins. The MAGMA one-sided test was then used to test for enrichment between the top 10 percentile bins (bins with higher cell-specificity) in each cell type. Bonferroni correction for the number of cell types tested was applied (P < 2.1 × 10−3).
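The specificity metric and the binning step can be sketched as follows. This is an illustration of the definition given above, not the MAGMA.celltyping implementation; the function names are hypothetical, and the rank-based assignment to equally sized bins (including how ties and non-expressed genes are handled) is an assumption.

```python
def specificity(expression_by_cell_type):
    """Specificity of one gene per cell type: expression in that cell type
    divided by the summed expression across all cell types."""
    total = sum(expression_by_cell_type.values())
    if total == 0:
        return {cell: 0.0 for cell in expression_by_cell_type}
    return {cell: x / total for cell, x in expression_by_cell_type.items()}


def equal_size_bins(values, n_bins=40):
    """Assign each value to one of n_bins equally sized, rank-based bins.

    Bin 1 holds the lowest values and bin n_bins the highest.
    Assumption: ties are broken by input order.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // n + 1
    return bins
```

A gene expressed only in one cell type receives a specificity of 1 there, whereas a gene expressed equally in two cell types receives 0.5 in each, which is why broadly expressed genes fall into lower bins.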

### Transcriptome-wide association study and colocalization

We performed transcriptome-wide association studies (TWAS) using the association statistics from the WMH SNP-main effects (European-only) and weights from 22 publicly available gene-expression reference panels (Supplementary Methods 2) from blood (Netherlands Twin Registry, NTR; Young Finns Study, YFS), arterial (Genotype-Tissue Expression, GTEx), brain (GTEx, CommonMind Consortium, CMC) and peripheral nerve tissues (GTEx). For each gene in the reference panel, precomputed SNP-expression weights in the 1-Mb window were obtained (Supplementary Methods 2), including the highly tissue-specific splicing QTL (sQTL) information on gene isoforms in the dorsolateral prefrontal cortex (DLPFC) derived from the CMC. Additionally, non-publicly available gene-expression weights from the DLPFC of 494 older individuals from two large community-based studies (the Religious Orders Study [ROS]48 and the Rush Memory and Aging Project [MAP]47) were obtained. TWAS-Fusion46 was used to estimate the TWAS Z score (association statistic between predicted expression and WMH), derived from the SNP-expression weights, SNP-WMH effect estimates and the SNP correlation matrix. Transcriptome-wide (TW) significant genes (eGenes) and the corresponding QTLs (eQTLs) were determined using Bonferroni correction in each reference panel, based on the average number of features (4360 genes) tested across all the reference panels46. eGene regions with eQTLs not reaching genome-wide significance in association with WMH, and not in LD (r2 < 0.01) with the lead SNP for genome-wide significant WMH risk loci, were considered as novel. Finally, a colocalization analysis (COLOC)49 was carried out at each locus to estimate the posterior probability of a shared causal variant (PP4 ≥ 0.75) between the gene expression and trait association, using a prior probability of 1.1 × 10−5 for the WMH association. Furthermore, functional validation of the eGenes was performed by testing for positional overlap of the best eQTLs from TWAS with enhancer (H3K4me1) and/or promoter (H3K4me3) elements across a broad category of relevant tissue types (blood, brain/neurological, heart/arterial) using Haploreg V4.185. A value of 1 was assigned to eQTLs with regulatory epigenome overlap in at least one tissue.
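The way a TWAS Z score combines expression weights, GWAS statistics and LD can be sketched for a single gene. The form below, a weighted sum of SNP-level Z scores scaled by the standard deviation implied by the SNP correlation matrix, is the summary-statistic formulation commonly described for this type of analysis; it is given here as an assumption about the general approach, not as the TWAS-Fusion source code, and the function name is hypothetical.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Summary-statistic TWAS Z score for one gene.

    weights: SNP-expression weights for the SNPs in the gene window.
    gwas_z: SNP-trait Z scores (effect estimate divided by its SE).
    ld: SNP correlation matrix as a list of rows.
    """
    n = len(weights)
    if len(gwas_z) != n or len(ld) != n:
        raise ValueError("inputs must describe the same set of SNPs")
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    # Variance of the weighted sum under the null: w' * LD * w.
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    if variance <= 0:
        raise ValueError("weights and LD matrix give a non-positive variance")
    return numerator / math.sqrt(variance)
```

Accounting for LD in the denominator keeps correlated SNPs from being counted as independent evidence: two perfectly correlated SNPs with Z = 2 and equal weights give a gene-level Z of 2, not 2.83.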

### Drug-target enrichment

The Genome for REPositioning drugs (GREP) tool86 was used to quantify the enrichment of eGenes emerging from the TWAS with high probability of colocalization (PP4 ≥ 0.75) in the curated drug-target list classified based on the International Classification of Diseases 10 (ICD10). GREP provides as an output the names of the drug(s) targeting a given gene set along with the disease category. Moreover, by performing a series of Fisher’s exact tests GREP formally tests whether the gene set is enriched in genes targeted by drugs in a specific clinical indication category to treat a certain disease or condition.
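The enrichment test for a single drug indication category can be illustrated with a one-sided Fisher's exact test computed from the hypergeometric distribution. The sketch below shows the statistical idea only; it does not reproduce GREP's gene universe, drug-target lists or options, and the function and argument names are hypothetical.

```python
from math import comb


def fisher_enrichment_p(overlap, gene_set_size, target_size, universe_size):
    """One-sided Fisher's exact test for over-representation.

    overlap: genes in the gene set that are drug targets in the category.
    gene_set_size: number of genes in the tested gene set.
    target_size: number of drug-target genes in the category.
    universe_size: total number of genes considered.
    Returns P(X >= overlap) under the hypergeometric distribution.
    """
    upper = min(gene_set_size, target_size)
    denominator = comb(universe_size, gene_set_size)
    numerator = sum(
        comb(target_size, k)
        * comb(universe_size - target_size, gene_set_size - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / denominator
```

For instance, if 2 of 2 tested genes fall among 3 drug targets in a universe of 10 genes, the probability of an overlap at least that large by chance is 3/45, or about 0.067.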

### Reporting summary

Further information on research design is available in the Nature Research Life Sciences Reporting Summary linked to this article.