Stroke has multiple etiologies, but the underlying genes and pathways are largely unknown. We conducted a multiancestry genome-wide-association meta-analysis in 521,612 individuals (67,162 cases and 454,450 controls) and discovered 22 new stroke risk loci, bringing the total to 32. We further found shared genetic variation with related vascular traits, including blood pressure, cardiac traits, and venous thromboembolism, at individual loci (n = 18), and using genetic risk scores and linkage-disequilibrium-score regression. Several loci exhibited distinct association and pleiotropy patterns for etiological stroke subtypes. Eleven new susceptibility loci indicate mechanisms not previously implicated in stroke pathophysiology, with prioritization of risk variants and genes accomplished through bioinformatics analyses using extensive functional datasets. Stroke risk loci were significantly enriched in drug targets for antithrombotic therapy.
GBD 2015 DALYs and HALE Collaborators. Global, regional, and national disability-adjusted life-years (DALYs) for 315 diseases and injuries and healthy life expectancy (HALE), 1990-2015: a systematic analysis for the Global Burden of Disease Study 2015. Lancet 388, 1603–1658 (2016).
GBD 2015 Mortality and Causes of Death Collaborators. Global, regional, and national life expectancy, all-cause mortality, and cause-specific mortality for 249 causes of death, 1980-2015: a systematic analysis for the Global Burden of Disease Study 2015. Lancet 388, 1459–1544 (2016).
Gudbjartsson, D. F. et al. Variants conferring risk of atrial fibrillation on chromosome 4q25. Nature 448, 353–357 (2007).
Gudbjartsson, D. F. et al. A sequence variant in ZFHX3 on 16q22 associates with atrial fibrillation and ischemic stroke. Nat. Genet. 41, 876–878 (2009).
International Stroke Genetics Consortium (ISGC) et al. Genome-wide association study identifies a variant in HDAC9 associated with large vessel ischemic stroke. Nat. Genet. 44, 328–333 (2012).
Woo, D. et al. Meta-analysis of genome-wide association studies identifies 1q22 as a susceptibility locus for intracerebral hemorrhage. Am. J. Hum. Genet. 94, 511–521 (2014).
Kilarski, L. L. et al. Meta-analysis in more than 17,900 cases of ischemic stroke reveals a novel association at 12q24.12. Neurology 83, 678–685 (2014).
Traylor, M. et al. A novel MMP12 locus is associated with large artery atherosclerotic stroke using a genome-wide age-at-onset informed approach. PLoS Genet. 10, e1004469 (2014).
NINDS, Stroke Genetics Network (SiGN) & International Stroke Genetics Consortium (ISGC). Loci associated with ischaemic stroke and its subtypes (SiGN): a genome-wide association study. Lancet Neurol. 15, 174–184 (2016).
Neurology Working Group of the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium, the Stroke Genetics Network (SiGN) & the International Stroke Genetics Consortium (ISGC). Identification of additional risk loci for stroke and small vessel disease: a meta-analysis of genome-wide association studies. Lancet Neurol. 15, 695–707 (2016).
Malik, R. et al. Low-frequency and common genetic variation in ischemic stroke: the METASTROKE collaboration. Neurology 86, 1217–1226 (2016).
Traylor, M. et al. Genetic variation at 16q24.2 is associated with small vessel stroke. Ann. Neurol. 81, 383–394 (2017).
Williams, F. M. et al. Ischemic stroke is associated with the ABO locus: the EuroCLOT study. Ann. Neurol. 73, 16–31 (2013).
1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Morris, A. P. Transethnic meta-analysis of genomewide association studies. Genet. Epidemiol. 35, 809–822 (2011).
Mishra, A. & MacGregor, S. VEGAS2: software for more flexible gene-based testing. Twin Res. Hum. Genet. 18, 86–91 (2015).
Traylor, M. et al. Genome-wide meta-analysis of cerebral white matter hyperintensities in patients with stroke. Neurology 86, 146–153 (2016).
Hara, K. et al. Association of HTRA1 mutations and familial ischemic cerebral small-vessel disease. N. Engl. J. Med. 360, 1729–1739 (2009).
Verdura, E. et al. Heterozygous HTRA1 mutations are associated with autosomal dominant cerebral small vessel disease. Brain 138, 2347–2358 (2015).
Gould, D. B. et al. Role of COL4A1 in small-vessel disease and hemorrhagic stroke. N. Engl. J. Med. 354, 1489–1496 (2006).
Jeanne, M. et al. COL4A2 mutations impair COL4A1 and COL4A2 secretion and cause hemorrhagic stroke. Am. J. Hum. Genet. 90, 91–101 (2012).
Pickrell, J. K. et al. Detection and interpretation of shared genetic influences on 42 human traits. Nat. Genet. 48, 709–717 (2016).
Lubitz, S. A. et al. Independent susceptibility markers for atrial fibrillation on chromosome 4q25. Circulation 122, 976–984 (2010).
Kato, N. et al. Trans-ancestry genome-wide association study identifies 12 genetic loci influencing blood pressure and implicates a role for DNA methylation. Nat. Genet. 47, 1282–1293 (2015).
Surakka, I. et al. The impact of low-frequency and rare variants on lipid levels. Nat. Genet. 47, 589–597 (2015).
Morris, A. P. et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat. Genet. 44, 981–990 (2012).
Bis, J. C. et al. Meta-analysis of genome-wide association studies from the CHARGE consortium identifies common variants associated with carotid intima media thickness and plaque. Nat. Genet. 43, 940–947 (2011).
Verhaaren, B. F. et al. Multiethnic genome-wide association study of cerebral white matter hyperintensities on MRI. Circ. Cardiovasc. Genet. 8, 398–409 (2015).
Sinner, M. F. et al. Integrating genetic, transcriptional, and functional analyses to identify 5 novel genes for atrial fibrillation. Circulation 130, 1225–1235 (2014).
Germain, M. et al. Meta-analysis of 65,734 individuals identifies TSPAN15 and SLC44A2 as two susceptibility loci for venous thromboembolism. Am. J. Hum. Genet. 96, 532–542 (2015).
Nikpay, M. et al. A comprehensive 1,000 Genomes-based genome-wide association meta-analysis of coronary artery disease. Nat. Genet. 47, 1121–1130 (2015).
Ellinor, P. T. et al. Meta-analysis identifies six new susceptibility loci for atrial fibrillation. Nat. Genet. 44, 670–675 (2012).
Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
Bulik-Sullivan, B. K. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Trynka, G. et al. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat. Genet. 45, 124–130 (2013).
Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Pers, T. H. et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat. Commun. 6, 5890 (2015).
Geer, L. Y. et al. The NCBI BioSystems database. Nucleic Acids Res. 38, D492–D496 (2010).
Mishra, A. & MacGregor, S. A novel approach for pathway analysis of GWAS data highlights role of BMP signaling and muscle cell differentiation in colorectal cancer susceptibility. Twin Res. Hum. Genet. 20, 1–9 (2017).
Wakefield, J. A Bayesian measure of the probability of false discovery in genetic epidemiology studies. Am. J. Hum. Genet. 81, 208–227 (2007).
Yang, H. & Wang, K. Genomic variant annotation and prioritization with ANNOVAR and wANNOVAR. Nat. Protoc. 10, 1556–1566 (2015).
Wang, W. et al. LNK/SH2B3 loss of function promotes atherosclerosis and thrombosis. Circ. Res. 119, e91–e103 (2016).
Ramensky, V., Bork, P. & Sunyaev, S. Human non-synonymous SNPs: server and survey. Nucleic Acids Res. 30, 3894–3900 (2002).
Kumar, P., Henikoff, S. & Ng, P. C. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat. Protoc. 4, 1073–1081 (2009).
GTEx Consortium. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).
Eicher, J. D. et al. GRASP v2.0: an update on the Genome-Wide Repository of Associations between SNPs and phenotypes. Nucleic Acids Res. 43, D799–D804 (2015).
Leslie, R., O’Donnell, C. J. & Johnson, A. D. GRASP: analysis of genotype-phenotype results from 1390 genome-wide association studies and corresponding open access database. Bioinformatics 30, i185–i194 (2014).
Higasa, K. et al. Human genetic variation database, a reference database of genetic variations in the Japanese population. J. Hum. Genet. 61, 547–553 (2016).
Bonder, M. J. et al. Disease variants alter transcription factor levels and methylation of their binding sites. Nat. Genet. 49, 131–138 (2017).
Adams, D. et al. BLUEPRINT to decode the epigenetic signature written in blood. Nat. Biotechnol. 30, 224–226 (2012).
Franzén, O. et al. Cardiometabolic risk loci share downstream cis- and trans-gene regulation across tissues and diseases. Science 353, 827–830 (2016).
Erbilgin, A. et al. Identification of CAD candidate genes in GWAS loci and their expression in vascular cells. J. Lipid Res. 54, 1894–1905 (2013).
The ARIC investigators. The Atherosclerosis Risk in Communities (ARIC) Study: design and objectives. Am. J. Epidemiol. 129, 687–702 (1989).
Suhre, K. et al. Connecting genetic risk to disease end points through the human blood plasma proteome. Nat. Commun. 8, 14357 (2017).
Brænne, I. et al. Prediction of causal candidate genes in coronary artery disease loci. Arterioscler. Thromb. Vasc. Biol. 35, 2207–2217 (2015).
Flister, M. J. et al. Identifying multiple causative genes at a single GWAS locus. Genome Res. 23, 1996–2002 (2013).
Okada, Y. et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506, 376–381 (2014).
Kemp, J. P. et al. Identification of 153 new loci associated with heel bone mineral density and functional involvement of GPC6 in osteoporosis. Nat. Genet. 49, 1468–1475 (2017).
Li, Y. & Kellis, M. Joint Bayesian inference of risk variants and tissue-specific epigenomic enrichments across multiple complex human diseases. Nucleic Acids Res. 44, e144 (2016).
Lee, B. K. et al. Cell-type specific and combinatorial usage of diverse transcription factors revealed by genome-wide binding studies in multiple human cells. Genome Res. 22, 9–24 (2012).
Sanseau, P. et al. Use of genome-wide association studies for drug repositioning. Nat. Biotechnol. 30, 317–320 (2012).
Nelson, M. R. et al. The support of human genetic evidence for approved drug indications. Nat. Genet. 47, 856–860 (2015).
den Hoed, M. et al. Identification of heart rate-associated loci and their effects on cardiac conduction and rhythm disorders. Nat. Genet. 45, 621–631 (2013).
Pfeufer, A. et al. Genome-wide association study of PR interval. Nat. Genet. 42, 153–159 (2010).
Christophersen, I. E. et al. Large-scale analyses of common and rare variants identify 12 new loci associated with atrial fibrillation. Nat. Genet. 49, 946–952 (2017).
Verweij, N. et al. Genetic determinants of P wave duration and PR segment. Circ. Cardiovasc. Genet. 7, 475–481 (2014).
Le Scouarnec, S. et al. Dysfunction in ankyrin-B-dependent ion channel and transporter targeting causes human sinus node disease. Proc. Natl. Acad. Sci. USA 105, 15617–15622 (2008).
Schott, J. J. et al. Congenital heart disease caused by mutations in the transcription factor NKX2-5. Science 281, 108–111 (1998).
Ellesøe, S. G. et al. Familial atrial septal defect and sudden cardiac death: identification of a novel NKX2-5 mutation and a review of the literature. Congenit. Heart Dis. 11, 283–290 (2016).
Mohler, P. J. et al. Ankyrin-B mutation causes type 4 long-QT cardiac arrhythmia and sudden cardiac death. Nature 421, 634–639 (2003).
Shi, H., Kichaev, G. & Pasaniuc, B. Contrasting the genetic architecture of 30 complex traits from summary association data. Am. J. Hum. Genet. 99, 139–153 (2016).
Kato, N. et al. Meta-analysis of genome-wide association studies identifies common variants associated with blood pressure variation in east Asians. Nat. Genet. 43, 531–538 (2011).
Surendran, P. et al. Trans-ancestry meta-analyses identify rare and common variants associated with blood pressure and hypertension. Nat. Genet. 48, 1151–1161 (2016).
Winkler, T. W. et al. Quality control and conduct of genome-wide association meta-analyses. Nat. Protoc. 9, 1192–1212 (2014).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
So, H. C., Gui, A. H., Cherny, S. S. & Sham, P. C. Evaluating the heritability explained by known susceptibility variants: a survey of ten complex diseases. Genet. Epidemiol. 35, 310–317 (2011).
Feigin, V. L., Lawes, C. M., Bennett, D. A. & Anderson, C. S. Stroke epidemiology: a review of population-based studies of incidence, prevalence, and case-fatality in the late 20th century. Lancet Neurol. 2, 43–53 (2003).
Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375, S1–S3 (2012).
Liu, J. Z. et al. A versatile gene-based test for genome-wide association studies. Am. J. Hum. Genet. 87, 139–145 (2010).
Bowden, J., Davey Smith, G. & Burgess, S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int. J. Epidemiol. 44, 512–525 (2015).
International Consortium for Blood Pressure Genome-Wide Association Studies et al. Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature 478, 103–109 (2011).
Wain, L. V. et al. Genome-wide association study identifies six new loci influencing pulse pressure and mean arterial pressure. Nat. Genet. 43, 1005–1011 (2011).
Wellcome Trust Case Control Consortium et al. Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat. Genet. 44, 1294–1301 (2012).
Johnson, A. D. et al. SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinformatics 24, 2938–2939 (2008).
Arnold, M., Raffler, J., Pfeufer, A., Suhre, K. & Kastenmüller, G. SNiPA: an interactive, genetic variant-centered annotation browser. Bioinformatics 31, 1334–1336 (2015).
Ward, L. D. & Kellis, M. HaploReg v4: systematic mining of putative causal variants, cell types, regulators and target genes for human complex traits and disease. Nucleic Acids Res. 44, D877–D881 (2016).
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Wishart, D. S. et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 34, D668–D672 (2006).
Yang, H. et al. Therapeutic target database update 2016: enriched resource for bench to clinical drug target and targeted pathway information. Nucleic Acids Res. 44, D1069–D1074 (2016).
Hachiya, T. et al. Genetic predisposition to ischemic stroke: a polygenic risk score. Stroke 48, 253–258 (2017).
Kanai, M. et al. Genetic analysis of quantitative traits in the Japanese population links cell types to complex human diseases. Nat. Genet. https://doi.org/10.1038/s41588-018-0047-6 (2018).
A full list of Acknowledgements appears in the Supplementary Note.
S. Gretarsdottir, G.T., U.T., and K.S. are all employees of deCODE Genetics/Amgen, Inc. M.A.N. is an employee of Data Tecnica International. P.T.E. is the PI on a grant from Bayer HealthCare to the Broad Institute, focused on the genetics and therapeutics of atrial fibrillation. S.A.L. receives sponsored research support from Bayer HealthCare, Biotronik, and Boehringer Ingelheim, and has consulted for St. Jude Medical and Quest Diagnostics. E.I. is a scientific advisor for Precision Wellness, Cellink and Olink Proteomics for work unrelated to the present project. B.M.P. serves on the DSMB of a clinical trial funded by Zoll LifeCor and on the Steering Committee of the Yale Open Data Access Project funded by Johnson & Johnson. The remaining authors have no disclosures.
Supplementary Figures 1–13, Supplementary Tables 1, 3–5, 8–10, 12, 14, 16 and 26, and Supplementary Note
Given for each sample are the age distribution, gender distribution and risk factor distribution, if available. Further information on genotyping platform, technique, imputation parameters and QC parameters is given, if available.
Supplementary Table 6: Variance explained by the 32 lead SNPs. Shown are the lead SNPs of the 32 risk loci for stroke and the phenotypic variance explained as estimated by the method of So et al.
Variances are given for the Europeans-only and the East Asian-only meta-analyses. If a SNP was not available in the analysis, variance explained was set to zero.
Data were analyzed for each ethnicity and a meta-analysis was calculated using Stouffer’s Z. Genome-wide significant results are displayed in bold (P < 2.02 × 10⁻⁶, Bonferroni-corrected for the number of genes).
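As a rough illustration of the combination step described in this legend, the sketch below combines per-ancestry gene-based P-values with Stouffer's Z. It assumes one-sided P-values and optional per-study weights; how the studies were actually weighted is not stated here, and the function name `stouffer_z` and the example P-values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, s.d. 1


def stouffer_z(p_values, weights=None):
    """Combine one-sided P-values with Stouffer's Z method.

    Returns (combined_z, combined_p). Weights default to 1 for every study,
    which is an assumption; the legend does not say how studies were weighted.
    """
    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("p_values and weights must have the same length")
    # A small P-value maps to a large positive Z-score.
    z_scores = [-_STD_NORMAL.inv_cdf(p) for p in p_values]
    combined_z = sum(w * z for w, z in zip(weights, z_scores)) / sqrt(
        sum(w * w for w in weights)
    )
    combined_p = _STD_NORMAL.cdf(-combined_z)
    return combined_z, combined_p


# Hypothetical gene-based P-values from three ancestry-specific analyses.
z, p = stouffer_z([0.05, 0.05, 0.05])
gene_wide_threshold = 2.02e-6  # threshold quoted in the legend
print(f"Z = {z:.3f}, P = {p:.2e}, significant: {p < gene_wide_threshold}")
```

Passing weights such as the square root of each study's sample size gives the commonly used weighted variant of the same statistic.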
Shown are the 2-SNP or 3-SNP solutions for each lead SNP after conditioning on the lead SNP in Europeans. P-values of SNP2 and SNP3 were considered significant at P < 5 × 10⁻⁸. SVS is omitted because there were no genome-wide significant signals to investigate.
Supplementary Table 13: Results from look-ups of the 32 genome-wide significant loci for stroke in published GWAS data from related phenotypes
Column D specifies the index SNPs of the non-stroke phenotype or SNPs in high LD with the index SNP (r² > 0.9) with the lowest P-value in the respective non-stroke phenotype. Index SNPs or proxy SNPs reaching P < 1.30 × 10⁻⁴ (0.05/32 loci/12 related vascular traits) in the respective related phenotype are shown. Index SNPs and proxy SNPs reaching genome-wide significance are marked by an asterisk in column G. Column F specifies the r² between the index SNP and the lead SNP in stroke.
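The look-up threshold quoted in this legend is a plain Bonferroni correction, and the arithmetic can be checked with a short sketch. The helper name `bonferroni_threshold` is hypothetical; the inputs (0.05, 32 loci, 12 traits) are taken from the legend.

```python
def bonferroni_threshold(alpha, *test_counts):
    """Divide alpha by the product of all supplied test counts."""
    n_tests = 1
    for count in test_counts:
        n_tests *= count
    return alpha / n_tests


# 32 stroke loci looked up in 12 related vascular traits = 384 tests.
lookup_threshold = bonferroni_threshold(0.05, 32, 12)
print(f"{lookup_threshold:.2e}")  # prints 1.30e-04
```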
Supplementary Table 15: MR-Egger regression and comparison with Inverse-Variance Weighted (IVW) estimates, for vascular wGRS showing a significant association with stroke risk
IVW estimates are derived from a fixed-effects analysis using the GTX software (Online Methods); for the intercept of the MR-Egger analysis (Egger_intercept, Online Methods) we used a significance threshold of P < 0.05. Effect estimates are given per unit increase in the wGRS. CI: confidence interval; OR: odds ratio.
*The MR-Egger intercept estimate was nominally significant (P = 0.015) only for the association between the SBP wGRS and AS. This was no longer the case after removing 6 of 37 SNPs that appeared as outliers on the leave-one-out plot (Online Methods), leading to causal estimates in broad agreement across regression techniques, with larger standard errors using the MR-Egger method, as is typically the case (www.biorxiv.org/content/biorxiv/early/2017/07/05/159442.full.pdf and PMID: 26050253, 28527048). The causal estimates obtained by the weighted median approach (PMID: 27061298) are also in broad agreement with those from the IVW and MR-Egger methods (beta ± s.e.: 0.032 ± 0.005, OR (95% CI): 1.03 (1.02–1.04), P = 9.48 × 10⁻¹⁰).
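To make the quantities in this legend concrete, the sketch below shows the standard fixed-effects inverse-variance-weighted (IVW) estimator computed from per-SNP summary statistics, and the conversion of a log-odds estimate and its standard error into an odds ratio with a 95% confidence interval. This is not the GTX implementation; the function names are hypothetical, and the only values taken from the legend are the weighted-median beta and s.e. (0.032 and 0.005).

```python
from math import exp, sqrt


def ivw_fixed_effect(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effects IVW causal estimate from per-SNP summary statistics.

    beta_exposure: SNP effects on the risk factor (the wGRS weights)
    beta_outcome:  SNP effects on the outcome (log odds of stroke)
    se_outcome:    standard errors of beta_outcome
    Returns (beta_ivw, se_ivw).
    """
    numerator = sum(
        bx * by / se**2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator, sqrt(1.0 / denominator)


def odds_ratio_with_ci(beta, se, z_crit=1.96):
    """Convert a log-odds estimate and s.e. to (OR, lower, upper)."""
    return exp(beta), exp(beta - z_crit * se), exp(beta + z_crit * se)


# Weighted-median estimate quoted in the legend: beta = 0.032, s.e. = 0.005.
or_, lower, upper = odds_ratio_with_ci(0.032, 0.005)
print(f"OR (95% CI): {or_:.2f} ({lower:.2f}-{upper:.2f})")
# prints OR (95% CI): 1.03 (1.02-1.04)
```

The printed interval reproduces the OR and CI reported in the legend. The reported P-value cannot be recovered exactly from the rounded beta and s.e., so it is not recomputed here.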
Shown is the enrichment P-value of GWAS results in specific tissues. We used epigwas to calculate enrichment P-values for H3K4me1 (enhancers), H3K4me3 (promoters) and H3K9ac (active promoters).
For each stroke subtype, SNPs with BF > 5 from the trans-ethnic meta-analysis were analyzed. Gene sets with an FDR < 0.05 were considered significant. Columns E-N show the Z-scores of the genes in the gene set.
Shown are enrichment P-values for the corresponding Ingenuity canonical pathway and the proteins involved in the respective pathway. P-values are derived from Fisher’s exact test. Pathways with FDR < 0.05 were considered significant and are displayed in bold. For the IPA Diseases and Bio Functions and for the IPA Tox Functions, P-values are given for the enrichment of specific function annotations.
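For readers unfamiliar with the test named in this legend, the sketch below computes a one-sided Fisher's exact P-value for over-representation of a pathway among a set of selected genes. The choice of background gene set and of a one-sided test are assumptions, since the legend does not describe how IPA configures the test; the function name and all counts in the example are hypothetical.

```python
from math import comb


def fisher_enrichment_p(n_universe, n_pathway, n_selected, n_overlap):
    """One-sided Fisher's exact P-value for pathway over-representation.

    n_universe: genes in the background set
    n_pathway:  background genes annotated to the pathway
    n_selected: genes in the list being tested
    n_overlap:  selected genes that are annotated to the pathway
    Returns the probability of seeing n_overlap or more pathway genes
    when n_selected genes are drawn at random from the background.
    """
    max_overlap = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 background genes, a 100-gene pathway,
# 150 selected genes, 5 of which fall in the pathway.
p_value = fisher_enrichment_p(20_000, 100, 150, 5)
print(f"enrichment P = {p_value:.3g}")
```

Because many pathways are tested, P-values of this kind are then adjusted for multiple testing, which is the role of the FDR < 0.05 criterion in the legend.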
Shown are pathways for each stroke subtype, the ethnicity-specific P-values and the meta-analysis P-value. Pathways with FDR < 0.05 were considered significant and are displayed in bold (CES only).
Results were obtained separately in European, East Asian, and African American ancestry samples. Shown is the number of SNPs in the 95% credible set (numerator) and the total number of SNPs in the analysis (denominator, r² > 0.1).
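A 95% credible set of the kind summarized in this legend is commonly built by ranking the SNPs at a locus by posterior probability and accumulating them until 95% of the posterior mass is covered. The sketch below follows that common recipe under two assumptions that are not stated in the legend: every SNP has the same prior probability, and there is a single causal variant per locus. The function name, SNP labels and Bayes factors are hypothetical.

```python
def credible_set(bayes_factors, coverage=0.95):
    """Return the smallest list of SNPs whose posterior mass reaches coverage.

    bayes_factors maps SNP id -> Bayes factor on the natural (not log) scale.
    With equal priors, each posterior probability is BF / sum of all BFs.
    """
    total = sum(bayes_factors.values())
    ranked = sorted(bayes_factors.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for snp, bf in ranked:
        selected.append(snp)
        cumulative += bf / total
        if cumulative >= coverage:
            break
    return selected


# Hypothetical locus with four SNPs.
locus = {"snp1": 90.0, "snp2": 6.0, "snp3": 3.0, "snp4": 1.0}
snps_in_set = credible_set(locus)
# Same numerator/denominator layout as the table.
print(f"{len(snps_in_set)}/{len(locus)}")  # prints 2/4
```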
Supplementary Table 22: Detailed functional and biological information on SNPs at the 32 stroke risk loci
Shown are the lead SNPs and all proxy SNPs with r² > 0.8. We show information on nearby genes, the genomic consequence (intergenic, intronic, missense, regulatory), chromatin marks, eQTLs (GRASP_v2, GTEX_v6, BIOS, BLUEPRINT, STARNET, UCLA and HGVD), meQTLs (BLUEPRINT and ARIC) and pQTLs (KORA). We also indicate whether each SNP is included in the 95% credible set analysis and give the P-value of the Riviera-beta-analysis.
Supplementary Table 23: Relation of the lead and proxy SNPs (r² > 0.8) from 32 stroke risk loci with the best cis eQTL, meQTL and pQTL from various human bio-resources, grouped per tissue or cell type
Shown is the stroke subtype showing the most significant association; for meQTLs, CpG probe numbers are indicated in brackets after the gene name.
Supplementary Table 24: Biological candidate gene prioritization of 149 genes located in the 32 stroke associated risk loci
For each gene, we first list the biological score derived from 14 biological criteria and the overall score by including other biological information. All colored boxes have a value of 1; values of 0 signify no information or criteria that are not satisfied. For the genomic context, filled red boxes indicate that the criteria are satisfied. Filled blue boxes indicate significant QTL association (eQTL, gene expression; meQTL, methylation; pQTL, protein). Filled yellow boxes indicate overlap with H3K4me3, H3K9ac and H3K4me1 peaks in cell types that showed significant enrichment in epigwas analysis. Filled green boxes indicate significantly enriched pathways. Filled purple boxes indicate overlap with drug target genes (ATC-C: Cardiovascular; ATC-B01: Antithrombotic).
Shown is the number of genes falling into the respective Anatomical Therapeutic Chemical (ATC) drug class together with the respective statistics for genome-wide loci (BF > 6) and suggestive loci (BF > 5) both with and without the SH2B3 locus.
Given are the related vascular traits from which the respective wGRS were derived, the marker name (rs_id), the risk/other allele and the beta used as weight for the wGRS approach.
Malik, R., Chauhan, G., Traylor, M. et al. Multiancestry genome-wide association study of 520,000 subjects identifies 32 loci associated with stroke and stroke subtypes. Nat Genet 50, 524–537 (2018). https://doi.org/10.1038/s41588-018-0058-3