Introduction

Aortic valve stenosis (AS) is characterized by thickened and calcified valvular cusps causing left ventricular outflow obstruction. This progressive disease is usually graded as mild, moderate, or severe, based on the valve area and pressure gradient across the valve. Severe AS is a notable cause of morbidity and mortality, affecting approximately 5% of those over 70 years of age1,2,3, and the estimated 5-year survival in symptomatic severe AS ranges from 15 to 50% unless outflow obstruction is relieved by aortic valve replacement3.

The pathogenesis of the disease remains poorly understood. However, several clinical risk factors for aortic valve calcification are shared with atherosclerotic disease, and immunohistochemical studies show that calcified aortic valve lesions have many characteristic features of atherosclerosis, including initial endothelial damage, oxidized lipid deposition, chronic inflammation, and calcification4. In addition, bicuspid aortic valve (BAV), the most common congenital cardiac malformation, in which the aortic valve has two leaflets instead of three, accelerates the development of AS by decades4. While the prevalence of BAV is 0.5–2% in the population, BAV is found in up to half of those with severe AS5.

Little is known about the genetics of AS, although a recent genome-wide association (GWA) study reported the association of rs10455872 in the LPA gene, encoding apolipoprotein(a) of lipoprotein(a) (Lp(a)), with calcification of the aortic valve and with AS6. Elevated serum levels of Lp(a) have also been associated with increased risk of AS7. These findings are consistent with shared pathogenic features of AS and atherosclerosis8,9.

Another genetic study recently showed that a rare p.Arg721Trp MYH6 missense variant, which was previously shown to associate with sick sinus syndrome and atrial fibrillation10,11, also associates with coarctation of the aorta, BAV, and with AS12.

Here, we describe a large GWA study of AS including 2,457 cases and 349,342 controls, with follow-up in up to 4,850 AS cases and 451,731 controls. We examined the association of AS variants with BAV and several other cardiovascular conditions and assessed the shared genetic risk of AS and coronary artery disease (CAD).

Results

Novel variants associate with aortic stenosis

We tested 32.5 million sequence variants for association with AS in 2,457 Icelandic cases and 349,342 controls (see Manhattan plot in Supplementary Fig. 1). We identified the variants by whole-genome sequencing 15,220 Icelanders, and imputed them into 151,678 chip-typed, long-range phased individuals and their close relatives13.

We observed one genome-wide significant association, between AS and the intergenic variant rs7543130 (effect allele frequency (EAF) [A] = 51.2%) on chromosome 1p21 near the PALMD gene (odds ratio (OR) = 1.23; 95% confidence interval (CI): 1.15–1.31; P = 6.8 × 10−10, below the significance threshold of P = 7.9 × 10−10 set for intergenic variants; see Methods and ref. 14) (Table 1). rs7543130 was recently reported to associate with aortic root size15, and we replicated this association in our Icelandic aortic root dimension sample (P = 1.3 × 10−8) (Table 2).

Table 1 Meta-analysis results for aortic valve stenosis variants
Table 2 Association of aortic valve stenosis variants with other cardiovascular traits

We tested the top seven common and low-frequency variants from the discovery GWA scan, including rs7543130, in up to 4,850 AS cases and 451,731 controls from Sweden, Norway, the United Kingdom, and the United States (Table 1, Supplementary Data 1). The joint analysis showed a robust association between AS and rs7543130 (OR = 1.20; 95% CI: 1.16–1.25; P = 1.2 × 10−22), as well as rs1830321 (EAF[T] = 37.5%), intronic to TEX41, a non-protein-coding gene on chromosome 2q22 (OR = 1.15; 95% CI: 1.11–1.20; P = 1.8 × 10−13) (Table 1).
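The section does not state how the per-cohort estimates were combined in the joint analysis. A common choice is a fixed-effects, inverse-variance-weighted meta-analysis of per-cohort log odds ratios; the Python sketch below assumes that method. The function names are hypothetical, the first tuple in the example reuses the Icelandic rs7543130 estimate quoted above, and the other two tuples are invented for illustration.

```python
import math

Z95 = 1.959964  # two-sided 95% standard normal quantile


def se_from_ci(lower, upper, z=Z95):
    """Standard error of log(OR) recovered from a symmetric 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def meta_fixed_effects(estimates):
    """Fixed-effects inverse-variance meta-analysis of log odds ratios.

    estimates: list of (odds_ratio, se_log_or) tuples, one per cohort.
    Returns (pooled OR, lower 95% CI, upper 95% CI, two-sided P).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    betas = [math.log(odds_ratio) for odds_ratio, _ in estimates]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    p = math.erfc(abs(beta / se) / math.sqrt(2))  # normal two-sided P
    return (math.exp(beta), math.exp(beta - Z95 * se),
            math.exp(beta + Z95 * se), p)


# First tuple: Icelandic estimate from the text; the others are hypothetical.
cohorts = [(1.23, se_from_ci(1.15, 1.31)), (1.18, 0.03), (1.20, 0.04)]
pooled_or, lo, hi, p = meta_fixed_effects(cohorts)
print(f"OR = {pooled_or:.2f} (95% CI {lo:.2f}-{hi:.2f}), P = {p:.1e}")
```

Weighting by inverse variance lets the larger, more precise cohorts dominate the pooled estimate; a random-effects model would be the usual alternative if the cohorts were heterogeneous.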

We replicated the reported association of the intronic LPA variant6 rs10455872 with AS in Iceland and the follow-up sample sets (combined OR = 1.46; 95% CI: 1.37–1.56; P = 1.9 × 10−31) (Table 1). In contrast, we found no association for variants at osteogenic and calcium signaling pathway genes previously reported to show suggestive association with AS16 (P > 0.05 in Iceland and UK Biobank).

We tested the association of the two novel AS variants and the LPA variant in the subset of Icelandic AS cases who had undergone aortic valve replacement, representing those with severe AS. Although these associations were less significant, likely owing to the smaller sample size, their effect sizes were not significantly different from those for all AS (Supplementary Data 2).

Aortic stenosis variants and cardiovascular phenotypes

We tested rs7543130 near PALMD, rs1830321 in TEX41, and the LPA variant rs10455872 for association with BAV, a major risk factor for AS4,5, in 1,555 cases and 33,883 controls from Iceland, Sweden, and the United States. Both novel AS variants associate with BAV, and the rs7543130 association is genome-wide significant (rs7543130: OR = 1.28; 95% CI: 1.19–1.39; P = 6.6 × 10−10; rs1830321: OR = 1.12; 95% CI: 1.04–1.22; P = 5.3 × 10−3). The LPA variant rs10455872 does not associate with BAV (Table 2).

Table 2 also shows the association of the rare p.Arg721Trp MYH6 missense variant rs387906656 (EAF = 0.34%) with the risk of BAV (OR = 8.04; 95% CI: 3.36–19.22; P = 2.8 × 10−6). This variant was previously shown to associate with sick sinus syndrome and atrial fibrillation10,11, and was recently reported also to associate with coarctation of the aorta, BAV, and AS (OR = 2.65; 95% CI: 1.78–3.96; P = 1.8 × 10−6)12. The effect size on BAV is substantially greater than that for AS (P = 0.023), suggesting that the AS risk conferred by this variant is mediated through BAV.
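One way to compare two odds ratios, as in the BAV versus AS comparison above, is a z test on the difference of the log odds ratios, with standard errors recovered from the reported 95% CIs. The sketch below assumes the two estimates are independent, which is only approximately true when the BAV and AS samples overlap, and we do not know that this is the exact test used in the study. Applied to the MYH6 estimates quoted above, it gives a P value close to the reported 0.023.

```python
import math

Z95 = 1.959964  # two-sided 95% standard normal quantile


def se_from_ci(lower, upper, z=Z95):
    """Standard error of log(OR) recovered from a symmetric 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def compare_odds_ratios(or1, ci1, or2, ci2):
    """z test for a difference between two log odds ratios.

    Assumes the two estimates are independent; this is only approximately
    true when the case or control samples overlap.
    """
    diff = math.log(or1) - math.log(or2)
    se_diff = math.sqrt(se_from_ci(*ci1) ** 2 + se_from_ci(*ci2) ** 2)
    z = diff / se_diff
    return z, math.erfc(abs(z) / math.sqrt(2))  # two-sided P


# MYH6 p.Arg721Trp: BAV estimate versus AS estimate, as quoted in the text.
z, p = compare_odds_ratios(8.04, (3.36, 19.22), 2.65, (1.78, 3.96))
print(f"z = {z:.2f}, P = {p:.3f}")  # z = 2.27, P = 0.023
```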

Next, we examined the association of the AS variants with several other cardiovascular diseases in Icelandic data. In line with the BAV associations of p.Arg721Trp in MYH6, rs7543130, and rs1830321, all three variants associate with ventricular septal defects and/or atrial septal defects (P < 0.006) (Table 2 and Supplementary Data 2).

Like the LPA variant, rs1830321 in TEX41 associates with CAD in Iceland (OR = 1.05; 95% CI: 1.03–1.08; P = 9.3 × 10−5), but the MYH6 missense variant and rs7543130 near PALMD do not (Table 2 and Supplementary Data 2). rs1830321 in TEX41 is in linkage disequilibrium (LD) with a known CAD GWA variant, rs2252641, at the same locus (R2 = 0.80)17.

Given that several atherosclerosis risk factors have been associated with AS6,18,19, we tested the novel AS variants for association with the traditional cardiovascular risk factors and observed a nominally significant association (P < 0.02) between rs1830321 and systolic and diastolic blood pressure in Iceland (Supplementary Data 3) and in data from the UK Biobank (https://biobankengine.stanford.edu/search#).

Shared genetic risk factors with CAD

The frequent comorbidity of CAD and AS20, together with their histopathological similarities4, suggests a shared genetic predisposition. We therefore tested 71 CAD variants21,22 for association with AS, both individually (Supplementary Data 4 and Table 3) and as a weighted genetic risk score (CAD-GRS-all) (Table 4). We excluded from this analysis the LPA variant rs10455872 and rs1830321 in TEX41, both of which associate genome-wide significantly with CAD and AS.

Table 3 Coronary artery disease variants and aortic root size variants that associate with aortic valve stenosis
Table 4 The association of coronary artery disease genetic risk score with aortic valve stenosis

In the Icelandic and UK Biobank datasets combined, four CAD variants associate with AS at a Bonferroni-corrected significance threshold of P = 0.05/71 = 7.0 × 10−4. These are the LPA variant rs3798220 (p.Ile1891Met), rs116843064 in ANGPTL4 (p.Glu40Lys), rs646776 at the CELSR2/PSRC1 locus, and rs3184504 in SH2B3 (p.Trp60Arg) (Table 3 and Supplementary Data 4 and 5).

Consistent with shared genetic risk, the CAD-GRS-all associates with AS in both the Icelandic and the UK Biobank datasets (combined P = 7.5 × 10−9) (Table 4). However, its effect on AS is only 37% of its effect on CAD, and the AS association is not significant after adjustment for CAD diagnosis. Given the reported associations of AS with genetic predisposition to elevated Lp(a) and to elevated low-density lipoprotein (LDL) cholesterol6,19, we tested a subset of the CAD-GRS (labeled CAD-GRS-lip), constructed from 14 Lp(a) and LDL cholesterol/non-high-density lipoprotein (non-HDL) cholesterol variants (Supplementary Data 4), for association with AS. The CAD-GRS-lip associated strongly with AS (P = 5.1 × 10−19), with an effect similar to or larger than that on CAD (β = 0.91 and 1.02 for CAD and AS, respectively). This association with AS remained after adjustment for CAD diagnosis (P = 5.1 × 10−9) (Table 4), suggesting that genetic predisposition to elevated Lp(a) and LDL cholesterol explains a large part of the shared genetic risk between CAD and AS. Specifically, among CAD cases, those with genetic predisposition to high Lp(a) or LDL/non-HDL cholesterol are at greater risk of having AS than those without such predisposition (P = 8.5 × 10−7) (Supplementary Data 6). In contrast, the complementary subset of CAD-GRS-all (CAD-GRS-non-lip), which excludes the Lp(a) and LDL/non-HDL cholesterol variants, associates with lower risk of AS after adjustment for CAD status (P = 0.0036) (Table 4).

Aortic root size variants and aortic stenosis

Given the known association of rs7543130[A] on chromosome 1p21 with increased aortic root dimension15 and the recognized relationship between BAV and aortopathy5,23, we excluded known or suspected BAV cases from our echocardiogram database and re-examined the association of this variant with aortic root size. The association with aortic root size remained in these data (β = 0.062, P = 4.4 × 10−8) (Supplementary Data 2).

We then tested 11 other reported aortic root size variants15 for association with AS in the Icelandic and UK Biobank datasets (Supplementary Data 7). One of these variants, rs17696696[G] intronic to CFDP1, associates with AS in these samples and was therefore tested in an additional 2,115 AS cases and 28,121 controls; the joint analysis yielded OR = 1.07 (95% CI: 1.03–1.11; P = 0.00013) (Table 3). A correlated variant, rs4888378 (R2 = 0.98), has been reported to associate with carotid intima–media thickness and with the risk of CAD24. We also observed a previously unreported genome-wide significant association with CAD in the Icelandic and UK Biobank data (combined OR for rs17696696[G] = 1.05; 95% CI: 1.03–1.07; P = 1.4 × 10−10). The AS and CAD risk allele rs17696696[G] in CFDP1 associates with smaller aortic root diameter. None of the other AS variants associated with aortic root size (Table 2).

Candidate causal variants and genes

To identify candidate causal variants and genes at the PALMD and TEX41 loci, we first examined whether the AS variants are expression quantitative trait loci (eQTLs), using the Genotype-Tissue Expression dataset25. Across 44 diverse adult human tissues, rs1830321 associated with TEX41 expression, albeit only in thyroid tissue, whereas no eQTLs were observed for rs7543130.

To further investigate the potential functional relevance of the two AS variants, we mapped variants in LD (R2 > 0.5) with rs7543130 near PALMD and rs1830321 in TEX41 to regulatory regions in heart and aorta tissue samples, using public data from the NIH Roadmap Epigenomics Consortium26,27. We then used chromatin interaction maps28 for aorta and left and right ventricular heart tissue samples to look for interactions between gene promoters and the regulatory regions to which the AS risk variants mapped.

At the PALMD locus, four variants (rs11166276, rs6702619, rs1890753, and rs2392040) mapped to three distinct regulatory regions annotated as enhancers and a poised promoter (Fig. 1a, upper panel). Multiple chromatin interactions were observed for the regulatory regions harboring these four variants in left ventricular samples. Notably, only the region harboring rs1890753 (R2 = 0.97 with rs7543130) interacted with gene promoters (Fig. 1a, lower panel). rs1890753 is therefore the leading candidate causal variant at the chromosome 1p21 locus, as the region harboring it directly interacts with the promoters of PALMD, SNX7, PLPPR5, and PLPPR4, and of the non-coding RNAs LOC100129620 and LOC101928270. In fetal heart tissue, a poised promoter state is found at the rs1890753 locus (Fig. 1a, upper panel).

Fig. 1

Chromatin interactions between regulatory regions harboring candidate causal variants at the PALMD and TEX41 loci. Chromatin states indicative of regulatory regions at the aortic valve stenosis loci on chromosomes 1p21 (a) and 2q22 (b) are shown for heart and aorta tissue samples. Different types of regulatory states are indicated with the distinct colors shown at the top of the figure: EnhA (Enhancer Active), EnhW (Enhancer Weak), PromUp/Dn (chromatin marks characteristic of a promoter region found upstream or downstream of the TSS), DNase (DNase, nucleosome-free/open chromatin region), PromP (Promoter poised region, marked simultaneously as active and repressed, poised for activation during development), TssA (Transcription Start Site, Activated), and Repr (Repressive marks, heterochromatin). Vertical gray lines indicate the variants in LD (R2 > 0.50) with rs7543130 (*) (N = 19) in (a) or with rs1830321 (+) (N = 50) in (b). Variants that overlap regulatory regions in any of the five tissues are highlighted as red vertical lines. Long-range chromatin interactions in left ventricle tissue samples are shown for (a) the region harboring rs1890753 on chromosome 1p21 (red curved lines), including interactions with the promoters of PALMD, PLPPR4, PLPPR5, DPH5, SNX7, LOC100129620, and LOC101928270, and for (b) the regions harboring rs13028626, rs6749506, rs2252654, rs4662414, and rs13408842, which directly interact with the promoter regions of ZEB2, GTDC1, ZEB2-AS1, LINC01412, and TEX41.

At the TEX41 locus, five variants in LD with rs1830321 overlapped four distinct regulatory regions (Fig. 1b, upper panel). Chromatin interaction mapping in left ventricular tissue showed that the regulatory regions harboring all five variants (rs13028626, rs6749506, rs2252654, rs4662414, and rs13408842) are in direct contact with the promoter region of ZEB2 and with those of the non-coding RNAs ZEB2-AS1 and LINC01412 (Fig. 1b, lower panel). In addition, the rs13408842 region directly interacted with the promoter of GTDC1 and with those of the non-coding RNA genes TEX41 and LOC101928386. Pairwise correlations (R2) between the five variants and the lead variant rs1830321 ranged from 1.0 for rs13028626 to 0.61 for rs2252654.

Chromatin interactions between the regulatory regions harboring candidate causal variants at the PALMD and TEX41 loci were much less frequent in right ventricular tissue and aorta, compared with the left ventricle, and none overlapped with gene promoters (Supplementary Fig. 2).

Discussion

Through a large GWA study, we have discovered two common AS variants on chromosomes 1p21 near PALMD and 2q22 in TEX41, and replicated the previously reported AS variant in LPA6. Like the rare AS variant in MYH612, both of the novel AS variants also associate with BAV and congenital cardiac septal defects. The chromosome 2q22 variant also associates with CAD risk.

Given that BAV is a major risk factor for AS, and that the MYH6 and chromosome 1p21 variants have substantially greater effects on BAV than on AS, it may be postulated that the AS risk conferred by these variants is mediated through BAV. However, as we have limited information on whether AS occurred on the background of a bicuspid or tricuspid valve, we were not able to determine whether these variants associate with AS in the absence of BAV.

Interestingly, the AS and BAV risk allele of rs7543130 near PALMD also associates with increased aortic root size. The relationship between BAV and aortopathy is well recognized, and several studies suggest that dilation of the proximal ascending aorta results from altered flow secondary to the presence of BAV5,23,29. This raises the question of whether the effect of the chromosome 1p21 variant on aortic root size can be explained by its association with BAV. However, our results indicate that the variant’s impact on aortic root size is not merely a consequence of BAV, since excluding BAV cases from the analysis had minimal effect on the association. We also note that the MYH6 missense variant has a large effect on BAV but no effect on aortic root size. We did not find a consistent relationship between genetic associations with risk of AS and with aortic root size, but found that one additional aortic root size variant, rs17696696 intronic to CFDP1, associates with AS.

We demonstrate that this new AS variant, rs17696696 in CFDP1, associates genome-wide significantly with CAD, as do rs1830321 in TEX41 and the LPA variant rs10455872 (ref. 6). Further, we found that four other CAD variants associate with AS, supporting the notion that CAD and AS may share causes. However, arguing against a generalized common pathophysiology, causal analysis suggests that only some genetic pathways are shared by CAD and AS, and that the risk of both diseases conferred by Lp(a) and LDL/non-HDL cholesterol levels contributes substantially to the frequent co-occurrence of these two diseases.

These results support the hypothesis that lowering Lp(a) and non-HDL cholesterol levels might slow or prevent progression of AS. However, in a randomized trial of 1,873 patients with mild-to-moderate AS who received statin plus ezetimibe therapy or placebo, active therapy did not reduce the composite outcome of aortic valve events and ischemic events during a median follow-up of 52.2 months30. It remains conceivable that non-HDL-cholesterol-lowering therapy started earlier in the disease process could affect the development of AS. Other potential therapeutic interventions, such as Lp(a) lowering among those with high Lp(a) levels, need to be explored.

Although we cannot establish how the AS variants at PALMD and TEX41 affect disease pathogenesis, chromatin conformation experiments provide clues about potential mechanisms. These experiments show chromatin folding in which distinct regulatory regions harboring variants in high LD with the lead AS variants physically interact with several gene promoters, suggesting several candidate causal genes at both loci. Interestingly, and in line with an effect during fetal development, a poised promoter state was found in fetal heart tissue at the candidate causal variant rs1890753 on chromosome 1p21. Poised promoters are thought to be involved in the expression of developmental genes, allowing a rapid response to differentiation signals.

At the TEX41 locus, we note that ZEB2, one of the genes suggested by the chromatin interaction studies, is a strong biological candidate. ZEB2 is a DNA-binding transcriptional repressor that interacts with activated SMADs, the transducers of transforming growth factor-β (TGFβ) signaling. TGFβs are known to play a role in cardiac development and in several aspects of cardiovascular physiology, ranging from effects on cardiomyocytes and vascular smooth muscle to renal control of blood pressure31.

In summary, we discovered two AS variants on chromosomes 1p21 and 2q22. The associations of these and two previously reported AS variants with BAV, other congenital heart defects, aortic root size, and CAD implicate both cardiac developmental abnormalities and atherogenesis-like processes in the pathogenesis of AS. Further, we demonstrate that four CAD variants and one aortic root size variant associate with AS. While our genetic causal analysis does not support a generalized sharing of genetic risk between CAD and AS, it indicates that the shared risk factors Lp(a) and non-HDL cholesterol contribute substantially to the frequent co-occurrence of these diseases.

Methods

deCODE discovery study subjects

The Icelandic AS sample set included all patients diagnosed with AS in the years 1983–2016 at Landspitali – The National University Hospital (LUH) in Reykjavik, the only tertiary referral center in Iceland. Case status was assigned based on discharge diagnosis ICD-10 codes I35.0 or I35.2, or the relevant NOMESCO Classification of Surgical Procedures codes (FMA, FMSA, FMD, or FMSD, and subcodes). A total of 2,609 cases were identified, of whom 2,457 had available genotypes and were included in the analysis. The controls included 349,342 population controls from the Icelandic genealogical database and individuals recruited through different genetic studies at deCODE genetics. The study was approved by the Icelandic Data Protection Authority and the National Bioethics Committee of Iceland (approval no. VSNb2015030022/03.01 with amendments). All participating subjects donating biological samples signed informed consent. Personal identities of the participants and biological samples were encrypted by a third-party system approved and monitored by the Icelandic Data Protection Authority.

The deCODE genetics phenotype database contains extensive medical information on various diseases and traits. Cardiovascular phenotypes used for the purpose of the study included CAD (N = 37,782)32, heart failure (N = 10,480), ischemic stroke (N = 8,948)33, atrial fibrillation (N = 13,471)34, sick sinus syndrome (N = 3,310)10, high-degree atrioventricular block (N = 1,303), BAV (N = 208), atrial septal defect (N = 708), ventricular septal defect (N = 902), coarctation of aorta (N = 119), thoracic aortic aneurysm (TAA) and dissection (N = 500), hypertension (N = 54,974), type 2 diabetes (N = 11,448)35, and aortic root diameter (N = 19,506). The heart failure, high-degree atrioventricular block, coarctation of aorta, TAA and dissection, atrial septal defect, and ventricular septal defect sample sets were based on discharge diagnoses from LUH. Hypertension diagnoses were obtained from the Primary Health Care Clinics of the Reykjavik area, or from LUH. The BAV sample set included individuals with a documentation of BAV in an echocardiographic report from LUH between 1994 and 2015. Measurements of aortic root diameter were obtained from a database of 53,122 echocardiograms from 27,460 individuals performed at LUH between 1994 and 2015. Non-HDL cholesterol measurements (N = 136,326) were obtained from three of the largest clinical laboratories in Iceland: (i) LUH (hospitalized and ambulatory patients); (ii) the Laboratory in Mjódd, Reykjavík (ambulatory patients); and (iii) Akureyri Hospital, Regional Hospital in North Iceland, Akureyri (hospitalized and ambulatory patients)32. Blood pressure measurements (N = 125,647) were obtained from the Primary Health Care Clinics of the Reykjavik area. Measurements were adjusted for sex, year of birth, and age at measurement, and were subsequently standardized to have a normal distribution.
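The text states that the quantitative measurements were standardized to a normal distribution but does not name the method. A common choice is a rank-based inverse normal transformation; the sketch below uses the Blom offset (c = 3/8). Both the method and the offset are our assumptions, and the sketch assumes the adjustment for sex, year of birth, and age at measurement has already been applied to the input values.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=3.0 / 8):
    """Rank-based inverse normal transform with Blom offset c = 3/8.

    Tied values receive the average of their ranks.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


# Hypothetical covariate-adjusted residuals for five individuals.
print([round(z, 3) for z in rank_inverse_normal([0.4, -1.2, 2.5, 0.4, 0.9])])
```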

Whole-genome sequencing and imputation

This study is based on whole-genome sequence data from 15,220 Icelanders participating in various disease projects at deCODE genetics. In addition, 151,677 Icelanders have been genotyped using Illumina SNP chips, and genotype probabilities for untyped relatives were calculated based on Icelandic genealogy. Sequencing was done using the Illumina standard TruSeq methodology to a mean depth of 35× (SD 8)13. Autosomal single-nucleotide polymorphisms (SNPs) and indels were identified using the Genome Analysis Toolkit version 3.4.0 (ref. 36). Information about haplotype sharing was used to improve variant genotyping, taking advantage of the fact that all sequenced individuals had also been chip-typed and long-range-phased37.

Genotype imputation information

The informativeness of genotype imputation (imputation information) was estimated as the ratio of the variance of the imputed expected allele counts to the variance of the actual allele counts:

$$\frac{{{\rm Var}(E(\theta |{\rm chip}\,{\rm data}))}}{{{\rm Var}(\theta )}},$$

where θ is the allele count. Here, Var(E(θ|chip data)) was estimated by the observed variance in the imputed expected counts, and Var(θ) was estimated by p(1−p), where p is the allele frequency.
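The imputation information defined above can be computed directly from the expected allele counts. Because Var(θ) is given as p(1−p), which is the variance of a single 0/1 allele, this sketch assumes per-haplotype expected counts in [0, 1] and estimates p as their mean; both points are our reading of the definition rather than a stated implementation detail, and the function name is hypothetical.

```python
from statistics import fmean, pvariance


def imputation_info(expected_counts):
    """Var(E[theta | chip data]) / Var(theta) for one variant.

    expected_counts: assumed per-haplotype expected allele counts in [0, 1].
    Var(theta) is estimated as p(1 - p), with p the mean expected count.
    """
    p = fmean(expected_counts)
    denominator = p * (1 - p)
    if denominator == 0:
        raise ValueError("monomorphic variant: information is undefined")
    return pvariance(expected_counts) / denominator


# Hypothetical expected counts for six imputed haplotypes.
print(round(imputation_info([0.02, 0.97, 0.88, 0.05, 0.10, 0.93]), 3))
```

Perfectly imputed haplotypes (expected counts of exactly 0 or 1) give an information of 1, while expected counts that all equal the allele frequency carry no information and give 0.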

Gene and variant annotation

Variants were annotated using Ensembl release 80 and Variant Effect Predictor version 2.8 (ref. 38). A total of 32.5 million variants passed the quality threshold and were imputed into 151,677 Icelanders who had been genotyped using Illumina chips.

Adjusting for relatedness

To account for inflation in test statistics due to cryptic relatedness and stratification, we applied the method of LD score regression39. With a set of 1.1 M variants, we regressed the χ2 statistics from our GWA scan against LD score and used the intercept as a correction factor. The LD scores were downloaded from an LD score database (ftp://atguftp.mgh.harvard.edu/brendan/1k_eur_r2_hm3snps_se_weights.RDS; accessed 23 June 2015).

Thresholds for genome-wide significance

The threshold for genome-wide significance was corrected for multiple testing with a weighted Bonferroni adjustment, using as weights the enrichment of variant classes with predicted functional impact among association signals14. With 32,463,443 sequence variants tested, the weights given in Sveinbjornsson et al.14 were rescaled to control the family-wise error rate. This yielded significance thresholds of 2.6 × 10−7 for high-impact variants (N = 8,464), 5.1 × 10−8 for moderate-impact variants (N = 149,983), 4.6 × 10−9 for low-impact variants (N = 2,283,889), 2.3 × 10−9 for other variants in DNase I hypersensitivity sites (N = 3,913,058), and 7.9 × 10−10 for other variants (N = 26,108,039).
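Under a weighted Bonferroni scheme, the family-wise error rate is bounded by the sum over variant classes of (number of variants × class threshold). The check below, using the counts and thresholds quoted above, gives approximately 0.05, consistent with the stated aim. The class labels are our own shorthand, and placing an intergenic variant such as rs7543130 in the "other" class follows the text's statement that 7.9 × 10−10 is the threshold for intergenic variants.

```python
# Per-class (threshold, number of variants tested), as quoted in the text.
# The class labels are our own shorthand.
CLASSES = {
    "high_impact": (2.6e-7, 8_464),
    "moderate_impact": (5.1e-8, 149_983),
    "low_impact": (4.6e-9, 2_283_889),
    "other_dnase_hs": (2.3e-9, 3_913_058),
    "other": (7.9e-10, 26_108_039),
}


def fwer_bound(classes):
    """Union (Bonferroni) bound on the family-wise error rate."""
    return sum(threshold * n for threshold, n in classes.values())


def is_genome_wide_significant(p_value, variant_class, classes=CLASSES):
    """Compare a P value with the threshold for its annotation class."""
    return p_value < classes[variant_class][0]


print(f"FWER bound: {fwer_bound(CLASSES):.4f}")  # FWER bound: 0.0500
# rs7543130 (intergenic, "other" class) in the Icelandic discovery scan:
print(is_genome_wide_significant(6.8e-10, "other"))  # True
```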

Association analysis

Logistic or linear regression was used to test for association of SNPs with binary or quantitative traits, respectively, treating disease status or the quantitative trait as the response and allele counts from direct genotyping, or expected genotype counts from imputation, as covariates. Test statistics were corrected for cryptic relatedness and stratification using LD score regression39, as described above; the estimated correction factor for AS was 1.19 for the additive model.

Genetic risk score

A weighted GRS for CAD (CAD-GRS-all) was generated based on previously reported CAD effect estimates (log OR) (see Supplementary Data 4). The CAD-GRS-lip included the subset of CAD-GRS variants that associated with non-HDL cholesterol (P < 1 × 10−8) in the Icelandic dataset or are located at the LPA locus. The two LPA locus variants included in CAD-GRS-lip associate with lipoprotein(a) at P < 8 × 10−90 in Iceland.
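A weighted GRS of this kind is a sum, over variants, of an individual's risk-allele dosage multiplied by the variant's reported log OR; CAD-GRS-lip and CAD-GRS-non-lip then partition the same set of variants. The sketch below assumes dosages are already aligned to the CAD risk allele. The variant identifiers, weights, and dosages in the example are hypothetical.

```python
import math


def weighted_grs(dosages, log_ors):
    """Sum over variants of risk-allele dosage (0-2) times CAD log OR."""
    return sum(log_ors[v] * dosages[v] for v in log_ors)


def grs_components(dosages, log_ors, lipid_variants):
    """Return (CAD-GRS-all, CAD-GRS-lip, CAD-GRS-non-lip) for one person.

    lipid_variants: the subset of variant IDs assigned to CAD-GRS-lip.
    """
    lip = {v: b for v, b in log_ors.items() if v in lipid_variants}
    non_lip = {v: b for v, b in log_ors.items() if v not in lipid_variants}
    return (weighted_grs(dosages, log_ors),
            weighted_grs(dosages, lip),
            weighted_grs(dosages, non_lip))


# Hypothetical variant IDs, CAD weights, and one individual's dosages.
log_ors = {"var_lpa": math.log(1.5), "var_ldl": math.log(1.1),
           "var_other": math.log(1.05)}
dosages = {"var_lpa": 1, "var_ldl": 2, "var_other": 0}
all_score, lip_score, non_lip_score = grs_components(
    dosages, log_ors, {"var_lpa", "var_ldl"})
print(round(all_score, 3), round(lip_score, 3), round(non_lip_score, 3))
```

Because the two subsets partition the variants, CAD-GRS-all equals CAD-GRS-lip plus CAD-GRS-non-lip for every individual, which is what allows the lipid and non-lipid components to be tested separately.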

Genetic association replication studies

The Malmo Diet and Cancer Study (MDCS) is a community-based prospective cohort of middle-aged individuals from southern Sweden. In total, 30,447 subjects attended a baseline examination in 1991–1996, at which they filled out a questionnaire, underwent anthropometric measurements, and donated peripheral venous blood samples40. Prevalent or incident cases of AS were ascertained from nationwide hospital registers with high validity, as described previously19. Genome-wide genotyping of single-nucleotide variants was performed using the Illumina Human Omni Express Exome BeadChip kit. Genotyping was performed in a nested case-cohort design including 15,362 subjects with complete data, among whom were 470 incident AS cases. The SNP rs10455872 was genotyped in the entire cohort, with genotypes available for 28,722 subjects, including 613 incident AS cases. Association with incident AS was tested in a case–control analysis using logistic regression under an additive inheritance model adjusted for age and sex. Case–control matching was performed in SAS v9.4 with the greedy algorithm, matching each AS case to one population-based control for sex, baseline age (<3 years age difference), and year of baseline visit (within 3 years), and requiring at least equal follow-up in controls. All participants were of European ancestry, confirmed by multidimensional scaling of genome-wide data. Informed consent was obtained from all participants, and the study was approved by the Ethics Committee of Lund University, Sweden.
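The 1:1 matching criteria described above can be expressed as a greedy matching loop. The study used SAS; this Python sketch is only an illustration, and the processing order (cases in input order), the choice among eligible controls (closest baseline age), and the interpretation of follow-up (time from baseline to AS diagnosis for cases, AS-free follow-up for controls) are our assumptions. The record type and identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Person:
    pid: str
    sex: str
    baseline_age: float
    baseline_year: int
    follow_up: float  # assumed: years to AS (cases) or AS-free years (controls)


def greedy_match(cases, controls, max_age_diff=3.0, max_year_diff=3):
    """Greedy 1:1 case-control matching without replacement.

    Each case, in input order, is matched to an unused control of the same
    sex with baseline age difference < max_age_diff, baseline year within
    max_year_diff, and follow-up at least as long as the case's. Among
    eligible controls, the one closest in baseline age is chosen.
    Unmatched cases are omitted from the result.
    """
    used = set()
    pairs = {}
    for case in cases:
        eligible = [
            c for c in controls
            if c.pid not in used
            and c.sex == case.sex
            and abs(c.baseline_age - case.baseline_age) < max_age_diff
            and abs(c.baseline_year - case.baseline_year) <= max_year_diff
            and c.follow_up >= case.follow_up
        ]
        if eligible:
            best = min(eligible,
                       key=lambda c: abs(c.baseline_age - case.baseline_age))
            used.add(best.pid)
            pairs[case.pid] = best.pid
    return pairs


cases = [Person("case1", "F", 55.0, 1992, 5.0),
         Person("case2", "F", 56.0, 1993, 10.0)]
controls = [Person("ctrl1", "F", 55.5, 1992, 20.0),
            Person("ctrl2", "F", 56.0, 1993, 8.0),
            Person("ctrl3", "M", 55.0, 1992, 20.0)]
print(greedy_match(cases, controls))  # {'case1': 'ctrl1'}
```

A greedy match is fast but order-dependent: an earlier case can take the only control that would have suited a later one, which is why case2 goes unmatched in this toy example.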

Patients from the greater Stockholm area with AS, BAV, or TAA were recruited as part of the ASAP (Advanced Study of Aortic Pathology) and Artist studies. The ASAP cohort consists of 429 patients undergoing aortic valve surgery at the Karolinska University Hospital (Stockholm, Sweden)29,41. The samples were genotyped on Illumina 610wQuad beadchips, and approximately 588,400 SNPs remained after quality control (QC). The Artist cohort consists of 406 samples genotyped with Omni-2.5 Quad beadchips on 2,443,180 SNPs. Imputation was performed using Impute2 with the 1000 Genomes phase 1 v3 reference panel. Samples from the POLCA/Olivia cohorts were used as controls (1,295 individuals in total). POLCA consists of healthy 50-year-old men, free from coronary heart disease, recruited at random using the population registry. The POLCA samples were genotyped on Illumina 610kwQuad. Olivia comprises both men and women aged 33–80 years; in Olivia, 670 control samples were genotyped on Illumina 1M genotyping arrays. The vast majority of included control samples are of Scandinavian ancestry. Association analysis was performed using SNPTEST, with age, sex, and the first 10 principal components as covariates. The study was approved by the Human Research Ethics Committee at Karolinska Institutet (ASAP study, ethical approval number 2006/784-31/1; Artist cohort, 2008/1771-31; POLCA/Olivia control samples, 03-491), Stockholm, Sweden.

The Norwegian Nord-Trøndelag Health Study (HUNT) is a population-based health survey conducted in the county of Nord-Trøndelag, Norway. Individuals were included at three time points over approximately 20 years (HUNT1 (1984–1986), HUNT2 (1995–1997), and HUNT3 (2006–2008))42. At each time point, the entire adult population (≥20 years) was invited to participate by completing questionnaires and attending clinical examinations and interviews. Taken together, the health studies include information from over 120,000 different individuals from Nord-Trøndelag. Biological samples including DNA have been collected for approximately 70,000 participants. AS was defined based on ICD-10 codes collected from local hospitals and outpatient clinics between 1999 and 2016. Cases were defined as individuals with one or more ICD-10 codes specific for AS (“I35.0” or “I35.2”), whereas controls were all individuals without a code specific for AS. In total, 1,546 AS cases and 24,235 controls genotyped with Illumina HumanCoreExome arrays were analyzed. Association analysis was conducted using EPACTS-3.3. SNP–phenotype associations were modeled using the Firth bias-corrected logistic likelihood ratio test43, assuming an additive genetic model for genotyped markers and imputed genotypes. Models were adjusted for sex, birth year, genotyping batch, and four principal components (PCs), which were computed using PLINK. Individuals of non-European ancestry, identified by principal component analysis (PCA), were excluded from the study. Additional filters applied to the analysis included minor allele count ≥10 and imputation r2 ≥ 0.3. Participation in the HUNT Study is based on informed consent, and the study has been approved by the Data Inspectorate and the Regional Ethics Committee for Medical Research in Norway.

In the years 2006–2010, the UK Biobank study recruited 502,647 individuals aged 37–76 years from across the country. All participants provided information regarding their health and lifestyle via touch-screen questionnaires, consented to physical measurements, and agreed to have their health followed. They also provided blood, urine, and saliva samples for future analysis. UK Biobank has ethical approval from the Northwest Multi-Center Research Ethics Committee, and informed consent was obtained from all participants. Genotype imputation data were available for 487,409 individuals (May 2017 release), of whom 408,658 were used in the analysis. These 408,658 individuals were self-reported white British with similar genetic ancestry based on principal component analysis and with consistent reported and genetically determined gender. AS was defined by ICD-10 codes I35.0 and I35.2 among the diagnosis codes recorded across all of a participant's hospital episodes, and CAD was defined by codes I20.0, I21, I22, I25.0, I25.1, I25.2, and I25.9. The case–control analysis was done using SNPTEST v2.5.2 (ref. 44), and the association with the GRS was tested using R v3.4.1 (ref. 45). In both cases, the analysis was adjusted for age, gender, and 20 principal components. To adjust for relatedness and remaining population stratification, P values were adjusted using genomic control, with adjustment factors λg estimated from association analysis of 155,000 unrelated common variants. The estimated λg values were 1.023, 1.088, 1.029, and 1.013 for the analyses of AS, CAD, AS restricted to individuals with CAD, and AS excluding individuals with CAD, respectively.
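Genomic control adjustment converts each P value to a 1-df χ2 statistic, divides it by λg, and converts the result back to a P value. The sketch below also includes the conventional median-based estimator of λg (the median observed χ2 divided by the median of the 1-df χ2 distribution, about 0.455); the text does not state which estimator was used, so treat that part as an assumption. Function names are hypothetical.

```python
import math
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2  # about 0.455


def p_to_chi2(p_value):
    """Two-sided P value -> 1-df chi-square statistic (works for tiny P)."""
    z = -_STD_NORMAL.inv_cdf(p_value / 2)
    return z * z


def chi2_to_p(chi2_stat):
    """1-df chi-square statistic -> two-sided P value."""
    return math.erfc(math.sqrt(chi2_stat / 2))


def estimate_lambda(p_values):
    """Median-based genomic inflation factor."""
    return median([p_to_chi2(p) for p in p_values]) / CHI2_1DF_MEDIAN


def genomic_control(p_value, lambda_gc):
    """Deflate a P value by dividing its chi-square statistic by lambda_gc."""
    return chi2_to_p(p_to_chi2(p_value) / lambda_gc)


# Example: adjusting a CAD P value with the lambda reported for CAD.
print(genomic_control(1e-8, 1.088))
```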

In the University of Michigan BAV registry and the Cardiovascular Health Improvement Project (CHIP), we collected DNA from consenting individuals with BAV seen at the Frankel Cardiovascular Center at the University of Michigan. Patients were typically seen in clinic for aortic valve replacement or aortic aneurysm. DNA was isolated from peripheral blood lymphocytes. In total, 473 BAV cases, 251 AS cases, and 809 thoracic aortic aneurysm (TAA) cases were successfully collected and genotyped. Potential controls were identified from a surgical biobank, the Michigan Genomics Initiative (MGI), and had been genotyped with the same GWAS array as the cases. After excluding individuals with aortic disease, we matched on age by requiring each control's birth year to fall within −5 to +10 years of the case's birth year. From the available controls in the appropriate age and sex category, a greedy algorithm selected the best ethnic match for each case and was repeated until a control had been selected for every case. This entire process was repeated so that 10 controls were selected for each AS and BAV case and 5 controls for each TAA case. All MGI research subjects provided informed consent. Genotyping was performed using a GWAS+exome chip array (Illumina HumanCoreExome). To avoid potential batch effects, cases and controls were genotyped on the same array at the same genotyping center (Sequencing and Genotyping Core at the University of Michigan). Genotypes were called using GenTrain version 2.0 in GenomeStudio V2011.1 (Illumina) with identical cluster files for cases and controls. We excluded samples with <98% genotype call rate, evidence of sex discrepancy, or duplication, as well as individuals of non-European ancestry identified by plotting the first 10 genotype-derived principal components.
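The control-selection step can be sketched as a greedy nearest-neighbour match. The sketch assumes that "best ethnic match" means the smallest Euclidean distance between case and control on the first few genotype principal components, that the −5/+10 year window applies to the control's birth year minus the case's, and that controls are drawn without replacement. The `Person` record and all function names are hypothetical, and the published pipeline may differ in these details.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    pid: str
    sex: str
    birth_year: int
    pcs: tuple  # first few genotype principal components


def eligible(case, ctrl):
    """Same sex and control birth year within -5..+10 years of the case's."""
    diff = ctrl.birth_year - case.birth_year
    return ctrl.sex == case.sex and -5 <= diff <= 10


def greedy_round(cases, controls, used):
    """Assign one control per case, always taking the globally closest pair next."""
    pending = {c.pid: c for c in cases}
    picked = {}
    while pending:
        best = None
        for case in pending.values():
            for ctrl in controls:
                if ctrl.pid in used or not eligible(case, ctrl):
                    continue
                d = math.dist(case.pcs, ctrl.pcs)
                if best is None or d < best[0]:
                    best = (d, case.pid, ctrl.pid)
        if best is None:  # remaining cases have no eligible controls left
            break
        _, case_id, ctrl_id = best
        picked[case_id] = ctrl_id
        used.add(ctrl_id)
        del pending[case_id]
    return picked


def match_controls(cases, controls, k):
    """Repeat the greedy round k times to select up to k controls per case."""
    used = set()
    matches = {c.pid: [] for c in cases}
    for _ in range(k):
        for case_id, ctrl_id in greedy_round(cases, controls, used).items():
            matches[case_id].append(ctrl_id)
    return matches
```

Under this sketch, `k=10` would correspond to the AS and BAV designs and `k=5` to the TAA design.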
We performed variant-level QC by excluding variants that met any of the following criteria: a cluster separation score <0.3, a genotype call rate <98%, or deviation from Hardy–Weinberg equilibrium (P < 1 × 10−5). We phased the autosomal genotype data using SHAPEIT246 and imputed variants from the Haplotype Reference Consortium panel v147 using minimac348,49, excluding poorly imputed variants with imputation R2 < 0.3. We performed single-variant association testing for BAV, TAA, and AS status on imputed dosages using the Wald test from logistic regression, with age, sex, and the first four principal components as covariates, in the EPACTS software (URL: http://csg.sph.umich.edu//kang/epacts/). All repository projects used in this study were approved by the University of Michigan Medical School Institutional Review Board, and informed consent was obtained from study participants.
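The three variant-level exclusion criteria can be combined in a single filter. The text does not say which Hardy–Weinberg test was applied, so the exact test of Wigginton, Cutler and Abecasis used below is an assumption; it enumerates heterozygote-count probabilities conditional on the allele counts. Function names and argument order are hypothetical.

```python
def hwe_exact_p(n_het, n_hom1, n_hom2):
    """Exact Hardy-Weinberg P value for one biallelic variant."""
    n_homr, n_homc = sorted((n_hom1, n_hom2))
    rare = 2 * n_homr + n_het      # copies of the rarer allele
    n = n_het + n_homr + n_homc    # genotyped individuals
    if n == 0 or rare == 0:
        return 1.0
    probs = [0.0] * (rare + 1)
    # Start near the expected heterozygote count, with the same parity as `rare`.
    mid = rare * (2 * n - rare) // (2 * n)
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0
    # Recurse downwards: two fewer heterozygotes per step.
    het, homr = mid, (rare - mid) // 2
    homc = n - het - homr
    while het > 1:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (homr + 1) * (homc + 1))
        het, homr, homc = het - 2, homr + 1, homc + 1
    # Recurse upwards: two more heterozygotes per step.
    het, homr = mid, (rare - mid) // 2
    homc = n - het - homr
    while het <= rare - 2:
        probs[het + 2] = probs[het] * 4.0 * homr * homc / ((het + 2) * (het + 1))
        het, homr, homc = het + 2, homr - 1, homc - 1
    total = sum(probs)
    observed = probs[n_het]
    # Sum the probabilities of all configurations no more likely than the observed one.
    return min(1.0, sum(p for p in probs if p <= observed) / total)


def passes_variant_qc(cluster_sep, call_rate, n_het, n_hom1, n_hom2):
    """Keep a variant unless it meets any of the exclusion criteria above."""
    return (cluster_sep >= 0.3
            and call_rate >= 0.98
            and hwe_exact_p(n_het, n_hom1, n_hom2) >= 1e-5)
```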

In the BAVCon consortium, 452 sporadic BAV cases of self-reported Caucasian ancestry were genotyped using the Illumina Omni-2.5 platform, and 1,834 self-reported Caucasian population controls from the Framingham Heart Study were genotyped using the Omni5 platform. Caucasian ancestry was further verified using PCA to detect clusters, filter outliers, and remove related individuals, both before and after merging cases and controls for association analysis. Principal components were included as covariates in the association analysis. Further QC of the genotype data from both cohorts was performed using GenomeStudio and PLINK. After QC, we imputed additional genotypes against the 1000 Genomes Project reference panel (Phase 3) using IMPUTE2, yielding 7,913,553 genetic markers for an additive logistic regression model adjusted for sex and age. This study was approved by the Partners HealthCare Human Research Committee, and informed consent was obtained from study participants.
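PCA-based ancestry screening of the kind described for BAVCon can be illustrated with NumPy. The sketch standardizes allele counts, takes principal components from a singular value decomposition, and flags samples lying more than six standard deviations from the mean on any of the top ten components. The six-SD, ten-component rule mirrors the default outlier removal in EIGENSOFT's smartpca (which repeats the removal iteratively) and is an assumption here, as are the function name and the requirement that genotypes contain no missing values.

```python
import numpy as np


def pca_outliers(genotypes, n_pcs=10, n_sd=6.0):
    """Return top PC scores and a boolean outlier flag per sample.

    genotypes: samples x variants array of 0/1/2 allele counts, no missing data.
    """
    G = np.asarray(genotypes, dtype=float)
    freq = G.mean(axis=0) / 2.0
    polymorphic = (freq > 0.0) & (freq < 1.0)  # monomorphic sites carry no signal
    G, freq = G[:, polymorphic], freq[polymorphic]
    # Center and scale each variant by its binomial standard deviation.
    Z = (G - 2.0 * freq) / np.sqrt(2.0 * freq * (1.0 - freq))
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    k = min(n_pcs, S.size)
    scores = U[:, :k] * S[:k]
    z = np.abs(scores - scores.mean(axis=0)) / scores.std(axis=0)
    return scores, np.any(z > n_sd, axis=1)
```

The retained PC scores are what would then enter the association model as covariates.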

At the University of Texas Health Science Center, 765 patients of European descent with TAA or aortic dissection were enrolled and genotyped50; a subset of these patients (N = 147) had BAV. The study included 864 controls from the US National Institute of Neurological Disorders and Stroke (NINDS)51. We imputed additional genotypes against the 1000 Genomes Project reference panel (Phase 3) and, for association analysis, used an additive logistic regression model accounting for sex and principal components51. Subjects of non-European ancestry according to multidimensional scaling were removed from the analysis. This study was approved by the Committee for the Protection of Human Subjects at UT Health Science Center at Houston, and informed consent was obtained from study participants.

Identification of candidate causal variants and genes

We downloaded epigenome data from the NIH Roadmap Epigenomics Mapping Consortium (http://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/imputed12marks/jointModel/final/catMat/hg19_chromHMM_imputed25.gz) for 11 histone marks assayed by chromatin immunoprecipitation followed by sequencing (ChIP-seq), together with open chromatin regions assayed by DNase-seq, integrated into 25 discrete chromatin states by ChromHMM26. Lead variants and variants in linkage disequilibrium (LD) with them (R2 > 0.5) within the AS loci were then annotated with chromatin states involving regulatory functions, namely EnhA (active enhancer elements), EnhW (weak enhancer elements), EnhTx (enhancer marks coupled with transcription-associated histone marks), DNase (open chromatin configuration), TssA (active transcription start site), PromUp/D (regions upstream or downstream of promoters), PromP (poised promoter), PromBiv (bivalent promoters), Het (heterochromatin), and ReprPC (Polycomb-group repressed), while omitting states indicative of transcription status only (Tx3′, Tx5′, Tx, TxWk) and states characteristic of zinc-finger protein genes (ZNF/Rpts).
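The annotation step amounts to an interval lookup followed by a filter on state labels. The sketch below assumes the segmentation has already been parsed into BED-style (0-based, half-open) segments with labels such as "13_EnhA1", that trailing digits distinguish sub-states of one group, and that variant positions are 1-based; how the downloaded file is parsed, and how the published labels map onto the groups named in the text (for example, EnhTx), are not shown and are assumptions. Class and function names are hypothetical.

```python
import bisect

# Regulatory state groups retained in the annotation, named as in the text.
RETAINED = {"EnhA", "EnhW", "EnhTx", "DNase", "TssA", "PromU", "PromD",
            "PromP", "PromBiv", "Het", "ReprPC"}


def state_group(label):
    """Map a label such as '13_EnhA1' to 'EnhA' (assumed '<index>_<name><digit>' format)."""
    return label.split("_", 1)[-1].rstrip("0123456789")


class ChromatinStates:
    """Lookup of ChromHMM segments given as (chrom, start, end, label), BED-style."""

    def __init__(self, segments):
        self._chroms = {}
        for chrom, start, end, label in sorted(segments):
            starts, ends, labels = self._chroms.setdefault(chrom, ([], [], []))
            starts.append(start)
            ends.append(end)
            labels.append(label)

    def state_at(self, chrom, pos):
        """Return the label covering 1-based position `pos`, or None."""
        if chrom not in self._chroms:
            return None
        starts, ends, labels = self._chroms[chrom]
        i = bisect.bisect_right(starts, pos - 1) - 1
        if i >= 0 and pos - 1 < ends[i]:
            return labels[i]
        return None


def annotate_locus(variants, states, min_r2=0.5):
    """variants: (rsid, chrom, pos, r2_with_lead) tuples; the lead variant has r2 = 1."""
    hits = {}
    for rsid, chrom, pos, r2 in variants:
        if r2 <= min_r2:  # keep only variants with R2 > 0.5 to the lead
            continue
        label = states.state_at(chrom, pos)
        if label is not None and state_group(label) in RETAINED:
            hits[rsid] = state_group(label)
    return hits
```

Binary search over sorted segment starts keeps each lookup logarithmic in the number of segments, which matters when many loci are annotated against a genome-wide segmentation.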

High-throughput chromosome conformation capture (Hi-C) data for right ventricle, left ventricle, and aorta tissue were obtained from an online repository, Functional Mapping and Annotation of Genome-Wide Association Studies (FUMA GWAS: http://fuma.ctglab.nl/)52. FUMA uses public datasets compiled by Schmitt et al.28 to identify structural interactions of enhancers with genes (Hi-C compendium: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE87112).

We queried the data with Gorpipe53, imported the results into R (v3.4.1), and analyzed and plotted them using base functions. The Hi-C data are binned into 40-kb intervals, shown as gray horizontal lines in the lower panels of Fig. 1a, b; for interactions with a false-discovery rate <1 × 10−10, arcs were drawn between the midpoints of the interacting intervals. Intersections between chromatin interaction intervals and promoters were based on RefSeq-annotated transcription start sites.
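The binning, arc construction, and promoter intersection can be sketched in plain Python. The sketch assumes interactions arrive as (bin start, bin start, FDR) tuples for a single chromosome, that 40-kb bins start at coordinate 0, and that each gene is represented by a single TSS coordinate. Function names are hypothetical, GENE_A and the other names in the check are placeholders, and the analysis itself was done in R on data queried with Gorpipe.

```python
BIN_SIZE = 40_000  # Hi-C bin width used in the text
FDR_MAX = 1e-10    # significance threshold for drawing an arc


def bin_start(pos, bin_size=BIN_SIZE):
    """Start coordinate of the bin containing `pos` (bins assumed to start at 0)."""
    return (pos // bin_size) * bin_size


def arc_endpoints(interactions, fdr_max=FDR_MAX, bin_size=BIN_SIZE):
    """Midpoint pairs for significant interactions.

    interactions: (bin1_start, bin2_start, fdr) tuples on one chromosome.
    """
    half = bin_size // 2
    return [(a + half, b + half) for a, b, fdr in interactions if fdr < fdr_max]


def genes_contacted(variant_pos, interactions, tss, fdr_max=FDR_MAX, bin_size=BIN_SIZE):
    """Genes whose TSS lies in a bin significantly interacting with the variant's bin."""
    home = bin_start(variant_pos, bin_size)
    partners = set()
    for a, b, fdr in interactions:
        if fdr >= fdr_max:
            continue
        if a == home:
            partners.add(b)
        elif b == home:
            partners.add(a)
    return sorted(g for g, pos in tss.items() if bin_start(pos, bin_size) in partners)
```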

Data availability

The Icelandic population WGS data have been deposited in the European Variation Archive under accession code PRJEB8636. The authors declare that the data supporting the findings of this study are available within the article, its Supplementary Data files, and upon request.