Introduction

Cardiovascular disease (CVD) is the leading cause of premature death and disability worldwide, with both genetic and environmental determinants1,2. The most common cardiovascular disease is coronary heart disease (CHD), including coronary atherosclerosis and myocardial infarction, among others. While genome-wide association studies (GWAS) have identified multiple genetic loci associated with cardiovascular diseases, the exact genes driving these associations remain mostly unidentified3.

Owing to Finland’s population history, many deleterious and high-impact variants are enriched in the Finnish population, making it possible to find genetic associations that would not be detected elsewhere4. Many studies have reported high-impact loss-of-function (LoF) variants associated with risk factors for CVD, such as blood lipid levels, and thereby with a marked effect on CVD risk. For example, high-impact LoF variants in the genes LPA4, PCSK95, APOC36, and ANGPTL47 have been shown to be associated with Lipoprotein(a), LDL-cholesterol (LDL-C), or triglyceride levels, and with lower CVD risk.

Besides blood lipids, other risk factors for CVD include hypertension, smoking and the metabolic syndrome cluster components. The mechanism that links these risk factors to atherogenesis remains incompletely elucidated. Many, if not all, of these risk factors, however, also participate in the activation of inflammatory pathways, and inflammation in turn can alter the function of artery wall cells in a manner that drives atherosclerosis8.

Using data from a sizeable Finnish biobank study FinnGen (n = 260,405), we identified an association between an inframe insertion rs534125149 in MFGE8 and protection against coronary atherosclerosis and other representations of major coronary heart disease (CHD), such as myocardial infarction (MI). This variant is highly enriched in Finland, 70-fold compared to Non-Finnish Europeans (NFE) in the gnomAD genome reference database9, with an AF of 3% in Finland. This association was also replicated in BioBank Japan (BBJ) and Estonian Biobank (EstBB). We also identified a splice acceptor variant rs201988637 in the same gene, which is also associated with protection against coronary atherosclerosis and other representations of major CHD, indicating that rs534125149 has a very similar effect on CHD as a splice acceptor variant in MFGE8. Associations of both variants in MFGE8 were specific to CHD, and they did not significantly (p < 1.75 × 10−5) increase risk for any other disease, highlighting MFGE8 as a potential drug target candidate.

Results

GWAS results for coronary atherosclerosis

We identified a total of 2 302 variants associated (GWS, p < 5 × 10−8) with coronary atherosclerosis (detailed description of the definition of the endpoint is in Supplementary Note 1). These variants were located in 38 distinct genetic loci (a minimum of 0.5 Mb distance from each other; Fig. 1 and Supplementary Table 1). Out of the 38 GWS loci, four (within or near genes MFGE8, TMEM200A, PRG3, and FHL1) have not been previously reported to associate with any CVD-related endpoints or risk factor for CVD in GWAS Catalog10 [https://www.ebi.ac.uk/gwas/]. Lead variants in these loci and their characteristics are listed in Table 1 and locus zoom plots for each of the loci are in Supplementary Fig. 1.

Fig. 1: GWAS results for coronary atherosclerosis in FinnGen.

The total number of independent genome-wide significant associations (GWS; p < 5 × 10−8) is 38, with the lead variant in each marked with a diamond. Four previously unreported associations for CVD-related phenotypes are highlighted in red, covering ±750 kb around the lead variant, with the lead variant marked with a red diamond.

Table 1 Lead variants in previously unreported loci for coronary atherosclerosis.

Among these four previously unreported loci for coronary atherosclerosis, the locus near MFGE8 had the strongest association (p-value = 2.63 × 10−16 for top variant rs534125149). The lead variant is an inframe insertion located in the sixth exon in the MFGE8 gene (Supplementary Fig. 2) and it is highly enriched in the Finnish population compared to NFSEEs (Non-Finnish, Estonian or Swedish Europeans). Interestingly, MFGE8 is mainly expressed in coronary and tibial arteries according to data from GTEx v8 (Supplementary Fig. 3), and furthermore the expression of MFGE8 is highest in aorta. In addition, previously identified common variants in MFGE8 locus that have been associated with decreased expression of MFGE8 in tibial artery and aorta have also been associated with decreased risk of CHD11.

In addition to MFGE8, we identified three additional previously unreported loci to be associated with coronary atherosclerosis, TMEM200A, PRG3 and FHL1 being the nearest genes of the lead variants. TMEM200A and PRG3 loci had one non-coding low-frequency variant reaching the genome-wide significance threshold, and FHL1 had 11. All variants in the credible sets of all these associations were either intergenic or intronic variants and had no reported significant GWAS associations with any trait in the GWAS Catalog or significant eQTL associations in GTEx. The one variant (rs118042209) in the credible set of TMEM200A locus was associated with multiple disease endpoints representing major coronary heart disease (CHD) in FinnGen, including coronary atherosclerosis, ischemic heart disease and angina pectoris, whereas the lead variant in the PRG3 locus was associated with cardiomyopathy. All variants in the credible set of FHL1 were associated with multiple disease endpoints representing major CHD in FinnGen, including angina pectoris and ischemic heart disease. The TMEM200A gene has been reported to be associated with ten traits (including height and trauma exposure) and PRG3 with two traits (eosinophil count and eosinophil percentage of white cells) in the GWAS Catalog. The FHL1 gene had no reported associations in the GWAS Catalog.

Replication

The association of rs534125149 in the MFGE8 locus with CHD was replicated in Biobank Japan12,13 (BBJ) and the Estonian Biobank (EstBB)14 (35,644 cases and 328,461 controls in total: OR = 0.752 [0.67–0.84], p = 4.37 × 10−7). Association results for rs534125149 with CHD and MI across different cohorts are in Fig. 2. Post hoc power calculations for each cohort were performed (probability that the test will reject the null hypothesis H0 at GWS threshold) and the results as a function of effect size are in Supplementary Fig. 4. These calculations show that the power to detect the variant as GWS is markedly greater in FinnGen than in EstBB or BBJ, even with similar effect sizes and sample sizes. This boost in power in FinnGen therefore seems to be mainly due to different allele frequencies, since this variant is highly enriched in Finland.
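To illustrate why allele frequency can dominate power at a fixed effect size, the following is a minimal sketch of a power calculation at the GWS threshold. It is not the calculation used in this study: it assumes an additive logistic model, a normal approximation for the test statistic, and the common large-sample approximation SE ≈ 1/sqrt(2f(1−f)Nφ(1−φ)), where f is the allele frequency, N the sample size and φ the case fraction. The function name and all input values are hypothetical.

```python
import math
from statistics import NormalDist

def gws_power(odds_ratio, allele_freq, n_cases, n_controls, alpha=5e-8):
    """Approximate power of a two-sided Wald test at significance level alpha.

    Assumption (not from the study): SE of the log odds ratio is
    1 / sqrt(2 f (1 - f) N phi (1 - phi)) under an additive model.
    """
    n = n_cases + n_controls
    phi = n_cases / n  # case fraction
    se = 1.0 / math.sqrt(2.0 * allele_freq * (1.0 - allele_freq) * n * phi * (1.0 - phi))
    shift = abs(math.log(odds_ratio)) / se  # expected |z| under the alternative
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    # Probability that the z-statistic falls outside (-z_crit, z_crit)
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)

# Hypothetical comparison: same effect and sample size, different allele frequency
power_enriched = gws_power(0.75, allele_freq=0.03, n_cases=20_000, n_controls=200_000)
power_rare = gws_power(0.75, allele_freq=0.0005, n_cases=20_000, n_controls=200_000)
```

Under these assumptions, the enriched variant is detected with much higher probability than the rare one, even though the odds ratio and cohort size are identical.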

Fig. 2: Results for rs534125149 against coronary heart disease and myocardial infarction across cohorts where available and meta-analysis results.

Logistic regression has been applied, adjusted for age and sex. Meta-analysis was performed using inverse-variance weighted fixed-effects meta-analysis method. Black dots represent odds ratios and lines the 95% confidence intervals from the single cohorts. Red diamonds represent the results from meta-analysis, with the ends of the diamonds representing the ends of the 95% confidence interval. Source data for the figure is in Supplementary Data 1.
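As an illustration of the inverse-variance weighted fixed-effects method named in the legend, the sketch below pools per-cohort log odds ratios using weights equal to the inverse of their squared standard errors. It is a generic textbook version, not the software used in this study, and the per-cohort inputs are hypothetical placeholders rather than values from Fig. 2.

```python
import math
from statistics import NormalDist

def ivw_fixed_effects(log_ors, ses):
    """Pool log odds ratios with inverse-variance weights (fixed effects).

    Returns the pooled odds ratio, its 95% confidence interval and a
    two-sided p-value from a normal approximation.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, log_ors)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    p_value = 2.0 * NormalDist().cdf(-abs(beta / se))
    ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
    return math.exp(beta), ci, p_value

# Hypothetical per-cohort estimates (illustrative only)
example_log_ors = [math.log(0.75), math.log(0.80), math.log(0.70)]
example_ses = [0.035, 0.20, 0.15]
pooled_or, pooled_ci, pooled_p = ivw_fixed_effects(example_log_ors, example_ses)
```

Because the weights are proportional to precision, the pooled estimate sits closest to the cohort with the smallest standard error.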

In addition to MFGE8, meta-analysis across FinnGen, UKBB, EstBB, and BBJ was performed for the lead variants in the three other previously unreported loci for CHD (TMEM200A, PRG3, and FHL1), where available. The lead variant in the PRG3 locus is highly enriched in Finland and absent in all other cohorts, and thus replication of that variant was not possible. The two other loci that were meta-analyzed (TMEM200A and FHL1) did not replicate; the replication criterion was a p-value smaller than 0.05/4 = 0.0125 in the combined meta-analysis of the replication cohorts (meta-analysis without FinnGen), with all effect size estimates in the same direction. Association results with CHD and MI across different cohorts for the TMEM200A and FHL1 variants are in Fig. 3. Post hoc power calculations for each cohort were performed and the results as a function of effect size are in Supplementary Fig. 5. Those results show that the lack of replication in UKBB, EstBB and BBJ does not appear to be due to lack of power. Therefore, we identified and replicated one novel locus for CHD (MFGE8).

Fig. 3: Results for rs118042209 in TMEM200A and rs5974585 in FHL1 against coronary heart disease and myocardial infarction across cohorts where available.

Logistic regression has been applied, adjusted for age and sex. Meta-analysis was performed using inverse-variance weighted fixed-effects meta-analysis method. Black dots represent odds ratios and lines the 95% confidence intervals from the single cohorts. Red diamonds represent the results from meta-analysis, with the ends of the diamonds representing the ends of the 95% confidence interval. Source data for the figure is in Supplementary Data 1.

Phenome-wide association results for rs534125149

We observed a highly protective association between the Finnish enriched inframe insertion rs534125149 in the MFGE8 gene and multiple disease endpoints, all representing major CHD, including coronary atherosclerosis (OR = 0.75 [0.71–0.81], p = 2.63 × 10−16) and myocardial infarction (MI) (OR = 0.74 [0.68–0.81], p = 1.95 × 10−11). In total, this variant was associated (PWS) with 14 disease endpoints, all representing major CHD (Fig. 4). The majority of these endpoints overlap heavily, so similar associations with all of them are expected. We therefore pruned the 14 PWS disease endpoints down to six disease endpoints (coronary atherosclerosis, coronary revascularization, ischemic heart diseases, major coronary heart disease event, myocardial infarction, and statin medication) that have fundamental characteristics for further analyses. For the inframe insertion rs534125149 in MFGE8, we did not detect other phenome-wide significant associations among the 2 861 endpoints in our data.

Fig. 4: Phenome-wide association study (PheWAS) results for rs534125149.

Total number of tested endpoints is 2861 (A complete list of endpoints analyzed and their definitions is available at https://www.finngen.fi/en/researchers/clinical-endpoints). The dashed line represents the phenome-wide significance threshold, multiple testing corrected by the number of endpoints = 0.05/2861 = 1.75 × 10−5. All endpoints reaching that threshold are labeled in the figure.

Splice acceptor variant rs201988637 in MFGE8

In addition to the inframe insertion rs534125149, we identified a splice acceptor variant (rs201988637) in MFGE8 to be associated with coronary atherosclerosis (OR = 0.72 [0.63–0.83], p = 7.94 × 10−06) and multiple disease endpoints representing major CHD. The splice acceptor variant had a very similar PheWAS profile to the inframe insertion (Supplementary Fig. 6) and furthermore the two variants had very similar protective effect sizes for the endpoints (Fig. 5 and Supplementary Table 2). Similar to rs534125149, this variant is also highly enriched in Finland (37-fold compared to NFE), allele frequency in Finland being 0.6%. Moreover, both the splice acceptor and the inframe insertion variants were enriched in Eastern Finland (Supplementary Fig. 7).

Fig. 5: Effect size comparison.

Comparison of the effects (OR) of rs534125149 and rs201988637 for 14 endpoints with p-value < 1.75 × 10−5 (PWS) for rs534125149 in FinnGen R6. 95% confidence intervals are represented as gray lines.

These two variants (rs534125149 and rs201988637) are in low linkage disequilibrium (LD, r2 = 0.00015) and did not have any effect on the other variant’s associations with coronary atherosclerosis or MI (Table 2 and Supplementary Fig. 8). This indicates that they both are independently associated with these endpoints.
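For readers unfamiliar with the r2 measure of LD, the sketch below computes it as the squared Pearson correlation between two vectors of genotype dosages (0, 1 or 2 copies of the allele). This genotype-based estimate is an assumption made for illustration; the study does not state how its r2 was computed, and haplotype-based estimates can differ slightly. The function name and genotype vectors are hypothetical.

```python
def genotype_r2(dosages_a, dosages_b):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(dosages_a)
    if n != len(dosages_b) or n == 0:
        raise ValueError("dosage vectors must be non-empty and of equal length")
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r2 is undefined for a monomorphic variant")
    return cov * cov / (var_a * var_b)

# Hypothetical genotypes: carriers of one variant are unrelated to carriers of the other
r2_independent = genotype_r2([0, 0, 1, 1], [0, 1, 0, 1])
r2_identical = genotype_r2([0, 1, 2, 1], [0, 1, 2, 1])
```

An r2 close to zero, as reported for the two MFGE8 variants, means that carrying one variant gives almost no information about carrying the other.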

Table 2 Results of the conditional analysis on MI and coronary atherosclerosis.

Survival analysis

In addition to protection against coronary atherosclerosis and myocardial infarction, both the inframe insertion rs534125149 and the splice acceptor variant rs201988637 also showed significant association in survival analysis of the time from birth to first diagnosis of coronary atherosclerosis (HR = 0.78 [0.74–0.93], p = 1.67 × 10−17 and HR = 0.77 [0.69–0.88], p = 5.08 × 10−05, respectively) and myocardial infarction (HR = 0.86 [0.80–0.93], p = 2.63 × 10−10 and HR = 0.72 [0.61–0.85], p = 8.16 × 10−05, respectively). In addition, when the heterozygous and homozygous carriers of rs534125149 and rs201988637 were combined into one group, carriers received their first diagnosis significantly later than non-carriers (HR = 0.81 [0.77–0.85], p = 6.4 × 10−16 for coronary atherosclerosis and HR = 0.78 [0.72–0.85], p = 1.16 × 10−11 for MI) (Fig. 6).

Fig. 6: Cumulative incidence plots for first event of myocardial infarction in FinnGenR6.

Red line represents carriers (homo- or heterozygous) for either rs534125149 or rs201988637 (n = 17,838), and blue line represents non-carriers (n = 242,567). Hazard ratio and p-value are from a Cox proportional hazards model. Dashed lines represent 95% confidence intervals.

In addition, as a sensitivity analysis we fitted a similar Cox model for first event of MI, adding different risk factors for CHD as covariates to see whether any of them (BMI, Type 2 Diabetes, smoking, statin use or sex) affect the observed association. Risk factors were added to the model both individually and together. We saw only a small change in the effect size when adjusting for these risk factors (Supplementary Table 3). The change was more noticeable in the p-values, where missing data in the added covariates led to decreased statistical power.

Associations with risk factors for CVD

We then tested for possible associations between the MFGE8 variants and risk factors for CVD. The splice acceptor variant rs201988637 was associated with pulse pressure in analyses across four cohorts with pulse pressure measurement and variant rs201988637 available, with the risk lowering allele associated with lower pulse pressure (p = 1.7 × 10−04, β = −0.13 [−0.2 to −0.06]) (Fig. 7). Association with pulse pressure was also tested for the inframe insertion rs534125149 and a previously reported common variant in the locus, rs8042271, across all cohorts where the variants were available. We saw consistent effect sizes across the cohorts, and significant (p < 0.05) meta-analysis p-values for both variants (Supplementary Fig. 9).

Fig. 7: Results for pulse pressure association across all cohorts with splice acceptor variant rs201988637 available (FINRISK, GeneRISK, YFS, EstBB, and UKBB).

Size of the boxes represents the sample size of the cohorts, and the lines the 95% confidence interval. Associations were tested using linear regression, adjusting for age and sex. Pulse pressure phenotypes were inverse-rank normalized prior to analysis. Source data for the figure is in Supplementary Data 1.

In addition, recent studies of blood pressure measurements (systolic and diastolic blood pressure and pulse pressure) have reported genome-wide significant associations in the region15,16. To assess whether these reflect the same signal, we performed colocalization analysis in the region ±200 kb around rs534125149 using Coloc package in R17 with coronary atherosclerosis results from FinnGen and pulse pressure GWAS results from Evangelou et al.16 The probability for shared signal (PP4) was 97.1%, further supporting that the MFGE8 locus is associated with pulse pressure (Supplementary Fig. 10).

In addition to pulse pressure associations in the region, rs534125149 was significantly associated with height, but further analysis indicated that this signal reflects the known association of ACAN, located near MFGE8, with height (Supplementary Fig. 11). No associations with other risk factors were observed.

In the Corogene cohort (n = 4896), rs534125149 was significantly (p < 0.05) associated with lower risk for acute coronary syndrome and stable coronary heart disease (RR = 0.87 and 0.83, respectively) compared to healthy controls, but not with myocardial infarction without coronary artery occlusion (MINOCA) (Supplementary Fig. 12). These results are in line with our findings regarding the specificity of the association of variants in MFGE8 with atherosclerotic cardiovascular disorders. The p-value for the difference in the AFs of rs534125149 between patients with acute coronary syndrome or stable coronary heart disease and patients with MINOCA was, however, not significant (p = 0.78), which may be due to lack of power. In addition, the cohort is very heterogeneous.

Previously reported common variants near MFGE8

A common intergenic variant (rs8042271) near MFGE8 has previously been reported to associate with coronary heart disease (CHD) risk3,18. We replicate this association (OR = 0.90, p = 3.69 × 10−10 for coronary atherosclerosis) in FinnGen. LD between the common variant rs8042271 and the inframe insertion rs534125149 is 0.154. The LD characteristics for all three variants in MFGE8 (rs534125149, rs201988637 and rs8042271) in FinnGen are in Supplementary Table 4. Common variant rs8042271 was in the 95% credible set for MI with the causal probability of 0.003 but was not included in the 95% credible sets for coronary atherosclerosis (Supplementary Tables 5 and 6). The conditional analyses of all three MFGE8 variants showed that the association of the previously reported common variant rs8042271 can be explained by the inframe insertion variant rs534125149, but not vice versa, and that the association of the splice acceptor variant rs201988637 is independent of both rs534125149 and rs8042271 (Supplementary Table 7). This was also the case for the previously reported common variant rs734780, which shows an LD with rs534125149 (0.112) very similar to that of rs8042271 (0.154).

Fine-mapping of the MFGE8 locus

In our fine-mapping analyses, MI had most probably one credible set (set of causal variants) of 32 variants with the highest posterior probability (posterior probability = 0.62), and coronary atherosclerosis had two credible sets of 6 and 45 variants, respectively, with the highest posterior probability (posterior probability = 0.74). For both MI and coronary atherosclerosis, rs534125149 had the highest probability of being causal (probability of being causal = 0.250 and 0.318, respectively) and was included in the first credible set (Supplementary Tables 5 and 6; and Supplementary Fig. 13). Splice acceptor variant rs201988637 was not included in the credible sets for either MI or coronary atherosclerosis, whereas previously reported common variant rs8042271 was included in the credible set for MI with the probability of being causal = 0.003 (Supplementary Table 6).
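The idea of a 95% credible set can be sketched as follows: rank the variants of one signal by their posterior probability of being causal and keep adding variants until the cumulative probability reaches 95%. This is a simplified single-signal illustration, not the procedure implemented in FINEMAP, which works with posterior probabilities over causal configurations. The variant labels and probabilities below are hypothetical.

```python
def credible_set(causal_probabilities, coverage=0.95):
    """Smallest set of top-ranked variants whose normalized probabilities reach `coverage`.

    causal_probabilities: dict mapping variant ID to the posterior probability
    of being causal within one signal (normalized here to sum to 1).
    """
    total = sum(causal_probabilities.values())
    ranked = sorted(causal_probabilities.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for variant, probability in ranked:
        selected.append(variant)
        cumulative += probability / total
        if cumulative >= coverage:
            break
    return selected

# Hypothetical signal with four variants
example_set = credible_set({"var_a": 0.60, "var_b": 0.30, "var_c": 0.06, "var_d": 0.04})
```

A variant with a small causal probability, such as 0.003, can still enter a credible set when many low-probability variants are needed to reach the coverage target.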

Protein modeling

We predicted the impact of the insertion variant rs534125149 on the protein structure of MFGE8 using AlphaFold19. The predicted conformational changes were localized to a loop region within the C2 domain, ~20 Å away from the key amino acids involved in membrane binding (Supplementary Fig. 14)20,21. This loop contains Asn238, which is known to be glycosylated22. It is possible that the insertion of an additional asparagine may lead to impaired glycosylation, which is important for protein folding, among other cellular processes23. The role of this region in the function of MFGE8 has not been previously described and it is therefore unclear how this variant would otherwise affect MFGE8 function. Thus, further experimental work is necessary to understand the mechanism by which this variant leads to protection against coronary atherosclerosis.

Discussion

Here, we show that a Finnish enriched inframe insertion in MFGE8 is associated with substantially lower risk of diseases representing major CHD, including myocardial infarction and coronary atherosclerosis. This variant was associated with CHD specifically, and no significant association with other diseases was observed in a phenome-wide search, although this may be due to lower statistical power in rare disease endpoints. Splice acceptor variant rs201988637 in MFGE8 was also associated with lower pulse pressure, but not with blood lipids, blood pressure or other known coronary heart disease risk factors.

Our findings allow us to draw several conclusions. First, MFGE8 is a potential intervention target with specific effects on coronary heart disease. The specific protective association between the variants in MFGE8 and CHD shows potential for efficacy of a treatment targeting MFGE8 protein or downstream products. Second, the lack of risk elevation in other diseases provides evidence on the potential safety of the intervention. Previously, the protective effect of loss-of-function variants has been reported for example for PCSK95 and APOC36, and in phase I, II and III trials, inhibition of PCSK9 has led to significantly decreased LDL-C levels, and in short-term trials, PCSK9 inhibitors have been well-tolerated and have had a low incidence of adverse effects24. Based on the phenome-wide association profile for the splice acceptor variant rs201988637, we hypothesize that inhibiting MFGE8 could lower the CHD risk, if the variant can be proved to be loss-of-function in MFGE8.

The association of the splice acceptor variant rs201988637 in MFGE8 with lower pulse pressure, a potential biomarker for arterial stiffness25, is very much in line with previous studies on MFGE8 and the inflammatory aging process of the arteries, highlighting the possible role of MFGE8 in arterial aging and stiffness. The MFGE8 gene encodes Milk-fat globule-EGF 8 (MFGE8), or lactadherin, which is an integrin-binding glycoprotein implicated in vascular smooth muscle cell (VSMC) proliferation and invasion, and the secretion of pro-inflammatory molecules26,27. Lactadherin is known to play important roles in several other biological processes, including apoptotic cell clearance and adaptive immunity28, which are known to contribute to the pathogenesis of ischemic stroke. Initially lactadherin was identified as a bridging molecule between apoptotic cells and phagocytic macrophages29,30,31, but growing evidence has indicated that it is a secreted inflammatory mediator that orchestrates diverse cellular interactions involved in the pathogenesis of various diseases, including vascular metabolic disorders and some tumors32,33,34,35,36 and cancers, such as breast34,37, bladder38, esophageal39 and colorectal cancer40. Recently, not only has MFG-E8 expression emerged as a molecular hallmark of adverse cardiovascular remodeling with age41,42,43,44, but MFG-E8 signaling has also been found to mediate the vascular outcomes of cellular and matrix responses to the hostile stresses associated with hypertension, diabetes, and atherosclerosis45,46,47,48,49.

Arterial inflammation and remodeling are linked to the pathogenesis of age-associated arterial diseases, such as atherosclerosis. Recently, lactadherin has been identified as a novel local biomarker for aging arterial walls by high-throughput proteomic screening, and it has been shown to also be an element of the arterial inflammatory signaling network50. The transcription, translation, and signaling levels of MFG-E8 are increased in aged, atherosclerotic, hypertensive, and diabetic arterial walls in vivo, as well as activated VSMCs and a subset of macrophages in vitro. During aging, both MFG-E8 transcription and translation increase within the arterial walls and hearts of various species, including rats, humans, and monkeys44,51,52,53, and MFG-E8 is markedly up-regulated in rat aortic walls with aging44. High levels of MFG-E8 have also been detected within endothelial cells, SMC, and macrophages of atherosclerotic aortae in both mice and humans49,54. Furthermore, in the advanced atherosclerotic plaques found in murine models, decreased macrophage MFG-E8 levels are associated with an inhibition of apoptotic cell engulfment, leading to the accumulation of cellular debris during the pathogenesis of atherosclerosis. Lactadherin has, however, in contrast shown tissue protection in various models of organ injury, including suppression of inflammation and apoptosis in intestinal ischemia in mice55, as well as inducing recovery from ischemia by facilitating angiogenesis56.

In addition, expression of MFGE8 is highly enriched in tissues relevant to the reported association, such as aorta. Genes near MFGE8, including ABHD2 and HAPLN3, are, however, similarly enriched in arteries18. Therefore, they could play a role in atherosclerosis via a coordinated gene network. In addition, recent studies suggest that a lncRNA called CARMAL may regulate the expression of MFGE857.

Our study does, however, have a few limitations. First, our primary association results come from the Finnish population, in which the allele frequencies of the MFGE8 variants are considerably elevated. Therefore, the replication of the association in other populations has reduced statistical power. However, there were enough carriers combined in Japanese, Estonian and UK samples to replicate robustly both the protective association with coronary heart disease and the association with pulse pressure. Secondly, although our data show association with pulse pressure, which has previously been linked to arterial stiffness, the direct effect of the genetic variants on arterial stiffness and arterial aging needs further evidence. Lastly, with our dataset, we have not been able to demonstrate that the two variants (rs534125149 and rs201988637) in MFGE8 are loss-of-function variants, and thus further experimental work is required to validate our findings.

In conclusion, our results suggest that inhibiting production of lactadherin could substantially reduce the risk for coronary atherosclerosis and thus present MFGE8 as a potential therapeutic target for atherosclerotic cardiovascular disease. Our study also highlights the potential of FinnGen, as a large-scale biobank study in an isolated population, to identify high-impact variants that are either very rare or absent in other populations.

Methods

Study cohort and data

We studied a total of 2 861 disease endpoints in the Finnish biobank study FinnGen (n = 260 405) (Table 3). FinnGen (https://www.finngen.fi/en) is a large biobank study that aims to genotype 500,000 Finns and combine this data with longitudinal registry data, including national hospital discharge, death, and medication reimbursement registries, using unique national personal identification numbers. FinnGen includes prospective epidemiological and disease-based cohorts as well as hospital biobank samples.

Table 3 Basic characteristics of the study cohort.

Definition of disease endpoints

All 2861 disease endpoints analyzed in FinnGen have been defined based on registry linkage to national hospital discharge, death, and medication reimbursement registries. Diagnoses are based on International Classification of Diseases (ICD) codes and have been harmonized over ICD revisions 8, 9, and 10. More detailed lists of the ICD codes used for the disease endpoints myocardial infarction and coronary atherosclerosis, which are discussed more in this study, are in Supplementary Note 1. A complete list of endpoints analyzed and their definitions is available at https://www.finngen.fi/en/researchers/clinical-endpoints.

Genotyping and imputation

FinnGen samples were genotyped with multiple Illumina and Affymetrix arrays (Thermo Fisher Scientific, Santa Clara, CA, USA). Genotype calls were made with GenCall and zCall algorithms for Illumina and AxiomGT1 algorithm for Affymetrix chip genotyping data batchwise. Genotyping data produced with previous chip platforms were lifted over to build version 38 (GRCh38/hg38) following the protocol described here: dx.doi.org/10.17504/protocols.io.nqtddwn. Samples with sex discrepancies, high-genotype missingness (>5%), excess heterozygosity (±4SD) and non-Finnish ancestry were removed. Variants with high missingness (>2%), deviation from Hardy–Weinberg equilibrium (p < 1 × 10−6) and low minor allele count (MAC < 3) were removed.

Pre-phasing of genotyped data was performed with Eagle 2.3.5 (https://data.broadinstitute.org/alkesgroup/Eagle/) with the default parameters, except the number of conditioning haplotypes was set to 20,000. Imputation of the genotypes was carried out by using the population-specific Sequencing Initiative Suomi (SISu) v3 imputation reference panel with Beagle 4.1 (version 08Jun17.d8b, https://faculty.washington.edu/browning/beagle/b4_1.html) as described in the following protocol: dx.doi.org/10.17504/protocols.io.nmndc5e. SISu v3 imputation reference panel was developed using the high-coverage (25–30x) whole-genome sequencing data generated at the Broad Institute of MIT and Harvard and at the McDonnell Genome Institute at Washington University, USA; and jointly processed at the Broad Institute. Variant callset was produced with Genomic Analysis Toolkit (GATK) HaplotypeCaller algorithm by following GATK best practices for variant calling. Genotype-, sample- and variant-wise quality control was applied in an iterative manner by using the Hail framework v0.2. The resulting high-quality WGS data for 3775 individuals were phased with Eagle 2.3.5 as described above. As a post-imputation quality control, variants with INFO score <0.7 were excluded.

Association testing and replication

A total of 260,405 samples from FinnGen Data Freeze 6 with 2861 disease endpoints were analyzed using Scalable and Accurate Implementation of Generalized mixed model (SAIGE), which uses saddlepoint approximation (SPA) to calibrate unbalanced case-control ratios58. Models were adjusted for age, sex, genotyping batch and first ten principal components. All variants reaching the genome-wide significance p-value threshold of 5 × 10−8 were considered genome-wide significant (GWS), and all disease endpoints reaching the multiple testing corrected (for the number of endpoints tested = 2861) p-value threshold of 0.05/2861 = 1.75 × 10−5 were considered phenome-wide significant (PWS).
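The two thresholds defined above reduce to simple arithmetic, sketched here for concreteness. The constant and function names are illustrative choices, not identifiers from the study's analysis code.

```python
N_ENDPOINTS = 2861      # number of disease endpoints tested
ALPHA = 0.05            # family-wise error rate before correction
GWS_THRESHOLD = 5e-8    # genome-wide significance threshold

# Bonferroni correction for the number of endpoints: roughly 1.75e-5
PWS_THRESHOLD = ALPHA / N_ENDPOINTS

def is_phenome_wide_significant(p_value):
    """True if an endpoint association passes the Bonferroni-corrected threshold."""
    return p_value < PWS_THRESHOLD

def is_genome_wide_significant(p_value):
    """True if a variant association passes the genome-wide threshold."""
    return p_value < GWS_THRESHOLD
```

For example, the p-value of 7.94 × 10−06 reported for rs201988637 and coronary atherosclerosis passes the phenome-wide threshold but not the genome-wide one.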

Independent GWS loci for atherosclerosis were defined by adding ±0.5 Mb around each variant that reached the genome-wide significance threshold and merging overlapping regions. The publicly available summary statistics from CARDIoGRAMplusC4D, a large meta-analysis of CHD involving 60,801 cases and 123,504 controls3, were used for assessing whether the locus has been previously reported to associate with CHD. In addition, NHGRI-EBI GWAS Catalog10 was used for assessing whether the locus has been previously reported to associate with any CVD-related endpoint or traditional risk factor for CVD, such as blood lipids, BMI and blood pressure. All loci that had not been reported to associate with CVD were fine-mapped using FINEMAP59 to determine the credible sets in each signal, and meta-analyzed across the cohorts (UKBB, EstBB and BBJ) where available to test their novelty.
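The locus definition described above (a ±0.5 Mb window around each GWS variant, with overlapping windows merged) can be sketched as a standard interval-merging routine. The function name and the example positions are hypothetical, and chromosomes are assumed to be given as strings.

```python
def merge_loci(gws_variants, flank=500_000):
    """Merge overlapping windows of +/- `flank` bp around GWS variants.

    gws_variants: iterable of (chromosome, position) tuples.
    Returns a list of (chromosome, start, end) tuples, one per locus.
    """
    windows = sorted(
        (chrom, max(0, pos - flank), pos + flank) for chrom, pos in gws_variants
    )
    loci = []
    for chrom, start, end in windows:
        if loci and loci[-1][0] == chrom and start <= loci[-1][2]:
            # Overlaps the previous window on the same chromosome: extend it
            loci[-1][2] = max(loci[-1][2], end)
        else:
            loci.append([chrom, start, end])
    return [tuple(locus) for locus in loci]

# Hypothetical positions: the first two variants fall into one locus
example_loci = merge_loci(
    [("15", 1_000_000), ("15", 1_600_000), ("15", 5_000_000), ("6", 2_000_000)]
)
```

Counting the merged windows then gives the number of distinct loci, which is how a set of 2 302 significant variants can collapse into a few dozen signals.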

In Corogene60 (n = 5300), a sub-cohort of FinnGen where participants have been collected as patients with coronary heart disease (CHD) and other related heart diseases, we tested the association of rs534125149 with sub-types of coronary heart disease: acute coronary syndrome, stable coronary heart disease (CHD) and MINOCA61 (myocardial infarction no coronary artery occlusion), by which we refer to patients that have had symptoms, ECG changes and cardiac enzyme or troponin release suggesting acute coronary syndrome, but did not have coronary stenosis. The acute coronary syndrome was further divided into unstable angina pectoris, non-ST segment elevation myocardial infarction (NSTEMI) and ST segment elevation myocardial infarction (STEMI). Associations were tested by calculating risk ratios (RR) for carriers vs. non-carriers of rs534125149, always using the non-CHD group as controls and excluding the other tested groups from the analysis. p-values were calculated using χ2-test, and p-values < 0.05 were considered significant.
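A risk ratio for carriers vs. non-carriers and a χ2-test on the corresponding 2 × 2 table can be sketched as below. This is a generic version under stated assumptions: it uses the Pearson statistic without continuity correction (the text does not say which variant of the test was applied), and the counts are hypothetical, not Corogene data.

```python
import math

def risk_ratio_chi2(carrier_cases, carrier_controls, noncarrier_cases, noncarrier_controls):
    """Risk ratio and Pearson chi-square test (1 df, no continuity correction)."""
    a, b = carrier_cases, carrier_controls
    c, d = noncarrier_cases, noncarrier_controls
    n = a + b + c + d
    risk_ratio = (a / (a + b)) / (c / (c + d))
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with one degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return risk_ratio, chi2, p_value

# Hypothetical counts: 10% of carriers and 20% of non-carriers are cases
example_rr, example_chi2, example_p = risk_ratio_chi2(10, 90, 20, 80)
```

With these made-up counts the risk ratio is 0.5 and the p-value falls just below 0.05, the significance level used for this analysis.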

Survival analysis

Survival analysis for coronary atherosclerosis and myocardial infarction was performed using GATE62, which accounts for both population structure and sample relatedness and controls type I error rates even for phenotypes with extremely heavy censoring. GATE transforms the likelihood of a multivariate Gaussian frailty model to a modified Poisson generalized linear mixed model (GLMM63,64). To obtain well-calibrated p-values for heavily censored phenotypes, GATE uses the SPA to estimate the null distribution of the score statistic. For coronary atherosclerosis and myocardial infarction, survival time from birth to first diagnosis was analyzed for both rs534125149 and rs201988637. Models were adjusted for age, sex, genotyping batch and the first ten principal components, as in the original GWAS analyses. In addition, a Cox proportional hazards model was used for survival analysis of coronary atherosclerosis and myocardial infarction, using a binary variable (carrier or non-carrier) for either the inframe insertion rs534125149 or the splice acceptor variant rs201988637.

Biomarker analyses

We tested the association of the two MFGE8 variants (rs534125149 and rs201988637) with quantitative measurements of cardiometabolic relevance or known risk factors for CVD in two sub-cohorts of FinnGen, the population-based national FINRISK study65 (n = 26,717) and GeneRISK66 (n = 7239). The associations were tested across 66 quantitative measurements of cardiometabolic relevance in FINRISK, and for 158 sub-lipid species in GeneRISK. In the Young Finns Study (YFS)67 cohort (n = 1934), we tested the association of the two variants with three measurements of arterial relevance (carotid artery distensibility, pulse wave velocity, and pulse pressure).

In addition to the Finnish cohorts described above, we tested the association of the two variants in Estonian Biobank (EstBB)14,68, BioBank Japan (BBJ)12,13, and UK Biobank (UKBB)69 data. In EstBB (n = 51,388–137,722), we tested the associations of both variants with body mass index (BMI), systolic and diastolic blood pressure (SBP and DBP) and pulse pressure (PP). In BBJ, we tested the association of rs534125149 with 17 known quantitative risk factors for CVD. Lastly, in UKBB, we tested the association of rs201988637 with 79 measurements of cardiometabolic relevance. In all of these biomarker analyses, a linear regression model adjusted for age and sex was used, and rank-based inverse normal transformation was applied to all quantitative risk factors prior to analysis. A Bonferroni-corrected p-value threshold for the number of phenotypes tested was used to assess the significance of the resulting associations in each cohort.
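As an illustration of the rank-based inverse normal transformation, the sketch below maps each value to a normal quantile based on its rank. The Blom offset (c = 3/8) and the use of average ranks for ties are assumptions, as the text does not specify these choices. The function name is hypothetical.

```python
from statistics import NormalDist
from typing import List, Sequence


def rank_inverse_normal(values: Sequence[float], c: float = 3.0 / 8) -> List[float]:
    """Rank-based inverse normal transform (assumed Blom offset, ties get average rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    # Convert each rank to a probability in (0, 1), then to a normal quantile.
    return [standard_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]
```

The transformed values keep the ordering of the original measurements but follow an approximately standard normal distribution, which makes effect sizes comparable across biomarkers with different units.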

For biomarkers that showed a significant association in any of the cohorts, we performed a meta-analysis across all cohorts in which the measurement was available. Meta-analysis was performed using the inverse-variance weighted fixed-effects method70,71. A Bonferroni-corrected p-value threshold for the number of traits tested (n = 2) was used to assess the significance of the resulting associations in the meta-analysis.
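The inverse-variance weighted fixed-effects estimate can be sketched as follows, assuming each cohort contributes an effect estimate and its standard error on the same scale. The function name is hypothetical, and the two-sided p-value is taken from a normal approximation.

```python
import math
from typing import Sequence, Tuple


def ivw_fixed_effects(betas: Sequence[float],
                      ses: Sequence[float]) -> Tuple[float, float, float]:
    """Inverse-variance weighted fixed-effects meta-analysis.

    Returns the pooled effect, its standard error and a two-sided p-value.
    """
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, p_value
```

With two traits tested, a pooled association would be called significant at p < 0.05/2 = 0.025.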

Height association

To assess whether the association of rs534125149 with height was due to the MFGE8 gene, we first performed conditional analyses of height, conditioning separately on rs534125149, on the lead variant in the FinnGen height GWAS (rs11630187) and on a previously known height-associated variant in the locus, rs16942341 (ref. 72). Conditioning the height association on rs534125149 had little effect on the association of the lead variant for height (rs11630187) in the region (p-value before conditioning = 5.07 × 10−34 and after conditioning = 1.19 × 10−26), whereas when conditioning on the lead variant for height (rs11630187), the smallest p-value in the region was 1.39 × 10−15 (for variant rs28564751). In addition, conditioning on either the known height-associated variant rs16942341 or the lead variant for height in FinnGen (rs11630187) did not remove the association of rs534125149 with height (p-value before conditioning = 8.04 × 10−13 and after conditioning = 3.14 × 10−12 and 2.75 × 10−5, respectively).

In addition, to assess whether the associations of rs534125149 with atherosclerotic cardiovascular disease and height reflect the same signal, we performed colocalization analysis in the region ±200 kb around rs534125149 using the coloc package in R. The posterior probability of a shared signal (PP4) was 9.22 × 10−13, whereas the posterior probability of two independent signals (PP3) was 1, indicating two independent signals for height and coronary atherosclerosis in the locus.

Identifying causal variants

We used FINEMAP59 on the GWAS summary statistics to identify causal variants underlying the associations for MI (strict definition, i.e., only primary diagnoses accepted) and coronary atherosclerosis. FINEMAP analyses were restricted to a ±1.5 Mb region around rs534125149. We assessed variants in the top 95% credible sets, i.e., the sets of variants encompassing at least 95% of the probability of being causal (causal probability) within each causal signal in the genomic region. Credible sets were filtered out if the minimum linkage disequilibrium (LD, r2) between the variants in the credible set was <0.1, i.e., if the set did not clearly represent one signal.
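The credible-set definition and the LD filter can be illustrated with a simplified sketch. It assumes that per-variant causal probabilities within one signal and pairwise r2 values are already available as plain dictionaries. These inputs and the function names are hypothetical, and the code shows the definitions only, not how FINEMAP computes its output.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def credible_set(posteriors: Dict[str, float], coverage: float = 0.95) -> List[str]:
    """Smallest set of top-ranked variants whose causal probabilities sum to >= coverage."""
    selected: List[str] = []
    cumulative = 0.0
    # Add variants from highest to lowest causal probability until coverage is reached.
    for variant, prob in sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(variant)
        cumulative += prob
        if cumulative >= coverage:
            break
    return selected


def passes_ld_filter(variants: List[str],
                     r2: Dict[FrozenSet[str], float],
                     min_r2: float = 0.1) -> bool:
    """Keep a credible set only if every pair of its variants has r2 >= min_r2."""
    return all(r2[frozenset(pair)] >= min_r2 for pair in combinations(variants, 2))
```

A credible set containing a pair of variants with r2 below 0.1 would be dropped, since such a set is unlikely to represent a single signal.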

Protein modeling

The predicted structure of lactadherin was obtained from AlphaFold19 (https://alphafold.ebi.ac.uk/entry/Q08431). Model confidence for the domain containing the variant of interest was mostly very high, and the predicted domain was structurally similar to the crystal structure of bovine lactadherin21 (PDB ID: 2PQS). The structure of the insertion variant rs534125149 was predicted using the AlphaFold Colab notebook (https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb). Protein structures were visualized using PyMOL73.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.