Introduction

Von Willebrand factor (VWF) is a multifunctional glycoprotein, which is secreted by endothelial cells and released upon endothelial cell activation. VWF initiates the adherence of platelets to the injured vessel wall, and the subsequent platelet aggregation facilitates adequate hemostasis.1, 2

Plasma levels of VWF antigen (VWF:Ag) are characterized by a large interindividual variation and range from 0.60 to 1.40 IU/ml in healthy individuals.3 Various environmental and lifestyle factors affect VWF:Ag levels, but ~60% of the variability in VWF:Ag levels can be explained by genetic factors.4

The necessity of maintaining normal VWF levels in the circulation is illustrated by two clinical manifestations that may occur when VWF falls outside its normal range. High VWF:Ag levels are associated with an increased risk of venous thrombosis and arterial thrombosis.5, 6, 7, 8 Conversely, low VWF:Ag levels are associated with an increased bleeding tendency and are a characteristic of von Willebrand disease (VWD). VWD is the most common inherited bleeding disorder in humans and is caused by a quantitative deficiency of VWF (type 1 and 3 VWD) and/or a qualitative defect of VWF molecules (type 2 VWD).9

Most severe forms of type 1 VWD are caused by dominant-negative family-based variations in the VWF gene (VWF).10, 11 However, in individuals with moderately decreased VWF:Ag levels, VWF variations are often not found and linkage with the VWF locus is rarely seen.10, 11 Hence, it is difficult to differentiate between subjects with physiologically low VWF:Ag levels and subjects with low VWF:Ag levels because of VWD.12, 13 As VWF:Ag levels are strongly genetically determined, more common genetic variations in genes other than VWF are expected to be involved in the occurrence of low VWF:Ag levels and therefore in the etiology of type 1 VWD. We have previously shown that several loci outside the VWF gene are indeed associated with VWF:Ag levels and that the VWF-decreasing alleles are more frequently observed in individuals diagnosed with VWD.13 To identify common genetic loci that are associated with low VWF:Ag levels, which are related to an increased bleeding tendency, we performed a meta-analysis of genome-wide association studies in 11 large population-based cohort studies.

Methods

Study populations

This meta-analysis was conducted in the CHARGE Consortium,14 which includes data from several population-based cohort studies. VWF:Ag measurements were available in four of these: the Rotterdam Study (RS) I and II, the Framingham Heart Study (FHS) and the Atherosclerosis Risk in Communities (ARIC) study. In addition, we included data from seven other studies that had VWF:Ag measurements and genome-wide data available: the British 1958 Birth Cohort (B58C) study, the PROspective Study of Pravastatin in the Elderly at Risk (PROSPER), the Prevention of Renal and Vascular Endstage Disease (PREVEND) study, the Lothian Birth Cohorts 1921 and 1936 (LBC1921, LBC1936), the Vis Croatia Study (CROATIA-Vis) and the Orkney Complex Disease Study (ORCADES) (see Supplementary Tables 1 and 2). The designs of the studies have been described previously.15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25

Genome-wide scans and VWF:Ag measurements were available for analysis in 31 149 individuals. Eligible participants were not using a coumarin-based anticoagulant at the time of VWF:Ag measurement and were of European ancestry by self-report. All studies were approved by their respective institutional review committee. In addition, written informed consent was obtained from all participants, as well as permission to use their DNA for research purposes.

Baseline measurements and VWF measures

Baseline measures of clinical and demographic characteristics were obtained at the time of cohort entry for ARIC, CROATIA-Vis, ORCADES, PROSPER, PREVEND and RS, and at the time of phenotype measurements for B58C, LBC1921, LBC1936 and FHS. Measures were obtained using standardized methods as specified by each study and included measures of height and weight, as well as self-reported treatment of diabetes and hypertension, current alcohol consumption and prevalent cardiovascular disease (history of myocardial infarction, angina, coronary revascularization, stroke or transient ischemic attack). Blood group antigen phenotypes (O and non-O) were reconstructed using genotype data of rs687289:C>T, which is a marker for the O allele.26
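
As an illustration of the blood group reconstruction described above, the following minimal sketch assigns O versus non-O from a diploid genotype at a single tag SNP. It relies on the fact that blood group O requires two copies of the O allele. The function name, the genotype string format and the allele passed as the O-tagging allele in the example call are assumptions made for illustration; they are not taken from the study protocols.

```python
def abo_group(genotype: str, o_allele: str) -> str:
    """Classify a diploid tag-SNP genotype as 'O' or 'non-O'.

    genotype: two allele letters, optionally separated by '/', e.g. 'C/T'.
    o_allele: the allele assumed to tag the O allele (hypothetical input;
              check the allele coding of the actual dataset).
    """
    alleles = genotype.replace("/", "").upper()
    if len(alleles) != 2:
        raise ValueError("expected exactly two alleles, got %r" % genotype)
    # Blood group O is recessive: both chromosomes must carry the O allele.
    return "O" if alleles.count(o_allele.upper()) == 2 else "non-O"


# Illustrative calls only; which allele tags O is an assumption here.
example = [abo_group(g, o_allele="C") for g in ("C/C", "C/T", "T/T")]
```

With imputed data, a dosage close to 2 for the O-tagging allele would play the role of the homozygous genotype in this sketch.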

VWF:Ag was measured in all cohorts using enzyme-linked immunosorbent assays (ELISA) (Supplementary Table 3).

Genotyping

For genotyping, DNA was obtained by phlebotomy in all studies except B58C, which used cell lines. Genome-wide assays of SNPs were conducted independently in each cohort using various Affymetrix and Illumina panels (Supplementary Table 3). Each study conducted genotype quality control and data cleaning, including assessment of Hardy–Weinberg equilibrium and variant call rates. The genotyping assays have been described in detail previously and are summarized in Supplementary Table 3.14

For this analysis, we investigated genetic variation in the 22 autosomal chromosomes.27 Genotypes were coded as 0, 1 and 2 to represent the number of copies of the coded alleles for all chromosomes.27 Each study independently imputed its genotype data to the ≈2.6 million SNPs identified in the HapMap Caucasian (CEU) sample from the Centre d’Etude du Polymorphisme Humain.28, 29, 30 Imputation software (MACH, BIMBAM or IMPUTE) was used to impute unmeasured genotypes from SNPs that passed quality control criteria, based on phased haplotypes observed in HapMap. Imputation results were summarized as an ‘allele dosage’, which was defined as the expected number of copies of the minor allele of that SNP (a continuous value between 0 and 2) for each genotype. Each cohort calculated the ratio of observed to expected variance of the dosage statistics for each SNP. This value, which generally ranges from 0 to 1 (ie, poor to excellent), reflects imputation quality.
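
The dosage and quality metric described above can be sketched as follows. This is a minimal illustration, not the code of any of the imputation programs named in the text. It assumes that the expected variance is the binomial variance 2p(1−p) under Hardy–Weinberg equilibrium, with p estimated from the mean dosage; the function names are hypothetical.

```python
from statistics import fmean, pvariance


def allele_dosage(p_het: float, p_hom: float) -> float:
    """Expected copies of the coded allele: 1 * P(het) + 2 * P(hom)."""
    return p_het + 2.0 * p_hom


def imputation_quality(dosages):
    """Observed dosage variance divided by the variance expected under
    Hardy-Weinberg equilibrium, 2p(1-p) (an assumed definition)."""
    freq = fmean(dosages) / 2.0           # estimated allele frequency p
    expected = 2.0 * freq * (1.0 - freq)  # variance if genotypes were known
    if expected == 0.0:
        return float("nan")               # monomorphic SNP: ratio undefined
    return pvariance(dosages) / expected


# Confidently imputed genotypes keep the full expected variance ...
well_imputed = imputation_quality([0.0, 1.0, 2.0, 1.0])
# ... whereas uninformative dosages shrink towards the mean.
poorly_imputed = imputation_quality([1.0, 1.0, 1.0, 1.0])
```

In this toy example the first ratio is 1 and the second is 0, matching the "excellent" and "poor" ends of the scale.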

Public Repository: Our data are available on the European Genome-phenome Archive (https://ega.crg.eu, accession number: EGAS00001001341).

Statistical analysis

Genotype–phenotype data were analyzed independently by each study. VWF:Ag measurements were used as a dichotomous variable (low versus normal), with low VWF defined as the lowest 5% within each blood group, that is, within blood group O and within blood group non-O. All studies used logistic regression with an additive genetic model adjusted for age and sex to analyze the association of all directly genotyped and imputed SNPs with dichotomous VWF:Ag measures. FHS used generalized estimating equations to account for familial correlation. ARIC and PROSPER additionally adjusted for field site. B58C adjusted for sex, date and time of sample collection, postal delay and the nurse who performed the inclusion, which also adjusts for the region of residence. Age adjustment was not necessary in B58C, as all cohort members were born in the same week.
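
The blood-group-specific definition of low VWF can be illustrated with a short sketch. It flags the lowest 5% of VWF:Ag values separately within each blood group, so that group O participants are not over-represented among the cases simply because their levels are lower on average. The function name, the rounding of the group size and the handling of ties are assumptions; the text does not specify them.

```python
def flag_low_vwf(levels, groups, fraction=0.05):
    """Return 0/1 flags; 1 marks a level in the lowest `fraction`
    of its own blood group ('O' or 'non-O')."""
    flags = [0] * len(levels)
    by_group = {}
    for i, group in enumerate(groups):
        by_group.setdefault(group, []).append(i)
    for indices in by_group.values():
        ranked = sorted(indices, key=lambda i: levels[i])
        n_low = int(len(ranked) * fraction)  # assumed: round down
        for i in ranked[:n_low]:
            flags[i] = 1
    return flags


# Toy data: 20 group O participants with lower levels, 20 non-O with higher.
levels = [0.50 + 0.01 * i for i in range(20)] + [1.00 + 0.01 * i for i in range(20)]
groups = ["O"] * 20 + ["non-O"] * 20
flags = flag_low_vwf(levels, groups)
```

Each group contributes one case here. A pooled 5% cutoff would instead have selected the two lowest group O values.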

An inverse-variance weighted meta-analysis was performed using METAL software (http://www.sph.umich.edu/csg/abecasis/Metal/index.html) with genomic control correction being applied at the cohort level.31
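
For readers unfamiliar with the approach, a minimal sketch of an inverse-variance weighted (fixed-effect) meta-analysis for one SNP is given below. It is not METAL itself. The genomic control step is implemented here by inflating each cohort's standard errors by the square root of its inflation factor λ when λ exceeds 1, which is an assumption about how the cohort-level correction is applied; function names are hypothetical.

```python
import math


def genomic_control_se(se: float, lam: float) -> float:
    """Inflate a standard error by sqrt(lambda); no change if lambda <= 1."""
    return se * math.sqrt(max(lam, 1.0))


def inverse_variance_meta(betas, ses):
    """Pool per-cohort log odds ratios with weights 1 / SE^2.

    Returns (pooled beta, pooled SE, z-score, two-sided P-value)."""
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return beta, se, z, p


# Two hypothetical cohorts with the same estimate and precision.
beta, se, z, p = inverse_variance_meta([0.2, 0.2], [0.1, 0.1])
odds_ratio = math.exp(beta)
```

Pooling two equally precise cohorts leaves the estimate unchanged and shrinks the standard error by a factor of √2, which is why combining all cohorts in one discovery panel increases power.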

The a priori threshold of genome-wide significance was set at a P-value of 5.0 × 10−8. When more than one SNP clustered at a locus, the SNP with the smallest P-value was selected to represent the locus.
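
The selection of one representative SNP per locus can be sketched as follows. The text does not state how clustering was defined, so the 1-Mb distance used to group neighboring significant SNPs is purely an assumption, as are the function name and the SNP identifiers in the example.

```python
def lead_snps(hits, threshold=5.0e-8, window=1_000_000):
    """Pick the smallest-P SNP per cluster of significant SNPs.

    hits: iterable of (snp_id, chromosome, position, p_value).
    window: maximum gap in base pairs between neighboring SNPs of one
            locus (hypothetical value, not taken from the study)."""
    significant = sorted(
        (h for h in hits if h[3] < threshold), key=lambda h: (h[1], h[2])
    )
    loci = []
    for hit in significant:
        last = loci[-1][-1] if loci else None
        if last and last[1] == hit[1] and hit[2] - last[2] <= window:
            loci[-1].append(hit)   # same chromosome and close by
        else:
            loci.append([hit])     # start a new locus
    return [min(locus, key=lambda h: h[3]) for locus in loci]


# Hypothetical SNPs: two cluster on chromosome 9, one is not significant.
hits = [
    ("snpA", 9, 100_000, 1e-20),
    ("snpB", 9, 150_000, 1e-30),
    ("snpC", 12, 500_000, 1e-9),
    ("snpD", 12, 9_000_000, 1e-5),
]
leads = [h[0] for h in lead_snps(hits)]
```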

Results

For this meta-analysis, 31 149 participants of European ancestry were included. The sample size and participant characteristics of each cohort are displayed in Supplementary Table 1. The mean age ranged from 45 years in B58C to 87 years in LBC1921, and on average 48% of the participants were female.

A quantile–quantile plot of the observed P-values from the meta-analysis against the expected P-value distribution is shown in Figure 1. Figure 2 illustrates the primary findings from the meta-analysis and presents P-values for each of the interrogated SNPs across the 22 autosomal chromosomes. A total of 97 SNPs exceeded the genome-wide significance threshold of 5 × 10−8 and clustered around five genetic loci on four different chromosomes (Table 1 and Figure 3). The SNP with the strongest signal was rs8176704:A>G, which is located at 9q34 (intron) in the ABO blood group gene (P=2.4 × 10−64). The odds ratio (OR) for having VWF levels in the lowest 5% was 2.83 (95% CI 2.52; 3.18). In addition, we performed a conditional analysis, which identified three independent signals at 9q34: rs579459 and rs8176747 remained independently significant after taking into account the LD structure and their correlation with rs8176704. The second most significant locus was marked by rs216303:T>C, which is located at 12p13 (intron) in the VWF gene (OR 0.57; 95% CI: 0.51; 0.64, P=5.3 × 10−22). The third genome-wide significant signal, at chromosomal position 6q24 (intron), was within STXBP5 (syntaxin binding protein 5). In this region, rs1221638:A>G was associated with the smallest P-value (5.8 × 10−10; OR 1.28; 95% CI: 1.19; 1.39). The fourth statistically significant signal was marked by rs4981022:A>G, which is located at 12q23 (intron) in STAB2 (stabilin-2) (OR 0.79; 95% CI: 0.73; 0.85, P=1.2 × 10−8). The final genome-wide significant locus was marked by rs17057285:A>C (OR 0.41; 95% CI: 0.30; 0.56, P=2.6 × 10−8), which is 200 kb upstream from UFM1 (ubiquitin-fold modifier 1). Two SNPs lie close to rs17057285. The first is rs17057209, which is 52 kb away from rs17057285 and in complete LD with it (R2=1). Both of these SNPs are missing in 5 (CROATIA-Vis, ORCADES, PREVEND, LBC1921, LBC1936) of the 11 studies that contributed to the meta-analysis. The second is rs7323793, which is 67 kb away and in partial LD with rs17057285 (R2=0.496). Rs7323793 is missing only in the PREVEND study.

Figure 1

Quantile–quantile plot of the observed and expected distribution of P-values for all ~2.6 million SNPs and their association with low VWF levels based on meta-analyzed data.

Figure 2

−Log10 P-values for each of the ~2.6 million tests performed as part of the GWA analysis of low VWF levels. The gray dashed horizontal line marks the 5 × 10−8 P-value threshold of genome-wide significance.

Table 1 Genome-wide significant association of five loci with low VWF levels

Figure 3

Regional plots of top marker loci associated with low VWF levels. (A–E) The association P-values (−log10 transformed, indicated by the left y axis) for SNPs in a 60-kb region of each of the five loci (ABO, VWF, STXBP5, STAB2, UFM1) are plotted against their chromosome positions (NCBI build 3) on the x axis. The top SNPs are presented as a large red diamond and neighboring variants are presented in different colors according to linkage disequilibrium in HapMap Caucasian data: red: 1≥r2>0.8; orange: 0.8≥r2>0.6; yellow: 0.6≥r2>0.3; green: 0.3≥r2>0.1; blue: 0.1≥r2>0.05; light blue: 0.05≥r2>0.0. The gray line marks the threshold of genome-wide significance (P=5 × 10−8). Shown in light blue are the estimated recombination rates in HapMap, with values indicated by the right y axis. Regional genes and their direction of transcription are depicted with green arrows. The full colour version of this figure is available at EJHG Journal online.

In addition to our five genome-wide significant loci, five other loci demonstrated multiple-SNP hits with P-values below 1.0 × 10−6: rs10848820:A>G (P=1.2 × 10−7) within TSPAN9 (tetraspanin 9), rs4276643:T>C (P=3.4 × 10−7) within SCARA5 (scavenger receptor class A, member 5), rs17398299:A>C (P=4.1 × 10−7) close to one gene, LPHN2 (latrophilin 2), rs5995441:T>C (P=8.3 × 10−7) within CARD10 (caspase recruitment domain family, member 10) and rs3750450:T>G (P=9.6 × 10−7) within EPB41L4B (erythrocyte membrane protein band 4.1 like 4B).

Discussion

In this meta-analysis of GWA data from 11 population-based cohorts comprising 31 149 individuals of European ancestry, we identified five genetic loci that are associated with low VWF levels: ABO, VWF, STXBP5, STAB2 and UFM1.

The most significant signal in our study came from a well-known determinant of VWF:Ag levels, the ABO locus. The presence of blood group A and B antigens on VWF molecules leads to a decreased clearance of VWF molecules. Consequently, individuals with blood group O have 25% lower VWF plasma concentrations than individuals with blood group non-O.32 Although we used separate cutoff points for low VWF levels in blood group O and non-O to minimize the effect of blood group, the ABO locus still reached a very high level of statistical significance. This implies that blood group O versus non-O does not explain the total ABO locus effect, and that the A and B antigens themselves also determine VWF levels. Indeed, carriers of the B antigen have higher VWF levels than carriers of the A antigen, and carriers of both antigens have the highest VWF levels.33, 34

The second locus is within the VWF gene. It has been well established that common genetic polymorphisms in the VWF gene contribute to the variability in VWF:Ag levels.35, 36, 37 The most significant SNP that marked the VWF locus was rs216303:T>C, which is located within an intronic region. Until recently, intronic polymorphisms were often considered less relevant for disease development and regulating protein levels in plasma. However, there is now an increasing recognition that intronic variants can contribute by, for example, influencing the form and efficacy of gene splicing and mRNA stability.37 Another possibility is that SNPs in the intronic regions are in high LD with functional SNPs in adjacent regions.

The third locus is within the STXBP5 gene, which encodes the syntaxin binding protein 5. STXBP5 can bind to Soluble N-ethylmaleimide-sensitive factor (NSF) Attachment protein Receptor (SNARE) proteins, including syntaxin-2 and syntaxin-4. Syntaxin-4 has been shown to be involved in Weibel–Palade body exocytosis,38 the well-known mechanism for the secretion of VWF molecules from endothelial cells. We have previously shown in a well-defined cohort of young patients with a first event of arterial thrombosis that genetic variation in STXBP5 is associated with VWF:Ag levels.13, 39 The LD between rs1221638:A>G and the SNP that had the highest significance in the previous meta-analysis is D′=0.90 and R2=0.67.

The fourth locus was marked by rs4981022:A>G, which is located in STAB2. STAB2 is a transmembrane receptor protein and is primarily expressed in liver and spleen sinusoidal endothelial cells. STAB2 can bind various ligands, such as heparin, LDL, bacteria and advanced glycosylation products, and subjects them to endocytosis.40 STAB2 variation might be important in the regulation of VWF levels via the clearance of VWF molecules.

The final genome-wide significant locus was marked by rs17057285:A>C, which is upstream from UFM1. UFM1 encodes the ubiquitin-fold modifier 1, which has recently been identified as a novel protein-conjugating system.41 Although its precise function has not yet been elucidated, the UFM1 cascade seems to be involved in cellular homeostasis, influencing cell division, growth and endoplasmic reticulum function.42 UFM1 is highly expressed in the pancreatic islets of Langerhans and has a role in the development of type 2 diabetes. Another study showed possible involvement in the development of ischemic heart disease. In that study, chronic inflammation in mice led to a strong upregulation of UFM1 in cardiomyocytes.43 As VWF:Ag levels have also been associated with an increased risk of ischemic heart disease, this is an interesting finding. However, UFM1 has not yet been linked to VWF directly, and this novel association needs replication.

Four of the identified loci for low VWF:Ag levels (ie, ABO, VWF, STXBP5 and STAB2) have previously been shown to be involved in the regulation of VWF:Ag levels in general.44 The other genetic loci newly identified for continuous VWF:Ag levels (ie, SCARA5, STX2, TC2N and CLEC4M) were not associated with low VWF levels.

UFM1 is a novel genetic locus associated with low VWF levels that was not associated with continuous VWF:Ag levels. Rs17057285:A>C, the SNP with the smallest P-value at this locus, has a very low minor allele frequency of ~0.5%. Therefore, this finding should be interpreted with care.

In today’s clinical practice, it is hard to distinguish between physiologically low VWF:Ag levels and low VWF:Ag levels due to VWD, because VWF levels and bleeding symptoms are both highly variable and bleeding symptoms occur frequently in the general population.45 Until recently, it was believed that low VWF:Ag levels and VWD are caused by variations in the VWF gene only. However, it has now been shown that 35% of patients with type 1 VWD (partial quantitative deficiency of VWF) have no apparent VWF variations.10, 11 This suggests that genetic variations in genes other than VWF may lead to low VWF:Ag levels, also in patients diagnosed as having VWD.13 Indeed, our current findings support the hypothesis that, next to ABO blood group and VWF, other genetic loci are involved in the occurrence of low VWF levels.

In the current study, we have not included a replication cohort. Generally, it has been recommended to include all cohorts in the discovery panel to maximize statistical power, rather than use some of the cohorts for replication. In addition, the identified genetic loci have extremely small P-values, and four of the five were previously discovered in the meta-analysis using VWF:Ag as a continuous measure. For these reasons, it is very unlikely that the findings at these four loci are false positives or arose by chance.

In conclusion, we identified five genetic loci that are associated with low VWF levels: ABO, VWF, STXBP5, STAB2 and UFM1. Our findings confirm the hypothesis that genes other than VWF lead to low VWF:Ag levels. Further research is warranted in order to elucidate whether these genetic loci also contribute to the incidence of bleeding symptoms and VWD.