Genome-wide association analysis and replication in 810,625 individuals with varicose veins

Ahmed, Waheed-Ul-Rahman; Kleeman, Sam; Ng, Michael; Wang, Wei; Auton, Adam; Lee, Regent; Handa, Ashok; Zondervan, Krina T.; Wiberg, Akira; Furniss, Dominic

doi:10.1038/s41467-022-30765-y

Download PDF

Article
Open access
Published: 02 June 2022

Genome-wide association analysis and replication in 810,625 individuals with varicose veins

Nature Communications volume 13, Article number: 3065 (2022) Cite this article

4745 Accesses
6 Citations
41 Altmetric
Metrics details

Subjects

Abstract

Varicose veins affect one-third of Western society, with a significant subset of patients developing venous ulceration, costing $14.9 billion annually in the USA. Current management consists of either compression stockings, or surgical ablation for more advanced disease. Most varicose veins patients report a positive family history, and heritability is ~17%. We describe the largest two-stage genome-wide association study of varicose veins in 401,656 individuals from UK Biobank, and replication in 408,969 individuals from 23andMe (total 135,514 cases and 675,111 controls). Forty-nine signals at 46 susceptibility loci were discovered. We map 237 genes to these loci, several of which are biologically plausible and tractable to therapeutic targeting. Pathway analysis identified enrichment in extracellular matrix biology, inflammation, (lymph)angiogenesis, vascular smooth muscle cell migration, and apoptosis. Using a polygenic risk score (PRS) derived in an independent cohort, we demonstrate its predictive utility and correlation with varicose veins surgery.

Genetics of varicose veins reveals polygenic architecture and genetic overlap with arterial and venous disease

Article 09 January 2023

Genome-wide association study of varicose veins identifies a protective missense variant in GJD3 enriched in the Finnish population

Article Open access 18 January 2023

Genome-wide meta-analysis identifies 93 risk loci and enables risk prediction equivalent to monogenic forms of venous thromboembolism

Article 19 January 2023

Introduction

Varicose veins (VVs) are a very common manifestation of chronic venous disease, affecting over 30% of the population in Western countries¹. In the USA, chronic venous disease affects over 11 million males and 22 million females aged 40–80 years², meaning it is twice as prevalent as coronary heart disease³. Chronic venous insufficiency leads to serious complications in 10% of cases, including lipodermatosclerosis, venous ulceration and rarely amputation. Despite best care, 25–50% of venous leg ulcers remain unhealed after six months of treatment⁴. Ongoing management of venous leg ulcers costs around $14.9 billion annually and 4.5 million work days per year are lost to venous-related illness in the USA^5,6. Despite this, at present no medical treatments exist for VVs. For symptomatic patients, endovenous ablation is the first-line treatment approach⁷. However, recurrence following surgery is 20%, with no difference in recurrence compared to conventional open surgery⁸.

VVs are thought to develop from a combination of valvular insufficiency, venous wall alterations, and haemodynamic changes that precipitate venous reflux, stasis and hypertension of the venous network, causing varicosities^9,10. Risk factors for VVs include older age, female sex, pregnancy, a positive family history, obesity, tall height and previous deep-vein thrombosis (DVT)^3,11,12. Many patients with VVs report a positive family history¹³, and among offspring with one affected parent the familial standardised incidence ratio is 2.39¹⁴, with a heritability of 17%¹⁵, suggesting a genetic component to aetiology. Two recent genome-wide association studies (GWAS) of VVs have been described. Ellinghaus et al.¹⁶ tested for associations in 323 cases and 4619 controls, with suggestive associations examined in an independent cohort totalling 1946 cases and 3146 controls. They reported two associations, mapped to EFEMP1 and KCNH8. Fukaya et al.¹⁷ used UK Biobank to identify a further 30 putative associations associated with VVs. However, their cases were defined only by the International Classification of Diseases (ICD) diagnostic codes, meaning that thousands of cases defined by operation codes were misclassified as controls¹⁸. Moreover, the genetic associations discovered were not replicated in an independent cohort.

To advance substantially our understanding of the aetiology and genetic architecture of VVs, we performed the largest genome-wide association study (GWAS) of surgically confirmed VVs. We use data from participants from the UK Biobank (n = 401,656) and confirm top associations in a large independent cohort of research participants with self-reported VVs from 23andMe (n = 408,969). We examine patterns of expression of genes in clinically relevant biologic pathways and prioritise targets for therapeutic development. Furthermore, we derive a polygenic risk score (PRS) for VVs in an independent cohort (FinnGen), and demonstrate its predictive utility in UK Biobank.

Results

The overall analytic workflow is shown in Fig. 1.

**Fig. 1: Study design and analysis workflow.**

Association analysis

Genome-wide testing of the UK Biobank discovery cohort (22,473 cases and 379,183 controls) yielded genome-wide significant associations (P < 5 × 10⁻⁸) at 108 risk loci (Supplementary Data 1). Conditional regression yielded a further seven independent signals at six of the 108 loci associated significantly with VVs. As expected, the large discovery sample led to the λ_GC showing inflation (1.25) (Supplementary Fig. 1). The linkage equilibrium score regression (LDSC) intercept (1.06), and attenuation ratio of 0.13 is consistent with polygenicity¹⁹. We estimated the total SNP heritability (h²_g) for VVs in UK Biobank to be 8.04% (S.E. = 0.17%) using a variance components method, and 5.03% (S.E. = 0.30%) using LDSC¹⁹. The estimated heritability in the 23andMe dataset (113,041 cases and 295,928 controls) was 5.40% (S.E. = 0.30%).

We tested the top 116 signals at the 108 risk loci in the independent 23andMe dataset. Of the 116 associated variants, 106 passed QC and were available in the 23andMe summary statistics. Forty-nine of 116 variants demonstrated significant association at a conservative Bonferroni-corrected threshold of P < 4.72 × 10⁻⁴. Thus, we identified 49 significant associations at 46 independent risk loci (Fig. 2; regional association plots for all 49 signals can be found in Supplementary Fig. 2). Allelic effects were concordant across both cohorts at all 49 replicated variants, with minimal evidence of heterogeneity between the two GWAS at all loci (Q-statistic > 0.05). Eighteen loci were previously reported (Supplementary Data 2). Sixty-seven variants at 67 loci did not replicate (Supplementary Data 3); however, there was high overall concordance in the effect sizes of the 106 SNPs between the two datasets (Pearson’s r² = 0.84, P < 1 × 10⁻¹⁶; Supplementary Fig. 3A).

**Fig. 2: Results of genome-wide association study in varicose veins.**

In silico annotation

To interrogate and annotate SNPs at our susceptibility loci, we used FUMA (Functional Mapping and Annotation of GWAS)²⁰. FUMA identified 5315 genome-wide significant candidate SNPs from our discovery cohort associated with VVs at 45 of the 46 replicated loci (Supplementary Fig. 4; Supplementary Data 4). Around 2% of candidate SNPs (n = 103) were exonic, of which 56 were non-synonymous (52 missense, two stop-gain, one splice site variant, and one frameshift variant). Of the non-synonymous variants, four missense variants were predicted to affect protein structure or function, and were in moderate linkage disequilibrium (LD) (r² ≥ 0.22 and Dʹ ≥ 0.75) with the index SNP at three loci—12q13.12 (index SNP: rs7308356), 16q24.3 (index SNP: rs2002833) and 17q24.3 (index SNP: rs9895127) (Supplementary Data 5). This includes rs7184427 (A/G) (P = 9.10 × 10⁻⁴⁰, OR = 1.19, r²_index = 0.22, Dʹ_index = 0.75), which causes a predicted deleterious p.Val250Ala substitution within PIEZO1 (SIFT²¹ score: 0). PIEZO1 encodes a mechanically active ion channel involved in the detection of vascular shear stress, and was previously associated with VVs and lymphoedema^17,22.

Of the 4294 intronic and intergenic variants identified by FUMA (Supplementary Fig. 4), 3735 (87.0%) resided in open chromatin regions (Supplementary Data 4), and 163 demonstrated evidence of functionality with CADD (Combined Annotation-Dependent Depletion²³) score ≥12.37, the threshold suggested for deleterious variants (Supplementary Data 6). Using RegulomeDB (RDB²⁴) to investigate their regulatory functions, 17 had a RDB score of at least 2b (likely to affect binding) and eight had an RDB score of at least 1f (likely to affect binding and linked to expression of a gene target).

To narrow down the list of probable functional variants, we performed functionally informed fine-mapping²⁵ for each replicated locus. Eighteen SNPs were identified with posterior probability >95% (Supplementary Table 1), including two likely functional variants in PIEZO1 (rs112070238 and rs8053350), affecting the 5ʹUTR and promoter regions, respectively.

Gene mapping

Positional mapping in FUMA SNP2GENE highlighted 204 genes based on genomic position at 38 loci (Supplementary Data 7)²⁰. eQTL mapping, based on GTEx v8 tibial artery tissue²⁶, mapped a total of 80 genes. Genome-wide, gene-based association analysis implemented in MAGMA v1.07²⁷ identified 248 protein-coding genes significantly associated with VVs (P < 2.67 × 10⁻⁶); 117 were within our replicated loci (Supplementary Data 8; Supplementary Fig. 5).

In the summary-based Mendelian randomisation (SMR) analysis²⁸, testing was performed for 4946 probes with a cis-eQTL at P < 5 × 10⁻⁸; a threshold of significance was set at P_SMR < 1.01 × 10⁻⁵ (0.05/4,946). Forty-four genes passed the set significance threshold (P_SMR < 1.01 × 10⁻⁵). To exclude SMR associations due to linkage, we performed HEIDI analysis across 44 significant genes −25 passed the HEIDI test (P_HEIDI ≥ 1.12 × 10⁻³; Supplementary Data 9), 12 of which were within our VVs susceptibility loci, highlighting an association with VVs through pleiotropy rather than LD and co-localisation.

In summary, 237 unique genes were mapped by at least one mapping approach to 39 of the 46 susceptibility loci. Significant overlap between the mapping strategies was seen, with the majority of genes (54.9%, n = 130) being prioritised by two or more mapping approaches. Thirty-six genes were prioritised by three mapping approaches, and six genes (ATF1, AP1M1, DNAH10OS, FBLN7, LBH, WDR92) were prioritised by all four approaches.

Gene set, tissue-specific and pathway enrichment

To delineate gene sets and enriched pathways where the 237 prioritised genes converged, gene-set analysis was conducted in MAGMA v1.07²⁷. Following MAGMA gene-set enrichment analysis, four Gene Ontology (GO) gene sets were significantly over-represented in our data: cardiovascular development (P = 1.56 × 10⁻⁸, n = 666); tube morphogenesis (P = 9.35 × 10⁻⁸, n = 778); blood vessel morphogenesis (P = 9.39 × 10⁻⁷, n = 555); and tube development (P = 1.68 × 10⁻⁶, n = 956) (Supplementary Table 2). Further, tissue-specific gene property analysis demonstrated significant gene expression in all three vascular tissue types present in GTEx 54 tissue types: coronary artery (P = 6.23 × 10⁻⁷, 2nd most enriched), tibial artery (P = 1.05 × 10⁻⁶, 3rd most enriched) and aorta (P = 3.92 × 10⁻⁵, 8th most enriched; Supplementary Fig. 6). MAGMA analysis of GTEx 30 general tissue types also demonstrated blood vessels to be a highly enriched tissue (P = 3.8 × 10⁻⁴, 3rd most enriched). Using eXploring Genomic Relations (XGR) analysis²⁹, six canonical pathways were significantly enriched. This included enrichment for genes in pathways related to extracellular matrix biology, the VEGF and VEGFR signalling network, and intracellular Ca²⁺ signalling in the T-cell receptor (TCR) Pathway (Table 1).

Table 1 Gene-based enrichment analysis.

Full size table

Polygenic risk score analyses

In order to derive a polygenic risk score (PRS) without overfitting to the UKB cohort, we utilised VV summary statistics from the FinnGen cohort (17,027 cases, 190,028 controls), which were highly concordant with the UK Biobank GWAS (Pearson’s r² = 0.95, P < 1 × 10⁻¹⁶; Supplementary Fig. 3B). Using multivariable logistic regression, we found that the FinnGen-derived PRS was an independent predictor for varicose vein case/control status (Fig. 3 and Supplementary Table 3). Individuals in the top PRS decile had markedly increased odds of VV diagnosis (OR 4.57, 95% CI 4.27–4.92, P < 1 × 10⁻³⁰⁰) compared to the bottom PRS decile. We hypothesised that VV cases who underwent surgery are phenotypically more severe and so would have a greater genetic liability than non-surgical VV cases. In a subgroup analysis of UK Biobank VV cases defined by self-reported diagnosis, ICD-10 or OPCS-4 code, history of surgery was associated with significantly increased PRS (0.407 vs. 0.330; P = 5.5 × 10⁻⁴), suggesting that the VV-PRS captures disease severity.

**Fig. 3: Logistic regression for varicose vein case/control status in UK Biobank against VV-polygenic risk score (VV-PRS) decile.**

Phenome-wide VV associations

To uncover associations between VV and other phenotypes, we first performed genetic correlation analysis using LD score regression¹⁹. Of the 176 traits from nine categories tested for genetic correlation, twelve traits from two categories (anthropometric and autoimmune) met our Bonferroni-corrected significance threshold (P < 5.56 × 10⁻³; Supplementary Table 4). All twelve significant traits were positively correlated with VVs (r_g range: 9–21%). Eleven traits belonged to the anthropometric category and pertained to height and weight phenotypes, which are established risk factors for VVs^3,11,12.

In the autoimmune trait category, we discovered systemic lupus erythematosus (SLE) to be correlated with VVs, sharing ~19% genetic overlap (P = 4.2 × 10⁻³, r_g = 0.195). Of note, the C allele of variant rs17321999 at 2p23.1 (LBH), which is associated with an increased risk of SLE (P = 2.22 × 10⁻¹⁶, OR = 1.20)³⁰, was also significantly associated with VVs in our discovery cohort (P_disc = 3.20 × 10⁻¹⁴, OR = 1.09), and in high LD with lead SNP at 2p23.1 in the meta-analysis (rs9967884; P_meta = 1.40 × 10⁻¹⁴, OR = 1.11; r² = 0.87). We used the Open Targets Genetics platform³¹ to confirm the associations between VVs and SLE at the variant level, using a phenome-wide association study (PheWAS) approach (Supplementary Data 10); three of our 49 replicated lead variants (rs4849044, rs7773004, rs61863928) demonstrated an association with SLE.

As a complementary approach, we leveraged the FinnGen-derived VV-PRS to perform a phenome-wide polygenic risk score analysis using a set of curated time-to-event phenotypes (n = 694) in UK Biobank. Eighteen phenotypes met the phenome-wide significance threshold (P = 1 × 10⁻⁵) in the Cox regression (Supplementary Table 5). The most significantly associated phenotypes were “Phlebitis and thrombophlebitis”, “Unspecified monoarthritis”, “Obesity”, “Postphlebitic syndrome”, “Pulmonary embolism and infarction” and “Umbilical hernia”. To investigate whether these phenotypes are causally associated with VV, we performed bidirectional Mendelian randomisation using genetic instruments derived from our UK Biobank GWAS. Using the inverse-variance weighted (IVW) method as the primary analysis, we found evidence for raised body mass index being causal in VV, with no evidence for significant directional pleiotropy on MR-Egger sensitivity analysis (Supplementary Table 6). Two further suggestive causal associations were found: VVs (exposure phenotype) on thrombophlebitis risk and hernia (exposure phenotype) on VVs risk.

Drug-target enrichment analysis

Finally, we tested the amenability of our associated genes to drug targeting using the Open Targets Platform³². Of the 237 mapped genes, 200 genetic targets were identified³². Forty-two drug pathways reached nominal significance (P < 0.05), with the Butyrophilin family interactions pathway being most enriched (P = 2.6 × 10⁻⁷, six targets), followed by the Calcineurin-NFAT pathway (P = 9.4 × 10⁻⁴, two targets), and the Transcriptional regulation by RUNX1 pathway, which possessed the highest number of targets (P = 1.6 × 10⁻³, 14 targets; Supplementary Data 11). Tractability information for 105 gene targets was available, with 65 of 237 genes predicted to be tractable to antibody targeting to a high confidence, and 26 genes predicted to be tractable to small molecule targeting (Supplementary Data 12). Eight of the 237 genes overlapped with pharmacologically active targets with known pharmaceutical interactions (CDK10, COL27A1, GABBR1, KCNJ2, MAPK10, OPRL1, TNC and VEGFA) (Supplementary Data 13). Of note, VEGFA is a target for several antibody, protein and oligosaccharide agents, which are currently in phase 2, 3 and 4 clinical trials, including for several ocular vascular disorders.

Discussion

VVs cause significant morbidity and large healthcare costs. There is a compelling need for a deeper understanding of the biology of VVs in order to develop new therapeutic strategies. Our analysis represents the largest and most comprehensive GWAS to date of VVs, including 135,514 patients and 675,111 controls. We discovered 28 previously unreported risk loci (29 signals), and independently replicated 18 of 29 previously reported but non-replicated loci (20 of 32 known signals). Our in silico analyses demonstrated strong evidence of functional variants in VV-associated genomic regions. Furthermore, our pathway analyses establish a strong enrichment for genes expressed in the extracellular matrix, immune cell signalling and circulatory system development. We have also identified a previously unreported genetic correlation between VVs and systemic lupus erythematosus (SLE). Several prioritised genes demonstrate the potential for pharmacological targeting, and are currently under active investigation in other diseases. Lastly, we demonstrate the potential utility of a polygenic risk score (PRS) in VVs, demonstrating that genetic risk correlates with a more severe phenotype—a first step in facilitating personalised medicine approaches to management.

Biological drug targets

This study has identified several biological pathways as mediators of VV pathophysiology that may be of relevance to pharmacological targeting. These cluster broadly into five functional categories: extracellular matrix regulation, the immune response, angiogenesis and lymphangiogenesis, smooth muscle cell biology and apoptosis (Supplementary Data 14), and we will briefly examine the three functional categories that produced the most compelling evidence in this study.

Extracellular matrix regulation

VVs demonstrate an increased luminal diameter and intimal hypertrophy, features closely related to disruption of the extracellular matrix (ECM)³³. Deposition of ECM in the perivascular space in VVs is also recognised—possibly a compensatory mechanism to reinforce an already weakened wall³⁴. Venous dilatation and valve ring enlargement seen in VVs is thought to affect the ability of venous valves to co-apt, which in turn contributes to venous reflux and hypertension¹⁰. Indeed, there is a noted imbalance of ECM proteins in VVs, specifically collagen and elastin—with a preponderance of collagen compared to normal vein³⁵. It is therefore possible that disruption in intrinsic connective tissue components of the vein wall or valves may contribute to VVs.

Our prioritised genes significantly overlapped with canonical pathways relating to ECM components, including Collagen Type XXVII Alpha 1 Chain (COL27A1) and EGF-containing Fibulin-like Extracellular Matrix Protein 1 (EFEMP1).

rs753085 (P = 2.17 × 10⁻¹¹, OR = 1.07) is in an intron of COL27A1. COL27A1 is a fibrillar collagen in the extracellular matrices of several tissues, including blood vessels³⁶. COL27A1 expression has been demonstrated to be reduced in VV samples³⁷. Moreover, our drug-enrichment analysis demonstrated COL27A1 to be a pharmacologically active target, with pharmaceutical agents currently being investigated in several clinical trials (Supplementary Data 13), demonstrating its potential candidacy as a therapeutic target for VV prevention or treatment.

In our GWAS, we replicated the previously described association between rs3791679 and VVs (P = 1.59 × 10⁻¹³, OR = 1.08)¹⁶. rs3791679 resides in an enhancer region of EFEMP1, and is believed to be the causal variant at this locus³⁸. EFEMP1 encodes the ECM glycoprotein fibulin-3³⁹, which plays a key role in maintaining the integrity of elastic tissues¹⁶, and is highly expressed in vein endothelial cells. Interestingly, the same A allele of rs3791679 is also associated with abdominal hernia susceptibility⁴⁰; this pleiotropy may in part explain our Mendelian randomisation finding that placed hernias on the causal pathway for VVs.

Fibulin-3 has been found to antagonise vascular development by reducing the expression of the matrix metalloproteases, MMP2 and MMP3, and increasing expression of tissue inhibitors of metalloproteases (TIMPs) in endothelial cells⁴¹. The saphenofemoral junction in VVs demonstrates reduced expression of MMP2, and heightened expression of TIMP1 and MMP1 protein levels⁴². Alterations in expression of these enzymes may therefore result in weakness in the venous wall and predispose patients to VVs. Our drug-enrichment analysis found fibulin-3 to be tractable to antibody targeting to a high confidence; moreover, metformin has been demonstrated to downregulate fibulin-3 through downregulation of MMP2⁴³. Fibulin-3 therefore represents another potential pharmacological target for VVs.

The immune response

Chronic Inflammation in the venous wall has been proposed to play a role in VV aetiopathology³³. When compared to normal veins, enhanced expression of inflammatory mediators has been observed¹⁰. VVs contain increased mast cells, monocytes and macrophages compared to normal veins¹⁰.

We defined five inflammation-associated risk loci in our GWAS. Of particular note is rs78216177 (P_meta = 5.80 × 10⁻¹⁴, OR = 1.10), in an intron of DOCK8 that has roles in both innate and adaptive immune systems. Deletion of DOCK8 is strongly associated with Hyper-IgE syndrome, a type of primary immunodeficiency that affects multiple systems, including the vasculature⁴⁴. Vascular abnormalities in hyper-IgE syndrome include aneurysmal changes and abnormalities in great vessels.

Notably, we also discovered a significant genetic overlap between VVs and SLE, an autoimmune disease, in the genetic correlation analyses, albeit not in the phenome-wide analysis of VV-polygenic risk score. The associations between VVs and SLE were confirmed at the variant level using a PheWAS approach, with three of 49 replicated lead variants (rs4849044, rs7773004, rs61863928) demonstrating an association with SLE. Furthermore, lead variant rs1471251 (P = 8.33 × 10⁻¹¹, OR = 1.06) is a known eQTL of AFF1, which is associated with SLE⁴⁵. Reinforcing this shared genetic basis, a VVs-associated variant at LBH, rs17321999 (P_disc = 3.20 × 10⁻¹⁴, OR = 1.09), was also previously associated with SLE (P = 2.22 × 10⁻¹⁶, OR = 1.20)³⁰.

XGR analysis demonstrated enrichment of “intracellular calcium signalling in the CD4 + T-cell receptor (TCR) pathway” (P = 1.9 × 10⁻³, Z = 3.5), specifically highlighting genes NFATC2 and PPP3R1, which are intimately involved in this pathway. rs3787184 (P_meta = 2.51 × 10⁻³⁶, OR = 1.16) was one of 18 SNPs identified as likely causal variants in our functionally informed fine-mapping (Supplementary Table 1), and resides in a promoter region of NFATC2. rs2861819 (P_meta = 2.65 × 10⁻⁷⁷, OR = 1.20) is in an intergenic region ~19 kb upstream of PPP3R1. PPP3R1 encodes calcium binding B (CnB), a subunit of calcineurin, a Ca²⁺influx-activated serine/threonine-specific phosphatase that interacts with NFAT transcription factors in the regulation of naive T-cell activation⁴⁶. VVs are defined by clustering and infiltration of T lymphocytes³⁴, which are predominantly distributed close to the venous valve agger⁴⁷, a fibroelastic structure located at the base of venous valves where tunica media meets adventitia. Therefore, we hypothesise that aberrant PPP3R1 and NFATC2 expression could alter calcium signalling in T-cells, which may contribute to the valvular pathology seen in VVs. The Calcineurin-NFAT pathway was also the second-most enriched in our drug-target enrichment analysis, and may represent a novel therapeutic approach to VVs.

Angiogenesis and lymphangiogenesis

Disruption in normal angiogenic processes can lead to VVs⁴⁸, potentially because of a failure to develop properly formed venous walls and valves, or to repair defects following vascular stress injury. MAGMA gene-set analysis revealed enrichment of several gene sets relating to tube formation and morphology, with the VEGF/VEGFR signalling network also being enriched in the XGR analysis.

rs11967262 at 6p21.1 (P_meta = 1.45 × 10⁻¹⁹, OR = 1.09) lies in an intergenic region ~7 kb upstream of vascular endothelial growth factor A (VEGFA). VEGFA is a critical regulator of angiogenesis, and is fundamental to maintaining the integrity and functionality of the vessel wall⁴⁹. VEGFA is a selective endothelial mitogen, binding to its receptor VEGFR2 to induce endothelial cell proliferation, migration and differentiation. VEGFA and VEGFR2 expression are significantly enhanced in the wall of VVs compared to normal veins, particularly in VVs complicated by thrombophlebitis⁵⁰; interestingly, our Mendelian randomisation analysis found evidence for a causal role played by VVs in thrombophlebitis risk. Plasma levels of VEGFA have also been demonstrated to be significantly increased in patients with VVs⁵¹. VEGFA also causes vasodilatation, which reduces vessel tone, leading to stasis and the release of oxygen free radicals, which contributes to vein wall weakness⁵². Intriguingly, VEGFA also promotes inflammation via expression of intercellular and vascular cell adhesion molecules, linking angiogenesis to immune dysregulation⁵³. Of note, anti-VEGFA agents are currently being investigated in several clinical trials for the treatment of retinal vein occlusion (Supplementary Data 13). The VEGF axis therefore represents a promising candidate for therapeutic targeting in the treatment of VVs.

Positional and MAGMA mapping highlighted genes at four loci relating to lymphangiogenesis. Indeed, the lymphatic system develops from veins, and its function is intimately related with the venous circulation, draining extracellular fluid back into circulation. It is therefore feasible that similar genetic defects may result in either lymphoedema, varicose veins or a combination of the two conditions. PROX1 is a master inducer gene necessary for the development of lymphatic vasculature⁵⁴, and PROX1knock-out mice are deficient of lymphatic vasculature⁵⁵. During developmental lymphangiogenesis, PROX1 has been shown to be necessary for the formation of lymphovenous valves^56,57, suggesting it may predispose to VV development by causing defects in venous valves. rs340875, (P_meta = 4.22 × 10⁻²⁰, OR = 1.09) is ~2 kb upstream of PROX1, and we also fine-mapped two probable functional variants at this locus (Supplementary Table 1).

Furthermore, PROX1 is co-expressed and functions alongside the transcription factor FOXC2 in lymphatic valve-forming cells at the earliest stage of lymphatic development⁵⁸. Mutations in FOXC2 cause hereditary lymphoedema-distichiasis—a disease characterised by VVs and peripheral lymphoedema—highlighting genetic overlap between the two disorders⁵⁹. A previous analysis also suggested FOXC2 to be implicated in the development of varicose veins in the general population⁶⁰, but we did not find evidence of association between varicose veins and FOXC2 in our study. In fact, the only gene identified in this study known to be associated with a Mendelian disease that results in VVs is PIEZO1, which also predisposes to both VVs and lymphoedema (Supplementary Table 7).

Polygenic risk score

Over two million people in the USA have advanced chronic venous disease², and around 500,000 per year undergo invasive surgical procedures. Our polygenic risk score (PRS) analysis of operated vs unoperated VVs cases represents an important proof-of-principle, demonstrating the feasibility of enhanced prognostication in enabling the subset of VVs patients that will require surgical intervention to be identified. Preventive strategies in high-risk individuals could include prophylactic compression stocking use, or early ablation procedures to mitigate the risk of venous ulceration. The benefits of early endovenous ablation in improving healing of venous leg ulcers has been demonstrated⁶¹. Future research may also enable the identification of those at high risk of recurrence following surgery, a significant problem in current management of VVs patients.

Strengths and limitations

Certain limitations of this study must be acknowledged. While the discovery GWAS in UK Biobank used a combination of hospital diagnostic codes, operation codes and self-report codes, VV cases in 23andMe were identified based on self-report alone, meaning that the phenotyping for the replication GWAS was necessarily less stringent. Moreover, rather than undertake a formal meta-analysis between the discovery and replication GWAS, we independently tested the association for the 116 independent lead SNPs that were genome-wide significant in the UK Biobank discovery GWAS. Thus, sub-threshold signals in the discovery GWAS that may have reached genome-wide significance in the replication GWAS, or under meta-analysis, were not identified. Furthermore, not having access to the full summary statistics for the replication GWAS also meant that our in silico analyses were performed on the summary statistics from the discovery GWAS alone.

Finally, we acknowledge that out of the many VVs-associated genes identified by a multitude of gene-prioritisation and mapping methods, only a small minority have robust evidence linking them to VVs through a truly causal variant. Moreover, our eQTL-based gene mapping and summary-based Mendelian randomisation analyses were limited by our lack of access to eQTL data for venous tissue. We therefore had to employ GTEx tibial artery tissue data as an imperfect surrogate, reasoning that it is the anatomically, histologically, and embryologically closest tissue to lower limb veins within the GTEx dataset. Functional validation will be required in diseased venous tissue to confidently associate these candidate susceptibility variants and genes to VVs.

Several strengths of the paper mitigate these limitations. We have performed the largest GWAS of VVs to date by a considerable margin, with 135,514 cases and 675,111 controls. We have stringently controlled our false-positive rate by reporting only the loci that were genome-wide significant in the discovery GWAS and that subsequently replicated, so we can be confident in the veracity of these 49 signals. This is reflected in the plethora of biologically plausible genes, gene clusters and biological pathways that were associated with these loci. Furthermore, by including surgical codes for phenotyping in the UK Biobank discovery GWAS, we were able to identify a considerably greater number of cases than a previous GWAS that also used the UK Biobank resource but relied on ICD diagnostic codes alone (22,473 vs. 9577 cases)^17,18. As a general principle of case ascertainment, we believe there is much to be gained by seeking out individuals who have undergone surgery for a disease: given the inevitable risk of complications, surgery is generally reserved for those at the more phenotypically severe end of the spectrum who have failed non-surgical treatment. Finally, the predictive capabilities of a VVs polygenic risk score derived in an independent cohort not only underscores the validity of the VVs-associated polygenic signals uncovered in our discovery GWAS, but also opens the door to the future use of genetic risk-stratification in improving prognostication and guiding decision-making in the management of VVs patients.

In summary, we have described the largest GWAS to date of VVs, a highly prevalent disease with a substantial health and socioeconomic cost. We discovered 49 variants at 46 loci that predispose to VVs. We have identified pathways and genes involved in extracellular matrix regulation, inflammation, vascular and lymphatic development, smooth muscle cell activity and apoptosis, all of which are biologically plausible contributors to the pathobiology of VVs, and provide excellent candidates for further investigation of venous biology. Several genes appear tractable to pharmacological targeting (notably VEGFA, COL27A1, EFEMP1, PPP3R1 and NFATC2), and may represent viable therapeutic targets in the future management of VV patients. Importantly, our polygenic risk score represents a first step towards better prognostication in patients with VVs.

Methods

Ethical approval and consent

UK Biobank obtained ethical approval from the North West Multi-Centre Research Ethics Committee (MREC) (11/NW/0382) to collect and disseminate data and samples from participants (for more details: ukbiobank.ac.uk/ethics). This study was conducted under UK Biobank study ID 22572. All participants provided informed consent for their genotype data to be used for this research. The consent procedures for UK Biobank are provided elsewhere (for more details: www.ukbiobank.ac.uk/). All 23andMe participants were obtained from the customer base of 23andMe, Inc. Genotyping of participants was performed by the 23andMe Personal Genome Service. All 23andMe research participants provided informed consent for their genotype data to be utilised for research purposes under a protocol approved by the external AAHRPP-accredited IRB, Ethical & Independent Review Services. Further details regarding the consent processes of 23andMe can be found elsewhere (23andme.com/en-gb/about/consent).

Population and phenotype definition

The UK Biobank is a population-level resource comprising a prospective cohort of approximately 500,000 participants, recruited at age 40–69 years, at 22 centres across England, Scotland and Wales, between 2006 and 2010⁶². Participants underwent whole-genome genotyping and data linkage with their medical records was performed to permit deep phenotyping⁶³. The characteristics of the full UK Biobank cohort are described in detail elsewhere⁶⁴. In the discovery analysis, VV cases were identified from the UK Biobank data showcase (ukbiobank.ac.uk) using the following diagnostic, operative and self-report codes (Supplementary Table 8):

1.
Primary and/or secondary ICD-10 codes for varicose veins (I83)
2.
Primary and/or secondary OPCS code for varicose vein surgery: ((L84-L88)
3.
Self-reported operation code for varicose vein surgery (1479)
4.
Self-reported non-cancer illness code for varicose veins (1494)

In summary, 27,165 individuals from the UK Biobank cohort possessed at least one of the above codes and were classified as VVs cases. Following quality control (QC), we identified 22,473 VV cases and the remaining 379,183 individuals were defined as controls.

In the 23andMe replication cohort, participants answered the question “Do you have varicose veins on your legs? (Yes/Not Sure/No)”. VV cases were identified if they answered “Yes”, while controls were those who answered “No”. In all, 113,041 VV cases and 295,928 controls were included in the final replication analysis.

Genotyping

UK Biobank participants were genotyped on UK BiLEVE (49,950 participants) and UK Biobank Axiom arrays (438,427 participants), with ~95% shared content. In all, 805,426 directly genotyped variants from 488,377 participants were available prior to QC. The 23andMe replication cohort was genotyped on one of four custom arrays (v1/v2, v3, v4, v5). Illumina HumanHap550+ BeadChip was used for v1/v2 (1680 cases, 4882 controls) and the Illumina OmniExpress+ BeadChip was used for v3 (21,342 cases, 56,448 controls). For v4 a fully customised array (58,883 cases, 148,637 controls) was used, and for v5, the Illumina Infinium Global Screening Array was used (31,136 cases, 85,961 controls). Successive arrays contained significant overlap with all previous arrays.

Quality control

Quality control (QC) for the UK Biobank discovery cohort was conducted using PLINK v1.9⁶⁵ (https://zzz.bwh.harvard.edu/plink/) and R v3.3.1, as previously described (Supplementary Fig. 7)⁶⁶. Briefly, all SNPs with a call rate <90% were removed. Following which, sample-level QC was conducted—individuals were excluded if: (i) they demonstrated heterozygosity >3 S.D. from the mean (calculated using UK Biobank’s PCA-adjusted heterozygosity values, Data Field 20004); (ii) there was disparity between genetically inferred sex (Data Field 22001) and self-reported sex (Data Field 31) or individuals with aneuploidy of sex chromosomes (Data Field 22019); and (iii) had a call rate <98%. Further, we removed all individuals who were not of white British ancestry (based on PCA and self-reported ethnicity (Data Field 22006).

To confirm individuals were of white British-ancestry, we combined our UK Biobank data with individuals identified as GBR from the 1000 Genomes Project (www.1000genomes.org), and using FlashPCA v2.0 (github.com/gabraham/flashpca/) we identified agreement between the two datasets. Following this analysis, 86,693 individuals were excluded from our discovery GWAS. Using a linear mixed model implemented in BOLT-LMM⁶⁷ allowed us to include and account for related individuals. Next, conducting SNP-level QC, we excluded 230,562 SNPs based on: (i) a call rate <98%, (ii) Hardy–Weinberg Equilibrium (HWE) P < 1 × 10⁻⁴, (iii) MAF < 1%. We further excluded six individuals who were visual outliers when autosomal heterozygosity was plotted against call rate. Our post-QC discovery GWAS, therefore, consisted of 401,667 individuals and 547,011 genotyped SNPs (Supplementary Fig. 7). Following QC, our final discovery GWAS consisted of 401,667 individuals of white British ancestry and 547,011 genotyped single-nucleotide polymorphisms (SNPs).

For the 23andMe replication analysis, samples were restricted to individuals from European ancestry determined through an analysis of local ancestry⁶⁸. A maximal set of unrelated individuals was chosen for each analysis using a segmental identity-by-descent (IBD) estimation algorithm, defining related individuals if they shared more than 700 cM IBD, including regions where the two individuals share either one or both genomic segments IBD. Cases were preferentially retained in the analysis. Variant QC was applied independently to genotyped and imputed GWAS results. The SNPs failing QC were flagged based on multiple criteria, such as HWE P-value, call rate, imputation R-square and test statistics of batch effects.

Imputation

The phasing and imputation of UK Biobank has been previously described⁶³. Briefly, using SHAPEIT3⁶⁹ (jmarchini.org/shapeit3/), autosome phasing was conducted, with a reference panel based on Phase 3 1000 Genomes Project data⁷⁰. Imputation was performed using a combined HRC reference panel (www.haplotype-reference-consortium.org/) and a merged UK10000/1000 Genomes Phase 3 Panel⁷¹. This resulted in an imputation file consisting of 92,693,895 autosomal SNPs, short indels and large structural variants⁶³. In 23andMe, out-of-sample modified versions of the Beagle graph-based haplotype phasing algorithm⁷² and Eagle2 v2.3⁷³ algorithm were used to phase samples. We used a 23andMe internally developed tool, which implements the Beagle graph-based haplotype phasing algorithm, modified to separate the haplotype graph construction and phasing steps. We imputed samples against a single unified imputation reference panel combining the 1000 Genomes Phase 3 haplotypes⁷⁰ with the UK10K imputation reference panel⁷¹ using Minimac3⁷⁴.