Introduction

Lipedema has been recognized since the 1940s as a condition occurring mainly in women and characterized by the bilateral enlargement of the lower limbs, which in many instances exaggerates the female form [1]. It can be debilitating because of the resulting pain and impaired mobility [2].

There are very few prevalence estimates for lipedema. A prevalence of 11% was observed in a sample of women from a lymphedema clinic in Germany [3], and a prevalence of 9.7% was observed in a small sample of professional German women [4]. Family history has been reported, suggesting that lipedema is heritable [1, 5, 6]. Further evidence supporting a genetic contribution is that lipedema subcutaneous adipose tissue (SAT) is highly resistant to lifestyle changes such as diet and exercise, occurs after menarche, and occurs nearly exclusively in women. These lines of evidence suggest that environmental (non-genetic) factors may not play a predominant role.

Although lipedema does appear to be heritable, individual genetic loci associated with it have yet to be found. Identifying such loci will provide much-needed insight into the pathophysiology of lipedema, and potentially enable the development of therapeutics and the identification of individuals at high risk. However, since lipedema is rarely diagnosed and is not widely recognized, there are currently no large collections of diagnosed lipedema cases in whom genetic risk factors could be identified.

Here, we used body fat percentage and anthropometric measurements from the UK Biobank to classify women into those who appear to have a lipedema phenotype and those who do not, in an effort to identify genetic risk factors for lipedema. Given the lack of evidence thus far for a single highly penetrant gene for lipedema [5], we hypothesized that multiple individual genetic variants across the genome are associated with the lipedema phenotype among adult women.

Methods

UK Biobank

The UK Biobank is a prospective cohort study of ~500,000 adults living in the UK, aged 39 to 70 [7, 8]. Participants were measured for a variety of traits and diseases from 2006 to 2010 at 22 centers across the UK. Only women were included in this study, and to minimize the potential confounding effects of ancestry, only women self-identifying as being of European ancestry were included. All participants gave written informed consent, and ethical approval was obtained from the North West Multicentre Research Ethics Committee, the National Information Governance Board for Health & Social Care, and the Community Health Index Advisory Group.

Phenotypic measurements

We used bioelectrical impedance analysis (BIA) data obtained using a Tanita BC418MA (Tanita Inc., Illinois, USA) segmental body composition analyzer. Women who were pregnant, had an amputation, used a wheelchair, were unable to stand, or used a pacemaker were not measured and were therefore excluded. To exclude women with obvious unilateral enlargement of the legs, we also excluded those whose left leg fat % was 30% greater or lesser than their right leg fat %. We averaged the left and right leg fat percentages and used this average in all subsequent analyses. The UK Biobank specifies that hip and waist circumference measurements were obtained in centimeters (cm) with a Seca 200 cm tape. For the waist measurement, the tape was passed around the smallest part of the trunk (natural indent). If no natural indent could be found, the waist measurement was taken at the level of the umbilicus. The hip circumference was measured at the widest part of the hips.
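
The asymmetry exclusion and averaging step can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name `mean_leg_fat_pct` is hypothetical, and we assume that "30% greater or lesser" refers to a relative difference of more than 30%, taking the right leg as the reference.

```python
from typing import Optional


def mean_leg_fat_pct(left_pct: float, right_pct: float,
                     max_rel_diff: float = 0.30) -> Optional[float]:
    """Return the mean of left and right leg fat %, or None if excluded.

    Assumption: a woman is excluded when the left leg value differs from
    the right leg value by more than `max_rel_diff` in relative terms.
    """
    if left_pct <= 0 or right_pct <= 0:
        return None  # assumption: non-positive readings are treated as invalid
    rel_diff = abs(left_pct - right_pct) / right_pct
    if rel_diff > max_rel_diff:
        return None  # likely unilateral enlargement
    return (left_pct + right_pct) / 2.0


# Toy values, not UK Biobank data
print(mean_leg_fat_pct(40.0, 38.0))  # similar legs: averaged
print(mean_leg_fat_pct(55.0, 40.0))  # left leg 37.5% higher: excluded
```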

Lipedema definition

We defined the lipedema phenotype as having a relatively high leg fat % along with a relatively small waist circumference. We first obtained the residuals from a linear regression in which the average of the left and right leg fat percentages was the outcome, and the independent variables were height, hip circumference, age, and recruitment center. These covariates were included because most leg fat is likely to be located at the upper end of the leg (by the hip), and it is therefore important to take into account hip size and overall body size in order to minimize confounding from these characteristics. Furthermore, since lipedema usually involves high levels of fat deposition throughout the leg, and not only at the hip, these adjustments help us to identify women with a high fat percentage throughout the leg, and not just at the hip level. We then obtained the residuals from a linear regression in which waist circumference was the outcome, and the independent variables were the same as those listed above for leg fat percentage. Women were deemed to be lipedema cases if they were above the 65th percentile of the leg fat percentage residual and below the 51.3rd percentile of the waist circumference residual (these cutoffs were chosen to achieve a prevalence of 10%). Controls were individuals at or below the 45th percentile of the leg fat percentage residual or at or above the 71.3rd percentile of the waist circumference residual. The resulting “buffer zone” of unclassified individuals was intended to exclude potential false positives or false negatives. As a sensitivity analysis, we also considered a lipedema phenotype at an assumed 5% prevalence. In this case, women were deemed to be lipedema cases if they were above the 75th percentile of the leg fat percentage residual and below the 46.7th percentile of the waist circumference residual, and controls were individuals at or below the 65th percentile of the leg fat percentage residual or at or above the 56.7th percentile of the waist circumference residual.
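
The residual-and-percentile definition above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the study's implementation: all function names are hypothetical, covariates are assumed to be numeric (recruitment center would first need dummy coding), and a percentile is taken here as the share of women with a residual at or below the given value. The default cutoffs are those of the 10% prevalence definition.

```python
import bisect


def ols_residuals(y, covariates):
    """Residuals of y regressed on the covariates plus an intercept.

    Solves the normal equations by Gauss-Jordan elimination; assumes the
    covariate columns are not collinear.
    """
    rows = [[1.0] + list(r) for r in covariates]
    k = len(rows[0])
    # Augmented system [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda i: abs(A[i][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]


def percentile_ranks(values):
    """Percentile (0-100) of each value: share of observations <= it."""
    ordered = sorted(values)
    n = len(values)
    return [100.0 * bisect.bisect_right(ordered, v) / n for v in values]


def classify(leg_pct, waist_pct, case_leg=65.0, case_waist=51.3,
             ctrl_leg=45.0, ctrl_waist=71.3):
    """Apply the case, control, and buffer-zone rules to one woman."""
    if leg_pct > case_leg and waist_pct < case_waist:
        return "case"
    if leg_pct <= ctrl_leg or waist_pct >= ctrl_waist:
        return "control"
    return "excluded"  # buffer zone


def assign_status(leg_fat, waist, covariates):
    """Residualize both traits on the same covariates, then classify."""
    leg_pct = percentile_ranks(ols_residuals(leg_fat, covariates))
    waist_pct = percentile_ranks(ols_residuals(waist, covariates))
    return [classify(l, w) for l, w in zip(leg_pct, waist_pct)]
```

The 5% prevalence sensitivity definition corresponds to calling `classify` with `case_leg=75.0`, `case_waist=46.7`, `ctrl_leg=65.0`, and `ctrl_waist=56.7`.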

Leg pain

Since leg pain is an associated feature of lipedema, we used responses to the following question asked at the baseline exam at each UK Biobank recruitment center: “Do you get a pain in either leg on walking?” This question was administered to a subset (~40%) of UK Biobank participants, as it was introduced by the UK Biobank towards the end of recruitment. A lack of an affirmative response to this question was required to define a patient as a control.

Genetic data

The vast majority of UK Biobank participants were genotyped with the Affymetrix UK Biobank Axiom Array (Santa Clara, USA). Approximately 10% of participants were genotyped with the Affymetrix UK BiLEVE Axiom Array. Tens of millions of additional SNP genotypes were obtained through imputation using the Haplotype Reference Consortium [9] and UK10K [10] haplotype data as references. Principal component analysis was performed by the UK Biobank team, using fastPCA software on a set of 147,604 high-quality directly genotyped markers. Individuals with an unusually high heterozygosity rate, a > 5% missing rate, or a mismatch between self-reported and genetically-inferred sex were excluded. These and other details regarding the genotyping, imputation, and QC procedures are available elsewhere [8]. SNPs not in Hardy-Weinberg equilibrium (p < 1 × 10−6), with a high missingness (>1.5%), a low minor allele frequency (<0.1%), or low imputation quality (info < 0.4) were excluded. A total of 16.8 million SNPs were available for analysis, including those on the X chromosome.
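
The SNP-level exclusion criteria listed above amount to a simple filter. The sketch below is illustrative only: the `SnpStats` container and `passes_qc` function are hypothetical names, and we assume that per-SNP summary statistics have already been computed and that directly genotyped SNPs carry an info score of 1.0.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    hwe_p: float         # Hardy-Weinberg equilibrium test p value
    missing_rate: float  # fraction of missing genotypes (0-1)
    maf: float           # minor allele frequency (0-0.5)
    info: float          # imputation quality score


def passes_qc(s: SnpStats) -> bool:
    """Keep a SNP only if it clears all four thresholds stated in the text."""
    return (s.hwe_p >= 1e-6            # exclude HWE p < 1e-6
            and s.missing_rate <= 0.015  # exclude missingness > 1.5%
            and s.maf >= 0.001           # exclude MAF < 0.1%
            and s.info >= 0.4)           # exclude info < 0.4


# Toy values, not real variants
print(passes_qc(SnpStats(hwe_p=0.2, missing_rate=0.001, maf=0.12, info=0.98)))
print(passes_qc(SnpStats(hwe_p=0.2, missing_rate=0.001, maf=0.0005, info=0.98)))
```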

Statistical analyses

We tested the association of each SNP with case-control status using the BOLT-LMM software [11, 12], which implements a linear mixed model regression that includes SNPs other than the one tested as random effects, thereby correcting for population stratification and relatedness. Since BOLT-LMM implements a linear regression model, the effect size estimates for case-control outcomes are unreliable. As previously done in other studies [13, 14], we therefore estimated the effect sizes of the genome-wide significant (p < 5 × 10−9) SNPs with logistic regression in R [15]. We included age, hip circumference, recruitment center, genotyping platform, and the first 10 principal components (PCs) as covariates in both the BOLT-LMM and the logistic regression SNP association models. Hip circumference was included to account for overall body size and for the proportion of fat in the thighs. We performed multiple sensitivity analyses. First, we considered a model without any adjustment for hip circumference. Second, since there is considerable uncertainty regarding the actual prevalence of lipedema, we considered a prevalence of 5% instead of 10%. Third, we performed an analysis only in women with BMI < 30 to avoid obesity complicating the analyses. Finally, we considered analyses in which we added an affirmative response regarding self-reported leg pain to our case definition. SNPs were considered genome-wide significant if p < 5 × 10−9. This threshold was chosen instead of the more typical threshold (p < 5 × 10−8) because we performed multiple GWAS analyses, and to optimize our ability to replicate findings in our much smaller replication dataset.
To estimate SNP-based heritability, and to estimate genetic correlations of our lipedema phenotype with a wide range of other human traits and diseases, including other body composition and cardiometabolic traits and diseases, we used the online implementation (http://ldsc.broadinstitute.org/) of the LD-score regression method [16,17,18]. Details of each phenotype and GWAS used in these genetic correlations can be found on the aforementioned website. We examined the LD-score regression intercept, as well as the Q-Q plot, to assess genomic inflation. We tested the association of each of the significant SNPs identified with potentially relevant anthropometric measures among all UK Biobank women of European ancestry, using linear regression, including age, chip, 10 PCs, and center as covariates.
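
To illustrate the logistic regression step used to obtain odds ratios for the significant SNPs, the sketch below fits a single-SNP model by Newton-Raphson. It is a simplified illustration, not the analysis performed in R for the study: the function name `fit_logistic_or` is hypothetical, the covariates described above (age, hip circumference, recruitment center, genotyping platform, and PCs) are omitted, and the data are toy values.

```python
import math


def fit_logistic_or(genotypes, is_case, n_iter=25):
    """Fit logit(P(case)) = b0 + b1 * genotype and return exp(b1).

    Assumes both cases and controls are present and the genotype varies.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(genotypes, is_case):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * x      # score for the genotype effect
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * d = score
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return math.exp(b1)


# Toy data: genotype coded as effect-allele count (0, 1, or 2)
toy_genotypes = [0, 1, 2, 1, 0, 2, 1, 0, 0, 1, 2, 0]
toy_status = [0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0]
print(fit_logistic_or(toy_genotypes, toy_status))
```

The returned value is the odds ratio per additional copy of the effect allele under an additive model, which is the usual coding in GWAS; whether the study used exactly this coding is an assumption here.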

Gene expression enrichment patterns across different tissues were examined through the web-based platform, Functional Mapping and Annotation of Genome-Wide Association Studies (FUMA GWAS) [19], that uses data from GTEx v7 [20], and the MAGMA gene-based analysis for identification of associated genes [21]. Briefly, the MAGMA gene-based analysis uses all the SNPs in a gene as the unit of analysis to test the association of each gene across the genome with the phenotype. A genome-wide significance threshold of p < 2.6 × 10−6 was used. We also used the FUMA GWAS platform to identify eQTL from GTEx v7 and from a large blood eQTL study [22], by interrogating all top SNPs, and all SNPs in LD (r2 > 0.6) with top SNPs.

Pathway over-representation analysis was performed for the unique genes identified by the aforementioned eQTL analysis, using the SOAP/WSDL interface of the ConsensusPathwayDB-human. ConsensusPathwayDB is a project of the Max Planck Institute for Molecular Genetics that integrates multiple interaction networks [23, 24].

Replication study

We sought to replicate our top findings in a case/control study of clinically diagnosed lipedema cases, as previously described [25]. Briefly, 130 lipedema cases recruited from two specialist UK clinics at St George’s University Hospital NHS Trust and the University Hospitals of Derby and Burton NHS Trust were genotyped with Illumina Infinium microarrays. Unaffected females (N = 5848) enrolled in the Understanding Society UK study [26] and genotyped with Illumina HumanCoreExome-12 (v1.0) were selected as the control group for the replication cohort (European Genome-phenome Archive ID: EGAD00010000890). Assuming a SNP with 50% allele frequency and an OR of 1.07 (in line with our findings in UK Biobank), a case-control study would require a sample size of >12,000 cases and controls to have 80% power to detect an association at a nominally significant p value (p < 0.05). Therefore, given the study design, phenotype definitions, and effect sizes found in the discovery sample, the much smaller UK Lipoedema study is highly underpowered to detect a statistically significant effect. However, given the limitations of the phenotype definition, as well as other factors in the UK Biobank study, those effect sizes are likely attenuated compared to the effect sizes we may observe in a study with a more accurate case-control definition. Details of the ‘UK Lipoedema’ cohort genotyping can be found elsewhere [25]. Variants were aligned to the 1000 Genomes reference [27], and the normalized variants were imputed using the Michigan Imputation Server [28]. Post-imputation quality control was used to remove low-quality (r2 ≤ 0.8) imputed variants before further analyses. The association analysis was performed using a univariate linear mixed model, implemented in GEMMA software (version 0.98.1) [29].
The p value distribution was assessed using a quantile-quantile (Q-Q) plot, and no inflation was observed in the association analysis. Because we were able to test 14 of the genome-wide significant lead variants in the replication study, replication was considered significant at p < 0.0036, based on a Bonferroni correction (0.05/14).
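
The power statement and the replication threshold above can be illustrated with a short calculation. This is a rough sketch and not necessarily the calculation performed for the study: the function names are hypothetical, and the sample size formula is a textbook normal approximation for a two-sided comparison of allele frequencies, assuming equal numbers of cases and controls and two independent alleles per person.

```python
import math
from statistics import NormalDist


def per_group_n(odds_ratio, control_af, alpha=0.05, power=0.80):
    """Approximate individuals per group needed for an allelic test."""
    odds = odds_ratio * control_af / (1.0 - control_af)
    case_af = odds / (1.0 + odds)  # allele frequency implied in cases
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0) + NormalDist().inv_cdf(power)
    var_sum = case_af * (1.0 - case_af) + control_af * (1.0 - control_af)
    alleles = z * z * var_sum / (case_af - control_af) ** 2
    return math.ceil(alleles / 2.0)  # two alleles per person


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


total_n = 2 * per_group_n(odds_ratio=1.07, control_af=0.50)
print(total_n)  # total cases plus controls under the stated assumptions

threshold = bonferroni_threshold(0.05, 14)
print(round(threshold, 4))
```

Under these assumptions the required total is on the order of 13,000 to 14,000 individuals, broadly in line with the figure of >12,000 given above and far larger than the 130 cases available for replication.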

Results

GWAS

In this study, we defined lipedema cases and controls based on leg fat mass and waist circumference. The characteristics of the different case (n = 24,450) and control (n = 165,227) groups are shown in Table 1. The inferred lipedema cases tend to have a higher leg fat % and a lower waist-to-hip ratio (WHR), as expected.

Table 1 Inferred lipedema case and control characteristics in the UK Biobank.

Cases are shown in green in Fig. 1 with respect to unadjusted waist circumference and unadjusted leg fat %, and in Supplementary Fig. 1 with respect to the residualized values of these variables that were used to determine cases and controls. The LD-score regression intercept (<1.02) and Q-Q plot (Supplementary Fig. 2) from the genome-wide association study suggest little evidence of systematic inflation of effect sizes, beyond the polygenic signal associated with the lipedema phenotype. SNP-based heritability for our main model was 5.13% (see Supplementary Table 1).

Fig. 1: Scatter plot of raw, unadjusted waist circumference and leg fat% measurements of females in the UK Biobank, indicating in green the lipedema phenotype cases.

The horizontal and vertical lines indicate the mean of leg fat% and waist circumference of entire female sample, respectively.

We find 18 loci significantly associated with the primary lipedema phenotype as defined in this study (see Fig. 2 and Table 2). Associations of these 18 significant SNPs with relevant anthropometric traits among all UK Biobank European-ancestry women are shown in Supplementary Table 2. In analyses not adjusting for hip circumference, the results are relatively similar, with generally smaller magnitudes of association (see Supplementary Table 3 and Supplementary Fig. 3). The same is observed for the other sensitivity analyses (see Supplementary Figs. 4, 5). The addition of leg pain as a criterion for cases (n = 1724 cases; 165,227 controls) resulted in no genome-wide significant loci (Supplementary Fig. 6). Only the LYPLAL1 locus exhibited a ‘suggestive level’ of significance (p = 4.5 × 10−6, see Supplementary Table 4). The gene-based analysis, in which the collection of SNPs in each gene is taken as the unit of analysis instead of each SNP individually, identified 72 genes associated with lipedema (Supplementary Fig. 7 and Supplementary Table 5).

Fig. 2: Manhattan plot of GWAS of the inferred lipedema phenotype from the UK Biobank, at an assumed 10% prevalence.

Loci that were successfully replicated in the ‘UK Lipoedema’ cohort are shown in red. The red horizontal line represents the genome-wide significance p value threshold of 5 × 10−9.

Table 2 List of lead variants showing significant association with the lipedema phenotype (hip circumference (HC) adjusted) from the UK Biobank GWAS, including associations of the corresponding SNPs with lipedema in the UK Lipoedema replication study.

Genetic correlations with other traits and diseases

We observed significant positive genetic correlations of the lipedema phenotype with body fat and leptin levels (Supplementary Fig. 8). We also found nominally significant positive genetic correlations with hip circumference, primary biliary cirrhosis, and BMI, and nominally significant negative genetic correlations with WHR, age at first birth, forced vital capacity, birth weight, and age at menopause (Supplementary Fig. 8).

Associations of loci with tissue-specific gene expression levels

Genes implicated by eQTL analysis, and which may provide insight into the genetic regulation of the lipedema phenotype, vary by locus and by tissue. For example, the RSPO3 top SNP, rs72959041, is associated with the expression level of the RSPO3 gene in subcutaneous adipose tissue, as are ZNF664 with the rs11057418 SNP, and TIPARP with the rs4680338 SNP (Supplementary Table 6). The top SNP at the GRB14-COBLL1 locus is associated with expression of several genes in various tissues. A list of these gene expression patterns for the significant (FDR < 0.05) eQTLs of the top SNPs can be found in Supplementary Table 6. Upon expanding this analysis to all SNPs in LD (r2 > 0.6) with the top SNPs, a more extensive list of genes and tissues was identified (Supplementary Table 7). A list of genes sorted by relevant and highly represented tissues is shown in Supplementary Table 8. Tissue enrichment analyses based on the gene-based (MAGMA) analysis identify arterial blood vessels as the main tissues where these genes may exert their actions (Supplementary Fig. 9).

Pathway analysis

Our pathway analysis results show significant enrichment for multiple pathways and Gene Ontology Terms (Supplementary Table 9). Noteworthy are: a) the EGFR1 signaling pathway, a pathway that induces growth, differentiation, migration, adhesion and cell survival through multiple hormone interactions [30]; and b) GO:0003785 actin monomer binding, a regulator of actin cytoskeleton dynamics in cells [31].

Replication

In a replication cohort of phenotyped lipedema patients from the ‘UK Lipoedema’ study, we were able to obtain results for 14 out of 18 locus lead variants (see Table 2). We identified two statistically significant associations also showing the same direction of effect as in UK Biobank: near VEGFA (p = 5.0 × 10−4) and near GRB14-COBLL1 (p = 2.3 × 10−3). Two nominally significant loci (p < 0.05; ADAMTS9 and LYPLAL1) were also directionally consistent.

Discussion

Using an inferred lipedema phenotype based on high leg fat % and small waist in the UK Biobank, we performed the first GWAS of a lipedema phenotype and identified 18 loci across the genome. Two of these loci (VEGFA and GRB14-COBLL1) were significantly associated with lipedema in an independent case-control study including clinically diagnosed lipedema cases.

Loci in/near RSPO3, GRB14-COBLL1, ZNF664-FAM101A (near CCDC92), VEGFA, ADAMTS9, LYPLAL1, and ANKRD55-MAP3K1 have previously been found to be associated with WHR and, importantly, to exhibit stronger effects in women than in men [32,33,34]. We show that these loci are associated with leg fat % independently of hip circumference, indicating that these associations are not predominantly driven by hip circumference or fat around the hip, but rather by fat throughout the lower limbs.

Pain in the lower limbs is a common complaint among lipedema patients [2] and when including pain as a criterion in the GWAS, only one of the loci, rs749853052 near LYPLAL1, exhibited a ‘suggestive level’ of significance. We strongly suspect that the reduced sample size when incorporating leg pain information resulted in reduced statistical power to detect significantly associated loci. This variant is over 250 kb downstream of the LYPLAL1 (lysophospholipase-like 1) gene. The function of this locus is still unknown, although it may act as a triglyceride lipase or lysophospholipase [35, 36]. This locus has been associated with WHR, with a stronger effect in women than in men [32, 37, 38]. It has also been associated with a “favorable adiposity” or gynoid phenotype [39], and with non-alcoholic fatty liver disease [40]. It has also been found to be more abundantly expressed in the subcutaneous adipose tissue of obese compared to lean individuals [41], potentially making it an interesting association to explore in the context of lipedema.

One of the novel loci identified on chromosome 5 is located in the LINC01184 non-coding gene, and just upstream of the SLC12A2 gene. Our eQTL analysis revealed that the top SNP at this locus is associated with increased expression of the fibrillin 2 (FBN2) gene in the thyroid. The FBN2 gene is located downstream of the SLC12A2 gene. This locus was also identified in a GWAS of varicose veins of the lower extremities [42]. The allele associated with decreased risk of varicose veins is in strong LD (r2 > 0.93) with the allele associated with increased odds for the lipedema phenotype, suggesting a potential connection between lipedema and varicose veins.

We identified an intronic variant in the ADAMTSL3 gene, which codes for a glycoprotein. ADAMTSL3 localizes to the extracellular matrix (ECM) [43], where it may modulate the ADAMTS proteinases [44]. ADAMTS proteins are involved in ECM or cell–matrix interactions [44]. As the ECM is important for the regulation of adipocyte expansion and proliferation [45], ADAMTS and ADAMTS-like proteins could be important in that process. Interestingly, ADAMTSL3 has been associated with overall body fat [46] and lean body mass [47].

Finally, another locus worth highlighting is at the DNAH10-CCDC92-ZNF664 locus. Knockdown of both DNAH10 and CCDC92 has previously been shown to result in lowered mRNA levels of the respective gene, and in reduced lipid accumulation in mouse adipocytes, consistent with an impairment of lipid accumulation in peripheral adipose tissues in humans [48]. This locus was also implicated in abnormally high HDL-C levels [49], and in large HDL particles [50], further suggesting the involvement of this locus in adipose tissue growth and its consequences.

Of course, for any of the above, the question remains whether the inferred lipedema phenotype resembles clinically defined lipedema. When validating 14 SNPs in the ‘UK Lipoedema’ cohort [25], the VEGFA and GRB14-COBLL1 loci were significantly associated with lipedema. In addition to the well-established association of these loci with WHR [34], they have also been associated with other cardiometabolic traits and diseases [51, 52], as well as with a favorable pattern of adiposity [39, 53]. Interestingly, other loci identified for favorable adiposity overlap with some of those identified here, such as ANKRD55/MAP3K1, DNAH10/CCDC92, and FAM101A [39, 53]. It has been suggested previously that, despite higher BMI, lipedema patients have a relatively lower risk of type 2 diabetes [54] and that gynoid SAT may protect against cardiovascular risk [55].

Among the strengths of our study is the large sample size that enabled our genetic investigation and identification of associated loci. Although there are other studies of lipedema, they have much smaller samples, and therefore do not have sufficient power to detect the tiny effect sizes that characterize the genetic architecture of most traits that do not have a single-gene cause. Another strength is the availability of relatively detailed body composition measures. Given the small effect sizes typically observed for the genetics of complex traits and diseases, a large sample size increased our probability of detecting loci associated with the lipedema phenotype, at the cost of the quality of the phenotype (i.e., absence of a lipedema diagnosis, see below). A major limitation of this study is that we could not rely on an actual diagnosis or on a validated classifier of lipedema. Since the recognition and diagnosis of lipedema is in its infancy, and is still very limited, it is currently difficult to obtain large collections of genotyped women with diagnosed lipedema. The binary classifier that we used likely misclassified a number of lipedema cases and controls. However, the larger sample size and the availability of a replication cohort consisting of diagnosed lipedema cases likely counterbalanced some of the resulting loss in power from the discovery cohort. It is possible that the 16 loci identified in the UK Biobank discovery cohort that were not replicated in the replication cohort are not associated with lipedema, but rather with other aspects of body shape that are similar to, but not related to, lipedema. It is also possible that some of these loci are implicated in lipedema but that we were underpowered to detect their association in the replication study, and that only through the study of further cohorts could we confirm their potential role in lipedema.
Another limitation of our study is that it is limited to people of European (mainly British) ancestry, and to individuals between the ages of 40 and 70, many years after lipedema typically initiates, and when the lipedema phenotype is more likely to be confounded by frank obesity.

In conclusion, we have identified 18 loci associated with an inferred lipedema phenotype in adult women of European descent. Some have previously been identified as female-specific loci for WHR, while others have not previously been linked to body composition phenotypes. In a replication study with clinically diagnosed lipedema cases, we successfully replicated the VEGFA and GRB14-COBLL1 loci, which have previously been associated with fat distribution patterns. We hope that these loci and the genes and tissues that they implicate will provide starting points towards a better understanding of the pathophysiology of lipedema, and eventually towards treatment and prevention approaches.