Introduction

Pelvic organ prolapse (POP) refers to the condition where one or more of the pelvic organs herniate to or beyond the vaginal opening and can involve the bladder, uterus, rectum, or post-hysterectomy vaginal cuff1. Symptoms affect quality of life and include a bothersome sense of vaginal bulge, urinary or bowel symptoms, and sexual dysfunction2,3,4. The five stages of POP as defined by the Pelvic Organ Prolapse Quantification system (POPQ)5 range from stage 0 (no prolapse) to stage IV, which is complete eversion of the total length of the lower genital tract. Prevalence estimates for all stages of POP are 30–50% among postmenopausal women diagnosed on physical examination6,7,8, but 7–26% if diagnosis is restricted to pelvic organs herniating beyond or at the hymenal remnant (stages II–IV)9,10. Symptom-based prevalence estimates are lower, at 3–6%11,12, with incidence peaking between ages of 70 and 797. Between 11 and 13% of women have had surgery for POP or related conditions by age 80 years13,14,15. POP is the leading indication for hysterectomy in postmenopausal women and accounts for 1 in 6 hysterectomies in all age-groups16.

Risk factors for POP vary depending on POP classification. For POP stages II-IV (a descent at or beyond the hymenal remnant), risk factors are number of children delivered, vaginal delivery, advancing age and BMI17, suggesting the role of tissue trauma, estrogen exposure and intra-abdominal pressure in the pathogenesis of POP. For all stages of POP, Hispanic ethnicity6, lack of pelvic floor muscle strength18 and family history19 are also reported as risk factors, and previous hysterectomy is associated with severe POP20. For symptomatic POP, poor health status, constipation, or irritable bowel syndrome are also among reported risk factors11.

Although the etiology of POP is not fully understood, and is likely multifactorial, abnormalities in the connective tissue supporting the vagina and pelvic organs or in the muscles of the pelvic floor have been proposed to contribute to the pathophysiology of the condition21,22. The pelvic organs’ support is dependent on interactions between the levator ani muscle and pelvic connective tissue, i.e. the ligaments holding the organs in alignment23. A higher prevalence of POP and more recurrent POP are seen in women with joint hypermobility than in others24,25. Moreover, higher serum concentration of procollagen III25 is found in women with joint hypermobility and in those with recurrent POP. It is, however, unclear whether such changes in collagen metabolism cause POP or are the result of a trauma26,27.

Genetic factors have been estimated to explain 43% of the variation in risk of POP in a twin study28. However, previous candidate gene-, linkage- and genome-wide-association studies (GWASs) on POP, generally of small sample sizes, have not yielded sequence variants that associate with POP26,29,30,31.

To provide insights into the etiology of POP, we performed a combined GWAS for POP in Iceland and the UK using data on 15,010 cases with hospital-based diagnosis codes and 340,734 controls. We discovered eight variants at seven loci that associate with POP and point to a role of connective tissue metabolism and estrogen in the etiology of POP.

Results

Association analysis

We performed a meta-analysis of two GWA-studies on POP, one from Iceland and the other from the UK (UK Biobank: UKB) with a combined sample of 15,010 cases and 340,734 controls of European ancestry (Supplementary Data 1). The Icelandic GWAS included 3409 cases and 131,444 controls and the corresponding numbers from UKB were 11,601 and 209,288. Cases were identified from hospital-based diagnosis records (International classification of disease (ICD) edition 10 code N81: Female genital prolapse) and controls were all females (see Methods for a detailed description of the Icelandic and UK datasets). To account for multiple testing, we used a weighted Bonferroni procedure based on sequence variant annotation32 (Supplementary Data 2). A total of 245 variants at seven loci associate with POP at genome-wide significance (Fig. 1 and Supplementary Fig. 1, Supplementary Data 3). Conditional analysis at each locus identified a secondary signal at one (2p16.1) of the seven loci (Supplementary Data 4) resulting in eight distinct associations with POP (Table 1, Supplementary Fig. 2). All eight variants were nominally significant in both populations and accounting for multiple testing (P-value > 0.05/8 = 6.25 × 10−3), there is no significant heterogeneity in the effect estimates from the two datasets (Table 1). Seven out of the eight variants are common, with minor allele frequency (MAF) range of 17–48%, and one is of low frequency (MAF = 4.87%). The number of variants that correlate with the lead variants (r2 > 0.8) at these loci ranges from 3 to 78 (Table 1, Supplementary Fig. 2). No previously suggested POP variants from candidate gene, linkage or GWAS studies26,29,30,31 associate with POP (Supplementary Data 5, 6).

Fig. 1: Manhattan plot of association results between sequence variants from Iceland and UK Biobank and pelvic organ prolapse in the meta-analysis.

The significance of association for each variant (P-values on -log10 scale) is plotted against the respective position on each chromosome. The red line indicates the genome-wide significance level (5 × 10−8). Genes closest to the 7 loci of interest are annotated in the Figure. The plot was created using qqman: an R package for visualizing GWAS results using Q-Q and Manhattan plots102. Variants with imputation information >0.9 are displayed.

Table 1 Association results for lead variants at loci reaching genome-wide significance in a meta-analysis of pelvic organ prolapse.

We assessed the robustness of the POP association results for the eight lead variants in a group of more stringently diagnosed POP cases that are defined by procedure codes specific to the treatment of POP in addition to the ICD10 code N81 (Supplementary Data 7). The statistical power for this analysis comes mainly from the UKB data because information on POP related surgeries was available for 61% of the ICD-coded POP cases in UK, but only 13% of the Icelandic POP cases. The effect sizes tended to be greater for the association using cases that had undergone surgery compared to those diagnosed only based on ICD code (Supplementary Data 8). This most likely reflects increased severity of POP among those that had undergone surgery or a greater number of falsely diagnosed POP cases among the ICD based cases. We conclude that our results based on ICD codes alone are robust as the effects are in the same direction and the effect sizes are not substantially different from those using surgically treated POP cases.

We used LD score regression to estimate the SNP heritability of POP33. Using LD scores for about 1.1 million variants found in European populations we estimated SNP heritability in the meta-analysis to be 12.4% (95% CI 9.9–14.8%).

Functional annotation and biological inference of risk loci

None of the lead POP variants are coding or in high LD (r2 > 0.8) with coding variants (Supplementary Data 3) and none was highly correlated with the top cis-eQTL for neighboring genes in any of the available tissues using mRNA expression data from the GTEx database and our own RNA-sequence data on Icelandic samples from blood (13,162 individuals) and adipose tissue (749 individuals).

To gain a further understanding of the variants associating with POP we explored how they correlate with other traits in combined data from Iceland and UKB – a database with approximately 800 traits (P-value threshold = 0.05/800 = 6.3 × 10−5) – in addition to looking up all variants and their correlates (r2 > 0.5) in the GWAS-catalog34 (Supplementary Data 9, 10). As we observed overlaps of genetic association to different traits we conducted adjusted association analyses (co-localization analyses) at the POP loci to identify association results that are consistent with a single signal representation (see Methods).

Six of the eight POP variants also associate with a total of 13 other phenotypes. Three of those variants associate with six phenotypes with pathogenesis that may be the same as that of POP. Two POP variants associate with conditions related to estrogen exposure, rs3820282 at the WNT4 locus (leiomyoma of uterus, gestational duration and endometriosis) and rs12325192 at the SALL1 locus (leiomyoma). The third, rs3791675 at the EFEMP1 locus, associates with conditions related to aberrant connective tissue function (hernias and carpal tunnel syndrome) (Table 2).

Table 2 A summary of previously reported (*) and novel associations of POP variants with other traits.

Secondary, non-symptomatic diagnosis of POP through conditions such as leiomyoma and endometriosis may confound or bias the association results from our GWAS meta-analysis. We therefore tested the association of our eight POP variants with POP after excluding endometriosis and leiomyoma cases from the analysis. This did not substantially affect the effect sizes or significance (Supplementary Data 11), indicating that the POP associations are independent of endometriosis and leiomyoma.

POP associated loci

The most significant POP association is with rs3820282–T (P = 3.3 × 10−21, OR = 0.85), located in intron 1 of WNT4, which encodes a protein involved in development of the female reproductive tract35. Loss of WNT4 function in humans leads to underdevelopment and sometimes absence of the uterus and vagina36. Variants correlated (r2 > 0.8) with the POP-protecting allele rs3820282–T have been associated with increased risk of endometriosis37, leiomyoma38, increased gestational duration39 and with decreased bone mineral density40 (Supplementary Data 10), all of which we replicate (Supplementary Data 9). Rs3820282–T also associates with lower number of children in our combined data (Supplementary Data 9). Our co-localization analyses revealed that apart from the bone mineral density and number of children signals, the association results for leiomyoma, gestational duration, and endometriosis are likely to be the same signal as the POP association at the locus (Supplementary Data 12 and Table 2).

Rs12325192 is located near SALL1 and the POP-protecting allele rs12325192–T is correlated (r2 = 0.94) with rs66998222, a variant we have previously reported to associate with leiomyoma38; a trait that is positively correlated with estrogen exposure (Table 2). Mutations in SALL1 have been associated with Townes-Brocks syndrome, a condition associated with renal malformation, suggesting a role of SALL1 in genitourinary development41.

Two distinct POP variants, rs3791675 and rs1430191, are located in and near EFEMP1 (also known as FBLN3), a gene encoding fibulin-3. Fibulins are components of microfibrils, the building blocks of elastic fibers that are generated in fibroblasts42 and provide tissue elasticity. EFEMP1 is expressed in mouse connective tissue and network analyses suggest a role for EFEMP1 in connective tissue maintenance/homoeostasis43. Rare mutations in EFEMP1 have been found to cause autosomal dominant Doyne honeycomb degeneration of retina, characterized by yellow–white deposits known as drusen that accumulate beneath the retinal pigment epithelium44. The strongest POP association at the locus is with rs3791675–T (P = 2.7 × 10−17, OR = 0.87). Its correlates (r2 > 0.8) have been associated with various traits in Europeans, including less height45,46, smaller BMI-adjusted waist circumference47,48 and protection against inguinal hernia43 (Supplementary Data 10). We replicate these associations with our data (Table 2). In addition, rs3791675–T associates with protection against various hernias (inguinal, femoral, umbilical, and ventral hernia), consistent with the hypothesis of a similar proposed collagen pathophysiology for POP and hernias49,50,51,52. Rs3791675–T also associates with protection against diverticular disease, a disease in which abnormal collagen and decreased tensile strength of the colonic wall have been proposed to contribute to pathogenesis53. We furthermore found rs3791675–T to associate with increased risk of carpal tunnel syndrome – a condition also linked to connective tissue metabolism54 – higher pulse pressure, and the lung function ratio forced expiratory volume (FEV1)/forced vital capacity (FVC) (Supplementary Data 9). According to our co-localization analyses, association results for inguinal hernia, height, FEV1/FVC, carpal tunnel syndrome, ventral hernia, and waist circumference (Supplementary Data 12 and Table 2) are consistent with a single signal origin.

Rs9306894, located in the 3′UTR of GDF7 (also known as BMP12), associates with POP. GDF7 encodes a secreted ligand of the TGF–beta superfamily of proteins and is thought to be involved in tendon and ligament formation and repair55,56. The protein has been found to play a crucial role in tenogenesis of mesenchymal stem cells and is used in tissue-engineering to treat tendon injury57. A correlated variant near GDF7 (rs2289081–C, r2 = 0.85) has been associated with decreased pulse pressure58 and rs7255–T (r2 = 0.62 to rs9306894) has been associated with increased risk of Barrett’s oesophagus and esophageal adenocarcinoma combined (BE and EA) (Supplementary Data 10), which we replicate (Supplementary Data 9), but co-localization analysis only supports a single signal origin for BE and EA combined (Supplementary Data 12). We replicate the previous association of a highly correlated variant rs9306895 (r2 = 0.996) with prostate cancer59 (P = 6.15 × 10−6, OR = 1.07) (Table 2).

For the POP variant rs1247943, close to TBX5 (distance: 172 Kb), we replicate the previous associations of two highly correlated variants with the POP risk-increasing allele; rs1270884 (r2 = 0.81) and rs2555019 (r2 = 0.99) with increased risk of prostate cancer60 and lowered risk of benign prostatic hyperplasia61 (Table 2).

Rs7682992–T, the POP variant close to the FAT4 gene (distance: 0.5 Mb), associates with increased risk of stress incontinence (Table 2). FAT4 is part of the planar cell polarity pathway that controls tissue organization; loss of FAT4 in mice leads to cystic kidney disease62.

Rs72624976 is located in the 3′UTR of IMPDH1, one of two target genes for the immunosuppressant drug mycophenolic acid63. Mutations in IMPDH1 have been associated with the eye disorders retinitis pigmentosa64 and Leber’s congenital amaurosis65.

Comorbidity

To test further for comorbidity between POP and other traits, we collected sequence variants reported to associate with traits that associate with one or more of our POP variants: leiomyoma (N = 32)38,66, gestational duration (N = 5)39, endometriosis (N = 27)67, BE and EA combined (N = 14)68, prostate cancer (N = 145)59, inguinal hernia (N = 4)43, height (N = 3,290)69, FEV1/FVC (N = 52)70, BMI-adjusted waist circumference (N = 70)48 and benign prostatic hyperplasia (N = 23)61. Of the 3,652 unique variants collected, 17 were correlated within height (r2 > 0.8). We tested the remaining 3,635 (3,652 − 17) independent sequence variants for association with POP. As 44 variants were correlated across traits (r2 > 0.8), we used the significance threshold of P < 1.4 × 10−5 (0.05/3,591; 3,652 − 17 − 44 = 3,591) (Supplementary Data 13–22). We found variants at nine loci to associate with POP: five that associate at genome-wide significance with POP, three that were previously associated with height (P < 1.4 × 10−5) at TXNDC5 (6p24.3), SLC12A2 (5q23.3) and LOXL1 (15q24.1) (Supplementary Data 19) and one at WT1 that associates with inguinal hernia (11p13) (Supplementary Data 18).
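The bookkeeping behind this threshold can be retraced with a few lines of arithmetic. The sketch below uses only the counts quoted in the paragraph above; the variable names are illustrative and not taken from the study's software.

```python
# Counts quoted in the text; variable names are illustrative only.
unique_variants = 3652
correlated_within_height = 17   # r2 > 0.8 within the height variant set
correlated_across_traits = 44   # r2 > 0.8 between variants of different traits

# Variants actually tested for association with POP.
tested = unique_variants - correlated_within_height

# Number of independent tests used for the Bonferroni correction.
independent_tests = tested - correlated_across_traits

threshold = 0.05 / independent_tests

print(tested, independent_tests, f"{threshold:.1e}")
```

The threshold is therefore based on the number of independent tests rather than on the raw number of collected variants.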

We further used the variants collected for each of the ten traits as instruments to explore a possible causal relationship between each trait and POP71. By regressing the effect estimates of each group of variants (trait-specific) on POP on the effect estimates of those variants on the corresponding traits, using AF × (1-AF) as a weight72, we did not observe a correlation between the effect estimates (Supplementary Fig. 3).
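A regression of this kind could be sketched as follows. This is a minimal illustration, not the study's implementation: the text specifies only the weight AF × (1 − AF), so fitting an intercept is an assumption of the sketch, and the function name and example numbers are made up.

```python
import numpy as np


def weighted_effect_regression(beta_trait, beta_pop, allele_freq):
    """Weighted least-squares fit of beta_pop ~ intercept + slope * beta_trait.

    Weights are AF * (1 - AF), as stated in the text. Including an
    intercept is an assumption of this sketch.
    """
    x = np.asarray(beta_trait, dtype=float)
    y = np.asarray(beta_pop, dtype=float)
    af = np.asarray(allele_freq, dtype=float)
    weights = af * (1.0 - af)
    # Scaling rows by sqrt(weight) turns ordinary least squares into
    # a minimisation of sum(weight * residual^2).
    root_w = np.sqrt(weights)
    design = np.column_stack([np.ones_like(x), x]) * root_w[:, None]
    coef, *_ = np.linalg.lstsq(design, y * root_w, rcond=None)
    intercept, slope = coef
    return float(intercept), float(slope)


# Made-up effect estimates for illustration only (not study data):
# the POP effect is exactly half the trait effect for every variant.
beta_trait = [0.10, 0.20, 0.30, 0.40]
beta_pop = [0.05, 0.10, 0.15, 0.20]
allele_freq = [0.10, 0.25, 0.40, 0.50]

intercept, slope = weighted_effect_regression(beta_trait, beta_pop, allele_freq)
print(round(intercept, 6), round(slope, 6))
```

A slope indistinguishable from zero, as reported here for the ten traits, means the trait-associated variants show no consistent proportional effect on POP.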

We also plotted the POP effects of the eight POP variants against their effects on the 13 traits listed in Table 2 (Iceland and UKB combined, apart from height and gestational duration which are data from Iceland) (Supplementary Fig. 4). It is apparent that although a handful of variants associate with POP and other phenotypes, there is little overlap between the variants associating with POP and the other phenotypes and little correlation between effect estimates. We are not aware of previous literature findings showing associations of these phenotypes within POP cases and their family members.

BMI is not associated with POP at the genetic level

We screened for evidence of a genetic relationship of POP and two of the most consistently reported risk factors for POP in addition to age; BMI and number of children. We verified that variants reported to associate with BMI73,74 or number of children75 do not associate with POP (Supplementary Data 23, 24) and that adding BMI or number of children separately as covariates in our models did not affect the effect sizes or significance of the eight lead variants when tested for association with POP (Supplementary Data 25). Furthermore, no correlation was found between effect estimates of BMI- or number of children sequence variants and their effects on POP and vice versa (Supplementary Figs. 5, 6). Using polygenic risk scores for POP, we saw little evidence of association with number of children (P-value = 9.5 × 10−5, beta = 0.015) in Iceland only and none with BMI (Supplementary Data 26). For BMI, we found no evidence of causal relationship with POP at the genetic level. With that in mind, and since only one of the POP variants (rs3820282) associates with number of children we do not find strong support for a causal pathway or a common third factor affecting POP and those two traits.

Discussion

We conducted a meta-analysis of two GWAS for POP and discovered eight variants at seven loci that associate with POP. Two sequence variants, rs3820282 at WNT4 and rs12325192 near SALL1 also associate with other traits that are strongly affected by estrogen exposure, i.e. leiomyoma (rs3820282 and rs12325192), gestational duration (rs3820282) and endometriosis (rs3820282). Of the 23 correlated variants at the 1p36.12 locus, rs3820282 has the best functional candidacy. Rs3820282–T resides in a region marked by mono-methylation of histone H3 at lysine residue K4 (H3K4me1), a characteristic of enhancers, and is predicted to alter a conserved binding site for the transcription factors (TFs) ESR1 and ESR276 supported by ChIP-seq data in breast and bone cell lines77. This suggests that the variant may alter the estrogen-based regulation of WNT4 or adjacent genes39. A previous report showed chromatin looping between the region containing rs12038474 (LD to rs3820282; r2 = 0.81) and CDC42 in endometrial adenocarcinoma78. Our analysis of available Hi-C data79 along with enhancer-gene predictions80 suggests WNT4, HSPG2, and CDC42 as possible enhancer-gene targets, using relevant cell-types and tissues. Based on the role of WNT4 in female sex organ development35, WNT4 can be considered to be a strong candidate gene at the locus. One of the eight POP variants, rs3791675 at EFEMP1, associates with traits with proposed collagen pathophysiology, i.e. inguinal hernia, ventral hernia and carpal tunnel syndrome. Furthermore, two genes at POP associated loci, GDF7 and EFEMP1, have functions related to connective tissue metabolism.

We find that the overlap between POP variants and other traits is limited to distinct signals and thus does not allow for strong inference regarding common pathways or causal relationships. However, the pleiotropy found at three of the POP loci, where the associating traits have similar proposed pathophysiology as POP, together with the known functions of the proposed target genes, may fuel further explorations of the possible role of estrogen exposure and connective tissue metabolism in the etiology of POP. Through analysis of variants that are reported to associate with conditions that associate with one or more of our POP variants, we identified four additional POP variants, three that associate with height and one with inguinal hernia.

In mice, a POP phenotype has been associated with knockdown of fibulin 3 (EFEMP1), lysyl oxidase like 1 (LOXL1), fibulin 5 (FBLN5), and homeobox A11 (HOXA11) as summarized in a recent review on key genes and pathways for POP81. Of those four genes, variants at the EFEMP1 locus associate with POP in our data and, through analysis of height variants, so does a variant at the LOXL1 locus: rs12440667–T (P = 3.6 × 10−6, OR = 0.94) (Supplementary Data 19, Supplementary Fig. 7). EFEMP1 encodes fibulin–3, and members of the fibulin family of extracellular matrix (ECM) proteins have been found to be important in elastic fiber assembly82,83. The two EFEMP1 mouse knockout studies reported to date84,85 demonstrate an abnormality in the integrity of elastic fibers in fascia connective tissue and connective tissue of the vaginal wall with abnormal pelvic organ support, in addition to increased activity of matrix metalloproteases (MMPs) that degrade matrix collagen in connective tissue. These mice also displayed an early onset of aging-associated traits including decreased body mass, lordokyphosis, reduced hair growth, and generalized fat, muscle and organ atrophy, as well as reduced lifespan. The POP variants in and near EFEMP1 did not associate with traits likely to cause lordokyphosis (any fracture and osteoporosis in the ICE-UKB data, and BMD (hip and spine measures)), BMI (both sexes and females only) or lifespan (females) in the Icelandic data (Supplementary Data 27).

In this paper, we report the first set of genome-wide significant POP variants identified through GWAS. The genetic overlap observed between POP and several traits with similar pathophysiology points towards a role of estrogen exposure and connective tissue metabolism in the etiology of POP. The results provide new insights for future research to further the understanding of POP.

Methods

Datasets

The meta-analysis combined the results of two GWA studies of pelvic organ prolapse (POP). The Icelandic data originates from Landspitali – The National University Hospital Inpatient Registry from January 1983 to August 2018. A total of 3,699 women had a POP diagnosis (ICD10 code N81 or ICD9 code 618). Controls were recruited through different genetic research projects at deCODE genetics. We had genotype information for 92% of the POP cases, so that the Icelandic dataset consisted of 3409 cases and 131,444 female controls. The UK Biobank data (UKB) consists of 11,601 cases and 209,228 controls of European ancestry (self-reported white British with similar genetic ancestry based on principal component analysis and with consistent reported and genetically determined gender)86, recruited between 2006 and 2010 aged 40–69, and with follow-up until 201687. The UKB cases had ICD10 code N81 in hospital inpatient records, with data dating back to between 1981 and 1997, depending on the external source (Hospital Episode Statistics from England (89% of participants), Patient Episode Database for Wales (7%) and Scottish Morbidity Record (7%)) and include primary and secondary diagnosis of POP (http://biobank.ctsu.ox.ac.uk/showcase/docs/inpatient_mapping.pdf).

Hospital-based POP diagnoses are likely to be stage II or greater, according to the POPQ standardized grading system5. Furthermore, the ratios of cases to controls in the datasets (0.055 in UKB and 0.026 in ICE) are consistent with reported prevalence estimates that use POPQ stage II–IV as a criterion for diagnosis or with symptom-based prevalence estimates8,9,10,11. The 2-fold difference in the ratio of cases to controls of POP in UKB data compared to the data from Iceland is explained by the difference in age-distributions (Supplementary Data 1). To further compare the POP phenotype between Iceland and UKB, we compared the percentage of cases within each of the nine subgroups of ICD10 code N81. In both datasets the majority of cases are diagnosed with cystocele or rectocele and fewer with uterovaginal prolapse (Supplementary Data 28).

The leiomyoma38, benign prostatic hyperplasia61, prostate cancer61, endometriosis37, diverticular disease88, and lung function70 datasets have been described previously. We obtained blood pressure measurements from Landspitali – The National University Hospital of Iceland, the Primary Health Care Clinics of the Reykjavik area and at recruitment for deCODE studies. Data on gestational duration (GD) (Iceland only) for the first available pregnancy was obtained from the National Birth Registry, excluding multiple pregnancies, stillbirths and pre- and post-term births (GD ≤259 days and GD ≥301 days) and adjusting for age of mother, sex, and year of birth of the child. Information on number of children was extracted from the deCODE genealogy database, adjusting for year of birth and county. Information on BMI was corrected for year of birth, age and county, conditional on age >18. BMI values originate from measured and self-reported data and are mean values for multiple measures within individuals. Waist circumference measurement was adjusted for age, gender, and BMI, conditional on age > 18. The BMD values at head (DEXA, Hologic QDR4500A) were age, height and body mass index corrected (Iceland only). Quantitative traits were rank-based inverse normal transformed to a standard normal distribution separately for each gender. Other datasets from Iceland originate from the deCODE genetics phenotype database, which contains extensive medical information on various diseases and other traits. Cases with hernias were identified by hospital-based ICD10 diagnosis codes: K40 (inguinal hernia), K41 (femoral hernia), K43 (ventral hernia) in both datasets (Iceland and UKB); the same applies for diverticular disease (K57), stress incontinence (N393), carpal tunnel syndrome (G560), Barrett’s oesophagus and esophageal adenocarcinoma combined (K227 and C15, data in UKB only), and actinic keratosis (L57). For sex-specific phenotypes in Iceland, only controls for the relevant sex were included in the analysis.

The study was approved by the Icelandic National Bioethics Committee (bioethics consent number VSN 18-067) in agreement with conditions issued by the Data Protection Authority of Iceland. Written informed consent was obtained from all genotyped subjects. Personal identities relating to participants’ data and biological samples (i.e. blood samples, buccal samples, or adipose tissue samples) were encrypted by a third-party system (Identity Protection System), approved and monitored by the Data Protection Authority89. DNA was extracted from blood and buccal samples and RNA from blood and adipose tissue samples.

Genotyping and imputation

Details of genotyping and imputation methods in the Icelandic part of the study have been described90. In brief, the whole genomes of 15,220 Icelanders, participating in various disease projects at deCODE genetics, were sequenced to a mean depth of at least 10× (median 32×) using Illumina technology. Genotypes of SNPs and indels were called using joint calling with the Genome Analysis Toolkit HaplotypeCaller (GATK version 3.4.07)91. As all of the sequenced individuals had also been chip-typed and long-range phased, information about haplotype sharing was used to improve genotype calls. In total, 151,677 Icelanders have been genotyped with various Illumina SNP chips, long-range phased and imputed based on the sequenced data set92. Using genealogic information, genotype probabilities for 282,894 untyped relatives of the genotyped individuals were calculated to further increase the sample size for association analysis and to increase the power to detect associations.

The informativeness of genotype imputation was estimated by the ratio of the variance of imputed expected allele counts and the variance of the actual allele counts:

$$\frac{\mathrm{Var}\left(E\left(\theta \mid \mathrm{chip}\,\mathrm{data}\right)\right)}{\mathrm{Var}\left(\theta\right)}$$

where θ is the allele count. Here, \(\mathrm{Var}\left(E\left(\theta \mid \mathrm{chip}\,\mathrm{data}\right)\right)\) is estimated by the observed variance in the imputed expected counts and \(\mathrm{Var}\left(\theta\right)\) is estimated by p (1 − p), where p is the allele frequency. Variants were annotated using Ensembl release 80 and Variant Effect Predictor (VEP) version 2.893.
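The information ratio above can be illustrated numerically. The sketch below assumes that θ is a per-haplotype allele count (0 or 1), which is what makes p (1 − p) its variance; the function name and toy inputs are hypothetical and are not part of the deCODE software.

```python
import numpy as np


def imputation_info(expected_counts, allele_freq):
    """Ratio Var(E(theta | chip data)) / Var(theta).

    expected_counts: imputed expected allele counts, one per haplotype
                     (assumed to lie between 0 and 1 in this sketch).
    allele_freq:     allele frequency p, so that Var(theta) = p * (1 - p).
    """
    expected = np.asarray(expected_counts, dtype=float)
    observed_variance = expected.var()  # population variance (ddof=0)
    return float(observed_variance / (allele_freq * (1.0 - allele_freq)))


# Toy inputs, not study data.
# Perfectly imputed haplotypes: every expected count is exactly 0 or 1.
info_perfect = imputation_info([0, 0, 0, 1], allele_freq=0.25)

# Uninformative imputation: every haplotype just gets the allele frequency.
info_none = imputation_info([0.25, 0.25, 0.25, 0.25], allele_freq=0.25)

print(info_perfect, info_none)
```

Values near 1 indicate that the chip data determine the genotype almost completely, while values near 0 indicate that imputation adds little beyond the allele frequency.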

Genotyping of UKB samples was performed using a custom-made Affymetrix chip, UK BiLEVE Axiom94, in the first 50,000 participants and with the Affymetrix UK Biobank Axiom array95 in the remaining participants, with 95% of the signals on both chips. Imputation was performed by the Wellcome Trust Centre for Human Genetics using a combination of the Haplotype Reference Consortium (HRC), 1000 Genomes phase 3 and the UK10K haplotype resources86. This yields a total of 92.6 million imputed variants; however, we used only markers that were imputed on the basis of the HRC panel owing to problems in the UK10K + 1000 Genomes panel imputation, resulting in 37.1 million imputed variants used in the current study.

GWAS and meta-analysis

We used logistic regression assuming an additive model in the case-control analysis to test for association between variants and disease, treating disease status as the response and expected genotype counts from imputation as covariates, and using likelihood ratio test to compute P-values. For the Icelandic data, the model also includes as nuisance variables available individual characteristics that correlate with phenotype status. In Iceland, these covariates are: county of birth, current age or age at death (first and second order terms included), availability of blood sample for the individual and an indicator function for the overlap of the lifetime of the individual with the time span of phenotype collection. The association analysis for both the Icelandic and UKB datasets was done using software developed at deCODE genetics90. For the association testing in UKB, 40 principal components were used to adjust for population substructure and age was included as a covariate in the logistic regression model.
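The core of such a test, an additive logistic model compared with an intercept-only model by a likelihood ratio test, can be sketched as follows. This is a simplified illustration and not the deCODE software: it omits the nuisance covariates and principal components described above, and the function names and toy counts are assumptions.

```python
import math
import numpy as np


def fit_logistic(design, status, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    Returns the coefficient vector and the maximised log-likelihood.
    """
    beta = np.zeros(design.shape[1])
    for _ in range(max_iter):
        prob = 1.0 / (1.0 + np.exp(-design @ beta))
        gradient = design.T @ (status - prob)
        hessian = (design * (prob * (1.0 - prob))[:, None]).T @ design
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    prob = 1.0 / (1.0 + np.exp(-design @ beta))
    loglik = float(np.sum(status * np.log(prob)
                          + (1.0 - status) * np.log(1.0 - prob)))
    return beta, loglik


def additive_lrt(dosage, status):
    """Odds ratio per allele and 1-df likelihood ratio test P-value."""
    dosage = np.asarray(dosage, dtype=float)
    status = np.asarray(status, dtype=float)
    null_design = np.ones((len(status), 1))
    full_design = np.column_stack([null_design, dosage])
    _, loglik_null = fit_logistic(null_design, status)
    beta, loglik_full = fit_logistic(full_design, status)
    statistic = max(2.0 * (loglik_full - loglik_null), 0.0)
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return math.exp(beta[1]), statistic, p_value


# Toy counts (not study data): odds of disease are 0.2, 0.4 and 0.8
# for 0, 1 and 2 copies of the allele.
dosage = np.repeat([0, 1, 2], [120, 140, 90])
status = np.concatenate([
    np.repeat([0, 1], [100, 20]),
    np.repeat([0, 1], [100, 40]),
    np.repeat([0, 1], [50, 40]),
])
odds_ratio, statistic, p_value = additive_lrt(dosage, status)
print(round(odds_ratio, 3), round(statistic, 2), p_value)
```

With imputed data the `dosage` vector would hold expected genotype counts, which need not be integers; the same code applies unchanged.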

To account for inflation in test statistics due to cryptic relatedness and stratification, we applied the method of linkage disequilibrium (LD) score regression33. LD scores were downloaded from an LD score database (ftp://atguftp.mgh.harvard.edu/brendan/1k_eur_r2_hm3snps_se_weights.RDS; accessed 23 June 2015). With a set of 1.1 million variants, we regressed the χ2 statistics from our GWAS scan on the LD scores and used the intercept as a correction factor. P values were then adjusted by dividing the corresponding χ2 values by the correction factors. The estimated correction factor for POP based on LD score regression was 1.12 for the Icelandic and 1.05 for the UK datasets, respectively.
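The final adjustment step can be written out in a few lines. The sketch below covers only the division of the χ2 statistic by the correction factor and the recomputation of the P-value; it does not estimate the LD score regression intercept itself, and the function names and the example statistic of 30 are illustrative assumptions.

```python
import math


def chi2_1df_p_value(chi2):
    """Upper-tail P-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def adjust_for_inflation(chi2, correction_factor):
    """Divide the chi-square statistic by the LD score regression intercept."""
    adjusted_chi2 = chi2 / correction_factor
    return adjusted_chi2, chi2_1df_p_value(adjusted_chi2)


# Correction factors reported in the text; the raw statistic is made up.
raw_chi2 = 30.0
for dataset, factor in [("Iceland", 1.12), ("UK", 1.05)]:
    adjusted_chi2, adjusted_p = adjust_for_inflation(raw_chi2, factor)
    print(dataset, round(adjusted_chi2, 2), f"{adjusted_p:.2e}")
```

Because both correction factors exceed 1, the adjusted statistics are smaller and the adjusted P-values larger than the unadjusted ones.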

Variants in the UKB imputation dataset were mapped to National Center for Biotechnology Information (NCBI) Genome Reference Consortium Build 38 positions (GRCh38) and matched to the variants in the Icelandic dataset based on allele variation. Only sequence variants from the Haplotype Reference Consortium panel (HRC) were included in the meta-analysis. The results from the two datasets were combined using a fixed effect model (a Mantel-Haenszel model)96 in which the datasets were allowed to have different population frequencies for alleles and genotypes but were assumed to have a common OR and weighted with the inverse of the variance. We selected a threshold of 0.8 imputation info and MAF >0.01% for variants available in the Icelandic data set and/or the UKB data set. A total of 37,144,863 variants were used in the analysis. We tested for heterogeneity by comparing the null hypothesis of the effect being the same in both populations to the alternative hypothesis of each population having a different effect using a likelihood ratio test (Cochran’s Q) reported as Phet (Table 1 for POP and Supplementary Data 29 for other traits).
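An inverse-variance weighted fixed-effect combination of two datasets, together with Cochran's Q for heterogeneity, can be sketched as below. This is a generic textbook formulation offered as an illustration under stated assumptions, not the software used in the study; the function name and the example effect estimates are made up.

```python
import math


def fixed_effect_meta(log_odds_ratios, standard_errors):
    """Inverse-variance fixed-effect meta-analysis of two datasets.

    Returns the combined log-OR, its standard error, the two-sided
    association P-value, Cochran's Q and the heterogeneity P-value.
    The heterogeneity P-value uses one degree of freedom, which is
    only valid for exactly two datasets.
    """
    if len(log_odds_ratios) != 2 or len(standard_errors) != 2:
        raise ValueError("this sketch handles exactly two datasets")
    weights = [1.0 / se ** 2 for se in standard_errors]
    weight_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, log_odds_ratios)) / weight_sum
    se = math.sqrt(1.0 / weight_sum)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, log_odds_ratios))
    p_het = math.erfc(math.sqrt(q / 2.0))
    return beta, se, p_value, q, p_het


# Hypothetical per-dataset estimates (log odds ratios), not study results.
beta, se, p_value, q, p_het = fixed_effect_meta([-0.15, -0.17], [0.03, 0.02])
print(round(math.exp(beta), 3), f"{p_value:.1e}", round(p_het, 3))
```

The combined estimate always lies between the two dataset-specific estimates and is pulled towards the one with the smaller standard error.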

We used the weighted Bonferroni method to account for all 37,144,863 variants being tested (P-value < (0.05 × weight)/37,144,863). Using the weights given in Sveinbjornsson et al.32, this procedure controls the family-wise error rate at 0.05; P ≤ 2.2 × 10−7 for high-impact variants (including stop-gained, frameshift, splice acceptor or donor and initiator codon variants, N = 9,658), P ≤ 4.4 × 10−8 for missense, splice-region variants and in-frame-indels (N = 180,803), P ≤ 4.0 × 10−9 for low-impact variants (including synonymous, 3′ and 5′ UTR, and upstream and downstream variants, N = 2,653,622), P ≤ 6.7 × 10−10 for intron and intergenic variants (N = 34,300,780) (Supplementary Data 2). All P-values are two-sided.
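
The class counts and thresholds above can be checked with a few lines of arithmetic. In the sketch below the category keys are hypothetical labels, and the implied weights are back-computed from the rounded thresholds, so they only approximate the weights of Sveinbjornsson et al.32.

```python
TOTAL_VARIANTS = 37_144_863
FWER = 0.05

# (number of variants, P-value threshold) per annotation class, as reported in the text.
CLASSES = {
    "high_impact": (9_658, 2.2e-7),
    "missense_splice_region_inframe": (180_803, 4.4e-8),
    "low_impact": (2_653_622, 4.0e-9),
    "intron_intergenic": (34_300_780, 6.7e-10),
}

def expected_false_positives():
    """Sum of N_i x threshold_i, an upper bound on the family-wise error rate."""
    return sum(n * threshold for n, threshold in CLASSES.values())

def implied_weight(category):
    """Weight implied by threshold = FWER x weight / TOTAL_VARIANTS (approximate)."""
    _, threshold = CLASSES[category]
    return threshold * TOTAL_VARIANTS / FWER

def is_significant(p_value, category):
    """Apply the class-specific threshold to a two-sided P-value."""
    return p_value <= CLASSES[category][1]
```

The four class counts sum to 37,144,863, and the bound from expected_false_positives() is about 0.044, below 0.05 because the thresholds are rounded.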

Conditional analysis

We applied approximate conditional analyses, implemented in the GCTA software97, to the meta-analysis summary statistics to look for additional association signals at each of the genome-wide significant loci. LD between variants was estimated using a set of 8700 whole-genome sequenced Icelandic individuals. The analysis was restricted to variants present in both the Icelandic and UKB datasets and within 1 Mb of the index variants. We tested 7 loci and about 100,000 variants in the conditional analysis and report one variant (rs1430191) with conditional P value < 5 × 10−7. The results from GCTA were verified by conditional analysis using genotype data in the Icelandic and UK datasets separately, and the results presented in Table 1 were obtained by meta-analyzing those results.

Co-localization analysis

The variants explored as potentially representing the same signal as the POP variants were found by two means. First, we looked up correlates of the POP variants in the GWAS-catalog (r2 > 0.5) (Supplementary Data 10) and extracted those with 0.5 < r2 < 0.9 for the co-localization analysis because of the limited potential to distinguish between variants in LD of r2 > 0.9. Second, for each of the traits associating with the POP variants in our data we extracted the strongest associating variant (r2 < 0.9) for the adjusted (conditional) analysis. For the tests performed in the co-localization analysis, we used previously reported variants as index variants for the secondary traits when available for a conditional analysis, provided that the reported analyses were based on a sample of similar or larger size than our combined data from Iceland and UKB. Otherwise we used the variant most strongly associating with the trait in our data. The results from the conditional analyses are consistent with a single signal representation if two conditions are met: first, the P-value of the index variant for the secondary trait is ≥0.05 after adjusting for the POP variant at the locus, and second, the POP variant at the locus associates with the secondary trait at a P-value < 6.3 × 10−5. The latter threshold applies to traits identified through the phenome scan (PheWAS), whereas a P-value < 0.05 is required if a correlate of the POP variant has previously been reported to associate with the trait in a considerably larger sample than in our data (Supplementary Data 12).
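
The two-condition rule can be written as a small predicate. The sketch below is illustrative only; the function and argument names are hypothetical.

```python
PHEWAS_THRESHOLD = 6.3e-5      # for traits identified through the PheWAS
REPORTED_THRESHOLD = 0.05      # for traits reported in a considerably larger sample
ADJUSTED_THRESHOLD = 0.05      # index variant must not remain associated after adjustment

def consistent_with_single_signal(p_index_adjusted, p_pop_variant_trait,
                                  trait_from_phewas=True):
    """Return True when both conditions for a shared signal are met.

    p_index_adjusted: P-value of the secondary-trait index variant after
        adjusting for the POP variant at the locus.
    p_pop_variant_trait: P-value of the POP variant for the secondary trait.
    trait_from_phewas: False when a correlate of the POP variant was previously
        reported for the trait in a considerably larger sample.
    """
    threshold = PHEWAS_THRESHOLD if trait_from_phewas else REPORTED_THRESHOLD
    return p_index_adjusted >= ADJUSTED_THRESHOLD and p_pop_variant_trait < threshold
```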

RNA sequencing analysis

Generation of poly(A)+ complementary DNA sequencing libraries, RNA sequencing, and data processing were carried out as described previously98,99. Two tissue types were available for this analysis: whole blood (N = 13,162) and adipose tissue (N = 749). We used generalized linear regression to test for association between sequence variants and rank-transformed gene expression estimates.

Polygenic scores, heritability, and functional annotation

We calculated two sets of polygenic risk scores (PRSsPOP), both using Icelandic and UK data, essentially as previously described100. Briefly, the PRSs were calculated using genotypes for about 630,000 autosomal markers included on the Illumina SNP chips to avoid uncertainty due to imputation quality. We estimated linkage disequilibrium (LD) between markers using 14,938 phased Icelandic samples and used this LD information to calculate adjusted effect estimates using LDpred100,101. To avoid overfitting due to population substructure, the effect estimates calculated using the Icelandic data were used as weights when generating the weighted PRS (PRSPOPICE) for testing in the UK, and the effect estimates generated from the UK data were used to derive the weighted PRS (PRSPOPUKB) for testing in Iceland. We created several PRSs assuming different fractions of causal variants (the P parameter in LDpred), and selected the best PRS based on prediction of POP in the Icelandic and UK datasets (1% causal variants). The most predictive PRSPOPICE was then used to calculate the correlation with selected phenotypes in the UKB data, and the most predictive PRSPOPUKB was tested for correlation with the selected phenotypes in Iceland. The correlation between the PRS and phenotypes was calculated using logistic regression in R (v3.5) (http://www.R-project.org), adjusting for year of birth and principal components by including them as covariates in the analysis. We summarize the correlation of the two sets of PRSs with the Icelandic and UKB phenotypes as a weighted average of the effect estimates from both analyses.
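
In its simplest form a weighted PRS is the sum of allele counts multiplied by per-marker effect estimates. The sketch below assumes the LD-adjusted weights have already been produced (for example by LDpred) in the other population; standardizing the score is an added convenience, and the function name is hypothetical.

```python
import numpy as np

def polygenic_score(genotypes, weights):
    """Weighted sum of allele counts, standardized to mean 0 and SD 1.

    genotypes: array of shape (individuals, markers) with allele counts 0, 1 or 2
        from chip-typed autosomal markers in the target population.
    weights: LD-adjusted effect estimates per marker, derived in the other
        population (Icelandic weights for UK scores and vice versa).
    """
    genotypes = np.asarray(genotypes, dtype=float)
    weights = np.asarray(weights, dtype=float)
    raw = genotypes @ weights
    return (raw - raw.mean()) / raw.std()
```

The standardized score can then enter a logistic regression on the phenotype together with year of birth and principal components.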

Using precomputed LD scores for about 1.1 million variants found in European populations (downloaded from: https://data.broadinstitute.org/alkesgroup/LDSCORE/eur_w_ld_chr.tar.bz2), we estimated SNP heritability with LD score regression33.

Chromatin interaction map data were derived from Hi-C sequencing for selected cell- and tissue types79. The data were downloaded from the Gene Expression Omnibus (GEO), accession number GSE87112, in pre-processed format (Fit-Hi-C algorithm) representing false-discovery rates (FDR) for contact regions at 40 kb resolution. To define statistically significant contacts we used a threshold value of FDR <10−6 in relevant cell-types and tissue (IMR90 fibroblasts, mesenchymal stem cells, and muscle tissue i.e. left ventricle, right ventricle, and psoas muscle)79. DNA contact regions containing the lead variant, or those containing variants in LD (r2 > 0.8) with the lead variant, were then identified to find interacting target genes. Using JEME (joint effect of multiple enhancers)80, we similarly looked for enhancer elements containing the lead variant (and those in LD) to find target genes using similar cell- and tissue types (IMR90 fibroblasts, mesenchymal stem cells, and muscle tissue i.e. colon, stomach, duodenum, and male skeletal muscle). The genes identified by both Hi-C and JEME are then regarded as strong candidate gene targets.
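
The final selection step is a set intersection, sketched below under assumed data layouts. The tuple format for contacts, the bin identifiers, and the gene labels in the usage note are hypothetical and do not reflect the actual file formats of the Hi-C or JEME resources.

```python
FDR_THRESHOLD = 1e-6   # significance threshold for Hi-C contacts, from the text

def hic_target_genes(contacts, variant_bins):
    """Genes contacted by a 40 kb bin holding the lead variant or an LD proxy.

    contacts: iterable of (bin_id, gene, fdr) tuples (hypothetical layout).
    variant_bins: set of bin identifiers containing the lead variant or
        variants in LD (r2 > 0.8) with it.
    """
    return {gene for bin_id, gene, fdr in contacts
            if bin_id in variant_bins and fdr < FDR_THRESHOLD}

def strong_candidates(hic_genes, jeme_genes):
    """Genes supported by both the Hi-C contacts and the JEME enhancer targets."""
    return set(hic_genes) & set(jeme_genes)
```

For example, with contacts [("bin1", "GENE_A", 1e-8), ("bin1", "GENE_B", 1e-3)] and variant bin "bin1", only GENE_A passes the FDR threshold, and it is retained as a strong candidate only if JEME also lists it.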

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.