A genome-wide meta-analysis yields 46 new loci associating with biomarkers of iron homeostasis

Iron is essential for many biological functions and iron deficiency and overload have major health implications. We performed a meta-analysis of three genome-wide association studies from Iceland, the UK and Denmark of blood levels of ferritin (N = 246,139), total iron binding capacity (N = 135,430), iron (N = 163,511) and transferrin saturation (N = 131,471). We found 62 independent sequence variants associating with iron homeostasis parameters at 56 loci, including 46 novel loci. Variants at DUOX2, F5, SLC11A2 and TMPRSS6 associate with iron deficiency anemia, while variants at TF, HFE, TFR2 and TMPRSS6 associate with iron overload. A HBS1L-MYB intergenic region variant associates both with increased risk of iron overload and reduced risk of iron deficiency anemia. The DUOX2 missense variant is present in 14% of the population, associates with all iron homeostasis biomarkers, and increases the risk of iron deficiency anemia by 29%. The associations implicate proteins contributing to the main physiological processes involved in iron homeostasis: iron sensing and storage, inflammation, absorption of iron from the gut, iron recycling, erythropoiesis and bleeding/menstruation. Bell et al. report 46 new loci associated with biomarkers of iron homeostasis, including ferritin levels, iron binding capacity, and iron saturation, in the Icelandic, Danish and UK populations. The associated loci point to new iron-regulating proteins and important genetic differences between men and women.

I ron is an essential element for a wide variety of metabolic processes such as oxygen transport, cellular respiration, and redox reactions in numerous metabolic pathways. For this reason, iron homeostasis is tightly regulated on cellular and systemic levels to ensure a balance between uptake, transport, storage, and utilization. Iron deficiency is one of the five leading causes of disability worldwide, especially among children and women of childbearing age 1,2 . Similarly, iron overload is associated with an increased risk of several major chronic conditions, including diabetes and liver disease 1,3 .
Four iron biomarkers are used for clinical assessment of iron status: serum ferritin, serum iron, and total iron-binding capacity (TIBC) are measured directly, while transferrin saturation (TSAT) is derived as serum iron divided by TIBC. While serum ferritin correlates well with body iron stores in non-inflamed individuals 4 , TSAT measures the proportion of iron-binding sites of transferrin that are occupied by iron. TSAT indicates the availability of iron for erythropoiesis and is low in iron deficiency and high during iron overload. In some forms of anemia (e.g., anemia of inflammation) the iron is not transported efficiently to the bone marrow for erythropoiesis, despite adequate iron stores. Since in this situation there is adequate ferritin but low TSAT, it is useful to evaluate TSAT in addition to ferritin 4,5 .
Genome-wide association studies (GWAS) have previously investigated the association between sequence variants and iron homeostasis biomarkers [6][7][8] . The largest study to date yielded 11 loci: ABO, ARNTL, FADS2, HFE, NAT2, SLC40A1, TEX14, TF, TFR2, TFRC, and TMPRSS6 associating with one or more iron homeostasis biomarkers (ferritin, iron, TIBC or TSAT) 6 . To search for additional sequence variants associated with iron homeostasis, we performed a GWAS meta-analysis of ferritin, serum iron, TIBC, and TSAT in Iceland and blood donor studies from the UK (INTERVAL study) and Denmark (Danish Blood Donor Study). This was followed by cross-referencing of ironassociated loci with clinically relevant phenotypes (including iron deficiency anemia (IDA), iron overload, and red blood cell indices). We report associations with iron homeostasis biomarkers for 62 independent sequence variants at 56 loci, including 46 novel loci. Based on a literature review, we categorize 25 of these loci as involved in iron sensing or storage, inflammation, gut absorption, iron recycling, erythropoiesis, and bleeding/menstruation.

Results
Overview. We performed a meta-analysis of four iron-related biomarkers: ferritin (N = 246,139), serum iron (N = 163,511), TIBC (N = 135,430), and TSAT (N = 131,471), combining GWAS results from Iceland, the UK, and Denmark ( Fig. 1, Supplementary Data 1). We found associations with iron homeostasis biomarkers represented by 62 sequence variants at 56 loci, of which 46 have not been reported in the previous GWAS on iron homeostasis and are therefore considered novel (Table 1, Table 2, Fig. 2, and Supplementary Data 2). For each locus, we report the lead variant (lowest P value) and additional uncorrelated variants (r 2 < 0.1) within the locus with genome-wide significance. Our criteria for statistical significance have been previously described 9 (see "Methods"). A variant-to-gene mapping algorithm that takes into account gene location, variant effect (for coding variants), and effect on gene expression (eQTL) for each variant (lead variant and LD class) was used to choose a single candidate gene for each locus (see "Methods"). Twenty-five of the 62 iron homeostasisassociated sequence variants have a high-confidence predicted causal gene, 23 variants have multiple top-scoring genes, 36 variants have at least one coding variant or eQTL in the LD class, and 13 variants have more than one gene with coding variants and/or eQTL in the LD class (Supplementary Data 3). The LD class of a variant is defined as all variants having r 2 > 0.8 with the variant. Linkage disequilibrium (LD) (r 2 ) is estimated based on the Icelandic population. In cases where variants had more than one topscoring gene, the gene closest to the lead variant was selected, except for two loci where likely candidate genes were present among the top-scoring genes (FTL (ferritin light chain) and HAMP (hepcidin)) (Supplementary Data 3). Fourteen of the variants associated with more than one biomarker, bringing the total number of observed associations to 87 (Supplementary Data 2). All our associations have P < 3.0 × 10 −8 . We replicated the association of all 11 previously reported variants 6 , 10 at genome-wide significance (Supplementary Data 2). In addition, we found six rare variants (minor allele frequency (MAF) < 1%), six low-frequency variants (1% ≤ MAF < 5%), and 37 common variants that have previously not been reported to associate with iron homeostasis biomarkers (Supplementary Data 2). Forty-six variants associate with a single iron biomarker (ferritin, 34; iron, 8; TIBC, 4), while 12 variants associate with more than one biomarker (Fig. 3 10 , has not previously been associated with iron homeostasis. We calculated the correlation between the iron biomarkers and selected other biomarkers related to iron metabolism (red blood cell indices, platelet count, erythrocyte sedimentation rate, and Creactive protein) ( Supplementary Fig. 1) and the genetic correlation between the four iron biomarkers (Supplementary Data 4). Among the iron homeostasis biomarkers, the strongest correlation was between iron and TSAT (0.86) and the strongest genetic correlation was also between these biomarkers (Iceland TSAT vs. UK iron: 0.53 (SE = 0.19), P = 0.0059; Iceland iron vs. UK TSAT: 0.54 (SE = 0.17), P = 0.0020) (Supplementary Data 4). Furthermore, we estimated the heritability of the iron homeostasis biomarkers to be between 0.16 and 0.32 using parent-offspring and sibling correlations, suggesting that heritability explains 16-32% of the variance of the four iron homeostasis markers studied (Supplementary Data 5).
We tested for heterogeneity between the results from the Icelandic, UK, and Danish cohorts (Supplementary Data 2). Of the 87 associations, 79 are with markers present in two or more populations and of these, 19 show nominally significant heterogeneity (P < 0.05). For all associations, the effects are in the same direction in all three populations, and for 79 of the 87 associations, effects are nominally significant (P < 0.05) in all three populations ( Supplementary Fig. 2, Supplementary Fig. 3). Eight associations are reported with five rare variants at three loci found only in Iceland (MAF = 0.12-0.47%): three coding variants (two missense, one stop-gained) in STAB1, a stop-gained variant in TF, and a stop-gained variant in TMPRSS6. Common variants associating with iron biomarkers are reported in all three populations for each of these loci, providing additional evidence for these associations (Supplementary Data 2).
Because of the well-known difference in iron homeostasis between the sexes 11 , we tested for sexual dimorphism in iron biomarker associations (Supplementary Data 6). We found differences in the ferritin effect (using a test for heterogeneity with P value threshold P < 0.05/62 = 8.1 × 10 −4 ) of five of the 62 variants. In addition, we identified one additional variant that only associates with ferritin in women: a missense variant in VWF (p.Tyr1584Cys/rs1800386[C]), a likely pathogenic type 2 von Willebrand disease (VWD) mutation 12 (β = −0.17 standard deviation (SD) [−0.23, −0.12], P = 3.0 × 10 −10 ). Of the six variants, four have greater effects in women (F5: six times greater effect, SLC25A37: three times greater effect, DUOX2: 36% greater effect, and VWF: 13 times greater effect) and two in men (HK1: four times greater effect, HFE p.Cys282Tyr: 51% greater effect) (Supplementary Data 6). The four variants with larger effects in women also have stronger effects in premenopausal than postmenopausal women (Supplementary Data 7). In addition, we find sex differences in variants in the well-known iron regulatory genes HFE (ferritin, iron, and TSAT) and TMPRSS6 Iron homeostasis variants and protein quantitative loci (pQTL). To gain further insight into the biological pathways involved in iron homeostasis, we tested for association of the  Min/maj minor/major allele, MAF minor allele frequency, Gene predicted causal gene based on a variant-to-gene algorithm (see "Methods"), SD standard deviation, CI confidence interval,  (Fig. 4).
Hepcidin regulation and iron storage: Synthesis of the iron homeostasis hormone hepcidin (HAMP) is under tight regulation by the liver iron sensing and signaling cascade involving several proteins, including those encoded by HFE, TMPRSS6, TF, TFR2, and TFRC 13 . Hepcidin as the major iron homeostasis hormone regulates iron transport from cells through inhibition (and degradation) of ferroportin in cells, such as intestinal epithelial Min/maj minor/major allele, MAF minor allele frequency, Gene predicted causal gene based on a variant-to-gene algorithm (see "Methods"), SD standard deviation, CI confidence interval, Phet P value from the test for heterogeneity (see "Methods"). The effect is shown for the minor allele. a High-confidence predicted causal gene (based on a variant-to-gene algorithm, see "Methods"). b The minor allele is a 3.5 kb deletion in the 3′ UTR of SLC11A2.
Gut absorption: Iron absorption is mediated by the two iron transporters DMT1 (encoded by SLC11A2) at the luminal side and ferroportin (encoded by SLC40A1) basolaterally, both regulated by hepcidin signals and both harboring variants associated with iron homeostasis biomarkers 13 . Recently, hepcidin blocking of intestinal ferroportin was shown to inhibit HIF-2α expression, through increased intracellular iron and subsequent activation of irondependent prolyl hydroxylases, leading to reduced expression of iron absorptive proteins 27 . Mammalial HIF-α prolyl hydroxylases are encoded by the three genes EGLN1-3 28 . The rs996347[C] intron variant (MAF = 35%) at EGLN3 associates with increased ferritin (β = 0.049 SD [0.042, 0.056], P = 3.0 × 10 −41 ). EGLN3 is a likely candidate to mediate the inhibition of intestinal HIF2α expression, as it specifically inhibits HIF-2α rather than HIF-1α 29,30 . The . DUOX2 is expressed in the upper intestinal mucosa and may play a role in innate mucosal immunity 10,31 . Furthermore, in mouse models, DUOX1 and DUOX2 knockouts have a greater susceptibility to Helicobacter felis infection and inflammation 32 and epidemiological studies have indicated that H. pylori infections in humans are associated with reduced iron stores 33 .
Iron recycling: recycling of heme-iron takes place in the reticuloendothelial system in the spleen and liver, where old red cells are taken up and iron recycled back to the bone marrow, providing over 90% of the iron needed for the generation of heme in red cell precursors 1 . DMT1 and ferroportin also transport iron from endocytic vesicles and export iron out of the macrophage, respectively 34 Fig. 4) all associate with increased ferritin, with effects ranging from 0.17 to 0.35 SD (P = 2.2 × 10 −8 to 2.6 × 10 −19 ). STAB1 is primarily expressed in M2-macrophages and sinusoidal endothelial cells 35 and has been shown to affect phosphatidylserine-mediated uptake of aged red blood cells 36,37 . We also report associations of the intergenic variants rs2954029[T] (MAF = 48%) and rs6029148 . Their closest protein-coding genes, TRIB1 (for rs2954029) and MAFB (for rs6029148), have both been shown to control the differentiation of macrophages 38,39 .
Erythropoiesis: The bone marrow relays signals inhibiting liver hepcidin synthesis under a state of stress erythropoiesis to make iron available to erythroid precursors 40 . Variants located close to two known iron regulators within the erythropoiesis compartment, the intergenic variant rs13253974[A] (MAF = 32%) near SLC25A37 (mitoferrin-1) 41  . Variants in the HBS1L-MYB intergenic region are known to associate strongly with fetal hemoglobin levels 43,44 . Fetal hemoglobin levels are induced during stress erythropoiesis 45,46 , a condition also involving ERFE signaling 40 . Furthermore, HK1 mutations are associated with reduced red cell survival 47 .
Bleeding/menstruation: Loss of iron occurs primarily through epithelial desquamation and blood loss 48  Iron homeostasis variants and red blood cell traits. To better understand the effect of the sequence variants on iron homeostasis and iron usage, we tested for association with the red blood cell indices hemoglobin (N = 286,622), mean corpuscular hemoglobin (N = 286,245), mean corpuscular volume (N = 286,248), and reticulocyte count (N = 19,031) and compared the effects of variants on them and the four iron biomarkers ( Supplementary Fig. 5, Supplementary Data 11). Normally, as body iron stores fall, the hemoglobin concentration, mean corpuscular volume, and mean corpuscular hemoglobin concentration also fall. The p.Cys282Tyr variant at HFE (rs1800562) strongly affects all iron and red blood cell biomarkers except reticulocyte count. Variants at DUOX2, F5, and TRIB1 have a similar pattern of effects on iron and red blood cell biomarkers, with a negative effect on TIBC and mainly positive effects on the red cell indices (Supplementary Fig. 5 Data 2). This variant also shows an unusual pattern with decreased hemoglobin along with increased ferritin, indicating that body stores of iron are sufficient but the recycling of iron from stores is abnormally reduced (Supplementary Fig. 5).
IDA and iron overload. The two extremes of iron homeostasis, iron deficiency, and iron overload, are clinically important and associated with high disease burden 4,50 . In iron deficiency, depletion of iron stores is followed by reduced iron availability for erythropoiesis, leading to IDA, presenting as hypochromic, microcytic anemia with low ferritin and/or low TSAT 48 . Increased TSAT, most commonly defined as a saturation above 50%, is used as a screening marker for hemochromatosis and iron overload 51 . To understand how the 62 iron homeostasis variants affect either IDA or iron overload, we tested for association with IDA (defined as ever simultaneously having hemoglobin < 120 g/L for women, <130 g/L for men, MCV < 80 fl, MCH < 27 pg and either ferritin < 10 mcg/L or TSAT < 16%; N cases = 6476, N controls = 362,706) 5  . This variant has not been associated with iron homeostasis but has been associated with hematological traits 52 and variants in the same region have been associated with fetal hemoglobin expression 43,53 . Additionally, variants in the iron homeostasis regulatory genes HFE, TMPRSS6, TF, and TFR2 associate with iron overload (Fig. 5, Supplementary Data 12).
We tested the 62 iron homeostasis variants for association with the following eleven clinical manifestations of iron overload and/ or iron deficiency 54 based on various meta-analyses performed in Iceland using data from Iceland, UK, Denmark, and the USA: hemochromatosis, liver fibrosis/cirrhosis, liver cancer, type 2 diabetes, impotence, cardiomyopathy, osteoporosis, osteoarthritis, hyperpigmentation, amenorrhea, and restless leg syndrome (Supplementary Data 14). Taking all 62 × 11 = 682 tests into account using Bonferroni correction, the TMPRSS6 p. Novel SLC11A2 deletion variant. Rare loss-of-function mutations in SLC11A2 (solute carrier family 11 member 2 encoding DMT1, divalent metal transporter 1) have been associated with a microcytic anemia with iron overload under the recessive mode of inheritance 58-60 demonstrating a role of DMT1 in both iron absorption and recycling. We identified 14 homozygous carriers of the abovementioned deletion in SLC11A2 in the Icelandic cohort, seven of whom had been diagnosed with IDA (microcytic anemia with low ferritin and/or low TSAT) and one with transfusion-dependent anemia; two had required transfusions and one intravenous iron (Supplementary Data 17).
Transcription of SLC11A2 leads to four major mRNAs with differing tissue-specific expression patterns 61 . These messages differ both in their usage of 5′ exons 1a or 1b and usage of alternative 3′ translated and untranslated regions (UTRs) (Fig. 6A). These alternative UTRs differ in that one contains an iron-response element (IRE), denoted IRE+, while the other UTR lacks such a motif, denoted IRE−. The IRE+ UTR is primarily expressed in duodenal and kidney epithelium, mediates iron absorption, and is regulated directly by cellular iron status through interaction with IRE-binding proteins 62,63 . Of the four highest expressed transcripts in blood, two contain the IRE− UTRs, and two contain the IRE+ UTRs. The SLC11A2 deletion extends from within the IRE+ containing 3′ UTR and into the downstream intron (Fig. 6A). Heterozygotes (N = 251) and homozygotes (N = 2) express 40% (95% CI: 58-62%, P = 2.2 × 10 −16 ) and 81% (95% CI: 73-87%, P = 0.015) less IRE+ transcripts than wildtype (N = 12,828), respectively (Fig. 6B). When comparing allele-specific transcription in heterozygotes there was a 3.7-fold (P = 2.2 × 10 −17 ) preference for wildtype allele in IRE+ containing alleles. The deletion removes the native 3′ UTR polyadenylation signal, likely resulting in an unstable mRNA. The IRE− transcripts are expressed at 29% greater levels by heterozygotes than noncarriers (95% CI: 26-33%, P = 2.2 × 10 −16 ) and at 133% greater levels by homozygotes (95% CI: 48-205%; P = 0.016). These data suggest that the SLC11A2 deletion causes isoform-specific effects, suppressing the expression of IRE+ containing transcripts, that are primarily expressed in absorptive duodenal and kidney epithelium 62 leading to reduced absorption. This leads to a recessive hereditary IDA. Hepcidin levels based on proteomics samples from 35,559 Icelanders are reduced in SLC11A2 deletion carriers (β = −0.172 SD [−0.257, −0.088], P = 5.9 × 10 −5 ), consistent with systemic iron deficiency 64 . A single homozygous carrier of the deletion has a hepcidin value 2.17 SDs below average. The only previously described genetic IDA, iron refractory IDA, due to homozygous loss-of-function variants in TMPRSS6, is associated with hepcidin dysregulation and inappropriately high hepcidin values 64 .

Discussion
Through a GWAS meta-analysis of the iron homeostasis biomarkers ferritin, serum iron, iron-binding capacity, and TSAT in Iceland, Denmark, and the UK, we have identified 56 loci harboring variants associating with one or more of these biomarkers, 46 of which are novel (including six rare variants, six lowfrequency variants, and 37 common variants). Among the novel loci, variants in DUOX2 and SLC11A2 associate with increased risk of IDA, while the F5 rs6025[T] variant protects against IDA. Furthermore, the rs9399136[C] variant at the HBS1L/MYB locus is protective against IDA while increasing the risk of iron overload.
While most of these iron homeostasis variants show similar effects in Iceland, UK, and Denmark, the observed heterogeneity for a subset of the variants may reflect demographic, clinical, and environmental differences. In clinical populations, iron homeostasis markers are more frequently measured in individuals with suspected iron deficiency or overload. Blood donors, on the other hand, are screened for anemia and several other diseases at every donation. Therefore, people with a previous history of iron deficiencies are underrepresented in blood donor studies, although a substantial proportion of blood donors will develop low or reduced iron stores 65 . These differences in cohort characteristics could partially explain heterogeneity in effect sizes between populations for a subset of the variants. The sex-specific heterogeneity reported highlights the differences between the sexes in iron homeostasis. The two sequence variants showing the strongest heterogeneity are both variants in coagulation factors, the well-known factor V-Leiden mutation 49 and a mutation in VWF known to cause type 2 VWD 12 . These variants are likely to mediate their effects through increased (VWF) or decreased (F5) blood loss, in women being mainly mediated through menstruation, supported by the finding that factor V Leiden variant protects against menorrhagia. Thirty-four of the 62 reported novel variants only associate with ferritin. Possible explanations of this high number could be that ferritin as a biomarker is affected by a more broad array of processes, such as inflammation and tissue damage (e.g., liver injury) 66 , and also that we have more ferritin measurements (~246 K vs.~131-163 K).
Iron deficiency is a major global health problem, especially for children and women 2 . A worldwide survey in 2010 showed that one-third of the world population is anemic with iron deficiency being responsible for approximately half of that cases 50 . In addition to the nonspecific symptoms of IDA, it also may contribute globally to reduced cognitive performance in children 67 , adverse outcomes of pregnancies 68 , and decline in cognition in the elderly 48,69 . Despite the importance of iron deficiency and IDA, no systematic genetic studies looking at iron deficiency or IDA have been performed. Sequence variants that are common (at DUOX2 and the HBS1L-MYB intergenic region), lowfrequency (at F5), and rare (at TMPRSS6 and SLC11A2) associate with IDA (Fig. 5, Table 1). The association of the missense DUOX2 variant with all iron homeostasis markers, as well as with IDA is striking. That this association was seen in all three populations studied but not observed in previous GWAS of iron homeostasis is intriguing, however, it should be noted that Benyamin et al. 6 reported a subgenome-wide significant association with ferritin near this locus (rs16976620). Our study is significantly larger and also benefits from more comprehensive imputation panels made available since then, which likely enabled us to not only detect an association at genome-wide significance but also map this to the likely causal gene with high confidence.
The phenotype of recessive IDA with low iron stores that we report with the rare 3.5 kb deletion within SLC11A2 is different from the previously reported recessive hypochromic anemia with iron overload associated with this gene [55][56][57] . Further studies to define the pathways mediating the effects of the variants associating with IDA could help shed light on the pathophysiology of iron deficiency. Notably, neither any individual iron homeostasis variants nor the PRS for ferritin or TSAT associate with the risk of restless leg syndrome, a neurological disorder suggested being exacerbated by iron deficiency 70 . Although this argues against a simple causal relationship between the two, a more complex relationship, e.g., through brain iron concentations 71 cannot be ruled out. Even though hereditary hemochromatosis is most often associated with HFE p.Cys282Tyr homozygosity, the penetrance is only 28% in males and much lower in females 72  In summary, we have identified 46 novel loci affecting iron homeostasis. Many of the novel candidate genes have roles in homeostasis through mechanisms, such as absorption, iron recycling, erythropoiesis, and hepcidin regulation. Furthermore, we show an association of five of these loci with IDA, a major clinical entity that hitherto has not been studied thoroughly from a genetic point of view. This study reveals a substantial catalog of possible iron regulatory genes, awaiting further inquiry to fully elucidate their functional role.

Methods
Study subjects from Iceland. The Icelandic data (where around one-half of all individuals had repeated measurements) include the vast majority of all clinical laboratory results in Iceland from 1990 to 2017. Serum iron and TIBC were measured with colorimetric methods and serum ferritin was measured with an electrochemiluminescence immunoassay using reagents and calibrators and Cobas 6000 and 8000 modular instruments from Roche Diagnostics, Mannheim, Germany. Hemoglobin concentration measurements, as well as other basic hematology parameters used, were measured on EDTA anticoagulated blood using the Sysmex XN-1000 hematology analyzer.
All participants who donated samples gave informed consent and the National Bioethics Committee of Iceland approved the study (VSN-  which was conducted in agreement with conditions issued by the Data Protection Authority of Iceland. Personal identities of the participant's data and biological samples were Fig. 6 Expression levels of SLC11A2 for novel deletion mutant compared to wild type. A Coverage plot of RNA-sequenced reads from whole blood showing the median normalized expression at the 3′ UTR end of SLC11A2 in wild-type (AA, green; n = 12,828) heterozygous (ADel, blue; n = 251) and homozygous (DelDel, red; n = 2) individuals for the different SLC11A2 deletion genotypes. B Comparison of expression levels of the four major SLC11A2 transcripts (two transcripts without iron response elements (IRE) in their 3′ UTR (IRE-) and two with IRE (IRE+) in their 3′ UTR in whole blood RNA using a mixed violin-and boxplot. The violin plot shows the density where the width represents the frequency of the log2 normalized expression levels. The white boxes show the distribution statistics (interquartile range and median) and the whiskers represent ±1.5× the interquartile range. The filled circles correspond to individual expression values or outliers that lie beyond the extremes of the whiskers. encrypted by a third-party system (Identity Protection System), approved, and monitored by the Data Protection Authority.
Study subjects from the UK. The INTERVAL study is a prospective cohort study of approximately 45,000 blood donors, representative of the wider donor population, nested in a randomized control trial. Participants, aged 18 years or older, were recruited between 2012 and 2014 from 25 National Health Service Blood and Transplant static donor centers in England. All participants provided written, informed consent, and the study was approved by the Cambridge (East) Research Ethics Committee (ref: 11/EE/0538).
Ferritin measurement was based on the immunological agglutination principle with the enhancement of the reaction by latex. Latex particles coated with antiferritin antibodies agglutinated with ferritin and the precipitate was determined turbidimetrically at 570/800 nm. Serum iron was measured using a colorimetric method (FerroZine) without deproteinization. Under acidic conditions, iron was liberated from transferrin. Acrobat reduced Fe3+ to Fe2+ which then reacted with FerroZine to form a colored complex. The color intensity is directly proportional to the iron concentration and was measured photometrically. TIBC was calculated by summing up serum iron and unsaturated iron-binding capacity, which was also measured photometrically. TSAT was calculated by dividing serum iron by TIBC concentration. All data points lying more than 4.5 interquartile range from the median were considered outliers and removed (591 for ferritin, 7 for transferrin, 65 for TSAT, and 37 for serum iron).
The genotyping protocol and quality control procedures for INTERVAL study samples have been described in detail previously 73 . Briefly, DNA extracted from buffy coat was used to assay approximately 820,000 variants and short insertions/ deletions on the Affymetrix Axiom «Biobank» genotyping array (Affymetrix, Santa Clara, California, US). Genotyping was performed in multiple batches of approximately 4800 samples each. Sample QC was performed including exclusions for sex mismatches, low call rates, duplicate samples, extreme heterozygosity, and non-European descent. We carried out high-resolution multiple imputations using a joint UK10K and 1,000 Genomes Phase 3 (May 2013 release) reference panel and retained variants with a MAF ≥ 0.1% and/or INFO score ≥0.4 for analysis.
Study subjects from Denmark. The Danish Blood Donor Study (DBDS), initiated in 2010 as collaborative blood donor-oriented and generic research platform and is an on-going nation-wide prospective cohort with inclusion sites at all Danish blood collection facilities. Currently, more than 110,000 blood donors are participating, and more than 95% of invited blood donors are willing to participate 74 . Due to the step-wise roll-out of DBDS, an enrichment of individuals from the greater Copenhagen region (the capital) and the central region of Jutland (the second largest city) are present in this study. DBDS has secured necessary permissions and approval from the Danish Data Protection Agency (2007-58-0015) and the Scientific Ethical Committee system (M-20090237). Briefly, regarding the DBDS genomic cohort DNA is purified from whole blood and subsequently stored at −20 degrees Celsius. DBDs participants in this study has been genotyped in 1 batch at Decode genetics using the Global Screening Array by Illumina optimized for comparison with the Illumina Omni Express chip 75 . Ferritin was measured on fresh EDTA-anticoagulated plasma samples using two commercially available assays: for 30,903 individuals using Ortho Vitros 5600 (Ortho Clinical Diagnostics, Rochester, NY, USA), and for 2851 individuals using Abbott Architect i2000SR (Abbott Laboratories, Abbott Park, IL, USA), including 27 individuals that had measurements taken using both methods.
Whole-genome sequencing. The process used to whole-genome sequence the 28,075 Icelanders, as well as the subsequent imputation, has been described in recent publications 76,77 . In summary, we sequenced the whole genomes of 28,075 Icelanders using Illumina technology to a mean depth of at least 10× (median 32×). Single-nucleotide polymorphism (SNPs) and indels were identified and their genotypes called using joint calling with Graphtyper 78 . In total, 155,250 Icelanders were genotyped using Illumina SNP chips and their genotypes were phased using long-range phasing 79 . All sequenced individuals were also chip-typed and longrange phased, providing information about haplotype sharing that was subsequently used to improve genotype calls. Genotypes of the 32 million high-quality sequence variants were imputed into all chip-typed Icelanders. Variants in the Icelandic and Danish cohorts were imputed based on the IMPUTE HMM model 80 as previously described 81 . Variants in INTERVAL were imputed using the Sanger Imputation Server (https://imputation.sanger.ac.uk) which implements the Burrows-Wheeler transform imputation algorithm PBWT on whole chromosomes. A combined UK10K and the 1000 Genomes Phase 3 reference panel was used 82 . Using genealogic information, the sequence variants were also imputed into relatives of the chip-typed further increasing the sample size for association analysis and the power to detect associations. All of the variants tested had imputation information over 0.8. The GWAS from Denmark was performed using 19 million markers identified through whole-genome sequencing of 2816 Danes that were subsequently imputed into 84,386 chip-typed individuals. The GWAS from the UK was performed with 19 million markers from the UK10K and 1000 Genomes Phase 3 reference panel, imputed into 43,059 chip-typed individuals participating in the INTERVAL study. In total, 40 million markers were tested in the meta-analysis.
Association testing and meta-analysis. The four iron homeostasis biomarkers (ferritin, serum iron, TIBC, and TSAT) were each rank-based inverse normal transformed to a standard normal distribution (separately for each sex) and adjusted for age using a generalized additive model. In addition, for the UK cohort, the biomarkers were adjusted for menopausal status, ABO blood group, body mass index, smoking levels, alcohol levels, and iron supplementation status. For each sequence variant, the mixed model implemented in the software BOLT-LMM v2.3 83 , using the genotype as an additive covariate and the transformed quantitative trait as a response, was used to test for association with quantitative traits. Logistic regression was used to test for association between variants and case-control phenotypes, using software developed at deCODE genetics 76 .
We used LD score regression to account for distribution inflation in the dataset due to cryptic relatedness and population stratification 84 . LD score regression intercepts were as follows: ferritin: 1.032 (SE = 0.011), iron: 1.016 (SE = 0.025), TIBC: 1.030 (SE = 0.039), TSAT: 1.025 (SE = 0.020). We used logistic regression to test for association between sequence variants and binary traits, regressing trait status against expected genotype count. In the Icelandic data, we adjusted for sex, age, and county of birth by including these variables in the logistic regression model. In the UK and Danish data we adjusted for sex and age, as well as principal components in order to adjust for population stratification.
Results from the Icelandic, UK, and Danish datasets were combined using a fixed-effect inverse-variance weighted meta-analysis, where different datasets were allowed to have different population frequencies for alleles and genotypes but assumed to have a common effect. Heterogeneity in effect estimates was assessed using a likelihood ratio test. Effects are always given in units of SD. The pooled SD using data from Iceland, UK, and Denmark are 1.08 μg/L for log(ferritin), 7.76 μmol/L for iron, 14.14 μmol/L for TIBC, and 13.25% for TSAT.
We accounted for multiple testing by means of a weighted Bonferroni correction, taking into account the higher prior probability of association of certain variant annotations while controlling the family wise error rate (FWER) at 0.05 9 . The method has been described previously 9 and results in stricter multiple testing correction than the commonly used threshold of 5 × 10 −8 (which would not control FWER at 0.05 given that 40 million markers were tested) while being more powerful than simply correcting for 40 million tests using a fixed threshold of 0.05/ 40,000,000 = 1.25 × 10 −9 . The resulting significance thresholds were 2.0 × 10 −7 for high-impact variants (including stop-gained, frameshift, splice-acceptor, or splicedonor variants, N = 11,723), 4.0 × 10 −8 for "moderate-impact" variants (including missense, splice-region variants and in-frame indels, N = 202,336), 3.7 × 10 −9 for "low-impact" variants (including upstream and downstream variants, N = 2,896,354) and 6.1 × 10 −10 or for the "lowest-impact" variants (including intron and intergenic variants, N = 37,239,641).
Loci were defined based on physical proximity, where variants in a 500 kb window (lead variant ±250 kb) were considered to be at a single locus. We defined novel loci as loci not reported in previous GWAS of biomarkers of iron homeostasis.
Variant-to-gene mapping. To predict the most likely causal gene for each variant we used an algorithm taking into account the gene location with regard to LD class (defined as all variants with r 2 > 0.8 with the lead variant), the variant effect for coding variants, and the effect on gene expression (eQTL, restricting to the top cis-eQTL). The algorithm, called variant-to-gene mapping, considers all genes within the LD class ±250 kb and outputs a score for each gene.
Often, the GWAS variant is not causal itself but in LD with the causal variant. To identify the likely causal gene, we defined all variants in linkage disequilibrium (r 2 > 0.8) with the GWAS variant as the LD class. We assumed local effects, where genes overlapping the LD class interval receive a distance score of 5, while genes within 250 kb on each side of the LD class interval receive a distance score of 1. The variants in the LD class were then scored based on their capability to affect gene coding (i.e., transcription/translation): a variant with high impact (stop-gain and stop-loss, frameshift indel, donor and acceptor splice-site, and initiator codon variants) was given a coding score of 150, while a variant with moderate impact (missense, in-frame indel, splice region) was given a coding score of 30. For each gene, we summed up the coding scores of all coding variants affecting it, i.e., coding variants within the gene itself. Variants shown to be correlated with gene expression (eQTL) in any tissue received an eQTL score of 50. We restricted ourselves to the top cis-eQTL (lowest P value < 10 −7 , distance from gene < 1 Mb) for each gene and tissue. We assumed that eQTL in different tissues/different variants were due to the same signal. Therefore, we did not sum up the eQTL scores per gene but used the maximum eQTL score per gene. The total score per gene was computed as the maximum of its distance, coding, and eQTL scores. The normalized gene score was computed by scaling such that the sum of normalized scores for all candidate genes was 1, so to enable direct comparisons across genes. Note that this automatically takes the gene density into account. In cases where more than one gene share the maximum score (for example, if the LD class has four genes and they all have probability = 0.25), we chose the gene with the most significant eQTL if such information existed, otherwise the gene closest to the lead variant was selected. Relative values for the scoring for high-and moderate-impact values were based on enrichment analysis, as previously described 9 , while the score of 50 for eQTL was determined in order to make coding and eQTL equally informative overall. Values for proximity were set to have some degree of preference for closeby genes, given otherwise equal evidence, while at the same time giving stronger weight to coding and eQTL than to proximity alone. Data sources for eQTL data are listed in Supplementary Data 18.
Genetic overlap with other traits. We calculated genetic correlations between pairs of traits using the cross-trait LD score regression methods 84 in our metaanalysis using summary statistics from traits in the Icelandic and UK datasets. We used results for about 1.2 million variants, well imputed in both datasets and for LD information we used precomputed LD scores for European populations (available from the Broad Institute).
Heritability estimation. Heritability was estimated in the following two ways: (1) 2 × parent-offspring correlation, (2) 2 × full sibling correlation, using the Icelandic data (where all family relationships are known).
Polygenic risk scores. We generated PRS for ferritin and TSAT and regressed the scores, along with sex, year of birth, and 20 principal components as covariates in logistic regression models against 11 clinical manifestations of iron deficiency or iron overload (restless legs, hemochromatosis, liver fibrosis/cirrhosis, liver cancer, type 2 diabetes, osteoarthritis, impotence, cardiomyopathy, osteoporosis, hyperpigmentation, and amenorrhea). Scores are based on a framework set of 620,000 high-quality SNPs covering the whole genome, adjusted for LD using LDpred 85 . The methods used to generate the PRS have been previously described in detail 86 . For restless legs, the phenotype data is from Iceland while for the other ten phenotypes, data is from UK Biobank (restless legs were not available in UK Biobank). To minimize bias and/or overfitting, the geographical population with the phenotype data is not included when generating the scores. Thus, for ferritin, the PRS for restless legs is based on a Denmark+UK GWAS metaanalysis, while the PRS for the other phenotypes is based on Iceland + Denmark GWAS meta-analyses. For TSAT, the PRS for restless legs is based on the UK GWAS, while the PRS for the other phenotypes is based on the Icelandic GWAS.
Protein measurements (pQTL). During 2000-2019, we collected plasma samples from 40,004 Icelanders. Fifty-two percent of the samples were collected as part of the Icelandic Cancer Project, while the remaining samples (48%) were collected as part of various genetic programs at deCODE genetics, Reykjavík, Iceland. All samples were measured using the SOMAscan platform, containing 5284 aptamers providing a measurement of relative binding of the plasma sample to each of the aptamers in relative fluorescence units, corresponding to 4,792 proteins. After quality control, unique measurements for N = 35,559 individuals were used for genome-wide association analysis.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The Icelandic population WGS data have been deposited at the European Variant Archive under accession code PRJEB15197. The authors declare that the data supporting the findings of this study are available within the article, its Supplementary Data files, and upon request. Overall meta-analysis summary statistics have been shared at https://www. decode.com/summarydata/. The UK Biobank data can be obtained upon application (ukbiobank.ac.uk). For this study, UK-Biobank data was under project number 56270.