Introduction

Hereditary hemochromatosis (HH) is caused by mutations in genes involved in the regulation of systemic iron homeostasis.1 Dysregulation of iron homeostasis in HH leads to the absorption and storage of more iron than is required, resulting in tissue damage and disease. Five genes have been associated with various subtypes of the disorder.2 The most common gene implicated in HH is HFE (OMIM 235200); homozygosity for a single missense substitution (p.Cys282Tyr) is responsible for between 60 and 100% of cases among European populations.3,4 The p.Cys282Tyr mutation has an allele frequency of around 5% in the European population, with a decreasing gradient from north to south.5 It is thought to have originated in a Celtic population several thousand years ago, although a Viking origin for the mutation has also been proposed.5 The p.Cys282Tyr founder mutation has since spread throughout Europe, perhaps as a result of a selective advantage to those carrying the mutation, who may be protected from the development of iron-deficiency anemia.5 The p.His63Asp variant in HFE is more common than p.Cys282Tyr, having an allele frequency of around 14% among Europeans. It also has a much wider geographical distribution, being present at varying allele frequencies in most of the world’s populations.6 p.His63Asp is generally not considered to be a pathogenic mutation because the majority of people who are homozygous show no evidence of iron overload. Individuals who are compound heterozygous for p.Cys282Tyr and p.His63Asp can develop iron overload, and this genotype is enriched among individuals with a diagnosis of HH.7 However, the penetrance of this genotype is low, and other cofactors that contribute to iron overload are usually present.8

Much of the knowledge we have gained regarding the molecular mechanisms regulating iron homeostasis has stemmed from the identification and characterization of genes implicated in non-HFE forms of iron overload. Central to the regulation of systemic iron homeostasis is the hepcidin-ferroportin axis. Hepcidin is a negative regulator of iron absorption and recycling, and functions to limit iron release from cells by binding to the iron exporter ferroportin, causing its internalization and degradation.9

All forms of HH reported so far result from defective function or regulation of hepcidin or ferroportin. Iron overload in all autosomal-recessive forms of HH results from either inappropriately low or no production of hepcidin. For example, HFE is an upstream regulator of hepcidin, and patients with HFE HH have low concentrations of hepcidin relative to iron stores.10 One form of juvenile HH (type 2A; OMIM 602390) is caused by mutations in an upstream regulator of hepcidin, hemojuvelin (encoded by the HFE2 gene).11 A second form of juvenile HH (type 2B; OMIM 613313) is caused by mutations in the hepcidin gene (HAMP) itself.12 Patients with type 3 HH (OMIM 604250) have mutations in the gene encoding another regulator of hepcidin, transferrin receptor 2 (TFR2).13 An autosomal-dominant form of HH (type 4; OMIM 606069) is caused by heterozygous mutations in the ferroportin gene (SLC40A1) and can result in one of two phenotypes caused by either loss of iron transport ability or hepcidin insensitivity.14

The population prevalence of HFE HH and of the HFE variants p.Cys282Tyr and p.His63Asp has been studied extensively. As far as we are aware, the population prevalence of non-HFE forms of HH has not been determined. This is likely because of the low frequency of these conditions relative to HFE HH among European populations, the low frequency of HH among non-European populations, and the paucity of non-HFE mutations identified in the screening of control groups. With the advent of next-generation sequencing (NGS) methods that can sequence whole genomes or exomes relatively quickly and cost-effectively, it is now possible to search relatively large population groups for variants in virtually any gene in the human genome. We hypothesized that pathogenic alleles known to cause non-HFE forms of iron overload would be detected in large genomic sequence databases and that data derived from these databases could be used to estimate the population prevalence of these disorders. By analyzing data from the 1000 Genomes Project (1000G),15 Exome Sequencing Project (ESP; Exome Variant Server, NHLBI GO ESP, Seattle, WA; http://evs.gs.washington.edu/EVS/), and Exome Aggregation Consortium (ExAC; Cambridge, MA; http://exac.broadinstitute.org), we have attempted to estimate the population prevalence of all forms of HH. Our analysis has produced a first estimate for the population prevalence of the various forms of non-HFE HH; this study forms a valuable resource for informing the medical and research community on the incidence of these atypical iron overload disorders and on variants that are potential genetic modifiers of disease.

Materials and Methods

Identification and classification of HH-causing variants

An extensive literature review, which included a comprehensive search of the National Center for Biotechnology Information PubMed database (http://www.ncbi.nlm.nih.gov/pubmed) for articles published up until July 2015, was performed to identify all variants in the HFE (NM_000410.3), HFE2 (NM_213653.3), HAMP (NM_021175.2), TFR2 (NM_003227.3), and SLC40A1 (NM_014585.5) genes that are reported to be associated with HH. Variants were classified into those that seemed to be pathogenic and those that were possibly pathogenic. Variants were classified as pathogenic if they were reportedly associated with iron overload, defined by elevated serum ferritin and/or transferrin saturation, in patients or families with HH and in the homozygous or compound heterozygous state for HFE, HFE2, HAMP, or TFR2 or in the heterozygous state for SLC40A1. Variants were classified as possibly pathogenic if they were reported only in the heterozygous state for the HFE, HFE2, HAMP, or TFR2 genes, if there were conflicting reports of whether they were associated with iron overload, or if they were reported only as modifiers of the HFE HH iron overload phenotype. For example, the HFE p.Cys282Tyr mutation was classified as pathogenic, whereas other common variants, such as HFE p.His63Asp, HFE p.Ser65Cys, and SLC40A1 p.Gln248His, for which penetrance is known to be low, were classified as possibly pathogenic. Many of the variants have been reported only in single cases or families, and pathogenicity in these cases was assigned based on the phenotypic data available and the opinions of the study authors.

Identification of major functional variants of HH genes

The 1000G, ESP, and ExAC databases were searched for the presence of variants in the HFE, HFE2, HAMP, TFR2, and SLC40A1 genes that would be expected to have major functional or structural effects on the encoded protein but had not previously been reported in patients with HH. These included frameshift, premature stop codon, initiator codon, splice donor, and splice acceptor variants.
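The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the consequence labels, and the function name are hypothetical and are not the vocabulary of any particular annotation tool or the procedure the study actually ran.

```python
# Consequence classes named in the text. The label strings are assumed for
# illustration and do not come from a specific annotator.
MAJOR_FUNCTIONAL = {
    "frameshift",
    "stop_gained",
    "initiator_codon",
    "splice_donor",
    "splice_acceptor",
}
HH_GENES = {"HFE", "HFE2", "HAMP", "TFR2", "SLC40A1"}


def select_major_functional(variants, reported_ids):
    """Keep HH-gene variants with a major functional consequence
    that have not already been reported in patients with HH."""
    return [
        v
        for v in variants
        if v["gene"] in HH_GENES
        and v["consequence"] in MAJOR_FUNCTIONAL
        and v["id"] not in reported_ids
    ]


# Hypothetical records, not real database entries.
example = [
    {"id": "var1", "gene": "HFE2", "consequence": "frameshift"},
    {"id": "var2", "gene": "HFE", "consequence": "missense"},
    {"id": "var3", "gene": "HAMP", "consequence": "stop_gained"},
    {"id": "var4", "gene": "BRCA1", "consequence": "frameshift"},
]
selected = select_major_functional(example, reported_ids={"var3"})
```

Here only `var1` passes all three conditions: `var2` is a missense change, `var3` is already reported, and `var4` lies outside the five HH genes.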

Determination of HH gene variant allele frequencies

The coding sequence nucleotide changes for each variant were identified and converted to human genome coordinates (hg19) using the Mutalyzer program (https://mutalyzer.nl/). The genome coordinates and nucleotide changes were converted into ANNOVAR format and run through the wANNOVAR program16 to generate annotations for each variant, including allele frequencies in the 1000G, ESP, and ExAC data sets. Allele frequencies for previously reported pathogenic variants and major functional variants were used to estimate the population prevalence of homozygous and heterozygous HH genotypes using the Hardy-Weinberg equation. When more than five alleles were detected in the ExAC data set, a binomial proportion 95% confidence interval (CI) was calculated using the Wilson score method.
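The two calculations named above, Hardy-Weinberg genotype frequencies and the Wilson score interval for a binomial proportion, can be sketched as follows. The function names and the example counts are assumptions made for illustration; the formulas are the standard ones rather than a reproduction of the study's own scripts.

```python
import math


def hardy_weinberg(q):
    """Return (wild-type homozygote, heterozygote, variant homozygote)
    genotype frequencies for a variant allele frequency q."""
    p = 1.0 - q
    return p * p, 2.0 * p * q, q * q


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion.
    The default z corresponds to a two-sided 95% interval."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1.0 + z**2 / n
    centre = (p_hat + z**2 / (2.0 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z**2 / (4.0 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts: 44 variant alleles among 120,000 sequenced chromosomes.
low, high = wilson_interval(successes=44, n=120_000)
_, het_freq, hom_freq = hardy_weinberg(44 / 120_000)
```

The Wilson interval is better behaved than the simple normal approximation when the proportion is close to zero, which is the situation for rare alleles.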

Results

Identification of HH-causing variants

An extensive literature review was conducted to identify pathogenic and possibly pathogenic variants in the five genes associated with HH. This process yielded a total of 215 variants across the 5 HH genes: 38 in HFE, 55 in HFE2, 13 in HAMP, 49 in TFR2, and 60 in SLC40A1; these were reported in 153 publications (Supplementary Table S1 online). Of the 215 variants identified, 161 were classified as pathogenic and 54 were classified as possibly pathogenic.

Frequency of previously reported HH gene pathogenic variants

The frequency of these variants in the 1000G, ESP, and ExAC data sets was determined using wANNOVAR.16 Of the 161 pathogenic variants tested, 43 (27%) were present in at least one of the 1000G, ESP, or ExAC data sets (Table 1). All 5 HH genes were represented among these 43 variants: 12 in HFE, 13 in HFE2, 1 in HAMP, 9 in TFR2, and 8 in SLC40A1. As expected, the most common pathogenic HH variant present in all data sets was the HFE p.Cys282Tyr variant, which was present with allele frequencies of 1.3, 4.8, and 3.24% in the 1000G, ESP, and ExAC data sets, respectively (Table 1). When broken down into subpopulations, the frequency of HFE p.Cys282Tyr was highest in the European-American cohort of ESP (6.4%), the non-Finnish European cohort of ExAC (5.14%), and the European cohort of 1000G (4.3%) (Supplementary Table S2 online). The allele frequencies for HFE p.Cys282Tyr are in good agreement with those observed in other population studies of European individuals.5 The HFE p.Cys282Tyr mutation also was present with lower and varied allele frequencies among other subpopulations (Supplementary Table S2 online); it was lowest among East Asians (0% in 1000G; 0.01% in ExAC). Of the 43 pathogenic variants identified, nearly all (n = 41) were present in the larger ExAC data set, 10 in ESP, and 7 in 1000G; only 3 were present in all 3 data sets (Supplementary Table S2 online). All pathogenic variants other than HFE p.Cys282Tyr had low allele frequencies relative to p.Cys282Tyr. For each of the five HH genes, we summed the allele frequencies for all previously reported pathogenic variants to determine the combined allele frequencies of all pathogenic variants for each gene and in each data set (Table 1) and subpopulation (Supplementary Table S2 online).

Table 1 Frequencies of previously reported hereditary hemochromatosis gene pathogenic variants in the 1000 Genomes Project, Exome Sequencing Project, and Exome Aggregation Consortium data sets

Frequency of unreported HH gene major functional variants

In addition to variants that have been previously associated with HH, we identified variants that would be predicted to have major functional or structural effects on the encoded proteins and are therefore likely pathogenic but not previously reported among patients with HH. These included frameshift, premature stop codon, initiator codon, splice donor, and splice acceptor variants. We identified 40 such variants (Table 2 and Supplementary Table S3 online): 37 were present in the ExAC data set and 5 in the ESP data set; 2 of these were present in both the ExAC and ESP data sets. None were found in the 1000G data set.

Table 2 Frequencies of unreported hereditary hemochromatosis gene major functional variants in the Exome Sequencing Project and Exome Aggregation Consortium data sets

Predicted frequency of pathogenic HH genotypes

Using the pathogenic allele frequencies from the previously reported HH variants (Table 1 and Supplementary Table S2 online), we estimated the heterozygous and homozygous pathogenic genotype frequencies for each gene using the Hardy-Weinberg equation (Table 3). The affected genotype carrier rates were calculated from the homozygous genotype frequencies for the HFE, HFE2, HAMP, and TFR2 genes and from the heterozygous genotype frequencies for SLC40A1; these were expressed as 1 per “n” of the population that would be predicted to carry HH pathogenic genotypes (Table 3). This analysis was performed for the total 1000G, ESP, and ExAC data sets (Table 3), and also for the different subpopulations of the 1000G (Supplementary Table S4 online), ESP (Supplementary Table S5 online), and ExAC (Supplementary Table S6 online) data sets. The same analysis was performed by combining the allele frequencies derived from the previously reported pathogenic HH variants (Table 1 and Supplementary Table S2 online) with those derived from the unreported major functional variants (Table 2 and Supplementary Table S3 online). The major functional variants are highly likely to alter protein structure and function and hence to be pathogenic; they were included in the analysis in an attempt to obtain a more accurate picture of the population prevalence of HH genotypes. The heterozygous and homozygous genotype frequencies and the affected genotype carrier rates for the combined analysis are presented in Table 4 and also are broken down into ESP and ExAC subpopulations (Supplementary Tables S7 and S8 online).
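As a rough illustration of how a summed pathogenic allele frequency translates into a "1 per n" carrier rate, the sketch below treats all pathogenic alleles of a gene as a single allele class, so the recessive case counts true homozygotes and compound heterozygotes together. The function name and the `mode` argument are hypothetical, and the study's exact rounding is not reproduced.

```python
import math


def one_in_n(allele_freqs, mode):
    """Convert per-variant pathogenic allele frequencies for one gene into
    an affected-genotype carrier rate expressed as 1 in n.

    mode="recessive": affected genotype frequency is q**2
    mode="dominant":  affected genotype frequency is 2*p*q (heterozygotes)
    """
    q = sum(allele_freqs)
    if mode == "recessive":
        genotype_freq = q * q
    elif mode == "dominant":
        genotype_freq = 2.0 * q * (1.0 - q)
    else:
        raise ValueError("mode must be 'recessive' or 'dominant'")
    return math.inf if genotype_freq == 0 else 1.0 / genotype_freq


# p.Cys282Tyr allele frequency in ExAC non-Finnish Europeans, from the text.
c282y_nfe = one_in_n([0.0514], mode="recessive")
```

With the 5.14% allele frequency quoted for non-Finnish Europeans, this gives a homozygote rate of roughly 1 in 380.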

Table 3 Previously reported hereditary hemochromatosis (HH) gene pathogenic variant heterozygote frequency, homozygote frequency, and predicted HH genotype carrier rates in the 1000 Genomes Project, Exome Sequencing Project, and Exome Aggregation Consortium data sets
Table 4 Hereditary hemochromatosis (HH) gene combined previously reported pathogenic and major functional variant heterozygote frequency, homozygote frequency, and predicted HH genotype carrier rates in the Exome Sequencing Project and Exome Aggregation Consortium data sets

Because the ExAC data set is significantly larger than the ESP and 1000G data sets (>60,000 unrelated individuals compared with >6,500 and >2,500, respectively) and also includes data from ESP and 1000G, we present more detailed results only for the ExAC data set in the following sections.

Predicted frequency of HFE pathogenic genotypes

Our analysis revealed that, based on the ExAC cohort, 1 of 927 (95% CI: 860–970) individuals would be predicted to carry a pathogenic genotype that could cause HFE HH (Table 4). We also broke down the HFE pathogenic genotypes into the common p.Cys282Tyr homozygous genotype and pathogenic genotypes that do not carry the p.Cys282Tyr variant. As expected, the p.Cys282Tyr homozygous genotype was vastly more frequent than non-p.Cys282Tyr pathogenic genotypes, with carrier rates of 1 in 953 (95% CI: 896–1,008) and 1 in 4,991,817 (95% CI: 2,759,351–7,846,276), respectively (Table 4). However, this does not take into account compound heterozygous genotypes containing p.Cys282Tyr and another pathogenic HFE variant. Based on the genotype frequencies of the combined HFE pathogenic genotypes and the p.Cys282Tyr homozygous genotype, we can predict that approximately 1 of every 37 patients with HFE HH (around 1 in 34,000 of the population) will be compound heterozygous for p.Cys282Tyr and another pathogenic HFE variant. The same analysis was performed on the different ethnic subpopulations of ExAC (Supplementary Table S8 online). This analysis revealed that the highest prevalence of HFE pathogenic genotypes is in the non-Finnish European population, with a predicted pathogenic genotype carrier rate of 1 in 373. This is largely made up of p.Cys282Tyr homozygotes (1 in 379). The lowest prevalence of HFE pathogenic genotypes was observed in the East Asian population (pathogenic genotype carrier rate of 1 in 286,530) and, unlike all other populations, HFE HH genotypes were largely made up of non-p.Cys282Tyr variants (1 in 319,857), with only 1 in 100,000,000 predicted to be p.Cys282Tyr homozygous and only 1 in 10 HFE pathogenic genotypes predicted to contain a p.Cys282Tyr allele.
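The compound heterozygote figures can be checked with simple arithmetic on the three ExAC carrier rates quoted above. The sketch assumes that the combined HFE genotype frequency is the sum of three disjoint classes (p.Cys282Tyr homozygotes, genotypes without p.Cys282Tyr, and p.Cys282Tyr compound heterozygotes); this is one plausible reading of the calculation, not a confirmed description of it.

```python
# Carrier rates taken from the text (ExAC, overall).
all_hfe = 1 / 927            # any HFE pathogenic genotype
c282y_hom = 1 / 953          # p.Cys282Tyr homozygotes
non_c282y = 1 / 4_991_817    # pathogenic genotypes without p.Cys282Tyr

# Assumption: the remainder is p.Cys282Tyr plus another pathogenic HFE allele.
compound_het = all_hfe - c282y_hom - non_c282y

population_one_in = 1 / compound_het      # rate in the general population
patients_one_in = all_hfe / compound_het  # rate among predicted HFE HH genotypes
```

Under this assumption the remainder works out to about 1 in 34,000 of the population and about 1 in 37 of the predicted HFE HH genotypes, in line with the values stated in the text.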

Predicted frequency of non-HFE pathogenic genotypes

Analysis of variants of the genes causing recessive forms of HH revealed that pathogenic variants causing these forms of iron overload are rare (Table 4). Based on the ExAC data set, homozygous HFE2 pathogenic genotypes were predicted to be present in 1 in 4,791,406 (95% CI: 3,121,527–9,238,643) of the population, with the highest prevalence predicted in the South Asian population (1 in 1,611,664; Supplementary Table S8 online). Homozygous TFR2 pathogenic genotypes have a predicted carrier rate of approximately 1 in 6,164,757 (95% CI: 3,429,355–12,846,700) within the overall ExAC data set (Table 4) and are most frequent among the non-Finnish European population (1 in 753,395; Supplementary Table S8 online). Homozygous HAMP pathogenic genotypes were predicted to be very rare based on the ExAC data set (1 in 181,700,486; 95% CI: 43,282,548–493,827,160). This very low prevalence is likely due to only one previously reported pathogenic allele (p.Arg75*) being observed on two occasions among the entire ExAC cohort of >60,000. One of these was in the East Asian population and one in the non-Finnish European population. Four additional predicted functional variants also were observed (Table 2 and Supplementary Table S3 online), with the highest frequency among the American population (1 in 4,680,732; Supplementary Table S8 online).

Analysis of variants in the SLC40A1 gene, causing autosomal-dominant ferroportin disease, revealed allele frequencies for eight previously reported pathogenic variants and three additional major functional variants. The combined pathogenic allele frequency was 0.0364% (95% CI: 0.0313–0.0544%) for the overall ExAC data set, giving a predicted pathogenic genotype carrier rate of 1 in 1,373 (95% CI: 920–1,598). This figure approaches the frequency of HFE HH and is largely due to relatively high allele frequencies for two SLC40A1 variants (p.Asp270Val and p.Arg371Trp) in the African population. The predicted SLC40A1 pathogenic genotype carrier rate of these two variants is 1 in 197 among the African population.
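For a dominant disorder, the interval on the allele frequency maps directly onto an interval on the carrier rate, with the bounds swapping places because a higher allele frequency means a smaller "1 in n". The sketch below applies the heterozygote frequency 2pq to the combined SLC40A1 allele frequency and its confidence limits as quoted above; the function name is hypothetical, and small differences from the published figures are expected because the inputs are rounded.

```python
def dominant_one_in(q):
    """Carrier rate (1 in n) for heterozygotes at allele frequency q."""
    return 1.0 / (2.0 * q * (1.0 - q))


# Combined SLC40A1 pathogenic allele frequency and 95% limits from the text,
# written as fractions rather than percentages.
q, q_low, q_high = 0.000364, 0.000313, 0.000544

point = dominant_one_in(q)
# The upper frequency limit gives the lower "1 in n" bound, and vice versa.
interval = (dominant_one_in(q_high), dominant_one_in(q_low))
```

The result is approximately 1 in 1,374 with an interval of about 920 to 1,598, matching the reported carrier rate to within rounding.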

Discussion

The high prevalence of HH among European populations is almost entirely due to the high frequency of the HFE p.Cys282Tyr mutation. Many studies have determined the population prevalence of this mutation, including large population screening studies such as the North American Hemochromatosis and Iron Overload Screening (HEIRS) study17 and the Melbourne HealthIron study.18 Southern European populations have lower carrier rates for p.Cys282Tyr,19,20 and the proportion of HH that is related to non-HFE genes is higher in southern Europe and other parts of the world where the p.Cys282Tyr mutation is less common.20 In Asian countries the majority of HH cases are related to non-HFE genes.21 Despite the wealth of publications on the various causes of non-HFE HH, to our knowledge no systematic study has attempted to estimate the prevalence of these atypical forms of iron overload. The frequency of individual non-HFE mutations among the general population has only been estimated to be rare; no other quantitative measure of their frequency is available.

The 1000G,15 ESP, and ExAC data sets provide a huge resource of genomic information that can be mined for a wide variety of applications. We have taken advantage of these data sets to obtain a first estimate of the prevalence of HFE and non-HFE disease-causing variants. By far the most common pathogenic variant was HFE p.Cys282Tyr, which is present at the highest frequency among non-Finnish Europeans (allele frequency 5.14%), similar to that reported in various European populations (allele frequencies ranging from 0 to 14%).5,17,22 An additional 42 HH-causing variants were identified among the three data sets, and 40 unreported variants that are predicted to have major functional or structural effects on the encoded proteins were also used in our analysis. Using the allele frequencies of these variants, we could determine that all the recessively inherited forms of non-HFE HH are predicted to be rare. Pathogenic HFE2 and TFR2 variants were predicted to cause iron overload in approximately 1 in 5 to 6 million people. Pathogenic HAMP variants were even rarer. Given that heterozygosity for mutations in SLC40A1 leads to ferroportin disease, we were surprised to find relatively high frequencies of SLC40A1 variants classified as pathogenic. These variants were found at the highest frequencies among African populations (0.25%) but were also present in the American (0.039%), East Asian (0.033%), and non-Finnish European (0.03%) populations. Two SLC40A1 variants, p.Asp270Val and p.Arg371Trp, alone accounted for the high allele frequency among the African populations. The SLC40A1 p.Asp270Val variant was originally reported in a 27-year-old black South African female with iron overload.23 The p.Asp270Val variant was also found in an African-American male with mild iron overload and coexistent hepatitis C virus infection.24 He also had stainable iron in hepatocytes and Kupffer cells, a pattern consistent with ferroportin disease.
In this study the p.Asp270Val variant also was detected in one of 258 African-American controls; however, no clinical data from this individual were available.24 One study determined that the p.Asp270Val variant had no effect on the hepcidin sensitivity of ferroportin; however, its effect on iron transport was not measured.25 The p.Arg371Trp variant was described as part of a series of new SLC40A1 variants in a family with iron overload.26 In the same publication, another variant affecting the same amino acid (p.Arg371Gln), similarly detected at a low frequency in the ExAC non-Finnish European population, was also reportedly associated with iron overload.26 The observations linking the SLC40A1 p.Asp270Val and p.Arg371Trp variants to iron overload and their relatively high allele frequencies point to a potential role for these variants as a cause of iron overload in African populations. However, further functional studies or larger population studies are required to unequivocally link these variants to impaired ferroportin function and iron overload in individuals carrying these variants. The other six SLC40A1 variants identified in the ExAC data set (c.-59_-45del, p.Ile180Thr, p.Gly204Ser, p.Gly267Asp, p.Arg371Gln, and p.Gly468Ser) have all been associated with iron overload in patients and/or families,26,27,28,29,30,31,32,33 although ferroportin containing the p.Ile180Thr variant seemed to behave similarly to the wild type when analyzed in functional assays.30 The SLC40A1 p.Val162del variant has been reported as a cause of ferroportin disease more frequently (in nine publications) than any other SLC40A1 mutation (Supplementary Table S1 online); however, we did not detect it in any of the genomic sequence databases. 
It is widely believed that this 3-bp deletion has occurred multiple times in various populations, and there is now evidence for this, with the identification of a de novo p.Val162del variant occurring in an isolated case of ferroportin disease.34 These data suggest that p.Val162del is more frequently identified than other SLC40A1 mutations because it has occurred multiple times in isolated populations rather than occurring once and spreading to different populations, and these factors may explain why we did not detect it in this study.

We used freely available NGS data in an attempt to estimate the population prevalence of non-HFE HH. While we arrived at estimates for the prevalence of non-HFE HH using three data sets, there are some limitations to this study. (i) The size of the populations studied may be enough to determine the prevalence of more common inherited disorders but may be less accurate for rarer inherited disorders. We believe the larger ExAC data set is likely to provide more accurate estimates. (ii) The geographic and ethnic origins of the populations studied are diverse but are still not fully representative of the world’s population because they are biased toward populations with a European background. It is possible that geographic and ethnic differences in the prevalence of non-HFE HH have not been captured by our analysis. For example, non-HFE HH seems to be more prevalent in southern Europe compared with northern Europe.20 (iii) Our analysis relied on variant data collated from published reports, together with the addition of variants that would be expected to have major functional effects on the encoded proteins; it is possible that other HH-causing variants have been either not identified or not reported in the literature and hence would not have been captured in our study. (iv) The accuracy of the publicly available NGS data is another limitation.
While on the whole the variant calls for the data sets are likely to be highly accurate, the depth of coverage across samples could be variable, and it has been shown that not all variant calls across NGS platforms can be validated using more traditional approaches such as Sanger sequencing.35 (v) Our study does not take into account potential differences in the penetrance of variants or the possibility that iron overload can be caused by digenic inheritance of variants in HFE and other non-HFE genes.36 For example, the p.His63Asp and p.Ser65Cys variants in HFE are known to contribute to some cases of iron overload but have a much lower penetrance than p.Cys282Tyr.7,8 We decided to exclude these variants from our analysis so as not to overinflate population prevalence estimates.

The method we used to estimate the prevalence of non-HFE HH is very straightforward and could also be applied to estimate the prevalence of a wide variety of other inherited disorders. Indeed, similar methods have been recently used to determine the prevalence of McArdle disease.37 However, the composition of the population data sets should be taken into account when applying this technique to other diseases. While the 1000G data set represents control individuals from geographically and ethnically diverse populations, the larger ESP and ExAC data sets are made up predominantly of European patient and control cohorts that have been used to study a variety of diseases. Hence it would not be advisable to use these data to study the prevalence of variants that may increase the risk of diseases included in these data sets, such as diabetes and heart and lung disorders. For example, the ESP data set includes a cohort of patients that has been used to study modifier genes in cystic fibrosis.38 Hence, CFTR pathogenic variants are over-represented in the ESP data set. We believe that it is unlikely that the makeup of the ESP and ExAC data sets has greatly affected our estimation of the prevalence of HH. While iron accumulation in the pancreas can cause diabetes and in the heart can lead to cardiomyopathy, there is little evidence that HH variants increase the risk of these diseases among the general population.39,40 In support of this, the prevalence of the HFE p.Cys282Tyr variant within the ESP data set is very similar to that observed in large population studies from North America,17,22 indicating that there is no enrichment of this mutation.

The current explosion in NGS has the potential to greatly increase the available data from genomic sequencing projects. In the future, variant data are likely to be available from a larger number of human genomes and from more ethnically diverse populations, allowing more reliable and accurate measures of variant frequency and predicted population prevalence of non-HFE HH and a wide variety of other inherited diseases.

Disclosure

The authors declare no conflict of interest.