Introduction

Hyperphenylalaninaemia (HPA) is a group of inherited disorders in which plasma phenylalanine concentrations are abnormally elevated, giving a clinical presentation involving abnormal development of the central nervous system with cognitive impairment in untreated children.1 Phenylketonuria (PKU; MIM No. 261600), the most severe form of HPA, is caused by deficient activity of the hepatic enzyme phenylalanine hydroxylase (PAH) which results from mutation at the PAH locus on chromosome 12q23.1.2,3 To date, more than 400 mutations in the PAH gene have been identified in human populations and catalogued in PAHdb [http://data.mch.mcgill.ca/pahdb_new/], a public-domain database.4,5 The PAH locus also contains a number of neutral polymorphisms useful in defining haplotypes, including seven Single Nucleotide Polymorphisms (SNPs), a Variable Number Tandem Repeat (VNTR) motif and a Short Tandem Repeat (STR) polymorphism.2,6,7,8,9 PAH haplotypes, and their associations with specific PKU mutations, have been useful in studying the population genetics of the disorder and in probing the origins of mutations.10,11,12,13,14,15

The genetic basis of PKU and HPA in the Irish population has been investigated in studies which looked separately at the Republic of Ireland and Northern Ireland.16,17,18 These studies reported that three mutations (R408W, F39L and I65T) accounted for over 50% of mutant alleles, although at different individual relative frequencies. The predominant haplotype associated with R408W in both populations16,17 was haplotype 1.8, that is, SNP haplotype 1 with the 8-copy VNTR allele according to the revised PAH haplotype nomenclature (R408W-1.8).14 R408W-1.8 exhibits an east-to-west cline of increasing relative frequency across north-western Europe, peaking in Ireland.19 We undertook the present study with the objectives of reaching full ascertainment of HPA mutations in the population of the Republic of Ireland, and of investigating regional differences in the HPA mutation spectrum within the island of Ireland, looking in greater depth at the influence of neighbouring populations on the Irish gene-pool.

Patients, materials and methods

Patients (n=386) were recruited to the study through the National Centre for Inherited Metabolic Disorders and the National Newborn Screening Programme at The Children's Hospital (Temple Street, Dublin 1). Initial diagnosis of hyperphenylalaninaemia was made on neonatal screening using the bacterial inhibition assay20 with confirmation by quantitative analysis of serum amino-acids. Among the 386 patients were 107 sibling pairs and, since only one member of each sib-pair was used for genotypic analysis, 279 unrelated cases in total were available for study. Patient samples were obtained as dried blood spots on Guthrie cards. Small nuclear families (n=24) consisting of affected individuals, siblings and parents were recruited to the study for PAH haplotype determination. Blood samples were obtained with informed consent from parents by venepuncture and as dried blood spots from children. Anonymised control samples (n=50) were obtained, again as dried blood spots, from the National Newborn Screening Programme. Ethical approval for the study was obtained from the Ethics Committee of The Children's Hospital.

Exon sequences of the PAH locus were scanned for the presence of mutation by denaturing gradient gel electrophoresis (DGGE) using primer sequences and conditions described by Guldberg et al.21 Where altered electrophoretic mobility of amplicons was observed, the relevant exons were screened for known mutations by PCR–RFLP assays and/or DNA sequencing. The PCR–RFLP assays used either naturally occurring restriction sites or sites created during PCR amplification (PAHdb). DNA cycle sequencing22 was performed using the Big Dye Terminator cycle sequencing kit (Perkin Elmer) and sequencing products were resolved on an ABI 310 automated fluorescent DNA sequencer (Perkin Elmer Applied Biosystems). Mutations and polymorphisms detected in the PAH gene were identified by comparison with the reference sequence and genotypic data archived in PAHdb.5

The distribution of mutant alleles within Ireland was investigated using data on the county of residence of the birth mother of each of the cases studied. Mutant allele counts were summed for the four provinces of Ireland (Connacht, Leinster, Munster and Ulster). Relative allele frequencies for the province of Ulster could only be assessed in this study for three of its nine counties (Cavan, Donegal and Monaghan) since the remaining six lie within the United Kingdom (Northern Ireland). In order to produce an estimate of relative allele frequencies within Ulster as a whole, allele counts for the Ulster counties in the Republic (this study) and in Northern Ireland17 were summed. Mutant allele frequencies were presented either as relative frequencies or as absolute frequencies to account for variation in disease incidence between populations. Absolute frequency values were calculated as the product of the square root of the disease incidence and the relative frequency of the mutation on the assumption of Hardy-Weinberg equilibrium.23 Statistical tests on the frequency distributions of mutant genotypes in the four Irish provinces were performed using the SPSS software (SPSS Inc., USA). Spatial autocorrelation analysis24 of the geographic distribution of R408W-1.8 absolute mutant allele frequencies in north-western Europe was performed by means of the SAAP software package, which measures the average level of genetic similarity between populations in particular geographic distance classes, expressed in a correlogram of Moran's autocorrelation coefficient plotted as a function of distance. 
Different migration processes are reflected in correlograms of different shapes.25 The overall significance of correlograms was assessed by the Bonferroni criterion.26 Admixture calculations were carried out using a modification of the Bernstein model27,28 which allows estimation of the relative contributions of two parental populations to a hybrid group assuming that the absolute allele frequencies are known. This is not the case, however, when we treat Ulster as a hybrid population formed between Ireland (excluding Ulster) and Scotland. The overall allele frequency of PKU mutants in Ulster, necessary to estimate the absolute frequencies from their relative frequencies, is unknown, so we introduced a modification of the Bernstein model that allows the simultaneous estimation of the overall allele frequency of mutant alleles in the hybrid population and the admixture coefficients (see Appendix).
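The absolute-frequency calculation described above can be sketched in a few lines. Under Hardy-Weinberg equilibrium the incidence of a recessive disorder equals the square of the overall mutant allele frequency, so the square root of the incidence gives that overall frequency, and multiplying by a mutation's relative frequency gives its absolute frequency. The function name and the incidence value below are illustrative assumptions and are not taken from the study; 0.410 is the relative frequency of R408W reported in the Results.

```python
import math


def absolute_frequency(incidence, relative_frequency):
    """Return sqrt(incidence) * relative_frequency (Hardy-Weinberg assumed)."""
    if not 0 < incidence <= 1:
        raise ValueError("incidence must lie in (0, 1]")
    if not 0 <= relative_frequency <= 1:
        raise ValueError("relative_frequency must lie in [0, 1]")
    # sqrt(incidence) is the summed frequency of all mutant alleles.
    return math.sqrt(incidence) * relative_frequency


# Hypothetical incidence of 1 in 10,000 births, used only for illustration.
example = absolute_frequency(1 / 10_000, 0.410)
print(f"absolute frequency: {example:.5f}")
```

Expressing frequencies on this absolute scale is what allows populations with different disease incidences to be compared directly.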

To confirm that mutation-haplotype associations for the predominant mutations were as reported previously, PAH haplotypes were analysed in a panel of small nuclear families and in a group of patients homozygous for the most common mutations. Five single nucleotide polymorphisms (SNPs) were analysed by means of PCR–RFLP assays, namely BglII29, PvuII(a)30, PvuII(b) (Eisensmith, personal communication), MspI31 and XmnI.32 The variable number tandem repeat (VNTR) polymorphism (formerly scored as a triallelic HindIII polymorphism) was detected by PCR amplification with flanking primers followed by visualisation and size determination of alleles via polyacrylamide gel electrophoresis.8 Haplotypes were assigned according to the nomenclature of Eisensmith and Woo.14 Minihaplotypes17 associated with specific mutant PAH alleles were determined by genotypic analysis of the VNTR marker, together with the short tandem repeat (STR) marker described by Goltsov et al.,9 in homozygous patients. STR genotypes were determined by PCR amplification and allele sizing by electrophoresis on the ABI 310 automated sequencer.9,33

Results

DNA samples from 279 unrelated hyperphenylalaninaemia cases, equivalent to 558 PAH alleles, were screened by DGGE for the presence of mutation in the 13 exons of the PAH gene. Electrophoretic mobility shifts, indicative of base sequence alteration, were detected for all except 24 (4.3%) alleles. No mobility shifts were detected in exons 1 or 4. Exons exhibiting shifts were then screened for known mutations by means of PCR–RFLP assays. Following restriction analysis, exons with mobility shifts that had not been characterised were analysed by DNA sequencing. For 21 alleles (3.8%), mobility shifts localised to exons 6 (n=1), 7 (n=11), 9 (n=2) and 11 (n=7) could not be sequenced because of poor DNA template quality. Mutations were thus characterised for 513 of the 558 alleles (92.0%), identifying 29 distinct mutations (Table 1). Four polymorphisms/silent substitutions were detected at relative frequencies of up to 2.8%, namely IVS2nt19, Q232Q, V245V and L385L (data not shown). The mutations were distributed across the PAH coding sequence with 34.5% (10 of 29) occurring in exon 7. The majority were missense mutations (24 of 29, 82.8%) with four splice-site mutations and one nonsense mutation (R243X). Nine mutations occurred at CpG dinucleotide motifs (Table 1). In all, 239 of the 279 cases (85.7%) were fully genotyped, 66 (27.6%) having homoallelic genotypes and the remaining 173 (72.4%) having heteroallelic genotypes. A substantial proportion of mutant alleles (63.6%) were accounted for by the mutations R408W (41.0%), F39L (12.2%) and I65T (10.4%). All other mutant alleles were present at relative frequencies of 6% or less. No novel mutations were detected. PAH mutation-haplotype associations were confirmed by haplotype and minihaplotype analysis. Combined SNP and VNTR haplotypes were determined for the predominant mutations in a group of 22 affected individuals in small nuclear families and minihaplotype associations in a group of 38 homozygous patients (Table 2). 
Both R408W and F39L were found in association with haplotype 1.8 while I65T was primarily associated with haplotype 9.8 (seven of eight alleles; 87.5%). The predominant minihaplotype association for R408W was with the 8:242 minihaplotype (49 of 60 alleles, 81.7%). F39L was found in association with three minihaplotypes (8:238, 8:242 and 8:246) while I65T was predominantly associated with the 8:246 minihaplotype (five of eight alleles, 62.5%).

Table 1 Mutation spectrum of hyperphenylalaninaemia in the Southern Irish population
Table 2 Associations with PAH haplotypes and minihaplotypes observed for the three most common mutations in the Southern Irish population

The geographic distribution of mutant alleles within the island of Ireland was investigated using a combined data-set produced by merging the data generated in this study with extant data for Northern Ireland (Table 3).17 Significant differences in frequency distributions of the eight most common mutant alleles were observed between the four provinces (P<0.0001). Based upon this analysis, R408W was found to occur with the highest relative frequency (55.8%) in Connacht, the most westerly province. Given that R408W is associated with haplotype 1.8 in Ireland, these data can be added to those previously reported by Eisensmith et al.19 to show that the cline of R408W-1.8 relative frequency across north-western Europe peaks in Connacht (Table 3, Figure 1A). Spatial autocorrelation analysis was used to test the significance of this cline, analysing R408W-1.8 absolute allele frequencies. The correlogram obtained (Figure 1B) is statistically significant (P<0.001) and is consistent with a regional cline of R408W-1.8.
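For readers unfamiliar with correlograms, the following is a minimal sketch of Moran's autocorrelation coefficient for a single distance class, together with the Bonferroni criterion for overall significance. It is not the SAAP implementation: binary weights (1 for population pairs whose separation falls in the class, 0 otherwise), planar Euclidean distances, and the transect data are all assumptions made for illustration.

```python
import math


def morans_i(values, coords, d_min, d_max):
    """Moran's I over ordered pairs (i, j) with d_min < distance <= d_max."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_sq = sum(d * d for d in dev)
    weight_sum = 0
    cross = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = math.hypot(coords[i][0] - coords[j][0],
                              coords[i][1] - coords[j][1])
            if d_min < dist <= d_max:
                weight_sum += 1          # binary weight for this pair
                cross += dev[i] * dev[j]
    if weight_sum == 0 or total_sq == 0:
        raise ValueError("no pairs in this distance class, or no variation")
    return (n / weight_sum) * cross / total_sq


def correlogram_significant(p_values, alpha=0.05):
    """Bonferroni criterion: some class must reach p < alpha / k."""
    return min(p_values) < alpha / len(p_values)


# Hypothetical populations on a transect with steadily rising frequency.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
freqs = [0.001, 0.002, 0.003, 0.004, 0.005]
short_range = morans_i(freqs, coords, 0.0, 1.5)
long_range = morans_i(freqs, coords, 3.5, 4.5)
print(short_range, long_range)
```

In this toy gradient the coefficient is positive for neighbouring populations and negative for the most distant pair, which is the general shape expected of a cline.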

Table 3 Geographic distribution of mutant PAH alleles within the island of Ireland. Allele counts for eight mutations are given together with their relative frequencies expressed as percentage values (in brackets)
Figure 1 A cline of frequency of the R408W-1.8 phenylketonuria mutation across north-western Europe, peaking in Connacht, the most westerly province of Ireland. (A) Relative frequencies of R408W-1.8 in each of fifteen European populations,18,19 represented by the shaded areas of the relevant pie-charts superimposed on the outline map; values for the provinces of Ireland derived from this study are similarly represented on the inset map. (B) Correlogram obtained by spatial autocorrelation analysis (using the SAAP software) of R408W-1.8 absolute allele frequencies for 13 populations (the four Irish provinces, Scotland, England, France, Germany, Spain, Denmark, Sweden, NW Norway and SE Norway). Individual data points that attained significance at the P<0.05 level are indicated by asterisks (*). The overall significance of the correlogram was P<0.001 (calculated by the Bonferroni criterion).

In contrast to R408W, the F39L and I65T mutations exhibited maximal relative frequencies in Leinster (13.6%) and Ulster (20.0%) respectively (Table 3). IVS12nt1, the most common English PKU mutation (27%),34 was found to occur at relative frequencies of 3–6% in Munster, Connacht and Ulster but was relatively uncommon (0.6%) in Leinster. In addition, three mutations (F299C, R408Q and Y414C) which are prevalent on the Atlantic coast of Norway35 were found to be under-represented in Connacht as compared to the other provinces (Table 3).

The sub-division of data on the basis of the historic provinces of Ireland facilitated an examination of the influence on Ulster of its neighbouring population, Scotland. Admixture calculations were thus performed using the frequencies of 13 mutant PAH alleles in Scotland34, Ulster (as the hybrid population) and the rest of Ireland (Table 4). By the method described in the Appendix, the proportion of Scottish admixture in Ulster was estimated as 0.4561 (46%). The overall mutant PAH allele frequency in Ulster was also estimated in these calculations as 0.0131, corresponding to a disease incidence of approximately one in 6000 (assuming Hardy–Weinberg equilibrium).
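The conversion from the estimated overall mutant allele frequency to a disease incidence can be checked with a short calculation. Assuming Hardy-Weinberg equilibrium, the incidence of a recessive disorder is the square of the summed mutant allele frequency; for q = 0.0131 this is about one in 5800, consistent with the rounded figure of approximately one in 6000. The function name is illustrative.

```python
def incidence_from_allele_frequency(q):
    """Expected incidence of a recessive disorder, q squared (Hardy-Weinberg)."""
    if not 0 < q <= 1:
        raise ValueError("q must lie in (0, 1]")
    return q * q


q_ulster = 0.0131  # overall mutant PAH allele frequency estimated for Ulster
incidence = incidence_from_allele_frequency(q_ulster)
births_per_case = 1 / incidence
print(f"about 1 in {births_per_case:.0f}")
```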

Table 4 Ulster as a hybrid population formed by admixture between Scotland and Ireland. The proportion of Scottish admixture in the population of Ulster was estimated as the mean value of $m = q_2 (r_h - r_2) / (q_1 r_1 - q_2 r_2 - r_h (q_1 - q_2))$ using relative allele frequency data for 13 mutant PAH alleles in Ireland (excluding Ulster; $r_1$), Scotland ($r_2$) and Ulster ($r_h$)
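The expression in the Table 4 caption can be illustrated with a small round-trip sketch. It rests on assumptions that are not confirmed by the text, since the Appendix is not reproduced here: q1 and q2 are taken to be the overall mutant allele frequencies of the two parental populations, and absolute frequencies (q times r) are assumed to mix linearly in the hybrid. Under that linear-mixing assumption, the caption's expression recovers the weight placed on the population written with subscript 1; how the subscripts map onto Scotland and Ireland should be taken from the Appendix. All numerical values are hypothetical.

```python
def hybrid_relative_frequency(m, q1, r1, q2, r2):
    """Forward model (assumed): weight m on population 1, 1 - m on population 2."""
    q_h = m * q1 + (1 - m) * q2                 # overall frequency in the hybrid
    return (m * q1 * r1 + (1 - m) * q2 * r2) / q_h


def mixing_weight(q1, r1, q2, r2, r_h):
    """Caption expression: q2(rh - r2) / (q1 r1 - q2 r2 - rh(q1 - q2))."""
    denom = q1 * r1 - q2 * r2 - r_h * (q1 - q2)
    if denom == 0:
        raise ValueError("allele is uninformative (zero denominator)")
    return q2 * (r_h - r2) / denom


def mean_mixing_weight(q1, q2, alleles):
    """Mean of the per-allele estimates; alleles holds (r1, r2, r_h) tuples."""
    estimates = [mixing_weight(q1, r1, q2, r2, r_h) for r1, r2, r_h in alleles]
    return sum(estimates) / len(estimates)


# Hypothetical parental populations and a known weight for the round trip.
q1, q2, true_m = 0.015, 0.012, 0.4
parental = [(0.45, 0.20), (0.10, 0.25), (0.05, 0.15)]   # (r1, r2) per allele
alleles = [(r1, r2, hybrid_relative_frequency(true_m, q1, r1, q2, r2))
           for r1, r2 in parental]
print(mean_mixing_weight(q1, q2, alleles))
```

With noise-free simulated data every allele returns the same weight; with real data the per-allele values differ, which is presumably why a mean over the 13 alleles is reported.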

Discussion

The mutation spectrum of hyperphenylalaninaemia in the Irish Republic has been elucidated, mutations being detected in 95.7% of alleles (534 of 558), a detection efficiency comparable with other recent studies.17,34,36 Mutations were fully characterised in 513 of the 558 alleles (92%), identifying 29 distinct mutations in 11 of the 13 PAH exons. Mobility shifts were not detected in the case of 24 alleles (4.3%) despite scanning of the entire PAH coding sequence by DGGE. These mutations may have been either nucleotide substitutions localised deep within intronic sequences or in the flanking 5′ and 3′ untranslated regions of the PAH gene, or exonic deletions. In a further 21 alleles, mobility shifts localised to specific exons could not be characterised due to insufficient DNA template quantity or quality. In some cases, the age of the Guthrie card resulted in poor PCR product yields that precluded sequence analysis. Mutations were distributed across the coding sequence but occurred largely in the highly conserved region spanning exons 4 to 11 of the PAH gene.37 The three most common mutations were R408W (41.0%), F39L (12.2%) and I65T (10.4%), together accounting for 63.6% of mutant alleles (Table 1). All other mutations were present at relative frequencies of less than 6.0%. These findings confirm the results of a previous study by O'Neill et al.16 A similar situation was observed in the Northern Irish population where the same three mutations were found to account for 57.5% of mutant alleles, although at different individual relative frequencies.17 While no novel mutations were detected in the study, this is the first report in north-western Europe of A447D, a mutation previously detected in the U.S. population.38 SNP haplotypes14, minihaplotypes17 and their associations with the predominant HPA mutations were investigated using a cohort of small nuclear families and unrelated homozygous individuals. 
The most common mutation, R408W, was associated with haplotype 1.8 and principally with the 242 bp allele of the short tandem repeat (STR) marker (Table 2). F39L was found to be associated with haplotype 1.8 and three STR alleles (238, 242 and 246 bp). I65T showed the greatest variation in haplotype associations, being associated mainly with haplotype 9.8 (but also with 1.8) and exhibiting a range of STR allele associations. These haplotype and minihaplotype associations are similar to those previously reported for Northern Ireland, western Scotland and south-western England.17,19,34,39

Relative allele frequencies were calculated for the eight most common mutations in the four provinces of Ireland (Table 3). The provinces have been recognised as distinct regions for longer than the modern geopolitical subdivisions, having been established in the early historic period (400–1000 AD) with their boundaries fixed in the early modern period.40 This, coupled with relatively low mobility within Ireland until recent times,41 makes the provinces a reasonable basis upon which to look for evidence of stratification within the Irish population. Our data show that the relative frequencies of the most common mutations vary considerably between the provinces. For example, F39L showed maximal frequency in Leinster (the east coast), whereas I65T peaked in Ulster (the north/north-west). Perhaps most interestingly, R408W occurred with the highest relative frequency (55.8%) in Connacht, the most westerly province (Table 3).

R408W is known to have arisen by recurrent mutation on chromosomes of two distinct haplotype backgrounds: R408W-2.3 exhibits a west-to-east cline of relative frequency reaching its maximum in the Balto-Slavic region while R408W-1.8 exhibits an east-to-west cline peaking in Ireland.19,39,42,43 Our data demonstrate that the east-to-west cline of R408W-1.8 continues across Ireland to peak in Connacht (Figure 1A). Spatial autocorrelation analysis24 (Figure 1B) shows that the data are consistent with a regional cline of R408W-1.8, indicating that genetic drift has not erased a pattern of genetic variation most likely established by human migration. Although selection cannot be excluded in establishing this pattern, it is unlikely that it would have acted differently on R408W-1.8 and R408W-2.3, given that they produce identical phenotypes.

It has been suggested that the R408W-1.8 mutation arose independently in Ireland and that Ireland acted as its centre of diffusion, based upon the observation of ‘Celtic’ PAH alleles in Scotland, Scandinavia, and the New World.17,18,19,44,45,46 However, these observations are linked to population movements in the historic period, such as Irish settlement in Scotland and England in the 3rd to 5th centuries AD,40 population movements associated with the Vikings in the 8th to 12th centuries AD47,48,49,50 and Irish emigration in the last three centuries.44 R408W-1.8 is likely to be an ancient mutation, given that it is associated with at least four STR allelic variants (PAHdb), one less than the I65T mutation which is believed to be Palaeolithic.18 Therefore, the distribution of the R408W-1.8 allele in Europe is more likely to reflect earlier human dispersals. A more plausible hypothesis is that the cline of R408W-1.8 in north-western Europe reflects an older pattern of genetic variation substantially unaltered by the Neolithic dispersal of individuals of near-eastern origin carrying other alleles into Europe. In support of this, it should be noted that the cline of R408W-1.8 parallels a number of frequency clines in north-western Europe for alleles at unlinked loci, including the Q188R mutant allele in transferase-deficient galactosaemia,46 blood group O51 and Y chromosome haplogroup 1,52 all of which reach maximal frequency in Ireland (the latter two in Connacht). 
Where parallel allele frequency clines are observed for a number of unlinked genes, they are more likely to have been generated simultaneously by a single population migration process rather than by several independent processes of dispersal.23 In this case the peak of R408W-1.8 frequency in Ireland, and of other markers in north-westerly areas least affected by the expansion of Neolithic populations, should be regarded as the genetic traces of the Palaeolithic Europeans.52,53,54 Contrary therefore to previous views18, R408W-1.8 may indeed be a PAH mutation which links Ireland with Europe.

Some argue that Ulster should be considered separately since its history has been marked by distinct demographic forces including Ulster–Irish settlement in western Scotland (3rd to 5th Centuries AD), Viking settlement of and exchange between western Scotland and Ireland (8th to 12th Centuries AD), and immigration from Scotland in the Plantation of Ulster (17th Century AD).40,47,55,56 To investigate the extent of Scottish admixture in Ulster, calculations were performed using a modification of the Bernstein model27 and data for thirteen mutant PAH alleles in Scotland, Ulster (as the hybrid population) and the rest of Ireland (Table 4). The estimated proportion of Scottish genes in Ulster (46%) is consistent with Ulster having been a zone of significant Irish-Scottish admixture. In addition, a number of mutations previously observed at low frequencies (2.5%) in the Ulster population (T380M, R243Q, S273F, L194P and IVS8nt1) have been reported at comparable frequency in western Scotland.34 None of these were detected by DGGE or PCR–RFLP assay in this study of the Republic of Ireland.

Scandinavian PAH mutations (F299C, R408Q, Y414C and G46S) accounted for 6.1% of the mutant alleles detected in this study while G272X, the most common Norwegian mutation, was not observed (Table 1). Mutations associated with the north-west coast of Norway (F299C, R408Q and Y414C) were more common than those associated with south-eastern Norway.35,57 This supports the view that the Scandinavian contribution to the Irish gene-pool was significant and mediated largely by Viking incursions from north-western Norway.18 Although the major direction of gene-flow was from Norway to Ireland, the possibility of flow in the opposite direction cannot be ruled out. Analysis of the geographic distribution of Scandinavian mutations within Ireland (Table 3) has shown that they account for the lowest proportion of mutations in Connacht (1.9%) compared to the other provinces (Munster 7.5%, Leinster 5.9%, Ulster 12.4%).

This study was undertaken to produce a detailed picture of the mutation spectrum underlying HPA and PKU in the Southern Irish population. By merging this with the data for Northern Ireland, the composite data-set yields insights into factors which have shaped the population of the island, underlining the utility of recessive disease-associated alleles in the study of human populations.