Introduction

Friedreich ataxia (FA) is an autosomal recessive neurodegenerative disorder of unknown etiology. FA is the most prevalent early-onset ataxia and is characterised by degeneration of the dorsal root ganglia and atrophy of the posterior columns and spinocerebellar tracts. Diagnosis is based on clinical criteria, according to Harding [1], and on electrophysiological evidence of axonal sensory neuropathy. The Friedreich ataxia locus (FRDA) was mapped to chromosome 9q by Chamberlain et al. in 1988 [2]. Despite phenotypic variation, genetic homogeneity was demonstrated by linkage to the anonymous markers D9S15 and D9S5 [3–5], including in the clinical variants late-onset Friedreich ataxia (LOFA) [6] and Friedreich ataxia with retained deep-tendon reflexes (FARR) [7]. Expansion of the physical map allowed the isolation of a large collection of linked markers [4, 8–13], spanning around 900 kb on chromosome 9q13 (fig. 1). The FRDA region was progressively narrowed down to 150 kb [10, 12–18], where the FRDA gene has finally been isolated [19].

Fig. 1

Physical map of the analysed FRDA-linked region in chromosome 9q. Studied markers and position of the FRDA (X25) gene and the new marker FAD1 are indicated.

Many genetic and physical strategies have been used to aid the positional cloning of the gene. Haplotype analysis can be an effective tool to predict the location of disease-causing mutations among closely linked markers and to estimate the spectrum of mutations accounting for a disease in homogeneous populations, although the results can sometimes be confusing [20, 21]. The approach is based on inferring ancient crossovers from the haplotype divergences observed in the present population. Analysis of extended haplotypes in a series of normal and FRDA Spanish chromosomes with previously described polymorphisms gave us the first clue that the Spanish population is fairly homogeneous with respect to the number of different mutations causing the disease. The isolation of a new marker (FAD1) in strong linkage disequilibrium with FA, reported here, has led us to postulate that a main ancient FRDA mutation could exist, at least in the French and Spanish populations studied. The recent identification of a main FRDA-causing mutation in the X25 (FRDA) gene [19] agrees with our results, but the nature of the mutation, an intronic GAA trinucleotide expansion, also raises the possibility that the linkage disequilibrium with the new marker FAD1 might reflect a premutational state of chromosomes at risk for FA.

Materials and Methods

Cases

The 46 Spanish FA families analyzed were selected on the basis of a previous diagnosis of FA according to Harding's essential diagnostic criteria [1], including electrophysiological studies and the presence of hypertrophic cardiomyopathy.

For the FAD1 frequency analysis, 57 normal Spanish individuals, 27 normal French individuals and 30 normal individuals of Basque ancestry served as control groups, and 75 Spanish and 33 French FA patients were analyzed.

DNA Analysis

DNA was extracted from peripheral leukocytes by standard phenol extraction and ethanol precipitation. Restriction fragment length polymorphism (RFLP) analysis of 26P/BstXI (D9S5) was carried out as described in Monrós et al. [18]. The short tandem repeat polymorphisms (STRPs) analysed were GS2 (D9S111) [9], FD1 (D9S411E) [10], MLS1 and FR1 (D9S202) [11, 12], FR2 (D9S886), FR8 (D9S888), FR7 (D9S887) and FR5 (D9S889) [12]. PCRs were performed according to the authors' instructions, and the products were electrophoresed on 10–12% nondenaturing polyacrylamide gels and visualised by silver staining. The order of and physical distances between loci are those reported in several different physical maps [9, 10–13, 16, 17].

Description of the New Marker FAD1

During a search for transcribed sequences in the critical FRDA region, a new exon of the gene X104/CSFA1 (ZO-2) [16, 17] was identified by 5′ RACE PCR and by prediction with the GRAIL-FEXH algorithms. This exon was located 25–50 kb proximal to FR2 (fig. 1). A new single nucleotide polymorphism (SNP) in this exon, called FAD1, was detected by single-strand conformation polymorphism (SSCP) analysis. PCR amplification of the exon with the primers indicated in figure 2 was performed in a final volume of 50 µl containing 250 ng of DNA, 200 µM of each dNTP, 25 pmol of each primer and 1 U of Taq polymerase (Boehringer Mannheim, Germany). For SSCP analysis, 1.5 µl of PCR product and 1.5 µl of loading buffer were denatured for 5 min at 95 °C, placed on ice for 3 min and electrophoresed on a nondenaturing 12% acrylamide gel with 5% glycerol at room temperature. FAD1 alleles were visualised by silver staining. Exon sequencing showed that FAD1 is a diallelic C/A transversion, with cytosine defining allele A and adenine defining allele B, which cannot be detected by restriction analysis. The exon nucleotide sequence and the precise site of the FAD1 substitution are shown in figure 2.

Fig. 2

Nucleotide sequence of the new ZO-2 5′ exon. Intronic sequences are shown in lower case. The precise site of the nucleotide substitution that produces the FAD1 polymorphism is indicated by an arrow. S1 and S2 are the forward and reverse primers, respectively.

Haplotype Analysis

Genotypes were obtained for patients, their parents and all other available relatives. Normal and FRDA haplotypes were reconstructed from the patients' alleles, and only unambiguous haplotypes were used in the analysis. Major or consensus haplotypes (I, I′, II, II′ and III; see fig. 3) were identified by their high frequency in FRDA chromosomes. The probability of a path linking a given haplotype to a major haplotype was calculated as the product of the probabilities of the recombination and/or microsatellite mutation events that differentiate them. The parameters used were: mutation rate at SNPs = 10⁻⁹, with recurrent mutation considered unlikely; mutation rate at STRPs = 10⁻⁴. The probability (p) of recombination between two given markers, with formation of each specific divergent haplotype, was calculated according to the equation

Fig. 3

Schematic representation of the main groups of FRDA haplotypes defined in the Spanish population. Within each group, the consensus haplotype is boxed with a thick line and haplotypes derived by recombination are boxed with a thin line. Vertical lines indicate the most probable recombination site. (a) Haplotypes derived from major haplotype I; (b) haplotypes derived from major haplotype II; (c) haplotypes forming group III; (d) rare haplotypes.

$$p = 10^{-5} \cdot D \cdot \prod_{l=1}^{n} f_l$$

where 10⁻⁵ is the recombination rate per kb, according to the relation 1 cM = 1 Mb; D is the physical distance between the recombinant markers, in kb; and f_l is the frequency in the normal population of the l-th of the n recombinant alleles. For the pairs MLS1-FR1, FR8-FR7 and 26P-FD1, haplotype frequencies in the normal population were used instead, because of the strong linkage disequilibrium within each pair.
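The recombination-path probability is a simple product and can be sketched in a few lines of Python. This is a minimal illustration under the definitions above, not the authors' software: the function name `recombination_path_probability` is hypothetical, and the example arguments are the values of the worked example in the Results (150 kb between FRDA and FR8, a normal-population frequency of 0.001 assumed for the unobserved FR8-FR7 haplotype 12-3, and 0.35 for allele 8 of FR5).

```python
from math import prod


def recombination_path_probability(distance_kb, allele_freqs, rate_per_kb=1e-5):
    """Hypothetical helper for p = 10^-5 * D * prod(f_l).

    distance_kb  -- physical distance D between the recombinant markers, in kb
    allele_freqs -- normal-population frequencies f_l of the alleles (or, for
                    marker pairs in strong linkage disequilibrium, of the
                    haplotype) carried on the recombinant segment
    rate_per_kb  -- recombination rate per kb; 1e-5 follows from 1 cM = 1 Mb
    """
    if distance_kb < 0:
        raise ValueError("distance must be non-negative")
    if any(not 0 < f <= 1 for f in allele_freqs):
        raise ValueError("frequencies must lie in (0, 1]")
    return rate_per_kb * distance_kb * prod(allele_freqs)


# Worked example from the Results: FRDA-FR8 distance, FR8-FR7 haplotype 12-3, FR5 allele 8
p = recombination_path_probability(150, [0.001, 0.35])
print(f"{p:.3g}")  # 5.25e-07
```

The Results round this value to 5.2 × 10⁻⁷ before computing the odds ratio.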

Related haplotypes that could share a common origin with the defined consensus haplotypes were assigned to each group, taking into account the probabilities of the different evolutionary events. When more than one path of divergence was possible, odds ratios were calculated to decide which was the most likely.
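Choosing between competing paths amounts to multiplying the event probabilities along each path and comparing the products. The sketch below is a hypothetical illustration of that comparison (the function `most_likely_path` and the path labels are ours, not the authors'), applied to the two alternatives of the worked example in the Results: a single STRP mutation (10⁻⁴) versus recombination (10⁻⁵ × 150 kb × 0.001 × 0.35).

```python
from math import prod


def most_likely_path(paths):
    """Return (best path label, odds ratio of best path vs runner-up).

    paths -- mapping of path label to the list of probabilities of the
             independent events (mutations, recombinations, allele
             frequencies) that the path requires.
    """
    if not paths:
        raise ValueError("no paths to compare")
    scores = {label: prod(events) for label, events in paths.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    best = ranked[0]
    if len(ranked) == 1 or scores[ranked[1]] == 0:
        return best, float("inf")
    return best, scores[best] / scores[ranked[1]]


paths = {
    "STRP mutation at FR8": [1e-4],
    "recombination FRDA-FR8": [1e-5 * 150, 0.001, 0.35],
}
best, odds = most_likely_path(paths)
print(best, round(odds))  # STRP mutation at FR8 190
```

With the unrounded recombination probability the ratio is about 190; the Results quote 192 because p was first rounded to 5.2 × 10⁻⁷.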

Results

Analysis of Haplotype Divergences

We analyzed a series of 46 FA families of Spanish ancestry with the 10 markers shown in figure 1 and unambiguously reconstructed 88 FRDA haplotypes and 89 normal haplotypes for all of these markers. Only one chromosome was taken into account when the parents were known to be consanguineous and the affected sibs showed homozygosity by descent for the entire haplotype.

Haplotype analysis of the normal chromosomes showed that no single haplotype was present more than twice in the sample (maximum frequency = 0.02), most of them being unique. By contrast, consensus haplotypes were easily identified among FRDA chromosomes because of their relatively high frequency in the disease population (fig. 3). The most frequent were 3-1-2-3-2-8-B-8-3-8 (haplotype I) and 3-1-2-3-2-8-B-3-7-6 (haplotype I′) (fig. 3a), accounting for 9% and 8% of FRDA chromosomes, respectively, and absent from normal chromosomes (Fisher exact test, p = 0.001). The two haplotypes could be related through a single crossover between FAD1 and FR8, consistent with the position of the X25 gene. A common origin of haplotypes I and I′ is supported by the absence of the distal GS2-26P-FD1-MLS1-FR1 haplotype 3-1-2-3-2 from a series of 121 normal chromosomes. Further analysis showed that 50% (44/88) of the FRDA haplotypes could have evolved from consensus haplotype I through single recombinations and/or microsatellite mutations, suggesting that they derive from a common ancestral FRDA mutation. Figure 3a shows the FRDA haplotypes derived from I and I′ through single recombination events. Haplotypes derived by microsatellite mutation are not shown, but their total number is indicated at the bottom of the figure. The relative odds of the possible paths relating the various derived haplotypes through mutation and/or recombination were calculated as described in Materials and Methods (data not shown).
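The reasoning used to relate haplotypes I and I′ (which single event, and in which interval, could turn one into the other) can be made explicit. In the sketch below, the marker order GS2 to FR5 is assumed from the order of alleles in the haplotype strings, FRDA is placed between FAD1 and FR8 as stated in the text, and a crossover is accepted only if the exchanged segment lies wholly on one side of FRDA, so that the FRDA mutation stays on the chromosome. The function `candidate_events` is hypothetical; it lists candidate single events but does not weigh them, which is what the path probabilities in Materials and Methods are for.

```python
# Marker order (distal to proximal), assumed from the order of alleles in the haplotypes
MARKERS = ["GS2", "26P", "FD1", "MLS1", "FR1", "FR2", "FAD1", "FR8", "FR7", "FR5"]
# FRDA lies between FAD1 and FR8: index of the last marker distal to it
FRDA_AFTER = MARKERS.index("FAD1")


def candidate_events(consensus, derived):
    """List single events that could turn `consensus` into `derived`.

    A single-locus change is read as a microsatellite mutation; a crossover is
    allowed only if the exchanged segment lies wholly on one side of FRDA.
    """
    a, b = consensus.split("-"), derived.split("-")
    if not len(a) == len(b) == len(MARKERS):
        raise ValueError("expected one allele per marker")
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if not diff:
        return []
    events = []
    if len(diff) == 1:
        events.append(f"mutation at {MARKERS[diff[0]]}")
    first, last = diff[0], diff[-1]
    # Proximal segment first..end exchanged: most parsimonious crossover is
    # in the interval just distal to the first differing locus
    if first > FRDA_AFTER:
        events.append(f"crossover between {MARKERS[first - 1]} and {MARKERS[first]}")
    # Distal segment 0..last exchanged: crossover just proximal to the last one
    if last <= FRDA_AFTER:
        events.append(f"crossover between {MARKERS[last]} and {MARKERS[last + 1]}")
    return events


hap_I = "3-1-2-3-2-8-B-8-3-8"
hap_I_prime = "3-1-2-3-2-8-B-3-7-6"
print(candidate_events(hap_I, hap_I_prime))  # ['crossover between FAD1 and FR8']
```

For the divergent haplotype 3-1-2-3-2-8-B-12-3-8 of the example that follows, the function returns both alternatives compared there: a mutation at FR8 and a crossover in the FAD1-FR8 interval (strictly, between FRDA and FR8).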

Example: the divergent haplotype 3-1-2-3-2-8-B-12-3-8 could have evolved from consensus haplotype I by (a) a microsatellite mutation at FR8 (changing allele 8 to allele 12), with probability p = 10⁻⁴; or (b) recombination between the FRDA locus and FR8. The probability of (b) was calculated with the formula given in Materials and Methods: the probability of recombination within a distance of 150 kb is 10⁻⁵ × 150 = 1.5 × 10⁻³, and the probability of carrying the proximal haplotype 12-3-8 corresponds to the frequency of this haplotype in the normal population. Because FR8 and FR7 are very close markers in strong linkage disequilibrium, this frequency was obtained from the frequency of the minor 12-3 haplotype in our series, set to 0.001 since we did not observe any, multiplied by the observed frequency of allele 8 of FR5, which was 0.35. In summary, p = 1.5 × 10⁻³ × 0.001 × 0.35 ≈ 5.2 × 10⁻⁷. The odds ratio between mutation and recombination is 10⁻⁴ / (5.2 × 10⁻⁷) ≈ 192, indicating that mutation is about 192 times more likely than recombination to explain the divergence.

Haplotypes II and II′ (fig. 3b) were first related on the basis of allele 11 of FR2, which was found in 10 of 133 FRDA chromosomes but in none of 146 typed normal chromosomes (p = 0.001). As for haplotypes I and I′, the ancient recombination inferred from the divergence of haplotype II′ from haplotype II agrees with the position of the FRDA gene. Haplotype II derivatives were recognised as described above, the whole group accounting for 25% (22/88) of FRDA chromosomes.
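The text does not state which test produced p = 0.001 for the FR2 allele 11 comparison. Assuming a one-sided Fisher exact test on the 2 × 2 table (allele 11 present in 10 of 133 FRDA chromosomes and 0 of 146 normal chromosomes), the calculation can be sketched with the Python standard library; it gives about 5 × 10⁻⁴, consistent with the significance reported. The function name `fisher_one_sided` is ours.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is the top-left count under the
    hypergeometric distribution with all row and column totals fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Rows: FRDA, normal chromosomes; columns: FR2 allele 11 present, absent
p = fisher_one_sided(10, 133 - 10, 0, 146)
print(f"{p:.1e}")
```

With SciPy available, `scipy.stats.fisher_exact([[10, 123], [0, 146]], alternative="greater")` should return the same one-sided value.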

Four closely related haplotypes formed group III, with a frequency of 7% among FRDA chromosomes (fig. 3c). The estimated probabilities that the remaining 17 haplotypes (19%), shown in figure 3d, belong to one of the major consensus groups I, II or III were very low.

These results argue for a marked genetic homogeneity of the disease in the Spanish population despite phenotypic heterogeneity. No correlation between haplotypes and classical [1] or variant FA phenotypes [6, 7] was observed.

Analysis of FAD1 Polymorphism

FAD1 allele frequencies in normal and FRDA chromosomes (table 1) indicate a strong association of the common allele A with normal chromosomes and of the rare allele B with FRDA chromosomes (p < 0.0001), in both the Spanish and the French populations. No differences were found between the allele frequencies of the general Spanish, Basque and French control groups, nor between those of FA patients of Spanish and French ancestry. Table 2 shows the FAD1 genotype distributions in the normal (general Spanish and Basque) and FRDA Spanish populations. The χ² test shows a strong association of the BB genotype with the disease (p < 0.0001). There is a certain lack of BB homozygotes among controls (χ² = 4.09; 1 d.f.; p = 0.043) and an excess of AA among patients, but the number of chromosomes analysed is too small to draw clear conclusions about Hardy-Weinberg equilibrium.
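The genotype counts behind Table 2 are not reproduced in this text, so the χ² value cannot be recomputed here. A standard Hardy-Weinberg goodness-of-fit test for a diallelic marker such as FAD1 (which may or may not be exactly the statistic the authors computed) can be sketched as follows. The function names are hypothetical, the counts in the usage lines are illustrative only, and the 1 d.f. tail probability uses the identity P(χ²₁ > x) = erfc(√(x/2)), which reproduces p ≈ 0.043 for the reported χ² = 4.09.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for Hardy-Weinberg proportions at a diallelic locus."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                         # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip empty expected classes (monomorphic sample) to avoid dividing by zero
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def chi_square_upper_tail_1df(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(x / 2))


print(round(chi_square_upper_tail_1df(4.09), 3))  # 0.043
```

For a diallelic locus the test has 1 d.f.: three genotype classes, minus one for the total and one for the estimated allele frequency.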

Table 1 Frequencies of FAD1 alleles in normal and FRDA chromosomes
Table 2 FAD1 genotypic distribution among normal and FRDA Spanish populations

Phylogenetic Relationships among FRDA Haplotypes

The haplotype analysis (fig. 3) showed that the less frequent allele B of FAD1 was associated with FRDA haplotypes of groups I and II, with only four exceptions (2/44 in group I and 2/22 in group II), which could be explained by recombination between FAD1 and the FRDA gene. Among the 17 rare haplotypes, all but four were associated with the B allele. Haplotypes of group III were associated with allele A.

Given the mutational stability of single nucleotide polymorphisms (SNPs) and the allelic and genotypic distributions of FAD1 in the normal and FA populations, two explanations are possible: either most of the B alleles represent the same ancestral FRDA mutation, or the B allele is an at-risk allele for FA.

The method used to recognise groups of haplotypes takes into account the most likely path linking the various haplotypes. It is a maximum parsimony procedure and may have a phylogenetic interpretation, summarised in figure 4, where the spectrum of different FRDA mutations and their frequencies are shown. Haplotypes of groups I and II share allele B: they could reflect the same ancestral FRDA mutation or two independent FRDA mutations that both arose by chance on a FAD1-B chromosome. Haplotype III is associated with allele A, suggesting a different origin for this less prevalent mutation. Rare B haplotypes are shown in figure 4 as having originated independently of the major FRDA mutations, but their origin is difficult to disentangle: derivation by recombination or mutation from a single source among group I or II haplotypes cannot be ruled out, nor would we expect many different FRDA mutations to have arisen on a relatively rare allele unless it is mutation-prone.

Fig. 4

Schematic tree relating haplotype groups I, II and III and the rare haplotypes, according to FAD1 alleles. Arrow thickness is proportional to the percentage of haplotypes found in Spanish FRDA chromosomes. The dashed line represents the unresolved origin of haplotypes I and II. Estimated percentages for each putative FRDA mutation are shown.

The strong linkage disequilibrium of FAD1 with the disease and the analysis of FRDA-linked haplotypes reflect a common FRDA mutation that could account for up to 90%, and not less than 50%, of FA-causing mutations in the Spanish population. Seven percent of FRDA chromosomes share a different mutation, associated with haplotype III, and a small percentage (at least 4% and at most 19%) could be due to other rare mutations. The maximum and minimum estimates depend on whether we assume a single founder mutation associated with FAD1-B or the alternative hypothesis that the different FRDA mutations arose on an at-risk allele.
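The bounds above can be traced back to the haplotype counts given earlier in the Results. The sketch below is our reconstruction of that arithmetic, not a calculation reported by the authors; it assumes 88 FRDA haplotypes, 44 in group I, 22 in group II and 17 rare haplotypes of which four carry FAD1-A. Group III is left out because its share (7%) is stated directly.

```python
TOTAL = 88          # unambiguous FRDA haplotypes
GROUP_I = 44        # consensus I and derivatives (FAD1-B, with 2 exceptions)
GROUP_II = 22       # consensus II and derivatives (FAD1-B, with 2 exceptions)
RARE = 17           # rare haplotypes
RARE_WITH_A = 4     # rare haplotypes carrying FAD1-A

rare_with_b = RARE - RARE_WITH_A

# Minimum: groups I and II may be independent mutations, so count group I alone
min_common = GROUP_I / TOTAL
# Maximum: every FAD1-B-linked haplotype (groups I, II and rare B) shares one mutation
max_common = (GROUP_I + GROUP_II + rare_with_b) / TOTAL
# Other rare mutations: at least the rare FAD1-A haplotypes, at most all rare ones
min_rare = RARE_WITH_A / TOTAL
max_rare = RARE / TOTAL

print(f"common: {min_common:.1%}-{max_common:.1%}, rare: {min_rare:.1%}-{max_rare:.1%}")
```

This gives 50.0% to 89.8% for the common mutation and 4.5% to 19.3% for other rare mutations, matching the rounded figures in the text.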

Discussion

We have reported results of haplotype analysis of chromosomes carrying the FA-causing mutation that indirectly suggest that FA might be caused by a small number of different mutations in the FRDA gene, one of them being more prevalent in the Spanish population. Although haplotype divergences are usually analysed intuitively, we have tried to formalise the relationships among similar haplotypes to obtain the relative likelihood that each observed haplotype arose from a given consensus haplotype by the most parsimonious path.

The efficacy of haplotype divergence analysis as a tool for tracing back disease mutations depends on the homogeneity and age of the disease-causing mutation(s), the population history, the recombination rate of the region under study and the mutation rate of each polymorphism. In the present work, FA families came from all over Spain, where no recent founder effect was expected, and no correlation between haplotypes and geographic areas was observed [22]. We analysed 8 STRP markers and 2 SNPs spanning a region of 900 kb near the 9q paracentromeric heterochromatin. Microsatellites are a double-edged sword for linkage disequilibrium studies: they are highly informative and offer a better chance of detecting disequilibrium than SNPs, but their high and variable mutation rate (10⁻³ to 10⁻⁴) erodes disequilibrium, as recombination does, in a way unrelated to physical or genetic distance; moreover, STRPs undergo recurrent mutation. For these reasons, identity by descent of STRP-based haplotypes can be hard to interpret when the number of meioses to a common ancestor is high. Indeed, among 320 meioses we detected one mutation at the FR1 marker, with loss of one repeat in a parent-to-offspring transmission, and inferred a second one at FR7 in a second-degree consanguineous family, with gain of two repeats. Despite this, we were able to define three consensus haplotypes that account for more than 80% of Spanish FRDA chromosomes.

We have also described a new SNP, FAD1, which maps approximately 100 kb distal to the FRDA locus. SNPs have a very low mutation rate (10⁻⁹) and recurrent mutation is extremely unlikely, arguing for a single origin of each base substitution. As a consequence, they allow the definition of monophyletic chromosome groups. However, the observed differences in FAD1 allelic and genotypic distributions between normal and FA populations are difficult to explain. The rare allele B of FAD1 is present in haplotypes of groups I and II and in 13 of the 17 rare haplotypes. Genetic drift cannot explain the presence of the B allele on such a wide range of different FRDA haplotypes. Two alternative explanations are possible: (a) the B allele reflects one or two ancient FRDA mutations from which almost all B-associated FRDA haplotypes have derived, or (b) allele B marks an FRDA-mutation-prone allele. Both hypotheses agree with the finding of the same FAD1 allelic distributions in the Spanish and French normal and affected populations, which share a remote ancestry. Regarding the second hypothesis, we now know that an intronic GAA trinucleotide expansion (from 200 to more than 900 repeats) has been identified as the FA-causing mutation in 98% of FRDA chromosomes [19]. On normal chromosomes the GAA repeat length is also polymorphic, varying from 7 to 22 repeats, with small alleles being more frequent. We could hypothesise that FAD1-B alleles in the normal population are associated with the longer GAA repeats within the normal distribution, forming a group of at-risk alleles, i.e., a premutation group prone to expansion into mutated FRDA alleles, as has been described for fragile X syndrome (FRAXA) [23] and Huntington disease (HD) [24]. Further studies are needed to test this hypothesis.

In conclusion, haplotype analysis in the FRDA region has corroborated the position of the FRDA locus between closely linked markers, offering a powerful tool for refined mapping of disease genes. Its resolution can be greater than that of loss-of-homozygosity analysis in consanguineous families, which are not easy to collect, or of linkage disequilibrium analysis, which depends heavily on the marker allele distributions in normal and disease chromosomes. We have also shown that a small number of different mutational events lead to FA in the Spanish population, where the finding of one major haplotype indicates that at least 50% of FRDA chromosomes derive from the same ancestral mutation. This figure rises to 90% if the FAD1-B allele reflects one ancient and widespread FRDA mutation rather than an allele at risk for FA.