Introduction

Trinucleotide repeat expansion has been identified in recent years as the causative mutation in a number of hereditary neurological disorders.1, 2, 3 These disorders include various neurodevelopmental, neurodegenerative, and neuromuscular diseases, such as Fragile X syndrome, Friedreich's ataxia, Huntington's disease, several spinocerebellar ataxias, and myotonic dystrophy. What is unique about these repeat expansions is their dynamic nature, that is, the repeat number changes in both germline and somatic tissues. Repeat instability in the germline is thought to be the basis for anticipation, a phenomenon characterized by younger age of onset and more severe disease progression in successive generations, whereas repeat instability in somatic tissues is believed to be important for variable expression of disease phenotypes in a tissue- and age-dependent manner.4, 5

Multiple factors are implicated in trinucleotide repeat instability, and they fall largely into two categories. The first comprises trans-acting factors, which are often involved in DNA replication and repair, such as FEN1, Msh2, Msh3 and Msh6.6, 7, 8, 9 Deficiency of these trans-acting factors results in genomic instability and cancer formation.10 The second category reflects the local properties of the loci, such as the proximity of CpG islands,11 the orientation of the adjacent replication origin,12 and the configuration and copy number of repeats.13 It is generally accepted that the higher the number of trinucleotide repeats, the more unstable the repeat tract is likely to be.14 As for the configuration of repeat expansion, it has been suggested that pure trinucleotide repeats are more prone to expand and/or contract than repeats interrupted by distinct trinucleotides. However, to date, this notion is based solely on observations of repeat size changes during a limited number of parent-to-child transmissions, and there are few direct demonstrations of allele instability at the tissue level.15, 16, 17

Spinocerebellar ataxia type 17 (SCA17) is caused by expansion of CAG trinucleotide repeats in exon 3 of the TATA binding protein (TBP) gene at chromosome 6q27.18 Patients with SCA17 exhibit cerebellar ataxia, pyramidal and extrapyramidal signs, cognitive impairments, psychosis, and seizures.18, 19, 20 It is of note that anticipation is not well documented for SCA17. The CAG expansion locus in SCA17 is polymorphic with a configuration of (CAG)3 (CAA)3 (CAG)n1 CAACAGCAA (CAG)n2 CAACAG, where n1 ranges from 7 to 11 and n2 from 9 to 21. This element has been dissected into five domains, that is, domain I=(CAG)3(CAA)3; domain II=(CAG)n1; domain III=CAACAGCAA; domain IV=(CAG)n2, and domain V=CAACAG. Since both CAG and CAA in this repeat tract code for glutamine, the complex repeat array still encodes a pure polyglutamine tract. Normal alleles code for 25–44 tandemly repeated glutamines, whereas mutant alleles code for expanded glutamine tracts in the range of 47–63, and reduced penetrance is seen in the middle range.19, 21, 22, 23 Like other polyglutamine diseases,24 SCA17 is thought to be caused by a gain-of-function mechanism due to the expanded polyglutamine tract within TBP, a general transcription factor that binds the basic promoter element, the TATA box.

CAG repeats at the SCA17 locus have two distinct configurations, which are differentiated by the presence or absence of domain III. Type I configurations contain domain III, are more complex, and are more prevalent. They have been reported in Japanese, German, and French patients. CAG repeat expansion usually occurs in domain IV, with n2 ranging from 26 to 31, whereas domain II remains normal.18, 19, 20 Other more complicated patterns have also been observed, such as (CAG)3 (CAA)3 (CAG)9 CAACAGCAA (CAG)16 CAACAGCAA (CAG)13 CAACAG and (CAG)3(CAA)3 (CAG)9 CAACAGCAA (CAG)9 (CAA)3 (CAG)9 CAACAGCAA (CAG)19 CAACAG.18, 25 These patterns are presumably caused by unequal crossover during recombination. Type II configurations are less frequent than type I and have no domain III. As a result, domains II and IV are combined into a longer pure CAG repeat. They have been reported in German and Italian families.19, 26 It is of note that intergenerational instability and anticipation were documented in these families.
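The domain structure described above can be illustrated with a short Python sketch. It is illustrative only and not part of the study's methods: it assembles a tract from the domain formula, then reports the total repeat number, the longest pure CAG run, and whether domain III (CAACAGCAA) is present downstream of domain I. The function names are hypothetical, and the n1 and n2 values in the usage note are arbitrary inputs chosen to show the arithmetic.

```python
from itertools import groupby

DOMAIN_I = ["CAG"] * 3 + ["CAA"] * 3   # (CAG)3 (CAA)3
DOMAIN_III = ["CAA", "CAG", "CAA"]     # CAACAGCAA
DOMAIN_V = ["CAA", "CAG"]              # CAACAG


def build_tract(n1, n2=None):
    """Codon list for a type I tract (n2 given) or a type II tract (n2 omitted)."""
    tract = DOMAIN_I + ["CAG"] * n1            # domains I and II
    if n2 is not None:
        tract += DOMAIN_III + ["CAG"] * n2     # domains III and IV
    return tract + DOMAIN_V


def split_codons(seq):
    """Split a nucleotide string into CAG/CAA codons, rejecting anything else."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if any(c not in ("CAG", "CAA") for c in codons):
        raise ValueError("tract contains codons other than CAG/CAA")
    return codons


def describe(codons):
    """Summarize a tract: total repeats, configuration type, longest pure CAG run."""
    body = codons[len(DOMAIN_I):]              # skip domain I before searching
    has_domain_iii = any(
        body[i:i + 3] == DOMAIN_III for i in range(len(body) - 2)
    )
    longest = max(
        (len(list(run)) for codon, run in groupby(codons) if codon == "CAG"),
        default=0,
    )
    return {
        "total_repeats": len(codons),
        "configuration": "type I" if has_domain_iii else "type II",
        "longest_pure_cag": longest,
    }
```

For example, `describe(build_tract(9, 16))` describes a 36-repeat type I tract whose longest pure CAG run is 16, whereas `describe(build_tract(40))` describes a 48-repeat type II tract with a pure run of 40.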

Small pool PCR (SP-PCR) is a highly sensitive assay for detecting low levels of somatic mosaicism. By diluting genomic DNA down to a single genome equivalent, SP-PCR amplifies alleles from single- or oligo-copy DNA templates. Thus, a large number of alleles can be analyzed individually, and variant repeats can be detected with the ABI 377 and GeneScan analysis. Here, we used SP-PCR to compare the instability of expanded CAG repeats in seven SCA17 families with either type I or type II configurations. Our results showed that instability of the expanded CAG repeat is highly dependent on repeat configuration, and that CAA interruption is a limiting factor for further CAG repeat expansion.

Materials and methods

Genotyping

Blood samples were obtained from SCA17 patients and age- and gender-matched controls. All individuals signed consent forms approved by the local Institutional Review Boards. DNA was extracted from peripheral blood leukocytes using a standard chloroform/phenol extraction protocol. To determine the progenitor allele size of the SCA17 CAG/CAA repeat, the DNA from each individual was genotyped by PCR as described previously.18

DNA sample preparation for small pool PCR

The amount of input DNA in SP-PCR is critical for quantitative analysis of repeat size mutants. We therefore developed a systematic protocol to determine the number of amplifiable alleles in the input DNA. First, DNA concentration was estimated by fluorometer. SP-PCR (see below) was then performed using samples containing 6000, 600, 300, and 60 pg of genomic DNA. By Poisson analysis of the amplified alleles, the genome equivalent (g.e.) of the amplifiable DNA was calculated. Using 1–10 g.e. of input DNA, multiple identical SP-PCRs were performed in 96-well plates to screen at least 180 alleles (Table 2). At least eight zero-DNA controls (reaction mixture containing all reagents except the template DNA) were included in each 96-well plate to monitor DNA contamination.
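The Poisson reasoning behind this calibration can be sketched in a few lines of Python. This is a simplified illustration, not the authors' protocol: it uses only the fraction of reactions with no product (the Poisson zero class) to estimate the mean number of amplifiable templates per reaction, and it assumes every template present is amplified. The function names and the well counts in the usage note are hypothetical.

```python
import math


def mean_templates_per_reaction(n_reactions, n_negative):
    """Estimate the Poisson mean from the fraction of reactions with no product.

    Under a Poisson model P(no template) = exp(-mean), so mean = -ln(fraction negative).
    """
    if not 0 < n_negative <= n_reactions:
        raise ValueError("need at least one negative reaction and n_negative <= n_reactions")
    return -math.log(n_negative / n_reactions)


def prob_multiple_templates(mean):
    """Probability that a positive reaction started from two or more templates."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    p0 = math.exp(-mean)      # no template
    p1 = mean * p0            # exactly one template
    return (1.0 - p0 - p1) / (1.0 - p0)
```

As a usage example with made-up numbers, 37 negative reactions out of 100 give a mean close to one amplifiable template per reaction. At that input, roughly four in ten positive reactions would be expected to contain more than one template, which is why the calibration matters when counting alleles.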

Small pool PCR

SP-PCR uses diluted DNA samples as the template. We developed a hemi-nested PCR protocol to detect fluorescent signals from the amplified product. The first PCR was performed with the following primer set: forward primer, HTFIID-A: 5′-ATGCCTTATGGCACTGGACTGACC-3′ and reverse primer, HTFIID-B: 5′-CTGCTGGGACGTTGACTGCTGAAC-3′. Genomic DNA was first amplified in a 10 μl reaction mixture containing 250 μM dNTPs, 10 pmol of each primer, 1 μl of 10 × PCR buffer, and 1 U of Taq DNA polymerase. Initial DNA denaturation was carried out at 95°C for 6 min 45 s, followed by 30 PCR cycles of 1 min 15 s at 94°C, 1 min 15 s at 58°C, and 1 min 35 s at 70°C, and a final extension at 70°C for 25 min. The expected size of the PCR product was (120+3n) bp, where n is the number of CAG or CAA repeat units.

For the secondary PCR, we used HTFIID-B and a nested forward primer labeled with the fluorescent moiety 6FAM. The sequence of the nested primer is FAM-5′-GTCTATTTTGGAAGAGCAACAAAGG-3′. This fluorescent tag allows detection by the ABI 377 Sequencer. The primary PCR product was first diluted 10-fold, and 2 μl of the diluted product was then added to the secondary PCR mixture, for a final 50-fold dilution of the primary PCR product. The same PCR conditions as described above were used for the secondary PCR. The expected size of the PCR product was (67+3n) bp, where n is the number of CAG or CAA repeat units.
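The two size formulas convert directly between fragment length and repeat number. The Python sketch below shows that arithmetic. The constant and function names are hypothetical, and rounding a measured size to the nearest whole repeat is an assumption made for illustration rather than a description of how GeneScan sizes were called in this study.

```python
PRIMARY_FLANK_BP = 120    # primary PCR product: 120 + 3n bp
SECONDARY_FLANK_BP = 67   # hemi-nested secondary product: 67 + 3n bp


def product_size(n_repeats, flank_bp=SECONDARY_FLANK_BP):
    """Expected product size in bp for n CAG/CAA repeat units."""
    return flank_bp + 3 * n_repeats


def repeats_from_size(size_bp, flank_bp=SECONDARY_FLANK_BP):
    """Nearest whole repeat number for a measured fragment size in bp."""
    n = round((size_bp - flank_bp) / 3)
    if n < 0:
        raise ValueError("fragment is shorter than the flanking sequence")
    return n
```

A secondary product of 229 bp corresponds to 54 repeats, since 67 + 3 × 54 = 229, and the same allele would give a 282 bp primary product.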

SP-PCR involves extensive amplification, consisting of a primary PCR followed by a hemi-nested secondary PCR across the repeat region. We rigorously controlled for contamination by including ‘zero DNA controls’ in every gel. In SP-PCR, as in regular PCR analysis of most trinucleotide repeat alleles, amplification of a single stable normal allele results in an artifactual stuttering pattern, which consists of a large band of high intensity trailed by 2–4 smaller bands of decreasing intensity (Figure 1a). In a previous study on SP-PCR, we demonstrated that the tallest peak corresponds to the actual allele and that the smaller peaks are due to PCR artifacts.27 When two DNA samples with different repeat sizes were mixed and analyzed by SP-PCR, the stuttering pattern was altered. If the alleles differed in size by one repeat, the smaller allele was consistently detected with the higher intensity, as seen in Figures 1b and c, where the expanded progenitor allele is accompanied by a mutant allele differing from it by a single repeat unit. When the alleles differed in size by two or more repeat units, two distinct peaks were observed. Figure 1d shows a mutant allele with two more repeats amplified simultaneously with the expanded progenitor allele. Thus, SP-PCR is able to differentiate repeat size variations from stuttering artifacts.

Figure 1

Small pool PCR (SP-PCR) analysis of the CAG/CAA repeat expansion at the SCA17 locus. (a) An expanded progenitor allele of 229 bp (54 CAG/CAA repeats, filled with blue) is trailed by a series of progressively smaller peaks. (b) The expanded allele is coamplified with a mutant allele that has a deletion of one repeat unit. (c) The expanded allele is coamplified with a mutant allele that has a further expansion of one repeat unit. (d) The expanded allele is coamplified with a mutant allele that has a further expansion of two repeat units. (e) Zero-DNA control. The red peaks are DNA standards of 200 and 250 bp. The unfilled blue peaks are PCR stutters.

TOPO cloning

We performed regular PCR on each patient's blood DNA using the primers described above. High-fidelity Pfu polymerase was used, and the PCR product was cloned into the TOPO cloning vector according to the manufacturer's instructions (Invitrogen). Ten clones per patient were picked for sequencing analysis.

Statistical analysis

Each gel contained SP-PCR products from SCA17 patients and controls, amplified and analyzed simultaneously. GeneScan analysis was performed to detect the progenitor and mutant alleles. The allele and mutant frequencies for each patient were scored and entered into the following programs for statistical analysis:28

1) Calibration estimation for Small-Pool PCR: The ‘Calibrate’ program (by BW Brown, UT-MDACC, Department of Biomathematics) uses SP-PCR results to quantify the DNA. The program uses the Poisson distribution of number of alleles observed in each reaction (0, 1, or 2+) combined with the maximum likelihood determination that any one observed allele has more than one copy of that allele present in the reaction. The program estimates the observed amount of DNA present in that set of reactions (relative to the amount of expected DNA) and the 95% confidence interval for that set of data.

2) Mutation frequency estimations for SP-PCR: The ‘FitFreq’ program (by BW Brown, UT-MDACC, Department of Biomathematics) estimates the mutant frequency of a sample. It uses the quantitation data from ‘Calibrate’, the number of replicates, and the number of mutants observed for that sample to calculate mutant frequency, 95% confidence interval, and negative log likelihood.

3) A log-likelihood ratio test combined the negative log likelihoods of the matched normal control and the patient sample, 2(([−log a]+[−log b])−[−log set a combined with set b]), yielding a χ2 value that was then analyzed using ‘STATTAB’ (UT-MDACC Department of Biomathematics) to determine significant differences between the control and patient data. For these analyses, P<0.01 was considered significant. STATTAB is available at <http://odin.mdacc.tmc.edu/anonftp/>.
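The logic of the likelihood ratio comparison can be sketched in Python under simplifying assumptions. This is not a reimplementation of ‘FitFreq’ or ‘STATTAB’: it treats each scored allele as an independent Bernoulli trial, ignores the calibration step, and assumes one degree of freedom (a single shared mutation frequency versus one frequency per sample). The statistic is written as pooled minus separate negative log likelihood so that it is non-negative. All function names and the counts in the usage note are hypothetical.

```python
import math


def _xlogy(x, y):
    """x * log(y), defined as 0 when x is 0."""
    return 0.0 if x == 0 else x * math.log(y)


def bernoulli_nll(mutants, alleles):
    """Negative log likelihood at the maximum-likelihood mutation frequency."""
    if alleles <= 0 or not 0 <= mutants <= alleles:
        raise ValueError("need alleles > 0 and 0 <= mutants <= alleles")
    p = mutants / alleles
    return -(_xlogy(mutants, p) + _xlogy(alleles - mutants, 1.0 - p))


def likelihood_ratio_test(control, patient):
    """Compare two (mutants, alleles) samples; return (chi-square, P value)."""
    (k_c, n_c), (k_p, n_p) = control, patient
    nll_separate = bernoulli_nll(k_c, n_c) + bernoulli_nll(k_p, n_p)
    nll_pooled = bernoulli_nll(k_c + k_p, n_c + n_p)
    chi2 = max(0.0, 2.0 * (nll_pooled - nll_separate))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

With made-up counts such as `likelihood_ratio_test((15, 270), (83, 270))`, the two samples differ well beyond the P<0.01 threshold, whereas two samples with identical mutant fractions give a statistic of essentially zero.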

Results

Somatic instability of the CAG/CAA repeat in SCA17 families

We analyzed DNA samples isolated from peripheral blood leukocytes of nine SCA17 patients. These patients are from different racial backgrounds and comprise four index patients from Japanese families (Families J1, J2, J3, and J4),18 three patients from two German families (Families G1 and G2, including one parent–daughter pair from Family G1),19 and one aunt–niece pair from one Mexican family (Family M1). The allele sizes in the Japanese and German patients have been reported previously and were confirmed here by conventional (genotyping) PCR analysis of the CAG/CAA repeat alleles (Table 1). In the two Mexican patients, we determined the genotypes to be 35/50 in the aunt and 36/55 in the niece.

Table 1 SCA 17 patients from Japanese, German, and Mexican families

In this study, we used SP-PCR to study somatic instability of expanded CAG/CAA repeats at the TBP locus in SCA17 patients. We analyzed, on average, 271±68 alleles for each of the nine patients. In each patient, the two most frequent alleles represent the wild-type and expanded progenitor alleles, which match our genotyping results (Figure 2 and Table 2). Remarkably, the distribution of mutant alleles around the expanded progenitor alleles showed two distinct patterns: patients 1–4 and 7 (group I) have a one-tailed distribution with the mutant alleles trailing the expanded alleles, whereas the distributions in patients 5, 6, 8, and 9 (group II) are two-tailed, with both deletion and expansion mutant alleles flanking the expanded alleles (Figure 2). Thus, mutation in group I patients is strongly biased toward contraction, and the expanded alleles in group II patients are more prone to continuing expansion.
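The distinction between the two distribution patterns can be made concrete by tallying mutant alleles on either side of the expanded progenitor. The Python sketch below is a hypothetical illustration rather than the scoring procedure used in the study: it assumes the input contains only alleles derived from the expanded progenitor, keyed by repeat number, and the counts in the usage note are invented.

```python
def tail_profile(allele_counts, progenitor):
    """Tally contractions and expansions around the expanded progenitor allele.

    allele_counts maps repeat number -> number of alleles observed.
    """
    total = sum(allele_counts.values())
    if total == 0 or progenitor not in allele_counts:
        raise ValueError("progenitor allele must be present in a non-empty tally")
    contractions = sum(n for size, n in allele_counts.items() if size < progenitor)
    expansions = sum(n for size, n in allele_counts.items() if size > progenitor)
    mutants = contractions + expansions
    return {
        "contractions": contractions,
        "expansions": expansions,
        "mutation_frequency": mutants / total,
        # Share of mutant alleles that are larger than the progenitor.
        "expansion_fraction": expansions / mutants if mutants else 0.0,
    }
```

For an invented tally such as `{53: 5, 54: 90, 55: 3, 56: 2}` with a 54-repeat progenitor, the mutation frequency is 10% and half of the mutants are expansions. A one-tailed profile like that of group I would instead show an expansion fraction near zero.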

Figure 2

Two distinct distribution patterns of mutant alleles in SCA17 patients detected by SP-PCR. In each patient, the normal and expanded progenitor alleles were the most frequent. In patients 1–4 and 7 (group I), there is a one-tailed distribution of the mutant alleles trailing the expanded progenitor allele. In patients 5, 6, 8 and 9 (group II), the mutant allele distribution is two-tailed flanking the expanded progenitor allele.

Table 2 Mutation frequencies of the expanded CAG/CAA repeats in SCA17 patients

Since the mutation load due to repeat instability might be cumulative and dependent on the age of the subjects, we selected normal control subjects who were age-matched and whose repeat size was equivalent to the wild-type allele of each SCA17 patient. Our data showed that alleles of the normal subjects were relatively stable; regardless of age, the mean mutation frequency was 5.5±3.4% (n=9). Interestingly, in parallel with the mutant allele distributions described above, expanded alleles of SCA17 patients showed variable levels of stability: in group I patients, the expanded alleles were relatively stable with a mean mutation frequency of 9.6±4.0% (n=5), which is not significantly different from controls (P>0.05, Table 2); but in group II patients, the expanded alleles were remarkably unstable with a mean mutation frequency of 30.6±4.6% (n=4). The mutation frequency in group II patients is therefore significantly higher than that in group I patients (P<0.01).

Correlation between somatic instability and sequence motif of expanded repeats

Repeat number is known to be an important factor for instability. When we compared the mutation frequency with the total number of CAG/CAA repeats, we found a positive correlation (coefficient=0.76, Figure 3). It is noteworthy that the average total number of CAG/CAA repeats in group I patients is three fewer than that in group II patients (Table 2; 51±3.4 for type I and 54±2.2 for type II; P=0.17). However, this difference is within the range of variability in each group (Figure 3), and no dramatic changes of mutation frequency were found within either group. Therefore, the difference in total numbers of CAG/CAA repeats is unlikely to be the main determinant of the dramatic increase in mutation rate in group II patients. We thus examined the configuration of the CAG/CAA repeats. We performed regular PCR on each patient's blood DNA using the high-fidelity Pfu polymerase and cloned the PCR product into the TOPO cloning vector for sequencing analysis. Remarkably, the sequence of the expanded alleles in all group II patients showed the configuration (CAG)3 (CAA)3 (CAG)n1 CAA CAG (Table 3), and all group I patients had complex configurations of either (CAG)3 (CAA)3 (CAG)n1 CAACAGCAA (CAG)n2 CAACAGCAA (CAG)n3 CAACAG (refs. 18, 19) or (CAG)3 (CAA)3 (CAG)9 CAACAGCAA (CAG)9 (CAA)3 (CAG)9 CAACAGCAA (CAG)19 CAACAG (ref. 25) (Table 3). Thus, there is a strong correlation between CAG/CAA repeat configuration and instability: group I and II patients carry type I and II configurations, respectively. The type II configuration, which lacks the internal CAA interruptions of domain III, shows increased instability.
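For readers who wish to reproduce this kind of comparison on their own data, a correlation coefficient between repeat number and mutation frequency can be computed as in the Python sketch below. It assumes that the reported coefficient is a Pearson product-moment correlation, which the text does not state, and the function name is hypothetical. No patient data are reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    ss_x = sum(d * d for d in dx)
    ss_y = sum(d * d for d in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(ss_x * ss_y)
```

The function would be called with one list of total repeat numbers and one list of the corresponding mutation frequencies, for example the per-patient values in Table 2.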

Figure 3

Correlation of mutation frequency and the number of total CAG/CAA repeats in SCA17. Diamond, group I patients; triangle, group II patients.

Table 3 Configuration of expanded CAG/CAA repeats in SCA17 patients

Correlation between somatic instability and intergenerational changes in the repeat number in SCA17 families

In group I, patients 1–4 are the index patients from four Japanese families, in which there were no changes in repeat size during transmission of the expanded alleles from one generation to the next;18 patient 7 is from the German family, in which the expanded allele was also stably transmitted across generations.19 In group II, however, patients 5 and 6 are from two German families, where the size of the expanded alleles changed during transmission.19 Patients 8 and 9, reported in this study, are from one Mexican family, and their expanded alleles were unstably inherited. These data suggest that expanded alleles with the simple motif (CAG)3(CAA)3 (CAG)n1 CAACAG are remarkably unstable during intergenerational transmission, while those with the complex configuration and more CAA interruptions are relatively stable. This pattern coincides with the degree of instability of the expanded alleles in somatic cells, suggesting that a similar mechanism operates in the two situations.

Inverse correlation between the age of onset and the size of expanded CAG/CAA repeat alleles in SCA17

In spite of the dramatic difference in stability between interrupted and uninterrupted CAG expansions, they seem to be equally pathogenic at equivalent repeat numbers. Here, the age of onset and the number of CAG/CAA repeats in the expanded allele were reviewed in 19 patients reported in the literature and two patients from our Mexican family (Figure 4). As previously reported,18 we found a significant inverse correlation between the age of onset and the number of CAG/CAA repeats in this extended series. Interesting outliers are a 48 CAG/CAA repeat allele reported in three asymptomatic members of a German SCA17 family19 and a 43-repeat allele in a patient reported by Silveira et al.21 Although the former asymptomatic individuals may develop the disease later in life, the latter may lower the pathogenic range of CAG expansion to 43 if a potential phenocopy can be excluded. Thus, alleles with 43 repeats may have been seen in both normal and SCA17 subjects, raising the question of reduced penetrance at the lower end of the disease allele range in SCA17.

Figure 4

Correlation of age of onset and number of CAG/CAA repeats in SCA17.

Discussion

In this study, we used SP-PCR to study somatic instability of expanded CAG repeats at the TBP locus in SCA17 patients. SP-PCR is a highly sensitive assay for detecting low levels of somatic mosaicism. Our analysis of the TBP locus showed that normal alleles of the CAG repeat have a low level of apparent mosaicism in peripheral blood leukocytes. Regular PCR, which uses a greater amount of input DNA, cannot detect the rare mutant alleles in normal subjects that were detected by our SP-PCR analysis. Our evidence argues against the possibility that this apparent mosaicism represents artifacts: all zero-DNA controls were negative, and the distribution pattern of mutant alleles differs from sample to sample. Trinucleotide repeats within the normal range have also been shown to have a low level of instability at another locus, the Huntington disease (HD) gene. At the upper end of the normal range of CAG repeats, HD alleles show repeat size mosaicism with occasional expansion into the full mutation range; when this event occurs in the germline, it may be transmitted to the next generation. Alleles at this upper end of the normal range of CAG repeats in HD are called ‘mutable normal alleles’ (ACMG/ASHG Statement 1998). Single molecule PCR analysis of CAG repeat length at the HD locus has demonstrated that a 30-CAG-repeat allele has an 11% mutation frequency in the sperm of a normal individual.29 A similar study has also been done for myotonic dystrophy type 1 (DM1); SP-PCR analysis of CTG repeat length in sperm showed low-level instability of the normal alleles.30

Interestingly, we observed two distinct patterns of mutant allele distribution around the expanded progenitor alleles: patients 1–4 and 7 (group I) have a one-tailed distribution with the mutant alleles trailing the expanded alleles, whereas the distributions in patients 5, 6, 8, and 9 (group II) are two-tailed, with both deletion and expansion mutant alleles flanking the expanded alleles (Figure 2). Remarkably, the mutation profiles match the configurations of the CAG/CAA expansion. Group I patients had the type I configuration of (CAG)3 (CAA)3 (CAG)n1 CAA CAG CAA (CAG)n2 CAA CAG with expanded (CAG)n2. In contrast, group II patients had the type II configuration of (CAG)3(CAA)3(CAG)n1CAACAG with expanded (CAG)n1. Expanded alleles in the complex type I configuration have been shown to transmit from parent to child without changes in repeat size, while those in the simple type II configuration show intergenerational instability.19, 31 Thus, pure expanded CAG repeats are associated with intergenerational instability, while CAG repeats with CAA interruption are stably transmitted from generation to generation.

A change of the allele size during the parent-to-child transmission is usually detected by comparing DNA isolated from blood. These alleles in somatic cells of the parent and the offspring, however, are separated by many cell divisions. Thus, the intergenerational changes of allele size are the combination of repeat size instability in the parental germline and somatic tissues, and the somatic instability in the offspring. Direct comparison of the progenitor allele across generations would be most appropriate, but this is not easy to do. To study the mechanisms of intergenerational instability, an alternative and parallel approach is to perform SP-PCR on DNA isolated from somatic tissues. Our data clearly demonstrated somatic instability in SCA17, which could contribute to repeat size variations between generations. Furthermore, somatic instability also raises the possibility of different expansion sizes in different tissues, which may contribute to the tissue specificity of the disease and genotype–phenotype correlation.

The relationship between the length and number of expanded repeats and their stability has been well characterized. It is widely accepted that the higher the number of repeats, the more unstable the expanded repeats are. However, for repeat configuration there have been no rigorous studies in the literature. The situation where the expanded repeats have similar numbers but different configurations is unique to SCA17 among CAG expansion diseases. In SCA1, some intermediately sized alleles have CAT interruptions that disrupt the polyglutamine tract with histidine, making the allele non-pathogenic and apparently stable during parent-to-offspring transmissions.32 However, these alleles are not fully expanded, and the number of transmissions observed to date is limited. In SCA2, a 34-CAG allele with CAA interruptions has been associated with SCA2-like phenotype.33 However, the stability of the repeat size during transmissions has not been extensively studied. In SCA8, expanded non-coding CTG/CAG repeats are associated with the disease. While the expanded repeats are intricately interrupted by various cryptic triplets and unstably transmitted from generation to generation, the effect of interruptions on the instability has not been systematically studied in SCA8.34, 35 Thus, identification of two distinct repeat configurations in the expanded range in our study provided the opportunity to directly examine the effect of repeat interruptions on repeat instability. Our data clearly demonstrated that expanded CAG repeat alleles in SCA17 patients have variable somatic mosaicism, and the somatic mosaicism is greater with pure repeats. Thus, expanded alleles with pure repeat configuration are less stable than those with interrupted and more complex repeat configuration, at least in blood cells. It remains to be determined whether this observation holds true in other tissues and germ line. Studies in other tissues will address the issue of tissue-dependent repeat instability and provide insight into the mechanism of pathogenesis, and studies in germ line might provide molecular explanation for the lack of anticipation in most of the SCA17 families.

Our data support the theory that CAA interruptions serve as a limiting or stabilizing factor for CAG repeat expansion. Zuhlke et al22 also reported increased intergenerational instability in families carrying pure repeat expansions without domain III (CAACAGCAA). It is of interest that, evolutionarily, the TBP CAG/CAA microsatellite seems to be expanding from nonhuman primates,36 with the most frequent human SCA17 allele carrying 37 repeats, very close to the pathogenic range. Two major mutational processes have been proposed for microsatellite repeat instability: DNA strand slippage during replication and unequal chromosomal crossing-over during recombination.37, 38 DNA slippage is considered the predominant mechanism underlying microsatellite repeat instability, particularly in somatic tissues. In the germline, both processes might occur. Given how close the common repeat number is to the critical pathogenic value at the human TBP locus, mutations by either mechanism could lead to pathogenic repeat expansion. However, SCA17 is a rare disease, and the majority of human alleles have the complex configuration with all five domains. Therefore, CAA interruption is very likely a stabilizing factor for the CAG/CAA microsatellite repeat at the TBP locus.