Introduction

Microsatellites are generally defined as simple sequences of 1-6 nucleotides that are repeated multiple times and are present in both coding and non-coding regions of the genome. Repetitive sequences are well represented in the eukaryotic genome 1, 2, 3, 4, 5, 6, 7 and have been reported to be hot spots for recombination as well as sites for random integration 8, 9, 10. Thus, alterations in simple repetitive sequences lie at the center of DNA evolution and sequence diversity that drives adaptation. On the other hand, changes in repetitive sequences can result in deleterious effects on gene expression and function, leading to disease. Simple trinucleotide repeats (TNR) have taken on special significance in this regard since genomic amplification of TNR is the underlying genetic defect in a number of human diseases including neurodegenerative and neuromuscular diseases and mental retardation 11. Potential mechanisms for TNR expansion have been extensively reviewed in the last two years 11, 12. Therefore, in this review, we focus on interpreting the likelihood of the proposed mechanisms in consideration of general features of genome dynamics from different species. The features of microsatellite instability observed in bacteria, yeast, mice, and man can define the magnitude and direction of changes expected at TNRs and provide general clues as to how genomes evolve and how certain instability could contribute to human disease.

Incidence and significance of microsatellites

Repetitive sequences constitute 30% of the human genome, and are often sites of deletions and insertions 1, 2, 3, 4, 5, 6, 7. The incidence of repetitive elements is much higher than that of random sequences of the same base composition 3, 4, and different microsatellites are represented in the genome at different frequencies. For instance, di- and tetranucleotide repeats are more abundant than trinucleotide repeats (TNRs) in all eukaryotes 2, 3, 4, 5, 6. However, when the distribution of simple repeats is compared among exons, introns and intergenic regions, TNRs and hexanucleotide repeats prevail in exons in all taxonomic groups 7. Moreover, TNRs, which are polymorphic in nature, are longer in humans than in other species 3, 4, 5, 6. CAG repeats show an evolutionary trend towards expansion, with tract length increasing from monkeys, to apes, to humans 3, 4, 5, 6. For example, the number of CAG repeats in the androgen receptor gene in monkeys is similar to that in rodents, ranging from 1-4 repeats, whereas this number increases to 17 units in great apes and up to 26 in humans 7.

Repetitive sequences and human pathologies

Remarkably, a high rate of microsatellite instability was discovered in human cancers, first noted in cases of hereditary non-polyposis colorectal cancer (HNPCC) 13, 14, 15. The most common underlying cause of microsatellite instability in HNPCC is germ line mutation in one or more components of the mismatch repair (MMR) system 13, 14, 15, 16. It is now accepted that unstable maintenance of microsatellite repeats occurs in about 15% of sporadic colorectal cancers 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26. Microsatellite instability is also frequently associated with ovarian cancers 17, 21, 22 and other malignancies, including tumors of endometrium 22, 23, skin 24, brain 25, stomach 23, 26, 27 and small intestine 26 among others. In most cases of HNPCC 13, 16, 18, 20 and ovarian tumors 17, 21, 22, the majority of known mutations occur in Mlh1 and Msh2. Mutations in Msh6 19, 22 and Pms2 25 are observed less frequently, and mutations in Msh3 are rare. In some tumors, Mlh1 deficiency arises from promoter methylation and consequent shut-down of Mlh1 gene expression 28, 29, 30. The hallmark feature of these MMR-defective cancers is a genome-wide increase in spontaneous mutation rate 31, 32, 33, 34. Microsatellite instability in these cancers reflects the inability of MMR to correct post-replicative errors throughout the genome. As a result, polymerase slippage at the repeating unit can give rise to small insertions and deletions 35, 36. In both yeast and bacteria, frameshift errors at repetitive sequences increase by 10-100 fold in MMR-defective cells relative to controls 36, 37. Analysis of the magnitude and direction of instability at microsatellite loci in HNPCC has revealed that short sequences can undergo small insertions, but the majority of changes are deletions. For example, a set of 10 microsatellites was evaluated in 26 HNPCC patients 38. 
In these patients, instability occurred at TNRs within the normal Huntington's Disease (HD) and spinocerebellar ataxia (SCA1) loci, among others. However, 65% of the changes at the HD locus and 74% of the changes at the SCA1 locus were deletions 38. The changes at the HD locus were, on average, losses of 1-3 CAG repeats 38, with the most frequent change being a loss of 1 CAG repeat. Thus, in HNPCC, TNRs behave as typical microsatellites when post-replicative repair is defective, and do not undergo expansion as observed in human neurodegenerative diseases such as HD.

Trinucleotide instability and neurodegenerative diseases

TNR expansion also depends on the MMR system but in ways we do not yet fully understand 39, 40, 41, 42. At least four lines of evidence suggest that mechanisms of expansion in TNR diseases are different from those of HNPCC. First, HNPCC is characterized by a mutator phenotype with a genome-wide increase in mutations, yet TNR expansion is limited to a single disease locus 37 (Figure 1). This restriction to a single locus suggests that mutations in MMR proteins are unlikely to be the underlying cause of TNR expansion. Second, the insertions and deletions associated with microsatellite instability in HNPCC are small, and the size of the change does not vary extensively regardless of whether they occur within short or long repeats 33, 37. In contrast, in the absence of an MMR defect, the length of the TNR tract determines the probability of a deletion or an expansion event 43, 44, 45, 46. TNRs associated with neurodegenerative diseases must be above a length threshold (for most TNR diseases it lies in the range of 32-42 repeats, ref. 11) before there is any probability of expanding in number 11, 43, 44, 45, 46. Short TNRs (below threshold) are stable. For intermediate TNR alleles, approaching the threshold, both small deletions and expansions occur at similar rates upon transmission (roughly the same size and frequency as is observed in HNPCC) 47, 48. However, once a TNR tract exceeds the threshold, it becomes highly prone to expansion, which occurs in approximately 80-90% of the cases 44, 45, 46, 47, 48, 49, 50. In contrast to the instability in HNPCC, expansion of TNRs within neurodegenerative disease loci occurs in the context of overall genome stability, and the magnitude of TNR expansion in parent-to-child transmission increases with the length of the repeat tract 43, 44, 45, 46.
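
The length dependence described above can be summarized as a simple decision rule. The sketch below is illustrative only: the function name `classify_tnr_allele`, the default threshold of 36 repeats (picked from within the 32-42 range quoted in the text) and the width of the "intermediate" window are assumptions made here for illustration, not values taken from the cited studies.

```python
def classify_tnr_allele(repeats: int, threshold: int = 36, window: int = 5) -> str:
    """Return the qualitative stability class of a TNR allele.

    `threshold` and `window` are hypothetical illustration values; the text
    gives only a 32-42 repeat range for the threshold of most TNR diseases.
    """
    if repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if repeats > threshold:
        # Above threshold: expansion predominates (roughly 80-90% of cases).
        return "expansion-prone"
    if repeats >= threshold - window:
        # Approaching threshold: small deletions and expansions at similar rates.
        return "intermediate"
    # Well below threshold: the tract is stably transmitted.
    return "stable"
```

Because the real threshold differs between loci, both parameters would need to be set per disease before such a rule could be compared with transmission data.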

Figure 1

Models of repeat instability for HNPCC and Huntington's disease. During replication, strand slippage can occur. In HNPCC patients the mismatch repair enzymes are defective (mmr) and unable to carry out repair (perpendicular line). Defective mismatch repair increases instability at repeats due to slippage and formation of extrahelical loops. The inability to repair results in genome-wide instability (top). In HD patients, repair enzymes are intact but unable to efficiently recognize and/or repair alternative DNA structures (shown is a DNA hairpin). The repair block occurs at the level of the DNA (bottom), and TNR expansion is primarily limited to the disease allele (single site mutation).

Third, tissue-specific instability occurs during human 51, 52, 53, 54, 55 or animal 54, 56 development in TNR disorders. Somatic variation in repeat tract length is observed in the brains of individuals affected by Fragile-X 55, HD 52, myotonic dystrophy (DM) 57, 58 and spino-bulbar muscular atrophy (SBMA) 59. Somatic changes at TNR are also found in different tissues among mouse models for these diseases 40, 41, 42, 56. Fourth, in HNPCC, microsatellite instability occurs at a wide range of di-, tri-, and tetra-nucleotide repeats 33, 37, yet TNR expansion is somewhat specific for structure-forming sequences 60, 61. In contrast to HNPCC, DNA repair proteins in TNR patients are normal. Current models suggest that secondary structures such as hairpins, cruciforms and triplexes form at specific sites within the disease loci 60, 61, 62, 63. Stable secondary structures serve as looped precursors for expansion, which occurs after processing and incorporation of the extrahelical DNA into the genome 11, 64, 65, 66. The mechanisms by which loops form within the TNR tracts in DNA, and how they are processed into the eventual expansion, are not fully understood; and several models have been proposed.

General mechanisms for TNR expansion

Models for expansion have been recently reviewed 12, 66. In general, the models can be divided into two basic classes, one of which is replication-dependent (Figure 2, ref. 66), and the other one is repair-dependent 66. It has been a matter of debate as to whether all of these mechanisms can act as independent pathways for expansion, or whether there is a single mechanism. This has been a difficult question to answer since support for various TNR expansion models has arisen from different systems and different cell types, whose properties of replication rates, transcription, and chromatin organization are unlikely to be the same. Studies in bacteria, yeast, mammalian cells, and mouse models have all contributed to the current understanding of TNR expansion as it is found in human diseases.

Figure 2

TNR instability caused by polymerase slippage. DNA polymerase strand slippage has been proposed as the primary mechanism for instability of TNR. During replication, the TNR units can misalign resulting in an extrahelical DNA loop that increases TNR length if it occurs on the daughter strand (expansion), and decreases TNR length if it occurs on the template strand (deletion). Loop / hairpin structures when not properly repaired (no repair) (reference 43) are incorporated into the nascent strand (expansion, bottom) or skipped (deletion, top). Pol is DNA polymerase.

Comparative analysis of models for triplet expansion

Bacteria and yeast

It is generally accepted that changes in the length of microsatellites occur, on the evolutionary time scale, by a process of polymerase slippage. In this model, microsatellites misalign during replication, resulting in extrahelical DNA loops. Subsequent integration and/or improper resolution of the tract result in gain or loss of repeat units within the duplex DNA. An increase in microsatellite length occurs if slippage is on the daughter strand, and a decrease in microsatellite length occurs if slippage is on the template strand 67, 68, 69, 70 (Figure 2). Thus, the earliest model proposed for TNR expansion was the “replication slippage” mechanism.
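
The bookkeeping of the slippage model can be written out in a few lines. This is a minimal sketch of the rule stated above, not a simulation of replication: the function name `replicate_with_slippage` and its parameters are hypothetical, and a single slippage event that is left unrepaired is assumed.

```python
def replicate_with_slippage(tract_units: int, slipped_strand: str, loop_units: int = 1) -> int:
    """Return the repeat count of the new strand after one unrepaired slippage event.

    slipped_strand is "daughter", "template" or "none" (hypothetical labels).
    """
    if loop_units < 0 or loop_units > tract_units:
        raise ValueError("loop must contain between 0 and tract_units repeats")
    if slipped_strand == "daughter":
        # Loop on the nascent strand: the looped repeats are copied again (gain).
        return tract_units + loop_units
    if slipped_strand == "template":
        # Loop on the template: the looped repeats are skipped (loss).
        return tract_units - loop_units
    if slipped_strand == "none":
        return tract_units
    raise ValueError("slipped_strand must be 'daughter', 'template' or 'none'")


# A 20-unit tract with a 2-unit loop on either strand.
expanded = replicate_with_slippage(20, "daughter", 2)
contracted = replicate_with_slippage(20, "template", 2)
```

The direction of the length change depends only on which strand carries the loop, which is why a simple slippage model on its own predicts no intrinsic bias towards expansion or deletion.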

Indeed, instability at TNRs could be modeled in both bacteria and yeast harboring plasmids with TNR repeats 67, 68, 69, 70, 71, 72. These models are valuable for testing how microsatellites might change during mitosis. Results from multiple laboratories indicate that TNR instability in these systems depends on the sequence, the initial length of the repeat tract, and, strikingly, on its orientation relative to the replication origin 67, 68, 69, 70. The mutation rate for TNR sequences capable of forming secondary structures is consistently high, while TNR sequences that do not form secondary structures display mutation rates equivalent to background 69. The high rate of instability at TNRs suggested that secondary structures might facilitate slippage at TNRs 12, 62, 65, 66. In contrast to the expansion in human neurodegenerative diseases, however, deletion of TNR repeats in bacteria and in yeast occurs roughly 10 times more frequently than insertion events 70, 71, 72, a profile more similar to HNPCC than to TNR expansion diseases. If interpreted in a replication slippage model, the strong deletion bias would suggest that looped intermediates form more frequently in the template relative to the daughter strand. The basis for the deletion bias is not presently understood, although proofreading activity of DNA polymerase is likely to be involved 73.

Another poorly understood feature of TNR instability observed in model organisms is orientation dependence. CAG/CTG repeats, for example, are unstable when CTG is the lagging-strand template, but the same sequence is relatively stable with CAG as the lagging-strand template 67, 68, 69, 70. A similar phenomenon has also been observed for CGG/CCG repeats 74, 75, 76 and GAA/TTC repeats 77, 78. Thus, the frequency of TNR instability and the extent of deletion vary depending on the direction of replication fork progression, despite the fact that the same sequence is being replicated. A simple polymerase slippage model does not predict these outcomes.

The absence of single strand binding protein (SSB) increases the rate of instability in bacteria, consistent with the involvement of the lagging strand 79. CTG hairpins are somewhat more stable than CAG hairpins in that orientation 80, 81. Thus, one model for the orientation-dependence of instability is that transient existence of single stranded DNA during lagging strand synthesis might allow a window of opportunity for hairpins to form. The differential thermodynamic stability of CTG and CAG hairpins, in this model, accounts for the differential rate of hairpin “trapping”, or imparts differential repair of CTG relative to CAG hairpins 82. However, several pieces of evidence are inconsistent with such a model. First, in order for CTG hairpins to differentially form on the lagging strand, the rate of CTG hairpin formation would need to be sufficiently faster than that of CAG and would need to be insensitive to the presence of SSB. On single strand DNA, however, both CTG and CAG repeats form hairpins spontaneously, and re-anneal under pseudo-first order kinetics at equal rates 81.

Second, a recent in vitro study revealed that both the efficiency and the fidelity of hairpin processing depended critically on the structure of the DNA substrates (nick location and slip-out composition, CAG versus CTG) 82. If hairpins are captured on the template strand during lagging strand synthesis, then the CAG and CTG hairpins would reside opposite a 3′ nick. Yet, both CAG and CTG hairpins are repaired poorly under these conditions 82, inconsistent with the extensive deletion bias observed when CTG is the lagging strand template in vivo.

Third, the thermodynamics of CTG hairpin formation would apply only when it resides on the lagging strand template and not when it is on the daughter strand. However, end fraying and hairpin formation at the free end of the Okazaki fragment have been proposed as another model for expansion 83. A final issue is the fact that TNR instability can also occur on the leading strand during rolling circle replication 84. Thus, there is no clear model for how hairpins might form in an orientation-dependent manner. Most models agree, however, that replication plays a causative role in generating small insertions and deletions as mutations in polδ, polα, Rad27 and PCNA were all shown to increase TNR instability in yeast 70, 85. Overall, the inability of simple dividing organisms to reproduce the larger expansions observed in human diseases suggested that, if expansion occurred during replication, it must be influenced by additional parameters.

Those parameters are poorly understood. Expansion similar to that observed in human diseases does not appear to arise as a result of transcription in simple organisms. Experimental induction of transcription in replicating plasmids (as well as in mammalian cells) can increase the degree of instability, but the resulting changes are primarily deletions 86, 87, 88. Further, the increase of instability in some reports occurred only when the bacteria passed through stationary phases of cell growth 87. Emerging evidence suggests that DNA replication blocks, and their resolution through DNA repair pathways, play roles 66.

During primer extension reactions in vitro, pausing is observed at TNRs and other microsatellites 89, 90. TNRs appear to underlie the impediment, since the pausing is independent of the polymerase. Length- and sequence-dependent repeat instability is observed using a number of polymerases including Klenow fragment of Escherichia coli DNA polymerase I, bacteriophage T7 DNA polymerase or the human DNA polymerase β 89, 90, 91, 92, 93. In vivo, dramatic stalling at TNRs is observed in yeast during replication fork progression 94, 95, 96. As visualized by two-dimensional gel analysis of replication intermediates, CAG/CTG, CGG/CCG, and GAA/TTC repeats cause arrest of the replication fork in a length-dependent manner 94, 95, 96. These results suggested that TNR expansion might occur through mechanisms needed to re-start replication 12. Consistent with that idea, loss of unfolding proteins 97, 98 or of helicases such as Werner 99, 100, Bloom 101 and Srs2 102, all of which facilitate fork progression through difficult DNA sequences, increases the level of instability. In E. coli, TNR deletions can also occur after the collapse of the replication fork in an attempt to re-start replication 103. For example, mutations in recA and recB, recombination proteins needed to resolve replication fork collapse, had a stabilizing effect on (CAG)•(CTG) repeats 104. Analysis in yeast indicates that long TNRs are “fragile sites”, prone to breakage during replication 105. Consistent with this idea, a number of studies conducted in bacteria and yeast also provide evidence that gene conversion and recombination, as well as excision repair, frequently result in small deletions and insertions of TNR repeats 104, 105, 106, 107, 108, 109, 110, 111.

Although repeat-length changes in bacteria and yeast systems are primarily small, larger TNR expansions are not absent. However, they occur at low frequency and selection systems are required to observe them. One selection method, using 5-fluoroorotic acid, has been particularly informative 112. Upon selection in yeast, the frequency of large expansions from an existing length of 25 for CAG/CTG 69, CGG/CCG 95, or GAA/TTC 113 repeats is length- and sequence-dependent, as observed in human diseases. Moreover, the sizes of the expansions are more consistent with those of human diseases, with gains of around 10-60 repeats, and 20 being the most frequent 113. Thus, data generated using these selection systems have been informative in identifying key factors affecting large TNR expansions.

For CGG repeats, a mutation in the replication factor C complex increased the expansion rate by 50-fold, suggesting an important role for DNA replication polymerases in the expansion mechanisms 95. Consistent with this, larger expansions were also observed in the absence of FEN-1/RAD27 in yeast 113. FEN-1/RAD27 is responsible for removal of the flap formed in Okazaki fragments during replication and during repair-dependent synthesis 114, 115. Several laboratories have demonstrated that FEN-1 is unable to efficiently process stable secondary structures 113, 114, presumably because the 5′ end is not available for FEN-1 loading. Loss of FEN-1/RAD27 in yeast increased the rate of expansion in two distinct steps, by (1) increasing the likelihood of flap formation, and (2) inhibiting flap processing, thereby increasing flap half-life 113. Contraction rates for CTG and CAG tracts were measured using a simple variation of the selection assay 112. The rate of contractions by this assay in the rad27Δ strain was nearly identical for CTG and CAG, at 4.2 ± 1.2 × 10⁻³ and 5.0 ± 0.4 × 10⁻³ per cell generation, respectively 113. Recently, the fate of CAG/CTG repeats was tested in yeast harboring a RAD27 mutant deficient in its endonuclease activity 115. The TNRs were unstable in these cells. The inability to cleave flaps resulted in a flap equilibration in which various intermediates were formed by annealing to the adjacent primer. Thus, cleavage by Rad27 is needed to prevent expansion 115. While TNR expansion is clearly connected to problems arising from replication, all of the examined proteins are also involved in repair of DNA strand breaks. Therefore, whether infrequent expansions detected in various experimental settings arise by DNA replication per se or from repair-dependent synthesis remains an unresolved issue.
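
The statement that the CTG and CAG contraction rates in the rad27Δ strain are nearly identical can be checked with a short calculation. The sketch below assumes that the quoted ± values are independent standard errors, which the text does not state; the function name `rate_difference_in_error_units` is hypothetical.

```python
import math


def rate_difference_in_error_units(rate_a: float, err_a: float,
                                   rate_b: float, err_b: float) -> float:
    """Absolute difference of two rates divided by their combined uncertainty.

    Assumes err_a and err_b are independent standard errors (an assumption
    made for this illustration only).
    """
    combined_error = math.sqrt(err_a ** 2 + err_b ** 2)
    return abs(rate_a - rate_b) / combined_error


# Contraction rates per cell generation quoted in the text (rate, uncertainty).
ctg = (4.2e-3, 1.2e-3)
cag = (5.0e-3, 0.4e-3)

ratio = rate_difference_in_error_units(*ctg, *cag)
print(f"difference = {ratio:.2f} combined error units")
```

Under these assumptions the two rates differ by less than one combined error unit, which is consistent with describing them as nearly identical.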

Overall, replication and repair problems in model organisms enhance the frequency of TNR instability in the form of small deletions and insertions. Simple models have brought to light the importance of replication stalling and replication fork collapse in TNR instability. However, in general, bacteria and yeast model systems have not yielded the expected insight into the TNR expansion process as might apply to human diseases.

Mammalian systems

Translation of results from simple model organisms into mammalian systems has been challenging in part due to the variable nature of the mammalian models used. The length of the TNR tract, the replication rate, and the tissue type are among the variables.

Expansion in cultured mammalian cells

Mammalian cell models share the property of cell division with yeast and bacteria but at slower rates. While the fate of TNRs in bacteria and yeast overall displays similar trends, different results have been shown for the fate of TNRs in cultured cells obtained from affected individuals. In a number of studies, the endogenous disease-length alleles within HD (CAG repeats) 116, SBMA (CAG repeats) 117 and Fragile X (CGG repeats) 118 loci display little if any instability in cultured cells obtained from patients.

In cultured embryonic fibroblasts from R6/1 mice, CAG repeats in the human HD transgene remained stable in the absence of DNA repair enzymes such as Msh2 and Ogg1 119. New genetic assays have been developed using shuttle vectors containing the promoter-TNR-reporter gene sequences 120. The vector harbors the SV40 origin and the large T antigen gene allowing portability between primate cell lines 120. When propagated in cultured cells, CAG of 25-33 repeats contract at frequencies as high as 1% in both 293T human cells and in COS-1 monkey cells 120. Plasmids may replicate faster than the cells themselves, so these data indicate that the rate of replication can play a role in the resulting instability. Interestingly, plasmids harboring TNRs delete their repeats during replication in mammalian cells, similar to that in bacteria and yeast. Deletions are also observed during early and rapid cell division in single cells from the 8-cell embryo of R6/1 animals 116, yet no alteration occurred in these animals later in development after cells became terminally differentiated.

Taken together these data suggest that the rate of replication is a factor in TNR stability in mammalian cells. The fate of the repeats also depends on the length of the initial repeat tract, which has a large impact on the direction of change 121, 122. For example, in primary fibroblasts derived from a fetus with DM1, CTG tracts of 216 repeats expanded to 338–386 repeats with mutation frequencies approaching 100% 121. Expansion depended on replication. Both inhibiting replication initiation with mimosine and inhibiting leading- and lagging-strand synthesis with aphidicolin significantly enhanced CTG expansions at the disease allele but not at the short, normal allele 121. Similarly, CAG repeats in fibroblasts isolated from R6/2 animals harboring multiple copies of a human HD transgene showed significant expansions. After approximately 600 cell doublings, the major CAG peak increased in length from the initial 155 to approximately 170 triplets 123. In all these cases 121, 122, 123, the magnitude of expansion depended on the initial length of the repetitive tract.
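
The figures quoted above can be put on a common scale with simple arithmetic. The sketch below uses only the numbers given in the text (several of which are stated as approximate); the helper name `mean_gain_per_doubling` is hypothetical, and a constant gain per doubling is assumed purely for illustration.

```python
def mean_gain_per_doubling(start_units: float, end_units: float, doublings: float) -> float:
    """Average number of repeat units gained per cell doubling."""
    if doublings <= 0:
        raise ValueError("doublings must be positive")
    return (end_units - start_units) / doublings


# R6/2 fibroblasts: major CAG peak from 155 to about 170 triplets over about 600 doublings.
hd_gain = mean_gain_per_doubling(155, 170, 600)

# DM1 fetal fibroblasts: CTG tract from 216 to 338-386 repeats (fold change of the tract).
dm_fold_low = 338 / 216
dm_fold_high = 386 / 216

print(f"HD transgene: {hd_gain:.3f} repeats per doubling")
print(f"DM1 allele: {dm_fold_low:.2f}- to {dm_fold_high:.2f}-fold longer")
```

The text does not give the number of doublings for the DM1 cultures, so only a fold change, not a per-doubling rate, can be derived for that allele.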

Although there are some differences, the fate of repeats in proliferating mammalian cells displays some of the same trends as observed in dividing bacteria and yeast. Overall, data suggest that repeat length and replication rate predict the degree and direction (deletion vs. expansion) of instability. In reality these effects in mammalian cells are more complicated, as they appear to also depend on the cell type. For example, embryonic fibroblasts and lung cells from DM transgenic mice stably maintained their CTG repeats in culture, while repeats from kidney cells were unstable 124. Thus, locus-specific differences, differences in replication rates and/or tissue-specific factors may be important components affecting the expansion process in mammalian cells 124.

Expansion in vivo

All studies in mammalian cell models and in simple organisms are consistent with the notion that replication is a critical component of the expansion mechanism. Whether expansion in a human disease arises from mitotic replication per se or from repair-dependent replication or both is unresolved. For example, long repeats tend to break during cell division, as was shown in yeast 105; therefore, the two processes, replication and repair, are difficult to separate. In addition, functional alterations in replication and repair proteins in mammals have not been easy to evaluate. In mice, deletion of key replication and/or repair proteins relevant to TNR expansion is often embryonic lethal. For example, knockout mice lacking polymerase β 125 or FEN-1 126 are not viable. Furthermore, in vivo tools to examine the relative importance of replication for expansion in animals are also limited because cell proliferation has ceased in most adult tissues. Despite all these issues, mouse models have proven to be extremely valuable in validating proposed mechanisms for expansion.

The effect of FEN-1 haploinsufficiency on TNR expansion has been examined in heterozygous animals (Table 1). While there was no visible effect on age-dependent expansion at the HD locus in somatic tissues of HD/Fen-1(+/−) mice as compared to wild type littermates, a decrease in deletions and an increase in expansions were observed in male offspring 127 (Table 1). Similarly, no alterations in CTG tract size were observed in somatic cells from DM/Fen-1(+/−) animals 128. Repeat profiles between wild type, Fen1(+/−) and Fen1(−/−) early embryos have also been compared in the latter study. No difference was found, leading to a conclusion that FEN-1 is not essential for maintaining the stability of TNR in early embryonic divisions 128. FEN-1 is an essential replication protein, but it is also needed for gap filling synthesis during repair of DNA strand breaks and recombination. The lack of effects on expansion in Fen1(−/−) embryos is not easy to interpret. The results may indicate that there is embryonic selection for stable alleles. Another possibility is that during rapid cell divisions TNRs tend to undergo deletions, counteracting the expansion events that may occur.

Table 1 Effect of DNA repair enzymes on expansion in vivo

Germ cells as a model to assess expansion mechanisms

Germ cells from transgenic animals have been a valuable model to further assess the importance of replication in causing expansion in mammals. The pool of germ cells in the adult male includes spermatogonia (SG), which divide, spermatocytes (SC), which undergo meiotic recombination, and spermatids (ST), which differentiate (and allow repair without replication) to generate mature sperm (Figure 3A). The degree of expansion at each stage of germ cell development has been informative.

Figure 3

Expansion and strand breaks in developing germ cells in HD. (A) A schematic diagram of spermatogenesis. Each developmental stage is shown; SG, spermatogonia; SC, spermatocytes; ST, spermatids; SZ, spermatozoa. SG undergo mitotic division and produce primary SC (1°), which after the first meiotic division produce secondary SC (2°). In the second meiotic round ST are generated. They undergo terminal differentiation producing SZ. (B) Comet assay for single strand breaks in sorted germ cells of HD transgenic mice. The % of cells with breaks is shown. Expansion is detected only in ST in HD transgenic mice 129 but is observed in SG, SC and ST in HD patients 130.

In a mouse model for HD, expansion of CAG is primarily detected in haploid ST 129. Detection at this post-meiotic stage of development would indicate that expansion does not require replication. However, different results were obtained in germ cells of HD patients. In laser capture micro-dissected cells, expansion of the CAG tract was observed in premeiotic cells (which presumably included SG and SC) as well as in ST 130. The reason for the discrepancy between findings in the mouse model and in human samples is not yet clear. However, differences in germ cell development and in lifespan may contribute 131. In human spermatogonial stem cells, the lifetime number of cell divisions is estimated to be approximately an order of magnitude higher than in the mouse. There may be fewer opportunities for mouse premeiotic cells to accumulate expansion mutations 130. If true, this would predict that expansions in the sperm of affected individuals would increase with age. However, in a recent study of HD patients, no correlation was found between the CAG repeat-length variation in sperm and the age of the HD subjects at the time of sperm donation 131. Neither parental age nor birth order was shown to have a significant effect on inherited repeat-length changes in this group of HD patients. Expansions observed in post-mitotic germ cells must occur by repair-dependent mechanisms. The mechanisms at play in dividing cells may be polymerase slippage or replication stalling and re-start. Slippage, even in mammalian cells, appears to most frequently generate deletions. Thus, break-dependent repair synthesis, as has been observed in yeast, is an alternative process that could generate expansions 12, 94, 95, 105. Developing mouse germ cells are known to accumulate strand breaks, which can be observed by the comet assay (Figure 3B). Presently, differences between mouse and human germ cell models in HD remain unresolved.

In contrast to HD, CTG repeats in DM mice expand in both dividing SG and terminally differentiated ST 132. The repeat tract continues to increase in length with age, indicating that expansions are continuously produced during cell proliferation throughout life. In humans, expansions at the DM locus have been observed in dividing cells of the early embryo 133. Why HD alleles undergo deletion and DM alleles undergo expansion in dividing cells is not known. However, a key difference between TNRs at the HD and DM loci is the length of the tract. The CTG repeat at the DM locus resides in the 3′ untranslated region of the gene and can grow to thousands of units 57, 58. As discussed above, in simple organisms, long repeats cause replication stalling, often resulting in strand breaks 12, 94, 95, 105. Thus, break-dependent replication may account for the presence of CTG expansions in dividing cells. In the HD gene, on the other hand, the CAG tract is within the protein-coding region and rarely exceeds 130 units 45, 46, 47, 48. Presumably, this is due to selective pressure. The HD protein is essential for viability in mammals. If HD alleles were too long, the resulting defective gene product would diminish cell survival. At the HD locus, the shorter CAG tract may not pose severe blocks for the polymerase and may tend to break less often. The existence of locus-specific factors that influence HD and DM cannot be excluded, and they are being explored actively.

In vivo mouse models and somatic expansion

Mouse models have also shed light on the mechanism of somatic repeat-length changes, which are now believed to modulate the severity and onset of the diseases 53, 54, 134. Somatic expansion occurs at an inherited, expanded TNR. Alterations in repeat length have been observed in affected areas in HD and DM patients 52, 53, 57 and in the tissues of aging transgenic mouse models for HD 41, 54, 56, 129 and DM 40, 135, 136. We and others have shown that the inherited repeat tracts in HD transgenic mice are stably maintained from birth until 4 months but begin to expand in non-dividing brain cells at midlife 56, 129. Expanded CAG tracts continue to increase in length as these animals age 129, thereby serving as templates for synthesis of increasingly toxic HD proteins in the brain and other somatic tissues. Thus, in addition to the inherited expansion, somatic changes in repeat tracts in the brain may contribute to the disease by modulating its severity and onset.

The factors that cause somatic instability have been the subject of intense research. The importance of strand breakage in causing expansion in simple organisms prompted evaluation of whether TNR expansion was affected by factors involved in DNA repair in mice (Table 1). DM animals were crossed with mouse knockouts for key DNA repair enzymes. Knockouts of Rad52, Rad54, and Ku had no effect on expansion of the CTG repeat in DM mice 42. In contrast to simple model organisms, TNR expansion in mammals does not appear to require enzymes generally needed for repair of double strand breaks (DSB) 42. Recent evidence has revealed that somatic expansion in mammals more likely occurs by a base excision repair mechanism 119 (Figure 4). Age-dependent changes at the human HD transgene locus occur concomitantly with the accumulation of oxidative DNA damage. Importantly, loss of 7,8-dihydro-8-oxo-guanine-DNA glycosylase (Ogg1), a DNA glycosylase responsible for removal of oxidized guanines, suppresses TNR expansion in HD mice (Table 1). Deletion of other DNA glycosylases, however, does not suppress expansion (Table 1). Thus, age-dependent somatic expansion associated with HD occurs in the process of removing oxidized base lesions. TNR expansion in both germ cells and somatic cells of HD 39, 41, 43, 129 and DM 40, 42 transgenic mice requires the MMR proteins Msh2 and Msh3, but not Msh6. Thus it is possible that TNR expansion depends on the cooperation of base excision repair and MMR pathways through interactions of OGG1 and the Msh2/Msh3 heterodimer during the removal of oxidative DNA damage (Figure 4).

Figure 4

Base excision repair model for age-dependent somatic TNR expansion in HD. During aging, endogenous oxidative damage arising from mitochondrial respiration creates oxidative DNA lesions. Oxidative lesions, such as 8-oxo-G, tend to accumulate within the CAG tract with age (G=O). Under conditions of normal BER, OGG1/APE cleavage produces a single strand break, which facilitates hairpin formation and allows strand displacement during gap-filling synthesis. The lifetime of the hairpin is sufficiently prolonged by MSH2/MSH3 binding to allow ligation of the hairpin loop. Green bars are CAG or CTG, as indicated; red bars are the repeats added in the expansion event.

Taken together, the data from simple models to man suggest that both replication and repair processes are likely to contribute to TNR expansion. Future studies will refine the mechanistic models.

Genomic and chromatin factors governing microsatellite instability

Common cis-acting factors

In the simple, rapidly dividing organisms, bacteria and yeast, cis-elements influencing TNR fate have been limited to the sequence of the repeat, its length and the presence of interruptions 67, 68, 69, 70, 137.

Data on TNR instability from human studies and various transgenic and knock-in mouse models suggest that sequences immediately surrounding the repeats as well as overall chromatin context might also be important in determining TNR stability. Analysis of flanking elements for a number of different repeats has revealed that the most expandable loci are those located within CpG islands 138. Indeed, the methylation status of CpG can alter the stability of the CGG repeats at the Fragile X loci 138, 139, 140. Treatment of mammalian cells in vitro with methyltransferase inhibitors leads to the loss of the methyl group from 5-Me-cytosine, causing subsequent destabilization of the CTG/CAG repeats at the DM locus 141. DNA methylation is typically associated with tight chromatin packaging and gene silencing. Analysis of the DM locus reveals that the CTG repeats form a functional component of an insulator element. Methylation of this locus occurs in congenital forms of DM, prevents the binding of CTCF, and disrupts the insulator function 142. These remarkable findings suggest that chromatin context has an impact on the stability of TNRs.

As in bacteria and yeast, the location of the DNA replication origin relative to the repeat stretch can influence the stability and the direction of change (deletion vs. expansion; ref. 143) for human TNR disease loci. While several disease loci have been examined, including spinocerebellar ataxia type 7 (SCA-7), HD, SBMA, FMR1 and FMR2 144, 145, 146, the effects of the position of a replication origin have not yet been well characterized. The HD and SCA-7 repeats served as the lagging strand template. On the other hand, origin firing at the SBMA locus occurred on either side of the repeat 144. The replication origin for the FMR2 locus maps to the promoter region of the gene 145. In addition, the FMR2 replication origin coincided with CpG islands and tended to fire late in S phase 145. Precisely how the position of a replication origin determines TNR instability in the mammalian genome in vivo remains unclear. Mammalian origins of replication have yet to be fully mapped. Nevertheless, TNR instability is thought to be influenced by many factors, including chromatin organization 142, DNA methylation 138, 139, 140, 141, and transcription.

Expansion in human diseases and mouse models: similarities and differences

Among model organisms, mice have so far been shown to best recapitulate the expansion in human TNR diseases. Transgenic or knock-in mouse models for TNR diseases have consistently demonstrated a trend towards expansion, both somatically and inter-generationally 40, 41, 54, 56, 129, 135, 136. Although they show an expansion bias similar to that in humans, TNRs in the mouse genome appear to have a higher threshold for instability. For example, expansion has been observed only when the repeat stretch inserted into the mouse genome was very long 40, 41, 54, 56, 128, 135, 136. While moderately expanded TNRs are known to be highly unstable in humans and to expand further in successive generations, they are stably transmitted in mice 147, 148, 149, 150, 151. Furthermore, in contrast to humans, long TNRs inserted in the mouse genome rarely show big leaps in repeat number over a single transmission. These differences suggest that genomic context and/or chromatin organization might play a significant role in determining the stability of microsatellites at a specific locus.

The transgenic and knock-in mouse models suggest that the site of transgene integration as well as the amount of human genomic sequence flanking the repeat affects the level of instability. For example, TNRs within a cDNA inserted into the mouse genome often exhibited no instability despite a relatively high number of repeat units 148, 149. In contrast, transgenic mice that had large pieces of human genomic DNA (kilobases) as the repeat context showed more instability 40, 41, 54, 56. Additionally, mice that had long repeats inserted into the endogenous gene (knock-in) or into a transgene tended to show more instability 40, 41, 54, 56, 152. None of the models, however, recapitulated a threshold length for instability comparable to that observed in humans, suggesting the existence of regulatory factors that differ between mice and humans.

Fragility and cell cycle control

Very little is known about the influence of chromatin organization and/or chromatin remodeling on microsatellite instability. Initial experiments in yeast have provided evidence that CAG/CTG and CGG/CCG tracts are prone to break in vivo 105, 108, 153, 154. It has been hypothesized that stretches of these repeats might represent fragile sites 105, 153, 154, and that they are more susceptible to breakage and DNA damage 155. Fragile sites have been described in yeast 156, 157, 158, 159 and mammalian cells 160. Common fragile sites in the mammalian genome are generally defined as loci that exhibit gaps and breaks under conditions of replicative stress 155, 160, 161. Perturbations in DNA replication often result in DSB at the fragile sites 161, 162 and eventually may lead to gross chromosomal rearrangements, such as translocations, deletions, and inversions 162, 163. Rare fragile sites, on the other hand, are thought to arise from expanded di- or trinucleotide repeats 164 which can break in vivo in the absence of replicative stress 105, 108, 153, 154. It has been suggested that DNA lesions may initiate the strand breakage 153, 154.

The mechanism underlying fragility in general is not well understood. Extensive studies in yeast have revealed properties that fragile sites may share. First, some were demonstrated to include repetitive sequences such as palindromes, inverted repeats, and Ty telomeric elements that are capable of forming structures 156, 165, 166, 167. Second, yeast fragile sites are situated on chromosomes in the areas of slow-moving DNA replication forks and origins of replication which fire late 161, 168, 169 and often lead to DSB 161, 165.

Cell cycle control is one of the cellular pathways designed to prevent chromosomal instability. Checkpoints function during the cell cycle to ensure the correct transmission of genetic material. Checkpoint proteins interact with the DNA replication machinery and respond to various threats to DNA, including damage, replication fork blocks and formation of aberrant DNA structures. Checkpoints can arrest the cell cycle and activate appropriate cellular responses 159, 170, 171. Consistent with this role, defects in checkpoint or cell cycle progression proteins increase chromosomal instability and rearrangements 159, 169, 170, 171, 172, 173. The role of checkpoint proteins in TNR instability has been examined in yeast. Lahiri et al. found that deletion of Mec1 (homologue of human ATR) and Rad53 (homologue of human Chk2) resulted in increased fragility of expanded CAG repeats 173. Significantly higher rates of deletions were observed in these mutant strains than in wild type cells. The authors hypothesized that repair of DNA damage-stalled replication forks and of breaks and gaps at long CAG repeats in checkpoint-deficient strains occurred with less fidelity and thus led to deletions.

There are fewer results available on checkpoint regulation and fragile sites in mammalian systems. There is, however, evidence suggesting that mammalian fragile sites possess some of the properties that have been described in yeast, i.e., slower replication fork progression and susceptibility to DSB 161, 165, 168. Mammalian fragile sites differ from those in yeast in their larger size (they can be greater than 100 kb) and their more complex composition, as they do not consist of a single type of repeat 160, 166. Whether fragility is sequence-specific or is dictated by chromatin organization remains unclear. Emerging data on the involvement of the DNA repair machinery 39, 40, 41, 42, 119, 132 in expansion suggest that chromatin environment may determine the susceptibility of a particular repetitive locus to DNA damage and the consequent instability that follows repair.

Conclusion

It is clear that many biological transactions determine whether TNRs expand or contract. However, data from simple organisms to man have revealed several features that are common among all systems. First, expansion depends on both replication and repair. Second, in the context of mitosis most TNRs appear to contract rather than expand. There is a strong deletion bias at TNR tracts in simple organisms and in proliferating mammalian cells. Third, expansion appears to be associated with long TNR tracts at both the DM and HD loci. Fourth, long alleles are subject to breakage. Taken together, the facts implicate repeat length as a critical, and perhaps even a unifying, factor in the expansion mechanism, which could tie together disparate findings as to the fate of TNRs at different disease loci. Single and double strand breaks arising from replication stress may explain why expansion depends on both replication and repair proteins. Alternatively, replication may be required in the context of gap-filling synthesis. Although the complex phenomenon of TNR expansion is not yet fully understood, studies in a number of in vivo systems are uncovering features of expansion mechanisms that can be further tested by the scientific community.