Short tandem repeats (STRs) represent intron polymorphisms among individuals that occur frequently in the human genome. They consist of tandemly arranged nucleotide repeat units. Based upon a unique combination of STR alleles, individuals can be unequivocally identified.1 The number of STR markers required for unique characterization of individuals can be predicted based on the allele frequencies in a population.
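As an illustration of how such a prediction can be made, the following minimal sketch computes the probability that two unrelated individuals share a genotype at one locus and then counts how many loci are needed to push the combined probability below a chosen threshold. It assumes Hardy-Weinberg equilibrium, independent loci and identical allele frequencies at every locus; the function names, the frequencies and the threshold are hypothetical and are not taken from the study.

```python
from itertools import combinations


def locus_match_probability(freqs):
    """Probability that two unrelated individuals share a genotype at one locus.

    Assumes Hardy-Weinberg equilibrium: homozygote frequency p_i^2,
    heterozygote frequency 2 * p_i * p_j.
    """
    homozygous = sum(p ** 4 for p in freqs)  # (p_i^2)^2
    heterozygous = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygous + heterozygous


def markers_needed(freqs, target):
    """Number of independent, equally informative loci (an assumption)
    required for the combined match probability to fall below `target`."""
    per_locus = locus_match_probability(freqs)
    if per_locus >= 1.0:
        raise ValueError("a locus with a single allele is never informative")
    count, combined = 0, 1.0
    while combined >= target:
        combined *= per_locus
        count += 1
    return count


# Hypothetical locus with four equally frequent alleles.
example_freqs = [0.25, 0.25, 0.25, 0.25]
print(locus_match_probability(example_freqs))
print(markers_needed(example_freqs, target=1e-9))
```

Real marker panels combine loci with different allele frequency distributions, so in practice the per-locus probabilities would be multiplied individually rather than raised to a power.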

The most common approach to identify STR alleles includes the in vitro amplification of STR (microsatellite) loci by the polymerase chain reaction (PCR) using primers flanking the repeat region and subsequent fragment analysis to identify the allele sizes.2, 3, 4, 5

STR marker systems are frequently used for the identification of parentage, kinship and other forensic purposes, but are also applied in the monitoring of hematopoietic chimerism in patients after allogeneic stem cell transplantation. The suitability of these markers for chimerism analysis, however, depends on several factors, which need to be considered before recommending an individual marker or a panel of markers for chimerism characterization. For a marker to be usable, an informative allele (for example, a recipient-specific allele) must differ in size from the donor alleles by at least one or two repeat units. To accurately describe the relation between donor and recipient alleles, the Eurochimerism Consortium (Appendix A) has developed a proposal for a nomenclature that provides a uniform identification system of STR alleles applicable for chimerism analysis; it is presented in this issue.6 The size of a defined allele can show some degree of variation due to the equipment used for fragment separation, but the inclusion of known reference samples in the analysis can eliminate this problem.
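The size criterion for an informative allele can be sketched as a simple check. The function below is a hypothetical helper, not part of the Eurochimerism nomenclature; the allele sizes in the example are invented, and the repeat unit length and minimum distance are parameters chosen for illustration.

```python
def informative_alleles(recipient, donor, repeat_len=4, min_repeats=1):
    """Return recipient allele sizes (bp) that differ from every donor
    allele by at least `min_repeats` repeat units of `repeat_len` bp."""
    min_gap = repeat_len * min_repeats
    return [
        allele
        for allele in sorted(set(recipient))
        if all(abs(allele - d) >= min_gap for d in donor)
    ]


# Hypothetical tetranucleotide marker: allele sizes in bp.
recipient = (192, 196)
donor = (196, 196)

# 192 bp is one repeat unit away from the donor allele.
print(informative_alleles(recipient, donor, min_repeats=1))
# Requiring a distance of two repeat units leaves no informative allele.
print(informative_alleles(recipient, donor, min_repeats=2))
```

Whether one or two repeat units are required is left as a parameter because the text allows either threshold.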

Many STR loci consist of simple repeats, that is, the repeat units are constant throughout the repeat region. For example, the microsatellite loci D3S1768, D19S253 and D10S2325 have simple tetra- or pentanucleotide repeats. The motif structure and number of repeats in selected alleles of these STR loci are shown in Table 1. Although the majority of STR loci are assumed to have simple repeat units based upon their Genome DataBase (GDB) reference sequence, a considerable number of STR loci display variability within the repeats, which may affect the allele assignment. We have investigated the composition of the STRs by sequence analysis and concluded that the existence of such STR loci is a greatly underestimated phenomenon. The so-called compound STR markers consist of repeat motifs displaying uniform length, but variable sequences within the repeat units (e.g. D2S1360 and D12S1064, Table 1). Complex repeats vary in their motif composition (tri-, tetra- and pentanucleotide repeats) and may also contain sequence polymorphisms within individual repeat units, while certain motifs within the remaining repeat units are constant (e.g. P450CYP19, D8S1132, D9S1118 and D12S391, Table 1). The so-called hypervariable complex repeats (e.g. D17S1290, SE-33, D11S554, D7S1517 and MYCL1, Table 1) consist of variable and constant repeat structures. For example, the variability in MYCL1 is determined by the number of tetra- [GAAA] and penta- [GAAAA] nucleotide repeats interspersed between different constant repeat units (Figure 1). D11S554 shows an even higher degree of complexity represented by the constant units [AAAG]3, [AAAGG]1 and [AAAAA]1 combined with a variable number of [AG]n and [AAAG]n motifs and the presence or absence of a number of other variable repeat units. Interestingly, two alleles of this locus, both displaying a fragment length of 220 bp, are composed of a different combination of variable repeats (Figure 2).
Similarly, fragment length analysis of a patient sample with the marker D7S1517 indicated homozygosity, revealing a single 196 bp allele corresponding to 25 tetranucleotide repeat units (Figure 3a). Sequence analysis, however, showed heterozygosity by revealing the presence of two alleles, as indicated by different nucleotides at certain positions in the sample sequence (Figure 3b). Among six selected patient samples tested by fragment length analysis of the D7S1517 marker, two were identified as homozygous for a 173 bp fragment and four for a 196 bp fragment. While the samples displaying the 173 bp fragment were also homozygous at the sequence level, only one of the samples showing the 196 bp fragment was confirmed to be homozygous by sequencing. In the remaining 196 bp fragments, which were demonstrated to be heterozygous by sequence analysis, as many as five different alleles were identified (Figure 4). The examples of hypervariable complex STR markers shown, including SE-33, D11S554, D7S1517, D17S1290 and MYCL1, contain a variety of repeat unit structures. Sequence analysis of numerous samples identified constant and variable units within the repeat sequences. In STR markers with simple repeat structure, the fragment length is indicative of the number of (e.g. tetranucleotide) repeat units, regardless of nucleotide sequence variability and alignment of the repeat blocks. Constant regions within repeats do not affect the allele assignment. Complex repeats, however, contain a variable number of di-, tri-, tetra-, penta-, octanucleotide and other repeat motifs, which makes repeat unit identification and allele assignment difficult. Although size differences between alleles are useful for the identification of individual human specimens, the actual alleles can only be determined by the analysis of repeat unit polymorphisms using DNA sequencing.
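The way two alleles can share a fragment length while differing in sequence can be sketched with a few lines of code. The repeat regions below are hypothetical constructions that only borrow the GAAA, CAAA and GAAAA motifs mentioned for D7S1517 and MYCL1; they are not the sequenced alleles, and flanking sequence is omitted.

```python
from collections import Counter


def build_repeat_region(blocks):
    """Expand [(motif, count), ...] into a nucleotide string."""
    return "".join(motif * count for motif, count in blocks)


def count_units(region, unit_len=4):
    """Split a region into fixed-length units and tally each motif."""
    units = [region[i:i + unit_len] for i in range(0, len(region), unit_len)]
    return Counter(units)


def same_size_different_sequence(region_a, region_b):
    """True when fragment analysis would see one allele but sequencing two."""
    return len(region_a) == len(region_b) and region_a != region_b


# Hypothetical alleles with 25 tetranucleotide units each (100 bp of repeats).
allele_a = build_repeat_region([("GAAA", 25)])
allele_b = build_repeat_region([("GAAA", 12), ("CAAA", 1), ("GAAA", 12)])

# Hypothetical mixed-length case: 5 tetra- vs 4 pentanucleotide units (20 bp).
mixed_a = build_repeat_region([("GAAA", 5)])
mixed_b = build_repeat_region([("GAAAA", 4)])

print(len(allele_a), len(allele_b), same_size_different_sequence(allele_a, allele_b))
print(count_units(allele_b))
print(len(mixed_a), len(mixed_b), same_size_different_sequence(mixed_a, mixed_b))
```

In both pairs a size-based assay would report a single allele, whereas the underlying sequences are distinct.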

Table 1 Motif structure and number of repeats of the STR loci

Figure 1 Identification of sequence variation of repeat units in MYCL1.

Figure 2 Identification of sequence variation of repeat units in D11S554.

Figure 3 (a) Homozygosity for the D7S1517 196 bp allele by fragment analysis. (b) Sequence analysis of this fragment (reverse complement). Arrows point to heterozygous ‘C/G’ positions, which reveal two different repeat structures with the same allele length. Two repeat units were identified: GAAA and CAAA.

Figure 4 Identification of sequence variation of repeat units in D7S1517.

Different fragment sizes of STR alleles reflect only a part of the variability among individuals and do not identify the gene polymorphism itself in all instances. Allele frequencies based upon fragment size analysis are useful for the prediction of informative constellations in forensic medicine and in the monitoring of chimeric status after stem cell transplantation. However, when the actual sequence-defined alleles are considered, statistics based on homozygosity frequencies determined by fragment analysis can be erroneous. Thus, in forensic analysis, the presence of homozygous STRs with identical fragment sizes should be interpreted with care. Despite the high sequence variability within repeat motifs, the applicability of STR markers in chimerism analysis is not affected. Informative markers for the monitoring of chimerism after stem cell transplantation can be selected on the basis of allele sizes alone, regardless of the complexity of repeat motifs.
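The discrepancy between size-based and sequence-based homozygosity can be made concrete with the six D7S1517 samples described above. The sketch below only restates those counts; the record layout and function name are hypothetical, and because the samples were selected, the resulting fractions are not population estimates.

```python
# Six selected samples, all homozygous by fragment length analysis:
# two at 173 bp (homozygous by sequence), four at 196 bp (one homozygous
# by sequence, three heterozygous by sequence).
samples = (
    [{"size_bp": 173, "homozygous_by_size": True, "homozygous_by_sequence": True}] * 2
    + [{"size_bp": 196, "homozygous_by_size": True, "homozygous_by_sequence": True}]
    + [{"size_bp": 196, "homozygous_by_size": True, "homozygous_by_sequence": False}] * 3
)


def homozygosity_fractions(records):
    """Return (fraction homozygous by size, fraction homozygous by sequence)."""
    total = len(records)
    by_size = sum(r["homozygous_by_size"] for r in records) / total
    by_sequence = sum(r["homozygous_by_sequence"] for r in records) / total
    return by_size, by_sequence


by_size, by_sequence = homozygosity_fractions(samples)
print(f"homozygous by fragment size: {by_size:.2f}")
print(f"homozygous by sequence:      {by_sequence:.2f}")
```

In this small set, fragment analysis classifies every sample as homozygous, while sequencing confirms homozygosity in only half of them.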

Moreover, as outlined in the manuscript describing the introduction of the RSD code by the Eurochimerism Consortium,6 the variability within STR repeat motifs has an impact on the proposed nomenclature of microsatellite markers eligible for chimerism analysis.