Introduction

Polymorphic Alu markers often show patterns of diversity whereby their highest frequencies are found outside Africa (Watkins et al. 2003). This observation raises questions pertaining to the origin of Alu diversity in worldwide populations, though it could simply be due in part to ascertainment bias or lineage sorting of an ancestral polymorphism (Watkins et al. 2003). An Alu insertion, classified as type Sb and subsequently Yb (Batzer et al. 1996), and hence of intermediate age (Zietkiewicz et al. 1994), is found 233,908 bp downstream from the informative dys44 haplotype (Zietkiewicz et al. 2003) in intron 44 on Xp21.3 (Blonden et al. 1994). In order to characterize the diversity of this genomic region, we searched for polymorphic sites both within and flanking the Alu insertion using SSCP (single strand conformational polymorphism, Orita et al. 1989) and DHPLC (denaturing high-pressure liquid chromatography), two mutation detection techniques often used to reveal mutations of interest in a candidate gene or region of DNA (Dobson-Stone et al. 2000; Gross et al. 1999). In addition to searching for polymorphic sites with DHPLC and SSCP, we also sequenced in full all Alu insertions.

Materials and methods

Eight samples from each of five regions of the world (Sub-Saharan Africa, East Asia, the Americas, the Middle East, and Europe) were taken to avoid ascertainment bias. Of the 40 samples, 12 were male, giving a total of 68 X chromosomes (28 females × 2, plus 12 males). DHPLC analysis requires that hemizygous male DNA be mixed with another sample during PCR to ensure a mixture of heteroduplexes and homoduplexes. Hence, all male DNA was mixed in equal amounts with control DNA (from a European male not included in the original 40 samples). Seven pairs of primers were designed using the online Primer3 software (http://www-genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi) to amplify a 2,328-bp region around the 305-bp Alu insertion (NCBI accession number Z22650); PCR products ranged in size from 321 to 349 bp to allow satisfactory analysis by both SSCP and DHPLC (Table 1). PCR reactions contained 1x buffer, 250 μM dNTPs, 2.2 mM Mg2+, 0.6 μM primers, and 1.25 U of Taq polymerase (GibcoBRL). Thermocycler conditions for all primer pairs consisted of 35 cycles of denaturation at 94°C for 30 s, annealing at 52°C for 30 s, and elongation at 72°C for 30 s.

Table 1 PCR primer pairs and product length, polymorphic sites, type of polymorphism, minor allele frequency, and the geographic distribution of heterozygotes (in counts) for each polymorphic site. SSCP single strand conformational polymorphism, DHPLC denaturing high-pressure liquid chromatography

SSCP (Orita et al. 1989; Zietkiewicz et al. 1997) and DHPLC were run in parallel, with the two operators working blindly. SSCP followed previously described procedures (Zietkiewicz et al. 1997), using a 6% polyacrylamide gel containing 10% glycerol (acrylamide-bisacrylamide ratio of 29:1) in TBE, run at room temperature for 16–21 h. For DHPLC (using a Transgenomic WAVE 3500 DHPLC), a gradient of temperatures was run around the optimal temperatures predicted by the manufacturer-provided software for three or four of the 40 PCR products for each fragment, and the two temperatures that showed the greatest split between heteroduplexes and homoduplexes (when present) were used for each sample-fragment set (to allow the resolution of mutations in different parts of the sequence). All differing SSCP and/or DHPLC patterns, indicating the presence of polymorphic sites, were identified for each run, and at least two individuals were sequenced from each group of individuals displaying a particular pattern; all Alu insertions (a total of 19 chromosomes) were sequenced manually as a matter of course. For each identified polymorphic site, the frequency of the minor allele among female samples (\(q_{\text{f}}\)) was estimated assuming Hardy–Weinberg equilibrium by \(q_{\text{f}} = \left(1 - \sqrt{1 - 2H}\right)/2\), where H is the frequency of heterozygotes from the DHPLC data for the 28 female samples at each polymorphic site, while the frequency of the minor allele among the hemizygous males (\(q_{\text{m}}\)) was obtained by allele counting. The overall minor allele frequency (\(q_{\text{o}}\)) was therefore estimated as \(q_{\text{o}} = \left(q_{\text{f}} \times 56 + q_{\text{m}} \times 12\right)/68\), where 56, 12, and 68 are the numbers of female, male, and total chromosomes, respectively.
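The frequency estimates described above can be illustrated with a short sketch. It is not the authors' code: the function names are hypothetical, and it assumes a biallelic site in Hardy–Weinberg equilibrium, so that the heterozygote frequency satisfies H = 2q(1 − q) and the minor allele frequency is the smaller root of that quadratic. The overall estimate is read here as a chromosome-weighted average of the female and male estimates, which is what the stated counts of 56, 12, and 68 chromosomes imply.

```python
import math


def female_minor_allele_freq(het_count, n_females=28):
    """Estimate q_f from the heterozygote frequency H among females.

    Assumes Hardy-Weinberg equilibrium at a biallelic site, H = 2q(1 - q),
    and returns the smaller root, q = (1 - sqrt(1 - 2H)) / 2.
    """
    h = het_count / n_females
    if not 0.0 <= h <= 0.5:
        # H cannot exceed 0.5 for a biallelic site in equilibrium.
        raise ValueError("heterozygote frequency must lie in [0, 0.5]")
    return (1.0 - math.sqrt(1.0 - 2.0 * h)) / 2.0


def male_minor_allele_freq(minor_count, n_males=12):
    """Estimate q_m by direct allele counting in hemizygous males."""
    return minor_count / n_males


def overall_minor_allele_freq(q_f, q_m, n_female_chr=56, n_male_chr=12):
    """Chromosome-weighted average of the female and male estimates."""
    total = n_female_chr + n_male_chr
    return (q_f * n_female_chr + q_m * n_male_chr) / total


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not taken from Table 1).
    q_f = female_minor_allele_freq(het_count=12)
    q_m = male_minor_allele_freq(minor_count=3)
    print(round(overall_minor_allele_freq(q_f, q_m), 3))
```

Because only heterozygotes are scored directly in females, the estimate relies on the equilibrium assumption to account for minor-allele homozygotes, and taking the smaller root constrains the female estimate to at most 0.5.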

Results and discussion

The global distribution of the polymorphic Alu insertion among worldwide samples from Zietkiewicz et al. (2003) and Australian Aboriginal samples from Perna et al. (1992) is shown in Fig. 1. Notably, the Alu insertion reaches its highest frequency in Papua New Guinea but was not found at all in 59 unrelated Australian Aboriginal chromosomes. Within Africa, the Alu insertion shows a high frequency in the M’Buti Pygmies but is nearly absent in the Biaka, while its presence in African Americans could be due to admixture with European or indigenous populations or to inheritance from their ancestral West African populations. Within Eurasia, the frequency of the Alu insertion is fairly consistent, while in the Americas it varies widely.

Fig. 1

Worldwide Alu frequencies (sample sizes, as number of chromosomes, shown in parentheses). All samples are from Zietkiewicz et al. (2003), except the Australian Aborigines, which are from Perna et al. (1992). Afr Am African Americans, Aust Australian Aborigines (Perna et al.), Kari Karitiana, Mong Mongolian (all grouped together from Zietkiewicz et al.), NaD NaDene, Oji Ojibwa, PNG Papua New Guinea (highland and coastal), We Afr West Africans (all grouped together from Zietkiewicz et al.)

A total of seven novel polymorphic sites were found in the 2,328-bp region flanking the site of the Alu insertion: one triallelic single nucleotide polymorphism (SNP), four biallelic SNPs, and two single base insertions. Both SSCP and DHPLC identified the insertions and the triallelic SNP, though neither method convincingly identified the latter as triallelic (it was scored as biallelic). All four biallelic SNPs were identified by DHPLC, but only one by SSCP. Within the Alu insertion itself, one SNP was identified by SSCP but missed by DHPLC; a second site was missed by both techniques and only uncovered by sequencing. The poor level of detection within the Alu insertion was likely caused by the increased fragment length due to the presence of the 305-bp insert (a total of 646 bp, with the flanking fragment Pr_3_Alu).

DHPLC appears to be limited in its ability to score a mutation on a polymorphic background (Dobson-Stone et al. 2000), though Gross et al. (1999) argued that they were able to call such sites confidently. Fragments Pr_1 and Pr_2, which each have one SNP and one single base pair insertion polymorphism, displayed very strong DHPLC patterns for individuals with an insertion (Fig. 2a,b). These patterns overshadowed the “shoulder” pattern present for individuals heterozygous at the SNP, making it difficult to identify the state of the SNP. Furthermore, it was not possible to detect or genotype SNPs when more than two alleles were present at the same site. Position −648 was found to have three possible alleles (g, t, or c), and while the heterozygote was identified by the presence of a shoulder on the DHPLC profile, there was no unequivocal and repeatable difference between a “t/g” heterozygote and a “c/g” heterozygote (Fig. 2a).

Fig. 2a,b

DHPLC profiles highlighting the difficulty of using this approach for detecting mutants on an already polymorphic background. a fragment Pr_1 and b fragment Pr_2. For each profile, the known genotypes (after sequencing) are shown as bases for the single nucleotide polymorphism (SNP) and presence (+) or absence (−) of the single base pair insertion

Assuming Hardy–Weinberg equilibrium, we were able to use the combined female and male data to estimate the overall minor allele frequency (\(q_{\text{o}}\)) for each polymorphic site. The highest values of \(q_{\text{o}}\) were found for the t→g SNP at position −648 (allele frequency 0.33) and the g→a polymorphism at position −151 (0.25); these two polymorphisms were also found worldwide (i.e., in each of the five geographic regions; although not shown in Table 1, the t→g SNP at position −648 was found in two European males). Three flanking region polymorphic sites appeared only once within the sample: the two t→c SNPs at positions −648 and +185 and the +t insertion at position −536. It is notable that all polymorphisms within the flanking region were found at least once within Africa. The two novel polymorphic sites found in the Alu insertion itself were both t→c SNPs, at positions 98 and 213, as measured from the beginning of the Alu insertion; the SNP at position 98 was found twice, both times in Chinese individuals, while the SNP at position 213 was found in an Ashkenazi Jew.

We believe that this is the first report of polymorphic SNPs within a polymorphic Alu distribution. This combination of a polymorphic Alu insertion and SNPs, both within the insertion and the flanking region, promises to provide an informative X-linked marker for the reconstruction of human population and evolutionary history and in particular the spread of Alu repeat elements throughout global populations.