Introduction

The human Y chromosome displays an extraordinary amount and variety of different classes of repetitive DNA when compared to other human chromosomes.1,2,3,4 This is assumed to have arisen as a result of the lack of homologous interchromosomal recombination over evolutionary time.5,6,7 However, intrachromosomal rearrangements during primate evolution, such as tandem duplications and inversions, may in addition have played an important role in shaping the molecular structure of the human Y chromosome.4,8,9,10 Tandem duplications are potential targets for homologous intrachromosomal recombination, and the occurrence of such rearrangements between Y-chromosomal repeats, which can result in sequence deletions, is thought to be involved in phenotypes such as male infertility.9 For example, it has been shown recently that the Y-chromosomal azoospermia factor c (AZFc) region consists of a complex of three palindromes constructed from nearly identical copies, which may have resulted from tandem duplication and inversion.9 Intrachromosomal exchange between Y-chromosomal copies derived from the HERV15 class of endogenous retroviruses has also been shown to cause deletions at the AZFa region responsible for male infertility.11,12 Although recombination between segmental duplications also occurs in autosomes and can cause genomic instability and disease,13,14 the Y chromosome is especially suitable for investigating intrachromosomal rearrangements, because of the absence of interchromosomal processes in the nonrecombining region (NRY).

Investigation of intrachromosomal recombination between tandem duplications on the Y chromosome is usually carried out by large-scale DNA sequence analysis; however, highly polymorphic DNA markers that occur within and are variable between the different intrachromosomal copies should also be useful. One class of DNA markers potentially suitable for this application is microsatellites or short tandem repeat (STR) loci on the Y chromosome, since they display a high level of polymorphism and are widely used to characterize individual male lineages in forensics and molecular anthropology.15,16,17 Another prerequisite for their potential use as markers for studying recombination is that some of them occur in multiple copies on the Y chromosome. Of the fewer than 30 variable Y-STRs described to date, only four consist of two male-specific alleles, one of which is DYS385.15,18 The Y-STR system DYS385 shows one of the highest levels of polymorphism yet observed for Y-STRs, and hence it is often used in forensic15,19,20,21,22,23 and population history studies.17,24,25 It has been previously assumed that DYS385 exists in two polymorphic copies on the Y chromosome,15 although the lack of sequence information precluded separate analysis of the two assumed copies.18 Consequently, no individual locus assignment has been possible, and so the two alleles have been treated as a combined allele class, resulting in a loss of information and potential misinterpretation of true ancestry.

To investigate potential intrachromosomal recombination events on the human Y chromosome and to enhance the utility of DYS385 in forensic and population history studies, we analysed the two loci of DYS385 separately by making use of the accumulated sequence data for the human Y chromosome in publicly available databases. We observed in a German population sample (n=70) almost identical allele distributions at the two duplicated DYS385 loci, which could be interpreted as the result of a reciprocal genetic exchange, that is, by an intrachromosomal recombination mechanism. However, further studies of 261 male individuals from five populations, as well as Y chromosome haplotype analysis using SNP and additional STR markers, provide evidence that male population history, rather than a genetic exchange process, is the likely explanation for the observed phenomenon.

Material and methods

Human samples and Y-STR analysis

DNA samples from unrelated males from Germany (Berlin-Brandenburg and Leipzig areas; n=70), Poland (Wroclaw area, n=47), Spain (Granada area, n=49), Papua New Guinea (highland and coastal areas, n=48) and Cameroon (Yaoundé area, n=47) were analysed for the DYS385a and DYS385b loci. For most of the samples, Y-STR data for the loci DYS19 (synonym DYS394), DYS389I, DYS389II, DYS390, DYS391, DYS392 and DYS393 were already available from our previous studies,17,26,27 and for the remaining samples Y-STRs were analysed as described in detail on the website of the Y chromosome Haplotype Reference Database (YHRD) for US Populations (http://www.ystr.org/usa).23 Y-STR nomenclature is according to Kayser et al.15 except that DYS389II is considered without variation at DYS389I.

Separate DYS385a and DYS385b analysis

Sequence information for the two loci DYS385a and DYS385b was obtained from the clones RP11-569J3 (GenBank Accession no. AC022486) and RP11-143C1 (GenBank Accession no. AC007379). We defined the DYS385 repeat of RP11-143C1 as DYS385a and the corresponding repeat of RP11-569J3 as DYS385b. The alignment of 1.0 kb of the flanking part showed a variable sequence starting upstream to position 10930 (RP11-143C1) and downstream to position 127415 (RP11-569J3). We designed two new forward primers so that each locus could be amplified individually in a separate reaction: for DYS385a primer 5′-TGTTGTTGACTGTGGTAGGTA-3′ (sequence position 10875–10895 of RP11-143C1), and for DYS385b primer 5′-GGGAATGCAATTTCCCTTTAT-3′ (reverse complement to sequence 127434–127453 of RP11-569J3). These two primers were designed from the single-copy sequence between the two 190 kb copies. Analysis of the two DYS385 loci, DYS385a and DYS385b, was performed in a nested PCR approach: in the first reaction both loci were amplified separately using the two newly designed forward primers DYS385a or DYS385b, respectively, and the DYS385.2B primer reported elsewhere,18 generating a single product of about 700 bp in length for each Y-STR locus. The PCR reaction contained 1 × GeneAmp PCR buffer II (PE Biosystems), 3.0 mM MgCl2, 0.2 mM dNTPs (Pharmacia Biotech), 0.2 μM each primer (Metabion), 2.5 U AmpliTaq DNA polymerase (PE Biosystems) and 20–50 ng genomic DNA. The PCR was performed as described elsewhere.18 In the second PCR, we used the published primers DYS385.1 (5′ labelled with TAMRA) and DYS385.2B18 to obtain a shorter fragment of about 250–280 bp in length, in order to be able to detect the Y-STR polymorphism in the number of 4 bp repeats. PCR conditions were identical to those of the first PCR (with the exception of the primers).
Aliquots of 2 μl of 1:3–1:5 diluted fluorescent-labelled PCR products of DYS385a and DYS385b were separated on a 5% Long Ranger™ gel (FMC Bioproducts) using an ABI Prism™ 377 DNA Sequencer (Applied Biosystems, Foster City, CA, USA), and fragment length analysis was performed using the ABI Prism™ Genescan software (Applied Biosystems). DNA sequencing of alleles of various lengths at both loci was performed on an ABI Prism™ 377 DNA Sequencer (Applied Biosystems) using Big Dye chemistry (Applied Biosystems, Foster City, CA, USA) according to the recommendations of the manufacturer. For DYS385a, five different length alleles were sequenced (11, 12, 13, 16, 17), and for DYS385b eight alleles were sequenced (11, 12, 14–18, 20). For both loci, allele lengths as obtained from DNA sequencing analysis were 252 bp for allele 10 [(GAAA)10] and 284 bp for allele 18 [(GAAA)18], although alleles always appeared 6 bp shorter in the Genescan fragment length analysis.
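The relation between repeat number and fragment length implied by these figures can be written out explicitly. The following is a minimal sketch, not part of the original protocol: it assumes a constant non-repeat flanking length of 212 bp (inferred from 252 bp for allele 10 and consistent with 284 bp for allele 18) and applies the reported 6 bp Genescan offset. All function and constant names are illustrative.

```python
REPEAT_UNIT_BP = 4                      # GAAA tetranucleotide repeat
FLANK_BP = 252 - 10 * REPEAT_UNIT_BP    # 212 bp, inferred from allele 10 = 252 bp
GENESCAN_OFFSET_BP = 6                  # alleles ran 6 bp shorter in fragment analysis


def sequenced_length(repeats: int) -> int:
    """Fragment length (bp) expected from sequencing for a given repeat number."""
    return FLANK_BP + REPEAT_UNIT_BP * repeats


def genescan_length(repeats: int) -> int:
    """Apparent fragment length (bp) expected in the Genescan analysis."""
    return sequenced_length(repeats) - GENESCAN_OFFSET_BP


def repeats_from_genescan(size_bp: float) -> int:
    """Nearest repeat number for an observed Genescan fragment size."""
    return round((size_bp + GENESCAN_OFFSET_BP - FLANK_BP) / REPEAT_UNIT_BP)
```

Under these assumptions, allele 11 would be expected at 256 bp by sequencing and at about 250 bp in the fragment analysis.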

Y-chromosomal SNP analysis

Three Y-chromosomal SNPs (SRY-1532, originally called SRY10831;28,29 92R7;30 and M931) were analysed in 70 German, 47 Polish and 49 Spanish individuals. Haplogroup 1 (hg1) is characterized by the derived states at SRY-1532 (G), 92R7 (T) and M9 (G); haplogroup 3 (hg3) is characterized by a reversion from the derived G allele to the ancestral A allele at SRY-1532 and the derived states at 92R7 and M9.32,33,34 SRY-1532 was genotyped using 0.4 μM each of the primers 5′-CCTCTTGTATCTGACTTTTTCAC-3′ and 5′-CCACATAGGTGAACCTTGAAAAT-3′, 0.2 mM dNTPs (Pharmacia Biotech), 1 × GeneAmp PCR buffer II (PE Biosystems), 2.0 mM MgCl2 (PE Biosystems), and 1 U of AmpliTaq DNA polymerase (PE Biosystems) in a 25 μl reaction volume. Hot-start PCR (4 min at 95°C) was performed for 35 cycles at 94°C for 30 s, 57°C for 30 s and 72°C for 45 s. SRY-1532 was typed via PCR–RFLP using the restriction endonuclease DraIII (New England Biolabs) with the G allele indicated by two fragments of 30 and 26 bp and the A allele by one (undigested) fragment of 56 bp. 92R7 was analysed according to Hurles et al35 and M9 as described elsewhere.36
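The genotype-to-haplogroup logic described above can be summarised in a few lines. This is an illustrative sketch only: the function names are hypothetical, fragment sizes are taken as idealised integers, and any SNP combination other than the two defined in the text is simply labelled "other".

```python
def call_sry1532(fragments_bp):
    """Call the SRY-1532 allele from DraIII digest fragment sizes (bp).

    Two fragments of 30 and 26 bp indicate G; one undigested 56 bp
    fragment indicates A. Anything else (e.g. a partial digest) is
    returned as None rather than guessed.
    """
    sizes = sorted(fragments_bp)
    if sizes == [26, 30]:
        return "G"
    if sizes == [56]:
        return "A"
    return None


def assign_haplogroup(sry1532, r92r7, m9):
    """Assign hg1 or hg3 from the three SNP states described in the text."""
    derived_background = (r92r7 == "T" and m9 == "G")
    if derived_background and sry1532 == "G":
        return "hg1"
    if derived_background and sry1532 == "A":
        return "hg3"
    return "other"
```

For example, `assign_haplogroup(call_sry1532([56]), "T", "G")` yields `"hg3"`.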

Statistical and computational analyses

Sequence BLAST searches were performed using BLASTN software (version 2.2.1) against the NCBI Homo sapiens genomic contig sequences database (http://www.ncbi.nlm.nih.gov/genome/seq/page.cgi?F=HsBlast.html&&ORG=Hs). LINE-1 and Alu elements were identified using the RepeatMasker software (http://ftp.genome.washington.edu/cgi-bin/RepeatMasker). Correlation analysis between allele frequency distributions was performed using the software package Statistica (Statsoft). A χ2-test for independence between allele frequency distributions, with respective P-values from Monte Carlo simulations, was performed using the program StatXact. The diversity of Y-STR haplotypes, according to Nei,37 was calculated using the software package Arlequin.38
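For readers unfamiliar with the diversity measure, the sketch below shows how a haplotype diversity of this kind is commonly computed. It assumes the usual unbiased estimator, h = n/(n − 1) × (1 − Σ p_i²), where p_i is the frequency of the i-th haplotype among n chromosomes; it is not a reproduction of the Arlequin implementation, and the function name is illustrative.

```python
from collections import Counter


def nei_diversity(haplotypes):
    """Unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies).

    `haplotypes` is a sequence of hashable haplotype labels, for example
    tuples of repeat numbers at each Y-STR locus.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)
```

The value is 0 when all chromosomes share one haplotype and 1 when every chromosome carries a different haplotype.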

Results and discussion

The DNA sequence of DYS385 was searched against the NCBI database of Homo sapiens genomic contig sequences. By large-scale alignment of the identified flanking sequence of DYS385, we detected a 190 kb fragment, which exists in two copies separated by 40 kb of unique sequence, mapping to Yq11.223. Sequence identity of the two copies, excluding repetitive sequence variation, was estimated to be 99.97%, which is identical to that estimated for the different copies of a 3 Mb tandem duplication in the AZFc region of the human Y chromosome.9 The two 190 kb copies are located in opposite orientations and thus contain the DYS385 microsatellite in their most proximal parts (Figure 1). The duplication event is likely to have occurred prior to the emergence of anatomically modern humans (100 000–200 000 years ago), since two male-specific alleles at DYS385 have been observed in all human populations studied to date.15,17,18,20,22,24,25,39,40,41,42,43,44

Figure 1

Schematic structure of the inverted repeat containing the Y-chromosomal microsatellite DYS385. The green and yellow arrows represent the duplicated 190 kb repeats, separated by 40 kb of unique sequence, with the sequence orientation indicated. The distal and proximal parts of the repeat are characterized by LINE-1 and AluSx sequences. The positions of the distal and proximal boundaries of the repeats are given as sequence positions in the GenBank contig NT_011875 (version NT_011875.7, GI:16172964). The positions of the PCR primers used for specific amplification of the DYS385a and DYS385b microsatellites separately in a nested PCR approach are indicated. The hypothetical intrachromosomal recombination mechanism resulting in reciprocal genetic exchange is illustrated.

We used the sequence information to design PCR assays specific for each individual DYS385 locus, designated DYS385a and DYS385b (Figure 1). Separate and locus-specific amplification of DYS385a and DYS385b was assured by using part of the 40 kb single-copy sequence between the two 190 kb fragments for specific DYS385a and DYS385b primer design (Figure 1). In all separate analyses of the DYS385a and DYS385b loci, a single male-specific PCR product was observed, compared to the typical two-band pattern resulting from the simultaneous amplification of both loci following the conventional DYS385 PCR protocol (Figure 2). DNA samples that produce a one-band pattern in the conventional PCR yielded, as expected, two fragments of identical size from the analysis of the separate loci, confirming that single-band patterns for DYS385 in the conventional approach do indeed reflect two alleles of identical length at the two loci, as assumed previously,15,18 and not the existence of only one locus. DNA sequence analysis of several alleles of varying lengths at both DYS385 loci revealed no sequence differences either within or between the two loci, except for the expected variation in the number of (GAAA)n repeats.

Figure 2

Separate amplification of the two duplicated loci of DYS385. Electropherograms from two individuals with the DYS385a/DYS385b haplotype 11/14 (a–c) and 14/11 (d–f). (a, d): Conventional DYS385 amplification with the primer pair DYS385.1 and DYS385.2B, resulting in two products; (b, e): separate amplification of DYS385a; (c, f): separate amplification of DYS385b. The y-axis represents arbitrary fluorescent units, and the x-axis represents the size of fragments in base pairs.

The separate analysis of DYS385a and DYS385b in 70 unrelated German males revealed an unexpected and striking similarity of the allele frequency distributions at both Y-STR loci (Figure 3a). The two allele frequency distributions are significantly correlated (r=0.853; P<0.001) and are not significantly different from one another by a χ2-test (χ2=15.45; df=10; P>0.05) (Table 1). Such a strong similarity of allele frequency distributions of two duplicated STR loci cannot be explained by microsatellite DNA evolution according to the generally assumed polymerase/strand slippage model,45,46 for the following reason. The duplication event is old, perhaps as old as 100 000–200 000 years, because all human populations carry two copies of the DYS385 microsatellite. After the duplication event, both copies of the DYS385 microsatellite started to mutate independently from each other as they are separated by 40 kb of interspersed unique sequence. Given a mutation rate of about 2 × 10−3 per generation as experimentally observed for DYS385,26 it is very unlikely that the microsatellite evolution would by chance result in almost identical, bimodal allele frequency distributions at both loci.
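The two comparisons reported here (a correlation between the two allele frequency distributions and a χ2-test on the 2 × k table of allele counts) can be sketched as follows. This is a plain-Python illustration with hypothetical function names, not the Statistica or StatXact procedures used in the study, and it returns only the test statistic and degrees of freedom; the Monte Carlo P-values are not reproduced. With 11 observed allele classes, the 2 × 11 table gives df = 10, as reported.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long frequency vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def chi2_two_loci(counts_a, counts_b):
    """Chi-square statistic and df for a 2 x k table of allele counts.

    counts_a[i] and counts_b[i] are the numbers of chromosomes carrying
    allele class i at DYS385a and DYS385b. Allele classes absent at both
    loci are dropped before testing.
    """
    cols = [(a, b) for a, b in zip(counts_a, counts_b) if a + b > 0]
    total_a = sum(a for a, _ in cols)
    total_b = sum(b for _, b in cols)
    total = total_a + total_b
    chi2 = 0.0
    for a, b in cols:
        col = a + b
        exp_a = total_a * col / total
        exp_b = total_b * col / total
        chi2 += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return chi2, len(cols) - 1
```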

Figure 3

Allele frequency distributions at DYS385a and DYS385b, and at seven additional Y-STRs, for the German (a–d), Polish (e–h) and Spanish (i–l) population samples. Allele frequency distributions are presented both for all pooled individuals and separately according to Y-SNP haplogroups.

Table 1 Allele frequencies (%) at DYS385a and DYS385b (analysed separately) in different human populations

One way to explain such a nearly identical allele frequency distribution at two independently mutating Y-chromosomal microsatellite loci would be intrachromosomal genetic recombination. Such events would result in reciprocal exchange of sequence information between the two duplicated 190 kb fragments and thus, if frequent enough, would make the allele frequency distribution at the two copies of the DYS385 microsatellite almost identical (Figure 1). Since we detect the two DYS385 copies by specific PCR primers located outside the 190 kb fragment, the putative intrachromosomal recombination events must occur in the most proximal part of the 190 kb fragments to result in a complete reciprocal exchange including the DYS385 microsatellite (Figure 1).

It should be emphasized that nonreciprocal events such as gene conversion would result in different expectations than reciprocal genetic exchange. Gene conversion or other nonreciprocal events would create identical copies at the two loci, and thus an excess of homozygotes and a significant correlation of DYS385a and DYS385b at the individual level. Neither of these was observed in our data.
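The difference between the two scenarios can be made concrete with a toy example. The sketch below is purely illustrative (hypothetical function names and a hypothetical starting population): reciprocal exchange swaps the alleles between the two loci, whereas gene conversion overwrites one locus with the allele of the other.

```python
from collections import Counter


def reciprocal_exchange(hap):
    """Swap the alleles at DYS385a and DYS385b on one chromosome."""
    a, b = hap
    return (b, a)


def gene_conversion(hap, donor="a"):
    """Overwrite one locus with the allele of the donor locus."""
    a, b = hap
    return (a, a) if donor == "a" else (b, b)


def marginal_counts(haps):
    """Allele counts at DYS385a and DYS385b across a set of chromosomes."""
    return Counter(a for a, _ in haps), Counter(b for _, b in haps)


# Hypothetical population: four chromosomes, all with haplotype 11/14.
population = [(11, 14)] * 4
# Reciprocal exchange on half of them equalises the two marginal
# distributions without creating any chromosome with identical alleles.
exchanged = population[:2] + [reciprocal_exchange(h) for h in population[2:]]
# Gene conversion instead makes the two loci identical on each chromosome.
converted = [gene_conversion(h) for h in population]
```

In this toy case, `exchanged` has identical allele counts at both loci while every chromosome still carries two different alleles, whereas every chromosome in `converted` carries the same allele at both loci.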

If intrachromosomal recombination is indeed the underlying cause for the similar DYS385a and DYS385b allele frequency distributions in the German sample, then we would expect other populations to also exhibit similar allele frequency distributions. We thus genotyped DYS385a and DYS385b in additional human populations from Poland, Spain, Papua New Guinea and Cameroon. However, all these other groups showed considerably different allele distributions between the two DYS385 loci, with nonsignificant correlation coefficients and with significant χ2 values (Table 1). These results indicate that instead of a general mechanism such as intrachromosomal recombination, a different phenomenon that exhibits population-specificity is apparently responsible for the strong similarity between the two allele frequency distributions in the German sample.

To investigate how the paternal history of Germans might result in similar allele frequency distributions at DYS385a and DYS385b, we genotyped the German, Polish and Spanish samples for three Y-SNPs, which define two Y chromosome haplogroups (hg1 and hg3) known from previous studies to be informative in European populations.32,33,34,35,47 The highly significant correlation in the German sample between the allele frequency distributions for DYS385a and DYS385b disappeared completely when the variation was partitioned by the two Y-SNP haplogroups (Table 2, Figure 3a, b). In those individuals who belonged to neither hg1 nor hg3 (and thus are potentially a pool of different Y-SNP haplogroups), the correlation coefficient was still statistically significant (r=0.679, P=0.022), albeit lower than in the entire sample (Table 2, Figure 3b). This indicates that differences in the allele frequency distributions according to the Y-SNP background, primarily (but not exclusively) based on hg1 and hg3, are responsible for the strong similarity in allele frequency distribution between DYS385a and DYS385b in the German sample.

Table 2 Correlation coefficients (r) and their P-valuesa between the allele frequency distributions for DYS385a and DYS385b

More specifically, the combined Y-SNP/Y-STR analysis revealed that there are two alleles at DYS385a and DYS385b that are strongly correlated with the SNP background in the German sample, but in opposite directions. All DYS385a/DYS385b 11/14 haplotypes belonged to hg3, whereas all 14/11 haplotypes belonged to hg1. This relation is also evident from the individual allele frequency distributions for DYS385a and DYS385b, separated according to haplogroup (Figure 3b). Although there is no significant correlation between the DYS385a and DYS385b allele frequency distributions in the Polish and Spanish samples (Table 2, Figure 3e, f, i, j), significant differences are observed in the distribution of DYS385a and DYS385b alleles according to the two Y-SNP haplogroups in all European groups analysed. In particular, alleles DYS385a-11 and DYS385b-14 are associated with hg3, and DYS385a-14 and DYS385b-11 with hg1.

Striking differences between the allele frequency distributions for the different Y-SNP haplogroups were also observed at nearly all other Y-STR loci analysed in all three populations (Figure 3c/d, g/h, k/l). In particular, at DYS392 nearly all hg3 individuals carry allele 11, whereas nearly all hg1 individuals carry allele 13. Since hg3 and hg1 are assumed to be monophyletic, there was a single ancestral Y chromosome, with a specific Y-STR haplotype, for each haplogroup. Our data suggest that for hg3 this ancestral Y-STR haplotype was DYS19-389I-389II-390-391-392-393-385a-385b=?-10-17-25-10-11-13-11-14, and for hg1 it was 14-10-16-24(23)-11(10)-13-13-14-11. Differences in the amount of variation at each Y-STR locus, within each haplogroup, can partly be explained by variation in the locus-specific mutation rates. DYS392 and DYS393 showed very low levels of variation within the haplogroups (Figure 3) and also the lowest mutation rates among all Y-STR loci used here, whereas mutation rates for the other Y-STR loci were much higher.26 Our observations clearly illustrate that Y-STR haplotype variation is strongly dependent on the relevant Y-SNP background, confirming previous findings at single Y-STR loci.33,48,49

A significant association of particular Y-STR haplotypes with particular haplogroups is not unexpected, but how does this explain nearly identical allele frequency distributions for DYS385a and DYS385b in the German sample, but no other sample? The answer apparently lies in different frequency distributions of the Y-SNP haplogroups, together with associated Y-STR alleles, resulting from different paternal histories of these populations. The frequencies of hg1 and hg3 differ significantly between the German, Polish and Spanish samples (χ2=48.77, df=4, P<0.0001). In the German sample, both haplogroups occur in similar frequency, whereas in the Polish sample the proportion of hg3:hg1 is about 5:1, and in the Spanish sample only hg1 exists with hg3 being totally absent (Table 3). It has been shown recently that hg3 and hg1 display a clinal distribution with increasing frequency for hg1 from the southwest of Europe to the northeast, and for hg3 from the central–eastern part of Europe to the southeast and southwest.34 Hg3 is assumed to have an eastern European origin about 7500 years BP50 or 2550–3800 years BP.33,34 The higher frequency of hg3 in our Polish sample as well as a higher Y-STR haplotype diversity associated with hg3 in the Polish sample than in the German sample (Table 3) and the absence of hg3 in the Spanish sample, are in agreement with this presumed eastern origin of hg3.34 Hg1 is hypothesized to reflect the predominant early European Y chromosome lineage, which was widely distributed before being partially replaced by a major demographic expansion of agricultural migrants from the Near East into Europe carrying Y chromosome haplogroup hg9.34 Our observation of a higher frequency of hg1 in the German sample compared to the Polish sample, and especially with the highest frequency observed in the Spanish sample (Table 3), fits this hypothesis.
The SNP mutation defining hg1 is assumed to be rather old, for example, 23 000 years BP50 and the high Y-STR-haplotype diversity observed here for all three populations presumably reflects the old age of this haplogroup (Table 3).

Table 3 Frequency of Y-SNP haplogroups and associated Y-STR haplotypea diversity in the German, Polish and Spanish samples

Thus, the chance 1:1 ratio of hg1 and hg3, together with their respectively associated DYS385 alleles, appears to be responsible for the nearly identical allele frequency distributions of DYS385a and DYS385b in the German sample. Proceeding from Germany to the east, to Poland, shifts the proportion of hg3:hg1 towards hg3 owing to its eastern European origin, which results in the disappearance of the similar DYS385a and DYS385b allele frequency distributions. Similarly, proceeding from Germany to the west, to Spain, shifts the 1:1 ratio of hg3 and hg1 completely towards hg1, thereby also resulting in different DYS385a and DYS385b allele frequency distributions. That we see in the German sample a lower but still statistically significant correlation when hg1 and hg3 are excluded indicates that other haplogroups and associated Y-STR haplotypes also contribute to this effect.
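The mixture argument can be illustrated numerically. The sketch below uses deliberately idealised, hypothetical within-haplogroup allele distributions that are exact mirror images of each other (hg1 centred on 14/11, hg3 on 11/14); these are not the observed frequencies. It computes the pooled distribution at each locus for a given proportion of hg1 and reports the largest difference between the two loci.

```python
# Hypothetical, idealised allele frequencies within each haplogroup.
HG1 = {"a": {14: 0.8, 15: 0.2}, "b": {11: 0.8, 12: 0.2}}
HG3 = {"a": {11: 0.8, 12: 0.2}, "b": {14: 0.8, 15: 0.2}}


def pooled(locus, w_hg1):
    """Pooled allele frequencies at one locus for an hg1 proportion w_hg1."""
    d1, d3 = HG1[locus], HG3[locus]
    alleles = sorted(set(d1) | set(d3))
    return {k: w_hg1 * d1.get(k, 0.0) + (1.0 - w_hg1) * d3.get(k, 0.0)
            for k in alleles}


def max_locus_difference(w_hg1):
    """Largest absolute frequency difference between DYS385a and DYS385b."""
    pa, pb = pooled("a", w_hg1), pooled("b", w_hg1)
    alleles = set(pa) | set(pb)
    return max(abs(pa.get(k, 0.0) - pb.get(k, 0.0)) for k in alleles)
```

Under these assumptions the two loci are indistinguishable at a 1:1 ratio (`max_locus_difference(0.5)` is zero), but diverge strongly at an hg3:hg1 ratio of 5:1 (`w_hg1 = 1/6`) or when only hg1 is present (`w_hg1 = 1.0`).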

Consequently, the amount of incorrect allele–locus assignment for DYS385 when analysed simultaneously following the conventional DYS385 PCR protocol strongly depends on the population investigated and its hg1:hg3 proportion. When analysed simultaneously, the convention for statistical analyses of DYS385 data is that the shorter allele is artificially assigned to DYS385a and the longer allele to DYS385b. Thus, for hg3 Y chromosomes, with high frequency of DYS385a-11 and DYS385b-14 (and their derivatives), this approach would reflect the true allele–locus assignment, and in general the higher the frequency of hg3 chromosomes in a population, the lower the misclassification rate. In contrast, for hg1 Y chromosomes, with high frequencies of DYS385a-14 and DYS385b-11 (and their derivatives), this approach would not reflect the true allele–locus assignment, and the higher the frequency of hg1 chromosomes in a population, the higher the misclassification rate. In our data, the amount of incorrect DYS385 assignment was 40% in the German sample, decreasing to 12.8% in the Polish sample, but increasing to 65.3% in the Spanish sample, which exactly mirrors the relative proportion of hg1 chromosomes. This might be crucial for forensic applications of DYS385 using the traditional simultaneous amplification approach, especially if the number of additional Y-STR loci analysed is low.
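The misclassification rate under the conventional rule follows directly from locus-specific genotypes. As a minimal sketch with hypothetical function names: a chromosome is counted as misassigned whenever the true DYS385a allele is longer than the true DYS385b allele, because the convention would then swap them; chromosomes with identical alleles at both loci cannot be misassigned.

```python
def conventional_assignment(allele_1, allele_2):
    """Conventional rule: shorter allele -> DYS385a, longer -> DYS385b."""
    return (min(allele_1, allele_2), max(allele_1, allele_2))


def misclassification_rate(true_haplotypes):
    """Fraction of chromosomes whose conventional assignment differs from
    the true (DYS385a, DYS385b) genotype obtained by separate amplification."""
    wrong = sum(1 for a, b in true_haplotypes
                if conventional_assignment(a, b) != (a, b))
    return wrong / len(true_haplotypes)
```

For instance, a hypothetical sample of 70 chromosomes in which 28 carry a longer allele at DYS385a than at DYS385b gives a rate of 28/70 = 40%.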

Conclusion

We unexpectedly observed almost identical allele frequency distributions at the two copies of the duplicated Y-STR locus DYS385 in a German sample. Although we initially suspected that intrachromosomal recombination could be responsible, further investigation showed that this phenomenon was restricted to the German sample. Investigation of the Y-SNP background revealed that the nearly identical DYS385 allele frequency distributions in the German sample disappeared when the variation was partitioned by Y-SNP haplogroups. We showed that Y-STR variation depends strongly on the Y-SNP background; in particular two Y-SNP haplogroups (hg1 and hg3) are associated with complementary genotypes at the two DYS385 loci (14/11 and 11/14, respectively). These two haplogroups occur in roughly equal frequency in the German sample, but in different proportions in other samples, explaining the initial finding of nearly identical allele frequency distributions at the two DYS385 loci in the German sample but not other samples. Thus, our data demonstrate that nearly identical allele frequency distributions at two highly polymorphic STR loci embedded in a large duplicated DNA fragment on the Y chromosome, which may be interpreted as the result of reciprocal genetic exchange (i.e. intrachromosomal recombination), can rather be an effect caused solely by population history. This should be taken into consideration when dealing with putative evidence for intrachromosomal recombination, especially on the Y chromosome. Furthermore, our results show that individuals with similar phenotypes at duplicate Y-STR loci can have different genotypes, which has important implications for forensic but also evolutionary applications involving such loci.