Introduction

Mitochondrial and Y-chromosomal markers differ markedly in the way genetic variation is partitioned within and between populations: intrapopulation diversity is usually lower for the Y-chromosome (inherited uniparentally through males) than for mtDNA (inherited uniparentally through females) (eg Jorde et al, 2000; Kayser et al, 2003). These differences could be due to sex-specificity in the rate and scale of migration (including the prevalence of patrilocal marriage practices) and in effective population size (eg Bamshad et al, 1998; Seielstad et al, 1998; Perez-Lezaun et al, 1999; Oota et al, 2001; Kayser et al, 2003); consequently, comparisons between mtDNA and the Y-chromosome have been widely used to characterize sex-specific patterns. However, the differences could also reflect selective pressures acting differently on the two genomic components, as well as their great disparity in molecular characteristics (such as size, copy number, cellular localization, coding proportion and mutation rate) (Cummins, 2001). The discordant conclusions about male and female migration rates and effective population sizes reached by studies using similar mtDNA markers but different Y-chromosomal markers illustrate this problem (Wilder et al, 2004; Wood et al, 2005).

To test to what extent sex-specific processes shape the genetic structure of human populations, it would be desirable to study the variability of markers that are comparable to each other in molecular terms, and located in similar genomic contexts. In addition, to avoid bias arising from differential selection, it would be important to possess information about the selective pressures acting on these markers.

In the present study, we consider an attractive candidate for such a model: markers on the X- and Y-chromosomes, which share homology through a common autosomal origin. This homology ensures they share similar molecular characteristics. The differentiation of the sex chromosomes started 320 million years ago, presumably with a local suppression of recombination between the proto-X and proto-Y (Lahn and Page, 1999). A series of up to five successive inversions was responsible for the complete suppression of recombination between the X- and Y-chromosomes, with the exception of the pseudoautosomal region(s) (Ross et al, 2005). Although sequence divergence resulted from recombination suppression, some non-pseudoautosomal regions have conserved a very high level of DNA similarity (Ross et al, 2005). Exploiting the ancestral homology of the human sex chromosomes and their modes of inheritance (2/3 of X-chromosomes are inherited from females and 1/3 from males), we propose to use XY-homologous markers as a complementary tool to mtDNA and the Y-chromosome to estimate the sex-specific processes shaping the genetic structure of human populations. We focus on the Xp22.3 and Yq11 homologous region in which dinucleotide microsatellite markers have previously been characterized (Malaspina et al, 1997) and studied (Scozzari et al, 1997). These markers are attractive because they can be specifically amplified from each chromosome, and therefore provide unambiguous information about both X- and Y-chromosome diversity. Their similar genomic contexts have already been described (Balaresque et al, 2003).

In this study, we analyze XY-linked microsatellite markers in 33 populations from Africa, Europe and Asia. We carry out an extensive intra- and interpopulation genetic differentiation analysis and evaluate alternative sex-specific processes to explain the observed patterns, asking if these processes differ according to the continental area under study. Our purpose is not to draw conclusions about the precise processes involved in the history of each population, but to assess the potential of XY-homologous markers, and to ask if the development of further marker systems from the extensive available sequence data is worthwhile.

Materials and methods

Samples

A total data set of 2549 X-chromosomes and 1413 Y-chromosomes was analyzed, representing 33 populations located in eight major geographic regions in Europe, Africa and Asia (Figure 1). Most of these data were extracted from Scozzari et al (1997) and Balaresque et al (2004). To complete the data published in the latter paper, we genotyped 386 Y-chromosomes from Ivory Coast (Akan and Yacouba), Ethiopia (Amhara and Oromo), Algeria (Mozabites), Morocco (Berbers), Corsica and the Orkney Islands (Orcadians). These males had already been typed for the X-linked microsatellite.

Figure 1

Map showing the approximate locations of the 33 populations studied. Data extracted from Scozzari et al (1997) are represented by black circles; white circles refer to the data extracted from Balaresque et al (2004) and completed in the present study. The circles indicate the geographical area considered.

PCR amplification

The DYS413a,b microsatellite amplifications followed the protocols of Scozzari et al (1997). The DYS413 polymorphic system corresponds to two coamplified Y-specific loci, each containing a (CA)n microsatellite. Alleles were treated according to Mathias et al (1994): the larger PCR fragment generated was assigned to the allelic class DYS413a and the smaller to DYS413b. Whenever a single band was observed, the presence of two fragments of the same size was assumed. PCR primers were fluorescently labelled and the PCR products were run in a standard 6% denaturing polyacrylamide gel using an ABI 373A automated sequencer (Applied Biosystems). GeneScan and Genotyper software packages (Applied Biosystems) were used to size the amplified alleles.

Information on genomic context for the XY-homologous microsatellites and calculation of the Y/X mutation rate

The (CAIII) microsatellites have been mapped to Xp22.3 and Yq11, are located in one of the homologous regions existing between the X- and Y-chromosomes outside the pseudoautosomal regions I and II, and are shown in Figure 2a. This representation is based on recent information about the molecular structure of the Y-chromosome (Skaletsky et al, 2003) and the X-chromosome (Ross et al, 2005). By an in silico search, we confirmed the location of the DXS8175 microsatellite in homology region 9 (Ross et al, 2005), close to VCX-B1 (VCX10r). On the Y-chromosome, the DYS413a,b microsatellites are located within palindrome P8, which shows arm-to-arm identity of 99.997% (Skaletsky et al, 2003). Figure 2b shows each microsatellite studied within the XY-homologous region and the nearby VCXY genes.

Figure 2

(a) The microsatellites and their surrounding sequences (3 kb) are represented by the black box. The homology region is represented by the two gray boxes on the X- and Y-chromosomes. The detailed structure of the homology region on each chromosome is reported: on the Y-chromosome, we represent the two arms of the palindrome P8 (according to Skaletsky et al, 2003), and on the X-chromosome, the different subparts of the homologous regions (according to Ross et al, 2005). (b) The molecular structure of the CAIII microsatellites and their surrounding sequences (containing VCXY genes) is represented.

This high degree of sequence similarity between the homologous flanking sequences of the X- and Y-linked microsatellites allows us to calculate the ratio of mutation rates between the sequences. This was carried out by counting the number of Y/X nucleotide substitutions in the flanking sequences in different primate species (Balaresque et al, 2003), and converting this to a mean substitution rate by considering the species divergence times, taken from Chen and Li (2001). Using comparisons between human and chimpanzee, human and gorilla and human and orangutan, the mean nucleotide substitution rates were 0.03 substitutions/site/MYA for DXS8175 and 0.20 substitutions/site/MYA for DYS413a,b. The Y/X ratio is then 0.20/0.03=6.6. This value is higher than those reported in the literature (see, for a review, Li et al, 2002), and this point will be addressed in the Discussion.

Measures of genetic diversity

Estimates of intrapopulation diversity

Genetic diversity (h) was estimated as h=1−∑pi², where pi is the estimated frequency of the ith allele/haplotype (Nei, 1987). We used the derived population parameter θH as another expression of the intrapopulation diversity (Watterson, 1975). The population mutation parameter θH (Watterson estimator) is given by θH=h/(1−h)=2Nμ, where N is the effective number of chromosomes and μ the mutation rate. For the sex chromosomes, θH(Y)=2NYμY and θH(X)=2NXμX. Given the different number of chromosomes carried by each sex, NY=Nmales and NX=2Nfemales+Nmales; the ratio θH(Y)/θH(X) in each population therefore depends on the sex-specific effective population sizes and on the Y/X mutation rate ratio.
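
The two estimators above can be illustrated with a minimal Python sketch. It assumes the relations h=1−∑pi² and θH=h/(1−h) as read from the text, applies no sample-size correction, and uses made-up allele calls (`y_sample`, `x_sample`) that are purely hypothetical and not taken from the data set. The function names are likewise illustrative.

```python
from collections import Counter


def gene_diversity(alleles):
    """Return h = 1 - sum(p_i^2) from a list of observed alleles/haplotypes."""
    n = len(alleles)
    if n == 0:
        raise ValueError("at least one allele is required")
    counts = Counter(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def theta_h(h):
    """Return theta_H = h / (1 - h), interpreted in the text as 2*N*mu."""
    if not 0.0 <= h < 1.0:
        raise ValueError("h must lie in [0, 1)")
    return h / (1.0 - h)


# Hypothetical allele calls (repeat-length classes), for illustration only.
y_sample = [17, 17, 18, 19, 19, 19, 20, 21]
x_sample = [15, 15, 15, 16, 16, 17, 15, 16]

h_y = gene_diversity(y_sample)
h_x = gene_diversity(x_sample)

# Ratio theta_H(Y) / theta_H(X) for this toy population.
theta_ratio = theta_h(h_y) / theta_h(h_x)
print(h_y, h_x, theta_ratio)
```

In practice the ratio would be computed per population and then averaged within each geographic group, as done for Table 1.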

Estimates of interpopulation diversity

We used ARLEQUIN version 2.0 software (Schneider, 2000) to perform analyses of molecular variance (AMOVA). AMOVA produces estimates of variance components reflecting the correlation of allelic/haplotypic diversity at different levels of hierarchical subdivision. These included (i) FCT: cluster of subpopulations relative to total population, (ii) FSC: subpopulation relative to cluster of subpopulations and (iii) FST: subpopulation relative to total population. Significance levels of the variance components and the corresponding F-statistics were obtained by comparison of the actual values with the distribution of 10 000 values obtained by randomization.

Estimates of sex-specific processes

The global differentiation coefficients FST(X) and FST(Y) were computed and compared to estimate the relative impact of sex-specific processes (migration rate and effective population size) that shape diversity patterns for the two sexes.

Assuming an island model with neutrality, theory predicts for a given genetic system that FST=1/(1+2ν), where ν=Nm (Wright, 1951), with N the effective number of chromosomes and m the rate of migration. For the X- and Y-chromosomes, we have, respectively, FST(X)=1/(1+2νX) and FST(Y)=1/(1+2νY), where νX=N(X)m(X) and νY=N(Y)m(Y), with N(Y) the effective number of Y-chromosomes, N(X) the effective number of X-chromosomes, m(X) the rate of X-chromosome migration and m(Y) the rate of Y-chromosome migration.

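Then, from these differentiation coefficients FST(X) and FST(Y), we can calculate the νX/νY ratio as follows:

$$\frac{\nu_X}{\nu_Y} = \frac{1/F_{ST}(X) - 1}{1/F_{ST}(Y) - 1} = \frac{F_{ST}(Y)\,[1 - F_{ST}(X)]}{F_{ST}(X)\,[1 - F_{ST}(Y)]} \tag{1}$$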

Using equation (1), in two specific cases, we can evaluate to what extent the migration rate and the effective population size shape the genetic differentiation across all populations.

Hypothesis 1 (H1):

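  • Male and female effective population sizes are equal (Nfemale=Nmale=N), then N(X)=3N(Y); hence, νX/νY depends only on the different migration rates of males and females. As m(X)=(2mfemale+mmale)/3 and m(Y)=mmale, the ratio of female to male migration rates is given by equation (2):

$$\frac{m_{female}}{m_{male}} = \frac{1}{2}\left[3\,\frac{m(X)}{m(Y)} - 1\right] = \frac{1}{2}\left[\frac{\nu_X}{\nu_Y} - 1\right] \tag{2}$$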

Hypothesis 2 (H2):

  • Male and female migration rates are equal (mfemale=mmale=m); hence, m(X)=m(Y)=m, in which case the ratio νX/νY depends only on the different male and female effective population sizes. According to Hartl and Clark (1989), NX=9(NfemaleNmale)/(2Nmale+Nfemale), so, with NY=Nmale, the female to male effective population size ratio is given by equation (3):

$$\frac{N_{female}}{N_{male}} = \frac{2\,(\nu_X/\nu_Y)}{9 - (\nu_X/\nu_Y)} \tag{3}$$
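
The chain from F-statistics to sex-specific ratios under H1 and H2 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: it takes FST=1/(1+2ν), m(X)=(2mfemale+mmale)/3 with m(Y)=mmale, and NX=9NfemaleNmale/(2Nmale+Nfemale) with NY=Nmale, which are the forms that reproduce the ratios reported in the Results. All function names are hypothetical.

```python
def nu_ratio(fst_x, fst_y):
    """nu_X / nu_Y, inverting F_ST = 1 / (1 + 2*nu) for each chromosome."""
    return (1.0 / fst_x - 1.0) / (1.0 / fst_y - 1.0)


def female_male_migration_ratio(mx_over_my):
    """m_female / m_male, assuming m(X) = (2*m_female + m_male) / 3 and m(Y) = m_male."""
    return (3.0 * mx_over_my - 1.0) / 2.0


def female_male_size_ratio(nx_over_ny):
    """N_female / N_male, assuming N(X)/N(Y) = 9r / (2 + r) with r = N_female / N_male."""
    if not 0.0 < nx_over_ny < 9.0:
        raise ValueError("N(X)/N(Y) must lie in (0, 9) under this model")
    return 2.0 * nx_over_ny / (9.0 - nx_over_ny)


# Global differentiation values quoted in the Results.
ratio = nu_ratio(0.03, 0.19)

# H1: equal sizes, so N(X) = 3 N(Y) and m(X)/m(Y) = (nu_X/nu_Y) / 3.
h1_mx_over_my = ratio / 3.0
h1_migration = female_male_migration_ratio(h1_mx_over_my)

# H2: equal migration rates, so N(X)/N(Y) = nu_X/nu_Y.
h2_size = female_male_size_ratio(ratio)

print(round(ratio, 2), round(h1_mx_over_my, 2), round(h1_migration, 2), round(h2_size, 2))
```

The size-ratio function is undefined for N(X)/N(Y) of 9 or more, because under the assumed relation that ratio only approaches 9 as the number of females grows without bound.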

Testing correlation between X- and Y-derived distance matrices

We computed FST genetic distance matrices between populations for the X- and Y-linked markers and compared them using the Mantel test (Mantel, 1967). The significance of the correlation between the two matrices was determined by random permutations (n=1000) and measured by the Z-statistic (Manly, 1997). The computation was performed using the Microsoft Excel macro Mantel.xla v. 1.2 (compiled by Dr RA Briers – r.briers@shef.ac.uk).

Results

X- and Y-linked gene diversity: estimates of θH(Y)/θH(X) ratio

The mean values of DYS413a,b and DXS8175 haplotype/gene diversity for each group of populations are reported in Table 1. We calculated an estimate of the mean and the variance of the θH(Y)/θH(X) ratio for each group of populations considered. On average, θH(Y)/θH(X)=1.795±1.495, and this ratio varies drastically between and within the geographic areas considered (Table 1). Extreme values are reached in Asia (3.662±2.005) and in West Africa (0.511±0.291).

Table 1 DXS8175 and DYS413a,b gene diversity (mean±SD) for the geographic area under study and estimates of θH(Y)/θH(X) ratio

Distribution of the genetic variability within and between populations: analysis of molecular variance and genetic distances

Y-linked genetic variation

The overall FST value for all 33 populations is 0.194 (P<0.001), indicating that a large proportion of the overall Y-chromosomal variation (80.6%) results from intrapopulation differences (compared to the X data here, Table 2). Overall FST values calculated for all populations of Africa, Europe and Asia are, respectively, 0.240, 0.068 and 0.032, reflecting a larger genetic differentiation of African populations than in other groups. The North vs South Sahara populations show a high degree of differentiation (FCT=0.184, P<0.01) and these two groups seem heterogeneous (FSC=0.172, P<0.001). A very similar pattern is observed when eastern and western African populations are grouped (FCT=0.160, P<0.04; FSC= 0.147, P<0.001). In West Africa, the FST value is 0.174 (P<0.001), the lowest value found in Africa and lower than the average Y-linked differentiation between all populations (0.194). This reflects a lower genetic differentiation between these populations in this geographic area. In Europe, no genetic differentiation appears by grouping Northern and Southern populations (FCT=−0.013, NS).

Table 2 Distribution of total genetic diversity within populations, among populations within the groups established and among these groups for the DXS8175 and DYS413a,b loci

X-linked genetic variation

The overall FST value calculated for all populations is 0.041 (P<0.001), indicating that most of the X-chromosomal variation results from intrapopulation differences (96%, Table 2). This tendency remains unchanged when we consider populations from Africa (FST=0.034, P<0.001), Europe (FST=0.002, NS) or Asia (FST=−0.014, NS). North vs South Sahara populations show the highest FST value in Africa (FST=0.04, P<0.001), suggesting a high degree of genetic differentiation between these two groups (FCT=0.03, P<0.01). This kind of genetic structuring cannot be clearly seen between western and eastern groups in Africa (FCT=−0.006, NS). The West African populations show a low level of genetic differentiation (FST=0.009, P<0.01). In Europe, no genetic differentiation is observed between groups from the North and South (FCT=0.004, NS).

Testing correlation between X- and Y-derived distance matrices

We compared pairwise FST distance matrices for the X- and Y-linked markers by the Mantel test. We find a lack of association between these matrices (r=−0.023; z=0.065), which confirms the discordant pattern of genetic variation between X and Y-linked homologous markers in the AMOVA analyses.

Estimates of sex-specific parameters inducing differential genetic differentiation of populations

Considering a neutral model in which FST=1/(1+2ν), with 2ν=2Nm, and applying this model to the X- and Y-chromosomes, we used equations (1), (2) and (3) to evaluate both the sex-specific effective population size and migration rate ratios. Global genetic differentiation values of FST(X)=0.03 and FST(Y)=0.19 were input into equation (1) and yielded a ratio νX/νY=7.58.

Under H1 (Nfemale=Nmale=N), this yields m(X)/m(Y)=2.53 and mfemale/mmale=3.29. Under this hypothesis, the migration rate should be three-fold higher for females than for males to explain the global diversity pattern.

Under H2 (mfemale=mmale=m), N(X)/N(Y)=7.58 and Nfemales/Nmales=10.71. Under this hypothesis, the effective population size of females should be 10-fold higher than that of males to explain the results. To provide a wider picture of the variation of these sex-specific processes among populations, we show the wide range of scenarios compatible with our X- and Y-genetic distances in Figure 3.

Figure 3

(a) The curve represents all potential scenarios compatible with an average genetic differentiation between all populations corresponding to FST(X)=0.03 and FST(Y)=0.19. The scenario corresponding to each geographic area has been deduced from the gene diversity values. (b) Details of the general curve with most of the scenarios involving sex-specific processes explaining X- and Y-linked genetic differentiation between populations are seen.

Inference of sex-specific processes using intrapopulation information

In order to reconcile the X- and Y-linked data under the neutral island model considered above, the migration rate of females must be three times higher than that of males (given equal effective population sizes), or the effective population size of females must be 10 times higher than that of males (given equal migration rates).

According to Slatkin (1985), in a structured system with symmetric migration, migration has an important impact only on interpopulation differentiation, and a much smaller impact on intrapopulation diversity. Assuming symmetric migration, we consider the theoretical case in which the 33 populations studied are at equilibrium, to explain the variation of θH(Y)/θH(X) observed among populations. We know that θH(Y)/θH(X) depends on the effective population size (N) and the mutation rate (μ). If we assume that μ(Y)/μ(X)=6.6 (see above), we can explain the different θH(Y)/θH(X) observed between populations by the Y/X effective population sizes as follows: taking the mean value of θH(Y)/θH(X) in all populations to be 1.79 (see Table 1) and μ(Y)/μ(X)=6.6, we have 6.6NY/NX=1.79. We found that NX=3.68NY, corresponding to Nfemale/Nmale=1.38. Assuming that this value is close to 1 (equal male and female effective population sizes), we can deduce from NX=3.68NY and from equation (2) the corresponding migration rate ratios: m(X)/m(Y)=7.58/3.68=2.06 and mfemale/mmale=2.59.

Thus, for similar male and female effective population sizes, μ(Y)/μ(X)=6.6 and θH(Y)/θH(X)=1.79, the migration rate of females must be 2.59 times higher than that of males to reconcile differences in X- and Y-linked genetic variation. We positioned on Figure 3 all the values of θH(Y)/θH(X) calculated for the eight groups of populations considered. This shows that a wide range of migration and demographic scenarios is compatible with our data and can be invoked to explain the drastic variation of θH(Y)/θH(X) between the different geographical areas (from mfemale/mmale=1.43 and Nfemale/Nmale=3.77 in South Africa to mfemale/mmale=5.89 and Nfemale/Nmale=0.49 in Asia). The outlying value for the western African populations suggests an opposite scenario, involving a higher migration rate or effective population size for males compared to females.
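
The arithmetic of this section can be retraced with a short Python sketch. It assumes θH(Y)/θH(X)=[μ(Y)/μ(X)]·NY/NX, together with the same relations for m(X) and NX used under H1 and H2 (m(X)=(2mfemale+mmale)/3 and NX/NY=9r/(2+r), with r=Nfemale/Nmale). The function name and return layout are illustrative, and small differences in the last digit relative to the text reflect rounding of intermediate values.

```python
def infer_from_theta(theta_ratio, mu_ratio, nu_x_over_nu_y):
    """Combine intra- and interpopulation information.

    theta_ratio     : theta_H(Y) / theta_H(X)
    mu_ratio        : mu(Y) / mu(X)
    nu_x_over_nu_y  : nu_X / nu_Y obtained from the F_ST values
    """
    # theta_H(Y)/theta_H(X) = mu_ratio * N_Y / N_X  =>  N_X / N_Y = mu_ratio / theta_ratio
    nx_over_ny = mu_ratio / theta_ratio
    # Assumed relation N_X/N_Y = 9r / (2 + r), r = N_female / N_male.
    size_ratio = 2.0 * nx_over_ny / (9.0 - nx_over_ny)
    # nu = N * m, so m(X)/m(Y) = (nu_X/nu_Y) / (N_X/N_Y).
    mx_over_my = nu_x_over_nu_y / nx_over_ny
    # Assumed relation m(X) = (2*m_female + m_male) / 3, m(Y) = m_male.
    migration_ratio = (3.0 * mx_over_my - 1.0) / 2.0
    return nx_over_ny, size_ratio, mx_over_my, migration_ratio


# Mean values quoted in the text: theta ratio 1.79, mutation ratio 6.6, nu ratio 7.58.
nx_ny, n_f_m, mx_my, m_f_m = infer_from_theta(1.79, 6.6, 7.58)
print(round(nx_ny, 2), round(n_f_m, 2), round(mx_my, 2), round(m_f_m, 2))
```

Calling the same function with the group-specific θH(Y)/θH(X) values from Table 1 would give the set of scenarios positioned on Figure 3.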

Discussion

Few studies have described the genetic variability of XY-homologous markers in human populations (Scozzari et al, 1997; Karafet et al, 1998; Kotliarova et al, 1999; Carvalho-Silva and Pena, 2000) compared to those studying mitochondrial and Y-chromosomal markers (eg Seielstad et al, 1998; Jorde et al, 2000). The usefulness of XY-homologous markers studied so far has been compromised by the technical problem that the X- and Y-linked copies cannot be individually amplified. Coamplification limits the use of classic estimators for inferring the features of intra- and interpopulation differentiation (eg Karafet et al, 1998). In this paper, we used the DXS8175 and DYS413a,b homologous microsatellites, which can be specifically amplified from the X- and Y-chromosomes.

Higher Y- than X-linked genetic differentiation: possible and testable explanations

Our findings, based on a data set comprising 33 populations from Europe, Asia and Africa, reveal a clearly discordant pattern of genetic variation between X- and Y-homologous markers for all geographical areas considered, with between-population differentiation always greater for the Y- than for the X-chromosome. Two hypotheses could explain this result: sex-specific processes or differential selection. We consider these in turn.

Sex- and population-specific processes?

Sex-specific processes could involve a lower migration rate of males compared with females, due to patrilocality, and/or a lower effective population size for males compared with females due to demographic processes.

In order to test the impact of such processes, we compared X- and Y-linked F-statistics calculated including all populations. However, a given F-statistic value is compatible with many different scenarios. We next analyzed the variability within each population for X- and Y-linked markers using Watterson's estimator θ. By comparing FST(X) and FST(Y) values (assuming a simple island model, FST=1/(1+2Nm)), we showed that if all the populations considered in the present study were at equilibrium and exchanged migrants symmetrically, the X/Y genetic differentiation could be explained by sex-specific migration processes (mfemale/mmale=3.29), by sex-specific demographic processes (Nfemales/Nmales=10.71) or by a combination of both.

It seems unrealistic to draw conclusions using a simple and unique scenario involving the different sex-specific processes. In general, our data showed that a lower migration rate of males compared with females (due to the tendency for a female to move into her husband's natal household) and a lower effective population size for males compared with females (due, for example, to the prevalence of polygyny over polyandry, or to other cultural practices) could be two important factors explaining genetic structuring in human populations (eg Salem et al, 1996; Bamshad et al, 1998; Sibert et al, 2002). This general tendency may be reversed in some populations: in West Africa (including populations originating from the Bantu expansions), the results suggest an opposite scenario. A similar result has been reported by Hammer et al (2001) and Wood et al (2005), providing evidence that the different approaches converge.

The large fluctuation of the ratio of Watterson's estimators, θH(Y)/θH(X), between populations in the same region suggests that these sex-specific features may be specific to each population and that they may vary whatever the geographical scale considered. Variations due to local cultural practices (including language affinities between populations) are consistent with these findings (Oota et al, 2001; Hamilton et al, 2005; Wood et al, 2005).

The ratio of the Y/X substitution rates used in our calculations has a major influence on our estimates of sex-specific parameters. This value, 6.6, is higher than previously reported values, which vary from 2.2 to 2.8 in higher primates (Li et al, 2002). The two Y-linked sequences studied here lie within each arm of a ‘palindrome’ – large highly similar repeat units arranged ‘tail-to-tail’ (Figure 2); gene conversion processes are known to be highly active between such sequences on the Y-chromosome (Rozen et al, 2003), and could act (as they do in the direct repeats flanking the AZFa region; Bosch et al, 2004) to enhance interspecific sequence divergence.

Sex-specific selection?

In assessing the potential impact of differential selection on genetic markers, we need to consider both direct effects on the sequences under consideration, and indirect effects due to linkage.

We have assumed a minimal impact of differential selection acting directly on the XY-homologous markers studied here, because of the similarity in genomic context (Balaresque et al, 2003), and despite the 50 million years separating each from a common ancestor (Lahn and Page, 1999). Although there is no reason to expect selective pressures to act directly on microsatellites in noncoding regions, the low sequence divergence of flanking sequences among primate species suggests that selection may be playing a role (Balaresque et al, 2003). The VCX and VCY genes are in close proximity to the microsatellites, but given that they appear to be under similar selective pressures (Lahn and Page, 1999), this should result in equivalent selective bias at both the X and Y loci.

Indirect selection on the microsatellites is affected by the recombination behavior of the sex chromosomes. As the Y-chromosome is largely nonrecombining, it represents only a single realization of the evolutionary process, and selection acting anywhere along its length could affect the diversity of DYS413a,b. However, this problem is unavoidable in any study, whether it compares the Y with the X, or with mtDNA. Selection could also be affecting DXS8175 through linkage to a locus under selection on the X-chromosome; however, because of recombination in females, the X-chromosome represents not one but many genealogies, with different selective histories.

The ideal marker for examining sex-specific phenomena (uniparentally inherited through males and females, but having similar molecular characteristics) does not exist. However, a combination of Y-chromosomal and mitochondrial haplotyping with the analysis of XY-homologous loci may prove to be a powerful tool. A large potential resource of XY-homologous markers exists and could be readily accessed by simple in silico searches. Analysis of additional markers mapping to different regions of the sex chromosomes would obviate the effects of selection on individual X-linked loci, and would also allow the use of single-copy Y-specific loci, without the complicating factor of gene conversion processes.