Introduction

Arabia has played the role of a strategic crossroads between Africa and Eurasia, facilitating the first exodus of modern humans from the Horn of Africa to present-day Yemen through the Bab-el Mandeb Strait at the mouth of the Red Sea.1, 2, 3 Subsequent migrations through the northern intercontinental passageway between Africa and the Levant (the Levantine corridor) have also been documented.4, 5 In addition, the Arabian Peninsula has linked the distant populations of China and India to communities of the Mediterranean and beyond. Although the Persian Gulf to the east and the Arabian Sea to the south offered easy passages to India and Asia, the Red Sea on the western coast of the Arabian Peninsula provided a natural connection to the Mediterranean Sea.

Just north of the peninsula, the Nile River Valley in Egypt and the Tigris–Euphrates area in Iraq comprised a region known as the Fertile Crescent. Recognized as the birthplace of agriculture during the Neolithic (∼8000 yBP) based on linguistic and archaeological evidence,6, 7 the Fertile Crescent participated in ancient international trade. Although the fertile soils produced a surplus of food, the region lacked the natural resources necessary for building permanent structures (timber) or making metals (minerals). Therefore, early inhabitants relied on trade to acquire these raw materials and established close links with the commercial centers along the Persian Gulf as reflected in archaeological finds.8, 9, 10 At the extreme southern end of the Arabian Peninsula, referred to as Arabia Felix by the Romans (‘Happy Arabia’ in Latin) and including present day Yemen, the spice trade was an important source of wealth. Frankincense and myrrh were commonly exported to the Mediterranean via camels and to India by sea.

In agreement with archaeological and historical records that accentuate the region's active role as a point of contact between distant populations, the Middle East displays a high degree of genetic diversity.11, 12, 13, 14 Although genetic diversity is elevated, various analyses have identified structural barriers to gene flow into and out of the Near East. Specifically, mtDNA,15 Y-chromosome14, 16, 17, 18 and autosomal STR studies19 have identified the Dasht-e Kavir and Dasht-e Lut deserts in Iran and the Hindu Kush mountains in eastern Afghanistan as potential barriers to gene flow to the surrounding regions. In contrast, geographic facilitators of gene flow have also been described, including Balochistan, a region along the southern coast of Iran, Afghanistan and Pakistan that mediates gene flow from South Pakistan to South Iran.14

Mitochondrial DNA analyses have been performed on collections from Qatar, United Arab Emirates (UAE) and Yemen,5, 20 yet the paternal component of this historically and geographically significant region is incomplete. Although Y-chromosome studies have focused on neighboring areas, including Egypt,4 Somalia,21 Iraq,22 Syria and Lebanon23 as well as on the southern Arabian populations of Oman4 and Yemen,24 high resolution Y-chromosome analyses of the Persian and Oman Gulfs are fragmentary.

To gain a more complete understanding of this region's role in human dispersals, particularly in light of previous studies that have identified barriers and conduits to gene flow that would affect its Y-haplogroup substructure, the present study employs high-resolution Y-chromosome analyses of three southern Arabian populations: Yemen (n=62), Qatar (n=72) and the UAE (n=164). In addition, 17 Y-STR loci were typed to obtain STR-based age estimates for a selection of informative Y-chromosome haplogroups in the populations in which they were observed. Results from these Y-specific analyses were interpreted in conjunction with data on 15 autosomal STR loci for Yemen, Oman, Qatar, Iran, Egypt19, 25, 26 and UAE (Cadenas, unpublished results) reanalyzed collectively with the aim of exposing characteristics unique to the southern Arabian Peninsula.

Materials and methods

Sample collection and DNA extraction

Blood samples from 298 unrelated males representing three populations (UAE, Qatar and Yemen) were collected in EDTA Vacutainer tubes. The paternal ancestry of the donors was recorded for a minimum of two generations. Table 1 provides additional information on the sample size, geography and linguistic affiliation of the populations involved. DNA was extracted from the blood using the phenol–chloroform extraction method.33 The study was conducted in strict compliance with NIH ethical guidelines as well as those stipulated by the institutions involved.

Table 1 Geographic and linguistic description of populations analyzed

Y-haplogroup analysis

Seventy-six binary genetic markers were genotyped12, 30, 34, 35, 36 following the Y-chromosome phylogeny hierarchy using standard methods, including PCR/RFLP, allele-specific PCR37 and the YAP polymorphic Alu insertion.38 The amplicons generated from these methods were separated by electrophoresis in 1X TAE, 3% agarose gels and visualized subsequent to ethidium bromide staining and UV light photography in a Fotodyne FOTO/Analyst®. The phylogenetic relationships of the relevant Y-chromosome haplogroups are illustrated in Figure 1 according to YCC nomenclature39 with new marker designations as provided in the published literature.12, 30, 34, 35, 36

Figure 1

Hierarchical phylogenetic relationships of Y-chromosome haplogroups and genotypic frequencies (percentages) observed for Qatar, UAE and Yemen. Ten markers shown in italics were not genotyped and are included for context, including five (M89, p12f2, M4, M173 and M17) that are equivalent to the binary markers typed in the present study. The following 21 markers were typed but not observed in the three populations: M131, M210, M148, M224, M281, V6, P16, M286, Apt, M258, M321, M68, M158, M289, M318, M319, M353, M317, M122, M18 and M75.

Statistical and phylogenetic analyses

Twenty-nine geographically targeted populations reported in previous studies (Table 1) were included in the statistical and phylogenetic analyses performed to assess Y-haplogroup variation and phylogeographic relationships throughout the region. The Georgia and Tajikistan data will be published in detail elsewhere. The various data sets were used at a resolution of major haplogroups (A through R). Haplogroup frequencies were compared by means of a χ2-test. Phylogenetic comparisons were made with multidimensional scaling (MDS) analysis based on Fst distances40 using the Statistical Package for the Social Sciences (SPSS) software program.41 Genetic structure was further examined by performing two sets of analyses of molecular variance (AMOVA)42 using the Arlequin version 2.000 package43 with the 32 populations subdivided according to two criteria, geography (North Africa, East Africa, Arabian Peninsula, Caucasus, Levant, Anatolia, Iranian Plateau, South Asia and Central Asia) and linguistic family (Afro-Asiatic, Indo-European, Niger-Congo, Altaic and South Caucasian). Table 1 indicates the populations included in each of the geographic and linguistic groups utilized in the AMOVA. Pairwise comparisons of the populations from the present study and all reference populations were generated using G-tests in Carmody's software44 to assess any genetic differences of statistical significance.

STR analysis

DNA amplification of 17 Y-specific STR loci (specifically DYS19, DYS385a/b, DYS389I/II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635 and Y-GATA H4) was performed using the AmpFlSTR Yfiler Amplification Kit (Applied Biosystems, Foster City, CA, USA) according to the manufacturer's instructions in an Eppendorf® Mastercycler®. DNA fragment separation and detection was achieved in an ABI Prism 3100 Genetic Analyzer (Applied Biosystems). ABI Genescan500 LIZ was utilized as an internal size standard. Amplicon sizes were determined using the Genescan® 3.7 software and alleles were designated by comparison to an allelic ladder from the manufacturer using Genotyper® 3.7 NT software.

Haplogroup-specific expansion times were estimated for select binary haplogroups (J1-M267, R1a1-M198, E3b1a-M78 and E3b1c-M123) by the linear expansion method. This procedure assumes a stepwise mutation model45 and a mean STR mutation rate of 0.00069 per STR locus per generation46 with a 25-year intergeneration time as performed in previous studies.4, 12, 47 The linear expansion method assumes a star-like genealogy attributable to continuous growth, in which the expected STR variance (S) equals the mutation rate (μ) times the number of generations since expansion, so that the average coalescence time for STR alleles (T) is estimated as S divided by μ.48, 49 STR variances were calculated using the vp equation of Kayser et al.50 In addition, STR-based divergence times were calculated for each of the haplogroups based on the method described by Zhivotovsky et al,46, 51 likewise assuming a mutation rate of 0.00069 per STR locus per generation46 and a 25-year intergeneration time.
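As a rough illustration of the linear expansion estimate, the sketch below converts a mean STR variance into an age using the mutation rate and generation time stated above. The function name is hypothetical, and the inputs are the rounded J1-M267 variances quoted in the Discussion, so the output is expected to be close to, but not identical with, the linear expansion values reported in the Results.

```python
# Sketch of the linear expansion age estimate: T = S / mu generations.
MU = 0.00069            # mean mutation rate per STR locus per generation (from the text)
GENERATION_YEARS = 25   # intergeneration time in years (from the text)


def linear_expansion_age_ky(str_variance, mu=MU, generation_years=GENERATION_YEARS):
    """Return the expansion time in thousands of years for a mean STR variance."""
    generations = str_variance / mu
    return generations * generation_years / 1000.0


# Rounded J1-M267 variances as quoted in the Discussion (not the unrounded inputs).
for population, variance in [("Qatar", 0.14), ("UAE", 0.15), ("Yemen", 0.20)]:
    print(population, round(linear_expansion_age_ky(variance), 2))
```

With the rounded variance of 0.15 for UAE, the sketch gives about 5.43 ky, in line with the reported value; the small departures for Qatar and Yemen are consistent with rounding of the variances.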

Autosomal STR markers

To further assess the level of homogeneity of the populations under study, the observed and expected heterozygosity for 15 autosomal STR loci were calculated using the Arlequin version 2.000 package43 based on the genotypes for Iran,25 Qatar,26 UAE (Cadenas, unpublished results), and Kenya, Egypt, Oman and Yemen.19 Heterozygote deficiencies (Fis and corresponding P-values) were computed for all seven populations using the GENEPOP2 software52 according to the test described by Rousset and Raymond.53
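The quantities involved can be illustrated with a minimal sketch. It assumes the textbook definitions (observed heterozygosity as the fraction of heterozygous individuals, unbiased gene diversity as the expected heterozygosity, and Fis as one minus their ratio); Arlequin and GENEPOP apply their own estimators and significance tests, which are not reproduced here. The genotypes and function names are hypothetical.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles at the locus."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_heterozygosity(genotypes):
    """Unbiased gene diversity: n/(n-1) * (1 - sum p_i^2), with n gene copies."""
    alleles = [allele for genotype in genotypes for allele in genotype]
    n = len(alleles)
    counts = Counter(alleles)
    sum_squared = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1.0 - sum_squared)


def fis(genotypes):
    """Simplified inbreeding coefficient: positive values mean heterozygote deficiency."""
    return 1.0 - observed_heterozygosity(genotypes) / expected_heterozygosity(genotypes)


# Hypothetical genotypes (allele repeat counts) for eight individuals at one STR locus.
sample = [(12, 12), (12, 13), (13, 13), (14, 14),
          (12, 14), (13, 13), (12, 12), (14, 15)]

print(observed_heterozygosity(sample))
print(expected_heterozygosity(sample))
print(fis(sample))
```

In this invented sample, three of eight individuals are heterozygous while the allele frequencies predict roughly 74%, so Fis is positive, the pattern interpreted in the text as heterozygote deficiency.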

Results

Phylogeography

A total of 41 paternal haplogroups were identified from the analysis of 164 UAE, 72 Qatari and 62 Yemeni males. Figure 1 displays their hierarchical phylogeny as well as the frequency (percentage) distributions for the populations under study. The geographic distribution of the major haplogroups is illustrated in Figure 2. Only three haplogroups (E, J and R) display frequencies above 5% in the three populations occupying the southern portion of the Arabian Peninsula; combined, they account for 74–98% of the chromosomes within these collections. Figure 3 displays the geographic distribution of E, J and R derivatives in a subset of the populations listed in Table 1.

Figure 2

Geographic distribution of major Y-chromosome haplogroup frequencies for the 32 populations described in Table 1. (a) Geographic location of regions described in the text.

Figure 3

Geographic distribution of Y-chromosome haplogroup frequencies in selected populations (a) E clade, (b) J clade and (c) R clade.

AMOVA

To explore potential correlations between genetic diversities and linguistic or geographic partitioning, the AMOVA was performed. The results of the AMOVA for the three populations from the present study in addition to 29 reference populations are listed in Table 2. Assignment of the populations according to the nine geographical groups described in Table 1 generated a higher fraction of variability among groups of populations (15.73%) than among populations within groups (4.21%) indicating a greater degree of interregional structuring. In contrast, upon subdividing the populations according to the language family (Table 1), the percentage of among group variance is lower (8.17%) than among populations within groups (12.94%), suggesting higher intralinguistic structuring for these populations. Genetic diversity among groups of populations and populations within groups correlate significantly with geographic and linguistic partitioning.

Table 2 AMOVA results

Phylogenetic analyses

An MDS analysis was performed to assess phylogenetic relationships among populations. The MDS analysis performed on a matrix of Fst values based on haplogroup frequencies for the populations in Table 1 is displayed in Figure 4. Geographic structuring is observed, with populations displaying affinities to other populations within their biogeographic zone. Within the plot, the populations of Egypt, Iraq, Yemen, Qatar, Oman, UAE, Syria and Lebanon occupy an intermediate position, with populations from Africa on one side and Anatolia, Caucasus, Iranian Plateau, Central Asia and South Asia on the other. Of note in the observed partitioning is the affinity of Egypt to populations from the Arabian Peninsula. Furthermore, Yemen and Qatar segregate together but separate from their neighboring populations, Oman and UAE, particularly along Dimension 2. As expected, the populations from Central Asia group together and away from the South Asian ones. However, North Pakistan segregates with populations from Central Asia, whereas South Pakistan shares a closer affinity to populations to the west.

Figure 4

MDS analyses using Fst distances based on Y-haplogroup frequency data from the 32 populations described in Table 1.

Interpopulation haplogroup diversity

To discern statistically significant genetic differences, pairwise G-test comparisons were performed with the three populations from the present study as well as the 29 reference populations described in Table 1. A total of 496 pairwise assessments were made and the results are provided as Supplementary Table 1. The number of nonsignificant genetic differences observed was 31. Of note, the Lebanese and Syrian populations do not display statistical differences with populations from UAE, Qatar and Oman but do show significant differences with Yemen. Furthermore, Yemen is the only one to display statistically significant differences to all other populations in the analysis. Across all populations, the Lebanese and Syrians are involved in most of the pairings in which no significant difference is observed (α=0.05). In addition to the three south Arabian populations of Qatar, UAE and Oman, the Algerian Berbers, Greece and Tajikistan do not exhibit significant differences to Lebanon and Syria. However, Turkey and South Iran only exhibit nonsignificant values in pairwise comparisons with Syria.
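The number of pairwise assessments follows directly from the number of populations. The short check below is a sketch, with variable names chosen for illustration, confirming that 32 populations yield 496 unordered pairs and how many of those remain significant given the 31 nonsignificant comparisons reported above.

```python
from math import comb

n_populations = 32                    # 3 study populations + 29 reference populations
n_pairs = comb(n_populations, 2)      # unordered pairs: 32 * 31 / 2
nonsignificant_pairs = 31             # count reported in the text

print(n_pairs)                         # 496
print(n_pairs - nonsignificant_pairs)  # 465 pairs with significant differences
```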

Y-STR diversity

Y-chromosome STR diversity was ascertained to generate haplogroup age estimates. Results from the expansion time analysis of populations for haplogroups J1-M267, R1a1-M198, E3b1a-M78 and E3b1c-M123 are provided in Table 3. Y-STR data for the individuals genotyped are provided in Supplementary Tables 2 (J1-M267), 3 (R1a1-M198), 4 (E3b1a-M78) and 5 (E3b1c-M123). The STR-based divergence times obtained using the method described by Zhivotovsky et al46 (Table 3) for M123 are comparable (11.1 ky for UAE and 10.6 ky for Yemen). However, the UAE and Yemeni haplotypes within this haplogroup are quite different from each other and did not form a compact network, suggesting that the ancestors of the M123 chromosomes in both populations underwent independent bottlenecks followed by similar demographic processes. In contrast, the J1-M267 haplotypes formed a compact network across all three populations and generated older age estimates for Yemen, Qatar and UAE (9.7, 7.4 and 6.4 ky, respectively) in comparison to the linear expansion method (7.11, 4.93 and 5.43 ky, respectively).

Table 3 Y-Haplogroup variance, expansion and coalescence times based on Y-microsatellite loci

Intrapopulation autosomal STR diversity

In order to determine the level of heterozygote deficiency, possibly resulting from consanguinity, autosomal STR diversity was examined. Table 4 presents the observed and expected heterozygosity values of 15 autosomal STR loci for Kenya, Egypt, Iran, UAE, Oman, Yemen and Qatar with the Fis values provided in the final column for each population. A gradient in the number of loci that exhibit significant (P<0.05) heterozygote deficiency is apparent moving north then east with lower values from Africa (1/15 loci for Kenya and Egypt) and higher amounts to the west and south from the Iranian Plateau (2/15 in Iran) toward the southern populations of the Arabian Peninsula (2/15 for UAE, 3/15 for Oman, 4/15 for Yemen and 8/15 for Qatar).

Table 4 Heterozygote deficiency based on 15 autosomal STR loci

Discussion

Analyses of the South Arabian Y-haplogroup substructure as well as the region's phylogenetic relationships to neighboring populations have provided information on the following points: (1) support for the role of the Levant in the Neolithic dispersal of the E3b1-M35 derivatives, (2) a Neolithic spread of the J1-M267 haplogroup from the north, (3) a high haplogroup diversity shared among populations along the eastern and western coasts of the Gulf of Oman and (4) a limited haplogroup diversity in Yemen also supported by significant heterozygote deficiencies at various hypervariable autosomal STR loci.

Distribution of E3b1-M35 derivatives

The presence of signature sub-Saharan African mtDNA lineages in the south Arabian populations has been attributed to various waves of gene flow to the region, including that associated with the East African slave trade. This is apparent from the exact mtDNA haplotype matches between lineages in Yemen and East Africa, including those associated with the Bantu expansion.20 The presence of the E3a-M2 lineage in Oman (7.4%),4 Yemen (3.2%), UAE (5.5%) and Qatar (2.8%) could lead to the oversimplified conclusion that these chromosomes are also a contribution from the East African slave trade. Mitochondrial DNA analysis of the Yemen Hadramawt indicates recent gene flow (∼2500 yBP) from Africa to the Arab populations in part through the slave trade, yet an ancient arrival from East Africa is responsible for the Y-chromosome haplotypes.54 The contrast between female- versus male-mediated gene flow between these two areas can be attributed to the assimilation of females within the Arabian populations, whereas the males were often excluded from reproductive opportunities. The E3b1-M35 sub-haplogroups, M123 and M78, are believed to have spread from East Africa to North Africa and later expanded eastward through the Levantine corridor and westward to northwestern Africa. Although E3b1a-M78 data suggest that this dispersal occurred in both directions,4, 34, 47 E3b1c-M123 disseminated primarily to the east.4 The distribution of the E3b1-M35 derivatives in Yemen, Qatar and UAE agrees with their arrival by expansion via the Levantine corridor rather than through the Horn of Africa. This route is similar to general patterns of Levantine mtDNA gene flows during the Upper Paleolithic55 to the Neolithic.5, 55 This is immediately apparent by the M35 profile of several East African populations. 
Despite characterizing the East African populations and persisting even after introduction of E3a-M2 during the Bantu expansion, E3b1*-M35 is completely absent from the Omani,4 Qatari and UAE collections and relatively low in the Yemeni (3.2%). Kenya, Sudan and Tanzania4, 56, 57 also lack the E3b1c-M123 derivative that is common in the Near East.12, 56, 57, 58 Furthermore, Ethiopia56 and Somalia21 exhibit high levels of E3b1a-M78 (22.7 and 77.6%, respectively), which is null or nearly absent in the two populations closest to the Strait of Sorrows (Bab-el Mandeb Channel), Yemen (0%) and Oman (1.7%),4 (χ2=170.618, d.f.=1, P<0.0001 when combining the frequencies for Ethiopia and Somalia versus Yemen and Oman).

On the other hand, Cruciani et al57 have postulated that the E3b1c-M123 clade may have originated in the Near East, as its presence in East Africa is restricted to Ethiopia (11.2%). The median expansion time for M123 in Egypt is 10.8 ky,4 comparable to the estimated age of M123 STR variation obtained through the method described by Zhivotovsky et al46 for UAE (11.1±3.9 ky) and Yemen (10.6±4.1 ky), although allelic differences between these two populations indicate that they do not share a common ancestry. Recent archaeological finds support a trading relationship between Mesopotamia and the Arabian Gulf region dating back to the Al Ubaid Period (∼7000 yBP), as evidenced by the excavation of Ubaid pottery from Mesopotamia in UAE.8, 9, 10 Ancient maritime trade routes linking Mesopotamia to the Indus Valley included Dilmun (the island of Bahrain) and Magan (in the southeastern tip of the Arabian Peninsula). It is possible that the close ties of Mesopotamia with both the Nile River Valley and the ancient Persian Gulf region during the Neolithic helped disseminate these haplogroups.

UAE is characterized by polymorphic levels of E3b1a-M78 (7.9%), similar to the Qatari (4.2%; χ2=1.12, d.f.=1, P=0.29), with lower values in Oman4 (1.7%; χ2=5.49, d.f.=1, P=0.02) and greater frequencies in Egypt4 (18%; χ2=6.73, d.f.=1, P=0.01) where it is the highest M35 derivative. The majority of the UAE M78 representatives belong to the E3b1a3-V22 clade (6.7%). STR networks of this newly defined marker indicate that it parallels the M78 haplotype cluster δ, although some discrepancies exist.36 Based on the distribution and high STR differentiation of cluster δ, its dispersal may have occurred early, the first to spread the E3b1a-M78 chromosomes to North Africa and then the Near East.57
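The UAE versus Qatar comparison can be retraced with a minimal sketch of a Pearson χ2-test on a 2×2 table. The carrier counts are an assumption: they are back-calculated from the reported percentages and sample sizes (7.9% of 164 is about 13; 4.2% of 72 is about 3) rather than taken from the source tables, and the function names are illustrative. No continuity correction is applied.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variate with one degree of freedom."""
    return erfc(sqrt(chi2 / 2.0))


# Assumed counts, back-calculated from the percentages quoted in the text.
uae_m78, uae_n = 13, 164       # 7.9% of 164 UAE males
qatar_m78, qatar_n = 3, 72     # 4.2% of 72 Qatari males

chi2 = chi_square_2x2(uae_m78, uae_n - uae_m78, qatar_m78, qatar_n - qatar_m78)
p = p_value_df1(chi2)
print(round(chi2, 2), round(p, 2))
```

Under these assumed counts the statistic and P-value round to the values reported for the UAE and Qatar comparison (χ2=1.12, P=0.29).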

Origin of J1-M267

Previous studies on haplogroup J1-M267 have documented high frequencies of this haplogroup in the areas of Oman (38%),4 Iraq (33.1%),22 Egypt (20%),4 Lebanon (12.5%)23 and Turkey (8.99%).12 The combination of these data with the high frequency of J1-M267 in the Yemeni (72.6%), Qatari (58.3%) and UAE (34.8%) samples examined in the present study reveals a decreasing frequency moving from southern Arabia northwards (Spearman's correlation coefficient with ranks based on distance from Yemen: r=0.9286, n=8, P<0.01). It is also distributed throughout the northwestern African populations at considerable frequencies (35.0 and 30.1% in Algeria and Tunisia, respectively).58 Based on binary and STR markers, the greatest degree of differentiation for J1-M267 is detected in the Levant with two distinct demographic dispersals generating its current distribution. A higher observed STR diversity of this clade among Europeans and Ethiopians in comparison to populations of North Africa points to its arrival to Ethiopia and Europe during Neolithic times with a more recent appearance in the latter.58 Semino et al58 describe a YCAIIa22-YCAIIb22 motif in the North African (>90%) and Middle Eastern (>70%) J1-M267 representatives that is less frequent in Ethiopia and Europe, postulating that the dispersal of the M267-YCAIIa22-YCAIIb22 clade occurred during the Arab expansion in the seventh century A.D.
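The rank correlation can be sketched as follows. The frequencies are those quoted above, but the ordering of populations by distance from Yemen is an assumption made for illustration, since the source does not list the ranks used. Frequencies are ranked from highest to lowest, so a decline in frequency with distance appears as a positive coefficient.

```python
def ranks(values, descending=False):
    """Rank values from 1..n (no ties are expected in this data set)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(rank_x, rank_y):
    """Spearman coefficient from untied ranks: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(rank_x)
    d_squared = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Populations listed in an ASSUMED order of increasing distance from Yemen.
populations = ["Yemen", "Oman", "UAE", "Qatar", "Iraq", "Egypt", "Lebanon", "Turkey"]
j1_frequency = [72.6, 38.0, 34.8, 58.3, 33.1, 20.0, 12.5, 8.99]  # % J1-M267, from the text

distance_rank = list(range(1, len(populations) + 1))   # hypothetical distance ranks
frequency_rank = ranks(j1_frequency, descending=True)  # 1 = highest frequency

rho = spearman_rho(distance_rank, frequency_rank)
print(round(rho, 4))
```

With this assumed ordering the coefficient rounds to 0.9286, matching the reported value; that agreement is suggestive only and does not establish which distance ranks were actually used.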

Median BATWING expansion times based on Y-STR data for the Omani (2.3 ky; 95% CI: 0.6–29.2) J1-M267 chromosomes4 indicate a more recent arrival to the South Arabian populations as compared to the older expansion times obtained for the Egyptian (6.4 ky; 95% CI: 0.6–278.5)4 and Turkish (15.4 ky; 95% CI: 0.4–604.8)12 representatives of this haplogroup. Conversely, in the present study, Y-STR age estimates based on the method described by Zhivotovsky et al46 generated much older values for the J1-M267 haplogroup in Yemen, Qatar and UAE (9.7±2.4, 7.4±2.3 and 6.4±1.4 ky, respectively) than seen in the Omani,4 consistent with an earlier arrival to the region during the Neolithic. The data suggest expansion from the north during the Neolithic (or perhaps more recently), which is also reflected in the lower STR variances in southern Arabia (0.14 for Qatar, 0.15 for UAE, 0.20 for Yemen and 0.27 for Oman4 versus 0.31 in Egypt4 and 0.51 in Turkey12). Subsequently, a series of recent demographic events may account for the high haplogroup frequency of J1-M267 in the populations from the present study.

Implications of Y-chromosome distribution in Arabia

Overall, the southern Arabian populations segregate together at an intermediate position with populations from the Levant in the MDS plot, appropriate considering their strategic geographic location at a major bidirectional gateway connecting Africa and Eurasia. Based on the AMOVA, it is also possible to deduce that the overall Y-haplogroup substructure observed in these regions is affected more by geography (φCT=0.16) than by language (φCT=0.08). Upon classifying the populations based on language family affiliations, the variance among populations within groups is greater (φSC=0.14) than the φCT attributable to variation among groups. This difference can be expected since the Afro-Asiatic family encompasses a large variety of languages. Pairwise comparisons of the 32 populations based on Y-haplogroup frequency data (Supplementary Table 1) revealed that only 13 of 90 comparisons display nonsignificant differences within the Afro-Asiatic family, leaving a total of 77 pairwise comparisons generating significant differences. Ten of the 13 pairs with nonsignificant differences involve the Levantine populations of Lebanon and Syria, possibly as a result of their central position in relation to other Afro-Asiatic groups, whereas the remaining three include the populations within northwest Africa.

Studies focused on this crossroads for human movements have identified geographical barriers that may have limited gene flow with neighboring regions. Specifically, a study based on 15 autosomal STR loci detected a concentration of genetic homogeneity within the Near East, suggesting that the Saharan desert, the Iranian deserts and the Hindu Kush Mountains may have acted as obstacles for dispersal.19 The portrayal of the Dasht-e Kavir and Dasht-e Lut deserts of Iran as barriers to gene flow has been described in the context of the R1a1-M198 lineage14, 16, 17, 18 as well as in the dissemination of R1b1a-M269 within Iran.14 Moreover, an admixture analysis by Regueiro et al14 identified the harsh, mountainous terrain in Northeast Turkey as well as the Hindu Kush Mountains as limiting factors of gene flow to the Iranian Plateau, whereas Balochistan acted as a possible conduit for human dispersals. This coastal region, which encompasses parts of South Iran, Afghanistan and Pakistan, may have provided a unique corridor along the Gulf of Oman.

To examine the degree and geographic extent of genetic homogeneity within the Gulf of Oman, the frequencies of the predominant haplogroups were contrasted among the populations in the region. A χ2-test on the haplogroup frequencies of Oman, UAE, South Iran14 and South Pakistan30 indicates that the most frequent haplogroups, E (χ2=20.836, d.f.=3, P<0.0001), J (χ2=8.677, d.f.=3, P=0.0339) and R (χ2=40.142, d.f.=3, P<0.0001), are not evenly distributed among the four populations. As the MDS plot displayed a close affiliation between South Pakistan and North Iran and the former segregated away from the Gulf of Oman populations, the χ2-test was repeated excluding South Pakistan. Although statistically significant differences are still apparent for haplogroups E (χ2=10.170, d.f.=2, P=0.0062) and R (χ2=10.560, d.f.=2, P=0.0051), J (χ2=2.577, d.f.=2, P=0.2757) exhibits an even distribution among Oman, UAE and South Iran. However, greater homogeneity is observed among the South Arabian populations of Oman, UAE and Qatar for haplogroups E (χ2=2.249, d.f.=2, P=0.3248), J (χ2=4.831, d.f.=2, P=0.0893) and R (χ2=0.308, d.f.=2, P=0.8573). The significant differences in haplogroup frequencies result in detectable clines moving from the South Arabian populations to South Iran and then South Pakistan (E: 18.8, 6.8 and 3.3%; J: 50.4, 35.0 and 25.3%; and R: 11.2, 25.6 and 46.2% for South Arabia, South Iran14 and South Pakistan,30 respectively).
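For the three-population comparisons (d.f.=2), the P-values can be retraced from the reported statistics, because the upper tail of a χ2 distribution with two degrees of freedom has the closed form exp(−χ2/2). The sketch below applies that identity to the statistics quoted above; the dictionary layout and function name are illustrative only.

```python
from math import exp


def p_value_df2(chi2):
    """Upper-tail probability of a chi-square variate with two degrees of freedom."""
    return exp(-chi2 / 2.0)


# (chi-square statistic, reported P-value) for the d.f.=2 tests quoted in the text.
reported = {
    "E: Oman, UAE, South Iran": (10.170, 0.0062),
    "R: Oman, UAE, South Iran": (10.560, 0.0051),
    "J: Oman, UAE, South Iran": (2.577, 0.2757),
    "E: Oman, UAE, Qatar": (2.249, 0.3248),
    "J: Oman, UAE, Qatar": (4.831, 0.0893),
    "R: Oman, UAE, Qatar": (0.308, 0.8573),
}

for label, (chi2, reported_p) in reported.items():
    print(label, round(p_value_df2(chi2), 4), reported_p)
```

Each recomputed value agrees with the reported P-value to four decimal places, which supports the internal consistency of the statistics and degrees of freedom given for these tests.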

In addition, South Pakistan, South Iran, UAE, Oman and Qatar (although to a lesser extent) share a similar Y-haplogroup substructure with clinal decreases in diversity detected as one moves west to Africa, north to the Levant and Caucasus and east to south and central Asia (Figure 2). Although the Hindu Kush Mountains and Iranian deserts may have played a significant role in encapsulating the region and limiting gene flow,14, 25 the coastal area may have served as a unique corridor facilitating dispersals into and out of the region at various times in recent human evolution.

At another extreme, the haplogroup distribution of Yemen shows very limited variation, particularly when compared to neighboring populations, Oman and UAE (3 versus 11 haplogroups each), whereas Qatar is intermediate with a total of seven haplogroups, four of which display frequencies of less than 3.0%. Although Qatar does not approximate the lack of diversity seen in Yemen, the two populations display affinities that are apparent in the MDS plot, in which populations of the Levant are interspersed among the South Arabian populations, with Qatar and Yemen segregating apart from both UAE and Oman.

Regional autosomal STR analysis

To investigate the underlying reasons for the limited Y-chromosome diversity in Yemen, the observed heterozygosity values of 15 highly polymorphic autosomal STR loci were calculated using samples from Kenya,19 Egypt,19 Oman,19 Yemen,19 Iran,25 Qatar26 and UAE (Cadenas, unpublished results) and are presented in Table 4. Owing to the large number of alleles that exist at each locus, obtaining heterozygote deficiencies may be indicative of a high degree of consanguinity within populations. Qatar possesses 8 out of 15 loci with significant heterozygote deficiency (P<0.05), approximated by Yemen (4 loci) and Oman (3 loci), whereas UAE and Iran display only two loci followed by Egypt and Kenya with one locus each.

A series of recent demographic events may offer an explanation for the Y-haplogroup distribution observed in Yemen. The J1-M267 Y-pattern in particular may have arisen as a result of a founder effect followed by genetic drift. Furthermore, nonrandom-mating practices are common in the area, with cultural beliefs that support polygamy and patrilocal behaviors that perpetuate specific male lines within the region. In addition, consanguineous marriages, particularly among first cousins, are common in the Middle East due to Muslim tradition. This form of inbreeding can serve to propagate a specific patrilineage. Although a combination of these processes probably played a part in forming the Y-haplogroup substructure seen in Yemen, based on the regional autosomal STR analysis, it is likely that inbreeding may have been a significant contributing factor.

A study performed within Sana’a City, Yemen revealed an incidence of consanguinity of 44.7%, with first-cousin marriages comprising 71.6%, and an average coefficient of inbreeding (the probability of an individual having two alleles identical by descent at a given locus) of 0.02442,59 more than double that of the Egyptian population (0.01)60 and almost four times that of the Turkish population (0.0064532).61 Similar studies conducted in Qatar indicate a rate of consanguinity of 54.0% (first-cousin marriages accounting for 34.8%) and a coefficient of inbreeding of 0.02706,62 whereas comparable consanguinity values were observed in UAE (50.5%)63 and Oman (35.9%).64 These figures are representative of the region as a whole, where consanguineous marriages are prevalent (28.96% in Egypt,60 33% in Syria,65 51.2–54.4% in Jordan,66, 67 57.7% in Saudi Arabia68 and 54.4% in Kuwait69).

It is significant that in spite of these characteristics, which tend to temper genetic diversity, high Y-chromosome haplogroup variability is exhibited in the Gulf of Oman coastal crescent. Patrilineal systems, polygamy and consanguinity are forces that will favor limited diversity along the lines of what is seen in Yemen. It is likely that the region's continued critical role in trade has rendered it an important point of contact between populations and a target of attacks in attempts to gain control of trade from the Persian Gulf. Furthermore, Oman's role in the East African slave trade has been well documented and supported by previous studies4, 5, 54 and may account, at least partially, for the greater diversity it displays.

Conclusion

A comparison of Y-haplogroup substructure of the populations surrounding the Gulf of Oman reveals similarities among them with detectable clines in haplogroup frequencies. This can be attributed to the existence at different times of a coastal corridor along the Gulf of Oman that may have facilitated dispersals into and out of the area. Chromosomes like E3b1c-M123 support archaeological data linking the Fertile Crescent with trading cities along the Persian Gulf, whereas derivatives of E3b1-M35 point to a Neolithic arrival to southern Arabia via the Levant. The limited variability seen in Yemen (and to some extent Qatar) does not mirror the diversity observed in the coastal populations of UAE, Oman, South Iran and South Pakistan. An analysis of heterozygosity using hypervariable autosomal STR loci indicates that both Yemen and Qatar display a deficiency in observed heterozygosity that may be affected to some extent by high rates of consanguineous marriages in the region. In addition, a string of relatively recent events may have maintained Oman and UAE in close contact with other cultures, including attempts to gain control of the Persian Gulf and Oman's involvement in the East African slave trade.