European Journal of Human Genetics (2010) 18, 348–353; doi:10.1038/ejhg.2009.166; published online 14 October 2009

The emergence of Y-chromosome haplogroup J1e among Arabic-speaking populations

Jacques Chiaroni1, Roy J King2, Natalie M Myres3, Brenna M Henn4, Axel Ducourneau1, Michael J Mitchell5, Gilles Boetsch1, Issa Sheikha6, Alice A Lin2, Mahnoosh Nik-Ahd2, Jabeen Ahmad2, Francesca Lattanzi7, Rene J Herrera8, Muntaser E Ibrahim9, Aaron Brody10, Ornella Semino11, Toomas Kivisild12 and Peter A Underhill2

  1. UMR6578 (Université de la Méditerranée, CNRS, EFS) Anthropologie bio-culturelle, Faculté de médecine de Marseille, Marseille, France
  2. Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA
  3. Sorenson Molecular Genealogy Foundation, Salt Lake City, UT, USA
  4. 23andMe, Inc., Mountain View, CA, USA
  5. Génétique moléculaire de la spermatogenèse, Inserm UMR 910, Faculté de médecine, Marseille, France
  6. Blood Center, University Hospital of Damas, Damascus, Syria
  7. Department of Anthropology, University of California, Berkeley, CA, USA
  8. Department of Human and Molecular Genetics, College of Medicine, Florida International University, Miami, FL, USA
  9. Department of Molecular Biology, Institute of Endemic Diseases, University of Khartoum, Sudan
  10. Pacific School of Religion, Berkeley, CA, USA
  11. Dipartimento di Genetica e Microbiologia, Università di Pavia, Pavia, Italy
  12. Leverhulme Centre for Human Evolutionary Studies, University of Cambridge, Cambridge, UK

Correspondence: Dr J Chiaroni, UMR6578 (Université de la Méditerranée, CNRS, EFS) Anthropologie bio-culturelle, Faculté de médecine de Marseille, France. Tel: +33 4911 89581; Fax: +33 4911 89598; E-mail:

Received 7 May 2009; Revised 21 July 2009; Accepted 2 August 2009; Published online 14 October 2009.



Haplogroup J1 is a prevalent Y-chromosome lineage within the Near East. We report the frequency and YSTR diversity data for its major sub-clade (J1e). The overall expansion time estimated from 453 chromosomes is 10000 years. Moreover, the previously described J1 (DYS388=13) chromosomes, frequently found in the Caucasus and eastern Anatolian populations, were ancestral to J1e and displayed an expansion time of 9000 years. For J1e, the Zagros/Taurus mountain region displays the highest haplotype diversity, although the J1e frequency increases toward the peripheral Arabian Peninsula. The southerly pattern of decreasing expansion time estimates is consistent with serial drift and founder effect processes. The first such migration is predicted to have occurred at the onset of the Neolithic, and accordingly J1e parallels the establishment of rain-fed agriculture and semi-nomadic herders throughout the Fertile Crescent. Subsequently, J1e lineages might have been involved in episodes of the expansion of pastoralists into arid habitats coinciding with the spread of Arabic and other Semitic-speaking populations.


Keywords: Y-chromosome haplogroup J1e, Neolithic, Arabic languages, pastoralism



Tracing the origin and expansion of pastoral nomadism in the Middle East has widespread significance for understanding the development of the civilizations of the ancient Near East and the spread of the Semitic languages throughout the Levant, the Arabian Peninsula and Mesopotamia. Y-chromosome analyses of modern populations of the Middle East can contribute to the delineation of the demographic and migration processes in this region. The predominant categories of Y chromosomes in this region are varieties associated with haplogroup J-M304. This haplogroup essentially bifurcates into two main sub-clades, J1-M267 and J2-M172 (ref. 1).

Previous studies of J1-M267 (refs 2–7) have found it to occur at high frequencies among the Arabic-speaking populations of the Middle East, conventionally interpreted as reflecting the spread of Islam in the first millennium CE.8 However, before the middle first millennium CE, a variety of Semitic languages were spoken throughout the Middle East. Recently, historical linguists9 have constructed novel classification trees of the Semitic languages in which the first split from the root of Proto-Semitic separates East Semitic (Akkadian, Assyrian, Babylonian and Eblaite) from West Semitic. West Semitic then partitions into Ethiopic, Modern South Arabian (spoken in areas of Oman and Yemen) and the core cluster of Central Semitic. Central Semitic would then include the languages of Yemen (Old South Arabian), Arabic and the Northwest Semitic languages of the Levant – Ugaritic, Hebrew, Phoenician and Aramaic.9, 10, 11, 12 Not only have linguists reconstructed the phylogeny of Semitic languages, but they have also dated Proto-Semitic's age to the Chalcolithic Era, circa 5500–3500 BCE.13 In addition to the common Semitic language substrate found throughout the Levant and Arabian Peninsula, recent archeological studies have shown an early presence (ca. 6000–7000 BCE) of domesticated herding in the arid steppe desert regions.14

We recently showed an inverse correlation between J1-M267 frequency and mean annual rainfall in Middle Eastern populations.15 This finding was interpreted as a founder effect associated with small groups of Neolithic herder–hunters moving into the arid regions of the Arabian Peninsula with a pastoral economy, whereas another ancestral population with a closely associated sister clade, J2a-M410, remained mainly in the regions of the Fertile Crescent that had sufficient rainfall to support a Neolithic farming economy. Although humidity levels fluctuated during the Holocene, the present climatic regime in Arabia was established ~5000 years ago.16 Marginal habitats such as desert regions that were plausibly colonized by a few founders may show not only reduced genetic diversity but also reduced linguistic diversity, as evidenced by the broad geographical footprint of the Arabic language in the arid regions of the Middle East.

Although considerable sub-haplogroup diversification has been previously described within the J2-M172 clade,17 the occurrence of J1-M267 affiliated subtypes at frequencies exceeding a few percent has not yet been reported.18 Here, we present the phylogeographical and haplotype diversity data from a major sub-clade of J1-M267 that is defined by the J1e-Page08 (aka P58) SNP.19, 20

Hereafter, we shall refer to this major sub-clade as simply J1e.


Materials and methods

The nomenclature used for haplogroup labeling is in agreement with YCC conventions and a recent update.20 All samples designated as haplogroup J1 were determined to be derived at M267. Chromosomes labeled as J1* are J1(xJ1e). Our study comprises a total of 553 haplogroup J1 samples from 38 populations (Supplementary Table 1). These consist of 494 J1e-derived and 59 J1* samples. The majority of the samples were experimentally analyzed for the haplogroup J1e-defining SNP by either RFLP or DHPLC methodology, except for 55 reported as being of J1 membership from the Sudanese from Khartoum; Amhara from Addis Ababa, Ethiopia; and Iraqis from Nassiriya.18 These were inferred to belong to J1e based on companion YSTR haplotype data. The criteria to deduce J1e status were DYS388 ≥15 repeats and YCAII A, B allele sizes of either 19, 22 or 22, 22. The haplotype data used in our analyses are given in Supplementary Table 2. The following eight loci, DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393 and DYS439, were used to estimate expansion times using the methodology described by Zhivotovsky et al.21 as modified according to Sengupta et al.17 A microsatellite evolutionary effective mutation rate of 6.9 × 10⁻⁴ was used. Networks were constructed by the median joining method using Network, where ε=0 and microsatellite loci were weighted proportionally to the inverse of the repeat variance observed in each haplogroup.22 For J1* chromosomes, the network included the following nine loci: DYS19, DYS388, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393 and DYS439. With the exclusion of DYS388, the same eight STR loci were also used to construct a network for haplogroup J1e-affiliated chromosomes.
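To make the expansion-time calculation concrete, the following is a minimal sketch of a variance-based estimate over the eight loci listed above. It is not the authors' code. It assumes that the quoted rate of 6.9 × 10⁻⁴ applies per locus per 25-year generation, and that the founder allele at each locus can be approximated by the sample median; both are assumptions made for illustration, as are the function names and the toy haplotypes.

```python
from statistics import median

# Evolutionary effective mutation rate quoted in the Methods.
# ASSUMPTION: the rate is per locus per generation of 25 years.
MU = 6.9e-4
YEARS_PER_GENERATION = 25

# The eight loci used for expansion-time estimates in the text.
LOCI = ["DYS19", "DYS389I", "DYS389II", "DYS390",
        "DYS391", "DYS392", "DYS393", "DYS439"]


def mean_squared_difference(haplotypes, loci=LOCI):
    """Average over loci of the mean squared repeat difference from the founder.

    ASSUMPTION: the founder allele at each locus is the sample median.
    `haplotypes` is a list of dicts mapping locus name to repeat count.
    """
    if not haplotypes:
        raise ValueError("at least one haplotype is required")
    per_locus = []
    for locus in loci:
        repeats = [h[locus] for h in haplotypes]
        founder = median(repeats)
        per_locus.append(sum((r - founder) ** 2 for r in repeats) / len(repeats))
    return sum(per_locus) / len(per_locus)


def expansion_time_years(haplotypes, loci=LOCI, mu=MU,
                         years_per_generation=YEARS_PER_GENERATION):
    """Expansion time in years: mean squared difference / mu, scaled to years."""
    return years_per_generation * mean_squared_difference(haplotypes, loci) / mu


if __name__ == "__main__":
    # Hypothetical toy sample (not data from the study): four chromosomes
    # identical at every locus except one step at DYS19.
    base = {locus: 10 for locus in LOCI}
    base["DYS19"] = 14
    sample = [dict(base) for _ in range(4)]
    sample[3]["DYS19"] = 15
    print(round(expansion_time_years(sample)), "years")
```

Because the estimate is linear in the mean squared difference, a population founded by two distinct haplotypes will appear older than either founder lineage alone, which is worth keeping in mind when reading the per-population estimates.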


Results and discussion

Figure 1a shows the geographical location of populations included in this study. J1* chromosomes have their maximal frequency in the Taurus and Zagros mountain regions of Eastern Anatolia, Northern Iraq and Western Iran (Figure 1c). It is noted that the J1* chromosomes frequently appear in combination with the 12 or 13 repeat pattern at DYS388, whereas the J1e chromosomes almost always display 15 or more repeats. Therefore, the J1e SNP information supports the previous inference that J1 chromosomes linked with DYS388=13 repeats share a common ancestry.1 Network analysis of J1* chromosomes (Figure 2a) shows a bifurcating substructure. One cluster is associated with DYS388=15 and DYS390 >23 repeats and the other cluster with DYS388=13 repeats. The locale of highest J1* frequency occurs in the vicinity of eastern Anatolia (Figure 1c). Both J1* and J1e occur in Sudan and Ethiopia (Supplementary Table 1). Our data show that the YCAII 22-22 allele state is closely associated with J1e (Supplementary Table 2). Interestingly, in Ethiopia, all Cushitic Oromo and ~29% of Semitic Amharic J1 chromosomes are J1*.
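The DYS388 and YCAII patterns described here are what the inference rule in Materials and methods relies on for samples lacking a direct SNP result. A minimal sketch of that rule follows; the function name and the choice to treat the YCAII pair as unordered are assumptions made for illustration, not details given in the text.

```python
# YCAII allele pairs accepted by the J1e inference rule in the Methods.
J1E_YCAII_PATTERNS = {(19, 22), (22, 22)}


def infer_j1e(dys388, ycaii):
    """Return True if a J1 haplotype meets the STR criteria for inferred J1e.

    Criteria from the Methods: DYS388 >= 15 repeats and YCAII alleles of
    either 19, 22 or 22, 22.
    ASSUMPTION: the YCAII pair is unordered, so it is sorted before matching.
    """
    pair = tuple(sorted(ycaii))
    return dys388 >= 15 and pair in J1E_YCAII_PATTERNS


if __name__ == "__main__":
    # Hypothetical haplotypes, for illustration only.
    print(infer_j1e(16, (22, 22)))  # meets both criteria
    print(infer_j1e(13, (19, 22)))  # DYS388 too short for the rule
```

A rule of this kind can only approximate SNP typing: a J1* chromosome carrying DYS388=15, such as those in one of the Figure 2a clusters, would pass the DYS388 test and be separated only by its YCAII state.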

Figure 1.

(a) Red symbols indicate the geographical locations of 36 populations analyzed. (b) Interpolated spatial contours of annual precipitation (mm) distribution. (c) Interpolated J1* frequency spatial distribution. (d) Interpolated J1e frequency spatial distribution. (e) Interpolated J1e mean haplotype variance spatial distribution. (f) Construed trajectories of J1e lineage spread episodes. In red are delineated the initial Holocene migrations from the Taurus/Zagros Mountains to the Arabian Peninsula. Shown with black arrows are the subsequent expansions of Arabic populations in Arabia beginning in the Bronze Age.


Figure 2.

(a) Median-joining network for J1* using the nine-locus Y-STR haplotypes. Networks were weighted according to Qamar et al.22 Loci analyzed included DYS19, DYS388, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393 and DYS439. (b) Median-joining network for eight-locus (excluding DYS388) Y-STR haplotypes for J1e.


Table 1 shows the average variance and expansion times of J1e with their linguistic and archeological correlates from those populations with five or more samples; the Assyrians of Syria, Iraq, Turkey and Iran were amalgamated into one group and the Arab populations of Qatar, UAE and Saudi Arabia were also combined. The mean variance across the 19 populations in Table 1 correlates significantly with latitude (r=0.36, P<0.035, two-tailed Kendall's τ) and nonsignificantly with longitude (r=0.02, NS). This result supports the hypothesis that J1e likely originated in the more northerly populations in Table 1 and spread southward into the Arabian Peninsula (Figure 1f). The high YSTR variance of J1e in Turks and Syrians (Table 1, Figure 1e) supports the inference of an origin of J1e in nearby eastern Anatolia. Moreover, the network analysis of J1e haplotypes (Figure 2b) shows that some of the populations with low diversity, such as Bedouins from Israel, Qatar, Sudan and UAE, are tightly clustered near high-frequency haplotypes, suggesting founder effects with starburst expansion in the Arabian Desert.
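The rank correlation reported above can be illustrated with a short, dependency-free sketch of Kendall's τ. The text does not say which variant of τ was used; the tau-b form (which adjusts for ties) is assumed here, the significance test is omitted, and the latitude and variance values are hypothetical rather than taken from Table 1.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from every count
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denominator = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denominator == 0:
        raise ValueError("tau is undefined when a variable is constant")
    return (concordant - discordant) / denominator


if __name__ == "__main__":
    # Hypothetical values for illustration only (not from Table 1).
    latitude = [15.0, 24.5, 31.0, 33.5, 37.0, 39.0]
    mean_variance = [0.21, 0.15, 0.30, 0.26, 0.41, 0.38]
    print(round(kendall_tau_b(latitude, mean_variance), 2))
```

A rank-based statistic suits this comparison because it asks only whether more northerly populations tend to have higher variance, without assuming a linear relationship between the two.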

The series of expansion times (Table 1) is also consistent with a subsequent Neolithic range expansion of J1e from a geographical zone, including northeast Syria, northern Iraq and eastern Turkey toward Mediterranean Anatolia, Ismaili from southern Syria, Jordan, Palestine and northern Egypt. Although there is a trend between the mean variances and the expansion time estimates, the latter do not uniformly increase with variance (Table 1) as some populations likely have more than one J1e founder. This explanation is supported by populations in which two distinct varieties of YCAII chromosomes are present, namely, 19, 22 and 22, 22, whereas populations with low mean diversity typically show only the 22, 22 class (Supplementary Table 2). A network analysis of J1e chromosomes (Figure 2b) also reflects situations of multiple founders.

Although the haplogroup diversification within J1e remains incomplete, the somewhat rare J1e1-M368 provides an insight into the geographical origin of J1e. It has been reported both in the Black Sea region of Turkey1 and Dagestan in the northeast Caucasus.18 Furthermore, J1e1-M368 displays the YCAII 19-22 pattern. Although the haplogroup relationships of YCAII alleles are unstable, in the context of haplogroup J1 they suggest that the prevalent YCAII 22-22 variety may have evolved from a YCAII 19-22 ancestor.

Table 1 lists the current languages and the first millennium BCE Iron Age languages spoken in the geographical regions from which the samples were collected. Tracking back to the Iron Age, all the branches of the Central Semitic languages are represented – NW Semitic, Arabic and Old South Arabian in the Levantine and Yemeni sampling regions. The Assyrian samples and Iraqi Kurdish samples have been drawn from areas in Northern Mesopotamia speaking East Semitic languages at the time. The current data suggest an origin of J1e in the general area of eastern Turkey/northern Iraq associated with the Zarzian horizon,23, 24, 25 as they have similar early pre-agricultural expansions (16 kya, Table 1).

The timing and geographical distribution of J1e are representative of a demic expansion of agriculturalists and herder–hunters from the Pre-Pottery Neolithic B to the late Neolithic era.24, 26 The higher variances observed in Oman, Yemen and Ethiopia suggest sampling variability and/or demographic complexity associated with multiple founders and multiple migrations. The expansion time associated with Yemen is somewhat older (7000 BCE) and may reflect a migration of herders into southern Arabia.27 Finally, the more recent expansion times (Table 1) observed in Arabs from the Arabian Peninsula, Negev Bedouins and Sunni Arabs from Hama, Syria, are consistent with a subsequent Chalcolithic/Early Bronze Age (3000–5000 BCE) advance of J1e to the Arab populations of Arabia from near the early attested Arabic-speaking area of Tayma in north central Arabia28, 29 (Figure 1f).

A comparison of the mean annual rainfall and spatial frequency distribution of J1e (Figures 1b and d, respectively) indicates that J1e peaks in the arid regions of the Arabian Peninsula. We performed a nonparametric Mann–Whitney test to determine whether the frequency of J1e is higher in arid regions (≤300 mm annual rainfall) than in regions with more rainfall in our sample set of African and Near Eastern populations. We found that the frequency of J1e was significantly greater in the arid than in the non-arid populations (P=0.0035). By combining all the arid populations (Supplementary Table 1) into one sample (n=16), we circumvented the details of the geographic frequency distribution, such that the J1e frequency pattern was examined primarily with regard to precipitation rather than geography, although the two are correlated.
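The arid versus non-arid comparison can be sketched as a Mann–Whitney U test written with the standard library only. The text does not specify the test variant; this sketch assumes a two-sided normal approximation without tie or continuity correction, which is only a rough guide for small samples (an exact test would be preferable there). The frequencies shown are hypothetical, not those of Supplementary Table 1.

```python
from math import erfc, sqrt


def mann_whitney_u(sample_a, sample_b):
    """Return (U for sample_a, two-sided p-value by normal approximation).

    U counts the pairs (a, b) with a > b, with ties counted as one half.
    ASSUMPTION: no tie correction and no continuity correction are applied.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both samples must be non-empty")
    u_a = sum((a > b) + 0.5 * (a == b) for a in sample_a for b in sample_b)
    mean_u = n_a * n_b / 2
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_two_sided = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_a, p_two_sided


if __name__ == "__main__":
    # Hypothetical J1e frequencies, for illustration only.
    arid = [0.62, 0.45, 0.58, 0.71, 0.34]
    non_arid = [0.08, 0.12, 0.25, 0.05, 0.19, 0.31]
    u, p = mann_whitney_u(arid, non_arid)
    print(u, p)
```

Because the test uses only the ordering of frequencies, it does not depend on how far apart the populations are geographically, which matches the stated aim of examining precipitation rather than geography.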

Although most post-Last Glacial Maximum recolonization events have a typically northward signature,30, 31 our J1e results provide an example of a southward spread during the early Holocene. Although J1e is one of the most frequent haplogroups in the region, haplogroup E-M123 also shows its highest frequency and haplotype diversity in regions of the Fertile Crescent, decreasing toward the Arabian Peninsula.1, 2, 6 This co-distribution pattern of Y-chromosome haplogroups J1e and E-M123 resembles mtDNA haplogroups J1b and (PreHV)1 distributions that also display low levels of diversity despite their high frequency in Saudi Arabia.32, 33

Although on a broad scale the haplogroup J1e frequency distribution and expansion times are consistent with the model that it tracks a possible expansion of Neolithic agro-pastoralists from the Fertile Crescent into the arid Arabian Peninsula, several caveats must be considered. First, the patchy distribution of J1e frequency in the Levant (Syria, Jordan, Israel and Palestine) may reflect the complex demographic dynamics of religion and ethnicity in the region. Second, even though the highest YSTR variance of J1e lineages is in eastern Anatolia, northern Iraq and northwest Iran, one cannot entirely rule out recent admixture as a contribution to the high variance among ethnic Assyrians.

A recent Bayesian analysis of Semitic languages supports an origin in the Levant 5750 years ago and subsequent arrival in the Horn of Africa from Arabia 2800 years ago,11 thus providing indirect support for our phylogenetic clock estimates. It is important to note that the glottochronological dates yield estimates for the break-up and expansion of the Proto-Semitic language. Proto-Semitic, itself, may have been spoken in a localized linguistic community for millennia before its bifurcation into the East and West Semitic branches. In summary, haplogroup J1e data suggest an advance of the Neolithic period agriculturalists/pastoralists into the arid regions of Arabia from the Fertile Crescent and support an association with a Semitic linguistic common denominator.14


Conflict of interest

The authors declare no conflict of interest.



References

  1. Cinnioğlu C, King R, Kivisild T et al: Excavating Y-chromosome haplotype strata in Anatolia. Hum Genet 2004; 114: 127–148.
  2. Semino O, Mari C, Benuzzi G et al: Origin, diffusion and differentiation of Y-chromosome haplogroups E and J: inferences on the Neolithization of Europe and later migratory events in the Mediterranean area. Am J Hum Genet 2004; 74: 1023–1034.
  3. Luis JR, Rowold DW, Regueiro M et al: The Levantine versus the Horn of Africa: evidence for bi-directional corridors of human migrations. Am J Hum Genet 2004; 74: 532–544.
  4. Di Giacomo F, Luca F, Popa LO et al: Y chromosomal haplogroup J as a signature of the post-neolithic colonization of Europe. Hum Genet 2004; 115: 357–371.
  5. Arredi B, Poloni ES, Paracchini S et al: A predominantly Neolithic origin for Y-chromosomal DNA variation in North Africa. Am J Hum Genet 2004; 75: 338–345.
  6. Cadenas AM, Zhivotovsky LA, Cavalli-Sforza LL, Underhill PA, Herrera RJ: Y-chromosome diversity characterizes the Gulf of Oman. Eur J Hum Genet 2008; 16: 374–386.
  7. Zalloua PA, Xue Y, Khalife J et al: Y-chromosomal diversity in Lebanon is structured by recent historical events. Am J Hum Genet 2008; 82: 873–882.
  8. Nebel A, Landau-Tasseron E, Filon D, Oppenheim A, Faerman M: Genetic evidence for the expansion of Arabian tribes into the Southern Levant and North Africa. Am J Hum Genet 2002; 70: 1594–1596.
  9. Huehnergard J: Comparative Semitic linguistics; in Shlomo Izre'el (ed): Semitic Linguistics: The State of the Art at the Turn of the Twenty-First Century. Winona Lake, IN: Eisenbrauns Inc, 2002, pp 119–150.
  10. Fox J: Semitic Noun Patterns. Harvard Semitic Studies vol 52. Winona Lake, IN: Eisenbrauns Inc, 2003.
  11. Kitchen A, Ehret C, Assefa S, Mulligan CJ: Bayesian phylogenetic analysis of Semitic languages identifies an early Bronze Age origin of Semitic in the Near East. Proc Biol Sci 2009; 276: 2703–2710.
  12. Rubin AD: The subgrouping of the Semitic languages. Lang Linguist Compass 2008; 2: 79–102.
  13. Diakonoff I: The Earliest Semitic Society: linguistic data. J Semitic Studies 1998; 43: 209–219.
  14. Zarins J: Early pastoral nomadism and the settlement of lower Mesopotamia. Bull Am Schools Orient Res 1990; 280: 31–65.
  15. Chiaroni J, King R, Underhill P: Correlation of annual precipitation with human Y-chromosome diversity and the emergence of Neolithic agricultural and pastoral economies in the Fertile Crescent. Antiquity 2008; 82: 281–289.
  16. Parker AG, Eckersley L, Smith MM et al: Holocene vegetation dynamics in the northeastern Rub' al-Khali, Arabian Peninsula, a phytolith, pollen, and carbon isotope study. J Quaternary Sci 2004; 19: 665–676.
  17. Sengupta S, Zhivotovsky LA, King R et al: Polarity and temporality of high resolution Y-chromosome distributions in India identify both indigenous and exogenous expansions and reveal minor genetic influence of Central Asian pastoralists. Am J Hum Genet 2006; 78: 202–221.
  18. Tofanelli S, Ferri G, Bulayeva K et al: J1-M267 Y lineage marks climate-driven pre-historical human displacements. Eur J Hum Genet 2009. doi:10.1038/ejhg.2009.58.
  19. Repping S, Van Daalen SK, Brown LG et al: High mutation rates have driven extensive structural polymorphism among human Y chromosomes. Nat Genet 2006; 38: 463–467.
  20. Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL, Hammer MF: New binary polymorphisms reshape and increase resolution of the human Y-chromosomal haplogroup tree. Genome Res 2008; 18: 830–838.
  21. Zhivotovsky LA, Underhill PA, Cinnioğlu C et al: On the effective mutation rate at Y-chromosome STRs with application to human population divergence time. Am J Hum Genet 2004; 74: 54–61.
  22. Qamar R, Ayub Q, Mohyuddin A et al: Y-chromosomal DNA variation in Pakistan. Am J Hum Genet 2002; 70: 1107–1124.
  23. Kozlowski S: The Eastern Wing of the Fertile Crescent: Late Prehistory of Greater Mesopotamian Lithic Industries. Oxford: Oxford Archaeopress, 1999.
  24. Cauvin J: The Birth of the Gods and the Origins of Agriculture. Cambridge: Cambridge University Press, 2000.
  25. Bar-Yosef O: The Natufian Culture in the Levant, threshold to the origins of agriculture. Evol Anthropol 1998; 6: 159–177.
  26. Bellwood P: First Farmers: Origins of Agricultural Societies. Oxford: Oxford Blackwell Publishing Ltd, 2005.
  27. McCorriston J: Breaking the Rain Barrier and the Tropical Spread of Near Eastern Agriculture into Southern Arabia; in Kennett D, Winterhalder B (eds): Behavioral Ecology and the Transition to Agriculture. Berkeley, CA: University of California Press, 2006, pp 217–236.
  28. Parr PJ: s.v. Tayma'; in E. M. Meyers (ed): The Oxford Encyclopedia of Archaeology in the Near East. New York/Oxford: Oxford University Press, 1997, pp 160–161.
  29. Byrne R: Early Assyrian contacts with Arabs and the impact on Levantine Vassal tribute. Bull Am Schools Orient Res 2003; 331: 11–25.
  30. Hewitt G: The genetic legacy of the Quaternary ice ages. Nature 2000; 405: 907–913.
  31. Torroni A, Bandelt HJ, Macaulay V et al: A signal, from human mtDNA, of Postglacial Recolonization in Europe. Am J Hum Genet 2001; 69: 844–852.
  32. Abu-Amero KK, González AM, Larruga JM, Bosley TM, Cabrera VM: Eurasian and African mitochondrial DNA influences in the Saudi Arabian population. BMC Evol Biol 2007; 7: 32–47.
  33. Abu-Amero KK, Larruga JM, Cabrera VM, González AM: Mitochondrial DNA structure in the Arabian Peninsula. BMC Evol Biol 2008; 8: 45–60.
  34. Cann HM, de Toma C, Cazes L et al: A human genome diversity cell panel. Science 2002; 296: 261–262.
  35. Regueiro M, Cadenas AM, Gayden T, Underhill PA, Herrera RJ: Iran: tricontinental nexus for Y-chromosome driven migration. Hum Hered 2006; 61: 132–143.
  36. Shlush LI, Behar DM, Yudkovsky G et al: The Druze: a population genetic refugium of the Near East. PLoS One 2008; 3: e2105.
  37. Hassan HY, Underhill PA, Cavalli-Sforza LL, Ibrahim ME: Y-chromosome variation among Sudanese: restricted gene flow, concordance with language, geography and history. Am J Phys Anthropol 2008; 137: 316–323.


We thank A Silvana Santachiara-Benerecetti for providing the Ethiopian samples to OS. We thank Golden Software, Inc, for assistance with the spatial frequency maps. This work was supported by a grant from the France-Stanford Center for Interdisciplinary Studies to RK and by OHLL, Origine de l’Homme des Langues et du Langage, to JC.

Supplementary Information accompanies the paper on European Journal of Human Genetics website (


