Human Y chromosomes bearing the M267*G variant (defining haplogroup J1) are distributed over a vast area comprising Europe, South-western Asia, the Arabian peninsula, and North and East Africa. Eight downstream SNPs have been identified so far along the J1 genealogy,1 none of which reaches appreciable frequencies in any population. Many authors have proposed STR-based motifs to trace the genealogies of pre-historic or ethno-religious ancestries. Examples are the Dys388*13 allele, associated with early Neolithic agro–pastoral cultures (King RJ and Underhill P, personal communication); the Galilee and the Dys388*17/YCAIIa/b*22–22 motifs, proposed for an Arab ancestry;2, 3 and the Cohanim 6-locus motif, used to link the descendants of a Jewish priesthood.4 However, a wide range of times since the most recent common ancestor (TMRCAs) has been proposed for J1 and its subclades (between 36 and 10 KyBP), and different, conflicting scenarios have been depicted to explain their current distribution.3, 5, 6, 7, 8, 9

Materials and methods

We surveyed the variation at 20 STR loci and at 6 SNPs in 282 J1 Y chromosomes of native unrelated donors from 29 populations. Ethnic data, genotyping protocols, quality standards, details of data analyses and haplotypes are provided as supplemental material (Supplementary Tables S1–6).

Results and discussion

A fine-grained map of the present-day distribution of J1 chromosomes is given in Figure 1. The pattern is uneven, as is typical of Y lineages with a very deep genealogy and small demes. Frequency peaks above 50% of the whole binary variation are present in Arabia (Yemen, Qatar), Northern Caucasus (Dagestan), Sudan and in Negev Bedouins (Supplementary Table S1). Frequency is inversely correlated with haplotype diversity (R2=0.387, P<0.001, Supplementary Table S6), with Near Easterners showing the highest diversity, and Dagestanians and Arabic Sudanese the lowest. No major J1 sublineage was defined by the genotyped SNPs (Supplementary Table S1), confirming the need for future research efforts in this direction. Nevertheless, in the Amhara from Ethiopia, we found the very first case of a M368(xM367) chromosome, which supports the insertion of the paragroup J1e1* in the latest Y haplogroup phylogeny.1
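As an illustration of the diversity statistic referred to above, the following minimal sketch computes haplotype diversity from a list of STR haplotypes. It assumes Nei's unbiased estimator, H = n/(n-1) × (1 − Σp_i²), which the text does not name explicitly; the function name and the toy 3-locus haplotypes are hypothetical and are not data from this study.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i^2)).

    `haplotypes` is a list of hashable haplotypes (here, tuples of
    STR repeat counts), one entry per sampled chromosome.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies in the sample.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical samples (assumption, for illustration only).
low = [(13, 22, 14)] * 4 + [(13, 22, 15)]      # one dominant haplotype
high = [(13, 22, 14), (13, 22, 15), (17, 22, 14),
        (16, 19, 14), (15, 21, 13)]            # all haplotypes distinct

print(round(haplotype_diversity(low), 3))
print(round(haplotype_diversity(high), 3))
```

A sample dominated by a single haplotype, as in the first toy list, yields a low value (about 0.4 here), whereas a sample in which every chromosome differs yields the maximum of 1.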

Figure 1

Contour map showing the present-day distribution of J1 and J*(xJ2) chromosomes. Gridding was carried out starting from 336 frequency points (Supplementary Table S3) with SURFIT 2.1. Spatial surfaces were computed using GMT 4.3.1. Methodological details are available on request.

With the exception of the rare Palestinian modal haplotype,10 none of the previously described STR motifs proved to be identical by descent, as they were found across ethnic groups with different cultural or geographic affiliations and in lineages other than J1 (J2, I*). Such results make their use to trace the ancestries of individuals or communities (ie, Arab or Jewish) inconclusive. Calculations under the coalescent model for J1 haplotypes bearing the Cohanim motif gave time estimates that place the origin of this genealogy around 6.2 KyBP (95% CI: 4.5–8.6 KyBP), earlier than previously thought,4 and well before the origin of Judaism (Kingdom of David, 2.0 KyBP).

Mismatch and multivariate analyses (Table 1, Figure 2) both pointed to common features for the Y chromosomes of Arabic speakers from Maghrib, Sudan, Iraq and Qatar (the Arabic pool). They show low diversity values and narrow mismatch curves with a mode at 5–6 mutational steps, and they lie close together on one side of the multidimensional genetic space. Opposite features were observed in a heterogeneous group including Europeans, Kurds, Iranians and Ethiopians (the Eurasian pool): they show high haplotype diversity, are characterized by ragged mismatch curves with modes in the 11–16 range, and cluster at the centre of the MDS plot. Omanis show a mix of Eurasian pool-like and typical Arabic haplotypes, as expected considering the role that the Gulf of Oman has played at different times as a corridor for the dispersal of Asian and East African genes.7
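The mismatch curves discussed above can be illustrated with a short sketch. It assumes that the distance between two STR haplotypes is the sum of absolute repeat-count differences across loci (a stepwise measure consistent with "mutational steps"); the exact distance used in the analysis is given in the supplementary material, not here. Function names and the toy haplotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(haplotypes):
    """Count chromosome pairs at each pairwise distance.

    Distance is assumed to be the sum of absolute repeat-count
    differences over all loci (stepwise mutational steps).
    """
    dist = Counter()
    for a, b in combinations(haplotypes, 2):
        steps = sum(abs(x - y) for x, y in zip(a, b))
        dist[steps] += 1
    return dist


def mismatch_mode(dist):
    """Return the distance class observed in the largest number of pairs."""
    return max(dist, key=dist.get)


# Hypothetical 3-locus haplotypes (assumption, not data from the study).
sample = [(13, 22, 14), (13, 22, 15), (13, 22, 14), (14, 22, 14)]
dist = mismatch_distribution(sample)
for steps in sorted(dist):
    print(steps, dist[steps])
print("mode:", mismatch_mode(dist))
```

A narrow curve with a low mode, as in this toy sample of closely related haplotypes, is the kind of pattern described for the Arabic pool; a broad, ragged curve with a higher mode corresponds to the Eurasian pool.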

Table 1 Descriptive and inferential statistics calculated for 20-locus haplotypes on 282 J1-M267 Y chromosomes
Figure 2

MDS plot of pairwise FST distances among 18-locus haplotypes (alleles at duplicated loci Dys385a/b and YCAIIa/b were pooled). Stress value (0.07775) denotes a statistically significant departure from random structure.13 Dot colour: light blue=Dagestan groups, black=Arabian groups, grey=Maghrebian groups, purple=Sudanese groups, green=European groups, orange=SW Asian groups, yellow=Ethiopian groups. Dot shape: squares=Arabic-speaking groups; circles=Indo-European-speaking groups; diamonds=North-Caucasian-speaking groups; triangles=Semitic (non-Arabic)-speaking groups (the colour reproduction of this figure is available on the full text version of the manuscript).

We wondered whether clustering and similarities among mismatch curves in the Arabic pool reflect a shared evolutionary history, following the hypothesis of a diffusion of J1 chromosomes mediated by the spread of Islam since 650 AD.2, 3, 9 To investigate this aspect in more detail, we compared the haplotype genealogies of the Eurasian and Arabic pools by using median-joining networks constructed as described9 (Figure 3). The genealogy of the Arabic pool shows a star-like pattern with no geographic structuring. This feature supports a demic expansion from ancestral haplotypes currently shared by Maghrebians and Arabians, followed by subsequent migrations. The Eurasian genealogy is deeper and suggests a longer evolution under constant size. Accordingly, we assigned priors under different size models while applying a Bayesian approach14 to estimate coalescence times for the samples' genealogies (Table 1). Results for Arabic populations and associated STR motifs (Galilee, Dys388*17/YCAII*22–22) excluded the timeline of the Arab expansion (1.35 KyBP), even at their lower confidence bounds, and pointed to a mid-Holocene time frame of 5.5–7.2 KyBP (median TMRCAs). This time window is related to a pre-historic phase of regionalisation in the human occupation of Sahara and Arabia, when semi-nomadic tribes, once spread all over the desert, retreated to water-rich refuges (ie, the Atlas range,15 the Sudanese plateau,16 Southern Arabia17) as a consequence of the rapid decline of monsoon rainfalls. In Eastern Sahara, it is associated with the rise of a dual productive economy, where specialised cattle pastoralism came to coexist with sedentary lifestyles, cereal farming and pottery production, clearly rooted in Near Eastern traditions.
The genetic legacy of the mid-Holocene dispersal of foraging groups in the Sudanese Sahara, North Africa and Arabia would thus be tracked by Arabic J1-M267 chromosomes, whereas the dispersal of agro–pastoralists with Near Eastern origins would be tracked by other Y (E1b-M34 and E1b-M7818) or mitochondrial (U6b19) lineages.

Figure 3

Network of 20-locus J1 haplotypes. (a) Arabic pool genealogy, (b) Eurasian pool genealogy. Area is proportional to frequency (the colour reproduction of this figure is available on the full text version of the manuscript).

As regards chromosomes bearing alleles Dys388*13 and YCAIIa*19, which are common in populations of the Eurasian pool and in Northern Caucasians, we generally obtained Late Pleistocene coalescence times, with average median values of around 10.1–10.9 KyBP. These time estimates, together with the high variation and wide distribution of these subclades, are consistent with the episodes of spatial re-expansion that occurred after the Last Glacial Maximum in the northern hemisphere, the latest being triggered by the end of the Younger Dryas event (12.9–11.6 KyBP20). The different patterns observed (namely, frequency peaks in the Caucasus and Anatolia for Dys388*13, and in Ethiopia and south-western Asia, with the absence of haplotypes of the Arabic pool, for YCAIIa*19) do not deviate from random expectations of frequency shifts under an extended Wright–Fisher model (P>0.05).
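To give a sense of how large random frequency shifts can be under drift alone, the sketch below simulates a plain haploid Wright–Fisher model with constant size and no mutation or migration. This is a deliberate simplification and not the extended model used for the test above; all function names and parameter values are hypothetical.

```python
import random


def wright_fisher(p0, n_chrom, generations, rng):
    """Drift of one lineage's frequency in a haploid Wright-Fisher deme.

    Each generation, n_chrom chromosomes are drawn with replacement
    from the previous generation (binomial sampling at frequency p).
    """
    p = p0
    for _ in range(generations):
        copies = sum(1 for _ in range(n_chrom) if rng.random() < p)
        p = copies / n_chrom
    return p


def simulate_demes(p0, n_chrom, generations, n_demes, seed=0):
    """Final lineage frequencies in independent demes with the same start."""
    rng = random.Random(seed)
    return [wright_fisher(p0, n_chrom, generations, rng)
            for _ in range(n_demes)]


# Hypothetical parameters (assumptions, for illustration only).
finals = simulate_demes(p0=0.2, n_chrom=50, generations=100, n_demes=20, seed=1)
print(sorted(finals))
```

In small demes, identical starting frequencies typically end up scattered between loss and fixation after enough generations, which is why contrasting frequency peaks between regions need not imply distinct demographic histories.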

To summarise, our results clearly reject the scenario put forward so far of a strict correlation between the Arab expansion in historical times and the overall pattern of distribution of J1-related chromosomes. Similarly, the causal association between STR-defined haplotypes and ethnic groups appears to lack any robust support, making the use of such haplotypes inadequate for forensic or genealogical purposes. Instead, J1 variation provided the genetic background to correlate climatic changes with human demographic and socio-cultural events scarcely documented in the archaeological record: namely, the dispersal of hunter-gatherers after the termination of glacial conditions in the late Pleistocene, and the desertification-driven retreat of tribes of Saharan and Arabian foragers in the transition to a food-producing economy.