Introduction

Europe was colonized by modern humans about 40 000 years ago and underwent a second colonization wave during the Neolithic, with the spread of farming.1, 2 The relative Palaeolithic and Neolithic contributions to the current European gene pool have been widely debated and are still under discussion.3, 4, 5, 6 Two opposing models have been proposed to account for the spread of farming in Europe: the demic diffusion model, which implies a movement of people and therefore a significant Near Eastern genetic input,7 and the cultural diffusion model, which, by contrast, regards the transition to agriculture as a cultural phenomenon without major changes at the genetic level.8 Archaeological evidence suggests, however, that the spread of agriculture was a complex process characterized by migrations and local admixture.9, 10

Genetic studies have described allele frequency clines for different markers along the European colonization routes.11, 12, 13 These have been interpreted in favour of demic diffusion and considered a strong indication of a Neolithic contribution to the modern European gene pool.14, 15 However, a number of simulation studies have shown that allele frequency clines can also arise in range expansions without admixture.16, 17, 18, 19 On archaeological grounds, the impact of the Neolithic in the Balkans is regarded as considerable,20 although Mesolithic hunter-gatherers were present in this region just before the first appearance of Neolithic pottery.21 The question to address is therefore the following: can we detect present-day signals of interactions between indigenous Mesolithic people and agricultural colonists in the southeast European gene pool? Y-chromosome markers are particularly useful in this regard, because their distribution is often highly correlated with their phylogeny.22

Interestingly, the Y-chromosome gene pool of southeast European populations is characterized both by ‘autochthonous’ markers, such as haplogroup (Hg) I, present in the Balkans in pre-Neolithic times,23, 24 and by markers mainly belonging to Hgs E and J, which originated outside Europe, in Africa and the Middle East, respectively.12, 13, 25, 26, 27, 28 In addition, newly described Y-chromosome polymorphisms have refined phylogenetic relationships, especially within Hgs E, J and I,29, 30, 31, 32 providing the opportunity to evaluate the above issue more fully. This prompted us to carry out a more detailed characterization of the genetic structure of the Balkan area through the analysis of 80 Y-chromosome bi-allelic markers and 12 linked STR loci in 1206 subjects from 17 population samples, mainly from southeast Europe.

Materials and methods

Samples

The sample consists of 1206 unrelated male individuals from 17 population samples (Figure 1). Two hundred and thirty-five of these, namely 64 Albanians from the Former Yugoslav Republic of Macedonia, 29 Croatians from Osijek, 75 Slovenians and 67 northeast Italians (from the province of Trento), are reported here for the first time. The remaining 971 were reported earlier,23, 27, 33 and comprise 104 Caucasians (38 Balkarians and 66 Georgians), 149 Greeks (92 from Athens and 57 from Macedonia), 55 Albanians (collected in Tirana), 89 Croatians, 99 Poles, 75 Czechs, 92 Ukrainians, 53 Hungarians and 255 Bosnia-Herzegovinians (84 Bosniacs, 90 Croats and 81 Serbs). Blood samples were collected from healthy unrelated adults after informed consent was obtained. DNA was extracted from whole blood by the standard phenol/chloroform procedure, followed by ethanol precipitation.

Figure 1

Geographic location of the studied samples: 1, Greeks; 2, Macedonian Greeks; 3, Albanians; 4, Albanians from the Former Yugoslav Republic of Macedonia (FYROM); 5, Bosniacs; 6, Bosnia-Croats; 7, Bosnia-Serbs; 8, Croats; 9, Croats from Osijek; 10, Slovenians; 11, northeast Italians; 12, Hungarians; 13, Czechs; 14, Poles; 15, Ukrainians; 16, Georgians; and 17, Balkarians.

In addition, P37.2* samples identified from a screening of the Sorenson Molecular Genealogy Foundation collection (over 14 000 Y chromosomes from more than 100 countries) were also included.

Examined bi-allelic markers and microsatellites

Eighty Y-specific unique event polymorphisms (Figure 2) were examined in hierarchical order. Two new mutations were discovered: M507 (AC006376.2: g.148697 T → G) and M521 (AC012068.5: g.5219 C → T). The first is associated with I-M253 and was discovered while typing M227; the second is associated with E-M78* and was ascertained while typing V12.

Figure 2

Phylogeny of Y-chromosome haplogroups and their frequencies (%) in the examined populations. Nomenclature and haplogroup labelling according to the Y Chromosome Consortium (http://ycc.biosci.arizona.edu/), updated according to Karafet et al.32 *Paragroups: Y chromosomes not defined by any of the downstream mutations reported and examined. ᵃIntrapopulation haplogroup diversity. The terminal markers of haplogroups E-V12 and E-V13 (V32 and V27, respectively) were typed but did not show any variation.

Bi-allelic markers were analysed by PCR/AFLPs (YAP,34 and 12f2,13), by PCR/RFLPs (P37,35 M9, M269,36 V12, V13, V22, V27, V32 and V65,28, 30) or by PCR/DHPLC (M241, M253, M267, M285, M287, M304, M343, M365, M367–M369,29 M410,37 M423, M429, M438,31 and M406,38 and those reported in the YCC39).

Nomenclature used for Hg labelling is in agreement with YCC39 conventions and recent updates.31, 32, 38 Subsets of samples belonging to Hgs E-V13, J-M241 and I-M423 were also analysed at 12 STR loci (DYS19, YCAIIa/b, DYS388, DYS389I/II, DYS390, DYS391, DYS392, DYS393, DYS439 and DYS460) using multiplex reactions according to STRBase information (http://www.cstl.nist.gov/biotech/strbase/y20prim.htm), an ABI PRISM® 3100 DNA Sequencer, an internal size standard and GeneScan fragment analysis software. DYS445,38 was analysed separately in J-M410 samples.

Statistical analysis

Hg diversity (H) was computed using the standard method of Nei.40 Principal components (PC) analysis was performed on Hg frequencies using Microsoft Excel with the XLSTAT add-in. The relative amount of accumulated diversity, as a function of geography, was evaluated through the mean microsatellite variance estimated for each population with a sample size of at least five individuals. Hg frequency and variance maps were generated with Golden Software Surfer using the Kriging procedure.41 Within specific Hgs, Median-Joining (MJ) networks42 were constructed using the Network 4.5.0.0 program (Fluxus Engineering, http://www.fluxus-technology.com). The data were first processed with the reduced-median method; networks were then calculated by the MJ method with ɛ=0, weighting each microsatellite locus in inverse proportion to its repeat variance within each Hg. The age of microsatellite variation within Hgs was evaluated using the methodology described by Zhivotovsky et al43 as modified by Sengupta et al.37 An evolutionary effective microsatellite mutation rate of 6.9 × 10⁻⁴ was chosen because it is suitable when the elapsed time frame is ≥1000 years (40 generations), which is clearly appropriate given the prehistoric time depths explored in this study.44
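The per-population summary statistics named above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes Nei's unbiased estimator H = n/(n − 1)(1 − Σpᵢ²), per-locus sample variances (n − 1 denominator) of STR repeat counts averaged over loci, and plain 1/variance locus weights. The function names and the example data are hypothetical, and the rescaling of weights to the integer scheme used by the Network program is not shown.

```python
from statistics import mean, variance


def nei_diversity(counts):
    """Unbiased haplogroup diversity, H = n/(n-1) * (1 - sum(p_i^2)).

    `counts` maps haplogroup labels to the number of chromosomes observed.
    """
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two chromosomes")
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def mean_str_variance(haplotypes, min_n=5):
    """Mean over loci of the sample variance of repeat counts.

    `haplotypes` is a list of equal-length tuples of repeat counts, one tuple
    per chromosome. Returns None below `min_n` chromosomes, mirroring the
    five-individual threshold stated in the text.
    """
    if len(haplotypes) < min_n:
        return None
    per_locus = [variance(locus) for locus in zip(*haplotypes)]
    return mean(per_locus)


def inverse_variance_weights(haplotypes):
    """Per-locus weights proportional to 1 / repeat variance.

    Monomorphic loci (variance 0) never separate haplotypes, so they are
    given weight 0 here simply to avoid dividing by zero.
    """
    weights = []
    for locus in zip(*haplotypes):
        v = variance(locus)
        weights.append(0.0 if v == 0 else 1.0 / v)
    return weights


# Hypothetical toy data, not taken from this study.
counts = {"I-M423": 6, "E-V13": 3, "J-M241": 1}
haps = [(14, 10), (15, 10), (14, 11), (16, 10), (15, 9)]
print(nei_diversity(counts))
print(mean_str_variance(haps))
print(inverse_variance_weights(haps))
```

Averaging per-locus variances (rather than pooling all repeat counts) keeps loci with different mean repeat lengths from inflating the estimate.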

It is worth mentioning that ambiguities related to past episodes of population history (eg, size fluctuations, bottlenecks, etc.) create inherent uncertainties in the calibration of the YSTR molecular clock; thus the estimated ages of microsatellite variation should be considered with caution.

Results

Figure 2 illustrates the phylogenetic relationships of Y-chromosome Hgs and their distribution in the examined southeast European populations. The main Hgs observed in Europe45 (E, I, J, R1a and R1b) contribute differently to the gene pool of the various East European areas: Hgs I and R are the most represented everywhere, whereas Hgs E and J are frequent mainly in the southern Balkan populations.

Hg I is restricted to western Eurasia23 and is particularly frequent in the Balkans, where it accounts for 36.3% of all Y chromosomes. Two of its branches, I-M223 and I-M253, are poorly represented in southeast Europe: the first is only sporadically observed, and the second shows frequencies around 5%, with higher values (around 9%) in Macedonian Greeks and Croats. By contrast, the recently described M423 SNP,31 which characterizes the previously paraphyletic P37 clade, accounts for the majority (77.2%) of the East European Hg I chromosomes. Its diffusion seems not to have affected the neighbouring North Italian populations, where low incidences (0–2%) are observed. The I-M423 sub-clade shows high Central Balkan frequencies (>70% in Bosnia-Herzegovina) that decrease moving from the southern Dinaric Alps to northern Croatia. Although I-M423 comprises virtually all the Balkan-related I-P37.2 chromosomes reported earlier,31 we also detected one I-P37.2* Albanian subject and, by screening previously identified P37.2 chromosomes (Rootsi et al23 and the SMGF collection), identified 30 further P37.2* subjects: two from Moldavia23 and 28 of documented or presumed western European ancestry.

Hg R1 is common throughout western Eurasia22 and accounts for more than 30% of the Balkan Y-chromosome pool. With the exception of one R1a*-SRY10831.2 and five R1b1-M343* individuals, all the remaining R1 lineages belong to R1a1-M17* and R1b1b2-M269. These two sub-clades, which in Europe show opposite frequency gradients with maximum incidences in eastern and western regions, respectively, still display high values in the northern Balkans and decrease appreciably southward. R-M269 chromosomes are common in the Balkans and Anatolia and, judging from the observed internal divergence of their 49a,f branches,29 most likely predate the origin of agriculture. However, the current lack of informative Hg subdivisions within these populations (Figure 2) does not yet allow us to evaluate the role of R-M269 chromosomes during the transition to agriculture.

E-M35 is the only branch of Hg E observed in this survey. It is mainly represented by E-M78 chromosomes, almost all of which (>90%) belong to the recently described30 E-V13 sub-clade. Only four E-M78* chromosomes, not belonging to any previously described sub-clade, were observed, all in the southern Balkans. Two of them (from Greece) carry the new mutation M521 and therefore represent a new M78 lineage.

The majority of the Balkan Hg J Y chromosomes belong to the J-M172 sub-Hg, whose frequency ranges from 2% to 20%. Both its main branches, J-M410 and J-M12/M102*, were observed: the first is scattered among different sub-clades (J-M67, J-M92 and J-DYS445-6) with distinct local patterns, whereas the second is mostly represented by J-M241.

The PC analysis of population Hg frequencies (Figure 3) reveals a tight cluster of populations that excludes the southern Balkan and Caucasian groups. This cluster shares lower frequencies of Hgs G-M201 and J-M410 and higher frequencies of Hgs I-M423, E-V13 and J-M241. Whereas the first two are primarily Middle Eastern Hgs and have been associated with the early Neolithic colonization of Crete,38, 46 Italy,47, 48 and the southern Caucasus, I-M423, E-V13 and J-M241, in spite of parallel Balkan distribution patterns, have clearly different origins.30, 31, 38 Comparing them can therefore provide insights into the complex interaction between European Mesolithic foragers and Middle Eastern Neolithic farmers during the transition to farming in the Balkans. Although these Hgs differ in their spatial patterns of frequency and variance (Figure 4), their networks of microsatellite haplotype variation (Supplementary Figures S1–S3) are all consistent with a Balkan expansion.
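The PC analysis behind Figure 3 was run in XLSTAT; the numpy sketch below only illustrates the general kind of computation. It assumes covariance-based PCA (column-centred frequencies, no standardization), which may differ from the XLSTAT settings actually used, and it runs on an invented four-population matrix rather than on the Figure 2 data. The function name and population labels are hypothetical.

```python
import numpy as np


def pca_on_frequencies(freqs):
    """PCA of a populations x haplogroups frequency matrix via SVD.

    Columns are mean-centred (covariance PCA). Returns population scores,
    haplogroup loadings and the fraction of total variance per component.
    """
    x = np.asarray(freqs, dtype=float)
    centred = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u * s                      # population coordinates on each PC
    loadings = vt.T                     # haplogroup contributions to each PC
    explained = s**2 / np.sum(s**2)     # share of total variance per PC
    return scores, loadings, explained


# Hypothetical frequencies (columns: I-M423, E-V13, J-M241, G-M201, J-M410).
toy = [
    [0.45, 0.15, 0.05, 0.02, 0.03],   # pop_A
    [0.40, 0.18, 0.06, 0.03, 0.04],   # pop_B
    [0.12, 0.22, 0.04, 0.08, 0.12],   # pop_C
    [0.02, 0.05, 0.03, 0.30, 0.25],   # pop_D
]
scores, loadings, explained = pca_on_frequencies(toy)
print(explained[:2])    # variance captured by the first two PCs
print(scores[:, :2])    # coordinates for a two-dimensional plot
```

The loadings matrix plays the role of the figure inset, showing how each haplogroup contributes to the plotted components.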

Figure 3

PC analysis performed using haplogroup frequencies in the populations of this study. Gr, Greeks; Mac-Gr, Macedonian Greeks; Alb-A, Albanians; Alb-F, Albanians from FYROM; Sr-B, Bosnia-Serbs; Bs-B, Bosniacs; Cr-B, Bosnia-Croats; Cr-O, Croats from Osijek; Cr-H, Croats of Croatia; Slo, Slovenians; NE-I, northeast Italians; Hun, Hungarians; Cz, Czechs; Pl, Poles; Uk, Ukrainians; Geo, Georgians; Bk, Balkarians. Thirty-four percent of the total variance is represented. The inset illustrates the contribution of each haplogroup.

Figure 4

Frequency (left) and variance (right) distributions of the main Y-chromosome haplogroups observed in this survey: I-M423, E-V13 and J-M241. Frequency data are reported in Figure 2; variance data refer to the microsatellite loci reported in Supplementary Table S2. We acknowledge that interpolated spatial frequency surfaces should be viewed with caution because of sample size.41 • Data from this study. Frequency and variance values were assigned to sample-collection places (dots). Geographically close population samples with fewer than five observations were pooled, and the corresponding variance was assigned to a middle position among the pooled sample locations. +Data from the literature.13, 23, 27, 28, 36, 45, 49, 50, 51, 52, 53, 54

Discussion

Various episodes of population movement have affected southeast Europe, and the role of the Balkans as a long-standing gateway to Europe from the Near East is illustrated by the phylogenetic unification of Hgs I and J by the basal M429 mutation.31 This evidence of common ancestry suggests that ancestral IJ-M429* Y chromosomes probably entered Europe through the Balkan route sometime before the Last Glacial Maximum. They subsequently evolved into Hg J in the Middle East and Hg I in Europe, in a typical disjunctive phylogeographic pattern. Such a geographic corridor is likely to have experienced additional gene flow later on, including the migration of agricultural colonists from the Middle East. Pottery is a useful proxy for the spread of farming, both spatially and temporally. In the Adriatic region, pottery first appeared on Corfu around 6500 BC and reached the northernmost Adriatic 1000 years later.21 Its dispersal provides a comparative template for the spatial and temporal patterns of Y-chromosome Hg diversity observed in this area.

Hg J is most common (50%) in the Middle East and Anatolia,27, 29, 47 with a spread zone extending from northwest Africa to India,12, 55 and it has been related to different Middle Eastern migrations.12, 56 In addition to Hg J-M410, Hg G-P15 chromosomes, which are also common in Anatolia,29 have been implicated in the colonization and subsequent expansion of early farmers in Crete, the Aegean and Italy.38, 46, 47, 48 Earlier studies have concluded that the J-M410 sub-clades J-DYS445-6 and J-M67 are linked to the spread of farming in the Mediterranean Basin,38, 47 with a likely origin in Anatolia.29 Interestingly, J-DYS445-6 and J-M92 (a sub-lineage of M67) both have expansion times between 7000 and 8000 years ago (Table 1), consistent with the dating of the arrival of the first farmers in the Balkans. Milk residues in ceramic pottery are first detected at sites in northwest Anatolia dated to 7000–8500 years ago,58 an age that approximates these Hg expansion times.

Table 1 Ages of microsatellite variation and mean variance of microsatellite loci within haplogroups

Regarding Hg J-M12/M102, which is found from India to Europe, the M12/M102* chromosomes display very high YSTR diversity, whereas the J-M241 sub-lineage has low diversity in the Balkans (Table 1), indicating different demographic histories. Although Hg J-M241 shows high variance in India,37 its place of origin is still uncertain. As J-M241 has older expansion times in Sicily, Apulia and Turkey (Table 2), it may have arrived in the Balkans from elsewhere.

Table 2 Ages of microsatellite variation and mean variance of microsatellite loci within J-M241 haplogroup in Turkey, the Balkans and Italy

On the other hand, the expansion times of Hg E-V13 (Table 3) are consistent with a late Mesolithic time frame. The Greek Mesolithic, although different in its material culture from the Natufian Mesolithic of the Levant, bears some resemblance to the Mesolithic of southern Anatolia.60 This archaeological congruence between the Mesolithic of the Balkans and that of southern Anatolia may be mirrored by the similar E-V13 expansion times observed for Konya, Franchthi Cave and Macedonian Greece, all approximately 9000 years ago. Moreover, E-V13 YSTR data from Bulgaria and Macedonia,28 both with a variance of 0.28, suggest an expansion time of approximately 10 000 years ago. It is likely that E-V13 originated somewhere within the zone of these sample collections. It is also worth noting that in the Anatolian region of supposed einkorn wheat origin61 (region 5 of Cinnioglu et al29), only one V13 chromosome out of 43 is found (PA Underhill, unpublished data). Therefore, as no evidence at present supports an association of the E-V13 Hg with the attested origin of farming in southeast Anatolia, it is plausible that farming was adopted by Balkan E-V13-associated people. The low E-V13 frequency and STR variation observed in Crete38 indicate that, if the first Neolithic colonists came from central Anatolia, they did not bring this Hg. The two more recent E-V13 expansion times, for Greece and for Sesklo and Dimini (Table 3), dating to the Bronze Age, possibly reflect a later integration of some V13 chromosomes into the populations of the first farmers, represented by J-M410 and G-M201 lineages. Both the lack of any plausible Middle Eastern source of E-V13 during either the early Neolithic or the Bronze Age and the observed age of microsatellite variation are consistent with E-V13 chromosomes reflecting a Mesolithic heritage, as suggested by King et al.38
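The link between a mean microsatellite variance and an age can be illustrated with the simplified relation T = V/w, where V is the mean per-locus variance and w the effective mutation rate. This sketch is a simplification, not the full Zhivotovsky et al43 procedure as modified by Sengupta et al,37 which also accounts for variance already present at the start of an expansion. We additionally assume that the rate of 6.9 × 10⁻⁴ is expressed per 25-year interval, consistent with the Methods equating 1000 years with 40 generations. Under these assumptions, the variance of 0.28 reported for Bulgaria and Macedonia corresponds to roughly 10 000 years, in line with the text. The function and constant names are hypothetical.

```python
# Assumed effective mutation rate per locus per 25-year interval.
EFFECTIVE_RATE = 6.9e-4
# Assumption: 1000 years = 40 generations, i.e. 25 years per time unit.
YEARS_PER_UNIT = 25


def age_from_variance(mean_variance, rate=EFFECTIVE_RATE,
                      years_per_unit=YEARS_PER_UNIT):
    """Simplified age estimate T = V / w, converted to years.

    Omits the initial-variance (founder) correction of the full method,
    so it gives an upper-bound style figure for a single expansion.
    """
    if mean_variance < 0:
        raise ValueError("variance cannot be negative")
    return mean_variance / rate * years_per_unit


# E-V13 variance of 0.28 reported for Bulgaria and Macedonia.
print(round(age_from_variance(0.28)))  # 10145
```

Because T scales linearly with V, any uncertainty in the calibration of w (see the caveat in the Methods) propagates proportionally into the age.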

Table 3 Ages of microsatellite variation and mean variance of microsatellite loci within E-V13 haplogroup in Turkey and Greece

As reported earlier,28 the J-M12 and E-V13 radiation patterns overlap geographically in the Balkans (Figure 4). Although J-M12 chromosomes were not genotyped for M241 by Cruciani et al,28 their low YSTR diversity suggests that they are predominantly M241 derivatives. The difference between E-V13 and J-M241 (Table 1) indicates that, just before the episode of population growth, both the frequency and the haplotype diversity of E-V13 would have been greater than those of J-M241. This remains the case when the dating disregards the mutational steps connecting the three haplotypes that, when Turkish samples are included (Supplementary Figure S2), can be considered founders.62 Whether or not E-V13 and J-M241 participated in the same demographic events remains uncertain.

The presence in the Balkans (two Albanians) of E-M78* Y chromosomes, previously described almost exclusively in northeast Africa (upper Nile),28, 63 raises the question of what the original source of E-M78 may have been. Correlations between human-occupation sites and radiocarbon-dated climatic fluctuations in the eastern Sahara and Nile Valley during the Holocene64 provide a framework for interpreting the mainly southeast European-centred distribution of E-V13. A recent archaeological study reveals that, during a desiccation period in North Africa when the eastern Sahara was depopulated, a refugium existed on the border of present-day Sudan and Egypt, near Lake Nubia, until the onset of a humid phase around 8500 BC (radiocarbon-calibrated date). The rapid arrival of wet conditions during this Early Holocene period provided an impetus for population movement into habitat that was quickly settled afterwards.64 Hg E-M78* representatives, although rare overall, still occur in Egypt, which is a hub for the distribution of the various geographically localized M78-related sub-clades.28 The northward-moving rainfall belts during this period could also have spurred a rapid northward migration of Mesolithic foragers in Africa, to the Levant and ultimately onwards to Asia Minor and Europe, where M78-related lineages eventually differentiated into their regionally distinctive branches.

Unlike the Hgs discussed above, I-M423 represents the southeast European autochthonous clade of I-P37.2. Its distribution reaches Anatolia, where, however, it is only sporadically observed (2.6%, updated from Rootsi et al23). Moreover, virtually all the I-P37.2* paragroup members identified in this survey harbour the peculiar DYS388-15 trinucleotide repeat motif (not observed in any other Hg I clade) and likely represent a new, rare P37.2 sub-clade. Their distribution (Supplementary Table S1) and the associated YSTR variation age of 4000 years (Table 1) suggest that they expanded demographically, perhaps from central European regions, during the Bronze Age. In this scenario, the only I-P37.2* chromosome observed in Albania, which lacks the unusual DYS388-15 repeat motif, could either result from a reversion to the ancestral allele or be a rare representative of the ancestral P37.2 state.

The network of STR haplotypes identified in the 222 Y chromosomes belonging to Hg I-M423 (Supplementary Figure S1) has a star-like shape centred on the most frequent and widespread haplotype, which is present in all Balkan populations. The marginal positions occupied by the three Turkish chromosomes are consistent with recent gene flow. The age of accumulated microsatellite variation associated with Hg I-M423 (Table 1) is around 8000 years (Early Holocene). Thus, although Hgs G and J mark the successful colonization and subsequent demic expansions of Neolithic pioneers into these regions, consistent with a wave of advance,19, 65 the widespread adoption of farming by Mesolithic hunter-gatherers in the Balkans and Central Europe is recorded in the autochthonous Hg I-M423.

These data point to complex interactions between farmers and foragers, rather than a large-scale replacement of hunter-gatherers by pioneering agriculturalists, during the spread of the Neolithic into southeast Europe. They also indicate that I-M423 and probably also E-V13 representatives were already well established in the Balkans before the arrival of a nucleus of pioneering agriculturalists.

Thus, unlike in Crete, southern and central Italy and the southern Caucasus, the cultural transmission of the Neolithic package played an important role in the Balkans. Either the initial G and J2 Hg agriculturalists who colonized the Balkans at first flourished but later diminished, in a manner similar to that proposed for the Linearbandkeramik in central Europe,66 or the package was rapidly and robustly adopted by local Mesolithic people in the southern Balkans (plausibly characterized by E-V13), who underwent a demic expansion and a subsequent range expansion to the eastern Adriatic. These former foragers, who had recently acquired the Neolithic tradition, participated in ‘leapfrog’ colonizations up the Adriatic, where they eventually transmitted agricultural practices to resident Mesolithic populations represented by I-M423 chromosomes.

Interestingly, the Y-chromosome scenario derived here strongly recalls the fourth PC synthetic map of Europe calculated on gene frequencies at 95 nuclear loci,11 which displays a centre in the southern Balkans and a large surrounding area that terminates with a ‘propagule’ to the northeast of the central Balkans. Our model could therefore provide a possible interpretation of the expansion centred in the southern Balkans described in that map.

Conclusion

This study provides a model that elevates the role of migratory foragers with remote eastern Saharan ancestry who, once established in Asia Minor with their own derived genetic signature, were destined to become the earliest converts to farming and the adherents of its further spread into Europe. Such an interpretation finds support in the ‘dispersal model’ of Impressed Ware in which the ‘Neolithic package’ was acquired by native groups and subsequently diffused by interactions between farmers and foragers.67 Although southeast Europe shows considerable archaeological evidence of the Neolithic transition,20 our Y-chromosome results provide biological evidence of complexity21 in the transition to farming in terms of the contrasting influences of pioneering agriculturalists and Mesolithic foragers.