Ancient and recent admixture layers in Sicily and Southern Italy trace multiple migration routes along the Mediterranean

The Mediterranean shores stretching between Sicily, Southern Italy and the Southern Balkans witnessed a long series of migration processes and cultural exchanges. Accordingly, present-day population diversity is composed of multiple genetic layers, which make it particularly challenging to decipher the different ancestral and historical contributions. We address this issue by genotyping 511 samples from 23 populations of Sicily, Southern Italy, Greece and Albania with the Illumina GenoChip Array, including new samples from the Albanian- and Greek-speaking ethno-linguistic minorities of Southern Italy. Our results reveal a shared Mediterranean genetic continuity, extending from Sicily to Cyprus, in which Southern Italian populations appear genetically closer to the Greek-speaking islands than to continental Greece. Besides a predominant Neolithic background, we identify traces of Post-Neolithic Levantine- and Caucasus-related ancestries, compatible with maritime Bronze-Age migrations. We argue that these results may have important implications for the cultural history of Europe, for instance for the diffusion of some Indo-European languages. By contrast, recent historical expansions from North-Eastern Europe account for the observed differentiation of present-day continental Southern Balkan groups. Patterns of IBD-sharing directly reconnect the Albanian-speaking Arbereshe with a recent Balkan origin, while the Greek-speaking communities of Southern Italy cluster with their Italian-speaking neighbours, suggesting a long-term presence in Southern Italy.


Supplementary Tables
Population labels as in Supplementary Table S1. The tested ethno-linguistic minorities are indicated by labels in italic.

Supplementary
In addition, we included 148 new samples from two regions historically tightly interwoven with Sicily and Southern Italy, namely Albania (Tosk and Gheg) and Greece (Northern Greece, Central Greece, Peloponnesus, Crete, Cypriot-Greeks, Anatolian/Dodecanese-Greeks). Among them, 75 Greek samples were obtained from the Genographic Database (Genographic Project, National Geographic Society, Washington, District of Columbia, USA) to increase sampling coverage of the Greek area. Consistent with the adopted sampling criteria, we selected only individuals whose Greek genetic ancestry could be accurately traced, and we looked for samples with familial local ancestry specific to one of the six aforementioned Greek regional areas. … province, in the southern tip of Calabria). We will refer to the former as Salentino Greeks (GRI_SAL) and to the latter as Calabrian Greeks (GRI_BOV and GRI_CAL). In particular (see also

Supplementary Section 3 PCA and ADMIXTURE projection of ancient samples
We used the genetic space defined by the modern populations to project ancient individuals onto the PCA plot (… Neolithic-Eastern European mix, to which a 14% contribution from the Caucasian component is added (Supplementary Fig. S3).
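The projection step can be illustrated with a minimal NumPy sketch. It is not the pipeline used in the study, which the text describes only as projecting onto the modern PCA space. The sketch assumes genotypes coded 0/1/2, with NaN marking missing calls in the ancient samples. Each SNP is normalised by its modern allele frequency, and each ancient individual is fitted by least squares on its observed SNPs only, in the spirit of smartpca's `lsqproject` option. The function names are hypothetical.

```python
import numpy as np


def fit_modern_pca(modern: np.ndarray, n_components: int = 2):
    """Fit PCA on a modern genotype matrix (individuals x SNPs, values 0/1/2).

    Each SNP is centred and scaled using the modern allele frequency.
    Returns the frequencies, the per-SNP scale and the top principal axes.
    """
    p = modern.mean(axis=0) / 2.0                    # modern allele frequencies
    scale = np.sqrt(p * (1.0 - p))
    scale[scale == 0] = 1.0                          # guard monomorphic SNPs
    x = (modern - 2.0 * p) / scale
    # Rows of vt are the SNP loadings (orthonormal principal axes).
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return p, scale, vt[:n_components]


def project_ancient(ancient: np.ndarray, p, scale, axes):
    """Project ancient samples (NaN = missing call) onto the modern axes.

    Each individual is fitted by least squares on its observed SNPs only,
    so low-coverage samples are not pulled towards the origin.
    """
    coords = np.empty((ancient.shape[0], axes.shape[0]))
    for i, g in enumerate(ancient):
        obs = ~np.isnan(g)
        xi = (g[obs] - 2.0 * p[obs]) / scale[obs]
        a = axes[:, obs].T                           # observed SNPs x components
        coords[i], *_ = np.linalg.lstsq(a, xi, rcond=None)
    return coords
```

With complete data, the least-squares fit reduces to the ordinary PC scores, because the axes are orthonormal. With missing calls, it uses only the SNPs actually observed in each ancient genome.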

Supplementary Section 4 Fine-scale genetic structuring in modern Euro-Mediterranean populations
Common approaches used to explore genetic structure among the analysed Southern Italian and Southern Balkan populations revealed substantial homogeneity among the different populations of Sicily and Southern Italy, confirming the absence of a clear inter-regional differentiation pattern between Apulia, Calabria and Sicily 16,17. Furthermore, they identified a common 'Mediterranean' genetic background shared by large portions of Southern Italy (including Romance-speaking populations and Italian Greeks) and the Greek-speaking islands (i.e. Crete, Cyprus, Anatolian and Dodecanese Greece) (Supplementary Fig. S1, Supplementary Fig. S2). The only partial exception is a Balkan genetic connection linking continental Greece and the Peloponnesus with Albania (and Kosovo).
To overcome this limited genetic structuring and disentangle subtle levels of genetic differentiation, we applied the haplotype-based approach implemented in CHROMOPAINTER/fineSTRUCTURE (Fig. 3, Supplementary Fig. S4, Supplementary Fig. S5, Supplementary Fig. S6, Supplementary Table S4). Since the fineSTRUCTURE clustering method is blind to any a priori geographic and/or cultural classification of samples, the correspondence between genetic clusters and population-level groupings provides an unbiased way to detect population differentiation at finer scales. The relative proportions of the 14 considered clusters (i.e. the hierarchical level at which each cluster contains at least 10 individuals) in each of the modern Euro-Mediterranean populations are reported in main text Fig. 3. Supplementary Fig. S6 Fig. S2).
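Once every individual has been assigned to a cluster at the chosen tree level, the per-population cluster proportions amount to a simple tabulation. The minimal sketch below assumes the assignments are already available as (population, cluster) pairs, for example after cutting the fineSTRUCTURE tree. The function name and the toy assignments in the usage note are hypothetical and are not data from this study.

```python
from collections import Counter, defaultdict


def cluster_proportions(assignments):
    """Relative proportion of each genetic cluster within each population.

    `assignments` is an iterable of (population_label, cluster_label) pairs,
    one per individual.
    Returns {population: {cluster: fraction of that population's individuals}}.
    """
    counts = defaultdict(Counter)
    for pop, cluster in assignments:
        counts[pop][cluster] += 1
    table = {}
    for pop, c in counts.items():
        total = sum(c.values())
        table[pop] = {cl: n / total for cl, n in sorted(c.items())}
    return table
```

As a usage example, with toy assignments such as `[("ARB_SIC", "AW-Sicily"), ("ARB_CAL", "Southern Balkan"), ("ARB_CAL", "CE-Sicily")]`, the function returns one row per population whose fractions sum to 1.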
Subsequent hierarchical levels of the tree (Supplementary Fig. S6) … The level of the hierarchy at 14 clusters (Fig. 3, TREE14 in Supplementary Table S4) … iv) private Calabrian Greek (white in Fig. 3) and Cypriot (aquamarine in Fig. 3) clusters.
While the Calabrian Greek and Cypriot clusters are highly specific to their corresponding populations, the two SSI clusters (AW-Sicily and CE-Sicily) appear tightly interrelated (Supplementary Fig. S5). They show some degree of admixture within a genetically contiguous area that, in addition to Southern Italy, also encompasses Crete and the Aegean/Dodecanese islands (i.e. what we called the "Mediterranean genetic continuum"). Although little emphasis should therefore be placed on the "divisive" aspect of these two clusters, some differentiation emerges in the relative proportion of each cluster in West Sicily (and Apulia) versus East Sicily (and Calabria). In addition, one of the two clusters (the purple AW-Sicily one) appears related not only to Crete and the Aegean/Dodecanese Greek islands but also to continental Greece, thus providing a framework for a finer exploration of subtle differentiation patterns.

Supplementary Section 5 Patterns of IBD-sharing in Southern Italian and Southern Balkan population groups
The emerging differences in pattern of sharing between Southern Italian and Southern Balkan … In general, we detected lower values of sharing for both Southern Italian and Southern Balkan population groups with the Caucasus, the Near East or Sardinia (Fig. 4). However, Southern Italy and the Greek-speaking islands differ from the continental Southern Balkans in showing slightly higher values of IBD-sharing with the Caucasus and the Near East/Saudi Arabia. In addition, marginally higher North-Western European- and Sardinian-IBD relatedness characterizes the whole of Sicily and Southern Italy with respect to both continental and insular Greece. By contrast, a remarkably higher and significant signal of IBD-sharing with the North-Central Balkans distinguishes the continental Southern Balkan group (Fig. 4). Nevertheless, some affinity with the North-Central Balkans (and Eastern Europe) is also observed in the Greek-speaking islands, as well as in Apulia and West Sicily, compared to Calabria and Central-East Sicily (Fig. 4).
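Comparisons like these rest on summarising shared IBD segments by length class. The minimal sketch below shows one such summary. It assumes that segments come from an upstream IBD caller as (individual, individual, length in cM) triples and that sharing is expressed as mean total cM per between-group pair of individuals. The exact normalisation behind Fig. 4 is not stated here. The 3-4, 4-5 and >5 cM classes follow the text, while the 1-2 and 2-3 cM bins and all names are assumptions.

```python
# Length classes in cM. The 3-4, 4-5 and >5 cM classes follow the text;
# the 1-2 and 2-3 cM bins are an assumption for illustration.
LENGTH_CLASSES = [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0), (4.0, 5.0), (5.0, float("inf"))]


def length_class(length_cm):
    """Index of the length class containing `length_cm`, or None if too short."""
    for i, (lo, hi) in enumerate(LENGTH_CLASSES):
        if lo <= length_cm < hi:
            return i
    return None


def ibd_sharing(segments, group_of, group_a, group_b):
    """Mean total IBD length (cM) per between-group pair, per length class.

    segments: iterable of (ind1, ind2, length_cm) triples
    group_of: dict mapping individual ID -> group label
    """
    assert group_a != group_b, "this sketch handles between-group sharing only"
    n_a = sum(1 for g in group_of.values() if g == group_a)
    n_b = sum(1 for g in group_of.values() if g == group_b)
    totals = [0.0] * len(LENGTH_CLASSES)
    for i1, i2, length in segments:
        if {group_of[i1], group_of[i2]} != {group_a, group_b}:
            continue                                 # not a between-group segment
        k = length_class(length)
        if k is not None:
            totals[k] += length
    n_pairs = n_a * n_b                              # all A x B individual pairs
    return [t / n_pairs for t in totals]
```

Longer classes carry information on more recent common ancestry. For this reason the 4-5 and >5 cM columns are the ones most relevant to recent contacts.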
When tested for significance (function grubbs.test from the R package outliers), the continental Southern Balkans confirm outlying values of IBD-sharing with North-Central Balkan populations for most of the considered length classes, with respect to both the Greek islands and SSI (Fig. 4).
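The single-outlier Grubbs test can be sketched in Python for readers working outside R. This is a re-implementation under stated assumptions, not the outliers package itself. It computes G = max|x_i - mean| / s and converts G into an approximate two-sided p-value through its relation to the Student-t distribution. The results may differ from `grubbs.test` in edge cases. In this setting, `values` might hold, for example, each tested group's IBD-sharing with the North-Central Balkans in one length class (a hypothetical framing).

```python
import numpy as np
from scipy import stats


def grubbs_two_sided(values):
    """Two-sided Grubbs test for a single outlier.

    Returns (index of the most extreme value, G statistic, approximate p-value).
    The p-value uses p = 2 * n * P(T > t), capped at 1, where T has n - 2
    degrees of freedom and t is recovered from G.
    """
    x = np.asarray(values, dtype=float)
    n = x.size
    if n < 3:
        raise ValueError("Grubbs test needs at least 3 values")
    sd = x.std(ddof=1)
    if sd == 0:
        raise ValueError("all values are identical")
    dev = np.abs(x - x.mean())
    idx = int(np.argmax(dev))
    g = dev[idx] / sd
    # Invert G = (n-1)/sqrt(n) * sqrt(t^2 / (n-2+t^2)) to recover t.
    denom = (n - 1) ** 2 - n * g ** 2
    if denom <= 0:                                   # G at its theoretical maximum
        return idx, g, 0.0
    t = np.sqrt(n * (n - 2) * g ** 2 / denom)
    p = min(1.0, 2.0 * n * stats.t.sf(t, n - 2))
    return idx, g, p
```

A single clearly separated value gives a very small p-value, while evenly spread values such as `[1, 2, 3, 4, 5]` do not flag any outlier.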
Interestingly, despite much lower values of sharing, the observed Balkan IBD-relatedness is also significant for those populations with higher frequencies of the Apulia/West Sicily cluster, especially for the longest length classes (4-5 and >5 cM), i.e. the more recent time frames. Recent exchanges across the Adriatic Sea can more easily be assumed for Apulia, given its geographic proximity to the Southern Balkans. They are harder to explain for Western Sicily, which is much more distant and geographically isolated. However, recent re-peopling of semi-deserted areas with Greek settlers is well documented at least for the Sicilian Arbereshe, who in fact show the highest percentage of the purple cluster in Western Sicily.
All other differences in IBD-sharing, although exceeding the percentiles of their respective distributions, are not supported by the corresponding significance tests (Fig. 4). It is possible that more ancient sharing (e.g. involving the Near East/Caucasus as well as Sardinia) resulted in smaller, non-significant differences in the inferred IBD patterns. On the other hand, more recent migration processes (e.g. involving the Balkan Peninsula) may explain the significant differences observed between present-day Southern Balkan and Southern Italian populations, thus hinting at the presence of multiple admixture layers.

Supplementary Section 6 Population genetic ancestry of Italian Arbereshe and Greek-speaking minorities
Population relationships and structuring patterns of the Italian Arbereshe and Greek-speaking ethnolinguistic minorities were first explored by means of a PCA on our Geno2 dataset only (511 individuals genotyped with the Illumina GenoChip 2.0 array for 123,700 autosomal SNPs). Our populations are placed along a geographic axis of genomic variation stretching from the Balkans to Southern Italy (Supplementary Fig. S7). Consistent with the global PCA plot (Fig. 2, Supplementary Fig. S1 … Table S5). Vectors of IBD-sharing of each Italian Arbereshe and Greek-speaking group with the 18 comparison populations from Albania, Greece (both continental and insular) and Southern Italy newly analysed in the present study were subtracted from one another and cross-tested for significant differences.
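The text does not specify which test was used to cross-test the subtracted IBD vectors. As an illustration only, the sketch below substitutes a two-sample permutation test on per-individual IBD sharing with a single comparison population. An example would be Arbereshe versus Greek-speaking individuals, each compared for their sharing with a Gheg sample. The function name, its inputs and the choice of test are hypothetical.

```python
import numpy as np


def permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in mean IBD sharing.

    a, b: per-individual IBD sharing (e.g. total cM in one length class) of
    two groups with the same comparison population.
    Returns (observed mean difference a - b, permutation p-value).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pooled = np.concatenate([a, b])
    observed = a.mean() - b.mean()
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)               # shuffle group labels
        diff = perm[: a.size].mean() - perm[a.size :].mean()
        if abs(diff) >= abs(observed):
            hits += 1
    # +1 correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Repeating the test for every comparison population and length class would require a multiple-testing correction, which this sketch leaves out.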
Compared to the Greek-speaking enclaves, the Albanian-speaking Arbereshe of both Sicily and Calabria confirm a significant excess of IBD-segments shared with their putative Albanian source populations (especially Gheg) in all length classes (Supplementary Table S5). By contrast, all the tested Italian-Greek populations show higher IBD-relatedness with their Italian neighbours (Supplementary Table S5), especially for the medium and longer segment classes (3-4, 4-5 and >5 cM). Thus, while the Arbereshe trace their recent genetic ancestry to the Balkans, the Greek-speaking communities of Southern Italy are genetically more similar to their local Italian neighbours.
When directly tested against putative Balkan source and Southern Italian recipient populations, an Albanian-specific shared ancestry distinguishes both ARB_CAL and ARB_SIC from their geographic neighbours of Cosenza (Calabria) and Palermo (Sicily), respectively (Supplementary Table S5). We interpret this result as evidence that the Arbereshe minorities of Southern Italy are genetically discontinuous from their Italian-speaking neighbours, sharing closer recent ancestors with Albanians. However, some differences emerge between the two Arbereshe groups. Despite evidence of recent contacts with Southern Italians (Fig. 3, Supplementary Table S5), the Albanian-speaking groups of Calabria are the only case for which a direct genetic link to the Albanian-specific cluster (Southern Balkan, cyan in Fig. 3) is demonstrated. In contrast, virtually all Arbereshe individuals from Sicily are assigned to the AW-Sicily genetic cluster (purple in Fig. 3), which encompasses insular and continental Greece as well as Apulia and West Sicily. As mentioned above, the relatively higher Balkan-IBD sharing that all the populations from this cluster exhibit with other continental Greek and Balkan groups (and partly Eastern Europe; Fig. 4

Supplementary Section 8 Statistical testing of ancient relationship patterns
Besides cases of presumptive isolation (most notably Calabrian Greeks), genetic differences between fineSTRUCTURE-detected clusters may reflect different patterns and times of admixture. Therefore, we used the 14 genetically identified clusters, instead of actual populations, to perform f3-population tests among all possible trios and to date significant admixture events (… Fig. S9). Overall, these results confirm the preliminary inferences from PCA- and ADMIXTURE-projection analyses (Fig. 2, Supplementary Fig. S3 … Table S8). This may suggest traces of Yamnaya influences extending from Eastern Europe to the Balkans.
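The f3-population test can be sketched from allele frequencies. The sketch assumes per-SNP frequencies for a target cluster C and two source clusters A and B. f3(C; A, B) is then the mean of (c - a)(c - b), with a finite-sample correction c(1 - c)/(n_c - 1) for C. A significantly negative Z-score indicates that C is admixed between sources related to A and B. The standard error comes from a delete-one-block jackknife over contiguous SNP blocks. This is a simplified illustration, not the ADMIXTOOLS implementation. Real analyses usually define blocks by genetic distance rather than by SNP count, and the function names here are hypothetical.

```python
import numpy as np


def f3_per_snp(c, a, b, n_c):
    """Per-SNP f3(C; A, B) terms with the finite-sample correction for C.

    c, a, b: allele frequencies per SNP; n_c: number of sampled alleles in C.
    """
    return (c - a) * (c - b) - c * (1.0 - c) / (n_c - 1.0)


def block_jackknife(values, n_blocks=20):
    """Mean of per-SNP values and its delete-one-block jackknife standard error.

    Blocks are contiguous runs of SNPs of (roughly) equal size.
    """
    values = np.asarray(values, dtype=float)
    blocks = np.array_split(np.arange(values.size), n_blocks)
    estimate = values.mean()
    loo = np.array([np.delete(values, idx).mean() for idx in blocks])
    se = np.sqrt((n_blocks - 1) / n_blocks * np.sum((loo - loo.mean()) ** 2))
    return estimate, se


def f3_test(c, a, b, n_c, n_blocks=20):
    """Return (f3, standard error, Z-score) for target C and sources A, B."""
    est, se = block_jackknife(f3_per_snp(c, a, b, n_c), n_blocks)
    return est, se, est / se
```

As a check, simulating C as an exact 50/50 mixture of A and B makes each per-SNP term approximately -(a - b)^2 / 4, so f3 comes out clearly negative.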
Since Yamnaya are described as a mixture of Caucasian (CHG) and Eastern European (EHG) Hunter-Gatherer populations 4,6, further tests were designed to better characterize the Mesolithic layer observed in our Euro-Mediterranean population groups. D-statistic tests suggest that Mesolithic groups from both Eastern (EHG) and Western (WHG) Europe tend to form a clade with each other to the exclusion of modern European clusters. The same tests nonetheless signal traces of their ancestry in modern populations relative to CHG (Supplementary Table S8 … Table S8).
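The D-statistic has a similarly compact frequency-based form. The sketch below computes D(W, X; Y, Z) from per-SNP allele frequencies. Its numerator is (w - x)(y - z) and its normaliser is (w + x - 2wx)(y + z - 2yz), both summed over SNPs. Values near zero are consistent with W and X forming a clade relative to Y and Z. In practice, significance is assessed with a block jackknife, which is omitted here. The function name is hypothetical.

```python
import numpy as np


def d_statistic(w, x, y, z):
    """Frequency-based D(W, X; Y, Z) from per-SNP allele frequencies.

    Positive values: W shares more alleles with Y than X does (equivalently,
    X with Z). Values near 0 are consistent with the tree ((W, X), (Y, Z)).
    """
    w, x, y, z = (np.asarray(v, dtype=float) for v in (w, x, y, z))
    num = np.sum((w - x) * (y - z))
    den = np.sum((w + x - 2.0 * w * x) * (y + z - 2.0 * y * z))
    return num / den
```

In the setting above, a test with EHG and WHG in the W and X positions and a modern cluster against CHG in Y and Z would ask whether the two Mesolithic groups behave as a clade. That mapping is hypothetical, since the exact test configurations are listed only in Supplementary Table S8.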

In summary, our analyses show a Caucasus-related ancestry in both Southern Italian and Southern Balkan populations. Nevertheless, these populations do not show comparably significant evidence of Bronze-Age Yamnaya-like introgression. Such introgression has been interpreted as the most probable vector of CHG-like ancestry in Central-Eastern and Northern Europe and has also been linked with the demographic diffusion of some Indo-European languages 4. These results may suggest that Caucasus-related ancestry reached our Mediterranean populations through migratory