Greek colonisation of South Italy and Sicily (Magna Graecia) was a defining event in European cultural history, although the demographic processes and genetic impacts involved have not been systematically investigated. Here, we combine high-resolution surveys of the variability at the uni-parentally inherited Y chromosome and mitochondrial DNA in selected samples of putative source and recipient populations with forward-in-time simulations of alternative demographic models to detect signatures of that impact. Using a subset of haplotypes chosen to represent historical sources, we recover a clear signature of Greek ancestry in East Sicily compatible with the settlement from Euboea during the Archaic Period (eighth to fifth century BCE). We inferred a moderate sex bias in the numbers of individuals involved in the colonisation: we estimated the migrants at a few thousand breeding men and a few hundred breeding women. Last, we demonstrate that studies aimed at quantifying Hellenic gene flow by the proportion of specific lineages surviving in present-day populations may be misleading.
'…board a fast ship to Sicily, where you could sell them for a profit.' (Homer, Odyssey XX 382-383)
From the eighth century BCE, the Western Mediterranean witnessed a settlement process that originated in the Aegean and had its centre of gravity in Eastern Sicily and Southern Italy, an area later known as Magna Graecia. There is substantial agreement on the drivers of this colonisation and on the geo-historical dynamics of the settlements,1, 2, 3 but the nature of the early settlements, the scale of their demographic impact and their genetic legacy are still debated. Archaeologists, historians and demographers have proposed different degrees of Greek contribution, with scenarios ranging from a colonisation process based on small groups of males moderately admixing with autochthonous groups (Boardman4 p. 163; Yntema5), 'like ants or frogs about a pond' (Plato, Phaedo 109a-b), to substantial migrations from Greece and a Hellenic origin for a significant part of the pre-Roman Italian population.6, 7, 8 The pattern of genetic variation observed in Italian populations has been interpreted either as supporting a substantial Greek contribution to the current Sicilian and southern Italian gene pool,9, 10 or as having been shaped by other demographic processes.11, 12, 13
When the genetic impact of the Greek Colonisation (GC) has been specifically addressed,14, 15 a lineage-based interpretative approach was used, relying on the frequency in present-day Greeks of the most frequent haplogroups (E-V13) or STR motifs (Balkan Modal Haplotype). However, these approaches can be strongly misleading. Population patterns may not hold when single nucleotide polymorphisms defining finer haplogroup assignments are genotyped. Moreover, it is problematic to treat specific lineages or haplotypes as markers of GC, as these studies have done, because (1) the region where a haplotype is most common today is not necessarily the region where it originated,16, 17 (2) modern population samples from the hypothesised source region may not be a good proxy for ancestral source populations, and (3) present-day patterns might be related to other events that triggered migration along the same route, most notably the Neolithic agricultural revolution or migratory flows during the Bronze Age, the Classical Era and the Christian Era. In addition, previous investigations have not formally tested alternative demographic models to clarify the scale of migration associated with the GC, an issue that has long puzzled demographers.18, 19, 20, 21
To search for and characterise the genetic echoes of the demographic impact of the colonisation process, we applied a hypothesis-testing approach consisting of a deep molecular characterisation of male-specific Y chromosome regions, coupled with a dedicated sampling strategy that included relevant source and recipient populations. In addition, we performed extensive simulations comparing alternative models for the origin and fate of Greek genetic contributions to southern Italy and Sicily. In doing so, we: (a) tested for the presence of specific and robust signatures of Greek contribution to Italy and Sicily; (b) tested alternative models to estimate the most likely scale of the colonisation process, with a focus on the relative contribution of males versus females; and (c) evaluated the implications of lineage-based approaches in the characterisation of past demographic events.
Materials and methods
A total of 811 unrelated individuals native either to Greek districts thought to be at the origin of the GC migrations (Euboea island and Corinthia) or to cities located close to primary and secondary Greek colonies in southern Italy were recruited with informed consent (Figure 1). The project received ethical approval from the institutions involved in sample collection. Samples from other putative recipient populations in northern and central Italy were made available (N=201; Brisighelli et al12). In addition, DNA samples from Albania,22 Croatia23 and West Anatolia24 were obtained as reference populations. To analyse samples of source, recipient and reference populations of comparable size, Italian populations were clustered into six main groups following geographical and historical criteria (Supplementary Table S1).
DNA was extracted from whole blood or oral samples (buccal swabs or saliva) through a modified salting-out procedure25 or commercially available extraction kits (Master Pure Purification kit, Epicentre; Nucleon BACC, GE Healthcare Bio-Sciences, Pittsburgh, PA, USA).
A total of 59 previously published single nucleotide polymorphisms were analysed following a hierarchical genotyping strategy. Samples were amplified in a standard PCR reaction and the SNaPshot Multiplex System (Life Technologies Corp., Carlsbad, CA, USA) primer extension protocol was used. All samples were first genotyped for markers E-M35, F-M89, G-M201, H-M282, I-M170, K-M9, J-M172, J-M267, J-M304, R-M173, P-M45, R-M17 and R-M269, to classify them into major European branches. Samples belonging to haplogroups E-M35, E-M78, J-M172, I-M170 and R-M269 were further analysed by means of haplogroup-specific multiplexes (Supplementary Table S2). Furthermore, samples assigned to haplogroup G-M201 were analysed for markers M406 and P15 through direct sequencing (Supplementary Table S3). Nomenclature used for haplogroup labelling follows YCC conventions26 and recent updates (ISOGG Y-Tree 2015 http://www.isogg.org/tree/).
The entire data set was also analysed at a total of 26 Y chromosome short tandem repeats (YSTRs): those included in the AmpFlSTR Yfiler PCR Amplification kit (Applied Biosystems, Foster City, CA) and 9 additional YSTRs (DYS460, DYS388, YCA-II a/b, DYS461, DYS445, YGATA-A.10 and DYS413 a/b) typed with purpose-designed multiplex PCR reactions (Supplementary Table S4). We finally assembled a haplotype data set based on 20 of the 26 analysed STR markers, excluding those STRs for which the PCR co-amplifies two loci and whose alleles could not be assigned to a defined locus (DYS385 a/b, YCA-II a/b and DYS413 a/b). A subset of samples (N=304) was analysed for the hypervariable region I of mitochondrial DNA (mtDNA) using primers 15997L and 017H.
An FST genetic distance matrix27 was computed on Y haplogroup frequencies using the Arlequin package (version 3.5)28 and graphically represented by non-metric multidimensional scaling.29 The analysis involved 18 population samples: the 6 Italian recipient groups, the 2 source samples from Greece (Euboea and Corinthia), the 3 reference samples (Turkey, Albania and Croatia) and 7 additional samples from Crete,30, 31 mainland Greece31 and Lebanon.32 To allow comparison across data sets genotyped with different single nucleotide polymorphism panels, haplogroups were pooled to the least basal common node on the ISOGG 2015 Y tree (http://www.isogg.org/tree/), for a total of 17 groups. The stress value (0.0758, P<0.05)33 suggests a non-random distribution of population samples in the two-dimensional plot.
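The construction of a distance matrix from pooled haplogroup frequencies can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions: Arlequin estimates FST within an AMOVA framework, whereas the sketch uses the simpler frequency-based ratio FST = (HT − HS)/HT, where HS is the mean within-population diversity and HT the diversity of the pooled frequencies. The function names and the {population: {haplogroup: frequency}} input format are hypothetical.

```python
from itertools import combinations

def gene_diversity(freqs):
    """Expected diversity 1 - sum(p^2) for a {haplogroup: frequency} dict."""
    return 1.0 - sum(p * p for p in freqs.values())

def pairwise_fst(pop_a, pop_b):
    """Simplified frequency-based FST = (HT - HS) / HT for two populations
    (an assumption; not Arlequin's AMOVA-based estimator)."""
    hs = (gene_diversity(pop_a) + gene_diversity(pop_b)) / 2
    groups = set(pop_a) | set(pop_b)
    # Equal-weight pooling of the two frequency vectors
    pooled = {g: (pop_a.get(g, 0.0) + pop_b.get(g, 0.0)) / 2 for g in groups}
    ht = gene_diversity(pooled)
    return 0.0 if ht == 0 else (ht - hs) / ht

def fst_matrix(pops):
    """Symmetric FST matrix for a {population: {haplogroup: frequency}} dict."""
    names = sorted(pops)
    matrix = [[0.0] * len(names) for _ in names]
    for (i, a), (j, b) in combinations(enumerate(names), 2):
        matrix[i][j] = matrix[j][i] = pairwise_fst(pops[a], pops[b])
    return names, matrix
```

The resulting matrix can then be passed to any non-metric multidimensional scaling routine to obtain a two-dimensional plot and its stress value.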
Inferring pairs of Y haplotypes with GC ancestry
To identify pairs of Y haplotypes in the source and recipient populations with a time since the most recent common ancestor (TMRCA) compatible with the migrations from Greece to southern Italy and Sicily in the Archaic Period, we used Equation 31 (described by Walsh,25 p. 907) as implemented in the software ASHEs 1.1 (ashes.codeplex.com).34 Briefly, TMRCA Bayesian posterior distributions were calculated for pairs of chromosomes separated by 0 to 2k mutational steps (where k is the number of loci), assuming haplotypes composed of the set of 20 selected Y-STR loci (see above), a strict stepwise mutational model, a mutation rate of 3.09 × 10−3 per locus per generation (averaged values from Burgarella et al35 and Ballantyne et al36) and a lambda value of 0.0002 (1/N, where N is the effective population size; here we used N=5000 in accordance with the study by Hammer37). For each k, we explored the likelihood distribution at 102 generations (GC scenario; ~2750 years ago using 27 years per generation) and 300 generations (Neolithic scenario; ~7500 years ago using 25 years per generation) (Supplementary Table S5). We adjusted the male generation interval estimated from present-day genealogies (31.9 years38) to account for the generally shorter life expectancy in prehistoric societies and for Y-based estimates calibrated in translocated historical groups (25–30 years39). To assess which interval of mutational differences between haplotypes best represents the GC contribution to Italy, we normalised each distribution and chose the mutational range, within the likelihood inferred for the GC scenario, that minimised the overlap (<0.4%) with the distribution inferred for the Neolithic scenario. This range was identified as 8–12 mutational steps.
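The comparison between the GC and Neolithic scenarios can be illustrated with a short Python sketch. It does not implement Walsh's Equation 31: as a simplifying assumption, every mutation along the two lineages is taken to add one step to the total distance (no back-mutation), so the number of steps between two 20-locus haplotypes separated for t generations is approximately Poisson with mean 2kμt, truncated and renormalised over 0 to 2k steps. Under this approximation the sketch is not expected to reproduce the 8–12 range exactly; it only shows how the overlap between the two scenario distributions within a window can be measured. The function names are hypothetical.

```python
import math

def step_pmf(t, k=20, mu=3.09e-3):
    """Normalised probability of m = 0..2k total mutational steps between two
    haplotypes whose lineages separated t > 0 generations ago (Poisson approx.)."""
    lam = 2 * k * mu * t  # expected number of mutations along both lineages
    pmf = [math.exp(-lam + m * math.log(lam) - math.lgamma(m + 1))
           for m in range(2 * k + 1)]
    total = sum(pmf)
    return [p / total for p in pmf]

def window_mass(pmf, lo, hi):
    """Probability mass falling within the lo..hi mutational-step window."""
    return sum(pmf[lo:hi + 1])

gc = step_pmf(102)   # GC scenario, ~2750 years at 27 years per generation
neo = step_pmf(300)  # Neolithic scenario, ~7500 years at 25 years per generation
print(f"GC mass in 8-12 steps: {window_mass(gc, 8, 12):.3f}")
print(f"Neolithic mass in 8-12 steps: {window_mass(neo, 8, 12):.2e}")
```

A candidate window is acceptable under the criterion in the text when its Neolithic mass stays below 0.4% while retaining a substantial share of the GC distribution.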
To estimate 95% confidence bounds, we approximated each posterior distribution by a normal distribution centred on the most likely TMRCA value, treating the two sides of the curve separately. We calculated the area under the posterior distribution between the origin of the curve and the most likely value and equated it to 50% of the total likelihood, and did likewise for the right-hand side. The 2.5% and 97.5% bounds were then identified as the TMRCA values enclosing 95% of the likelihood on the left-hand and right-hand sides of the curve, respectively.
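The bound-setting procedure can be illustrated with a Python sketch that reflects our reading of the text and rests on stated assumptions. The posterior over t (in generations) combines a geometric prior with parameter λ=1/N=0.0002 and a Poisson approximation in which each mutation adds one step to the total distance, rather than Walsh's stepwise likelihood. Each side of the mode is rescaled to hold half of the mass, and the lower and upper bounds are placed where 95% of the left-hand and right-hand mass, respectively, is enclosed. The function names are hypothetical.

```python
import math

def tmrca_posterior(m, k=20, mu=3.09e-3, lam=2e-4, t_max=2000):
    """Posterior over t = 1..t_max generations for a pair of haplotypes
    differing by m total steps (geometric prior, Poisson approximation)."""
    ts = list(range(1, t_max + 1))
    logs = []
    for t in ts:
        rate = 2 * k * mu * t
        loglik = -rate + m * math.log(rate) - math.lgamma(m + 1)
        logprior = math.log(lam) + t * math.log1p(-lam)
        logs.append(loglik + logprior)
    top = max(logs)  # subtract the maximum to avoid underflow
    w = [math.exp(x - top) for x in logs]
    z = sum(w)
    return ts, [x / z for x in w]

def asymmetric_bounds(ts, post, side_coverage=0.95):
    """Lower/upper bounds enclosing side_coverage of the mass on each side
    of the mode, each side being treated as half of the distribution."""
    i_mode = max(range(len(post)), key=post.__getitem__)
    left_area = sum(post[: i_mode + 1])
    right_area = sum(post[i_mode:])
    acc, lower = 0.0, ts[0]
    for i in range(i_mode, -1, -1):  # walk left from the mode
        acc += post[i]
        if acc >= side_coverage * left_area:
            lower = ts[i]
            break
    acc, upper = 0.0, ts[-1]
    for i in range(i_mode, len(post)):  # walk right from the mode
        acc += post[i]
        if acc >= side_coverage * right_area:
            upper = ts[i]
            break
    return lower, ts[i_mode], upper

ts, post = tmrca_posterior(10)
lower, mode, upper = asymmetric_bounds(ts, post)
print(f"m=10 steps: mode {mode} gen, 95% bounds {lower}-{upper} gen")
```

Multiplying the returned generation values by an assumed generation interval (for example 27 years) converts them to years.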
Tracking genetic footprints of the GC legacy in Italy
To detect Greece-to-Italy genetic contributions from the Archaic Period (1000–400 BCE), we first performed pairwise comparisons of 20-locus YSTR haplotypes between either sources (from Euboea and Corinth) or reference populations (from Croatia, Albania and Turkey) and putative recipient populations (20 Italian groups pooled in 6 geographical districts, Supplementary Table S1). Then, for each population pair and within the same haplogroup, we calculated the proportion of haplotype pairs differing by 8–12 mutational steps. To assess whether the enrichment in haplotype pairs within this interval was significantly higher between sources and recipient populations than between reference populations and recipient populations, we performed Fisher's exact tests with the Arlequin software (v 3.5).28 We applied the same procedure both to the whole sample (not filtered) and to a subset of haplotypes (filtered), the latter obtained by removing all haplotypes from recipient populations with fewer than seven mutational differences from source population haplotypes, since more recent demographic events might inflate this signal.
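A minimal Python sketch of the counting and testing steps follows. It rests on assumptions not stated in the text: each haplotype is a (haplogroup, tuple of repeat counts) pair; the distance between two haplotypes is the sum of absolute repeat differences across loci; a GChp is a same-haplogroup source/recipient pair 8–12 steps apart, with all other source/recipient pairs counted as non-GChps; the filter drops any recipient haplotype lying fewer than seven steps from a same-haplogroup source haplotype; and the test is one-sided. The function names are hypothetical, and the published analyses were run in Arlequin, not with this code.

```python
from itertools import product
from math import comb

def steps(h1, h2):
    """Total stepwise distance between two Y-STR haplotypes (repeat counts)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def filter_recipients(source, recipient, min_steps=7):
    """Drop recipient haplotypes closer than min_steps to any same-haplogroup
    source haplotype (a proxy for recent gene flow)."""
    return [(hg, h) for hg, h in recipient
            if all(steps(h, hs) >= min_steps for hgs, hs in source if hgs == hg)]

def gchp_counts(source, recipient, lo=8, hi=12):
    """Return (GChps, other pairs) over all source x recipient haplotype pairs."""
    hits = sum(1 for (hg_s, h_s), (hg_r, h_r) in product(source, recipient)
               if hg_s == hg_r and lo <= steps(h_s, h_r) <= hi)
    return hits, len(source) * len(recipient) - hits

def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test, P(X >= a), for the table [[a, b], [c, d]]."""
    n, row1, col1 = a + b + c + d, a + b, a + c
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x)
               for x in range(a, min(row1, col1) + 1))
    return tail / comb(n, col1)
```

For example, with hypothetical sample lists euboea, albania and east_sicily, gchp_counts(euboea, east_sicily) and gchp_counts(albania, east_sicily) supply the two rows of the table passed to fisher_greater; applying filter_recipients to east_sicily first gives the filtered variant.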
To estimate the contribution of Greek colonisers to present-day southern Italian communities, we applied a hypothesis-testing approach based on the deviation of observed values from the haplotype divergence expected under different simulated scenarios. As a measure of molecular divergence we used DHS, a distance measure based on the extent of exactly matching haplotypes between pairs of diverging pools.40 The less divergent two pools of haplotypes are, the lower the value of DHS, which ranges from 0 (all haplotypes shared by the two populations) to 1 (no shared haplotypes).
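To make the 0-to-1 range concrete, the following Python sketch implements one possible haplotype-sharing distance. It is an assumption, not the published definition of DHS (ref. 40): here DHS is taken as 1 minus the fraction of haplotypes, across both pools, that have an exact match in the other pool. This stand-in reproduces the limiting values stated above; the function name is hypothetical.

```python
def dhs(pool_a, pool_b):
    """Hypothetical haplotype-sharing distance: 1 minus the fraction of all
    haplotypes (from both pools) that have an exact match in the other pool."""
    set_a, set_b = set(pool_a), set(pool_b)
    shared = sum(h in set_b for h in pool_a) + sum(h in set_a for h in pool_b)
    return 1.0 - shared / (len(pool_a) + len(pool_b))
```

Haplotypes are represented as tuples of repeat counts (or sequence strings for mtDNA), so that exact matching reduces to equality.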
As paternally inherited markers, we used a set of YSTRs whose number and type were chosen to maximise the period over which DHS increases linearly with time. Six was the largest number of loci that ensured a linear increase of DHS over the last 300 generations (Supplementary Figure S1). Accordingly, we chose six loci: DYS393, DYS445, DYS456, DYS460, DYS461 and GATA-A10. The selected panel included loci characterised by regular tetranucleotide repeats, high and comparable estimated mutation rates (between 2.5 and 3.3 × 10−3 mutations per generation according to Burgarella et al35 and Ballantyne et al36) and no incomplete alleles. As maternally inherited markers, we used 360 bp hypervariable region I sequences, whose divergence, as measured by DHS, increases linearly over 300 generations (Supplementary Figure S2).
Different sets of parameters (see Supplementary Table S6) were modelled with a stochastic Markov chain Monte Carlo method as implemented in the software ASHES (http://ashes.codeplex.com/). As starting haplotype pools, we used source and recipient meta-populations obtained by replicating the real data n times (Euboea and Corinth samples as source, a mix of southern Italian samples as recipient) until the final Ne reached one-sixth of the current census size estimated under each of the two demographic models. For each simulation model, we considered two populations coming into contact at time t0 and exchanging M=Nem haplotypes from the source to the recipient pool, where Ne is the effective size and m the fraction of migrants. From time t1, the two populations were allowed to evolve independently for 102 (Y haplotypes) or 110 (mt haplotypes) generations, that is, the time since the migration of the early Greek colonisers to Italy (~2750 ya, assuming 27 and 25 years per generation, respectively). For each model, 100 iterations were performed and summary statistics of DHS values were calculated. The varying parameters were the initial effective size of source and recipient populations, the increment rate (0.00 individuals per generation, stationary model; 0.01 individuals per generation, growth model) and the number of exchanged haplotypes M (0, 500, 1000 and 5000). Fixed parameters were the mutation rate (0.0027 mut/site/gen for Y haplotypes; 0.0000041 mut/site/gen for mt haplotypes), haplotype diversity (0.90 and 0.80 for source and recipient Y haplotypes, respectively; 0.96 and 0.92 for source and recipient mt haplotypes, respectively) and DHS between source and recipient pools at t0 (0.7 for Y haplotypes; 0.5 for mt haplotypes). The latter values are those expected between pools of haplotypes coming into contact 2750 ya after an initial divergence from an ancestral Anatolian pool some 6250 (150 gen) or 7500 (200 gen) ya, under a model assuming the germ-line mutation rates given above and no size increase (data not shown).
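The forward-in-time scheme described above can be illustrated with a toy Python sketch. It is not the ASHES implementation and makes several assumptions: a constant population size (the growth model is omitted); Wright-Fisher resampling each generation; a strict stepwise mutation model applied independently to each locus; migration implemented at t0 as the replacement of M randomly chosen recipient haplotypes with haplotypes drawn from the source pool; and a simplified stand-in for DHS (1 minus the fraction of haplotypes with an exact match in the other pool), which may differ from the published measure. All names are hypothetical, and pool sizes should be kept small for speed.

```python
import random

def dhs(pool_a, pool_b):
    """Stand-in DHS: 1 minus the fraction of haplotypes matched in the other pool."""
    set_a, set_b = set(pool_a), set(pool_b)
    shared = sum(h in set_b for h in pool_a) + sum(h in set_a for h in pool_b)
    return 1.0 - shared / (len(pool_a) + len(pool_b))

def mutate(h, mu, rng):
    """Strict stepwise model: each locus gains or loses one repeat with prob. mu."""
    return tuple(a + rng.choice((-1, 1)) if rng.random() < mu else a for a in h)

def next_generation(pool, mu, rng):
    """Wright-Fisher resampling at constant size, followed by mutation."""
    return [mutate(rng.choice(pool), mu, rng) for _ in range(len(pool))]

def simulate(source, recipient, migrants, generations, mu=0.0027, seed=1):
    """Return DHS between source and recipient pools after one migration pulse."""
    rng = random.Random(seed)
    src, rec = list(source), list(recipient)
    # t0: replace `migrants` recipient haplotypes with haplotypes from the source
    for i in rng.sample(range(len(rec)), migrants):
        rec[i] = rng.choice(src)
    # Afterwards the two pools evolve independently
    for _ in range(generations):
        src = next_generation(src, mu, rng)
        rec = next_generation(rec, mu, rng)
    return dhs(src, rec)
```

Running simulate with different seeds (for example 100 replicates per value of M and generations=102) yields a distribution of DHS values for each model.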
The distributions of simulated DHS values were compared with empirical values calculated for each source/recipient pair of samples. Euboea and Corinth were used as source samples, and Italians (pooled into six main geographic areas: West Sicily, East Sicily, Ionian Italy, South Italy, Central Italy and North Italy) were used as recipient samples. The data were considered to fit the model when observed DHS values fell within 2 standard deviations (s.d.) of the mean of the simulated distribution.
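The fit criterion can be written as a short Python check: a model is retained when the observed DHS lies within two standard deviations of the mean of the simulated DHS values (for example, the 100 iterations per model). The function name is hypothetical, and the use of the sample standard deviation is an assumption, since the text does not specify the estimator.

```python
from statistics import mean, stdev

def fits_model(observed, simulated, n_sd=2.0):
    """True if the observed DHS lies within n_sd sample standard deviations
    of the mean of the simulated DHS values."""
    mu, sd = mean(simulated), stdev(simulated)
    return abs(observed - mu) <= n_sd * sd
```

Applying this check to each combination of demographic model and migrant number M gives the set of M values compatible with a given source/recipient pair.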
Y haplogroup frequencies are reported in Supplementary Table S7, while the overall pattern of inter-population genetic relationships is shown in Figure 2 and Supplementary Table S10. Cretan, mainland Greek and Lebanese samples were included to widen the spectrum of the historical players active in the south-eastern Mediterranean at the time of the GC, as proxies for, respectively, non-Corinthian Dorian colonisers of South Sicily (since Crete contributed to the foundation of Gela and, in turn, Akragas), non-Euboean Ionian colonisers of South Italy and East Sicily, and Phoenician settlers in West Sicily. The reciprocal positions on the plot show little evidence of these historical events, reflecting geography rather than history. Accordingly, we observed greater genetic distances than expected from archaeological evidence between putative descendants of the source (Greeks from Ionia, Corinthians and Cretans) and recipient (Sicilians) groups of the GC, as well as between the putative founders (Lebanese) of the Phoenician colonies in western Sicily (Motya, Panormos and Solus) and the present-day population.
Signatures of the Archaic Hellenic contribution
To detect genetic signatures of Greek migration to southern Italy and Sicily compatible with the Archaic scenario, we compared the fractions of haplotype pairs within the 8–12 mutational range, or 'GChps', with the same fractions obtained using Albanian, Croatian and Turkish samples as reference sources (Table 1a).
Samples from East Sicily, West Sicily, South and Central Italy showed significant (P<0.01) enrichment of GChps when the Greek sample from Euboea island was compared with Corinthia and the reference sources. Except in West Sicily and Central Italy, this enrichment remained significant even after correcting for multiple tests (P<0.05).41 Conversely, when Corinthia was compared with the reference sources, none of the recipient samples showed a full set of significant values. Comparisons against Albania mostly lacked significant enrichment in GChps for either Euboea or Corinthia. We reasoned that contacts after the GC, either between sources and recipients or among the sources themselves (that is, during the Classical and Christian periods), might have increased the GChp rate. Thus, to provide more stringent conditions for haplotype identification, we excluded all haplotypes at a molecular distance of fewer than seven mutational steps. This 'filtered' data set confirmed the pattern observed with the less stringent criteria for the East Sicily/Euboea pair, which showed significant enrichment in GChps in two out of three comparisons even after Bonferroni correction (Table 1b). The results for West Sicily and South Italy lost statistical significance after Bonferroni correction. None of the other Italian recipients showed a full set of significant enrichments with Euboea or Corinthia.
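The multiple-testing correction referred to above can be sketched as a standard Bonferroni adjustment: with n comparisons, each P value is multiplied by n (capped at 1) and compared with α, which is equivalent to testing each raw P value against α/n. The sketch assumes α=0.05 and that the correction is applied over the comparisons made for a single recipient population; the helper name is hypothetical.

```python
def bonferroni(pvalues, alpha=0.05):
    """Bonferroni-adjusted P values and significance flags at level alpha."""
    n = len(pvalues)
    adjusted = [min(1.0, p * n) for p in pvalues]
    return adjusted, [p_adj < alpha for p_adj in adjusted]
```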
Estimating Greek contribution
The number of GChps identified with the chosen molecular distance cannot be used either to directly estimate the current Greek legacy in Italian populations or to indicate the original demographic contribution. Nevertheless, this approach helped us to identify populations (East Sicily and, to a lesser degree, West Sicily and South Italy) that show a significant association with Greek populations dating to the time window of interest.
To quantify the original demographic impact of the Greek settlers inferable from present-day Y chromosome variability, we explored two main census scenarios using a simulation-based approach. In the first, high-count scenario, we based our model on the demographic estimates of Beloch,18, 19 who suggested a census size of 1.35 million people for Sicily and of 3 million for Greece at the time of the Hellenic colonisation in the Archaic Period. In the second, low-count scenario, we modelled population sizes an order of magnitude smaller.21 Nevertheless, the two scenarios have similar source/recipient effective size ratios (S/R). Assuming that past Sicilian, Euboean and Peloponnesian male (and female) effective population sizes were one-sixth of the corresponding census sizes, we estimate an S/R of 3.65 for the high-count model and of 3.75 for the low-count model. Simulation results are reported in Figure 3 and Supplementary Table S8. For Y-STR haplotypes, the observed DHS value between Euboea and East Sicily (0.5353) is compatible with an effective number of migrants between 500 and 5000, and clearly rejects larger contributions (10 000), irrespective of the scenario considered. The DHS value obtained for hypervariable region I haplotypes (0.5995) supports an effective number of migrants between 500 and 1000, and clearly excludes larger contributions. When the NRY- and mtDNA-based estimates are paired according to the demographic model, the male-to-female migrant ratio ranges between 1:1 and 2:1 under a population growth model and between 2:1 and 10:1 under a constant population size model.
Lineage-based demographic estimates
Previous investigations14, 15 have suggested that the Y chromosome lineage E-V13 is a marker of the Hellenic contribution in the Mediterranean. To test the validity of this approach, we repeated the enrichment test described above considering only haplotypes belonging to the E-V13 lineage. Given the relatively low frequency of this haplogroup outside the Balkan peninsula, only the East Sicily sample provided enough E-V13 haplotypes (N=20) for meaningful comparisons. All 20 E-V13 haplotypes in the sample from East Sicily had matches in the 8–12 mutational range when compared with GC and reference sources. Accordingly, no enrichment in GChps was found except versus Turkey (Fisher's exact test, P<0.01). When haplotype pairs with mismatches of 0–7 steps were removed (filtered, F, data set), both Croatian and Turkish samples showed a higher relative number of GChps (9 and 7, respectively) than the other source samples (2 in the Albanian and Corinthian samples, 1 in the Euboean sample).
We further explored the impact of specific lineages by calculating the fraction of GChps belonging to each single nucleotide polymorphism-defined lineage contributing to the overall enrichment (Supplementary Table S9). E-V13 is the major contributor for all sources except Croatia. The contribution of E-V13 in Euboea and Corinthia was lower than in the three reference sources (range 12.6–17.1 versus 17.5–22.8%). We evaluated the contribution of E-V13 GChps in the F data set in the same way (Supplementary Table S9). In this data set, E-V13 haplotypes no longer formed the largest fraction of GChps and were under-represented in Corinthia (3.4%) and Euboea (1.1%). Figure 4 shows that, for these two candidate GC sources, the largest share of E-V13 haplotype pairs fell not in the 8–12 but in the 1–4 mutational step interval. Moreover, only haplotype pairs involving the Albanian sample reached their highest peak within the 8–12 range.
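The per-lineage breakdown of GChps and the step-distance profile of a single lineage, of the kind plotted in Figure 4, can be sketched in Python as follows. The input format (haplogroup label plus a tuple of repeat counts), the use of summed absolute repeat differences as the distance, and the function names are assumptions for illustration.

```python
from collections import Counter
from itertools import product

def steps(h1, h2):
    """Total stepwise distance between two Y-STR haplotypes (repeat counts)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def gchp_lineage_fractions(source, recipient, lo=8, hi=12):
    """Fraction of all GChps contributed by each haplogroup."""
    counts = Counter(hg_s for (hg_s, h_s), (hg_r, h_r) in product(source, recipient)
                     if hg_s == hg_r and lo <= steps(h_s, h_r) <= hi)
    total = sum(counts.values())
    return {hg: n / total for hg, n in counts.items()} if total else {}

def step_histogram(source, recipient, lineage):
    """Counts of source/recipient pair distances within one lineage."""
    return Counter(steps(h_s, h_r)
                   for (hg_s, h_s), (hg_r, h_r) in product(source, recipient)
                   if hg_s == hg_r == lineage)
```

Comparing step_histogram(..., "E-V13") across candidate sources shows whether the modal distance falls inside or outside the 8–12 window.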
Evidence of Hellenic genetic echoes in Italy
The history of the European continent has been characterised by a large number of migration and admixture events.42 The peopling of the Mediterranean is a clear example of this complexity. Hence, descriptive approaches that merely summarise the observed genetic variation can easily miss signatures of a given event.
Aware of these limitations, we attempted to recover genetic signals of the Greek colonisation in southern Italy by analysing samples specifically collected to represent the source and recipient populations actually involved. None of the Italian populations showed a closer affinity with Greek and Greek-related sources when Y chromosome data were analysed by multidimensional scaling (Figure 2). There are several possible explanations for this: limited historical migration between the two countries; lack of continuity between original and present-day source populations, so that current samples are not a good proxy for the ancestral populations; or confounding of the ancestry signal by more recent or more ancient events. Nevertheless, when we used an approach designed to take the mutational process into account, we recovered a signature of the Greek contribution to Sicily during the Archaic Period. A first-level analysis based on the Bayesian posterior distribution of mutational steps compatible with the earlier colonisation phase (GChps) showed the clearest signal in East Sicily, with parallels in, or diffusion into, the neighbouring regions of West Sicily and South Italy. A second-level analysis, performed under more stringent conditions, again detected a signal in East Sicily. The lack of similar signals in other areas known to have been colonised by Greek migrants (eg, Ionian Italy) may be explained by inadequate source samples (ie, the Achaia region is not represented in our data set), by a lower demographic impact or by subsequent population discontinuity. We note that many of the pairwise comparisons were not significant because of high background signals from the Albanian sample, particularly when the Corinthian sample was involved.
The simplest explanation for this is the close genetic affinity between the two areas (see Figure 2), deriving from direct and/or indirect gene flow. For example, Greek colonies in present-day Albanian territory, such as Apollonia and Epidamnos, were founded by Corinthians,4 and the area around Corinth was settled by southern Albanian Orthodox Christians between the thirteenth and sixteenth centuries, whose descendants are known as Arvanites.43 During sample collection, attention was given to this issue: individuals who self-reported as Arvanite were excluded from the analysis. Nevertheless, some Arvanite ancestry may have gone unreported and affected the results.
Despite the many alternative explanations for historical gene flow, it is worth stressing that a signature specifically linking Euboea island to East Sicily was consistently found at different levels of analysis, in line with the historical and archaeological evidence,2, 3, 4 which attests to an extensive and numerically important Greek presence in this region.
Sex-biased gene flow
The numerical dimensions of the migration from Greece that resulted in the establishment of Hellenic colonies have been debated by scholars for centuries (see the study by Scheidel8 among others).
The signal from East Sicily points towards the lower end of the size range proposed by historical demographers, with values on the order of a few thousand breeding men and a few hundred breeding women. Our results are therefore compatible with the hypothesis that the migration and settlement process was driven by males. Interestingly, this is one of the few cases of sex-biased gene flow skewed towards a greater male rather than female contribution.44
These numbers refer only to the colonists who arrived in East Sicily, as inferred from their descendants still living today. They should not be taken as absolute values but as indicative of the scale of the contribution. The results could also be affected by sex-differential migration rates after the first settlement. Patrilocality is common in continental Italy and Sicily, and may have facilitated the diffusion of mtDNA variation on a larger scale than Y chromosome variation, thus lowering the probability of finding local female genetic signatures of the GC (see the study by Heyer;44 but see the study by Marks45).
The genetic contribution of a given source within a defined historical scenario has often been estimated using the number of chromosomes assigned to haplogroups assumed to have a specific geographic or ethnic origin. In relation to the GC of the Mediterranean, the lineage defined by the E-V13 marker has been used to estimate the Hellenic contribution to the Sicilian gene pool.14 By assuming that all E-V13 chromosomes have a Hellenic origin, the authors estimated a contribution of ~37% to the population of Sicily. The reconstruction of an STR-based network linking the Sicilian modal haplotype and its one-step neighbouring haplotypes yielded a TMRCA of about 2380 years before present, with a 95% confidence interval of 675–6940 years ago. More recent contributions and differential origins are expected to affect such estimates, but no attempt was made to mitigate their impact.
In this study we showed that, when alternative sources were taken into consideration, E-V13 did not behave as a specific marker of the Hellenic contribution. The signal that we found using the full set of haplotypes within a limited range of mutational distance disappeared when only unfiltered E-V13 haplotypes were considered, and became significant for non-Greek sources when a filter for recent gene flow was applied. In fact, the E-V13 contribution from Corinthia and Euboea peaks well below the 8–12 mutational step range (Figure 4), and within this range its relative frequency in these samples is low.
These findings suggest a poor association between the Y haplogroup E-V13 and the east-to-west GC migratory waves. More recent gene flow or sampling bias may have masked the original E-V13 signal from Greece. In any case, our results caution against the use of lineage-based approaches to test hypothesised population contributions and underline the need for a more targeted approach to explaining the occurrence of given haplotypes within a population, one that tests alternative hypotheses, uses a wide spectrum of reference samples and relies on mutation-limited inference methods.
This project was funded by the British Academy (BARDA-47870; CC). The authors would like to acknowledge all the participants who offered their biological samples for analysis. We thank Giacomo De Leo and Anna Flugy (Università di Palermo) and the staff of the Servizio di Medicina Trasfusionale dell'Ospedale Muscatello di Augusta, AVIS Comunale di Santa Croce Camerina, UOC Medicina Trasfusionale, Ospedale di Lentini, Unità Operativa Complessa di Medicina Trasfusionale, Azienda Ospedaliera Umberto I, Siracusa for assistance in sample collection. ST is grateful to Domenico Accorinti and Domitilla Campanile for their suggestions. The Croatian collection was funded by grants from the Medical Research Council (UK), European Commission Framework 6 project EUROSPAN (Contract No LSHG-CT-2006-018947) and Republic of Croatia Ministry of Science, Education and Sports research grants to IR (108-1080315-0302).
Supplementary Information accompanies this paper on the European Journal of Human Genetics website (http://www.nature.com/ejhg).