The Y chromosome of autochthonous Basque populations and the Bronze Age replacement

Here we report on the Y haplogroup and Y-STR diversity of the three autochthonous Basque populations of Alava (n = 54), Guipuzcoa (n = 30) and Vizcaya (n = 61). The same samples genotyped for Y-chromosome SNPs were typed for 17 Y-STR loci (DYS19, DYS385a/b, DYS398I/II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635, Y-GATA H4) using the AmpFlSTR Yfiler system. Six major haplogroups (R, I, E, J, G, and DE) were detected, being R-S116 (P312) haplogroup the most abundant at 75.0% in Alava, 86.7% in Guipuzcoa and 87.3% in Vizcaya. Age estimates for the R-S116 mutation in the Basque Country are 3975 ± 303, 3680 ± 345 and 4553 ± 285 years for Alava, Guipuzcoa and Vizcaya, respectively. Pairwise Rst genetic distances demonstrated close Y-chromosome affinities among the three autochthonous Basque populations and between them and the male population of Ireland and Gascony. In a MDS plot, the population of Ireland segregates within the Basque cluster and closest to the population of Guipuzcoa, which plots closer to Ireland than to any of the other Basque populations. Overall, the results support the notion that during the Bronze Age a dispersal of individuals carrying the R-S116 mutation reached the Basque Country replacing the Paleolithic/Neolithic Y chromosome of the region.

www.nature.com/scientificreports/ substructure observed in the Basque today relative to geography result from pre-Roman tribal structure dating back to the Bronze Age.
Recently, high-coverage genome-wide SNP data utilizing ancient DNA from Iberian samples from the past 8000 years confirms the genetic isolation of the Basques since the Bronze Age, ~ 2200-900 BCE 37 . The study describes the Basques as an Iron Age population genetically impacted by migrations from the Pontic-Caspian STEPPES. These population movements into Iberia likely correlated with the introduction of the Urnfield tradition and Indo-European languages to the region 38 . Yet, in Iberia only the Basque groups failed to adopt an Indo-European language. According to the study, these STEPPE migrants initially co-existed with the local Neolithic farming communities and eventually mixed to generate the Bronze Age Iberian population about 4000 years ago. On the average, the current Iberian populations possess about 40% Pontic-Caspian DNA 39 . In some Iberian communities the native Y-chromosomes of Neolithic farming communities were almost totally replaced by the Indo-European R1b lineages 39 . For unknown reasons, the replacement of Copper Age Western European Y chromosomes made up mainly of haplogroups I2, G2, and H by R1b Bronze Age chromosomes was more dramatic in Iberia. This sex-specific replacement suggests a higher contribution of incoming males than females, which is also supported by a lower X-chromosome input from the STEPPES. Today this Y chromosome turnover is particularly pronounced in the Basques, which exhibit 87% R1b 19 . In the rest of Iberia the abundance ranges from 43% in Malaga to 81% in Catalonia 40 . The population-dynamic mechanisms that generated such a sex-specific replacement are unknown.
According to Myers et al. 41 the M269 mutation that defines Y haplogroup R1b originated in the Near East and travelled with Neolithic farmers to Northern Anatolia and then to the Pontic-Caspian steppe where a number of its subclades became closely associated with the spread of Indo-European languages into Western Europe. A subsequent mutation, S116 (also known as P312), likely occurred in what is today France approximately 5500-5000 years ago 41 . The R1b Y-chromosome haplogroup in Iberia is mainly represented by the R-S116 haplogroup, which reaches 80% in the Basque Country (average of all Basque Provinces) 42 . A derivative lineage of S116, DF27, which likely originated in Iberia reaches a maximum value of 63% in the Basque Country and minimum value of 40% in Galicia 12 .
Considering the elevated level of Indo-European Y chromosomes observed in the general Basque population, while retaining mtDNA and autosomal Neolithic sequences, we decided to investigate the Y-chromosome constitution of the three regional autochthonous Basque populations of Alava, Guipuzcoa and Vizcaya. In doing so, we were testing the hypothesis that Paleolithic/Neolithic Y chromosomes were replaced by S116 Y chromosomes in the three populations during the Bronze Age.

Materials and methods
Sample collection and DNA isolation. Whole blood samples were collected from 145 autochthonous Basque individuals from the provinces of Guipuzcoa (n = 30), Vizcaya (n = 61) and Alava (n = 54). Peripheral blood was collected in EDTA vacutainer tubes by venipuncture from unrelated healthy individuals. All samples were procured from donors voluntarily while closely adhering to the ethical guidelines stipulated by Colorado College, Colorado Springs, Colorado USA and the University of the Basque Country, UPV/EHU, Vitoria-Gasteiz, Spain. All donors gave their informed consent prior to inclusion in the study, following the ethical principles and guidelines of the Declaration of Helsinki for the protection of human subjects. DNA was extracted from leukocytes by the phenol-chloroform method described by Batzer and Deiniger 43 . Individuals were considered autochthonous Basque if the four grandparents were born in the Arratia (Vizcaya province) and Goiherri (Guipuzcoa province) valleys of the Basque Country as indicated in Fig. 1. In the case of the Alava sample, the criterion was that the four grandparents were born in the Alava province.
Y-STR haplotyping. The same samples genotyped for Y-SNPs were typed for 17 Y-STR loci (DYS19, DYS385a/b, DYS398I/II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635, Y-GATA H4) using the AmpFlSTR Yfiler PCR amplification kit as recommended (Applied Biosystems of Life Technologies/Fisher Scientific). Resulting amplicons were separated on an ABI Prism 3130 XL Genetic Analyzer using the ABI GeneScan 500 LIZ as an internal size standard and fragment lengths were estimated by GeneMapper v3.2 (Thermo Scientific). Y-STR alleles were assigned by comparison with an allelic ladder provided by the manufacturer. The number of repeats at DYS389II was calculated after subtracting the  50 program. Haplotype and haplogroup diversities were computed using the software package Arlequin V3.5 51 . All phylogenetic analyses were performed using the loci in common among the collections listed in Supplementary Table 1. DYS385 was excluded from the haplotype diversity calculations because is not possible to discriminate between the DYS385a and DYS385b loci with the Y STR kit. The number of repeats at DYS389II was calculated after subtracting the number of repeats at DYS389I. Discrimination capacity was calculated by dividing the number of different haplotypes by the total number of individuals in the population. The fraction of unique haplotypes was determined as the percent proportion of unique haplotypes. www.nature.com/scientificreports/ Population pairwise genetic distances (Rst values) and the corresponding P-values were calculated for all given pairs of populations using the Arlequin v3.5 software 51 . The pairwise population comparisons were tested at a significance level of 0.01 with 10,000 permutations 52 . In order to compensate for potential inclusion of false positives, type I statistical errors, the Bonferroni correction was applied (α/m = 0.01/528 = 1.89394 × 10 -5 ). The DYS385 locus was not included in population comparison for the reasons previously stated. All samples carrying non-consensus alleles and null alleles were excluded from the Rst calculations. Subsequently, multidimensional scaling (MDS) plots were generated in order to examine the phylogenetic relationships among populations in Supplementary Table 1. The MDS plot was constructed with SPSS v14.0 53 using the Rst pairwise values. A Neighbor Joining (NJ) tree, based on Fst distances 54 , was constructed with the software PHYLIP 3.52c 55 in order to deduce phylogenetic relationships between the populations under analysis. Bootstrap analysis involved 1000 replications.
Y-STR haplotypes of individuals belonging to the R-S116 haplogroup were used to generate a Median-Joining networks (NETWORK 4.5.1.6 at http://www.fluxu sengi neeri ng), in which the Y-STR markers were weighted inversely to their repeat variance and the Maximum Parsimony (MP) option was employed to produce the least complex topology.
Y-STR haplotypes were used to estimate the time to the most recent common ancestor (TMRCA) of the R-S116 sub-haplogroup. With this aim, rho statistic (ρ) 56 and weighted rho (ρ W ) 12 were estimated with an R script available in GitHub (http://githu b.com/fcala fell/weigh ted_rho). The number of repeats at DYS389II was calculated after subtracting the number of repeats at DYS389I. Mutation rates were obtained from the Y-Chromosome STR Haplotype Database (YHRD, www.yhrd.org) on March, 2020. The statistical significance of the time estimate differences were assessed using the Past 4.02 software (http://palae o-elect ronic a.org/2001_1/past/issue 1_01.htm).
Ethical standards. The IRB of Colorado College approved this study. All experimental protocols were approved by the IRB of Colorado College.  Table 8. Both Guipuzcoa and Vizcaya exhibit identical modal haplotypes while Alava differs only at DYS456 by one mutational step. For comparison the partial Irish 57 modal haplotype is included in Supplementary Table 8. An Atlantic modal haplotype has been reported by Wilson and colleagues based on the Basque, Irish and Welsh populations 58 . Since not all Y-STR loci genotyped for the Basque populations in the present study were previously genotyped in the Irish population, not all the corresponding markers are available in the Irish modal haplotype. Of the ten comparable loci between the three Basque populations and the Irish, two differ by one mutational step.

Results
Forensic and population genetic parameters. Table 1 provides the forensic and population genetics parameters of Alava, Guipuzcoa and Vizcaya populations using the Minimal 9-loci, Extended 11-loci and Y-filer 17-loci haplotypes. As expected the values for number of haplotypes, unique haplotypes, fraction of unique haplotypes, discrimination capacity and haplotype diversity values augmented as the number of loci increased. Using the 17 loci included in the Y-filer system, the Alava, Guipuzcoa and Vizcaya populations exhibit high levels of genetic diversity, with haplotype diversity values of 1.0000, 0.9978 and 0.9978 as well as fraction of unique haplotypes of 1.00, 0.89 and 0.92, respectively. Discrimination capacity values of 0.94, 1.00 and 0.95 were estimated for Alava, Guipuzcoa, and Vizcaya, respectively. In general the increment to the 17-loci level of the Yfiler system as compared to the 9-loci of the Minimal and 11-loci Extended systems improves the resolution of discrimination in all three populations, especially for the Guipuzcoa group, which possesses fewer individuals. Non-consensus alleles were confirmed by repeating the amplification process.
Phylogenetic analyses. Y-SNP haplogroups. Y-SNP haplogroups and their frequencies are shown in Supplementary Table 9. Six major haplogroups (R, I, E, J, G, and DE) were detected in 145 individuals from three Spanish Basque provinces of Alava (n = 54), Guipuzcoa (n = 30) and Vizcaya (n = 61), haplogroup R-S116 being the most abundant, ranging from 87.3% in Vizcaya and 86.7% in Guipuzcoa to 75.0% in Alava. None of the frequency differences by pairs of populations were statistically significant (Vizcaya-Alava: ts [test statistic] = 1.704, p = 0.088; Vizcaya-Guipuzcoa: ts = 0.080, p = 0.936; Guipuzcoa-Alava: ts = 1.320, p = 0.187). In the rest of the Iberian Peninsula the abundance of R-S116 ranges from 43% in Malaga to 81% in Catalonia while in other Western European populations, where its frequency is high, its prevalence is 80.69%, 74.66%, 50.91% and 16.67% in Brittany (France), Ireland, Portugal and Denmark, respectively 40 . The only other haplogroup that exhibits frequencies in double digits in two of the three Basque provinces examined is E-V65 (Alava, 17.3% and Vizcaya, 10.9%). E-V65 is of Northern African origin and could have dispersed into Iberia across the Mediterranean at various times since its origin approximately 4300 years ago. All the other haplogroups range from 0 to 6.7% depending on the Basque population. J-L24 and G-M287 are markers associated with the agricultural/Neolithic revolution while I-M253 is linked to Norse dispersals of the late eighth to late eleventh centuries of the current era. When the full set of haplogroups are considered together among the Basque populations, the observed differences are not statistical significant (Exact P value = 0.07814 + − 0.00970).  Table 1), the lower distances in the order listed. Subsequent to the application of the Bonferroni adjustment for potential type I errors (α/m = 0.01/528 = 1.89394 × 10 -5 ), additional pairwise Rst genetic distances were found to be statistically insignificant. The Phylogenetic relationships among the three autochthonous Basque populations under study and the 30 geographically targeted key reference populations were evaluated using MDS analysis based on the Rst distances (Fig. 2). In the MDS graph, the populations from Alava, Guipuzcoa and Vizcaya segregate together in a cluster with the reference groups from the Basque Country (general population), Gascony, Ireland and the USA Basques to the right-center of the plot. Further to the left along the X-axis, the rest of the Western European populations form a cluster. The Eastern European reference populations follow to the left along the X-axis. The West Asian, Arabian and North African groups partition widely along the Y-axis at the far left of the graph. The groups of Africa and South Asian ancestry living in the United Kingdom are disperse within this last loose cluster. Within the European populations, the partitioning in the plot mirrors their west to east geographical distribution. The NJ projection based on different algorithms (Fig. 3) illustrates the same general topology for the European populations as the MDS graph. The cluster made up of Alava, Guipuzcoa, Vizcaya and the reference groups from the Basque Country, Gascony, Ireland and the USA Basque in the MDS plot (Fig. 2) also partition in proximity in the NJ tree with bootstrap values ranging from 100 to 40.6%. Of the populations in this Basque-Irish assembly, Alava segregates closer to non-Basque Western European populations. Discrete groupings of the Balkan region (Balkan, Macedonia, Romania and Greece), North African and Arabian populations are evident in the NJ tree. Yet, it is not clear why the United Kingdom populations of South Asian (UKS) and African (UKA) descend partition together (bootstrap = 42%), since in the MDS plot these two populations do not segregate in close proximity.
Population structure and age estimates of haplogroup R-S116. Considering the haplogroup homogeneity of the autochthonous Basque populations examined in this study, network analysis was performed to assess the genetic structure within haplogroup R-S116 (Fig. 4). The network was generated from the Y-STR haplotype data (DYS19, DYS389I/II    www.nature.com/scientificreports/ The oldest age for haplogroup R-S116 (Supplementary Table 11) is found in the Vizcaya population at 4553 ± 285 years (weighted age, 30 years/generation). Alava and Guipuzcoa exhibit younger time estimates at 3975 ± 303 and 3680 ± 345 years (weighted age, 30 years/generation), respectively. Ages calculated based on 25 years per generation generate more recent ages (Supplementary Table 11). All age estimate comparisons among Alava, Guipuzcoa and Vizcaya were found to be statistically insignificant.

Discussion
Examination of the Y haplogroups present in the autochthonous Basque populations of Alava, Guipuzcoa, Vizcaya indicate qualitative and quantitative differences among the three. Yet, these differences were not statistically significant. The R-S116 haplogroup predominates in all three populations, although it is less abundant in the population of Alava (Supplementary Table 9). The other haplogroups present such as I-M253, J-L24 and G-P287 are minor contributors and differ in frequencies in the three autochthonous Basque groups. Noteworthy, the Neolithic markers J-L24 and G-P287 were detected only at frequencies of 3.8% and 3.3% in Alava and Guipuzcoa, respectively. No Neolithic haplogroups were found in Vizcaya. No Paleolithic Y chromosomes were observed in the three Basque populations 59 .
The observed abundance of R-S116 in the autochthonous Basque populations is of particular interest. At 87.3% and 86.7% frequencies in Vizcaya and Guipuzcoa, respectively, these two Basques groups represent the highest recorded worldwide 40 . The R-S116 mutation and its ancestor polymorphism R-M269 have been associated with the spread of Indo-European languages in Western Europe during the Bronze Age 60,61 . The R-S116 mutation likely originated in what is today France about 5500-5000 years ago from a R-M269 background and subsequently spread to Iberia and Northwestern Europe 41 . Resulting from these Indo-European migrations that originated in the Pontic-Caspian steppe, it is theorized that these western European populations experienced a patrilineal genetic replacement from a Paleolithic/Neolithic composition to a Bronze Age Y-chromosome composition. However, this Bronze Age replacement disproportionally affected the Y-chromosome since a lot of the mtDNA in Iberia is still of Paleolithic origin (haplogroups H1, H3, U5 or V) 62 and autosomal DNA is not uniform in Western Europe 62 . It is not clear which evolutionary force(s) were at play to account for this sex-bias replacement. It has been proposed that the invading Bronze Age populations carrying the R-S116 mutation possessed superior technology including bronze weapons, horses and wheeled vehicles, thus could have easily subjugated and/or annihilated most of the native Neolithic farmers and any remaining hunter-gatherers. Culturally, these invaders were warlike and priced heroism and conquest 63 . Another potential factor that may explain the observed Y-chromosome bias is that invading armies are mainly made-up of men and casualties are usually inflicted on the male population leaving the woman for the winners to reproduce.
The age estimates of R-S116 are congruent with its origin somewhere in France approximately 5500-5000 years ago 41 considering a migration to the Basque Country of a population carrying it. The TMRCA for the Spaninsh Basque population as a whole for haplogroups I2 and J2a have been reported at 7800 ya and 5500 ya, both older than R-S116 12 . These older ages for I2 and J2a in the Basque are expected considering that these haplogroups arrived in Europe during the Neolithic. Yet, similar expansion times (about 50 generations ago or 1200 ya) and www.nature.com/scientificreports/ estimated initial and final populations for haplogroups R-S116, G2a, I2, and J2a in the Spanish Basque as a whole have been reported 12 . It is not known if the same demographic forces affected the whole of the Y chromosome composution in other parts of Iberia. As a result of the shortcomings related to the estimation of Y-STR mutation rates, the ages provided in this article should only be consider as relative assessments. As indicated by Ballantyne et al. 64 and Busby et al. 65 , mutation rates vary considerable among Y-STR loci and alleles, therefore, correspondence involving studies using different STR loci are difficult. Furthermore, age estimations computed using Y-STR diversity are affected by various factors that may overestimate the values, including several incursions from different ancestral populations. Therefore, age estimates should be considered as upper bounds. Notwithstanding these considerations, the ages reported in the present study are suitable for comparisons involving the populations analyzed in this study.
A review of the Rst values (Supplementary Table 10) demonstrates that the populations of Alava, Guipuzcoa, Vizcaya are genetically closer to each other than to any of the reference populations, being the Rst values statistically insignificant. Notable, the genetic distances separating Ireland from the Alava, Guipuzcoa, Vizcaya and United State Basques populations became statistically insignificant when the Bonferroni correction was applied. Also, the genetic distances of Gascony to the Alava, Guipuzcoa, Vizcaya, general Basques and United State Basques were statistical insignificant even prior to the Bonferroni correction. Considering the distant geographic location of Ireland in Northern Europe to the Basque Provinces, these genetic affinities suggest some degree of common ancestry among these populations, likely related to the homogenizing effect of the Bronze Age dispersals also reflected in the high levels of the R-S116 mutation throughout the region.
The MDS illustrates a longitudinal partitioning of populations with the Basque groups at one end and more geographically easterly groups progressively to the left of the plot. The population of Ireland stands out from this geographically specific partitioning. Ireland segregates within the Basque cluster. In fact Ireland′s closest group in the MDS is the population of Guipuzcoa, the most geographically easterly of the Basque groups. Furthermore, in the MDS Guipuzcoa is closer to Ireland than to any of the other Basque populations. For comparison, the Caucasian population of the United Kingdom segregates with the non-Basque Western European cluster. Confirmation of these genetic affinities between the Irish and the Basque populations is seen in the topology of the NJ tree (Fig. 3) generated using different algorithms relative to the Rst and MDS analyses. In this discussion of the similarities between the Basques and Irish at the level of haplogroup R-S116, it is important to indicate that the formers exhibit high frequencies of R-DF27 and the later of R-M529 12,42 suggesting that these mutations may have occurred in situ in the respective regions on R-S116 individuals.
The resulting R-S116 network's topology based on the three Spanish Basque populations is star-shaped and symmetrical (Fig. 4). Most of the nodes are singletons or doubletons and only ten are shared among the three Basque populations. Three of the haplotypes (nodes) shared among the populations are located near the center of the network and emanating from them are singleton haplotypes of individuals residing in the three provinces. Also, most of the nodes are separated by only one or two mutation events. No inter-population structure is seen in the network as individuals from the three groups are randomly distributed throughout the network. This suggests a recent population expansion with the three autochthonous populations separating shortly after the arrival of the R-S116 migrants to the Basque region. Also, the random distribution of individuals belonging to the three Basque populations within the network indicates that people with similar haplotypes live in the three provinces. This is likely the result of substantial movement of individuals within the Basque Country. The random distribution of R-S116 haplotypes within the network argues for Y-chromosome homogeneity within the Basque Country.
The relationship of the Y-STR haplotypes of the three Basque populations under R-S116 to other European populations was examined by MDS and Network analyses. The partitioning pattern observed in the MDS plot based on Y-STR haplotype diversity under sub-haplogroup R-S116 ( Supplementary Fig. 1) contrast dramatically with the partition of European population observed in the MDS projection based on the entire Y-STR haplotype diversity (Fig. 2). Populations such as Netherlands, Sweden, Hungary and Germany, among others, segregate as part of a cluster in the R-S116 plot as compared to a rather disperse partition of European populations in the general MDS. By restricting the analysis to diversity within R-S116, the genetic affinity among the European population has been reduced. Similarly, the Network analysis exhibits a random distribution of European populations into a number of inter-population star-like nodes separated by a small number of mutational steps (Supplementary Fig. 2). Notably, the larger nodes are made up of most of the populations analyzed. There is no evidence of inter-or intra-population structure. This outcome reflects limited Y-STR haplotype diversity within R-S116 among European populations suggesting a recent contemporaneous separation of European populations carrying the S116 mutation. This scenario is compatible with the hypothesis that a swift dispersal of Indo-European invaders from the STEPPES into Europe about four to five millennia ago led to Y chromosome replacement.

Conclusion
The analyses performed in this investigation support the hypothesis that during the Bronze Age a dispersal of individuals occurred that led to the replacement of the Paleolithic/Neolithic Y-chromosome composition in Western Europe by Indo-European R-S116 lineages. Our data shows that this substitution was not uniform and that in some localities such as the Basque Country of Spain, the replacement was more vast and thorough than in other regions of Western Europe. Although our data does not uncover the evolutionary mechanism(s) that brought about such a specific and dramatic replacement of Y-chromosome types, it demonstrates that the Bronze Age dispersal genetically linked populations as geographically dispersed as Ireland, Gascony and the Basque Country of Spain.