Clear phylogeographic pattern and genetic structure of wild boar Sus scrofa population in Central and Eastern Europe

The wild boar Sus scrofa is one of the widely spread ungulate species in Europe, yet the origin and genetic structure of the population inhabiting Central and Eastern Europe are not well recognized. We analysed 101 newly obtained sequences of complete mtDNA genomes and 548 D-loop sequences of the species and combined them with previously published data. We identified five phylogenetic clades in Europe with clear phylogeographic pattern. Two of them occurred mainly in western and central part of the continent, while the range of the third clade covered North-Eastern, Central and South-Eastern Europe. The two other clades had rather restricted distribution. In Central Europe, we identified a contact zone of three mtDNA clades. Population genetic structure reflected clear phylogeographic pattern of wild boar in this part of Europe. The contribution of lineages originating from the southern (Dinaric-Balkan) and eastern (northern cost of the Black Sea) areas to the observed phylogeographic pattern of the species in Central and Eastern Europe was larger than those from the regions located in southern France, Iberian, and Italian Peninsulas. The present work was the first mitogenomic analysis conducted in Central and Eastern Europe to study genetic diversity and structure of wild boar population.

www.nature.com/scientificreports/ than 100 mutation steps from all other mtDNA clades, Fig. 3) and represented the oldest evolutionary branches on the tree (Fig. 2). The Italian clade (5) is a sister haplogroup to all the remaining clades of wild boar from Europe and North Africa (Fig. 2). Among these, the most distinct genetically were clade 6 and the haplotype BGWB1 of the clade 3 (Figs. 2, 3), all found in individuals from South-Eastern Europe (Bulgaria and Romania; 26 ). The remaining European clades (1)(2)(3), and the North African clade 7 differed from each other by 12-15 mutation steps (Fig. 3). The phylogenetic tree based on the D-loop sequences (Fig. S1) showed basically the same structure, yet it slightly differed in topology from the mitochondrial genome tree. As the values of posterior probabilities on its branches were lower, the phylogeny of wild boar based on the whole mitogenomes is more credible.
The spatial distribution of the seven clades of wild boar showed a clear phylogeographic pattern in Europe and North Africa (Fig. 4). Clades 1 and 2 occurred mainly in Western and Central Europe. Clade 3 covered the largest area: from the northern Ural Mts. to Eastern, Central and South-Eastern Europe including the Balkans. The remaining clades had more restricted distribution: clade 4 occurred (except one individual from Western Russia) in the Caucasus Mountains, clade 5 in Italy, and clade 6 in South-Eastern Europe (Romania and Bulgaria). Clade 7, although genetically closely related to the European clades 2 and 3, occurred only in the North Africa (Fig. 4). Interestingly, we identified a contact zone of three mtDNA clades: 1, 2 and 3 in the Central Europe, on the territory of Poland (Fig. 4).
Genetic diversity of wild boar population in Central and Eastern Europe. In our study area the overall haplotype diversity of wild boar mitogenomes was 0.92 (SD = 0.013) and nucleotide diversity 0.0006 (SD = 0.00003) (Table S2). About 0.5% of the nucleotide sites in the sequences were polymorphic. The most numerous haplotypes were H2a.2 (17% of samples) and H6.1 (16%). More than half of the haplotypes were singletons. In the D-loop sequences 2.5% of nucleotide sites were polymorphic. Overall haplotype diversity was 0.71 (SD = 0.017) and nucleotide diversity 0.0015 (SD = << 0.0001) ( Table S4). The most numerous haplotype was H2a (49% of samples). Six haplotypes were singletons.
In Central and Eastern Europe, we detected four mtDNA clades of wild boar, although clade 4 was represented by one individual from western Russia only (Fig. 4). The majority of identified haplotypes of the mitochondrial genome (69%) and the control region (53%) belonged to the most numerous and genetically diverse clade 3 ( Table 2). This clade included the most frequent haplotypes of mitochondrial genomes: H2a.2, H3.1 and H6.1 (Fig. S2), which corresponded to the most numerous haplotypes of the control region: H2a, H3 and H6 (Fig. S3). The ranges of the haplotypes H2a and H6 covered the largest area (Fig. S4), although, H6 was rare in Poland and was not found in eastern Ukraine.
The haplotype H3, although widely distributed, did not occur in the southern part of the studied region. Several haplotypes belonging to clade 3 had more restricted distribution, e.g. D-loop haplotype H2c and mitogenomic haplotype H2a.11 occurred only in the southern part of Central and Eastern Europe (Figs S4, S5).
Clades 1 and 2 contributed 17% and 14%, respectively, to the mitogenomic sequences, but 21% each to the D-loop sequences. The nucleotide diversity and B values (effective number of haplotypes) of these clades were  26 . Colours and numbers of clades as in Fig. 2 www.nature.com/scientificreports/ lower than those of the clade 3 (Table 2). In Central and Eastern Europe haplotypes of clades 1 and 2 occurred mainly in Poland, and most of them were rare and represented only by single individuals (Figs. 5, S4, S5). Mismatch distribution analyses and neutrality tests (Tajimas'D, Fu's Fs) performed for the three clades of mitogenomic and D-loop sequences of wild boar did not reveal expansion of those clades ( Table 2, Fig. S6).
We compared the frequencies of different mtDNA clades and the indices of genetic diversity among 9 (mitogenomes) and 15 local (D-loop) populations (Fig. 5, Tables S2, S3). The frequency of clade 1 was highest in the South-West Poland (42-67%) and it declined to 20-0% towards south and east (Fig. 5). Clade 2 were found exclusively in Poland, with the exception of one individual recorded in Eastern Belarus. The share of clade 3 was highest in north-eastern, eastern and south eastern parts of Central and Eastern Europe (up to 80-100% in Russia, Belarus, Ukraine and Hungary) and it declined towards west (20-33% in Western Poland).
The haplotype diversity of the mitogenomic populations varied from 0.73 to 1, the values of the B index from 3 to 4.76 and the nucleotide diversity ranged between 0.4 and 1 × 10 -3 (Table S2). The haplotype diversity of the D-loop populations varied from 0.51 to 0.86, the value of B index from 1.72 to 5.55 and the nucleotide diversity from 0.7 to 2.0 × 10 -3 . Both indices of haplotype diversity (Hd and B) revealed that the most diverse populations of wild boar in our studied region inhabited southern, central and northern Poland and north-eastern Belarus (Tables S2, S3, comp. Figure 5). The nucleotide diversity of the D-loop populations increased significantly with latitude (r = 0.712, p = 0.003), but we did not find such statistically significant correlations for haplotype diversity indices. The neutrality tests performed for each of the mitogenomic and D-loop populations did not reveal any sings of expansion (Tables S2, S3).  (Tables S4, S5). We did not detect significant isolation Analyses performed using Geneland for both D-loop and mitogenomic sequences showed three genetic groups of wild boar in Central and Eastern Europe, although the distribution of these groups were different for each set of the populations (Fig. 6). D-loop sequences were assembled in three groups GI -GIII and mitogenomic sequences in groups GmI-GmIII (Fig. 6). D-loop sequences were clumped into GI consisting of samples from Poland and a few from western Belarus and western Ukraine, GII covering north-western Russia, Belarus, northern and southern Ukraine, Romania and Hungary and GIII grouping the easternmost samples from eastern Ukraine and western Russia (Fig. 6). However, if Geneland was run on D-loop sequences from Poland only, the border between two identified genetic groups was very similar to that between mitogenomic groups GmI and GmIII (see insert in Fig. 6, upper panel). Based on mitogenomic sequences, Geneland separated three groups: GmI in north-western Poland, Gm II in southern Poland, Slovakia, Hungary and the southern-most Ukraine, and GmIII in north-eastern Poland, Belarus, Ukraine and western Russia (Fig. 6).

Genetic structure of wild boar population in Central and Eastern Europe.
Among the D-loop genetic groups the highest haplotype diversity was detected in the GII group and nucleotide diversity was the highest in the GIII group (Table S6). The highest haplotype diversity among three mitogenomic genetic groups was observed in the GmI group, and nucleotide diversity in the GmIII group (Table S6). The Fst values calculated among all pairs of genetic groups (D-loop and mitogenomic) were statistically significant in all cases but the largest genetic differentiation was detected between mitogenomic groups GmI and GmII (Fst = 0.22, Table S7) and the lowest between D-loop groups GI and GII (Fst = 0.08, Table S8).

Discussion
Previous studies on European wild boar, based on shorter fragments of mtDNA (from 80 diagnostic bp in 2 , to 410-660 bp in Scandura et al. 5 and Kusza et al. 27 , respectively) did not show any clear phylogeographic structure within the large pan-European clade E1, spanning from Iberia to Ukraine and western Russia. The proposed division of E1 into A and C clusters 2,23 soon appeared too simplified, as several regional studies revealed numerous subclades of intermediate phylogenetic position between A and C and geographically restricted ranges of occurrence, namely in Greece 24 and the Dinaric-Balkan region (from Slovenia to North Macedonia 25 ). This work, based on 1175-bp long fragment of mtDNA and full mitogenomes of wild boar (from our study combined with those published by Khederzadeh et al. 26 ) described the phylogeny of wild boar with much higher resolution and allowed for the spatial lineage sorting across Europe.
We found seven phylogenetic clades of wild boar in the geographic area from North Africa, to the mainland of Europe, to Dagestan in the North Caucasus. We confirmed the findings by Khederzadeh et al. 26 that haplotypes from Dagestan and Italy (our clades 4 and 5, respectively) were genetically most distinct from all other lineages in Europe. Our clade 5 corresponded to E2 and clades 1-3, 6 and 7 (the last one found in North Africa) to the pan-European clade E1 2,5,31 . Clade 6, not reported in earlier studies, with 3 haplotypes from Bulgaria and Romania, along with the most distinct haplotype BGWB1 from Bulgaria (clade 3) are most likely signals of very diverse populations in the Balkans (comp. 24,25 ). This requires further detailed studies including more dense sampling in that region.
Clades 1 and 2 occurred widely in western and central Europe (from Iberia to Czech Republic and Poland) and only sporadically further south and east, whereas clade 3 appeared and increased in frequency to be a dominant one among wild boar populations in Eastern Poland, Hungary, Belarus, Ukraine, Romania and Western Russia. Therefore, a complete exchange of clades takes place between the two longitudinal extremes of continental Europe. A contact zone of one eastern and two western clades (about 700 km wide in Poland) probably runs diagonally from Croatia, Western Hungary, to Poland and the Baltic States. Our dense sampling in Central and Eastern Europe, where the eastern clade 3 was dominating, revealed-not surprisingly-that it was genetically diverse, with further spatial substructuring: groups of haplotypes with different positions on the branches of the phylogenetic trees and networks had different frequencies regionally. Future research should focus on a comparable dense sampling of wild boar from western and southern Europe for mitogenomic analyses. This will be essential to reveal the spatial pattern of clade 1 and 2 frequencies in Iberia, Italy, and Western Europe.
In Central and Eastern Europe, the highest genetic diversity was detected in the contact zone of different mtDNA clades that occurred in wild boar populations inhabiting the Western, South-Western, and Central Poland. According to the obtained Fst values and distribution of different haplotypes, the most intensive gene www.nature.com/scientificreports/ flow among wild boar populations in our study area is in the north-east to south-east direction and the east-west direction. Wild boar inhabiting Western Poland were genetically most distant among the analysed populations. This is in agreement with the results of Veličković et al. 19 , who showed that populations of wild boar with The clear phylogeographic patterns of the identified clades 1-6 raises a question about the glacial refugia of wild boar, and the contributions of the refugial populations to the contemporary gene pool of the species in Europe. The contact zones of different genetic lineages in Central and Eastern Europe were identified for several other mammalian species and they testify to the origin of those lineages from different refugia (e.g. [32][33][34] ). According to fossil records 35 , during the LGM wild boar inhabited Northern Spain, Central Portugal, Southern France, South-central and North-western Italy, Slovenia, Croatia, and Greece. Sommer and Nadachowski 35 did not analyse the eastern regions, such as Black Sea coasts and the Caucasus. Reconstruction of wild boar LGM range by modelling based on the present-time habitat suitability of the species was done by Vilaça et al. 36 , however, this did not include Russian populations. Thus, the model of LGM refugia only produced three candidate regions: Iberia-Southern France, Italy, and Greece. Another approach was applied by Markova and Puzachenko 37 , who reconstructed the ecosystems of Europe (the whole continent from the Atlantic and Mediterranean coasts to the Ural Mountains) based on records of indicative plant (spores and pollen) and mammal (fossils) species. Habitats suitable for wild boar (Mediterranean forest, Caucasian forest, and the southern variant of periglacial forest-steppe) were supposed to form the continuous belt of variable width covering northern half of the Iberian peninsula, southern France, Apennine peninsula (without its southern part) up to the Alps, the whole Balkan region (with the exception of the southernmost Peloponnese peninsula), the Pannonian Basin, the northern and eastern coast of the Black Sea, ending around the Caucasus Mountain ridge, with a larger refugial area on its southern side [37 p. 46]. Ecological niche modelling performed by Niedziałkowska et al. 38 for the European red deer (Cervus elaphus), a species with habitat preferences similar to those of wild boar, also showed that the surroundings of the Black Sea provided suitable refugium for deer during LGM.
Interestingly, phylogeographic studies on the present-day wild boar populations inhabiting their presumed LGM refugia evidenced great genetic diversity in those regions, and the endemic character of the local gene pools rather than source of postglacial recolonization of Europe. This was true for E2 clade in Italy 2 , Southern Balkan haplogroup 25 , numerous haplotypes from Greece and Spain 24 . The last authors concluded that those southernmost peninsular regions still harbour the remnants of pre-LGM gene pool, yet they played no role in the recolonization of Central-Eastern Europe. The recolonization must have been only based on the gene pool in the northern parts of the refugia (leading-edge colonization; 39 ).
Vilaça et al. 36 believed that the Iberian refugium did not significantly contribute to the post-glacial European range of wild boar, and proposed two major colonization routes: one starting from the refugium in southern France and/or northern Italy, and leading wild boars to western and central Europe, and the second one starting from the Balkans and reaching north-eastern parts of the continent. Veličković et al. 25 argued that mtDNA lineages originating from Iberia, Italian and Balkan Peninsulas played a similar role in the post-glacial recolonization of Europe by wild boar. The results of our study partly confirmed the hypotheses of Vilaça et al. 36 and Veličković et al. 25 . It is highly probable that clades 1 and 2 spread from south-west towards central and eastern Europe, whereas wild boar belonging to clade 3 could have arrived from the south and/or south-east and populated the central and north-eastern Europe. The two colonization waves met in central-eastern part of the continent and formed a contact zone. As regards clade 3, we suppose that the LGM refugium of this highly diverse clade covered the Dinaric and the Balkan region as well as the contiguous northern coast of the Black Sea. As suggested by the spatial distribution of subgroups of related haplotypes (comp. Figs S4 and S5), the northward dispersal of wild boar in the postglacial period began both from the south (Dinaric-Balkan) and the south-east (Black Sea northern coast).
To answer the question where the western waves of colonizers (clades 1 and 2) originated from, we would need to expand the sampling and mitogenome analyses of wild boar in the gaps between the area of the contact zone and the potential refugial regions in Iberia, France, and Italy.
Although the genetic diversity of wild boar populations in Europe was higher in the past, than nowadays, the spatial distribution of the main mtDNA clades (E1, E2 and the Near Eastern) has not changed significantly since the early Holocene, in contrast to the mtDNA phylogeographic pattern of the domestic pigs 21,23,40 . Only one clade of the species (Y2 according to nomenclature provided in 21,22 ), occurring in the past in southern and south-eastern Europe but genetically rather closely related to the Near Eastern wild boar [21, this study Figs S7, S8], is today extinct. Furthermore, according to Scandura et al. 23 , hybridization with pigs has not had any significant impact on the genetic diversity of wild boar.
Despite fluctuations in wild boar numbers in the recent past (e.g. population decrease in the Little Ice Age of the eighteenth-nineteenth centuries), the species never disappeared from Central and Eastern Europe 8 . Therefore, such climate-driven oscillations probably did not have significant impact on the phylogeographic pattern of wild boar. A similar situation was recorded in another temperate ungulate species-the European red deer. Although the genetic diversity of deer was higher in the past and their numbers significantly decreased or they even became extinct in the eighteenth-nineteenth centuries in some regions of north-central Europe, the present-day phylogeographic pattern of the species reflects the postglacial migration routes from different LGM refugia localized in western and south-eastern parts of Europe (e.g. 32,[41][42][43]. Further studies on wild boar genetics including also nuclear DNA markers are needed to confirm the phylogenetic pattern of wild boar in this part of Europe. Nonetheless, the study performed by Frantz et al. 21 suggested that, on a larger spatial scale, the phylogeographic pattern obtained based on the mtDNA data and whole genomes were concordant.

Conclusions
Analyses of the whole mitogenomes of wild boar and combination of the obtained sequences with previously published data revealed the phylogeny of European wild boar with higher resolution. We detected five European clades of the species with clear phylogeographic pattern. During the post-glacial times, Central and Eastern Europe was probably recolonized by wild boar belonging to at least two western clades and one large, diverse clade from south-eastern Europe. Nowadays, the contribution of mtDNA lineages originating from the southeast (most probably the Dinaric-Balkan region and the contiguous area of northern coast of the Black Sea) to the wild boar population in Central and Eastern Europe, is larger than that of clades originating from western and south-western part of the continent. Genetic diversity was largest in the populations occurring in the contact zone of three different clades of wild boar. Genetic differentiation (Fst) among studied populations varied from very low to very high values. No isolation by distance was detected. The most intensive gene flow among studied wild boar populations took place in the north-east to south-east direction and in the east-west direction. Population genetic structure of wild boar in Central and Eastern Europe reflected the phylogeographic pattern of this species.

Material and methods
Collection and laboratory analyses of samples. We analysed tissue samples and hair follicles of 548 wild boars collected in six countries of Central and Eastern Europe in 2001-2015 (Table 1) (Table 1). We divided the analysed 548 wild boar into 15 populations, and the 101 individuals with sequenced mitogenomes into 9 populations according to their sampling regions (Fig. 1). A list of sampling localities with the exact number of specimens and years of sampling is given in Table S1.
The whole genomic DNA was extracted using BioTrace DNA Purification Kit (Eurx, Poland). The sequence of mitochondrial control region (1175 bp) was amplified using two sets of primers: (1) SSmtDNApm33F 5′-ATA CCA ATC ACT AGC ATC ATCG-3′; SSmtDNApm34R 5′ GAG TTC CAT GAA GTC CAG CTAC-3′ 44 and (2) 1F: 5′ CTT ACT TCA GGA CCA TCT CA-3′; 1B: 5″-GGT TGA GCA AGG CGT TAT -3′ 45 . Full mitochondrial genome was amplified using 19 sets of primers described by Jiang et al. 45 (Table S9). PCR reactions were set in 25 µl volume and run in 2720 Thermal Cycler (Applied Biosystems, US). Reaction mix contained GoTaq Hot Start Polymerase www.nature.com/scientificreports/  . The networks for all sequences obtained in this study for the two markers (mitogenomes and D-loop) were also constructed using Median joining method in PopArt 1.7 49 . Parameters of genetic diversity (number of haplotypes, number of polymorphic loci, and haplotype and nucleotide diversity) were calculated for the two mtDNA markers in DNAsp 6.11.01 50 . We also calculated B index using the formula B = 1/Σpi 2 where pi is the frequency of haplotype i in the population 51 . It ranges from 1 to n-where n is the total number of detected haplotypes. This index better reflects the effective number of haplotypes in a sample due to its wider range than the commonly used haplotype diversity index (see 28,33 ). Two tests of neutrality (Tajima's D and Fu's Fs) as well as their statistical significance for populations and for whole sample set in the two markers were calculated in DNAsp. We checked the Isolation by Distance (IBD) for both markers using Mantel Test performed in Arlequin v3.5.2.2 52 . The described parameters of genetic variability were also calculated for each of the three main mtDNA clades of wild boar detected in Central and Eastern Europe. Neutrality tests, using DNAsp, and mismatch analysis, using Arlequin, were also performed for each of the clades detected by mitogenome and D-loop sequences.
Correlations between genetic variability parameters (haplotype diversity, nucleotide diversity, B index), coordinates of the localities of mitogenomic or D-loop populations and sample size were calculated using the rcor. test implemented in "ltm" package R (Rizopoulos 2006, https:// www.r-proje ct. org/).
To detect the genetic structure of wild boars, we ran three independent Geneland ver. 4.9.2 53-58 analyses: for mitogenomes, for all D-loop sequences, and for D-loop sequences only from Poland, were performed under spatial model with 100,000 iterations, thinning = 100, and 20 multiple independent runs. Parameters of genetic diversity for the groups indicated by Geneland were calculated using the software described above and used for calculation of such variables for populations. To evaluate gene flow, Fst values among all pairs of populations (for the two markers) and genetic groups indicated by Geneland ver. 4.9.2 53-58 were calculated in Arlequin.

Data availability
D-loop sequences obtained in this study are deposited in GenBank (Accession numbers: MW842550-MW842568). Other datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.