Introduction

Pork is the most widely consumed meat worldwide1. The pig industry is mainly based on a limited number of cosmopolitan lean breeds that have been extensively used for breeding improvement. These highly selected breeds are raised in intensive production systems that focus on maximizing productivity and supplying the market with fresh pork. In addition, many local, less productive breeds exist, some of which are now close to extinction. These traditional breeds are usually associated with local forms of pig husbandry, and their meat is used to produce high-quality niche products. The Iberian pig, raised in the south-west of the Iberian Peninsula, is probably the most representative local pig breed, although many others are reared in European countries. Common characteristics of these breeds are good environmental adaptation, rusticity, low muscle mass deposition, high adipogenic potential and, in many cases, superior meat quality traits2.

Conservation of small local breeds is mainly conditioned by their economic relevance, but it also depends on their socio-cultural value, adaptation to local agro-climatic conditions, contribution to the development of local communities and marginal areas, and scientific importance. Making efficient use of local breeds requires technological advances, growing knowledge and innovative ideas founded on scientific research. The genetic characterization of these resources is a preliminary step for developing conservation programs, promoting local breeds and ensuring their sustainable use3.

Although a few whole-genome studies have been carried out for some European local pig breeds, such as the Iberian4,5 or Casertana6,7, many other autochthonous breeds have not yet been analyzed in detail and are considered untapped genetic resources. Recently, the diversity at several relevant candidate gene polymorphisms was evaluated in twenty local European pig breeds8, providing information about the segregation of markers of interest for breeding or traceability9,10 purposes and giving some first insights into the genetic structure of these populations. Nevertheless, the genetic characterization of animal breeds is usually addressed through the analysis of neutral markers such as microsatellites or SNPs, as recommended by FAO11. High-density SNP panels make it possible to investigate genome-wide diversity at a higher resolution. Dense SNP panels can be applied to a variety of genomic studies, including inference on population history, structure and admixture12, estimation of effective population size4,5, QTL mapping13, genome-wide association studies14 and genomic selection15. Comparative analyses of genomic diversity also make it possible to explore the degree of genomic variation and linkage disequilibrium (LD) among pig breeds, and to detect genomic regions that have been subject to selective sweeps in different pig populations16.

In this study, we analyzed the genomic diversity of 20 European local pig breeds: Black Slavonian and Turopolje (Croatia), Basque and Gascon (France), Schwäbisch-Hällisches Schwein (Germany), Apulo-Calabrese, Casertana, Cinta Senese, Mora Romagnola, Nero Siciliano and Sarda (Italy), Lithuanian indigenous wattle and Lithuanian White old type (Lithuania), Alentejana and Bísara (Portugal), Moravka and Swallow-Bellied Mangalitsa (Serbia), Krškopolje pig (Slovenia) and Iberian and Majorcan Black (Spain). Genotypes from a high-density SNP chip were used to assess genomic diversity and structure and to identify selection signatures. This work is framed within the TREASURE project (https://treasure.kis.si), a multidisciplinary project funded by the European Union and aimed at developing sustainable pork chains for several European local pig breeds.

Results and Discussion

A total of 985 pigs from 20 European autochthonous breeds and a small Spanish Wild Boar population (n = 7) were successfully genotyped with the GeneSeek® Genomic Profiler (GGP) 70K HD Porcine chip (Illumina Inc., USA). The sample size of each analyzed population is given in Table 1. Animals were selected for sampling by local specialized personnel with deep knowledge of each breed, in order to obtain representative samples of each population. A total of 60,451 of the 68,516 SNPs remained after removing those with more than 10% missing genotypes or a minor allele frequency (MAF) lower than 0.01. The average within-breed MAF (Table 1, Fig. 1) ranged from 0.133 (Turopolje) to 0.294 (Sarda). Consistently, Turopolje was the breed with the highest number of SNPs (29,740) in the lowest MAF class (between 0.01 and 0.05), while Sarda had the highest number (15,350) of highly informative SNPs, with MAF between 0.4 and 0.5 (Fig. 1 and Supplementary Table 1). The breeds with the lowest number of informative SNPs were Alentejana, Basque, Iberian, Swallow-Bellied Mangalitsa, Mora Romagnola, Turopolje and Wild Boar, for which more than 25% of SNPs had MAF values lower than 0.05. Conversely, the breeds with the most informative genotyping results, with more than 20% of their SNPs having MAF higher than 0.40, were Sarda, Krškopolje, Schwäbisch-Hällisches Schwein, Bísara, Nero Siciliano, Old type Lithuanian White, Black Slavonian, Moravka and Lithuanian indigenous wattle.
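The SNP filter and the per-breed informativeness summary described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: it assumes genotypes are coded as 0/1/2 copies of one allele with `None` for a missing call, and the function names (`maf`, `passes_qc`, `informativeness`) are hypothetical.

```python
# Minimal sketch (assumption: 0/1/2 allele-count coding, None = missing call).

def maf(genotypes):
    """Minor allele frequency from 0/1/2 genotype codes, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)


def passes_qc(genotypes, max_missing=0.10, min_maf=0.01):
    """Keep a SNP unless more than 10% of calls are missing or its MAF is below 0.01."""
    missing = sum(g is None for g in genotypes) / len(genotypes)
    if missing > max_missing:
        return False
    f = maf(genotypes)
    return f is not None and f >= min_maf


def informativeness(mafs, low=0.05, high=0.40):
    """Fractions of a breed's SNPs with MAF < low and with MAF > high."""
    n = len(mafs)
    return (sum(f < low for f in mafs) / n, sum(f > high for f in mafs) / n)


snps = {
    "snp_a": [0, 0, 1, 2, None, 0, 0, 0, 0, 0],  # 10% missing, MAF about 0.17: kept
    "snp_b": [0] * 10,                           # monomorphic: removed
    "snp_c": [None, None] + [1] * 8,             # 20% missing: removed
}
kept = [name for name, g in snps.items() if passes_qc(g)]
print(kept)  # ['snp_a']
```

Under this sketch, a breed would fall into the "low informativeness" group of the text when the first value returned by `informativeness` exceeds 0.25.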

Table 1 Sample sizes (N), mean minor allele frequencies (MAF), observed (HO) and expected (HE) heterozygosities, inbreeding coefficient of an individual (I) relative to the subpopulation (S) (FIS) and Wright’s fixation index (FST) for each analyzed breed.
Figure 1

Frequency distribution of minor allele frequencies (MAF) in all the breeds.

In general, the informativeness of the SNP chip was moderate in most of the studied breeds, which is not unexpected, as these local breeds were not considered in the design of commercial porcine chips17. The widely used Illumina 60K chip was designed and validated using samples from Berkshire, Duroc, Hampshire, Landrace, Large White, Meishan, Piétrain, synthetic lines (Large White and Piétrain) and wild boar17. The GeneSeek® GGP Porcine HD Genomic Profiler used in the present study is an improved version of the latter: it was designed to include the most informative SNPs from the Porcine60K chip, according to findings in the same major commercial breeds, and was complemented with new SNPs to improve the coverage of the chromosomes and the telomeric regions and thus better account for recombination18. Despite the ascertainment bias implicit in the use of SNP chips, previous results have shown that these tools provide reliable estimates of genomic diversity for comparative studies between European populations, even in local breeds19.

Genetic diversity parameters and population structure

Within-breed genetic diversity

Genetic variability parameters for the analyzed populations are presented in Table 1. Within-breed observed (HO) and expected (HE) heterozygosity ranged from 0.195 (Turopolje) to 0.363 (Krškopolje) and from 0.187 (Turopolje) to 0.382 (Sarda), respectively. Turopolje, Mora Romagnola, Basque and Wild Boar exhibited the lowest HO and HE values, and Krškopolje, Sarda and Old type Lithuanian White the highest. Across-breed average values for HO and HE were 0.297 (±0.053) and 0.303 (±0.054), respectively. These heterozygosity values are considerably lower than those previously reported for European cosmopolitan and Chinese pig breeds20,21,22,23, which ranged from 0.30–0.40 to 0.60–0.70 with an average of about 0.5, and similar to those reported for some local breeds19. The low heterozygosity observed in Turopolje agrees with a previous study in which microsatellites were used to assess the genetic diversity and population structure of eight Balkan pig populations24. In addition, Lithuanian indigenous wattle showed the lowest (−0.066) inbreeding coefficient of an individual relative to the subpopulation (FIS), while Apulo Calabrese showed the highest (0.138) (Table 1). Negative FIS values were observed in Mora Romagnola, Schwäbisch-Hällisches Schwein, the two Lithuanian breeds and the two French breeds (Gascon and Basque). Negative values indicate an excess of heterozygotes relative to Hardy–Weinberg expectations, that is, no preferential mating among related individuals within the subpopulation, but they do not necessarily imply lower values of the total inbreeding coefficient, which takes into account the inbreeding accumulated over generations25.
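To make the relationship between HO, HE and FIS concrete, the sketch below computes them for one breed. It assumes 0/1/2 genotype coding, HE = 2p(1 − p) without a small-sample correction, and FIS = 1 − mean(HO)/mean(HE) averaged over SNPs. The source does not state which estimator was used, so the function names and these choices are illustrative assumptions.

```python
# Minimal sketch (assumptions: 0/1/2 coding, None = missing, no small-sample
# correction for HE, FIS as a ratio of SNP-averaged heterozygosities).

def snp_heterozygosity(genotypes):
    """Observed and expected heterozygosity for one SNP, or None if no calls."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return None
    p = sum(called) / (2 * n)
    ho = sum(g == 1 for g in called) / n   # fraction of heterozygous individuals
    he = 2 * p * (1 - p)                   # Hardy-Weinberg expectation
    return ho, he


def breed_diversity(snp_genotypes):
    """Return (mean HO, mean HE, FIS) for one breed; input is one list per SNP."""
    pairs = [h for h in map(snp_heterozygosity, snp_genotypes) if h is not None]
    mean_ho = sum(ho for ho, _ in pairs) / len(pairs)
    mean_he = sum(he for _, he in pairs) / len(pairs)
    return mean_ho, mean_he, 1 - mean_ho / mean_he


# Two SNPs in four pigs; the second SNP has only heterozygotes.
print(breed_diversity([[0, 1, 1, 2], [1, 1, 1, 1]]))  # (0.75, 0.5, -0.5)
```

The example shows why a negative FIS signals an excess of heterozygotes: observed heterozygosity (0.75) exceeds the Hardy–Weinberg expectation (0.5).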

The lack of selection programs and frequent or recurrent admixture, common to these untapped local pig breeds, would presumably lead to a higher degree of genetic diversity. However, the level of genetic variation in our local breeds is in general lower than that in cosmopolitan pig breeds16,17,18,19,20. This could be due to small effective population sizes: in some of the breeds, only a few founders were left at the beginning of the preservation programs2. This may also explain the high level of inbreeding observed in some of these breeds. For instance, the highest FIS values, obtained in Apulo Calabrese and Casertana, are in agreement with their endangered status and small census26,27, and similar values were recently reported in an analysis based on candidate gene polymorphisms8.

Differentiation among breeds and genetic distances

The level of population differentiation can be quantified using fixation indices. The fixation index (FST) for each breed (Table 1), estimated per SNP for each pairwise breed comparison and then averaged by breed, showed the highest value for Mora Romagnola (0.161) and the lowest for Sarda (0.092). The overall FST value across all SNP markers was 0.115 (±0.020), indicating that most of the genetic variation occurs within populations rather than between breeds, as previously reported for pig populations28,29. This value is concordant with previous work28, in which the average FST between European breeds was 0.134, ranging from 0.021 to 0.209. The overall FST obtained here with the 70K SNP chip is considerably lower than that observed in our previous study (0.27)8, which used an array of selected SNPs. This difference is expected, because the latter study used only 39 causal mutations and polymorphisms in candidate genes, whereas the present work used mostly neutral SNP markers spread across the whole genome6. The distributions of different genetic diversity parameters estimated for each SNP marker are shown in Supplementary Fig. 1.
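The two-step averaging behind the per-breed FST values (per SNP within each breed pair, then over all pairs involving a breed) can be sketched as below. The source does not name the estimator; this sketch assumes the simple (HT − HS)/HT form for a biallelic SNP, and `pairwise_fst`/`breed_fst` are hypothetical names.

```python
from itertools import combinations

# Minimal sketch (assumption: FST = (HT - HS) / HT per biallelic SNP;
# SNPs monomorphic in both breeds of a pair are skipped).

def pairwise_fst(p1, p2):
    """FST for one SNP between two breeds with allele frequencies p1 and p2."""
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-breed HE
    pbar = (p1 + p2) / 2
    ht = 2 * pbar * (1 - pbar)                          # HE of the pooled pair
    return None if ht == 0 else (ht - hs) / ht


def breed_fst(freqs):
    """freqs maps breed -> allele frequencies over the same SNPs.
    Average FST over SNPs for each breed pair, then over each breed's pairs."""
    pair_means = {}
    for a, b in combinations(sorted(freqs), 2):
        vals = [f for f in map(pairwise_fst, freqs[a], freqs[b]) if f is not None]
        if vals:
            pair_means[(a, b)] = sum(vals) / len(vals)
    result = {}
    for breed in freqs:
        own = [m for pair, m in pair_means.items() if breed in pair]
        result[breed] = sum(own) / len(own)
    return result


print(breed_fst({"A": [0.0, 0.5], "B": [1.0, 0.5], "C": [0.0, 0.5]}))
```

A published analysis would typically use a bias-corrected estimator such as Weir and Cockerham's; the simple form is used here only to show the averaging scheme.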

Nei’s genetic distances30 between the studied breeds ranged from 0.276 (between Alentejana and Iberian) to 0.604 (between Apulo Calabrese and Mora Romagnola) (Supplementary Table 2), with an average of 0.440 (±0.057). The Neighbor-Joining (NJ) tree constructed from these distances (Fig. 2) is in general agreement with the geographical distribution of most breeds: geographically close breeds cluster together, such as the two French breeds; Iberian, Alentejana and Wild Boar, which come from the Iberian Peninsula; the two Lithuanian breeds (Lithuanian indigenous wattle and Lithuanian White old type); and the six Italian breeds (Apulo Calabrese, Casertana, Cinta Senese, Mora Romagnola, Nero Siciliano and Sarda), which are all placed in the middle of the unrooted tree (Fig. 2). These findings are expected, considering that closely located breeds are more likely to share common ancestors. Despite this resemblance to the geographical distribution of the breeds, the tree cannot be used to infer phylogenetic relationships, given the complexity of the events that may have shaped the current gene pools of the investigated breeds, which share a common European origin.
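For reference, Nei’s standard genetic distance between two populations X and Y is D = −ln(Jxy / √(Jx·Jy)), where Jx, Jy and Jxy are the average (over loci) of the summed squared and cross-multiplied allele frequencies. The sketch below applies it to biallelic SNPs, assuming each population is summarized by one allele frequency per SNP; `nei_distance` is a hypothetical name, and the NJ tree would then be built from the resulting distance matrix with a phylogenetics package.

```python
import math

# Minimal sketch of Nei's standard distance for biallelic SNPs
# (assumption: px and py hold the frequency of the same allele at each SNP).

def nei_distance(px, py):
    """D = -ln(Jxy / sqrt(Jx * Jy)); infinite if the populations share no alleles."""
    n_loci = len(px)
    jx = sum(p * p + (1 - p) * (1 - p) for p in px) / n_loci
    jy = sum(p * p + (1 - p) * (1 - p) for p in py) / n_loci
    jxy = sum(a * b + (1 - a) * (1 - b) for a, b in zip(px, py)) / n_loci
    identity = jxy / math.sqrt(jx * jy)   # Nei's genetic identity I
    return math.inf if identity == 0 else -math.log(identity)


print(nei_distance([1.0, 0.5], [0.5, 0.5]))  # 0.5 * ln(1.5), about 0.203
```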

Figure 2

Neighbor-joining tree constructed with Nei’s distances. Numbers indicate the percentage of replicates in which each node was recovered.

Principal component analysis (PCA) was used to explore the clustering of individuals from the different populations. The first three principal components explained 14.26%, 10.92% and 8.49% of the total variation, respectively. PCA revealed groups formed by individuals belonging to the same breed (Fig. 3), with clearly separated clusters for Mora Romagnola, Turopolje, Gascon, Basque and Old type Lithuanian White. In a few cases, the clusters also reflected the breeds’ geographical distribution. For instance, Alentejana and the Spanish populations group together, as do the French and the Lithuanian breeds, in agreement with the NJ tree. On the other hand, the two Croatian breeds (Turopolje and Black Slavonian) plot far apart, although two Black Slavonian pigs cluster with the isolated Turopolje group. The clear differentiation between the two Croatian breeds agrees with previous results24 from a study of several Balkan breeds with microsatellite markers, and is now confirmed at the genomic level and with a wider panel of breeds. The isolation of the Turopolje breed is not surprising, as it is among the oldest breeds in Europe, apparently domesticated locally in the Middle Ages31, whereas the Black Slavonian breed was formed at the end of the 19th century by crossing Mangalitsa with several imported pig breeds. However, as both breeds were raised in the past in the same geographical area, the few exceptions observed in clustering may result from uncontrolled mating between them32. Moreover, previous work using microsatellite markers showed higher genetic heterogeneity in the Black Slavonian population, with one herd clustering together with Turopolje pigs, suggesting moderate gene flow between the Black Slavonian and Turopolje populations, consistent with our results20.
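The PCA described above can be illustrated with a small dependency-free sketch: genotypes are centred and scaled per SNP, an individual-by-individual relationship matrix is formed, and its leading eigenvectors (the individuals’ PC scores) are found by power iteration with deflation. This is an assumption-laden illustration of standard genotype PCA, not the software used in the study; `standardize` and `pca` are hypothetical names, and real analyses would use an optimized eigensolver.

```python
import math

# Minimal sketch of genotype PCA (assumptions: complete 0/1/2 data,
# per-SNP scaling by sqrt(2p(1-p)), power iteration instead of a full eigensolver).

def standardize(genotypes):
    """Centre and scale each SNP (0/1/2 codes); monomorphic SNPs are dropped."""
    n = len(genotypes)
    cols = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        p = sum(col) / (2 * n)
        if 0.0 < p < 1.0:
            sd = math.sqrt(2 * p * (1 - p))
            cols.append([(g - 2 * p) / sd for g in col])
    return [list(row) for row in zip(*cols)]  # back to individuals x SNPs


def pca(genotypes, k=2, iters=500):
    """Top-k PC scores and variance fractions from the n x n matrix X X^T / m."""
    x = standardize(genotypes)
    n, m = len(x), len(x[0])
    a = [[sum(u * v for u, v in zip(x[i], x[j])) / m for j in range(n)] for i in range(n)]
    trace = sum(a[i][i] for i in range(n))
    scores = [[0.0] * k for _ in range(n)]
    explained = []
    for c in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # fixed start vector for reproducibility
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:
                break
            v = [t / norm for t in w]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(n)) for i in range(n))
        explained.append(lam / trace)
        for i in range(n):
            scores[i][c] = v[i] * math.sqrt(max(lam, 0.0))
        # deflate: remove this component before searching for the next one
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return scores, explained


# Two toy "breeds" with opposite allele frequencies separate along PC1.
G = [[0, 0, 2, 2], [0, 1, 2, 2], [0, 0, 2, 1],
     [2, 2, 0, 0], [2, 1, 0, 0], [2, 2, 0, 1]]
scores, explained = pca(G)
```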

Figure 3

Genetic structure of the 20 investigated porcine breeds and the Wild Boar population. Each point represents the scores of one individual on principal components 1 and 2 (A) and 2 and 3 (B). Points are colored according to country, and shapes represent the different breeds.

Although individuals are grouped by breed, there is some overlap among breeds, with one large cluster including many of them. At the left end of this large cluster, the Iberian and Alentejana breeds overlap completely, in agreement with their genetic closeness and common breeding history2,8. Next to the Iberian cluster, Spanish Wild Boar and Majorcan Black pigs are also closely grouped, together forming a nucleus of South-Western European populations. These domestic breeds therefore appear closely related to their wild relative. This finding agrees with the previously proposed recurrent admixture between wild and domesticated animals in Europe10,33, which might be especially intense in these local breeds, since free-range exploitation favors a long history of genetic exchange with wild boars. The only breed from the Iberian Peninsula located far from the south-western cluster is Bísara, whose separation from the Iberian breed had been previously reported8,12, in agreement with its Celtic origin34. Interestingly, Swallow-Bellied Mangalitsa pigs are located in the middle of the western nucleus, quite far from Moravka (the other Serbian breed), as already evidenced by the NJ tree. Genetic proximity between Hungarian Mangalitsa and Iberian pigs was previously reported by Herrero-Medrano et al.19. Next, four Italian breeds (all but Mora Romagnola) cluster in the middle, together with Moravka and Black Slavonian pigs. The right end of the Italian cluster partially overlaps with the Bísara and Krškopolje breeds, which are followed by Schwäbisch-Hällisches Schwein and Lithuanian indigenous wattle; the latter lies at the right end, close to but separated from the other Lithuanian breed.

Interestingly, the breeds located at the right end of the PCA plots (Fig. 3A,B) are those with the highest heterozygosity values, which may be related to introgression or admixture with other breeds. In some cases, Asian introgression, common in many European breeds, may be the explanation. For instance, the Old type Lithuanian White breed was developed by improving Lithuanian indigenous wattle pigs with Large White, Middle White, Edelschwein, Berkshire and local Danish pigs35, which may explain its apparently high genetic diversity despite the several bottlenecks it has undergone since 2003, which have led to its current critical situation. The Schwäbisch-Hällisches Schwein breed originated from crossing local pigs in Württemberg with Chinese pigs, and later from crossbreeding with pigs imported from England2. The Black Slavonian breed, which was dominant in Croatia up to the mid-20th century, resulted from planned crossing between Mangalitsa, Berkshire, Poland China and Large Black (Cornwall) pigs36,37. In addition, several breeds have been crossbred with cosmopolitan European breeds carrying Asian introgression; for example, Krškopolje was likely crossed with German Landrace at a time when the breed had been cast out. Regarding the two Serbian breeds, Moravka was created through unsystematic crossing of the old Šumadinka pig with Berkshire and possibly Yorkshire38, whereas no Asian introgression is known for Swallow-Bellied Mangalitsa. This may explain the separation of the two breeds in the NJ tree and PCA plots, as well as the proximity of the Mangalitsa breed to the south-western populations, which are free of Asian introgression.

Linkage disequilibrium (LD) analyses

Differences in LD among populations result from differences in population history and demography39,40, and detailed information on LD in domesticated animals is valuable for the fine mapping of genes41. A substantial extent of LD is usually found in domestic species, which may be due to the small effective population size of commercial populations. Our study provides an overview of LD patterns against physical distance in 20 European pig breeds and Iberian Wild Boar.

Breed-specific SNP marker sets (Supplementary Table 3) were used to estimate LD for all SNP pairs less than 50 Mb apart. This window was divided into three distance classes, (a) 0 to 2 Mb, (b) 2 to 5 Mb and (c) 5 to 50 Mb, and r2 values were averaged in distance bins of 0.05, 0.20 and 5.0 Mb for classes (a), (b) and (c), respectively (Supplementary Table 4). Overall LD across the genome between adjacent SNPs ranged from 0.289 in Wild Boar to 0.604 in Mora Romagnola.
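The three-class binning scheme can be sketched as follows. The source does not specify how r2 was computed; this sketch assumes r2 is the squared Pearson correlation of 0/1/2 genotype codes (a common approximation for unphased data) and works in integer base pairs to avoid floating-point bin-edge errors. All function names are hypothetical.

```python
from collections import defaultdict

# (class start, class end, bin width) in base pairs for classes (a), (b), (c)
DISTANCE_CLASSES = [
    (0, 2_000_000, 50_000),
    (2_000_000, 5_000_000, 200_000),
    (5_000_000, 50_000_000, 5_000_000),
]


def distance_bin(dist_bp):
    """(lower, upper) bp edges of the bin holding this distance, or None if >= 50 Mb."""
    for start, end, width in DISTANCE_CLASSES:
        if start <= dist_bp < end:
            lower = start + (dist_bp - start) // width * width
            return (lower, lower + width)
    return None


def genotype_r2(a, b):
    """Squared correlation of 0/1/2 genotype codes (assumption); None if a SNP is invariant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return None
    return cov * cov / (va * vb)


def ld_decay(positions, genotypes):
    """Mean r2 per distance bin for all SNP pairs < 50 Mb apart on one chromosome."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            b = distance_bin(abs(positions[j] - positions[i]))
            if b is None:
                continue
            r2 = genotype_r2(genotypes[i], genotypes[j])
            if r2 is not None:
                sums[b] += r2
                counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}
```

Per-breed curves such as those in Fig. 4 would then come from averaging these bin means across the 18 autosomes.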

Overall r2 values by breed were plotted against increasing distance (Fig. 4). As expected, the most tightly linked SNP pairs had the highest r2, and average r2 decreased rapidly as distance increased, a pattern similar to that observed in previous studies in pigs and other species42,43,44,45. Values of r2 at short distances (0.00–0.05 Mb) ranged from 0.305 (Nero Siciliano) to 0.595 (Mora Romagnola), and at long distances (45–50 Mb) from 0.028 (Iberian) to 0.089 (Casertana). The persistence and strength of LD varied among breeds. Among the domestic breeds, the LD of Iberian and Alentejana decreased by half at 0.15 Mb, showing the fastest LD decay, whereas the LD of Mora Romagnola and Turopolje decreased by half at 1.8 and 1.75 Mb, respectively, showing the highest LD persistence. In addition, Fig. 4 reveals that r2 fell below 0.2 at distances shorter than 2 Mb in all breeds except Mora Romagnola and Turopolje, in which it fell below 0.2 only at distances between 2 and 5 Mb. Such extensive long-range LD suggests that these breeds have experienced more unbalanced contributions (bottlenecks) or stronger genetic drift than the others40. Similar r2 values at all distances were observed for the Iberian and Alentejana breeds, supporting the genetic closeness of these breeds described above. Wild Boar showed the lowest extent of LD, in agreement with previous findings28 and as expected for an outbred, non-admixed population. In a previous work44, European pig breeds also showed significant differences in the extent of LD, in agreement with our results.
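The "half-decay" distances quoted above (for example 0.15 Mb versus 1.8 Mb) can be read off a binned LD curve by finding where mean r2 first drops to half of its value in the shortest-distance bin. The sketch below does this with linear interpolation between bin midpoints; the interpolation rule and the name `half_decay_distance` are assumptions, since the source does not describe how these distances were derived.

```python
# Minimal sketch (assumption: linear interpolation between consecutive bins).

def half_decay_distance(bins):
    """bins: (distance_Mb, mean_r2) pairs sorted by distance.
    Return the distance at which r2 first falls to half its first-bin value, or None."""
    target = bins[0][1] / 2
    for (d0, r0), (d1, r1) in zip(bins, bins[1:]):
        if r1 <= target:
            # r0 > target here, so the denominator is positive
            return d0 + (r0 - target) / (r0 - r1) * (d1 - d0)
    return None


curve = [(0.025, 0.6), (0.075, 0.4), (0.125, 0.2)]
print(half_decay_distance(curve))  # about 0.1 Mb
```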

Figure 4

Linkage disequilibrium decay. Average linkage disequilibrium plotted against distance between SNPs across the 18 autosomes for each breed.

Pairwise r2 estimates for short distances (up to 0.05 Mb) were averaged by autosome in all breeds (Supplementary Table 5). These estimates revealed variation among chromosomes. SSC1 was the chromosome with the highest LD in most breeds (Alentejana, Basque, Bísara, Casertana, Iberian, Krškopolje, Lithuanian indigenous wattle, Majorcan Black, Swallow-Bellied Mangalitsa, Moravka, Nero Siciliano, Old type Lithuanian White, Sarda, Schwäbisch-Hällisches Schwein, Wild Boar), whereas SSC10 showed the lowest LD in many breeds (Alentejana, Basque, Bísara, Black Slavonian, Cinta Senese, Iberian, Krškopolje, Swallow-Bellied Mangalitsa, Moravka, Nero Siciliano, Old type Lithuanian White, Schwäbisch-Hällisches Schwein, Turopolje, Wild Boar). The highest and lowest chromosome-wise mean LD values were observed for SSC3 in Mora Romagnola (r2 = 0.728) and SSC10 in Nero Siciliano (r2 = 0.256), respectively.

Effective population size across generations

Syntenic r2 estimates between all SNP pairs within 50 Mb were used to estimate Ne over the past 50 generations (Fig. 5, Supplementary Tables 6–26), using the recombination values for each chromosome shown in Supplementary Table 3. Wild Boar had the highest Ne 50 generations ago (521.68), and Mora Romagnola and Turopolje had the lowest (56.63 and 59.08, respectively). Table 2 shows the current effective population size, with the Iberian pig having the highest Ne (89.18) and Casertana the lowest (9.44).
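The source does not state which LD-based estimator was applied. A common choice for this kind of analysis is Sved’s approximation, E[r2] ≈ 1/(1 + 4·Ne·c), which gives Ne ≈ (1/E[r2] − 1)/(4c) for SNP pairs separated by c Morgans, with the estimate referring to roughly 1/(2c) generations ago. The sketch below assumes that relationship, a single chromosome-wide cM/Mb rate for converting physical to genetic distance, and an optional 1/n sample-size correction; all names are hypothetical.

```python
# Minimal sketch (assumptions: Sved's approximation, chromosome-wide cM/Mb rate,
# optional 1/n sample-size correction for unphased genotypes).

def morgans(distance_mb, cm_per_mb):
    """Genetic distance c (Morgans) from physical distance and a cM/Mb rate."""
    return distance_mb * cm_per_mb / 100


def ne_from_ld(mean_r2, c, n_sampled=None):
    """Return (Ne, generations ago) from mean r2 at genetic distance c."""
    if n_sampled is not None:
        mean_r2 = mean_r2 - 1 / n_sampled   # optional correction (assumption)
    ne = (1 / (4 * c)) * (1 / mean_r2 - 1)
    generations_ago = 1 / (2 * c)
    return ne, generations_ago


# Pairs 2 Mb apart on a chromosome with 0.5 cM/Mb, i.e. c = 0.01 Morgans.
ne, t = ne_from_ld(0.2, morgans(2.0, 0.5))
print(round(ne), round(t))  # 100 50
```

Under this relationship, short-distance pairs (large c) inform recent Ne and long-distance pairs (small c) inform Ne further back in time, which is how a single r2 decay curve yields an Ne trajectory such as Fig. 5.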

Figure 5

Estimated effective population size (Ne) along 50 generations.

Table 2 Current effective population size (Ne), with standard deviation (SD) in brackets, and sample size (N) by breed.

Meuwissen46 recommended an effective population size of 100 to maintain the genetic diversity of a population, a threshold not reached by any of the populations analyzed in the present work. Our findings further confirm the need for conservation strategies for many of the studied breeds. The most extreme cases are Casertana, Apulo Calabrese, Turopolje, Mora Romagnola and both Lithuanian pig breeds, for which conservation efforts are currently being undertaken27,47,48,49.
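To put these Ne values in perspective, the classical approximation ΔF = 1/(2Ne) converts an effective population size into an expected per-generation increase in inbreeding, and F_t = 1 − (1 − F_0)(1 − ΔF)^t accumulates it over t generations under constant Ne. The sketch below applies these textbook formulas; it is an illustration only, not part of the original analysis, and the function names are hypothetical.

```python
# Minimal sketch of the standard idealized-population relationships
# (assumption: constant Ne across generations).

def inbreeding_rate(ne):
    """Expected increase in inbreeding per generation, dF = 1 / (2 Ne)."""
    return 1 / (2 * ne)


def inbreeding_after(ne, generations, f0=0.0):
    """F_t = 1 - (1 - F_0) * (1 - dF) ** t."""
    return 1 - (1 - f0) * (1 - inbreeding_rate(ne)) ** generations


print(inbreeding_rate(100))   # 0.005 at the recommended Ne of 100
print(inbreeding_rate(9.44))  # about 0.053 at the current Casertana Ne
```

Under this approximation, the lowest current Ne in Table 2 corresponds to a rate of inbreeding roughly ten times higher than at the recommended Ne of 100.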

In fact, most local breeds are characterized by small effective population sizes, which reduces their diversity and leads to high levels of LD and a high proportion of SNPs with fixed alleles. In general, the breeds showing the highest levels of LD were those with the lowest effective population size and the highest inbreeding (Apulo Calabrese, Casertana and Turopolje).

FST analyses

FST estimates were used to detect genomic regions that could be involved in domestication, breed standard establishment or selective breeding. A genome-wide scan for divergent genomic regions was carried out by estimating Wright’s FST at each marker as a measure of genetic differentiation. Candidate regions for diversifying selection were identified as sliding windows in the 99th percentile of the empirical distribution (Supplementary Figs 2–22). A total of 502 outlier windows per breed were identified (Supplementary Tables 27–47), and adjacent outlier windows were merged into a single genomic region.
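The window scan can be sketched as follows: per-SNP FST values are averaged in sliding windows along a chromosome, windows at or above the 99th percentile of the empirical distribution are flagged, and overlapping or adjacent outliers are merged into regions. The window size, step, minimum SNP count and linear-interpolation percentile are assumptions (the source does not give them), and all names are hypothetical. The same function with `q=99.9` would give the stricter threshold used for breed-specific signatures.

```python
import math

# Minimal sketch (assumptions: window/step sizes in bp, >= 3 SNPs per window,
# linear-interpolation percentile, positions sorted on one chromosome).

def window_fst(positions, fst, window_bp, step_bp, min_snps=3):
    """Mean per-SNP FST in sliding windows: list of (start, end, mean)."""
    windows = []
    for start in range(positions[0], positions[-1] + 1, step_bp):
        end = start + window_bp
        vals = [f for p, f in zip(positions, fst) if start <= p < end]
        if len(vals) >= min_snps:
            windows.append((start, end, math.fsum(vals) / len(vals)))
    return windows


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def outlier_regions(windows, q=99.0):
    """Windows at or above the q-th percentile, with overlapping/adjacent ones merged."""
    threshold = percentile([w[2] for w in windows], q)
    regions = []
    for start, end, mean in sorted(windows):
        if mean < threshold:
            continue
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions
```

`math.fsum` is used so that windows containing the same FST values get exactly the same mean regardless of summation order, which keeps tied outlier windows from being split by rounding.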

A total of 19 genomic regions overlapped in five or more pig groups (breeds and/or wild boars) on chromosomes SSC1, SSC2, SSC6, SSC7, SSC8 and SSC13 (Table 3). In these regions, candidate genes related to reproduction (ADAD1, PRDM1, SPACA1 and SLCO4C1), lipid, carbohydrate and protein metabolism (PGD, UBE4B, RNF150, UBE2E1, CNR1, RBP7 and STARD4), growth and development (FER, IL2, IL15, IL21 and PRDM1), cellular homeostasis (ATG5), locomotor behavior (NOVA1, SOBP) and response to nutrients (BCKDHB) were identified. The region located on SSC8 (100.93–101.74) overlapped in seven breeds (Alentejana, Apulo Calabrese, Bísara, Cinta Senese, Gascon, Krškopolje, Moravka) and in Wild Boar, and contained genes involved in growth and reproduction. The groups with the highest number of shared regions (those overlapping in five or more groups) were Alentejana, Iberian and Wild Boar (12 regions), whereas the breed with the lowest number was Mora Romagnola, consistent with the specific genetic differentiation of this breed also highlighted by the PCA.
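Identifying regions shared by at least five groups amounts to a sweep over interval endpoints on each chromosome, keeping the stretches covered by at least the required number of groups. The sketch below assumes each group’s outlier regions on a chromosome are already merged (so no group is counted twice) and uses half-open bp intervals; `shared_segments` is a hypothetical name, not the authors’ code.

```python
# Minimal sketch (assumptions: half-open [start, end) intervals in bp on one
# chromosome; regions within each group already merged and non-overlapping).

def shared_segments(regions_by_group, min_groups):
    """Segments covered by outlier regions of at least `min_groups` groups."""
    events = []
    for regions in regions_by_group.values():
        for start, end in regions:
            events.append((start, 1))
            events.append((end, -1))
    events.sort()  # at equal positions, ends (-1) are processed before starts (+1)
    segments, count, prev = [], 0, None
    for pos, delta in events:
        if prev is not None and count >= min_groups and pos > prev:
            if segments and segments[-1][1] == prev:
                segments[-1] = (segments[-1][0], pos)  # extend a contiguous segment
            else:
                segments.append((prev, pos))
        count += delta
        prev = pos
    return segments


groups = {"A": [(100, 200)], "B": [(150, 250)], "C": [(180, 300)]}
print(shared_segments(groups, 3))  # [(180, 200)]
```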

Table 3 Genomic regions with outlier FST-windows shared among at least five breeds and genes annotated within these regions in Sscrofa11.1.

In addition, breed-specific signatures were identified as windows in the 99.9th percentile of the empirical distribution that were detected in only one breed (Supplementary Tables 48–69). A total of 115 breed-specific regions were detected in the 21 populations, with a mean of 5 ± 3 regions per breed. The number of specific signatures was quite variable among breeds: some showed a large number of specific regions, such as Turopolje (14 regions) or Mora Romagnola (12 regions), while others, such as Iberian or Sarda, had just one. Chromosomes 1 and 13 harbored the highest numbers of breed-specific regions (15 and 17, respectively). The higher differentiation observed in Turopolje and Mora Romagnola may suggest stronger genetic drift in those breeds due to their limited effective population size50. The identified regions contain interesting candidate genes involved in functions and pathways related to productive and behavioral traits. Although a detailed discussion of the genes identified in each region is beyond the scope of this work, some findings can be highlighted.

Several different olfactory receptor genes were detected within the potentially selected regions in Apulo Calabrese (SSC9), Gascon (SSC13), Iberian (SSC15), Lithuanian indigenous wattle (SSC4), Swallow-Bellied Mangalitsa (SSC2) and Mora Romagnola (SSC1). This agrees with the large repertoire of functional olfactory receptor genes in pigs, the fast evolution of these genes33,51 and their relevant role in smell and food finding, especially in these extensively reared breeds. The putative selection of different subfamilies in each breed could reflect specialization in the detection of specific odors characteristic of each breed’s environment, as there is wide functional diversity among olfactory receptor subfamilies. In fact, different gene clusters have been potentially associated with the recognition of specific odorants42. Moreover, the relevance of these genes extends beyond olfaction, as they are also implicated in reproductive and behavioral traits, which may influence fitness.

Candidate genes involved in relevant metabolic functions associated with the adipogenic phenotype of our breeds were also identified within the putatively selected regions. For instance, in the Basque breed, the genes ACOX1 and CPT1A, both involved in fatty acid metabolism, were identified. NFKBIA and PPARGC1B, which have both been associated with backfat thickness in pigs52,53, were detected in the Krškopolje and Cinta Senese breeds, respectively. DECR1, a positional candidate gene for the first fatness QTL detected in pigs on SSC454,55, was observed in a potentially selected region in the Gascon breed. DLK1, a gene with a fundamental role in muscle growth and fat deposition56, was detected in Majorcan Black. LPIN1, which has been associated with obese pig phenotypes57, was detected in the Turopolje breed. Taste receptor genes play a fundamental role in survival through the identification of dietary nutrients or potentially toxic substances; they are linked to eating behavior and adaptation to specific geographical locations and diets58 and are potentially related to growth and fat deposition10,59. Within this gene family, the TAS2R16 gene was detected in a selection signature in the Turopolje breed. Genes involved in the endocrine regulation of growth and insulin signalling were also observed: IRS1 in Casertana, GAL and GALR2 in Basque, and members of the insulin-like growth factor binding protein gene family in Black Slavonian. These genes encode signalling molecules that integrate and coordinate numerous key extracellular signals within the cell. Some of them are intermediates of insulin signalling, with a key role in growth, fatness and energy homeostasis60,61.

Numerous genes involved in proteolysis were also found, such as HECTD2 and IDE in Alentejana; USP54 in Casertana; CAPN10 and RNPEPL1 in Iberian; PSMA6 in Krškopolje; CTSV in Turopolje; and UBE4B, RNF150 and UBE2E1, detected simultaneously in five or more breeds (Table 3). Increased protein turnover has been proposed as a potential determinant of the limited muscle growth usually observed in local pigs2,62,63. Wilkinson et al.64 and Ai et al.65 reported a signal of diversifying selection in European pig breeds close to the EDNRB gene, which is implicated in coat color patterning in mammals. The same region was detected in the present work in the Apulo Calabrese breed. Nevertheless, no signals of selection were detected near the two main known coat color genes, KIT and MC1R, whose allelic variation is associated with many of the coat color variants in pigs66,67. This could be due to incomplete coverage or informativeness of the SNP chip in these particular regions.

Only three Wild Boar-specific selection signatures were detected. Two of them are located on SSC7, an autosome repeatedly associated with domestication- and behavior-related traits in QTL and GWAS studies68. Interestingly, the TECTB gene is potentially included in a selected genomic region in Wild Boar. This gene is expressed in the inner ear and plays a major role in hearing, which may be associated with survival in wild conditions. One of the regions detected on SSC7 (33.55–33.59 cM) is located close to the PPARD gene (31.22–31.29 cM), which is related to ear morphology, fat deposition and growth and was previously found in a genomic region differentiated between European breeds and Wild Boar64.

Previous works analyzing different commercial and traditional pig populations have identified a number of regions showing between-breed signatures of selection16,64. In these studies, as in the present one, genes mapped to these regions can be considered candidates under selection in pig breeds. Some common biological functions, such as olfaction, growth or muscle development, have been detected in different works69,70. Nevertheless, different works have reported variable regions and genes, probably owing to differences in the breeds analyzed, statistical methods, SNP density or sample size. In particular, domestic pigs raised under different evolutionary and production conditions show different selection signatures, and in our case all tested populations are locally produced breeds that have not been selected for lean meat content, muscularity or enhanced reproduction. Thus, in a differentiation analysis among these breeds, the expected signatures may be weaker than those observed in commercial, highly selected genotypes, or more related to domestication and the establishment of breed standards than to actual artificial selection.

Conclusions

These results characterize the genomic diversity of autochthonous European pig breeds. They highlight the genetic closeness among several of these domesticated breeds, and between them and their wild ancestor, as well as the clear differentiation of others, and confirm the need for conservation programs to protect their gene pools. Linkage disequilibrium patterns and extent were determined at the genome level for a wide repertoire of European traditional breeds, showing potential effects of admixture and inbreeding. Putative signals of selection were detected in regions containing genes involved in growth, muscle development, reproduction, metabolism, behavior and sensory perception. Our findings improve knowledge of the genome biology of European local pig breeds and provide candidate genes for relevant traits, as well as useful information for future conservation, association or selection approaches.

Methods

Animals and sampling

Experts from each country, including personnel from breeding organizations and herd books, selected the animals to be included in the analyses in order to obtain a representative sample of each breed. Highly related animals (full- or half-sibs) were avoided, sexes were balanced, and adult individuals, or at least animals with adult morphology, were prioritized. Blood samples were obtained from 39 to 54 individuals from each of the 20 local pig breeds included in the study: Black Slavonian and Turopolje (Croatia), Basque and Gascon (France), Schwäbisch-Hällisches Schwein (Germany), Apulo-Calabrese, Casertana, Cinta Senese, Mora Romagnola, Nero Siciliano and Sarda (Italy), Lithuanian indigenous wattle and Lithuanian White old type (Lithuania), Alentejana and Bísara (Portugal), Moravka and Swallow-Bellied Mangalitsa (Serbia), Krškopolje pig (Slovenia) and Iberian and Majorcan Black (Spain). In addition, seven European wild boars were used as an outgroup. Specialized professionals from each institution that provided animal material obtained blood samples at farm or at slaughter, following standard routine monitoring procedures and guidelines. No procedures requiring ethical protocols under Directive 2010/63/EU (2010) were performed on the animals: blood samples were obtained as part of general breeding procedures, or previously collected DNA was reused. A total of 992 DNA samples were genotyped.

Genomic DNA was extracted from leukocytes in 8–15 mL of peripheral blood collected in Vacutainer tubes containing 10% 0.5 M EDTA (ethylenediaminetetraacetic acid, disodium dihydrate salt) at pH 8.0. Extraction was performed using either standardized phenol-chloroform or high-salt methods, or a commercial kit71.

Genotyping

Samples were genotyped with the GeneSeek® GGP Porcine HD Genomic Profiler v1 (Illumina Inc., USA), which includes 68,516 evenly distributed SNPs with a median gap spacing of 25 kb and shares 42,372 markers with the Illumina PorcineSNP60 chip. The average genotyping call rate was 0.94.

Genotype quality control (QC) and data filtering were performed using PLINK72. In a preliminary filter, used to inspect the distribution of MAF across the genotyped SNPs, SNPs with MAF lower than 0.01 or more than 10% missing genotypes were excluded. In addition, the individual call rate threshold was set to 95%, and ten samples (two Bísara, two Casertana, one Cinta Senese, one Moravka, two Nero Siciliano and two Schwäbisch-Hällisches) were removed from further analyses.
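The SNP- and sample-level thresholds above can be illustrated with a minimal pure-Python sketch. It assumes genotypes coded as 0, 1 or 2 copies of one allele with None for missing calls, and it applies the SNP filters before computing individual call rates over the retained SNPs; that ordering and the function names are illustrative assumptions, not the exact PLINK72 procedure used in the study. In PLINK, equivalent thresholds are expressed with the --maf 0.01, --geno 0.1 and --mind 0.05 options.

```python
# Minimal QC sketch. Assumption: genotypes[i][j] is 0, 1 or 2 copies of one
# allele for individual i at SNP j, and None marks a missing call.

def snp_summary(genotypes, j):
    """Return (minor allele frequency, missing rate) for SNP j."""
    calls = [row[j] for row in genotypes if row[j] is not None]
    missing_rate = 1 - len(calls) / len(genotypes)
    if not calls:
        return 0.0, missing_rate
    p = sum(calls) / (2 * len(calls))  # frequency of the counted allele
    return min(p, 1 - p), missing_rate


def qc_filter(genotypes, maf_min=0.01, snp_missing_max=0.10, ind_call_min=0.95):
    """Return indices of retained individuals and SNPs.

    SNP filters (MAF, missingness) are applied first; individual call rates
    are then computed over the retained SNPs (an assumed ordering).
    """
    n_snps = len(genotypes[0])
    keep_snps = []
    for j in range(n_snps):
        maf, missing_rate = snp_summary(genotypes, j)
        if maf >= maf_min and missing_rate <= snp_missing_max:
            keep_snps.append(j)

    keep_inds = []
    for i, row in enumerate(genotypes):
        called = sum(row[j] is not None for j in keep_snps)
        if keep_snps and called / len(keep_snps) >= ind_call_min:
            keep_inds.append(i)
    return keep_inds, keep_snps
```

With these defaults, a sample called at 19 of 20 retained SNPs (call rate 0.95) is kept, whereas one called at 18 of 20 (0.90) is removed.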

Data analyses

A total of 51,219 SNPs mapped to the 18 autosomes of the Sus scrofa 11.1 assembly were used to compute the following genetic diversity indicators for the 20 studied breeds: observed (HO) and expected (HE) heterozygosity; the inbreeding coefficient of an individual (I) relative to the subpopulation (S) (FIS); the fixation index (FST); the inbreeding coefficient of an individual (I) relative to the total (T) population (FIT); and heterozygosity indices based on the observed heterozygosity of individuals within breeds (HI), on the expected heterozygosity within subpopulations (HS) and on the expected heterozygosity across all breeds (HT). Calculations were made with VCFtools software73. Nei's genetic distances12 were calculated in the R74 environment, and pairwise distances were used to build a neighbor-joining (NJ) tree with the nj function of the ape75 package in R. In addition, population structure was inspected through principal component analysis (PCA) performed with the DISSECT software tool76.
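The hierarchical F-statistics listed above are linked to the three heterozygosity indices by FIS = (HS − HI)/HS, FST = (HT − HS)/HT and FIT = (HT − HI)/HT, so that (1 − FIT) = (1 − FIS)(1 − FST). The sketch below computes them from per-breed allele frequencies and observed heterozygosities of biallelic SNPs, weighting breeds equally, and also computes Nei's standard genetic distance between two breeds. It is an illustration under these assumptions, not the VCFtools73 or R code used in the study, and which of Nei's distance measures was used is not stated in the text.

```python
import math

# Assumptions: biallelic SNPs; freqs[k][l] is the frequency of one allele in
# breed k at SNP l and obs_het[k][l] its observed heterozygosity; breeds are
# weighted equally and indices are averaged over loci before taking ratios.

def f_statistics(freqs, obs_het):
    n_pop, n_loci = len(freqs), len(freqs[0])
    h_i = h_s = h_t = 0.0
    for l in range(n_loci):
        ps = [freqs[k][l] for k in range(n_pop)]
        p_bar = sum(ps) / n_pop
        h_i += sum(obs_het[k][l] for k in range(n_pop)) / n_pop  # observed, within breeds
        h_s += sum(2 * p * (1 - p) for p in ps) / n_pop          # expected, within breeds
        h_t += 2 * p_bar * (1 - p_bar)                           # expected, pooled breeds
    h_i, h_s, h_t = h_i / n_loci, h_s / n_loci, h_t / n_loci
    # Ratios are undefined if every SNP is fixed (h_s or h_t equal to 0).
    return {
        "HI": h_i, "HS": h_s, "HT": h_t,
        "FIS": (h_s - h_i) / h_s,
        "FST": (h_t - h_s) / h_t,
        "FIT": (h_t - h_i) / h_t,
    }


def nei_standard_distance(px, py):
    """Nei's standard distance D = -ln(Jxy / sqrt(Jx * Jy)) for biallelic SNPs."""
    n = len(px)
    jx = sum(p * p + (1 - p) ** 2 for p in px) / n
    jy = sum(q * q + (1 - q) ** 2 for q in py) / n
    jxy = sum(p * q + (1 - p) * (1 - q) for p, q in zip(px, py)) / n
    return -math.log(jxy / math.sqrt(jx * jy))
```

For two breeds with allele frequencies 0.2 and 0.8 at a single SNP in Hardy-Weinberg proportions (observed heterozygosity 0.32 in each), HS = 0.32 and HT = 0.5, giving FIS = 0 and FST = FIT = 0.36.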

Linkage disequilibrium (LD) analyses

Markers with MAF lower than 0.05, more than 10% missing values or significant deviation from Hardy-Weinberg equilibrium (P < 8.27 × 10−7), and markers that were unmapped or mapped to sex chromosomes, were excluded from the LD analysis. This QC filtering was carried out independently for each breed, and the number of SNPs used is shown in Supplementary Table 3. The LD coefficient r2 was estimated with PLINK for all pairs of markers less than 50 Mb apart, independently for each population and autosome. To plot LD against physical distance, following Saura et al.4, we divided SNP pairs into three distance classes: (a) 0 to 2 Mb, (b) 2 to 5 Mb and (c) 5 to 50 Mb. Bins of 0.05, 0.20 and 5.0 Mb were used for classes (a), (b) and (c), respectively, and the average r2 of each bin was plotted against physical distance. Sample sizes were similar for all breeds except Wild Boar. Because sample size can influence LD estimation, r2 in Wild Boar was corrected for sample size as (r2 − 1/N)/(1 − 1/N), where N is the number of sampled individuals77.
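The binning scheme can be sketched as follows, taking pairwise (distance, r2) values such as those produced by PLINK as input. The class limits and bin widths follow the text; the left-closed bins, the function names, and the form of the sample-size correction, (r2 − 1/N)/(1 − 1/N), are assumptions of this illustration.

```python
from collections import defaultdict

# (start, end, bin width) in Mb for distance classes (a), (b) and (c).
DISTANCE_CLASSES = [(0.0, 2.0, 0.05), (2.0, 5.0, 0.20), (5.0, 50.0, 5.0)]


def corrected_r2(r2, n):
    """Assumed sample-size correction (r2 - 1/N) / (1 - 1/N); N = sampled individuals."""
    return (r2 - 1.0 / n) / (1.0 - 1.0 / n)


def mean_r2_by_bin(pairs):
    """pairs: iterable of (distance_mb, r2). Returns sorted (bin_start_mb, mean_r2).

    Bins are left-closed; pairs at 50 Mb or beyond are ignored.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for dist, r2 in pairs:
        for start, end, width in DISTANCE_CLASSES:
            if start <= dist < end:
                # Round the key to avoid floating-point noise in bin labels.
                key = round(start + int((dist - start) // width) * width, 6)
                sums[key] += r2
                counts[key] += 1
                break
    return sorted((k, sums[k] / counts[k]) for k in sums)
```

For the Wild Boar sample (N = 7), corrected_r2 maps an observed r2 of 1/7 to 0 and leaves an r2 of 1 unchanged.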

Effective population size across generations

Estimates of effective population size (Ne) for each population were computed using the relationship between LD and Ne according to the following equation39,78:

$${r}^{2}={(\alpha +4{N}_{e}c)}^{-1}+1/N$$

where c is the genetic distance between SNPs (in Morgans), Ne is the effective population size, α is set to 1 under the assumption of no mutation (\(\alpha =1\)), and N is the number of diploid individuals sampled. r2 estimates were computed between all SNP pairs within 50 Mb windows on each chromosome. Physical distances were converted to genetic distances in Morgans using the chromosome-specific recombination rates estimated by Muñoz et al.79. To estimate Ne at a given generation, r2 between SNP pairs at the specific genetic distance c corresponding to t = 1/(2c) generations ago80 was considered. Finally, Ne at each generation was estimated by non-linear least squares based on the equation above. Ne estimates were calculated back over 50 generations using the same set of genotypes as in the linkage disequilibrium analyses (Supplementary Table 3).
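The LD–Ne relationship can be inverted to give a point estimate of Ne for a single distance bin and its corresponding generation t = 1/(2c). The sketch below does this in closed form, bin by bin; the study instead fitted Ne by non-linear least squares, so this is a simplified illustration. The recombination rate passed to mb_to_morgans is a hypothetical input standing in for the chromosome-specific rates of Muñoz et al.79.

```python
# Sketch of r2 = 1 / (alpha + 4*Ne*c) + 1/N.
# Assumptions: alpha = 1 (no mutation), c in Morgans, N = diploid sample size.

def expected_r2(ne, c, n_sampled, alpha=1.0):
    """Expected r2 at genetic distance c for effective size ne."""
    return 1.0 / (alpha + 4.0 * ne * c) + 1.0 / n_sampled


def ne_from_r2(mean_r2, c, n_sampled, alpha=1.0):
    """Closed-form inversion of expected_r2 for one distance bin."""
    excess = mean_r2 - 1.0 / n_sampled
    if excess <= 0:
        raise ValueError("mean r2 does not exceed the 1/N sampling term")
    return (1.0 / excess - alpha) / (4.0 * c)


def generations_ago(c):
    """Generation t = 1 / (2c) to which LD at distance c mainly refers."""
    return 1.0 / (2.0 * c)


def mb_to_morgans(distance_mb, rate_cm_per_mb):
    """Convert physical to genetic distance with a hypothetical per-chromosome rate."""
    return distance_mb * rate_cm_per_mb / 100.0
```

For instance, with N = 50 sampled animals and SNPs 0.01 Morgans apart (2 Mb at an assumed 0.5 cM/Mb), a mean r2 of 0.22 corresponds to Ne = 100 about 50 generations ago.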

FST analyses

Hardy-Weinberg equilibrium was inspected individually in each breed, and SNPs significantly deviating from it (P < 8.27 × 10−7) in at least one of the studied breeds were removed. A total of 828 SNPs were thus removed from the subset of 51,219 SNPs used to compute genetic diversity parameters. Pairwise Wright's FST81 values were estimated as a measure of genetic differentiation according to the method described in Wilkinson et al.64. For each breed, a total of 21 breed-pairwise comparisons at each SNP were obtained and averaged to give an overall FST for each SNP per breed. FST values were averaged in sliding windows of 13 SNPs centred on the 7th SNP, whose position determined the genomic location of the window. Windows above the 99th percentile of the empirical distribution of windows per breed were defined as candidate regions for genetic differentiation. Genes were annotated with the BioMart tool82.
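The window-based scan can be sketched as follows. The per-SNP pairwise FST uses the simple heterozygosity-based form (HT − HS)/HT for two populations, which is an assumed stand-in and not necessarily the estimator of Wilkinson et al.64; the window size (13 SNPs, centred on the 7th) and the 99th-percentile cut-off follow the text. The functions should be applied to SNPs ordered along a single chromosome so that windows do not span chromosome boundaries.

```python
import statistics

# Assumptions: freqs[k][j] is the frequency of one allele in population k at
# SNP j, with SNPs ordered by position along a single chromosome.

def pairwise_fst(p1, p2):
    """Per-SNP FST between two populations as (HT - HS) / HT (assumed estimator)."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def mean_fst_per_snp(freqs, focal):
    """Average the focal population's pairwise FST against every other population."""
    others = [k for k in range(len(freqs)) if k != focal]
    return [
        sum(pairwise_fst(freqs[focal][j], freqs[k][j]) for k in others) / len(others)
        for j in range(len(freqs[focal]))
    ]


def sliding_windows(values, size=13):
    """Mean over windows of `size` consecutive SNPs, keyed by the central SNP index."""
    half = size // 2  # with size=13 the centre is the 7th SNP of the window
    return {
        start + half: sum(values[start:start + size]) / size
        for start in range(len(values) - size + 1)
    }


def top_windows(windows, percentile=99):
    """Centres of windows whose mean FST exceeds the given empirical percentile."""
    cut_points = statistics.quantiles(list(windows.values()), n=100)
    threshold = cut_points[percentile - 1]
    return sorted(c for c, v in windows.items() if v > threshold)
```

The returned window centres would then be mapped back to genomic coordinates before gene annotation.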