A comparative analysis of SLA-DRB1 genetic diversity in Colombian (creoles and commercial line) and worldwide swine populations

Analysing pig class II mayor histocompatibility complex (MHC) molecules is mainly related to antigen presentation. Identifying frequently-occurring alleles in pig populations is an important aspect to be considered when developing peptide-based vaccines. Colombian creole pig populations have had to adapt to local conditions since entering Colombia; a recent census has shown low amounts of pigs which is why they are considered protected by the Colombian government. Commercial hybrids are more attractive regarding production. This research has been aimed at describing the allele distribution of Colombian pigs from diverse genetic backgrounds and comparing Colombian SLA-DRB1 locus diversity to that of internationally reported populations. Twenty SLA-DRB1 alleles were identified in the six populations analysed here using sequence-based typing. The amount of alleles ranged from six (Manta and Casco Mula) to nine (San Pedreño). Only one allele (01:02) having > 5% frequency was shared by all three commercial line populations. Allele 02:01:01 was shared by five populations (around > 5% frequency). Global FST indicated that pig populations were clearly structured, as 20.6% of total allele frequency variation was explained by differences between populations (FST = 0.206). This study’s results confirmed that the greatest diversity occurred in wild boars, thereby contrasting with low diversity in domestic pig populations.

Studies have evaluated MHC-Class II haplotype relationship with such parameters as production 20 , reproduction 21 and/or genetic characterisation 8,13,22 . Several studies in the field of biomedical research have concluded that SLA-DRB1 is an xenoantigen 5 ; some studies have even been aimed at determining which pig alleles have the greatest reactivity 23 whilst others have dealt with xenotransplantation 5,24,25 . MHC gene variability could be affected by natural and/or artificial selection 26,27 ; regarding diversity, allele determination in a target population is thus relevant when designing epitope-based vaccines where it is recommended that the most prevalent pig population alleles (super-types) are identified so that such vaccines are designed to serve the majority 28,29 .
Local pigs are used in various traditional production systems worldwide, including native pigs in Asia, Africa and the Mediterranean (Iberian pigs); they are called creole pigs in Latin American countries 30 . The different breeds used in the modern pig industry have a greater impact on the rural economy because they are considered a low-cost protein source 31,32 . The appearance of the domestic pig in America arose from the introduction of Iberian or hairless pigs during the time of the Conquest; they have become adapted to local conditions over a period lasting more than 500 years, having some special phenotypical characteristics 31,33 . Three breeds are currently located in specific areas of Colombia. The nucleus of the Casco Mula breed is on Colombia's Eastern Plains (Llanos Orientales) and the foothills of the Eastern Plains region (Piedemonte Llanero), the San Pedreño breed is in the Antioquia and Viejo Caldas departments and the Zungo breed is found on the Atlantic coast (mainly in the Córdoba department). The latter region is considered the place where the first specimens arrived in Colombia 34,35 .
Phenotypically, the San Pedreño breed is black, has a small head, a short snout and straight/upright ears; the Casco Mula breed's name arises from syndactyly, as the pigs are multi-coloured, have red and sometimes black hair, a medium-sized snout, concave head forward-sloping, large ears and strong, short legs. The Zungo breed is characterised by being black and hairless, large-sized and having floppy ears; it has been reported that pigs from the Caribbean region have been crossed with pigs from the Duroc, Poland China and Hampshire breeds 31 .
These breeds' living nuclei are kept in germplasm banks in the Colombian Agricultural Research Corporation's (AGROSAVIA) research centres because they have a nucleus consisting of less than 1000 individuals 35 . According to previous reports these breeds are characterised by being able to produce meat and reproduce in difficult environmental situations 32 . Several studies have used genetic markers for better characterising breeds regarding their production potential [32][33][34]36 ; a germplasm bank-based study of the San Pedreño breed recently characterised its genetic structure 35 .
The creole pig's adaptation to varying environmental conditions has promoted the development of their productive and reproductive abilities. Production-related genetic improvement programmes in Colombia have enabled the development of commercial hybrids whose production-related qualities abilities have favoured their widespread distribution in Colombia (www.solla .com) 32,37 . This research was thus aimed at genotyping the SLA-DRB1 locus in pig populations from Colombian creole breeds and commercial hybrids and comparing the diversity amongst Colombian populations and to that of some previously reported international ones.
The amount of recombination events for creole pigs (3 events) was slightly higher than that for the production animals (2 events). No recombination events were found regarding Casco Mula; the recombinant SLA-DRB1*09:02 allele was found in both populations (Fig. S2). This indicated a less divergent gene pool for the Colombian populations. When only considering PBR codons, Sd ranged from 0.70 (Remanso) to 2.79 (Micromini) and N d from 7.0 (Duroc) to 11.11 (KNP); Colombian populations, Yorkshire_E, SNU and Duroc also had low levels of diversity. As expected, a large percentage (54% www.nature.com/scientificreports/ to 82%) of N d in the β1 domain was due to PBR positions, confirming these positions being the most variable ones (Table 3).
A clustering method was used for assigning individuals to populations according to their genotypes. Based on ΔK distribution, an inferred K value of 4 provided the best fit to the data and thus all individuals could be grouped in 4 populations (Fig. S5). Interestingly, KNP and SNU were well-defined populations having fewer admixed genotypes. Colombian populations could be separated into two groups. Creole pigs were admixed with SNU, agreeing with dendrogram and MDS ordination plots. Remanso and Manta genotypes were admixed with KNP, agreeing with first MSD coordinate (Fig. 5).

Discussion
This study was aimed at comparing SLA-DRB1 (Exon 2) genetic diversity for the first time in two Colombian pig populations. It was seen that the amount of alleles was similar in each population; the Casco Mula breed had the least amount of alleles. Individuals from the San Pedreño breed had the most alleles and individuals from the Zungo breed the highest nucleotide variability (Table 1). Such data coincided with that reported by other groups 26,38 , though differing from that reported by some authors 8,26 ; for example, twenty alleles have been identified in a population of 133 wild boars in Croatia 8 . Sequence diversity analysis showed that most polymorphisms in the β1 domain were non-synonymous, Colombian pig populations having lower values in this region. However, synonymous and non-synonymous polymorphism distribution in the PBR was more uniform, despite  www.nature.com/scientificreports/ the Colombian populations having less diversity. An excess of non-synonymous polymorphisms was evident in the PBR; this was to be expected due to this region's role in antigen presentation 11,12,26 . This study's results confirmed greater diversity in wild boars, contrasting with low diversity in domestic pig populations. The clearest differences were observed regarding allele richness which was especially low in some populations (R s < 5 in the Casco Mula, KNP, Duroc and Landrace_C populations). However, some populations having low R s had high h e (corrected for sample size), indicating a marked variation regarding allele frequency distribution.
Considering that genetic drift and the structure of animal crosses are factors affecting allele frequency distribution (having greater intensity in populations having smaller effective population size) [39][40][41][42][43] , then variations regarding expected heterozygosity and allele richness could have been due to the combined effects of small effective population sizes and varying degrees of endogamy. The latter factor would have mainly been due to the  www.nature.com/scientificreports/ artificial selection of production features. For example, selection based on the Casco Mula breed's characteristic feature (hoof) and its small effective population size 35 could lead to intensive in-breeding which would then produce differences regarding allele frequency and could eventually lead to a loss of some alleles. This idea could also be supported by the high F IS values for some of these populations, indicating higher homozygote percentages than those to be expected in a population in equilibrium and could suggest higher endogamy levels. Alleles DRB1*01:02 and *10:01 were previously considered ancestral due to their presence in populations having different origins 26 . It was seen in this study that alleles DRB1*01:01, *01:02, *02:01:01, *09:01:01 and *10:01:01 were broadly distributed, often having high frequency in various populations. As the pigs' domestication seems  www.nature.com/scientificreports/ to have been occurred during six independent events 44 , these alleles' distribution could either indicate their presence in the founding populations in only some domestication events or their presence in the founding populations in all domestication events, accompanied by the subsequent loss of some alleles due to drift and/ or natural selection. Allele DRB1*02:01:01 may thus have been present in pig populations originating the creole and commercial line populations but which could have disappeared due to genetic drift (or its frequency could have become reduced and not been sampled) in the Manta population. In fact, several studies have established that mutationally robust populations might have less mutational load; however, it is not clear whether small populations have less susceptibility to drift-related fitness reduction 45 .
The creole pig group's results particularly seem to have coincided with previously reported data regarding individuals from the San Pedreño breed 35 , confirmed by analysing pedigree data from information collected from 2008 to 2017 for estimating the generation interval, the level of pedigree integrity, consanguinity and evolution. It was particularly interesting that this group had 6.73% consanguinity 35 . There have been no reports concerning this for Casco Mula pigs to date.
Considering the group's experience 46 , it was decided to use bioinformatics methods, involving Haplofinder software, for assigning alleles in heterozygous individuals; this tool has been used regarding bovine species for DR 47,48 and DQA 49 , thereby facilitating assigning a large amount of individuals, resorting to cloning in just 16% of the population. The data obtained from analysing variability between populations revealed a low level of heterozygotes, this being more evident in individuals from the Casco Mula breed and those from the Manta and Remanso groups.
The recombination events observed in this study coincided with that reported by other authors 8,50 and have been considered a molecular mechanism associated with MHC genetic diversity; such phenomenon has been observed in ungulate 51 and passerine species 52 .
Studies in Poland regarding commercial line pigs and native individuals have analysed runs of homozygosity (ROH), identifying genome regions having high ROH levels, for example, SSC4 (51.9-55.9 Mb) which is related to various functions, including immunological ones 53 . Detecting high levels of homozygosity in various regions of the genome of inbred Babraham pigs led to developing specific MHC-related analysis, revealing a high level of homozygosity, thereby making it a candidate for research group use as a biomodel 54 .
The pig populations had a clear genetic structure based on the SLA-DRB1 locus. Interestingly, the low genetic differentiation in Colombian creole populations contrasted sharply with the great differentiation between commercial line populations. This could have arisen from crossing individuals for obtaining such lines. Commercial lines' impact on Landrace pigs' genetic diversity has been evaluated recently; the current breed provides evidence of the old lines' formation, but that such fusion has reduced Landrace genetic diversity 55 . Various clusters were identified based on a small genetic distance and/or high bootstrap support values. This indicated that the populations forming these clusters have high genetic identity and detailed analysis has revealed that they share a large percentage of their mean absolute allele frequency and the alleles so involved. Such information could be used in infectious disease control programmes or when designing vaccines [56][57][58] . The SLA-DRB1*02:01:01 allele has 79% amino acid identity with HLA-DRB1*01:01 in pigs and has been used for making a bioinformatics predictor based on pocket profile methodology 28 , highlighting the need for characterising SLA diversity when designing vaccination and control programmes.

Materials and methods
Animals and DNA isolation. The study involved sampling 188 animals from six populations, thereby identifying a first group called the creole pig group consisting of individuals from the Zungo (n = 26), Casco de Mula (n = 30) and San Pedreño populations (n = 35). All blood samples from this group were provided by AGROSAVIA's research centres. A second group called the commercial line consisted of Super Mom 52 and PIC commercial hybrid populations; blood samples were taken on three pig farms in the Cundinamarca department, identified as follows: Remanso U.D.C.A (n = 31), Manta (n = 31) and Ubaque (n = 35) (Fig. 1). All the pigs were randomly sampled; variables such as age, gender, production stage or relationship were not considered.
Venous blood was obtained by jugular vein puncture and kept in tubes with anticoagulant (EDTA). Genomic material was extracted following Wizard Genomic DNA Purification Kit recommendations. All samples' DNA integrity was determined on 1.5% agarose gel; a µDrop microplate reader was used for determining DNA concentration. The samples were then stored at − 20 °C until PCR. The sample collection procedure was approved by the Universidad de Ciencias Aplicadas y Ambientales' (U.D.C.A) bioethics committee (in minutes No.002/2019). All procedures followed the guidelines and regulations established in the Colombian code of bioethics (resolution 8430 of 1993) and Law 84/1989 regarding the protection of animals.
Genotypical data regarding the populations reported in the pertinent literature was included for carrying out the comparative analysis of diversity between Colombian and international populations, such as Wild boar 8  www.nature.com/scientificreports/ by a final extension step at 72° for 5 min. All PCR products were amplified in 2 independent reactions, revealed on 2% agarose gel and then purified and sent to Macrogen Inc . (South Korea) to be sequenced in both senses. CLC Main Workbench v.3.6.5 software (QIAGEN, Aarhus, Denmark; https ://digit alins ights .qiage n.com) was used for assembling and analysing the sequences. Heterozygotic individuals' sequences were manually edited, IUPAC/IUBMB single-letter code was used for ambiguous pairs 60 and Haplofinder Python script was used for assigning them [47][48][49] . This involved creating a library with the exon 2 sequences from the alleles reported for SLA-DRB1 in IPD-MHC (www.ebi.ac.uk/ipd/mhc). pGEM-T Easy vector (Promega) was used for ligating heterozygotic samples which could not be assigned by bioinformatics methods which were then transformed in competent E. coli JM109 cells to separate both alleles, following the protocol reported by FIDIC 61 . Selection involved blue/white colony screening; the amount of positive colonies was replicated and scaled for plasmid purification; 5-10 positive clones were obtained per sample, confirmed on 1.5% agarose gel and sent to Macrogen again to be sequenced in both senses. The complete genotyping results are shown in Table S6.
Genetic diversity indices, recombination events and phylogenetic analysis. The MUSCLE sequence alignment tool (https ://www.ebi.ac.uk/Tools /msa/muscl e/) was used for aligning the sequences; they were then evaluated for identifying indels (insertion and elimination events) which could have affected the length of any sequence to be analysed. Minor adjustments were then made to the alignments to eliminate gaps. DnaSP (v5) software (http://www.ub.edu/dnasp /) was used for evaluating each population's genetic diversity, nucleotide diversity (π), theta (per site) from eta (Θ), the amount of segregating sites (S), the amount of haplotypes (h) and haplotype diversity (Hd) along with their standard deviations (SD).
Maximum likelihood trees with double precision, considering Jukes-Cantor as a substitution model were built based on the alignments using FastTree Version 2.1.9 (http://www.micro beson line.org/fastt ree/) for phylogenetic reconstruction; node robustness was evaluated by bootstrap method with 1000 replicates 62 . The phylogenetic trees were visualised in the web tool Interactive Tree Of Life V3 (http://itol.embl.de) 63 ; the Recombination Detection Program (version 4) (RDP4) was also used. In-depth statistical analysis included using RDP, GENECONV (for detecting gene conversion), BootScan (screening nucleotide sequence alignments for evidence of recombination ), Maximum Chi-Squared test (MaxiChi) (for identifying potential recombination events between two sequences or between two sequences and a putative derived sequence), Chimaera (for identifying recombinant and parental sequences and estimate breakpoint positions), Sister Scanning (SiScan) (for assessing phylogenetic and compositional signals in various patterns of identity), 3Seq (for ascertaining whether any sequence in a data set is a recombinant or mosaic), VisRD (for visual recombination detection in a sequence alignment) and BURT (for testing significance in factor analysis/analysis of variance) methods 64 . Measures of genetic diversity and Hardy-Weinberg equilibrium. The maximum likelihood method was used for obtaining the amount of alleles (Na) and allele frequencies by direct counting, along with their standard errors 65 . FSTAT software was used for calculating allele richness 66 . Observed heterozygosity (ho) and unbiased expected heterozygosity (he) according to Nei 67 were estimated using Arlequin v.3.5 software for population genetic analysis 68 . Potential Hardy-Weinberg variation percentages were estimated by F IS statistic test 69 included in Genepop v.4.7.0, using the exact test of significance 70 . Molecular Evolutionary Genetics Analysis (MEGA X) 71 software was used for calculating the average amount of nonsynonymous (N d ) and synonymous (S d ) polymorphisms and their standard errors (by bootstrapping with 10,000 replicates) for all sequence pairs within populations. N d and S d were not normalised for sequence length to attain the percentage of polymorphisms in the PBR regarding the whole β1 domain encoding sequence.
Population structure and genetic differentiation. Arlequin software v.3.5.2 analysis of molecular variance (AMOVA) 68,72 was used for assessing variation amongst populations. POPTREE2 software 73 was used for estimating D A genetic distances between populations from allele frequencies 74 . R packages ape 75 and poppr 76 were used to estimate Nei standard genetic distances between populations from amino acid sequences 77 . R packages ape, poppr and stats were used for inferring dendrograms with NJ and BIONJ algorithms. Confidence values for nodes in the dendrogram were obtained by bootstrapping alleles with 1,000 replicates and the trees were constructed using webserver Phylogeny.fr 78 . Multidimensional Scaling (MDS) analysis based on D A genetic distances involved using the R cmdscale function. R squared was used for evaluating the representation of the trees' original distances and MDS and correlation analysis was used for evaluating sample size effects on h e and h o . The Bayesian iterative algorithm in STRU CTU RE multilocus genotype data for assigning individuals to a particular population 79 was used for further evaluation of population structure, inferring genetic clusters in which individuals could be assigned. Structure harvester 80 a programme for parsing Pritchard's STRU CTU RE output and performing the Evanno method 81 was used for determining the most likely K by ΔK distribution. Ten replicates were run for each K using the admixture model with correlated allele frequencies and Markov Chain Monte Carlo 50,000 burn-in steps, followed by 100,000 iterations.