Introduction

Major histocompatibility complex class I (MHCI) proteins are expressed in nearly all nucleated somatic cells (Bjorkman and Parham 1990), where they present self-derived and foreign cytosolic peptides to receptors on CD8+ cytotoxic T lymphocytes (Cresswell 2005). MHCI proteins thus play crucial roles in the vertebrate adaptive immune system, providing defense against various pathogens, and are also implicated in autoimmunity. The MHCI protein complex comprises two non-covalently associated components: the MHCI heavy chain, which is encoded in the MHC region on chromosome 6 in humans and on chromosome 12 or 18 in dogs (Xiao et al. 2016), and a non-glycosylated light chain, β2-microglobulin, a non-MHC protein about 100 amino acids long (see York and Rock 1996). The MHCI glycoprotein (heavy chain) comprises a cytoplasmic tail, a transmembrane domain, and extracellular α1, α2, and α3 domains (Bjorkman and Parham 1990), encoded by exons 2, 3, and 4, respectively. The MHCI α1 and α2 domains are usually extensively polymorphic at both the nucleotide and amino-acid levels, especially at the amino-acid residues that form antigen-binding sites (ABS; Hughes and Nei 1988), where MHCI proteins bind intracellular peptides. Individuals and/or populations that are highly polymorphic in this region are more likely to respond effectively to novel pathogens (Hughes and Nei 1992). The genetic diversity of MHC genes therefore serves as an indicator of the immunological fitness of a population and its ability to respond to disease (Sommer 2005). MHC also appears to be important in sexual selection (Sin et al. 2015; Gessner et al. 2017), kin recognition (Zelano and Edwards 2002), individual odors (Brown et al. 1989), and pregnancy outcomes (Kydd et al. 2016).

The diversity of MHC genes, particularly within exon 2 of MHC class II and exons 2 and 3 of MHCI, is presumably maintained by positive selection (see Akiyama et al. 2017; Saka et al. 2017), which fixes and accumulates beneficial point mutations through host–pathogen interactions. MHC polymorphism is also retained by balancing selection (Nishita et al. 2015, 2017), through overdominance and frequency-dependent selection, which promotes high allelic diversity within species and maintains ancestral allelic diversity over long periods. Some mammalian MHC allelic lineages have persisted for more than a million years, even across speciation events (Figueroa et al. 1988; Abduriyim et al. 2017). This trans-species polymorphism (TSP) can be detected in phylogenetic reconstructions (Nishita et al. 2015, 2017; Abduriyim et al. 2017; Akiyama et al. 2017; Maibach et al. 2017; Saka et al. 2017), in which some alleles from one species are more closely related (sometimes even identical) to alleles from a different species than to other alleles from the same species (Klein 1987; Klein et al. 2007). Phylogenetic analyses of MHCI genes can also reveal orthologous relationships, in which sequences group by gene or locus rather than by species (Gu and Nei 1999; Nei and Rooney 2005; Cao et al. 2015). Such orthologous relationships remain evident among closely related species (Nei and Rooney 2005; Cao et al. 2015), because the genes diverged from their common ancestor through speciation.

Eurasian badgers (genus Meles) comprise four distinct species, the European badger (M. meles), Southwest Asian badger (M. canescens), Asian badger (M. leucurus), and Japanese badger (M. anakuma) (Marmi et al. 2006; Sato 2016), which diverged from a common ancestor more than 2 million years ago (Mya) (Marmi et al. 2006; Del Cerro et al. 2010); the most recent divergence, of the Japanese badger from the continental Asian badger, occurred between 0.21 and 1.09 Mya (Marmi et al. 2006). These species differ in geographic variation, size, coat color and other morphological characteristics (Corbet 1978; Abramov 2003; Abramov and Puzachenko 2013; Sato 2016), and dietary ecology (Kaneko et al. 2006; Li et al. 2013). They also differ in demographic history, which might have affected their MHC diversity: the Japanese badger shows low genetic diversity and possibly the most recent population growth (Kurose et al. 2001; Tashima et al. 2011), whereas European and Southwest Asian badgers underwent a sudden population expansion during the Middle Pleistocene (Marmi et al. 2006). These two more diverse species have effective numbers of females six times greater than that of the Asian badger (Marmi et al. 2006). All four Meles species appear to have evolved under different pathogen pressures, and some pathogens are species- or region-specific (Hancox 1980; Harasawa et al. 2014; Moreno et al. 2015; Hornok et al. 2017). Relatedly, MHCI genes in the European badger have been associated with the prevalence and/or intensity of certain pathogens (Sin et al. 2014). Moreover, these species share several identical MHC class II DRB alleles, showing TSP at different taxonomic levels, and some of the alleles identified were novel to particular species and populations (Abduriyim et al. 2017).

Comparative analyses have shown that the evolution of MHCI differs fundamentally from that of class II. In some primate species, for example, MHC class II allelic lineages were shared across species, but no such trans-species sharing of allelic lineages was seen at the MHCI loci (Boyson et al. 1996), mostly because MHC class I evolves faster than class II (Takahashi et al. 2000; Piontkivska and Nei 2003). Together, these observations make a compelling case for analyzing the variation and evolution of MHCI genes in Eurasian badgers, which would also improve our understanding of overall MHC diversity in these Meles species.

Assortative mating preference based on MHC class II, but not class I, genes has been demonstrated in M. meles (Sin et al. 2015), although both MHC class I and class II genes have been associated with the prevalence and/or intensity of certain pathogens (Sin et al. 2014). The allelic diversity of MHC class II DRB genes differs between UK (Sin et al. 2012a) and continental populations of M. meles: 16 novel alleles were found in the continental population of M. meles, and the other three Meles species also showed higher genetic diversity (Abduriyim et al. 2017). Sin et al. (2012b) characterized MHCI genes in M. meles and found signs that the MHCI protein domains had separate evolutionary histories; however, they studied an insular UK population, and their phylogenetic analyses were biased toward distantly related taxa. Analyses of MHCI genes in closely related species are therefore needed to test their findings.

In this study, we characterized the highly variable MHCI exons 2 and 3 and the intervening intron 2 in four Meles species to compare MHCI gene content and diversity. Specifically, we addressed whether there are signatures of positive selection and recombination in Meles MHCI alleles; how the different alleles are related among Meles species and to alleles in other carnivorans; and whether the Meles MHCI α1 and α2 domains evolved differently. Detailed comparative analyses of MHCI gene diversity among the closely related Meles species can provide insights into the evolutionary mechanisms driving variation in the immune system.

Materials and methods

Samples and DNA extraction

We used 25 samples from Eurasian badgers: 9 from M. meles, 8 from M. leucurus, 6 from M. anakuma, and 2 from M. canescens. Sampling localities and sample profiles are shown in Fig. 1 and Supplementary Table S1. We could obtain only two samples of M. canescens, which may not fully represent the MHC gene characteristics of this species. Total DNA was extracted using the DNeasy Blood and Tissue Kit (Qiagen) according to the manufacturer’s protocol and stored in TE buffer at 4 °C until use.

Fig. 1

Map of Eurasia indicating locations where Meles samples analyzed in this study were obtained. Symbols: squares, M. meles; hexagons, M. canescens; circles, M. leucurus; stars, M. anakuma. Sample details are shown in Supplementary Table S1

PCR amplification

Part of the MHCI gene spanning exons 2 and 3, including the intervening intron 2, was amplified by PCR using the primers Meme-MHCIex2F and Meme-MHCIex3R (Sin et al. 2012a). PCR amplifications were conducted in 25 µL volumes, each reaction containing 7.5 pmol of phosphorylated primer, 125 µM of dNTP, 1× buffer (Mg²⁺), 0.62 U of PrimeSTAR GXL DNA polymerase (Takara, Japan), and 10–100 ng of genomic DNA. Cycling conditions in a Takara Dice Touch Thermal Cycler were 2 min at 94 °C; 27–30 cycles of 10 s at 98 °C, 15 s at 60 °C, and 30 s at 68 °C; and a final hold at 4 °C. The amplicons were then separated by electrophoresis on 2% agarose gels and visualized by ethidium bromide staining to confirm their size and quantity. Target amplicons were purified using the QIAquick PCR Purification Kit (Qiagen).
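As a quick arithmetic check on the reaction set-up, the per-reaction amounts above can be converted into final concentrations in the 25 µL volume. This is a minimal sketch with hypothetical helper names; it assumes that the 7.5 pmol figure refers to each primer, which the protocol does not state explicitly.

```python
# Convert per-reaction amounts from the PCR protocol into final concentrations.
# Assumption: "7.5 pmol of phosphorylated primer" is read as 7.5 pmol of EACH primer.

REACTION_VOLUME_UL = 25.0  # total reaction volume (µL)


def final_conc_um(amount_pmol: float, volume_ul: float = REACTION_VOLUME_UL) -> float:
    """Return the final concentration in µM; 1 pmol per 1 µL equals 1 µM."""
    return amount_pmol / volume_ul


def per_ul(quantity: float, volume_ul: float = REACTION_VOLUME_UL) -> float:
    """Return any quantity (units of enzyme, ng of DNA) per µL of reaction."""
    return quantity / volume_ul


primer_um = final_conc_um(7.5)                      # each primer
polymerase_u_per_ul = per_ul(0.62)                  # PrimeSTAR GXL DNA polymerase
template_ng_per_ul = (per_ul(10.0), per_ul(100.0))  # 10–100 ng genomic DNA

print(f"primer: {primer_um:.2f} µM")
print(f"polymerase: {polymerase_u_per_ul:.4f} U/µL")
print(f"template: {template_ng_per_ul[0]:.1f}–{template_ng_per_ul[1]:.1f} ng/µL")
```

With these inputs the script reports 0.30 µM per primer, about 0.025 U/µL of polymerase, and 0.4–4.0 ng/µL of template DNA.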

Obtaining and processing sequence data

PCR amplicons were cloned to separate individual sequences. Cloning and sequencing were done as described by Abduriyim et al. (2017). Following blue/white screening, 20–31 positive clones per individual were sequenced using M13 forward and reverse primers. To identify bona fide MHCI sequences, we followed the criteria used for dog DLA (Kennedy et al. 2000): identical sequences derived from at least two badger individuals, or from two separate PCRs of the same individual, were accepted as bona fide, whereas singleton sequences were discarded as possible chimeras. After the forward and reverse reads of each sequence were validated, vector and primer sequences were removed. The target sequences were aligned and the consensus sequences checked using MEGA 6.0 (Tamura et al. 2013) and/or SeqMan (Swindell and Plasterer 1997) in the DNASTAR Lasergene package. DnaSP v. 5 (Librado and Rozas 2009) was used to identify identical sequences among all sequences obtained from each badger species. To detect identical or homologous sequences in public databases, all unique MHCI sequences obtained from the four badger species were used as queries in NCBI BLAST-nuc searches (Altschul et al. 1990). Verified MHCI sequences were named according to the nomenclature conventions proposed by Klein et al. (1990). Verified sequences with any deletion, insertion, or premature stop codon within exons were regarded as pseudogenes, and the others as presumably functional alleles (PFAs).
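The two screening rules above, the acceptance of sequences recovered from at least two independent sources and the classification of sequences with indels or stop codons as pseudogenes, can be sketched as follows. This is an illustration under stated assumptions, not the software the authors used: the clone tuple layout and function names are hypothetical, exon sequences are assumed to start in reading frame 1, and the expected lengths would be the aligned exon lengths (252 bp for exon 2 and 255 bp for exon 3 in this data set).

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def bona_fide(clones: Iterable[Tuple[str, str, str]]) -> Set[str]:
    """Keep sequences recovered from >= 2 distinct (individual, PCR) sources.

    Each clone is a hypothetical (individual_id, pcr_id, sequence) tuple. Two
    individuals, or two separate PCRs of one individual, both give >= 2 sources;
    clones from a single PCR only are dropped as possible chimeras.
    """
    sources = defaultdict(set)  # sequence -> {(individual_id, pcr_id), ...}
    for individual, pcr, seq in clones:
        sources[seq.upper()].add((individual, pcr))
    return {seq for seq, where in sources.items() if len(where) >= 2}


def is_presumed_pseudogene(exon_seq: str, expected_len: int) -> bool:
    """Flag an exon with an indel (length change) or an in-frame stop codon.

    Assumes the exon sequence starts at codon position 1 (reading frame 1).
    """
    seq = exon_seq.upper()
    if len(seq) != expected_len:  # insertion or deletion relative to the alignment
        return True
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return any(codon in STOP_CODONS for codon in codons)


clones = [
    ("ind1", "pcr1", "ATGAAA"),
    ("ind2", "pcr1", "ATGAAA"),  # shared by two individuals -> kept
    ("ind3", "pcr1", "ATGCCC"),
    ("ind3", "pcr2", "ATGCCC"),  # two PCRs of one individual -> kept
    ("ind4", "pcr1", "ATGGGG"),
    ("ind4", "pcr1", "ATGGGG"),  # two clones of a single PCR -> dropped
]
print(sorted(bona_fide(clones)))  # ['ATGAAA', 'ATGCCC']
print(is_presumed_pseudogene("ATGAAATGA", 9))  # True: in-frame TGA stop codon
```

Treating "two distinct (individual, PCR) sources" as the single criterion covers both acceptance routes in the text, because sequences from two individuals necessarily come from two different sources.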

Data analyses

Parameters related to variation were calculated for each species with MEGA 6.0 and DnaSP v. 5. Non-synonymous (dN) and synonymous (dS) substitution rates were calculated with MEGA 6.0, using the method of Nei and Gojobori (1986) with the Jukes and Cantor (1969) correction. To assess historical selection pressure, the ratio dN/dS (ω) was calculated for each species, separately for ABS codons, determined according to Bjorkman et al. (1987), and for non-ABS codons. Z-tests of historical positive selection were conducted in MEGA 6.0. To search for evidence of positive selection at individual codons, we additionally performed a codon-based likelihood analysis with CODEML in PAML 4.4b (Yang 2007). To identify positively selected sites (PSSs), which are indicated by ω values significantly greater than 1, we used the following PAML models (Yang and Nielsen 2002; Yang et al. 2005): M1a (nearly neutral), M2a (positive selection), M7 (beta), and M8 (beta and ω). To determine whether the positive-selection models M2a and M8 fit significantly better than the neutral models M1a and M7, respectively, we performed likelihood ratio tests (LRTs), comparing twice the difference in log-likelihood against a χ² distribution. PSSs were identified as sites with posterior probabilities greater than 95% under models M2a and M8, using Bayes empirical Bayes inference, which is not affected by recombination events (Yang et al. 2005). Only sites with significant signatures under both M2a and M8 were considered to be under positive selection.
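The two calculations at the core of this analysis can be written out briefly. The sketch below, with hypothetical function names, applies the Jukes–Cantor correction, d = −(3/4) ln(1 − 4p/3), to the proportions of non-synonymous and synonymous differences for a single sequence pair, and computes the LRT statistic and its P value for the 2-degree-of-freedom comparisons M1a vs M2a and M7 vs M8. It is not a reimplementation of MEGA (which averages Nei–Gojobori estimates over all sequence pairs) or of CODEML (which performs the likelihood optimization itself); the lnL values in the example are made up.

```python
import math
from typing import Tuple


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def omega(n_diffs: float, n_sites: float, s_diffs: float, s_sites: float) -> float:
    """dN/dS for one sequence pair from Nei-Gojobori counts.

    n_diffs/s_diffs: numbers of non-synonymous/synonymous differences;
    n_sites/s_sites: numbers of non-synonymous/synonymous sites.
    """
    d_n = jukes_cantor(n_diffs / n_sites)
    d_s = jukes_cantor(s_diffs / s_sites)
    return d_n / d_s


def lrt_df2(lnl_null: float, lnl_alt: float) -> Tuple[float, float]:
    """Test statistic 2*(lnL_alt - lnL_null) and its chi-square P value with 2 df.

    With 2 degrees of freedom, as used for M1a vs M2a and M7 vs M8, the chi-square
    survival function is exactly exp(-x / 2), so no statistics library is needed.
    """
    ts = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return ts, math.exp(-ts / 2.0)


print(round(omega(10, 100, 2, 50), 2))  # toy counts: pN = 0.10, pS = 0.04
print(lrt_df2(-1000.0, -995.0))         # hypothetical lnL values for M1a and M2a
```

Negative test statistics, which can arise when an optimization fails to converge, are clamped to zero here so that the P value is 1.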

Recombination analyses were performed in RDP4 (Martin et al. 2015) on the nucleotide alignments spanning exons 2 and 3, and on the intervening intron 2. In the first run, we applied the RDP (Martin et al. 2010), GENECONV (Padidam et al. 1999), MaxChi (Smith 1992), and Bootscan (Martin et al. 2005) methods with default settings, using the Bonferroni correction for multiple comparisons to detect recombination events. Recombination events detected by at least three of these methods were then rechecked using all methods available in RDP4 (Martin et al. 2010). All apparent recombination events detected in this way were verified using MaxChi plots and matrices and neighbor-joining trees of the inferred fragments, to confirm recombinant sequences and breakpoint positions. Only recombination events verified in this manner were regarded as having significant support. Finally, we applied the genetic algorithm for recombination detection (GARD; Kosakovsky Pond et al. 2006), available on the Datamonkey webserver (Delport et al. 2010), to inspect recombination signatures. Both RDP4 and GARD infer breakpoint positions, but RDP4 additionally identifies likely recombinants and their parental sequences. Because recombination and gene conversion have similar effects on the evolution of sequence regions shorter than 1000 bp, we did not distinguish between them, and both are hereafter referred to as recombination in the general sense (Richman et al. 2003).
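The first-pass screening rule (an event is carried forward only if at least three of the four initial methods report it) is applied within RDP4 itself; the sketch below merely illustrates that rule as post-processing of an exported list of per-method detections. The data layout and event IDs are hypothetical.

```python
from typing import Dict, Iterable, List, Set

# Methods used in the first RDP4 run, as listed in the text.
FIRST_PASS_METHODS = {"RDP", "GENECONV", "MaxChi", "Bootscan"}


def events_to_recheck(detections: Dict[str, Iterable[str]], min_methods: int = 3) -> List[str]:
    """Return IDs of events flagged by at least `min_methods` first-pass methods.

    `detections` maps a (hypothetical) event ID to the methods that reported it
    as significant after Bonferroni correction; other methods are ignored here.
    """
    keep = []
    for event_id, methods in detections.items():
        supporting: Set[str] = set(methods) & FIRST_PASS_METHODS
        if len(supporting) >= min_methods:
            keep.append(event_id)
    return sorted(keep)


detections = {
    "event_1": ["RDP", "GENECONV", "MaxChi"],
    "event_2": ["RDP", "Bootscan"],
    "event_3": ["RDP", "GENECONV", "MaxChi", "Bootscan", "SiScan"],
}
print(events_to_recheck(detections))  # ['event_1', 'event_3']
```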

We performed phylogenetic analyses separately on the nucleotide alignments of badger MHCI exon 2, intron 2, and exon 3 sequences, together with homologous sequences from other carnivore species available in GenBank, including dog (Canis lupus familiaris) DLA, cat (Felis catus) FLA, ocelot (Leopardus pardalis), cheetah (Acinonyx jubatus), tiger (Panthera tigris tigris), and European badger (Meles meles). Sequences from Lynx species (Marmesat et al. 2017) and other carnivores were excluded because they were shorter than the sequences we obtained. All sequences were aligned using MEGA 6.0 with the Gblocks option to retain the greatest number of informative nucleotide positions (Talavera and Castresana 2007). Because gene duplication, recombination, and gene conversion have likely occurred, we reconstructed neighbor-net networks, which are preferable to phylogenetic trees in such cases, from uncorrected p-distances in SplitsTree4 v4.14.5 (Huson and Bryant 2006). Nodal support was assessed with 1000 bootstrap pseudoreplicates, and only support values >70% are shown.
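SplitsTree4 computes the distances internally, but the uncorrected p-distance on which the networks are based is simple to state: the proportion of aligned sites at which two sequences differ. The sketch below illustrates it on toy sequences; skipping gap and ambiguity positions pairwise is an assumption of this sketch and may differ from the program's own gap handling. The neighbor-net construction itself is not shown.

```python
from itertools import combinations
from typing import Dict, Tuple

NUCLEOTIDES = set("ACGT")


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: differing sites / sites compared.

    Positions with a gap or ambiguity code in either sequence are skipped
    (pairwise deletion); this gap handling is an assumption of the sketch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in NUCLEOTIDES and y in NUCLEOTIDES:
            compared += 1
            differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(seqs: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise p-distances for every pair of named, aligned sequences."""
    return {(i, j): p_distance(seqs[i], seqs[j]) for i, j in combinations(seqs, 2)}


alignment = {  # toy sequences, not real badger alleles
    "allele_1": "ACGTACGTAC",
    "allele_2": "ACGTTCGTAA",
    "allele_3": "AC-TACGTAC",
}
for pair, d in distance_matrix(alignment).items():
    print(pair, round(d, 3))
```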

Results

Diversity of MHCI sequences

A total of 588 clones were sequenced from 25 individuals representing four badger species, a mean of 23.5 clones per individual. In BLAST-nuc searches, our sequences showed 95–100% similarity to Meles MHCI sequences registered in GenBank. The final aligned MHCI data set comprised exon 2 (252 bp, 84 amino acids), exon 3 (255 bp, 85 amino acids), and the intervening intron 2 (variable; 188–244 bp). We detected 64 distinct MHCI alleles among the four species: 15 from M. meles (12 of them novel), 19 from M. leucurus, 7 from M. canescens, and 23 from M. anakuma (Supplementary Tables S2–S5). The deduced amino-acid sequences of all PFAs showed classical MHCI features. The highest number of PFAs detected in a single individual was six in M. meles, five in M. leucurus, seven in M. canescens, and six in M. anakuma (Supplementary Tables S2–S5), indicating that at least three loci exist in M. leucurus, M. meles, and M. anakuma, and at least four in M. canescens.
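The minimum locus numbers follow from the fact that a diploid individual carries at most two alleles per locus, so the lower bound is the largest per-individual allele count divided by two, rounded up. The short sketch below reproduces the inference from the maximum per-individual PFA counts given above; the function name is hypothetical, and the bound assumes that every distinct PFA sequence in an individual is a separate allele.

```python
import math

# Highest number of PFAs found in any single individual (values from the text).
MAX_PFAS_PER_INDIVIDUAL = {
    "M. meles": 6,
    "M. leucurus": 5,
    "M. canescens": 7,
    "M. anakuma": 6,
}


def min_loci(max_alleles_in_one_individual: int) -> int:
    """Lower bound on locus number: a diploid carries at most two alleles per locus."""
    return math.ceil(max_alleles_in_one_individual / 2)


for species, n_alleles in MAX_PFAS_PER_INDIVIDUAL.items():
    print(f"{species}: at least {min_loci(n_alleles)} loci")
```

This gives three loci for M. meles, M. leucurus, and M. anakuma, and four for M. canescens; the true number may be higher if some loci are homozygous or were not amplified.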

On the basis of inferred frame shifts or truncated open reading frames, we identified presumed MHCI pseudogene sequences in M. meles and M. anakuma. Meme-MHCI*PS05 from M. meles had a deletion of nucleotide positions 31–33, and Meme-MHCI*PS06 had a two-nucleotide insertion between positions 113 and 114 in exon 2. Mean-MHCI*PS07 from M. anakuma had a one-nucleotide insertion between positions 24 and 25 in exon 2. Mean-MHCI*PS0105 from M. anakuma had a premature stop codon at amino-acid position 128, encoded in exon 3 (Supplementary Fig. S1). We included these presumed pseudogene sequences in the phylogenetic analyses, but not in the analyses of diversity, selection, and recombination. The 64 alleles from the four badger species contained 25 distinct intron 2 sequences (Supplementary Fig. S2), with lengths of 188 bp (24 of the 64 alleles), 189 bp (5), 194 bp (1), 195 bp (1), 199 bp (21), 201 bp (1), 215 bp (2), and 244 bp (9).

In all four Meles species, MHCI exon 3 showed more variable sites than exon 2 at the nucleotide level, but fewer non-synonymous substitutions; the number of polymorphic amino-acid sites was similar between the two exons (Table 1). Intron 2 showed more variable sites than exons 2 and 3 in M. leucurus and M. canescens, but a similar number to exon 2 in M. meles and M. anakuma (Table 1). In all four species, nucleotide diversity was lowest for exon 2 (0.079–0.083), intermediate for exon 3 (0.086–0.102), and highest for intron 2 (0.098–0.223; Table 1).
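Nucleotide diversity (π) is the average proportion of differing sites over all pairs of sequences. The sketch below computes it for toy, gap-free aligned sequences, each counted once; DnaSP, which was used for the values in Table 1, also handles gaps and sampling details that are omitted here, and the function name is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Nucleotide diversity (pi): mean proportion of differing sites over all pairs.

    Assumes aligned, gap-free sequences of equal length, each counted once.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return total_diffs / (len(pairs) * length)


toy = ["AAAA", "AAAT", "AATT"]  # toy sequences, not real data
print(round(nucleotide_diversity(toy), 4))  # 0.3333
```

In the toy example the three pairs differ at 1, 2, and 1 sites, giving 4 differences over 3 × 4 compared sites.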

Table 1 Sequence polymorphism in MHCI exons 2 and 3, and intervening intron 2, for M. anakuma, M. canescens, M. leucurus, and M. meles

Selection and recombination analyses

We analyzed selection pressures separately for domains α1 (encoded by exon 2) and α2 (exon 3), as their evolutionary histories might differ. In all four species, the non-synonymous substitution rate (dN) was higher than the synonymous rate (dS) at ABSs in both domains. However, in all species ω (dN/dS) at ABSs was significantly greater than 1 in the α1 domain but not in the α2 domain, indicating that historical positive selection maintaining allelic diversity was stronger in the α1 domain than in the α2 domain (Table 2). The PAML positive-selection models M2a and M8 fit better than the neutral models M1a and M7 in all four species, except for the α1 domain of M. canescens MHCI (Table 3). Signatures of positive selection were found in both domains α1 and α2. Parameter estimates under M2a and M8 indicated that a larger proportion of sites was under positive selection in the α1 domain (35–37%) than in the α2 domain (5–6%) in M. leucurus, M. canescens, and M. anakuma, whereas in M. meles only a small proportion of sites (about 4%) was under positive selection in each domain (Table 3). Two to five sites were identified as being under positive selection, and in all four badger species every PSS fell within an ABS (Supplementary Fig. S1).

Table 2 Rates of non-synonymous (dN) and synonymous (dS) substitutions and the ratio ω (dN/dS) for antigen-binding sites (ABS) codons, non-ABS codons, and all codons (combined) in MHCI exons 2 and 3 for M. anakuma, M. canescens, M. leucurus, and M. meles
Table 3 Positively selected sites (PSSs), log-likelihood (lnL) values, and parameter estimates under four PAML models of codon evolution, and values of the test statistic (TS) and probability (P) for the likelihood ratio test (LRT) for MHCI domains α1 and α2 in four Eurasian badger species

RDP4 detected one recombination event in M. anakuma and three in M. meles MHCI sequences (Supplementary Table S6), with two recombination breakpoints (RBPs) in M. anakuma and six in M. meles. GARD, on the other hand, detected recombination signals in M. leucurus as well as in M. meles and M. anakuma, but not in M. canescens. The RBPs detected by GARD did not exactly match those detected by RDP4 (Supplementary Table S6), although their positions within the exons and introns were similar.

Phylogenetic analyses

In the phylogenetic reconstruction for MHCI exon 2 (Fig. 2 and Supplementary Fig. S3a), Meles sequences did not group by species but formed three clades (A, B, and C), each containing sequences from all four species. TSP was clearly evident: some exon 2 sequences from one Meles species were more closely related to sequences from other Meles species than to sequences from the same species. Clade B was the sister group to a clade of giant panda sequences, and within clade A, seven giant panda exon 2 sequences formed a basal clade.

Fig. 2

Simplified phylogenetic neighbor-net of MHCI exon 2 sequences from the four Meles species examined in this study, together with canine, feline, and ursine sequences obtained from GenBank. Symbols: squares, M. meles; hexagons, M. canescens; circles, M. leucurus; stars, M. anakuma. Bootstrap support values of 70% or greater are shown. Detailed trees are shown in Supplementary Fig. S3

Three clades containing Meles sequences were also evident in the phylogenetic reconstruction for exon 3 (Supplementary Fig. S3b). Although feline, canine, ursine, and Meles sequences tended to form separate clades, a few sequences (such as Acju-AJUI3 and L. pardalis_LPAI69K) showed TSP within the feline group, and two canid sequences fell outside the canine group, basal to Meles clade C.

The phylogenetic reconstruction for intron 2 sequences likewise showed distinct Meles clades A, B, and C (Supplementary Fig. S3c). These clades included exactly the same allele sequences for intron 2 as for exon 3, except that Meme-MHCI*1PS06 and Mele-MHCI*02 were in clade A for intron 2 but in clade B for exon 3.

Discussion

Diversity of Meles MHCI genes

Kaufman et al. (1994) described well-conserved amino-acid residues associated with the structural features of classical MHCI proteins. For instance, cysteine (C) residues form an intradomain disulfide bridge in the α2 domain; other characteristic residues include an N-linked glycosylation site in α1, a threonine (T) that interacts with the TAP (transporter associated with antigen processing) complex, residues in α3 that bind the CD8 glycoprotein on the surface of T cells, and eight residues in α1 and α2, highly conserved in mammals, that bind the N- and C-termini of peptides. Sin et al. (2012b) found all these features in a transcriptional analysis of the UK M. meles population. As expected, the deduced amino-acid sequences of our PFAs showed the same classical MHCI features in α1 and α2 (Supplementary Fig. S1). In addition, we detected extensive polymorphism in both α1 and α2 (Table 1), in contrast to the monomorphic or oligomorphic non-classical MHCI molecules (Shawar et al. 1994). Taken together, these features indicate that most of the PFA sequences we obtained from all four Meles species derive from classical MHC class Ia genes.

The domestic dog has one classical MHCI gene with 73 alleles (Wagner et al. 2002; Ross et al. 2012; Venkataraman et al. 2017); the domestic cat has three classical MHCI genes (Yuhki et al. 2008) with multiple alleles; and the giant panda has four classical MHCI genes, of which Aime-I has 16 alleles (Pan et al. 2008; Zhu et al. 2013). Liu et al. (2017) reported 17 functional alleles from four MHCI genes in four wolf (Canis lupus) individuals. It is therefore not surprising that we detected 7–19 PFAs per species in the four Meles species (Supplementary Tables S2–S5). Given these allele numbers, more novel alleles are likely to be found in other populations of Meles species.

The numbers of PFAs detected per individual (Supplementary Tables S2–S5) suggest that M. canescens may have more loci than the other three badger species, but this needs corroboration, as we examined only a limited number of M. canescens samples. Among a thousand individuals in the UK population of M. meles, Sin et al. (2012b, 2014, 2015) detected seven PFAs, indicating the existence of at least two loci. We identified 13 PFAs (Supplementary Table S2), which might derive from 1–3 loci, in only nine individuals of continental M. meles. These results parallel those for MHC class II DRB genes in UK and continental M. meles (Sin et al. 2012a; Abduriyim et al. 2017), suggesting that the UK population has lower MHC allelic diversity than the continental populations. The UK population of M. meles has been reported to have more MHCI alleles than MHC DRB alleles (Sin et al. 2012a, b, 2014, 2015), whereas the reverse was observed in continental M. meles and in M. canescens, M. leucurus, and M. anakuma (Abduriyim et al. 2017; Supplementary Tables S2–S5). This suggests that the UK population may be subject to a different level and/or type of pathogen-mediated selection, or to genetic drift in a small insular population. Intriguingly, we detected allele Meme-MHCI*05 in five of nine continental M. meles individuals, and allele Mele-MHCI*15, whose exon 2 and 3 nucleotide sequences are identical to those of Meme-MHCI*05, in three of eight M. leucurus individuals. In addition, exons 2 and 3 of three M. meles alleles, Meme-MHCI*05, *12, and *14, encode an identical amino-acid sequence; therefore, seven of the nine M. meles individuals share an identical partial MHCI amino-acid sequence. Identical sequences were also found in all M. leucurus individuals examined (encoded by Mele-MHCI*04, *06, *07, *10, and *12), in all M. canescens individuals (Meca-MHCI*02, *03, and *06), and in two of six M. anakuma individuals (Mean-MHCI*09 and *12) (Table 4). Meme-MHCI*05 has also been identified in thousands of individuals studied in the UK population (Sin et al. 2014); thus, the amino-acid sequence deduced from Meme-MHCI*05 appears to be the dominant sequence and may have an important function in Meles species.

Table 4 Identical sequences of MHCI alleles between two or more of the Meles species included in the present study

There are two types of MHCI pseudogenes: complete genes whose expression is disrupted by frame shifts, premature stop codons, or other defects; and gene fragments lacking one or more exons characteristic of expressed genes (Hughes 1995). Both pseudogenes and novel functional genes are thought to have originated from classical MHCI loci by gene duplication (Hughes 1995; Beck et al. 1999). We identified several presumed pseudogene sequences with mutations leading to frame shifts or premature stop codons in M. meles and M. anakuma, but not in M. leucurus or M. canescens. The pseudogenes varied in frequency: for example, Meme-MHCI*PS05, which carries a deletion, was detected in four of nine M. meles individuals, whereas Meme-MHCI*PS06, which carries an insertion, was found in only one individual. Five of the six M. anakuma individuals each carried at least two pseudogenes.

Locus-specific intron sequences flanking the polymorphic MHC exons may prove useful for PCR-based MHC typing strategies (Cereb et al. 1995). The Meles intron 2 sequences (Supplementary Fig. S2) showed that most mutations are indels (insertions and/or deletions), producing length variation from 188 to 244 bp. Shorter intron 2 sequences have been documented for M. meles (Sin et al. 2012b), whereas longer intron 2 sequences were found in pseudogenes in canids and M. meles (Wagner 2003; Sin et al. 2012b). However, longer intron 2 sequences in functional genes have been recorded in many other mammalian species (Cereb et al. 1996). Consistent with our results, feline intron 2 sequences likewise vary in length, from 182 to 202 bp (Pokorny et al. 2010; Castro-Prieto et al. 2011).

Evolution of Meles MHCI genes

We found no evidence of historical positive selection, based on accumulated non-synonymous substitutions, at ABS or non-ABS codons in the α2 domain in any of the badger species studied (Table 2), which parallels the results for the UK population (Sin et al. 2012b). The PAML analysis, however, identified particular codons under positive selection in both domains α1 and α2 (Table 3), and all PSSs fell within ABSs (Supplementary Fig. S1). Positive selection in both domains is plausible, as α1 and α2 are both responsible for peptide binding (Kaufman et al. 1994). Our PAML results differ from those of Sin et al. (2012b), who detected positive selection in domain α1 but not in domain α2 in the UK population of M. meles. Our study instead indicates that both domains are under positive selection, although the intensity of positive selection likely differed considerably between domains α1 and α2 in all Meles species except M. meles. Positive selection that shapes and maintains polymorphism at ABSs can broaden the range of antigenic peptides presented in a species or population, and may increase the chance of surviving pathogenic and parasitic infections (Hughes and Nei 1992; Hughes and Yeager 1998). Therefore, considering both species- or region-specific and common pathogens (Hancox 1980; Harasawa et al. 2014; Moreno et al. 2015; Hornok et al. 2017), we hypothesize that in the continental population of M. meles the α2 domain plays a crucial role in recognizing pathogens that are unfamiliar or novel to the UK population, and that in the other three Meles species both the α1 and α2 domains are important for pathogen recognition.

In the phylogenetic neighbor-nets, exon 2, exon 3, and intron 2 sequences did not form species-specific clades; instead, for each gene region, the distantly related clades A, B, and C contained sequences from more than one species, a common characteristic of MHC genes. TSP (Klein et al. 1998) is indicative of balancing selection, which maintains ancestral allelic lineages for long periods, even through speciation events and subsequent diversification (Bernatchez and Landry 2003). This cross-species clustering is ultimately due to orthology, which causes sequences to cluster by locus rather than by species (Gu and Nei 1999; Cao et al. 2015). MHCI intron 2 sequences tend to form locus-specific clusters in closely related species (Groot et al. 2002), which may explain the highly similar composition of cluster C for intron 2 (Supplementary Fig. S3c) and for exons 2 and 3 (Fig. 2, Supplementary Fig. S3a, b). In some cases, an individual badger had more than two alleles within a single cluster, suggesting that orthologous loci underwent independent duplications in some species. Miska et al. (2002) argued that, given sufficiently long divergence times between species, orthologous relationships between loci could be lost. The Meles species diverged more than 2 million years ago (Marmi et al. 2006; Del Cerro et al. 2010), a period that was likely insufficient to erase orthologous relationships between loci.

Recombination between loci can contribute to high allelic diversity in MHC genes (McAdam et al. 1994; Schaschl et al. 2005), and exon exchange between loci via recombination has been documented for mammalian MHC (Hughes and Nei 1989; Gu and Nei 1999). In our study, both RDP4 and GARD detected signatures of recombination in M. anakuma and M. meles, but only GARD detected recombination in M. leucurus (Supplementary Table S6). The RBPs inferred by RDP4 did not exactly match those inferred by GARD; however, these two methods often yield inconsistent RBPs (Zhao et al. 2013; Huang et al. 2016), probably because they use different computational approaches (Kosakovsky Pond et al. 2006; Martin et al. 2010). Detecting recombination remains difficult, and improved methods are needed (Bay and Bielawski 2011).

In our analyses, RDP4 identified RBPs in intron 2 and exon 3 in both M. anakuma and M. meles, while GARD identified RBPs in intron 2 and exon 3 in these two species and in M. leucurus. The apparently random positions of these RBPs might suggest PCR artifacts. However, PCR-mediated recombinant sequences are likely to be present in only a few clones and are unlikely to be sequenced twice (Huang et al. 2016), whereas each recombinant sequence we detected was found in 3–19 clones. Further support for natural recombination comes from the degenerate 13-bp sequence motif CCNCCNTNNCCNC, which is associated with human recombination hotspots (Myers et al. 2008): it occurred in nearly half of the Meles intron 2 sequences, where most of the RBPs were located. Because no recombination hotspots are known for PCR-mediated recombinants (Cronn et al. 2002; Lahr and Katz 2009; Huang et al. 2016), these observations suggest that the recombination events we detected occurred in vivo rather than in vitro. Sin et al. (2012b) likewise reported in vivo recombination events in the UK population of M. meles.
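A screen for the degenerate hotspot motif can be reproduced with a simple pattern search in which each N (the IUPAC code for any nucleotide) matches A, C, G, or T. The sketch below is illustrative: the function names and the toy sequence are hypothetical, it reports possibly overlapping matches on the given strand only, and a full screen would also scan the reverse complement of each intron 2 sequence.

```python
import re
from typing import List

# Degenerate 13-bp motif from the text; N (IUPAC) stands for any nucleotide.
MOTIF = "CCNCCNTNNCCNC"


def motif_pattern(motif: str) -> "re.Pattern[str]":
    """Translate a motif containing N into a regular expression."""
    return re.compile("".join("[ACGT]" if base == "N" else base for base in motif))


def find_motif(seq: str, motif: str = MOTIF) -> List[int]:
    """0-based start positions of all (possibly overlapping) matches on the given strand."""
    pattern = motif_pattern(motif)
    seq = seq.upper()
    return [i for i in range(len(seq) - len(motif) + 1) if pattern.match(seq, i)]


example = "GGCCACCATAACCTCGG"  # toy sequence containing one copy of the motif
print(find_motif(example))  # [2]
```

Testing `pattern.match` at every start position, rather than using `finditer`, ensures that overlapping occurrences are not missed.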

Incongruity in within-clade topology and in positive selection intensity between exons 2 and 3 indicates that domains α1 and α2 had different evolutionary histories, which partly agrees with the findings of Sin et al. (2012b). The birth-and-death model of evolution assumes that novel genes arise by gene duplication, and that some duplicated genes, which may represent ancestral polymorphism, are retained for long periods, while others are deleted or become non-functional (Nei and Rooney 2005). Both our study and Sin et al. (2012b) detected a number of pseudogenes derived from Meles MHCI PFA, and our phylogenetic analysis also indicated independent duplication of orthologous loci. Both observations are consistent with the birth-and-death model. More recombinants with RBPs in intron 2 and exon 3 were found in M. meles than in any other species. This corroborates the suggestion by Sin et al. (2012b) that the α2 domain underwent concerted evolution, in which RBPs in intron 2 and exon 3 separated domain α1 from domains α2 and/or α3, and the α2/α3 domain sequences became homogenized. Gu and Nei (1999) and Nei and Rooney (2005) argued that recombination is relatively rare and tends not to be selectively advantageous, and that variation in MHC genes is generated primarily by mutation and selection. Nevertheless, we cannot rule out the proposal (McAdam et al. 1994; Schaschl et al. 2005) that mutant alleles were generated by recombination or gene conversion, because recombination events in MHC genes have been reported in the present study, in Sin et al. (2012b), and in many other recent studies (Nishita et al. 2015, 2017; Abduriyim et al. 2017).

We detected many more PSSs at ABSs in the α1 domain than in the α2 domain in M. anakuma, M. leucurus, and M. canescens, but equally low numbers of PSSs in domains α1 and α2 in M. meles (Table 3). RDP4 and GARD both detected recombination events in M. anakuma and M. meles, whereas no recombination signatures were evident in M. canescens, and only GARD showed RBPs in M. leucurus (Supplementary Table S6). Most RBPs were in intron 2, with a few in exon 3. Intriguingly, all four Meles species showed equally high levels of polymorphism in both domains α1 and α2 (Table 1). Taken together, these findings indicate that positive selection was the dominant force shaping the high polymorphism in the α1 domain, whereas both recombination and positive selection contributed to the high polymorphism in the α2 domain. Moreover, the combined results of the positive selection, diversity, and recombination analyses suggest that different mechanisms shaped MHCI gene polymorphisms in these closely related species. This could be explained by differences in biological traits, contemporary and historical demography, and/or the spectra of pathogens to which these species are exposed. To comprehensively understand the complexity of MHC evolution in Meles species, further pooled analyses of MHC genetic diversity are needed, together with additional studies of demographic, biological, and ecological traits and of pathogens.
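The domain comparison in Table 3 amounts to intersecting each domain's PSS positions with its ABS positions. The Python sketch below assumes PSSs have already been inferred by a codon-based selection analysis. It also assumes that positions are numbered within each domain's exon. The function name, the domain keys, and all positions in the example are hypothetical and do not reproduce the study's data or any published ABS list.

```python
def tally_pss(pss, abs_sites):
    """Count positively selected sites (PSSs) inside and outside ABSs.

    pss, abs_sites: dicts mapping a domain name to an iterable of codon
    positions (numbered within that domain's exon).
    Returns {domain: {"at_ABS": int, "non_ABS": int}}.
    """
    summary = {}
    for domain, sites in pss.items():
        sites = set(sites)
        in_abs = sites & set(abs_sites.get(domain, ()))
        summary[domain] = {"at_ABS": len(in_abs), "non_ABS": len(sites - in_abs)}
    return summary


# Hypothetical positions, for illustration only.
pss = {"alpha1": [5, 9, 24, 45, 62], "alpha2": [9, 70]}
abs_sites = {"alpha1": [5, 9, 24, 45, 63], "alpha2": [9, 12, 45]}
print(tally_pss(pss, abs_sites))
# {'alpha1': {'at_ABS': 4, 'non_ABS': 1}, 'alpha2': {'at_ABS': 1, 'non_ABS': 1}}
```

Raw counts are only a first look. Because the domains may differ in the number of ABS codons, a proportion of ABSs under positive selection may be a fairer basis for comparison between α1 and α2.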

Conclusion

Our findings emphasize the importance of analyzing each component of a gene separately (Sin et al. 2012b) to gain comprehensive knowledge of MHCI evolution. Both MHC class I and class II (Abduriyim et al. 2017) genes showed extensive TSP, which is indirect evidence of balancing selection. However, the separate phylogenetic analyses revealed incongruity between domains α1 and α2 of MHCI proteins, implying that they have different evolutionary histories. As Sin et al. (2012b) proposed, the most plausible interpretation may be concerted evolution of the α2 domain. Under this interpretation, RBPs in intron 2 probably lead to homogenization of exon 3 (α2 domain), while exon 2 (α1 domain) may be subject to balancing selection. Compared with class II (Abduriyim et al. 2017), MHCI showed fewer shared alleles and more pseudogenes, probably because typical MHCI loci have a faster turnover rate than MHC class II loci. Through varying degrees of divergence, classical MHCI genes give rise to a number of non-classical genes and pseudogenes (Piontkivska and Nei 2003).

Although sequence variation was equally high in the four Meles species, their selective pressures and recombination events differed, indicating that different mechanisms shaped their MHCI gene polymorphisms. Differences in pathogen spectra, particularly species- or region-specific pathogens, offer the more plausible explanation for how this variation is driven and retained. Considering MHCI-associated autoimmunity and the recombination events in these species, ethological, ecological, or biological traits and demographic histories, which remain poorly known, may also play some role in MHC evolution.

Data archiving

Sequence data have been submitted to GenBank: accession numbers LC350023–LC350083.