Introduction

Genes of the major histocompatibility complex (MHC) constitute a key component of the adaptive immune system in vertebrates, as they code for proteins directly involved in the presentation of intra-cellular and extra-cellular pathogens (MHC class I and class II, respectively). Although MHC molecules of both classes differ in their basic structure, they share a notable feature—a special groove that binds antigens (the so-called peptide-binding region, PBR). MHC-bound antigens (or self-peptides) are presented to T lymphocytes and may be recognized by T-cell receptors (TCRs), which triggers a cascade of immune responses that are aimed at pathogen elimination (Janeway et al. 2001).

In general, the evolution of the MHC in jawed vertebrates proceeded from a simple design with relatively few genes in Chondrichthyes fish to a much more complex architecture in mammals (Kulski et al. 2002), where duplicated MHC class I and class II gene copies are scattered over several loci (defined as specific positions on the chromosome where gene copies are located). In birds, ancestral phylogenetic lineages have relatively low numbers of MHC gene copies, e.g., most species from one of the oldest avian clades (order: Galliformes) have only one dominantly expressed classical MHC gene copy at each class (Kaufman et al. 1999). The MHC duplication rate was generally suppressed in non-passerine birds, but greatly accelerated during the passerine (order: Passeriformes) radiation (Minias et al. 2019). Among oscine passerines, the highest numbers of MHC copies were recorded in the Sylvioidea (MHC class I) and Passeroidea (MHC class II) superfamilies (Minias et al. 2019), reaching over 30 copies in some species (Sepil et al. 2012; Biedrzycka et al. 2017a). At the inter-specific level, different species may have different evolutionary optima of MHC copy numbers and some copies may be deleted or inactivated via deleterious mutations (turned into pseudogenes) over evolutionary times, consistent with the birth-and-death model of MHC evolution (Nei et al. 1997). The presence of multiple MHC copies (generated through gene duplications) is thought to confer fitness benefits, as it is expected to increase the number of alleles (unique gene variants) expressed within individuals and, thus, should enhance the spectrum of pathogens recognized by an individual organism (Bentkowski and Radwan 2019). On the other hand, the number of MHC copies cannot expand in an unlimited manner and, according to the optimality hypothesis (Nowak et al. 1992), copy numbers are thought to be restricted by the inherent costs of allele expression, e.g., depletion of TCR repertoire, as recently shown in rodents (Migalska et al. 2019).

One of the most notable features of the MHC is an extreme level of allelic diversity (the number of alleles per gene) within populations, which is maintained by various forms of pathogen-driven balancing selection (Spurgin and Richardson 2010; Radwan et al. 2020), including the mechanisms of overdominant selection (heterozygote advantage, i.e., higher fitness of heterozygous over homozygous genotypes), negative frequency-dependent selection (higher fitness of rare genotypes) and fluctuating selection (spatial and temporal variation in fitness value of particular genotypes) (reviewed in Radwan et al. 2020). Although the relative roles of these three mechanisms in generating MHC diversity are very difficult to separate (Radwan et al. 2020), their combined effect can generate and maintain thousands of MHC alleles within natural vertebrate populations (e.g., Biedrzycka et al. 2017a). Also, strong balancing selection can maintain highly adaptive alleles for long evolutionary periods and, consequently, coalescent time of MHC alleles may span many millions of years (Takahata 1990). Persistence of MHC alleles beyond species diversification produces a specific phylogenetic pattern where some alleles are more similar between species than within species (so-called trans-species polymorphism) (Takahata 1990).

So far, the evolution of the mammalian MHC has been most rigorously studied among all vertebrate groups and the major evolutionary features of these genes were first revealed during the pioneering research on mammals (Hess and Edwards 2002; Bernatchez and Landry 2003). However, the patterns of molecular evolution of the MHC are thought to differ largely between mammals and birds. The mammalian MHC primarily evolves according to the divergent evolution model (Ota and Nei 1994), where different loci are maintained independently of each other after duplication events. Consistent with this hypothesis, some MHC loci were found to have duplicated very early in the mammalian radiation and evolved independently since the divergence of marsupial and placental mammals (Takahashi et al. 2000). In contrast, the avian MHC is characterized by more recent duplications with little evidence for orthologous relationships between loci across different lineages (Edwards et al. 1995, but see Burri et al. 2008). Also, different MHC loci in birds are thought to be continuously homogenized (e.g., by gene conversion) and often evolve as a single unit, consistent with the concerted (rather than divergent) model of evolution (Nei and Rooney 2005). Molecular analyses provided evidence for the concerted evolution of MHC class II in several avian lineages (Wittzell et al. 1999; Gillingham et al. 2016), but it remains unresolved whether a similar mechanism shapes MHC class I diversity in birds.

Although the recent advancement of second- and third-generation sequencing technologies sparked research on the MHC in a broad spectrum of non-model organisms (O’Connor et al. 2019), most studies focus either on a single taxon or on a single MHC class, which seriously limits our knowledge on whether MHC class I and class II genes have similar evolutionary trajectories in the same vertebrate lineages. In birds, comparisons of MHC classes I and II evolution in a multi-species framework have been conducted for several non-passerine clades (Gillingham et al. 2016; Minias et al. 2016; Sallaberry‐Pincheira et al. 2016), whereas multi-species analyses of passerine MHC focused either on class I (e.g., O’Connor et al. 2016; Drews et al. 2017) or class II (e.g., Jarvi et al. 2004; Balasubramaniam et al. 2016). On the other hand, the few phylogenetically robust analyses of MHC evolution in birds (across a wide spectrum of avian lineages) have usually been based on publicly available genetic resources (GenBank database), thus combining data generated with a variety of methodological (genotyping and data processing) approaches and suffering from unbalanced sample sizes (both in terms of the number of individuals/species genotyped and species composition) (e.g., Minias et al. 2018, 2019). Although this kind of research can provide some broad pictures of MHC evolution in non-model taxa, due to its inherent methodological limitations it may not be particularly appropriate to detect any fine-scale differences in the evolutionary mechanisms between MHC class I and class II. In fact, evolutionary trajectories of the two MHC classes should ideally be compared using the same set of taxa, uniform sample sizes and uniform genotyping/data processing methodology, but such analyses are still lacking for most avian lineages, including passerines.

The aim of this study was to compare the evolution of MHC class I and class II genes in oscine passerines using three sister clades of finches (Fringillinae and Carduelinae) and buntings (Emberizidae) as our study object. Despite the fact that both finches and buntings are relatively diverse (201 and 42 species, respectively; Winkler et al. 2015) and belong to the most commonly represented passerine families in the Western Palearctic avifauna, they have been largely underrepresented in passerine MHC research and our knowledge on the MHC evolution in these clades is fragmentary and limited to just a couple of species (Arnaiz-Villena et al. 2007; Li et al. 2017; Drews and Westerdahl 2019). For example, phylogenetic analyses of MHC class I in three carduelid finches (Serinus) provided evidence for trans-species polymorphism and slower evolution of MHC class I in continental (African and Asian) species compared to the species from Canary Islands (Arnaiz-Villena et al. 2007). Examination of MHC class I expression profiles in another carduelid finch, the Eurasian siskin Spinus spinus, allowed identification of classical and nonclassical alleles and revealed that several MHC I genes are highly expressed in this species (Drews and Westerdahl 2019). As far as we are aware, MHC class II has not been studied in Fringillidae, but its polymorphism was assessed in two east Asian Emberiza species (Emberizidae), providing support for positive selection signature and trans-species polymorphism, but no evidence for recombination occurring at these genes (Li et al. 2017). To the best of our knowledge, the evolution of MHC class I and class II has never been directly compared within these avian lineages. Here, we reconstructed and compared the evolutionary history of MHC class I and class II genes across the clades of Emberizidae, Carduelinae and Fringillinae using uniform genotyping and data processing methodology, as well as uniform sample sizes. 
Specifically, we hypothesized that contrasting selective regimes from intra-cellular and extra-cellular pathogens could have generated and maintained differences in: (1) gene copy number variation (duplication rates), (2) sequence polymorphism, (3) recombination (gene conversion), (4) nucleotide substitution rates and (5) phylogenetic patterns (e.g., the level of trans-species polymorphism) between the MHC class I and class II genes. Such comparative analyses of evolutionary trajectories in the two MHC classes provide a unique opportunity to infer how selection from two distinct groups of pathogens (intra-cellular and extra-cellular) affected the evolution of their hosts.

Materials and methods

Sample collection and preparation

The MHC polymorphism was examined in eleven species from three clades: Emberizidae (three species): corn bunting Emberiza calandra, yellowhammer Emberiza citrinella and reed bunting Emberiza schoeniclus; Carduelinae (six species): common redpoll Acanthis flammea, European goldfinch Carduelis carduelis, European greenfinch Chloris chloris, hawfinch Coccothraustes coccothraustes, Eurasian bullfinch Pyrrhula pyrrhula and Eurasian siskin; and Fringillinae (two species): common chaffinch Fringilla coelebs and brambling Fringilla montifringilla. The differences in the number of species sampled per clade within our study roughly reflected the true differences in the species richness between the clades, with Fringillinae (3 extant species) and Carduelinae (166 extant species) being the least and the most species rich (Winkler et al. 2015). We captured and sampled blood from eight individuals per species and sampling took place at four locations in Poland (5–53 individuals per site; Table S1 in Appendix 1). All samples were collected in March–December 2018 during breeding, autumn migration and early wintering seasons. Since most individuals (ca. 85%) were captured during migration or wintering, we had no reliable information on their origin and different sampling sites could not be treated as separate populations (birds from the same population could migrate through different sampling sites and birds from different populations could migrate through the same sampling site). Consequently, sampling location was not taken into account in our analyses. Blood was collected from each captured individual by puncturing the ulnar vein with a disposable needle. Ca. 50 μl of blood was collected with a heparinized capillary, transferred into Eppendorf tubes with 95% ethanol and kept at 2 °C until laboratory analyses. Genomic DNA was extracted from blood samples with Bio-Trace DNA Purification Kit (EURx, Gdansk, Poland).

MHC amplification and Illumina sequencing

In non-model species with relatively high allelic diversity of the MHC (as in the case of our study system, see results for details), any single pair of primers is unlikely to amplify all alleles existing within populations (and within individuals). Since different primers can amplify different MHC alleles, a combination of genotyping data obtained with two (or more) independent pairs of primers is likely to provide a more reliable (less underestimated) estimate of the total number of alleles present within individuals and, consequently, of the gene copy number. Thus, each MHC class (I and II) was genotyped using two sets of independent primers (Table 1 and Fig. S1 in Appendix 1), which were previously used for MHC genotyping in a wide range of passerine species (Aguilar et al. 2006; Alcaide et al. 2013; Canal et al. 2010; O’Connor et al. 2016). The primers successfully amplified fragments of class Iα exon 3 or class IIβ exon 2 in all our study species and following the approach by O’Connor et al. (2016) we did not aim to design species-specific primers, so that the results were more comparable at the inter-specific level. Although peptide-binding grooves of both MHC classes I and II molecules are coded by two exons (exon 2 and 3 coding for α1 and α2 domains at class I; exon 2α and 2β coding for α1 and β1 domains at class II; Chen et al. 2015), class Iα exon 3 and class IIβ exon 2 are thought to be most polymorphic and are most commonly targeted while genotyping MHC in non-model vertebrates, including birds (avian MHC sequences deposited in GenBank: 255 exon 2 class Iα vs. 6691 exon 3 class Iα; 30 exon 2 class IIα vs. 4629 exon 2 class IIβ; as accessed on 5 October 2020). For amplifications we used fusion primers composed of the MHC primer (as described in Table 1), a 7-bp barcode indicating sample identity, and Illumina Nextera Transposase adaptor sequences (Illumina Corp., San Diego, CA, USA). 
All PCR reactions were conducted in a final volume of 20 μl containing 10 μl of 2X HotStarTaq Plus Master Mix Kit (Qiagen, Venlo, Netherlands), 10–20 ng of genomic DNA and 0.2 μM of each primer. Protocols for PCR reactions were taken from original sources (Table 1). Positive amplifications were confirmed with horizontal electrophoresis in agarose gels and PCR products were purified with AMPure XP magnetic beads (Beckman Coulter, Brea, CA, USA). A separate library was constructed for each primer set using NEBNext DNA Library Prep Master Mix Set for Illumina (New England Biolabs, Ipswich, MA, USA). All four libraries were sequenced using Illumina v2 Kit at a 2 × 250 bp paired-end Illumina MiSeq platform. PCR products from each primer pair (88 individuals) were sequenced in a separate Illumina run. We also added 26 technical replicates across all four runs (26 different individuals genotyped twice using independent PCR amplifications).

Table 1 Primers used for MHC class I exon 3 and MHC class II exon 2 genotyping in finches and buntings.

Illumina data processing and allele validation

Illumina sequencing data were processed using the Amplicon Sequencing Analysis Tools (AmpliSAT) web server (Sebastian et al. 2016) and algorithms implemented therein. First, the AmpliMERGE algorithm was used to merge paired-end Illumina reads (default settings). Second, we used the AmpliSAS tool to proceed with de-multiplexing, clustering, and filtering of merged sequences. The clustering stage uses the modified algorithm by Stutz and Bolnick (2014) to deal with genotyping errors associated with high‐throughput sequencing techniques, i.e., to identify reads resulting from genotyping errors and to cluster them with reads identified as true alleles. Briefly, all variants in the amplicon are first ordered by depth and clustering starts from the most dominant sequence (highest depth). Each remaining amplicon sequence is compared with the dominant one and its sequencing/PCR errors (artefacts) are identified based on user‐defined criteria by performing high‐accuracy pairwise global alignments between the sequences (Sebastian et al. 2016). Here, we used default criteria for Illumina sequencing data (substitution errors: 1%; indel errors: 0.001%; minimum dominant frequency: 25%), as recommended by Biedrzycka et al. (2017b). During the filtering stage, chimeras were detected using a set of specific rules (see Appendix 1 for details) and discarded. Sequences with the final (after clustering) read frequency of <3% per sample (amplicon) were also discarded (following AmpliSAT recommendations for Illumina sequencing). Minimum amplicon depth was 100 reads, while the maximum amplicon depth was set to 5000 reads (for performance reasons). The AmpliSAS output for all four Illumina runs is shown in Appendix 2.
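The per-amplicon frequency filter described above can be illustrated with a minimal Python sketch. It is not the AmpliSAS implementation: the function name `filter_variants` and the input layout (a mapping from variant sequence to read depth after clustering) are hypothetical, and we assume that amplicons below the minimum depth are dropped entirely. Only the two thresholds (100 reads, 3%) come from the text.

```python
# Illustrative sketch only; AmpliSAS performs clustering and chimera removal
# before this step, neither of which is reproduced here.

MIN_AMPLICON_DEPTH = 100  # minimum reads per amplicon (from the text)
MIN_VARIANT_FREQ = 0.03   # per-amplicon read frequency threshold (from the text)


def filter_variants(variant_depths, min_depth=MIN_AMPLICON_DEPTH,
                    min_freq=MIN_VARIANT_FREQ):
    """Return {sequence: frequency} for variants retained in one amplicon.

    variant_depths: dict mapping a variant sequence to its read depth
    (hypothetical input format, assumed to be post-clustering counts).
    """
    total = sum(variant_depths.values())
    # Assumption: an amplicon with too few reads yields no genotype at all.
    if total < min_depth:
        return {}
    # Variants with a frequency below 3% are discarded; the rest are kept.
    return {seq: depth / total
            for seq, depth in variant_depths.items()
            if depth / total >= min_freq}


if __name__ == "__main__":
    amplicon = {"allele_A": 500, "allele_B": 480, "artefact": 20}
    print(filter_variants(amplicon))
```

In this toy amplicon the third variant accounts for 2% of reads and is therefore removed, while the two dominant variants are retained as putative alleles.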

Although our genotyping approach targeted only fragments of MHC class I and class II molecules and these fragments originated from an unknown number of gene copies in an unknown synteny in the genome, we henceforth refer to these fragments as alleles. Different pairs of primers amplified sequences of different length (Table S2 and Fig. S1 in Appendix 1). At MHC class I, primers PP1_1 amplified a fragment of exon 3 (264 bp out of 276 bp total exon length, codon positions 5–92 within the exon) and primers PP1_2 amplified a shorter fragment of the same exon (240 bp, codon positions 5–84) (Table S2 in Appendix 1). At MHC class II, primers PP2_1 amplified a fragment of exon 2 (219 bp out of 270 bp total exon length, codon positions 18–90 within the exon) along with four codons of intron 2 that were subsequently removed from the alignment (Table S2 in Appendix 1). Primers PP2_2 amplified shorter fragments of MHC class II exon 2 (168 bp, codon positions 18–73; Table S2 in Appendix 1). Longer sequences (obtained with PP1_1 and PP2_1 primers) were used for the analyses of polymorphism, recombination, selection, and phylogenetic clustering, as our priority was to obtain the results across the larger part of the exons rather than across the larger set of alleles. However, since none of our primer pairs was likely to amplify all existing alleles across all duplicated loci and null (unamplified) alleles were likely to differ between primer sets, we repeated the analyses of selection using shorter sequences (PP1_2 and PP2_2) to test whether null alleles biased our results. Although our exon sequences were not complete, they covered the key fragments of the peptide-binding groove of MHC class I (ca. 95% of α2 domain) and class II (ca. 80% of β1 domain) molecules that are likely to be targeted by pathogen-driven selection in passerine birds (Minias et al. 2018). 
All alleles had cysteine amino acids at the highly conserved residues responsible for intra-domain disulfide bridge formation (positions 11 and 74 at MHC class I, position 75 at MHC class II; Fig. 3).

Copy number variation

Following a common practice in MHC research, the number of MHC gene copies in each species was estimated by dividing the maximum number of putatively functional MHC alleles detected per any individual within species by two (i.e., assuming heterozygosity at each locus) (reviewed in Minias et al. 2019). Here, we first assessed repeatability of copy number estimates between both independent pairs of primers by calculating intra‐class correlation coefficients in the irr package (Gamer et al. 2012) developed for the R statistical environment (R Foundation for Statistical Computing, Vienna, Austria) and, then, used the copy number estimates inferred from the combined data in all further analyses. Non-functional alleles (having stop codons or frame-shift mutations) were not included in copy number estimates. All the other alleles were recognized as putatively functional, although some of them could possibly contain non-functional mutations outside the regions that were targeted by genotyping. To reconstruct evolutionary history of copy number variation, we fitted six macroevolutionary models, which describe different evolutionary scenarios: (1) Brownian Motion (BM): constant rate neutral evolution (due to genetic drift) or fluctuating selection towards different evolutionary optima, (2) BM adjusted for λ: BM model with varying rates of genetic drift and fluctuating selection across the phylogeny (λ decreasing from 1 to 0 indicates increasing phylogenetic independence), (3) BM adjusted for δ: BM model with different rates of early (more ancestral) vs. late (more recent) evolution of the phylogenetic lineage, (4) time-dependent linear model: assumes linear changes in evolutionary rates, (5) time-dependent early burst model: assumes non-linearly decreasing evolutionary rates and (6) Ornstein–Uhlenbeck: stabilizing selection with a constant evolutionary optimum. All models were fitted to the data with the fitContinuous function in the geiger R package (Harmon et al. 2008) and the relative fit of all models was compared using the Akaike information criterion corrected for small sample sizes (ΔAICC). The rate of MHC copy number evolution was estimated using each of these models (1000 iterations per model). We also used the phylogenetic scaling parameter λ (BM model adjusted for λ; Pagel 1999) to estimate phylogenetic signal, which describes statistical dependence in trait values (MHC gene copy numbers) among species due to their phylogenetic relationships (the tendency of related species to resemble each other) (Revell et al. 2008). In general, phylogenetic signal can vary between 0 and 1, where λ = 0 indicates complete phylogenetic independence of a trait, whereas λ = 1 indicates that trait evolution is completely determined phylogenetically (Freckleton et al. 2002). The ancestral states of MHC copy numbers at the root of the phylogeny (trait value expected in the last common ancestor of our study species) were reconstructed using the maximum-likelihood approach (under the assumptions of BM process), as implemented in the fastAnc function in the phytools R package (Revell 2012). Phylogenetic relationships between species were based on a complete avian time-calibrated phylogeny developed by Jetz et al. (2012) with a backbone topology from Ericson et al. (2006). To account for phylogenetic uncertainty, we generated a consensus phylogeny from a random sample of one thousand trees downloaded from the BirdTree web server (Jetz et al. 2012). Differences in the number of MHC class I and class II copies detected per species were tested with a t-test for dependent samples, while the linear association between MHC class I and class II copy numbers across species was tested with the Pearson correlation coefficient.
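The copy number estimate and the model comparison criterion can be sketched in a few lines of Python. This is a simplified illustration rather than the analysis pipeline used here (which relied on the geiger R package): the function names are hypothetical, rounding odd allele counts up to the next whole locus is our assumption, and the AICc expression is the standard small-sample correction of AIC.

```python
import math


def estimate_copy_number(alleles_per_individual):
    """Minimum number of loci implied by the most allele-rich individual.

    Assumes heterozygosity at each locus; odd counts are rounded up
    (an assumption of this sketch, not stated in the text).
    """
    return math.ceil(max(alleles_per_individual) / 2)


def aicc(log_likelihood, n_params, n_obs):
    """Akaike information criterion corrected for small sample size."""
    aic = 2 * n_params - 2 * log_likelihood
    return aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)


def delta_aicc(scores):
    """Difference between each model's AICc and the best (lowest) AICc."""
    best = min(scores.values())
    return {model: score - best for model, score in scores.items()}


if __name__ == "__main__":
    # Hypothetical allele counts for eight individuals of one species.
    print(estimate_copy_number([7, 12, 20, 9, 14, 11, 16, 10]))
    # Hypothetical log-likelihoods for two models fitted to 11 species.
    scores = {"BM": aicc(-10.0, 2, 11), "OU": aicc(-9.8, 3, 11)}
    print(delta_aicc(scores))
```

Because the estimate is driven by the single most allele-rich individual, it should be read as a lower bound on the true number of loci, which is why combining data from two primer pairs matters.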

Sequence polymorphism

We used DnaSP v.6.10.3 software (Rozas et al. 2017) to calculate the number of segregating sites, total number of mutations and nucleotide diversity, as the basic measures of sequence polymorphism. We also used the DistCalc function from the MHCtools R package (Roved 2019) to calculate Grantham (1974) and Sandberg et al. (1998) distances between amino acid sequences, as they take physicochemical properties of amino acids into account. All polymorphism measures were calculated separately for each clade, as well as across all clades. Since we had no reliable information on the breeding origin of sampled birds (most individuals were captured during migration), we did not compute Tajima’s D values.
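For readers unfamiliar with these summary statistics, the following Python sketch shows how the number of segregating sites and nucleotide diversity can be computed from an alignment. It is a simplified illustration, not a reimplementation of DnaSP: it assumes an ungapped alignment of equal-length sequences, weights every sequence equally, and uses hypothetical function names.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns that contain more than one nucleotide."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site across all sequence pairs."""
    if len(seqs) < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    # Sum mismatching positions over every pair of sequences.
    differences = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return differences / (len(pairs) * length)


if __name__ == "__main__":
    toy_alignment = ["ACGT", "ACGA", "TCGA"]  # toy data, not MHC sequences
    print(segregating_sites(toy_alignment))
    print(nucleotide_diversity(toy_alignment))
```

The two measures capture different aspects of polymorphism: the first counts variable positions regardless of how divergent the alleles are, whereas the second reflects average pairwise divergence.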

Recombination

To quantify recombination signal at the MHC class I and class II, we used RDP v.4.97 software (Martin et al. 2015), which implements several different algorithms developed to detect recombinant sequences. The following approaches were used to assess recombination in our data: Maxchi (Smith 1992), BootScan (Salminen et al. 1995), Genconv (Padidam et al. 1999), SiScan (Gibbs et al. 2000), RDP (Martin and Rybicki 2000), Chimaera (Posada and Crandall 2001) and 3Seq (Boni et al. 2007). All analyses were run using default settings, statistical significance threshold of P = 0.05, and Bonferroni correction for multiple comparisons. A recombination event was recognized when supported by two or more algorithms; all events recognized by a single algorithm were discarded. Different recombination events were identified based on the different location of recombination breakpoints along exon sequences. Since MHC polymorphism may be retained beyond species divergence, recombination signal can be traced over long evolutionary times (Minias et al. 2016) and, thus, we conducted the analyses across all species from all clades. Recombination signal was also assessed as the number of breakpoints within a 100-nucleotide window and the presence of recombination hot and cold spots was tested with the local hot/cold-spot test (1000 permutations), as implemented in RDP software.
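The rule for accepting a recombination event (support from at least two algorithms after correction for multiple comparisons) can be expressed as a short Python sketch. RDP applies these steps internally; the function name, the input format and the `n_tests` argument used for the Bonferroni adjustment below are hypothetical and serve only to illustrate the decision logic.

```python
def supported_events(event_pvalues, alpha=0.05, n_tests=1, min_methods=2):
    """Keep events detected by at least `min_methods` algorithms.

    event_pvalues: {event_id: {algorithm_name: p_value}} (hypothetical format).
    n_tests: number of comparisons used for the Bonferroni correction
    (an assumption of this sketch; RDP derives it from the data set).
    """
    threshold = alpha / n_tests  # Bonferroni-adjusted significance level
    kept = {}
    for event, pvalues in event_pvalues.items():
        methods = sorted(name for name, p in pvalues.items() if p < threshold)
        if len(methods) >= min_methods:
            kept[event] = methods
    return kept


if __name__ == "__main__":
    # Invented P values for two candidate events.
    events = {
        "event_1": {"RDP": 0.001, "MaxChi": 0.002, "3Seq": 0.2},
        "event_2": {"GENECONV": 0.0001},
    }
    print(supported_events(events, n_tests=10))
```

Requiring agreement between algorithms is a conservative choice: it reduces false positives at the cost of possibly missing events that only one method is able to detect.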

Selection

Patterns of selection were assessed with the relative rate of nonsynonymous (amino acid altering) nucleotide substitutions per nonsynonymous site to synonymous (silent) nucleotide substitutions per synonymous site (dN/dS). Nonsynonymous mutations are expected to accumulate faster than synonymous mutations under positive (diversifying) selection, resulting in dN/dS > 1. In contrast, negative (purifying) selection removes nonsynonymous mutations and, thus, synonymous mutations accumulate at a faster rate (dN/dS < 1). Finally, lack of significant deviations from dN/dS = 1 indicates no signature of selection, consistent with neutral evolution. Following recommendations by Kryazhimskiy and Plotkin (2008), substitution rates were estimated at the inter-specific levels, both within and across all three phylogenetic clades. Since selection estimates can be biased by the presence of recombinant sequences, which affect tree topologies used to infer substitution rates (Anisimova et al. 2003), the dN/dS ratios were computed using non-recombinant sequences only, except for estimates across all clades, which were computed both with and without recombinant sequences. Codon-specific signatures of pervasive (constant along the entire phylogeny) positive and negative selection were inferred using Bayesian (Fast Unconstrained Bayesian AppRoximation, FUBAR) and maximum-likelihood (fixed effect likelihood, FEL) approaches (Kosakovsky Pond and Frost 2005; Murrell et al. 2013). Also, episodic (under a proportion of branches) positive selection was assessed using Mixed Effects Model of Evolution (MEME) (Murrell et al. 2012). All analyses were run at the Datamonkey web server (Weaver et al. 2018) under default settings and input trees inferred from the sequence alignments. Residues with posterior probabilities > 0.95 (FUBAR) or statistical significance P < 0.05 (FEL, MEME) were considered to have enough support for selection signal. 
As the exact structure of MHC molecules, including the location of the peptide-binding groove, has not yet been resolved in our study species (and generally in birds), the dN/dS ratios were calculated for different sets of MHC class I exon 3 and class II exon 2 residues: (1) all residues; (2) 20 most positively selected residues; (3) putative peptide-binding residues of passerine birds, as inferred from the global analysis of selection at the avian MHC (Minias et al. 2018); and (4) human peptide-binding residues, as inferred from the crystallographic structure of MHC molecules (class I: Saper et al. 1991; class II: Brown et al. 1993).
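The decision rules used to interpret these analyses can be summarized in a minimal Python sketch. It only encodes the thresholds stated above (posterior probability > 0.95 for FUBAR, P < 0.05 for FEL and MEME) and the dN/dS ratio itself; the function names are hypothetical, and the point estimate of the ratio says nothing about statistical significance, which was assessed with the Datamonkey tools.

```python
def dn_ds_ratio(dn, ds):
    """Ratio of nonsynonymous to synonymous substitution rates."""
    if ds <= 0:
        raise ValueError("dS must be positive to compute the ratio")
    return dn / ds


def site_is_supported(method, value):
    """Apply the support threshold for one codon and one method.

    value is a posterior probability for FUBAR and a P value for FEL or MEME.
    """
    if method == "FUBAR":
        return value > 0.95
    if method in ("FEL", "MEME"):
        return value < 0.05
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    # Invented rates: a ratio above 1 is the pattern expected under
    # positive (diversifying) selection.
    print(dn_ds_ratio(0.5, 0.25))
    print(site_is_supported("FUBAR", 0.97), site_is_supported("FEL", 0.20))
```

Note that the direction of the inequality differs between methods, because FUBAR reports the probability that a site is under selection, whereas FEL and MEME report the probability of the data under the null hypothesis.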

Phylogenetic clustering of alleles

Phylogenetic relationships between MHC alleles were inferred using two approaches. First, we used the approximately maximum-likelihood approach implemented in FastTree v.2.1.5 software (Price et al. 2010), where initial topologies were inferred using a neighbour-joining approach and refined with subtree-pruning-regrafting, as well as minimum-evolution and maximum-likelihood nearest-neighbour interchanges. Local node support values were estimated based on Shimodaira–Hasegawa (1999) tests. Second, we used Bayesian inference from MrBayes v.3.2.6 software (Ronquist et al. 2012), which implements Markov chain Monte Carlo (MCMC) algorithms to approximate the posterior probabilities of trees (MCMC settings: 4 chains, 120,000 chain length, 20,000 burn-in period and 500 sampling frequency). While neither of these approaches is valid for ancestral state reconstruction, they are expected to provide reliable information on allele clustering patterns, although we acknowledge that the results may be affected by the recombination (gene conversion) mechanism typical for the MHC. Phylogenetic analyses were run separately for MHC class I and class II alleles. In all analyses, we used a general time-reversible model of nucleotide substitutions with a discrete Gamma distribution to account for varying rates of evolution at different sites, but for MHC class I we also set the proportion of invariable sites (as determined with model selection analysis in MEGA v6.06; Tamura et al. 2007). MHC sequences from the lesser kestrel Falco naumanni (GenBank Nos: EU120676 for class I, EU107738 for class II) were used as outgroups, as Falconiformes form a closely related ancestral clade for Passeriformes (Jetz et al. 2012).

Results

Illumina sequencing with different pairs of primers

After processing, the total depth of alleles varied between 2083 and 4142 reads per sample (amplicon), depending on the pair of primers (Table S2 in Appendix 1). We retrieved 439 (PP1_1) and 387 (PP1_2) MHC class I alleles, while at the MHC class II we retrieved 306 (PP2_1) and 372 (PP2_2) alleles. Reproducibility of allele identification was 92%, as assessed with technical replicates. No evidence was found for a correlation between read number and number of alleles detected for any of the runs (Pearson correlation coefficients, all P > 0.05). The number of non-functional alleles was relatively low (2.3–2.7% for class I; 0.8–4.2% for class II). The highest number of non-functional alleles was recorded in Fringillinae (five MHC class I and 11 MHC class II sequences). Much lower numbers (1–2) of non-functional alleles were recorded in several other species (A. flammea, C. carduelis, and P. pyrrhula at class I; C. coccothraustes, E. schoeniclus at class II), while in the remaining species no non-functional alleles were recorded. All non-functional alleles were discarded, resulting in 427 (PP1_1) vs. 377 (PP1_2) functional MHC class I alleles (337 overlapping between both primer pairs), and 289 (PP2_1) vs. 364 (PP2_2) functional MHC class II alleles (210 overlapping between both primer pairs).

The number of alleles detected per individual was much more repeatable between the pairs of primers at class I (R = 0.75, 95% CI: 0.65–0.83) than class II (R = 0.35, 95% CI: 0.13–0.53). A similar pattern was found for the number of gene copies estimated for each species based on the maximum number of putatively functional MHC alleles detected per individual (class I: R = 0.83, 95% CI: 0.46–0.95; class II: R = 0.57, 95% CI: −0.05 to 0.87). Higher repeatability for class I could be enhanced by relatively similar forward primers (88.2% vs. 40.0% sequence similarity for MHC class I and class II, respectively; Table 1). At class I, the average number of alleles detected per individual was 7.67 ± 0.43 for PP1_1 and 7.26 ± 0.30 for PP1_2, while data combined for both primer pairs gave 8.88 ± 0.43 alleles retrieved per individual. Large differences in the number of alleles per individual between primer pairs were detected in five species, where PP1_1 retrieved more alleles in E. citrinella (on average 2.63 ± 0.92 more alleles), F. coelebs (2.88 ± 0.55 alleles) and F. montifringilla (3.50 ± 0.42 alleles), while PP1_2 retrieved more alleles in E. schoeniclus (3.25 ± 0.62 alleles) and C. coccothraustes (3.13 ± 0.55 alleles) (Fig. S2A in Appendix 1). At the class II, the average number of alleles detected per individual was 5.05 ± 0.24 (PP2_1) and 6.31 ± 0.23 (PP2_2), but combination of data for both primer pairs resulted in an average of 8.33 ± 0.31 alleles per individual. The largest differences were recorded in Fringillinae, where PP2_2 retrieved on average 2.25 ± 0.96 (F. coelebs) and 4.75 ± 0.70 (F. montifringilla) more alleles per individual than PP2_1, whereas PP2_1 retrieved 2.00 ± 0.93 more alleles per individual in P. pyrrhula (Fig. S2B in Appendix 1). 
Any single primer pair underestimated the number of functional gene copies per species by an average of 0.86 ± 0.24 (MHC class I) and 1.45 ± 0.23 (MHC class II), which corresponds to an overall underestimation rate of 20.2% relative to data combined across both pairs of primers.
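The logic of estimating copy number from allele counts, and of measuring how much a single primer pair underestimates it, can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes that a diploid individual carries at most two alleles per locus, so the minimum number of loci is the maximum allele count per individual divided by two and rounded up. The genotypes and allele names are hypothetical.

```python
from math import ceil


def min_copies(n_alleles):
    """Minimum number of loci needed to carry n_alleles in a diploid
    individual (assumption: at most two alleles per locus)."""
    return ceil(n_alleles / 2)


def species_copy_number(allele_sets):
    """Estimate copy number for a species from per-individual allele sets,
    using the individual with the most alleles."""
    return min_copies(max(len(alleles) for alleles in allele_sets))


# Hypothetical genotypes of two individuals, one allele set per primer pair
pair1 = [{"a1", "a2", "a3"}, {"a1", "a4", "a5", "a6"}]
pair2 = [{"a2", "a3", "a7"}, {"a4", "a5", "a8", "a9"}]

# Combined data: union of alleles detected by either primer pair
combined = [x | y for x, y in zip(pair1, pair2)]

copies_pair1 = species_copy_number(pair1)
copies_pair2 = species_copy_number(pair2)
copies_combined = species_copy_number(combined)

# Underestimation by a single primer pair, in number of copies
underestimation = copies_combined - max(copies_pair1, copies_pair2)
print(copies_pair1, copies_pair2, copies_combined, underestimation)
```

In this toy case each primer pair alone suggests two copies, whereas the combined data require three, because each pair misses alleles that the other amplifies.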

Copy number variation

The total number of alleles retrieved within individuals (across both pairs of primers) was 2–20 (class I) and 4–18 (class II). The number of gene copies in each species ranged from two (P. pyrrhula) to ten (F. coelebs) at the MHC class I, and from three (S. spinus) to nine (E. citrinella) at the MHC class II (Fig. 1). Phylogenetic signal was much lower at MHC class I than class II (0.19 vs. 0.64). The evolution of copy number at both MHC classes was best explained by the BM model (Table 2). Ancestral states at the root of the phylogeny were estimated at intermediate values (5.75 class I copies and 6.03 class II copies according to the best fitting model, Table 2), suggesting that some taxa lost MHC copies, while others gained them, during the evolution of this passerine lineage. Ancestral states at the root node for each of the three clades were highest for Fringillinae (8.42 class I copies and 7.22 class II copies) and lowest for Carduelinae (4.99 class I copies and 5.32 class II copies). Evolutionary rate was higher for MHC class I than for class II copy number (Table 2).
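Model comparison by ΔAICC, as summarised in Table 2, follows the standard small-sample correction of the Akaike information criterion. The sketch below shows only the arithmetic: the log-likelihoods, parameter counts, number of species (`n_species = 12`) and the alternative model label ("OU") are hypothetical placeholders, not values from this study.

```python
def aicc(log_lik, k, n):
    """Small-sample corrected AIC for a model with k parameters
    fitted to n observations (here, n species)."""
    aic = 2 * k - 2 * log_lik
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def delta_aicc(scores):
    """Difference between each model's AICc and the best (lowest) AICc."""
    best = min(scores.values())
    return {model: score - best for model, score in scores.items()}


n_species = 12  # hypothetical number of tips in the phylogeny

# Hypothetical fits: (log-likelihood, number of parameters)
fits = {"BM": (-20.0, 2), "OU": (-19.5, 3)}

scores = {m: aicc(ll, k, n_species) for m, (ll, k) in fits.items()}
deltas = delta_aicc(scores)
print(deltas)
```

With few species the correction term penalises extra parameters heavily, so a slightly better likelihood for a more complex model may not be enough to displace the simpler one.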

Fig. 1: MHC gene copy numbers.

Ancestral character estimation of copy numbers at MHC class I (A) and class II (B) along the branches and nodes of the phylogeny in finches and buntings. Bars associated with each terminal node indicate raw number of gene copies (loci).

Table 2 Relative fit (ΔAICC) and parameter estimates of models describing the evolution of gene copy number at MHC class I and class II in finches and buntings.

Sequence polymorphism

Although MHC class I had higher allelic diversity than class II (427 vs. 289 alleles, as retrieved across all species with PP1_1 and PP2_1 primer pairs), sequence polymorphism at the nucleotide level was greater at the MHC class II. MHC class II exon 2 had a higher proportion of segregating sites (84.5%) and more mutations (n = 364) than MHC class I exon 3 (74.3% segregating sites; 313 mutations) (Table 3). Nucleotide diversity and amino acid distances (Grantham and Sandberg) were also higher at MHC class II than class I (Table 3). This pattern was primarily apparent in Emberizidae and Carduelinae, but not in Fringillinae (Table 3). Indel mutations were recorded within exons of both MHC classes. A single-codon deletion at position 61 of the MHC class I exon 3 was recorded in 54.9% of sequences across all species. Similarly, a single-codon deletion was recorded at position 74 of the MHC class II exon 2, but only in Emberizidae (E. citrinella and E. schoeniclus; 1.4% of all sequences). Also, a single-codon insertion after position 85 was recorded within the same exon in Carduelinae (A. flammea and Ch. chloris; 7.5% of all sequences).

Table 3 Polymorphism of MHC class I exon 3 and class II exon 2 within and across three passerine clades of finches and buntings (n = 8 individuals per species).

Recombination

Recombination signal was much stronger at the MHC class II when compared with class I. We identified 20 recombination events at the class II, while only six recombination events were identified at the class I. The number of breakpoints per 100 nt window was estimated at 14–26 for the MHC class II and 3–7 for MHC class I (Fig. 2). Proportion of recombinant sequences was also significantly higher for the MHC class II than class I (27.7 vs. 8.2%; G = 23.97, df = 1, P < 0.001). The average number of recombinant sequences per recombination event was similar between the MHC classes (5.83 ± 2.68 vs. 4.00 ± 1.10; t = 0.74, df = 24, P = 0.47). Most recombination events were recorded within Carduelinae (66.7% at class I and 90% at class II), while MHC sequences from other clades showed much weaker evidence for recombination, although this may be due to differences in the number of species genotyped in each clade.
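A comparison of proportions such as the one above is commonly tested with a G-test of independence on a 2 × 2 table (recombinant vs. non-recombinant sequences in each MHC class). The following is a minimal sketch of that test using only the standard library; the counts are illustrative placeholders and are not the data from this study, and the function name `g_test_2x2` is hypothetical.

```python
from math import erfc, log, sqrt


def g_test_2x2(a, b, c, d):
    """G-test of independence for the table [[a, b], [c, d]].

    Returns (G, p) with df = 1. For one degree of freedom the
    chi-square upper tail equals erfc(sqrt(G / 2)).
    """
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]

    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = row_totals[i] * col_totals[j] / n
                g += o * log(o / expected)
    g *= 2
    return g, erfc(sqrt(g / 2))


# Illustrative counts only: recombinant / non-recombinant sequences
g_stat, p_value = g_test_2x2(30, 70, 10, 90)
print(f"G = {g_stat:.2f}, P = {p_value:.4f}")
```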

Fig. 2: Recombination at the MHC.

Recombination signal at the MHC class I exon 3 (A) and class II exon 2 (B), as assessed with the number of breakpoints detectable within a 100 nucleotide (nt) window (black line). Dark and light grey areas indicate the 95 and 99% confidence intervals for the expected degrees of breakpoint clustering in the absence of recombination hot and cold spots (as assessed with the local hot/cold-spot test). Positions in alignment where black line emerges above the grey areas indicate a recombination hot spot, while positions where it drops below the grey areas indicate a recombination cold spot.

Selection

Signature of positive selection was consistently stronger at the MHC class I than class II, both within and across all three clades. The number of residues under pervasive positive selection in different clades was 10–17 or 11–16 at class I (11.4%–19.3% of all residues), as recognized with Bayesian (FUBAR) or maximum-likelihood (FEL) algorithms, respectively (Table 4). At class II, the number of residues under pervasive positive selection was estimated at 2–8 (FUBAR) or 1–10 (FEL) (1.4–13.7% of all residues). Across all clades, the number of residues under pervasive selection was estimated at 18 (20.5%) and 9–10 (12.3%–13.7%) at class I and class II, respectively (Table 4 and Fig. 3). The number of residues under episodic positive selection was higher at the MHC class I in two clades: Emberizidae (20 vs. 12; 22.7 vs. 16.4% of all residues) and Fringillinae (20 vs. 2; 22.7 vs. 2.7% of all residues), while a contrasting pattern was found for Carduelinae (15 vs. 21 residues at class I and class II, respectively; 17.0 vs. 28.8% of all residues) (Table 4). The dN/dS ratios at the most positively selected residues and at the passerine PBR were consistently higher at the MHC class I than class II, both within and across clades (Table 4), providing further support for a stronger signature of positive selection at the MHC class I. Identification of positively selected residues (across clades) was highly repeatable between the FUBAR and FEL algorithms (class I: R = 0.93, 95% CI: 0.89–0.95; MHC class II: R = 0.88, 95% CI: 0.81–0.92). Positions of positively selected residues identified across all three clades were moderately repeatable with positions of the passerine PBR (class I: R = 0.47, 95% CI: 0.28–0.61; MHC class II: R = 0.51, 95% CI: 0.32–0.66) and poorly repeatable with the human PBR (class I: R = 0.20, 95% CI: −0.02 to 0.39; MHC class II: R = 0.09, 95% CI: −0.13 to 0.30) (Fig. 3).
We also found strong repeatability in selection signal (0.60 < R < 0.97, all P < 0.001; Table S3) between two independent sets of primers (PP1_1 and PP1_2 for MHC class I; PP2_1 and PP2_2 for MHC class II), which indicated that null alleles were unlikely to introduce significant biases.

Table 4 Selection signature at the MHC class I exon 3 (264 bp fragment) and class II exon 2 (222 bp fragment) within and across three passerine clades of finches and buntings.

Fig. 3: Selection signature at the MHC.

Alignments of amino acid sequences of MHC class I exon 3 (A) and MHC class II exon 2 (B) in finches and buntings (one sequence shown per species). Dots indicate amino acids identical with the consensus sequence (generated from all available sequences). Positively selected residues are marked in red, while negatively selected residues are marked in blue. Dark and light colours indicate selection supported with both or either Bayesian (FUBAR) and maximum-likelihood (FEL) approaches, respectively (as inferred for non-recombinant sequences only). Residues of the putative peptide-binding region (PBR) of passerine birds (based on the global analysis of selection by Minias et al. 2018) and humans (based on the crystallographic structure of MHC molecules by Saper et al. 1991 and Brown et al. 1993) are indicated with large dots (•) at the top of each alignment. Variation in the selection parameter (dN/dS; FUBAR analysis) is shown above the alignments.

Phylogenetic clustering

Neither MHC class I nor II alleles clustered by family, subfamily (clade) or species, although phylogenetic pattern differed between the two classes. At class I, most alleles of Fringillinae were interspersed with Emberizidae and Carduelinae sequences across the entire tree (Fig. 4 and Fig. S3 in Appendix 1), providing support for strong trans-species polymorphism. At class II, there was a small cluster close to the root containing sequences from all three clades (E. citrinella, E. calandra, C. coccothraustes, P. pyrrhula and F. coelebs) (Fig. 4 and Fig. S3 in Appendix 1), possibly representing an old evolutionary lineage (locus) of MHC class II that was retained in several species long beyond divergence. This cluster contained sequences of high phylogenetic similarity at the inter-specific level, including the only case of MHC allele sharing between families (the same sequence found in E. calandra and C. coccothraustes). All other cases of between-species allele sharing were recorded within the clades of Fringillinae (one class I and one class II alleles shared between F. coelebs and F. montifringilla) and Carduelinae (six class I alleles shared between A. flammea, C. carduelis, and S. spinus). Apart from the small cluster at the root, all MHC class II alleles of Fringillinae formed a separate cluster, while sequences of Emberizidae and Carduelinae were interspersed across the remaining part of the tree (Fig. 4 and Fig. S3 in Appendix 1).

Fig. 4: Phylogenetic clustering of MHC alleles.

Consensus maximum-likelihood topology for MHC class I exon 3 (A) and MHC class II exon 2 (B) in Emberizidae (red), Carduelinae (orange), and Fringillinae (green). Lesser kestrel was used as an outgroup to root the trees (black). Local bootstrap support is provided for key nodes (only values > 0.5 presented). Scale bar indicates genetic distance (nucleotide substitutions per site).

Discussion

Our analysis of the MHC in two sister clades of non-model passerine birds, finches and buntings, revealed contrasting evolutionary patterns between class I and class II genes, which are responsible for recognition of different groups of pathogens (intra-cellular and extra-cellular pathogens, respectively). Most importantly, MHC class I genes were under stronger pervasive (but not necessarily episodic) positive selection than MHC class II and this might have been a key evolutionary force that generated and maintained higher allele diversity (allele numbers) at the MHC class I in our study species. Despite weaker selection and lower allele diversity, MHC class II genes showed greater allele divergence (in terms of nucleotide diversity) and had much greater recombination (gene conversion) signal. Contrasting selection regimes did not generate any apparent variation in copy numbers between MHC class I and class II, which both evolved via fluctuating selection and drift (consistent with BM evolution; Revell et al. 2008). However, the rate of copy number evolution was higher at the MHC class I than class II, resulting in higher variation of MHC class I copy numbers among closely related species (lower phylogenetic signal).

Copy number variation

The first aim of our study was to reconstruct the copy number evolution at the MHC in finches and buntings. The highest numbers of gene copies (≥8 class I and ≥7 class II) were recorded in E. citrinella (Emberizidae) and the two Fringilla species (Fringillinae), suggesting that duplication rate in these taxa could have been enhanced by elevated selective pressure from both intra-cellular and extra-cellular pathogens. Both MHC class I and class II were found to have evolved consistently with the BM model, indicating a key role of fluctuating selection in the copy number evolution. Previous analyses in passerines suggested that MHC class I copy numbers might have evolved consistently with a BM model (O’Connor et al. 2016), but similar analyses of passerine MHC class II were lacking. On the other hand, a large-scale comparative analysis of MHC genes across the entire avian tree (passerines and non-passerines combined) provided support for highly contrasting patterns of copy number evolution between MHC classes I and II (Minias et al. 2019). The results of this study suggest that broad generalizations may not reflect well the patterns at the finer phylogenetic scale, as we showed that similar mechanisms were responsible for the evolution of gene copy numbers at both MHC classes I and II in our study system (Fringillidae and Emberizidae passerines). Despite this similarity, evolutionary rate was higher at class I than class II and there was greater variation in MHC class I copy numbers among closely related species (lower phylogenetic signal), which may suggest that selective regimes at class I are more evolutionarily labile and species specific. To quantify the number of MHC gene copies, we followed an approach by O’Connor et al. (2016), who used two independent pairs of primers to determine the total number of alleles recorded in each individual.
Here, we confirmed that this approach can substantially increase reliability of copy number estimates, as using a single pair of primers would underestimate copy numbers by, on average, ca. 15% at class I and 25% at class II across all species, because of allele-specific amplification by each primer pair (27.8% of MHC class I and 52.5% of MHC class II alleles were amplified in a primer-specific manner). Taking this into account, we highly recommend using multiple pairs of primers in future research on copy number variation at the MHC in non-model vertebrate species.

Selection patterns

A stronger signature of positive selection at the MHC class I (when compared with class II) is consistent with expectations from the recent global analysis of selection at the avian MHC (Minias et al. 2018). Although Minias et al. (2018) recorded a higher number of positively selected sites and higher nucleotide substitution rates (dN/dS ratios) at the class I exon 3 than MHC class II exon 2 across over 20 families of passerine birds, our study species of Fringillidae and Emberizidae were virtually unrepresented in these analyses (only two MHC class II alleles from two species included). More importantly, previous analyses were based on highly unbalanced GenBank data (in terms of sample sizes and species composition) that were generated using different methodological approaches, which could possibly introduce biases in the comparisons between the two MHC classes. Here, we attempted a rigorous comparison of selection patterns between MHC class I and class II across multiple passerine species, based on uniform methodology and balanced sample sizes. The results of our selection analysis in finches and buntings revealed stronger selection at MHC class I than class II, but the magnitude of these differences varied considerably between clades. The most apparent differences in pervasive positive selection were recorded in Fringillinae (16–17 vs. 1–2 positively selected residues at MHC classes I and II, respectively), while Carduelinae showed the weakest differences (10–11 vs. 7–10 positively selected residues at MHC classes I and II, respectively). The analysis of dN/dS ratio provided a very similar picture, suggesting that selective regimes at the two MHC classes may be specific to particular avian lineages.
So far, comparative analyses of selection patterns at the MHC class I and class II in non-model passerine birds have been scarce and the only exception that we are aware of is the house sparrow Passer domesticus, in which a stronger signature of positive selection was found at the MHC class II than at class I (Borg et al. 2011), in stark contrast to our findings for finches and buntings. MHC research in non-passerine birds from various orders also seems to provide clear support for stronger positive selection at the MHC class II. For example, a higher proportion of positively selected residues was recorded at MHC class II exon 2 (13%) than MHC class I exon 3 (7%) across six flamingo species (order Phoenicopteriformes) (Gillingham et al. 2016). Even greater differences were recorded in five species of prairie grouse (order Galliformes), with 3.4% and 17.7% of residues being under positive selection at the MHC class I and class II, respectively (Minias et al. 2016). This corresponded with a ca. threefold difference in the dN/dS ratio, which was higher at the MHC class II (Minias et al. 2016). Variation in the strength of selection signal between MHC class I and class II is traditionally attributed to differences in selective pressures from intra-cellular and extra-cellular pathogens, and although this may be speculative, our findings seem to suggest that intra-cellular pathogens could possibly be considered as the major selective agent for finches and buntings. In general, all our study species are small (mean body mass of 27.6 ± 4.3 g) and are expected to harbour less diverse faunas of extra-cellular parasites than much larger non-passerine species (e.g., Morand and Poulin 1998), which may explain the greater evolutionary importance of MHC class I in these taxa.

Polymorphism and phylogenetic clustering

As expected, stronger positive selection at the MHC class I resulted in higher allele diversity at these genes (427 vs. 289 alleles at classes I and II, respectively). However, although class II alleles were less diverse in terms of their numbers, they showed greater divergence at the nucleotide level, with almost 60% higher nucleotide diversity compared to the MHC class I alleles (0.219 vs. 0.138 across all species). This pattern may indicate that duplications of MHC class II genes occurred earlier in the evolutionary history of these lineages than duplications at the MHC class I, consistent with the lower evolutionary rate and higher phylogenetic signal in MHC class II copy numbers, and thus alleles from different class II loci might have had a longer time to accumulate independent mutations and diverge. This scenario seems to be supported by phylogenetic inference, providing evidence for the presence of a small MHC class II cluster, where alleles from all three clades clustered together. This could possibly represent an old evolutionary lineage of MHC class II alleles that was retained beyond species divergence. At the same time, all the other class II alleles of Fringillinae formed a distinct cluster that was well separated from both Carduelinae and Emberizidae sequences, also suggesting that MHC class II genes may show a low level of orthology between these groups. This phylogenetic pattern was clearly inconsistent with the evolutionary history of our study taxa, where the Emberizidae lineage first separated from the Carduelinae–Fringillinae lineage (ca. 26 million years ago; Jetz et al. 2012), and thus was expected to show the strongest genetic divergence. Evolutionary relationships between taxa were even less apparent at the MHC class I, where alleles from all clades were highly interspersed across the entire tree, providing evidence for strong trans-species polymorphism.
The presence of trans-species polymorphism was also supported by several cases of allele sharing between species, which were more frequent at MHC class I than class II. The pattern of trans-species polymorphism and allele sharing among species at the MHC was previously described in other avian lineages (Kikkawa et al. 2009; Eimes et al. 2015) and other vertebrates (e.g., Cutrera and Lacey 2007), and it is hypothesized to occur under strong balancing selection, which may enhance maintaining favourable alleles through long evolutionary times (Takahata 1990).

Recombination signal

Higher nucleotide divergence of MHC class II alleles was accompanied by higher recombination (gene conversion) rate at these genes. This pattern was rather unexpected, as gene conversion does not generate de novo variation at the MHC (and thus it may not necessarily contribute to sequence divergence), but it generates new haplotypes by shuffling existing variation within and between loci (and thus it should contribute to allele richness; Högstrand and Böhme 1999). In fact, gene conversion is recognized as one of the primary mechanisms that maintain allele diversity at the MHC and it is known to be particularly important for regenerating MHC haplotype variation in bottlenecked populations, as the gene conversion rate may greatly exceed the rate of point mutations (Spurgin et al. 2011). Surprisingly, the recombination rate in our study system (higher at class II) followed a pattern that clearly contrasted with allele diversity (higher at class I). As much as 27.7% of MHC class II alleles were identified to show a signal of recombination (compared to 8.2% at class I) and we detected 14–26 recombination breakpoints per 100 nt at the MHC class II (compared to 3–7 breakpoints at class I). Since we could not assign alleles to loci, it is impossible to conclude whether this higher recombination rate at the MHC class II was primarily driven by within- or between-locus DNA transfer. Higher between-locus gene conversion may lead to the concerted evolution of MHC loci, where MHC sequences derived from various loci become homologous and their orthology can be masked (Wittzell et al. 1999). In our study, MHC class II alleles clustered more by taxon (family/subfamily) when compared with MHC class I, which may suggest a higher rate of concerted evolution at MHC class II than class I in finches and buntings. In birds, concerted evolution has usually been detected at MHC class II (e.g., Miller and Lambert 2004; Li et al. 2011; Gillingham et al. 2016; Goebel et al. 2017), but this was primarily attributed to scarcity of similar research on MHC class I (Promerová et al. 2009). Nevertheless, taxon-specific analyses of scarlet rosefinches Carpodacus erythrinus and blue tits Cyanistes caeruleus revealed patterns consistent with concerted evolution at the MHC class I (Promerová et al. 2009; Wutzler et al. 2012). Our direct comparison of MHC classes I and II suggests that stronger concerted evolution at class II may reflect different evolutionary trajectories of the two classes rather than being a mere artefact resulting from publication bias.

Methodological limitations

Rapid development of next-generation sequencing technology within the last two decades sparked an unprecedented advancement in research on multi-gene families, such as the MHC. Although short-read sequencing can provide an easy way to obtain extensive information on relatively short fragments of targeted genes in a variety of non-model species, this approach suffers from major drawbacks, as it does not provide a holistic view of the molecular evolution of entire genes and usually cannot assign allele fragments to particular loci. Previous research using longer sequences (including intron regions) or implementing locus-specific primers provided novel and invaluable insights into gene orthology and long-term evolutionary dynamics of the avian MHC (e.g., Burri et al. 2008, 2010; Cloutier et al. 2011). Third-generation sequencing, which allows reliable assembly of long regions with repetitive elements, promises the next important step in MHC research, providing information on the arrangement of the MHC genes at the genomic scale (O’Connor et al. 2019; He et al. 2021). However, despite its inherent limitations, we believe that research based on short-read sequencing can still provide a valuable contribution to our understanding of MHC evolution when used in a phylogenetically robust (multi-species) context, which is often unattainable by other, much more laborious and high-cost approaches.

Conclusions

In conclusion, our study constitutes one of the few existing evolutionary comparisons of avian MHC class I and class II genes in a multi-species approach. So far, research of this kind has primarily focused on non-passerine avian lineages, which generally have a different architecture of the MHC region from that of passerines (e.g., much lower copy numbers, Minias et al. 2019). Our molecular analyses of the MHC in two diverse families of Old World passerines, finches and buntings, provide support for contrasting evolutionary trajectories of class I and class II genes, where stronger positive selection and faster evolutionary rate of gene copy numbers were recorded at MHC class I, whereas MHC class II showed a stronger gene conversion (recombination) signal, greater sequence polymorphism and stronger signal of concerted evolution. While our results suggest that intra-cellular and extra-cellular pathogens can exert different evolutionary effects on their hosts, at the same time they reveal fine-scale (between families and subfamilies) variation in the mechanisms that drive the evolution of passerine MHC classes I and II. We strongly recommend re-focusing MHC research from single-species and single-class approaches towards multi-species analyses of both MHC classes, which may substantially increase our understanding of the evolution of these key immune genes in different vertebrate lineages.