Introduction

The major histocompatibility complex (MHC) is the most polymorphic and gene-dense region in the vertebrate genome (Kelley et al., 2005). The locus contains numerous genes involved in immune responses including genes for cytokines, complement and antigen-processing proteins, and the highly polymorphic glycoproteins responsible for presenting endogenous and exogenous antigens (MHC class I and class II, respectively) to T cells. In addition to class I and class II genes, clusters of other related genes can be found within the MHC, such as tripartite motif loci (TRIM), the butyrophilin loci (BTN) and in birds, the B-G loci (Kaufman et al., 1999a).

Variation in MHC genes is vital for generation of a diverse array of antigenic peptides for T-cell presentation. The rapidly evolving pathogen repertoire requires multiple MHC loci with highly polymorphic molecules necessary for presentation of pathogen-derived peptides to effector T cells. For example, single loci in the human MHC have hundreds of alleles, and allelic combinations may generate tens of thousands of potential haplotypes (Marsh et al., 2005). In contrast, domestic poultry show more constrained diversity in their MHC (Hosomichi et al., 2008; Chaves et al., 2010), although a recent study of domestic chickens identified nearly 100 MHC-B haplotypes (Fulton et al., 2006). MHC haplotypes strongly influence immune response and disease resistance in chickens; for example, studies have shown that they influence the immune response to Rous sarcoma virus (Aeed et al., 1993; White et al., 1994; Taylor, 2004).

In the context of the MHC, the best-studied bird species are those of agricultural significance; MHC regions have been examined in chicken, quail, duck and turkey (Kaufman et al., 1999b; Moon et al., 2005; Shiina et al., 2004, 2007; Chaves et al., 2009b, 2011). Sequencing the MHC locus of the chicken, quail and turkey (MHC-B) identified the classical MHC class I and II genes within an ∼50 kb distance (Shiina et al., 2004, 2007; Chaves et al., 2009b). Commercial selection has clearly reduced allelic diversity in chicken lines (Muir et al., 2008). A sampling of genetic variation within commercial turkeys shows that some MHC haplotypes occur at high frequency within breeder lines (Chaves et al., 2010).

Population level studies of MHC variation in wild birds have focused mainly on two technologies to survey variation at class I and class II loci. Southern hybridization methods have been used to estimate the number of genes and alleles/locus for the class I and II loci thereby providing a rough estimate of overall genetic variation at the MHC (Emara et al., 1992; Westerdahl et al., 2000; Freeman-Gallant et al., 2002; Richardson and Westerdahl, 2003; Alcaide et al., 2008). PCR-based techniques such as reference strand-mediated conformational analysis, single strand conformation polymorphism and denaturing gradient gel electrophoresis (DGGE) are also effective in estimating the number of alleles, especially when locus-specific primers are available (Ramon et al., 1998; Goto et al., 2002; Westerdahl et al., 2004; Knapp, 2005; Alcaide et al., 2010). Studies have generally identified 2–4 class IIB loci in birds and provide evidence for increased MHC diversity in outbreeding populations. These methods, however, can lead to overestimation in the number of loci due mainly to problems of distinguishing loci and alleles (Jacob et al., 2000; Miller et al., 2004). Recently, Babik et al. (2009) tested the use of high-throughput DNA sequencing technology (Roche 454 sequencing) as an alternative for MHC genotyping. This approach used degenerate primers based on previously sequenced loci/alleles to simultaneously amplify multiple loci by PCR. Primer tags allowed for segregation of sequencing reads to allow for digital genotyping of multiple individuals in a single sequencing pool. Sequencing and PCR errors remain problematic, but bioinformatic methods can be used to help identify low-frequency artifact alleles.

Amplifying, cloning and sequencing the highly variable MHC domains of class I and class II loci is labor intensive, but can provide the best estimate of the number of loci present in the genome. Sequence surveys typically focus on the peptide-binding region (PBR) encoded in exons 2 and 3 of class I and exon 2 of the class IIB genes. The PBR is the region that determines which antigens will be presented to T lymphocytes and sequence differences are indicative of functionally different alleles. When combined with mRNA sequencing, expressed alleles can be matched to their genomic counterpart, improving allele designations. Studies that characterize only the PBR, however, risk concluding that variation observed at the DNA level directly translates into significant differences in disease resistance/susceptibility.

Studies that focus only on the class I and class II loci also overlook the potential importance of variation at other genetically linked MHC genes. As comparative genome sequences become increasingly available, it is possible to expand MHC surveys beyond the multilocus class I and class II genes to include tightly linked adjacent loci, potentially avoiding the difficulties associated with genotyping these loci. The goals of this project were twofold: first, to demonstrate the ability of locus-wide single-nucleotide polymorphism (SNP) genotyping to identify MHC haplotypes and second, to quantify variation within the MHC-B locus of wild turkeys as compared with commercial birds. The turkey (Meleagris gallopavo), with six recognized subspecies, is the only domesticated agricultural species native to North America (Dickson, 1992). Commercial breeds are believed to be derived primarily from the south Mexican subspecies (M. g. gallopavo) via reintroduction to the New World from Europe (Reed, 2009; Speller et al., 2010). Results of this study show the benefit of an assembled MHC haplotype in investigating variation and confirm that MHC diversity in wild birds is significantly greater than in commercial lines.

Materials and methods

Genomic sequencing

A panel of 40 individuals from three subspecies (Eastern, Merriam's and Rio Grande) of turkey was examined in this study (Supplementary Table S1). The Eastern subspecies (M. g. silvestris) has the largest geographical distribution, occurring from the east coast through the Great Plains; Merriam's (M. g. merriami) is located in parts of Colorado, New Mexico and Arizona; and the Rio Grande subspecies (M. g. intermedia) is present in most of Texas and parts of northern Mexico. DNA samples came from 11 different locations and comprised a subset of the samples examined by Mock et al. (2002) and Latch et al. (2002), plus a hunter-harvested bird from Winona, MN, USA. Samples from the populations studied by Mock et al. (2002) were from relict, native populations with presumably higher sequence diversity. Individuals were sequenced at nine interspersed locations across the MHC-B region (Figure 1, Supplementary Table S2) as previously described (Chaves et al., 2010). PCR products were purified using a MinElute PCR Purification Kit (Qiagen, Inc., Valencia, CA, USA) and sequenced with an automated ABI Sequencer (Applied Biosystems, Carlsbad, CA, USA). Sequence data were manually analyzed using Sequencher software (Gene Codes, Corp., Ann Arbor, MI, USA) and single-nucleotide variants (SNVs), comprising SNPs and deletion/insertion polymorphisms (DIPs), were recorded.

Figure 1

The turkey MHC-B locus and position of amplicons (arrows) sequenced for haplotype analysis and genetic mapping. Positions of primers within the reference sequence are given in Supplementary Table S2.

Haplotype identification and phylogenetic analysis

Polymorphisms were analyzed using Arlequin, PHASE and Haploview software (Schneider et al., 2000; Stephens and Donnelly, 2003; Barrett et al., 2005) to quantify haplotype frequency and diversity within the 40 sequenced birds. Analysis was performed using 45 SNVs (42 SNPs and 3 DIPs) with minor allele frequencies (MAFs) of 0.2 or greater that were successfully genotyped on all 40 individuals. An additional haplotype inference analysis included samples from a previous study of commercial turkeys (Chaves et al., 2010) using SNVs of high MAF common among the two groups. Analysis of haplotype relationships was performed using ClustalW (Larkin et al., 2007) and SplitsTree4 (Huson and Bryant, 2006).
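
The marker-selection step described above (retain SNVs with a MAF of 0.2 or greater that were genotyped in every bird) can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the genotype encoding (a tuple of two allele characters, with None for a failed call), the function names and the toy data are all assumptions made for the example.

```python
from collections import Counter

def minor_allele_frequency(genotypes):
    """Return the MAF of one biallelic locus, or None if any call is missing.

    `genotypes` is a list of diploid calls such as ("A", "G"); a failed
    call is represented by None (hypothetical encoding for this sketch).
    """
    if any(g is None for g in genotypes):
        return None
    counts = Counter(allele for g in genotypes for allele in g)
    if len(counts) < 2:
        return 0.0  # monomorphic locus
    return min(counts.values()) / sum(counts.values())

def select_markers(loci, threshold=0.2):
    """Keep loci genotyped in all individuals with MAF >= threshold."""
    selected = {}
    for name, genotypes in loci.items():
        maf = minor_allele_frequency(genotypes)
        if maf is not None and maf >= threshold:
            selected[name] = maf
    return selected

# Toy data for four birds (not taken from the study).
toy_loci = {
    "snv_common": [("A", "G"), ("A", "A"), ("G", "G"), ("A", "G")],  # MAF 0.5
    "snv_rare": [("C", "C"), ("C", "C"), ("C", "T"), ("C", "C")],    # MAF 0.125
    "snv_failed": [("T", "T"), None, ("T", "C"), ("C", "C")],        # missing call
}
print(select_markers(toy_loci))
```

Only the first toy locus passes both filters; the second falls below the frequency threshold and the third has an incomplete genotype set.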

Class IIB DGGE and sequencing

To compare variation in one of the MHC-B multigene families among individuals with different SNP haplotypes, the class IIB genes of the B-locus were examined by DGGE. For DGGE, primers (cIIB_DGGE_F: 5′-AAAACCCGGAGCCCTTGT-3′ and cIIB_DGGE_R: 5′-ACTCACCGCTCCTCTGCAC-3′) were designed from aligned sequences (Chaves et al., 2010) to simultaneously amplify the highly variable PBR (exon 2) from the three class IIB genes of the B-locus. The designed primers are specific for the MHC-B loci and do not amplify the other known class IIB-like genes that reside outside of the B-locus. Polymerase chain reactions for DGGE analysis were optimized with genomic DNA template using Taq Mastermix (Promega, Corp., Madison, WI, USA) supplemented with 1 × Q solution (Qiagen). Amplifications were performed for 35 cycles with 58 °C annealing temperature and 30 s extension times. Control birds and clone constructs with known class IIB haplotypes were used to help quantify the differences observed in the wild turkeys.

DGGE was performed using the Dcode Universal Mutation Detection System (Bio-Rad, Hercules, CA, USA). Optimal conditions for examining the class IIB genes (25–65% urea/formamide gradient in 6% acrylamide, 1 × Tris/acetic acid/EDTA buffer, at 130 V and 60 °C for 4 h) were determined using perpendicular DGGE and a time-series analysis. These conditions were used in parallel denaturing gels to compare class IIB amplicons between individuals. PCR products were denatured at 95 °C for 5 min, incubated at 65 °C for 1 h and then allowed to slowly cool to room temperature (RT) before addition of loading dye. Gels were visualized by staining with ethidium bromide.

On the basis of the DGGE results, four birds were selected for sequencing of the class IIB DGGE PCR products to verify the amplification of multiple loci. For consistency, the same PCR products used for DGGE were cloned using a pDrive Cloning Kit (Qiagen), transformed into DH5α cells (Invitrogen, Carlsbad, CA, USA), and over 20 purified plasmids were sequenced per individual. In addition, locus-specific primers (Chaves et al., 2009b) were used to individually sequence portions of the three known MHC-B class IIB genes from the single wild bird collected from Minnesota. Sequences of exon 2 were translated to identify putative PBR alleles.

Results

MHC-B polymorphisms

Over 9 kb of the MHC-B region was sequenced on 40 wild turkeys from across North America. A total of 238 SNVs (Appendix) were identified with MAF ranging between 0.01 and 0.5 (average 0.15) (Figure 2). In all, 37% of the loci had a MAF⩾0.2. The frequency of SNVs in this region (all polymorphisms), 1/40 bp, is higher than the 1/200 bp found in other regions of the turkey genome (Reed et al., 2006) and the 1/70 bp identified within the MHC-B in commercial breeder lines (Chaves et al., 2010). Most SNPs were transition substitutions and located in introns (Table 1). Of the 45 SNPs located in coding sequence, 18 represent non-synonymous substitutions.
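
The two summary statistics in this paragraph, variant density and the transition/transversion split, are simple to compute. The sketch below assumes a sequenced length of exactly 9,000 bp (the text states only "over 9 kb"), and the list of substitutions is invented for illustration; the function names are likewise hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_type(ref, alt):
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    # Transitions stay within a chemical class (purine<->purine or
    # pyrimidine<->pyrimidine); transversions cross classes.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

def bases_per_variant(region_length_bp, n_variants):
    """Average spacing between variants, in bp per variant."""
    return region_length_bp / n_variants

# 238 SNVs are reported; 9,000 bp is an assumed lower bound on the length.
spacing = bases_per_variant(9_000, 238)
print(f"about 1 SNV per {spacing:.0f} bp")

# Invented substitutions, for illustration only.
toy_snps = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T"), ("T", "C")]
types = [substitution_type(ref, alt) for ref, alt in toy_snps]
print(types.count("transition"), "transitions,",
      types.count("transversion"), "transversions")
```

With the assumed 9,000 bp the spacing comes to roughly 38 bp per variant, in line with the reported figure of about 1/40 bp.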

Figure 2

MAFs of SNPs and DIPs identified in the 40 wild turkeys sequenced across the MHC-B locus.

Table 1 Characterization of SNPs identified within wild turkey subspecies

In addition to SNPs, nine DIPs (deletion–insertion) were present in the studied amplicons. Included in these is a 57 bp biallelic DIP in the first intron of Bzfp2 not previously identified in commercial populations and a large polymorphic repeat element present within intron 6 of LAAO. Examination of this locus by electrophoresis of the PCR amplicons on 3% agarose and 5% denaturing acrylamide gels found five alleles at this locus (one more than identified in commercial breeder lines, Chaves et al., 2010) ranging in size from 180 to ∼350 bp.

Of the 238 SNVs, 112 (109 SNPs and 3 DIPs) were not previously identified in commercial turkeys (Chaves et al., 2010). The novel SNPs had MAF ranging from 0.01 to 0.49 with an average of 0.07. Sixteen SNPs previously found in commercial birds were not present in the wild turkey data set. However, some of these may have been missed through technical failures. Two tri-allelic SNPs were identified at positions 56210 (H) and 93867 (H). Allele frequencies at these loci ranged from 0.038 to 0.875 and averaged 0.25. SNPs at these loci were excluded from further analysis. The TRIM39.1 amplicon showed poor amplification and sequencing success. This is perhaps due to null alleles caused by nucleotide variations within the primer-binding site(s) within wild birds. This locus was not included in haplotype inference. Finally, none of the polymorphisms in the TAP1 amplicon had a MAF ⩾0.1.

Haplotype reconstruction

Haplotype inference was conducted using 45 polymorphic loci that displayed varying allele representation among the subspecies (Figure 3) and that had MAF⩾0.2 (Appendix). Initial analysis combining all individuals identified four homozygotes (MCS6, MCS7, ECL10 and MINN). Of the 40 individuals examined, 70 potential haplotypes were inferred by PHASE analysis with the majority of individuals possessing at least one unique haplotype.
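
Individuals that are homozygous at every marker, such as the four birds noted above, carry two copies of a single haplotype that can be read directly from the genotypes without statistical phasing. The sketch below shows only that check; it does not reproduce the PHASE algorithm, and the genotype encoding and sample names are hypothetical.

```python
def known_haplotypes(genotypes):
    """Return the haplotype pair if phase is unambiguous, else None.

    `genotypes` is a list of (allele1, allele2) tuples, one per marker.
    Phase is unambiguous when at most one marker is heterozygous.
    """
    het_sites = [i for i, (a, b) in enumerate(genotypes) if a != b]
    if len(het_sites) > 1:
        # With n heterozygous sites there are 2**(n-1) candidate
        # haplotype pairs, so statistical inference is needed.
        return None
    hap1 = "".join(a for a, _ in genotypes)
    hap2 = "".join(b for _, b in genotypes)
    return hap1, hap2

# Hypothetical birds typed at four markers.
birds = {
    "homozygote": [("A", "A"), ("C", "C"), ("G", "G"), ("T", "T")],
    "single_het": [("A", "A"), ("C", "T"), ("G", "G"), ("T", "T")],
    "double_het": [("A", "G"), ("C", "T"), ("G", "G"), ("T", "T")],
}
for name, calls in birds.items():
    print(name, known_haplotypes(calls))
```

Haplotypes resolved this way can serve as fixed reference points when evaluating the haplotypes inferred for multiply heterozygous birds.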

Figure 3

The limited extent of LD across the MHC-B region identified in wild turkeys. D’ values (×100) are given for all pairwise comparisons with D’ <1.0. Values are not given for blocks in complete LD (D’=1.0). Block colors correspond to D’/LOD, where D’ <1 and LOD <2 is white, D’ <1 and LOD ⩾2 is shades of pink/red, D’=1 and LOD <2 is grey, and D’=1 and LOD ⩾2 is bright red.

A second haplotype analysis of 37 markers with MAF ⩾0.2 incorporated genotypes from a panel of 52 commercial turkeys (Chaves et al., 2010). This analysis identified a total of 99 haplotypes, of which the wild turkeys possessed 66 (Supplementary Table S3). Posterior probabilities for haplotype allele assignment at a given SNV position ranged from 0.5 (no support) to 1.0 (complete linkage disequilibrium (LD)) within the sampled individuals. The two rounds of PHASE inference had similar posterior probabilities when averaged per locus. Wild turkey haplotypes differed from each other at 1–26 SNPs with an average of 15 differences (s.d.=5.5). Twelve cases of haplotype sharing were observed among the wild birds (Supplementary Table S3). Only a single haplotype was shared between the commercial and wild birds; EOM6 had a haplotype in common with Nici (NTWF06) the only fully sequenced turkey haplotype (Chaves et al., 2009a).
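
The comparison of haplotypes reported above (range, mean and standard deviation of the number of SNP differences) amounts to Hamming distances taken over all haplotype pairs. A minimal sketch with invented haplotype strings follows; whether the study used the sample or the population standard deviation is not stated, so the sample form is assumed here.

```python
from itertools import combinations
from statistics import mean, stdev

def n_differences(hap_a, hap_b):
    """Number of marker positions at which two haplotypes differ."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same markers")
    return sum(x != y for x, y in zip(hap_a, hap_b))

def pairwise_summary(haplotypes):
    """Return (min, max, mean, sample s.d.) of differences over all pairs."""
    diffs = [
        n_differences(haplotypes[a], haplotypes[b])
        for a, b in combinations(sorted(haplotypes), 2)
    ]
    return min(diffs), max(diffs), mean(diffs), stdev(diffs)

# Invented haplotypes over eight markers (not from the study).
toy_haplotypes = {
    "hap1": "AACCGGTT",
    "hap2": "AACCGGTA",
    "hap3": "GACTGGTA",
}
lo, hi, avg, sd = pairwise_summary(toy_haplotypes)
print(f"differences range {lo}-{hi}, mean {avg}, s.d. {sd:.1f}")
```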

LD and subspecies variation

Patterns of LD among 37 SNPs are shown in Figure 3. LD within the region is limited with the highest values centered in TRIM39.2. The low level of LD is indicative of recombination within the MHC-B. Analysis of sequence variation in commercial turkeys suggests that recombination in the B-locus occurs at a higher frequency (18 cM per Mbp, Chaves et al., 2010) than the whole genome average (2.5 cM per Mbp, Reed et al., 2007). Within the UMN/NTBF mapping families, the genetic distance between TRIM7.2 and BTN1 (physical distance <100 kb) is just over 1.1 cM.
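
For readers less familiar with the LD statistics used here and in the Discussion, the sketch below computes D, |D′| and r² for one pair of biallelic markers from phased two-locus haplotype counts. It follows the standard textbook definitions rather than Haploview's implementation, which estimates haplotype frequencies from unphased genotypes; the counts and the function name are made up for illustration.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r^2) from counts of the four two-locus haplotypes."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / total   # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total   # frequency of allele B at locus 2
    d = n_AB / total - p_A * p_B  # deviation from independence

    variance_term = p_A * (1 - p_A) * p_B * (1 - p_B)
    if variance_term == 0:
        raise ValueError("both loci must be polymorphic")

    # D' scales |D| by the largest value it could take given the
    # allele frequencies, which depends on the sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return d, abs(d) / d_max, d * d / variance_term

# Only three of the four haplotypes observed: |D'| is 1 but r^2 is modest.
d, d_prime, r2 = ld_stats(50, 0, 30, 20)
print(f"D={d:.2f}  D'={d_prime:.2f}  r2={r2:.2f}")

# All four haplotypes equally common: no LD by either measure.
d0, d_prime0, r2_0 = ld_stats(25, 25, 25, 25)
print(f"D={d0:.2f}  D'={d_prime0:.2f}  r2={r2_0:.2f}")
```

The first toy case illustrates why D′ and r² can disagree: D′ reaches 1 whenever one of the four haplotypes is absent, whereas r² approaches 1 only when the two markers also have matching allele frequencies.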

Rare alleles within individuals were more frequent in the EWW group and the Eastern subspecies, respectively (Figure 4, Table 2). As expected given the physical and genetic distance between populations, loci in Hardy–Weinberg equilibrium occurred more frequently in the samples from distinct locations rather than pooling individuals within a specific subspecies (Table 2). Pairwise FST analysis found significant variation between the three subspecies. The Eastern vs Merriam's subspecies comparison had the lowest FST value (0.08) whereas the value for Eastern vs Rio Grande was 0.12. The greatest pairwise FST value occurred between Rio Grande vs Merriam's (0.18). Genetic distances (corrected average pairwise differences (pi)) between subspecies were similar, with values for Eastern vs Merriam's, Eastern vs Rio Grande, and Merriam's vs Rio Grande of 1.51, 2.52 and 3.75, respectively.
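
As a rough illustration of what a pairwise FST value measures, the sketch below uses Wright's heterozygosity-based definition, FST = (HT - HS)/HT, for biallelic loci in two equally weighted populations. This is a simplification chosen for clarity and is not the estimator implemented in Arlequin; the allele frequencies and function names are invented.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic locus with allele frequency p."""
    return 2 * p * (1 - p)

def fst_two_pops(freqs_pop1, freqs_pop2):
    """Multi-locus FST as a ratio of sums over loci (two equal-weight pops)."""
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs_pop1, freqs_pop2):
        h_s = (heterozygosity(p1) + heterozygosity(p2)) / 2  # within pops
        h_t = heterozygosity((p1 + p2) / 2)                  # pooled
        numerator += h_t - h_s
        denominator += h_t
    return numerator / denominator if denominator else 0.0

# Invented allele frequencies at a single locus.
print(f"identical frequencies: {fst_two_pops([0.5], [0.5]):.2f}")
print(f"divergent frequencies: {fst_two_pops([0.2], [0.8]):.2f}")
print(f"fixed differences:     {fst_two_pops([1.0], [0.0]):.2f}")
```

On this scale 0 indicates identical allele frequencies and 1 indicates fixed differences, which places the reported values of 0.08 to 0.18 toward the lower end of the range.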

Figure 4

Frequency of major alleles at 45 SNP loci used in haplotype inference within three subspecies of wild turkey (E-Eastern, M-Merriam's and R-Rio Grande). Within each subspecies, the frequency of the major allele as determined in the total data set is indicated for each SNP by the black bar. Genes containing the SNPs are indicated above.

Table 2 Locus comparison for sample locations with four or more individuals

Relationships of haplotypes derived from 37 shared SNVs were examined to compare the origin and variation of wild turkey MHC haplotypes within subspecies and to compare wild turkeys to commercial breeder lines (Figure 5). The unrooted neighbor joining (NJ) tree shows an interleaving of commercial and wild haplotypes, and a separate cluster of partially divergent wild haplotypes (Figure 5a). Haplotypes of wild turkeys did not assort by subspecies. Most of the commercial haplotypes cluster near each other but support for major branches, as evidenced by bootstrap values, is weak. The haplotype of Nici (NTWF06) is found in a distant node and is the only commercial haplotype completely shared with the wild birds surveyed. A phylogenetic network (Figure 5b) was constructed to better model haplotype relationships, which are assumed to be influenced by reticulate events. The haplotype network retains the overall NJ tree topology, but suggests historical recombination within the B-locus. This is especially evident for haplotypes such as RKW6b that are positioned between major clusters.

Figure 5

Relationships of inferred haplotypes from wild and selected commercial turkeys. (a) Neighbor-joining tree constructed with Uncorrected P distances. Bootstrap values (1000 replicates) are given for the basal branches. (b) Neighbor-net network constructed using ordinary least squares variance. Haplotypes originating in Eastern, Merriam's and Rio Grande subspecies are indicated in red, green and blue, respectively. Selected commercial haplotypes are indicated in black.

Class IIB variation

DGGE was used as a means to visualize variation at the class IIB genes to determine the effectiveness of locus-wide SNP genotyping. Observed DGGE patterns for control birds and 37 of the wild turkeys are presented in Figure 6. Included for comparison were birds homozygous at all three class IIB loci (6002, S-02, Nici and MN Wild; two or three total alleles), a triple heterozygote (1049, five total alleles) and an individual heterozygous at one locus, but with unique alleles at the other two loci (S-09, four total alleles). The artificial single locus heterozygote (B1*1/1*3) was created using cloned DNAs with a total of 42 bp differences between the cloned alleles.

Figure 6

DGGE analysis of class IIB loci in wild turkeys. Lane 1 contains the PCR products from two clones representing alternate alleles (B1*1:B1*3) at a single locus (class IIB1). Included for comparison (lanes 2–7) are five commercial turkeys and the wild turkey from MN all with sequenced class IIB haplotypes. Numbers of birds selected for sequencing of the DGGE products are underlined. Inset: the number of sequence differences among alleles at the class IIB loci for three of the control birds. Individual 6002 is homozygous for 1*3/2*3/3*3 and MN Wild is homozygous for 1*5/2*4/3*4. Individual 1049 is a triple heterozygote (1*2:1*3/2*2:2*3/3*2:3*3) with alleles B1*2 and B3*2 being identical.

As expected, individuals homozygous at the class IIB1, class IIB2 and class IIB3 loci tend to display fewer bands and more simplified DGGE banding patterns. For example, contrast in Figure 6 S-02 (two alleles; allele B1*2 and B3*2 are identical), Nici (three alleles), S-09 (four alleles) and 1049 (five alleles; amplified portions of alleles B1*2 and B3*2 are identical). In this multi-locus analysis, DGGE pattern is determined not only by the number of alleles, but also sequence similarities among alleles. For example, the average number of sequence differences between alleles was lower in the homozygous commercial bird 6002 (45.3) than in MN Wild (52.2) (Figure 6 inset). Both of these birds have three known alleles but very different DGGE patterns. In contrast, the DGGE patterns for birds 6002 and 1049 are complex, but very similar in appearance. Individual 1049 has five known class IIB alleles that differ by an average of 47.9 nucleotides.

Unique banding patterns were observed for most of the wild turkeys, although shared bands were observed within sample location. For example, banding patterns are highly similar for ECL5 and 6; MGD8 and 10; EWW11 and 14; and MSL4, 9 and 12. This is consistent with the SNP data (Supplementary Table S3) in that ECL5 and 6 share a common haplotype (ECL5b and 6b) and MGD8 and 10 have two shared haplotypes (MGD8a/10a and 8b/10b). Haplotype similarities are also reflected by their positions in the cladograms in Figure 5. Thus, haplotype sharing and sequence diversity between SNP haplotypes are directly reflected in the DGGE patterns.

To verify amplification of multiple loci and determine underlying sequence differences, the same PCR products used for DGGE were cloned and sequenced for four individuals. Birds EOM4 and EOM6 were selected because they each displayed several well-defined DGGE bands. MSL9 and MSL12 were selected because they shared SNP haplotypes and both displayed a DGGE pattern characterized by a single bright band with many indistinct minor bands. Of the clones examined, 18 unique DNA sequences were obtained for EOM4 (22 total clones), 14 for EOM6 (21 clones), 8 for MSL9 (23 clones) and 7 for MSL12 (21 clones). In each case, the number of unique sequences exceeds the number of expected class IIB alleles (assuming three loci). As the clones were obtained from PCR amplicons generated with non-proofreading Taq polymerase, many of the unique sequences may be artifacts, in that they differ at as few as a single nucleotide. Also possible is the production of sequence hybrids during the simultaneous amplification of multiple loci. Finally, we cannot discount the possibility of additional class IIB loci in wild turkeys versus their commercial counterparts.
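
One simple way to screen cloned sequences for likely polymerase errors, in the spirit of the caveat above, is to flag any sequence seen in a single clone that lies within one substitution of a sequence recovered from several clones. The sketch below is only a heuristic built on that assumption: the thresholds, function names and toy reads are hypothetical, it does not detect hybrid (chimeric) sequences, and it is not the procedure used in this study.

```python
from collections import Counter

def hamming(a, b):
    """Number of mismatches between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def flag_possible_artifacts(clones, max_mismatch=1, min_parent_count=2):
    """Map each suspect singleton sequence to its likely parent sequence."""
    counts = Counter(clones)
    flagged = {}
    for seq, n in counts.items():
        if n > 1:
            continue  # sequences seen repeatedly are kept as candidates
        for parent, m in counts.items():
            if (m >= min_parent_count
                    and len(parent) == len(seq)
                    and 0 < hamming(seq, parent) <= max_mismatch):
                flagged[seq] = parent
                break
    return flagged

# Toy clone library (invented 8-bp "reads").
toy_clones = (
    ["ACGTACGT"] * 5      # well-supported sequence
    + ["ACGTACGA"]        # singleton, one mismatch from the sequence above
    + ["TTGTACCA"] * 3    # second well-supported sequence
    + ["GGGTACGT"]        # singleton, but not close to any abundant sequence
)
suspects = flag_possible_artifacts(toy_clones)
print(len(set(toy_clones)), "unique sequences;", len(suspects), "flagged")
```

A flagged sequence is not necessarily an artifact, since genuine alleles can differ by a single nucleotide; confirmation by independent amplification would still be needed.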

Relationships among the class IIB sequences for each individual are shown in Supplementary Figure S1. Sequences of EOM4 and EOM6 each cluster into three major groups indicative of three loci, whereas MSL9 and MSL12 show less sequence complexity. This result is consistent with the number of major bands observed in DGGE (Figure 6). Translation and alignment of the sequence region corresponding to exon 2 reduced the number of different sequences (alleles) per individual and resulted in similar tree topologies (Supplementary Figure S2). A total of 27 unique exon 2 amino acid sequences were observed in the four birds. These sequences group into three main clusters (Supplementary Figure S3), which do not assort by locus as defined by alleles previously observed in commercial birds (Chaves et al., 2010). Two PBR sequences observed in the wild birds (W24 and 2*5) were identical to PBR alleles seen in commercial turkeys. Aligned amino acid sequences (Supplementary Figure S4) show that the variant peptides correspond well with the position of putative peptide-binding sites defined for humans by Brown et al. (1993).
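
Translation reduces the number of distinct sequences because synonymous nucleotide differences collapse to the same peptide. The sketch below illustrates the effect with the standard genetic code; the short sequences are invented, and the reading frame is assumed to start at the first base, which would need to be checked against the real exon 2 amplicon.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna, frame=0):
    """Translate DNA in the given frame; unknown codons become 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

# Invented sequences: the first two differ only at synonymous positions.
toy_sequences = ["GCTGAAAAA", "GCCGAGAAG", "GCTGATAAA"]
peptides = {seq: translate(seq) for seq in toy_sequences}
print(len(set(toy_sequences)), "unique DNA sequences ->",
      len(set(peptides.values())), "unique peptides")
```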

Discussion

MHC diversity (allelic richness) is particularly advantageous in the face of widely varied pathogens (Wegner et al., 2004). Studies of non-model vertebrate species have focused on polymorphism in the class I and class II MHC genes and pathogen-mediated selection has been considered a primary mechanism in maintaining MHC diversity (Jeffery and Bangham, 2000). Early population genetic studies of MHC variation in turkey relied on Southern hybridizations to elucidate gene content and variation (Zhu et al., 1995, 1996; Jones et al., 2003). Although this approach can be an easy and powerful tool to examine genomic variation within the species, Southern hybridizations (restriction fragment length polymorphism (RFLP)) rely on relatively few variable nucleotides and the technique ultimately fails to capture the full extent of genetic diversity at the MHC. Similarly, sequencing of the exon(s) encoding class I and/or class IIB PBRs may not accurately identify the diversity within the species when PBRs are highly polymorphic and multiple loci occur within the genome. The goals of this project were twofold: first, to demonstrate the ability of locus-wide SNP genotyping to identify MHC haplotypes and second, to examine variation within the MHC-B locus of wild turkeys in contrast to that observed in commercial birds.

MHC genotyping

Detailed genomic descriptions of avian MHC loci are limited to a few species and often focus on variation in one or two of the classical genes. MHC variation can be found at both extremes in wild bird populations, with some species having seemingly unlimited alleles and others presenting almost no polymorphism. For example, a study of class IIB gene diversity in Red-winged Blackbirds (Agelaius phoeniceus) found extensive variation within and between individuals (Edwards et al., 1998) while a sequence survey of the PBR of a class IIB locus from the endangered Galápagos penguin (Spheniscus mendiculus) found only three highly similar alleles (Bollmer et al., 2007). At the upper extreme are species like the bluethroat (Luscinia svecica), in which a minimum of 61 unique alleles of at least 11 functional loci were identified in 20 individuals (Anmarkrud et al., 2010). It is important to recognize that the total number of MHC class I and class II loci is typically not known and heterozygosity and rare polymorphisms may inadvertently be misidentified as novel haplotypes or additional genes.

PCR-based techniques for surveying the polymorphic, multicopy MHC genes rely on primer sequences designed to amplify conserved regions. Of concern is the potential amplification of pseudogenes and/or non-classical loci and the possibility of missing loci and/or alleles because of null amplification (Aguilar et al., 2006). A recent examination of class IIB variation in the Common Yellowthroat (Geothlypis trichas) demonstrates the depth of sequencing that may be required to characterize the number of class IIB alleles/loci (Bollmer et al., 2010). In that study, 39 different exon 2 sequences (up to 20 class IIB loci) were identified in a single bird and expression analysis suggested at least eight transcribed loci. The total extent of variation in locus number and expression within the turkey is currently unknown. Sequencing of the turkey B-locus (Chaves et al., 2009b) established the presence of three classical class IIB loci with at least two loci being expressed in the spleen.

Goto et al. (2002) described the use of single-strand conformation polymorphism for assaying MHC polymorphism in the chicken. This approach relied on PCR primers specific to the class I and class IIB loci for amplification of the highly variable PBRs. The present study used DGGE as an alternative because of the ability to examine larger fragments (440 bp vs 100–300 for single-strand conformation polymorphism) in a relatively quick and repeatable format. As demonstrated, DGGE clearly identifies individuals with divergent class IIB alleles as predicted by the SNP genotyping and haplotype analyses. Sequencing of DGGE PCR products found the number of unique sequences to exceed the number of expected class IIB alleles. Polymerase-induced errors and mosaic sequences can lead to false identification of alleles/loci (Lenz and Becker, 2008) and clearly some of the turkey sequence differences may be artifacts. The reality that PBR alleles can be shared between loci (Ahmed et al., 2007; Chaves et al., 2010, 2011) and the possibility of additional class IIB loci in the wild turkey are confounding factors. As demonstrated in this study, availability of a reference haplotype and the ability to genotype interspersed MHC regions while avoiding multigene families has advantages over other methods of surveying the MHC. The identification of a set of high-frequency SNPs will provide the basis for an efficient SNP genotype assay for the turkey MHC. SNP genotyping of MHC regions proximal to the multi-locus class I and class II loci combined with haplotype inference also provides an effective means to identify individuals with variant MHC haplotypes that can be used in closer scrutiny of the MHC through expanded sequencing.

MHC diversity in wild turkeys

A significant loss of species-wide minor alleles in commercial populations has been well documented in the chicken (Muir et al., 2008). Contraction of MHC haplotypes in commercial lines relative to wild birds has also been observed in a previous study of turkeys. Zhu et al. (1996) found a single, predominant haplotype (identified by Southern hybridization) present in commercial lines as well as in wild turkeys. However, as in this study, wild birds had a higher proportion of unique haplotypes. Similarly, Buchholz et al. (2004) surveyed 40 game farm wild-derived turkeys and identified 40 unique genotypes by Southern hybridization. Interestingly, the single turkey sampled from Minnesota was found to be homozygous at the surveyed loci. Wild turkeys were extirpated from Minnesota after 1880 as the result of habitat loss during settlement and unregulated hunting. However, the species was successfully reintroduced in the 1970s and the population in the state was estimated at 60 000 in 2006 (Minnesota Department of Natural Resources), with annual harvests of over 5000 birds since 2000. It would be interesting to determine the degree to which MHC variation may have been restricted in the Eastern subspecies as a result of this reintroduction, as it could present a concern with respect to disease resistance.

The high level of sequence diversity is reflected in the inability to establish common haplotypes among wild turkeys, even while using relatively common alleles of extremely close genomic proximity. These results can be explained by high levels of recombination and gene conversion within the region, as has been previously suggested (Chaves et al., 2010). Novel mutations are not likely responsible for the inability to infer and assemble common haplotypes as the polymorphisms used were of higher allele frequencies and thus are likely ancient polymorphisms segregating within the species.

LD was virtually non-existent within the wild turkey MHC-B, although the MHC is one of the highest LD regions of the human genome (Horton et al., 2004). Although LD within the chicken MHC has yet to be reported, LD has been studied on other chromosomes. In a large study of several commercial breeding lines, Andreescu et al. (2007) found the extent of LD in the chicken genome lower than other agricultural species. Strong LD (r2>0.8) was observed in 10% of SNPs within 0.5 cM. Further, 24% of SNP pairs (within 0.5 cM) had an r2 value >0.5. In a study of wild chickens (Red Jungle Fowl) LD was found to be higher across a 200 kb region of chromosome 1 compared with two other commercial lines (Rao et al., 2008). In the study of a second galliform, Berlin et al. (2008) examined nucleotide diversity at 18 autosomal loci in wild populations of the Scandinavian willow grouse (Lagopus lagopus). Measures of recombination and population structure at the nuclear level indicated a mature panmictic population with large effective population size. High levels of nucleotide diversity and low LD were observed suggesting LD was shaped by high recombination and purifying selection. Unfortunately, the MHC was not included in the willow grouse study.

The SNPs used to measure LD in the turkey MHC span a distance of 100 kb, which corresponds to a genetic distance of 0.25 cM in commercial families (Chaves et al., 2010). Within the wild birds, only four SNP pairs (0.6%) had an r2 value >0.8 and 14 (2%) had an r2>0.5. Both proportions are well below those reported for commercial chicken lines (Andreescu et al., 2007). Several factors could be contributing to the low LD. As stated above, high levels of recombination and/or gene conversion can lead to the loss of LD. On the basis of the results of this study, the total number of haplotypes present in wild turkeys is high. The total number of samples surveyed may not have been large enough to accurately identify alleles in disequilibrium. Further, sequencing samples in this manner may not sufficiently address the potential for null alleles, which can lead to the loss of LD.
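
The percentages quoted above are consistent with counting all pairwise comparisons among the 37 markers shown in Figure 3. That the denominator is the full set of marker pairs is an inference from the reported numbers rather than something stated in the text, so the short check below should be read with that assumption in mind.

```python
from math import comb

n_markers = 37                 # SNPs included in the LD analysis
n_pairs = comb(n_markers, 2)   # every unordered pair of markers

for label, count in (("r2 > 0.8", 4), ("r2 > 0.5", 14)):
    share = 100 * count / n_pairs
    print(f"{label}: {count}/{n_pairs} pairs = {share:.1f}%")
```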

This study focused on sequencing dispersed amplicons in the turkey B-locus to avoid the inherent complications of quantifying variation in polymorphic loci within multi-gene families. This sequencing of 40 individuals found nearly twice as many polymorphisms per base pair in wild turkey as compared with commercial birds (Chaves et al., 2010) indicating a reduction in MHC alleles and potential heterozygosity loss through domestic selection. Over 100 new MHC sequence polymorphisms were identified, with many novel SNPs having lower MAF compared with those shared among wild and commercial birds. Although some of these may be indicative of relatively recent mutations, lower MAF directly reflects the MHC diversity of the sampled individuals. This study indicates that wild turkeys represent a valuable reservoir of MHC diversity. MHC alleles from wild birds could be introgressed into commercial populations if, for example, a wild MHC haplotype is shown to be particularly resistant to a disease of commercial significance.