Recombination and selection in the major histocompatibility complex of the endangered forest musk deer (Moschus berezovskii)

The forest musk deer (Moschus berezovskii) is a high elevation species distributed across western China and northern Vietnam. Once abundant, habitat loss and poaching has led to a dramatic decrease in population numbers prompting the IUCN to list the species as endangered. Here, we characterized the genetic diversity of a Major Histocompatibility Complex (MHC) locus and teased apart driving factors shaping its variation. Seven DRB exon 2 alleles were identified among a group of randomly sampled forest musk deer from a captive population in the Sichuan province of China. Compared to other endangered or captive ungulates, forest musk deer have relatively low levels of MHC genetic diversity. Non-synonymous substitutions primarily occurred in the putative peptide-binding region (PBR), with analyses suggesting that recombination and selection has shaped the genetic diversity across the locus. Specifically, inter-allelic recombination generated novel allelic combinations, with evidence for both positive selection acting on the PBR and negative selection on the non-PBR. An improved understanding of functional genetic variability of the MHC will facilitate better design and management of captive breeding programs for this endangered species.

Scientific RepoRts | 5:17285 | DOI: 10.1038/srep17285 -both inter and intra-allelic -is also thought to be operating on the generation of novel MHC alleles 5,11,12 and facilitating adaptive evolution 13 . It has been proposed that MHC genes, due to their important biological functions, should play a role in the design of programs to conserve genetic diversity in captive populations of endangered species, and generally be maintained in natural populations 14,15 . Thus, characterizing the levels of genetic variability and exploration on mechanisms maintaining such diversity at MHC loci are prerequisites if such information is to be included when designing captive breeding and management programs. Further, by borrowing assays developed in closely related species (i.e. genome enabled 2 ), the ability to screen functional diversity in wild, often rare and endangered populations has been greatly enhanced.
One species that could benefit from such investigations is the endangered forest musk deer (Moschus berezovskii). China encompasses the primary range of the forest musk deer, but poaching and habitat loss have had a severe impact. The population has declined dramatically in the past five decades, prompting an endangered listing by the International Union for Conservation of Nature (IUCN 16 ), Appendix I designation by the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES 2000), and a National Class I protected species designation in China. Forest musk deer in the wild are estimated to number between 100,000-200,000 17 , with this number declining 16 . At present, approximately 7,000 forest musk deer are kept in pseudo captivity 18 . These programs are important for both conservation efforts and economic profit because the musk produced by males is used in Chinese and Vietnamese medicines (and is the driver behind the illegal harvest).
Previous studies have described the variation of MHC DRB genes of forest musk deer 19,20 , but these data are not in public data repositories nor have been examined in a broader context (i.e. estimate species wide functional genetic diversity). In this study, we genotyped the MHC class II locus in a captive forest musk deer population, and analyzed patterns of genetic diversity and molecular evolution of DRB exon 2, specifically focusing on inferring how recombination and selection shape genetic diversity. We compared our diversity data to additional species to put our data in context with other ungulate MHC diversity estimates, and discussed the conservation and evolutionary potential of forest musk deer.

Results
Applying the criterion that an allele must be observed in at least two independent PCRs, we detected seven unique alleles in musk deer with six being present in multiple individuals (Genbank Accession nos: KP763631-KP763636, KP835297), with one haplotype (Mobe-DRB*04) contained deletion. We only ever observed a maximum of two alleles per individual, suggestion that only one locus was amplified.
The allelic sequences of 249 bp of DRB exon 2 were aligned and translated into the corresponding amino acid sequences. No amino acid sequence showed evidence of a reading-frame shift or stop codons. DRB sequences of forest musk deer along with standard Bos taurus DRB MHC loci (Bola DRB1, DRB2, DRB3) were included to examine whether all of the Mobe DRB sequences were derived from the same locus. The phylogenetic trees showed that all of the Mobe DRB alleles fell into the DRB3 category (Fig. 1). Sequence variation. At the sequence level, 43 of the 249 nucleotides (17%) were variable; 19 amino acid residues out of 83 (23%) were polymorphic. Thirteen of the 19 variable amino acid sites (68%) were found within the putative PBR (Fig. 2). Compared to the bovine DRB3 loci in our analysis, there was relatively low diversity observed in the forest musk deer (Table 1).

Recombination and selection analysis.
Under the assumption of orthology, we detected between 1 and 4 recombination events ( Fig. 2 and Table S2). The relative rates of non-synonymous (dN) and synonymous (dS) are shown in Table 2: notably, the ratio of dN to dS was 1.88 in the PBR, and 0.31 in the non-PBR. Across the entire exon, we detected a positive Tajima's D (2.01; Table 1). Visual characterizing via a sliding-window analysis of π and Tajima's D suggested that Mobe DRB locus is under both positive   Table 2. Relative rates of non-synonymous (dN) and synonymous (dS) substitutions (with standard errors) calculated for bovine and Mobe DRB exon 2, averaged over all sites, the peptide binding region (PBR) and non-PBR. Statistical significance was tested using pair-wise T-tests. and negative selection pressures (Fig. 3). We found 8 residues showing evidence of positive selection ( Fig. 2), mostly corresponding to the location of the putative PBR (there are 24 residues within the putative PBR in humans 21 ). Accounting for recombination, the selection tests produced differing results; notably, all of three methods taking into consideration the presence of recombination detected fewer residues under positive selection (Tables S4 and S5).

Discussion
We assayed the MHC diversity in an endangered population of forest musk deer, showing that positive selection and recombination has contributed to the contemporary diversity. While the genetic signatures are consistent with selection, these could be confounded the by the demographic history (i.e. a bottleneck also produces a positive Tajima's D value), or a combination of both. Visually assessing the DRB amino acid structure of Li et al. 19 , we found that our samples contain 5 of the 6 alleles observed in that population, suggesting we have captured a large portion of the MHC diversity seen in the species for this region. This is an important observation as the wild population numbers are declining, and captive populations are suffering outbreaks of disease 22 , and thus characterizing and understanding the driver behind MHC diversity is critical for this species. We observed seven Mobe DRB exon 2 alleles in the forest musk deer (two different from what was observed in Li et al. 19 that screened 65 individuals). While comparisons among species are confounded by a multitude of factors (i.e. demographic and phylogeographic histories, spatial and temporal sampling), they can be useful for identifying broader trends in MHC diversity 23 , and in the case of musk deer, assaying the relative diversity of an important functional gene where diversity is thought to harbor benefits is clearly informative. When compared with other studies on endangered or bottlenecked ungulates (see Table 3), forest musk deer were similar to Spanish ibex and chamois 24,25 , more diverse than Arabian oryx and mountain goat 26,27 , but less diverse than white-tailed deer and bighorn sheep 28,29 . The latter two species are common, relatively widely distributed ungulates in North America, so this relative ranking of diversity makes some biological sense. However, other immune genes have been shown to harbor high levels of diversity when the DRB locus appears depauperate 27 , thus it is possible that forest musk deer genetic diversity is higher among other immune-relevant genes.
Collectively, our study and that of Li et al. 19 suggests the haplotype diversity we found reflect broader patterns of species diversity, at least in Sichuan province. It is reasonable to believe that the lack of extensive polymorphism at the DRB locus is due to the initial founding event of the captive population and/ or a historical population decline (as would be supported by the positive Tajima's D). Regardless of the   driving factor, this relative low level of variation in its adaptive immune system might make this species more susceptible to disease outbreaks 20 . Numerous mechanisms help generate diversity at the MHC loci 5,12,30 . The higher rate of non-synonymous substitutions, we suggest, are likely in response to co-evolution with exogenous antigens as the polymorphisms were predominantly located in the PBR. We also found evidence for multiple (intra-allelic) recombination events creating novel DRB haplotypes. While the methods varied in their conservative nature, the breakpoints at position 87 and 221 were detected by all seven approaches are close to the known connection sites (position 88 to 93, and position 223 to 228) between three domains of β -strand region, the major part of α -helical region, and the carboxyterminal part of α -helical region 9 (Fig. 2). We hypothesize that the differences in secondary structure might produce a recombination hotspot in this region of the MHC. Furthermore, our estimated population scaled recombination rate (ρ = 11.28) and mutation rate (θ = 13.88) revealed that recombination and mutation together function on the generation of Mobe DRB diversity (ρ /θ = 0.81), which is similar to observations made on other ungulates 31 . It is worth noting that Mobe-DRB*04′ contains a codon-deletion, which has also been found in both bovine and cervid DRB genes 32 , and the breakpoint found directly at the codon-deletion site (position 171) suggests inter-allelic recombination 32 .
Fitness-related genes, such as MHC genes, are often under a variety of selection pressures 33 . Previous studies concluded that balancing selection maintains allelic diversity at MHC in vertebrates 34,35 and acts in either an overdominant or frequency-dependent fashion 36,37 . Protein-coding sequences are often subject to purifying selection due to constraints on function, with positive selection operating on a small number of non-synonymous sites 38 . As the MHC class II molecule identifies and binds to continually changing array of foreign peptides, positive selection is expected in parts of the PBR 31 . In fact, some of the musk deer residues under positive selection (i.e. amino acid position 11), were the same as those observed by Schaschl et al. 31 that screened over a dozen ungulate species. Thus, collectively our analyses support a role for recombination and selection in the generation and maintenance of forest musk deer MHC diversity.
Given the conservation status of musk deer, what does the observed MHC diversity mean for their persistence and survival? Forest musk deer have seen a drastic decline in numbers over the past half-century and are classified as endangered. Poaching and habitat loss have been the main drivers 15 , and this has brought on additional issues associated with small or declining populations. In particular, a high rate of mortality caused by musk deer purulent disease 20 has become a major problem in captive populations. Emaciation and malnutrition caused by parasites is another problem captive populations are facing. Li et al. 20 found differing susceptibility associated with the disease according to MHC class II haplotype (in a sample size of 99 individuals, 53 with disease symptoms), with nearly all the DRB exon 2 diversity observed in that study also being observed in our captive population. This is useful information for selective breeding programs as it suggests the species potentially harbors the required diversity to fend off pathogens. While a more comprehensive screening of resistant and susceptible haplotypes is warranted given the small sample size of Li et al. 20 and the novel alleles detected in this study, we argue that managers-under the working assumption that certain haplotypes might harbor disease resistance in this species-should consider maximizing MHC variation in their breeding pool. We also advocate for further screening of the Class I loci, especially given the viral nature of purulent disease 39 , as well as non-MHC immune genes 40,41 . An enhanced screening program should be obtainable given the number of animals in captivity and genome-enabled nature of musk deer: we therefore urge musk deer captive breeders and wildlife managers and veterinarians to work together in this regard.

Methods
Sampling and DNA preparation. Fifty-two forest musk deer were randomly selected from a musk deer breeding centre in Li county, Sichuan province, China (N 31.662°, E 102.810°). The breeding centre houses approximately 500 individuals. Fresh blood samples were collected in vacuum tubes with EDTA and then brought to the laboratory and stored at − 80 °C. All samples were collected in strict compliance with the Chinese Wildlife Conservation Act. This study was approved by the Wildlife Protection Station of Li county. All surgery was performed with the help of a local veterinarian, and all efforts were made to minimize suffering. This study was also approved by the Sichuan Pianzaihuang Musk Deer Corporation, who managed the musk deer we sampled. No other endangered or protected species were involved. Genomic DNA of forest musk deer was extracted from samples using the blood DNA extraction kit (Biomed technology Ltd.) according to the manufacturer's protocol.
PCR and MHC genotyping. The entire Exon 2 of the DRB gene was amplified with polymerase chain reaction (PCR) using primers LA31: 5′ -GATGGATCCTCTCTCTGCAGCACATTTCCT; and LA32: 5′ -CTTGAATTCGCGTCACCTCGCCGCTG 42 . This pair of primers was chosen from those previously designed bovine oligos. We assume a high degree of synteny based on the close phylogenetic relationship between Bovidae and Muschidae revealed by Hassanin and Douzery 43 . The amplification reaction was performed in 30 μ l and contained 50 to 100 ng of genomic DNA, 15 μ l of 2 × Taq PCR Master Mix and 0.5 μ M forward and reverse primers. The PCR cycling parameters were as follows: denaturation at 94 °C for 5 minutes, 34 cycles of 94 °C for 30 seconds, 50 °C for 30 seconds, 72 °C for 45 seconds, and a final extension at 72 °C for 8 minutes. PCR products were visually assessed on a 1% agarose gel, and purified Scientific RepoRts | 5:17285 | DOI: 10.1038/srep17285 using the Axygen purification kit. The purified products were directly sequenced on an ABI PRISM 3730 XL DNA Sequencer.
Visual examination of the chromatograms showed that for the some individuals, multiple exon sequences were present suggesting multiple alleles or loci were present. For non-homogenous amplicons (i.e. samples with multiple sequences), PCR products were cloned. Here, heterozygous amplicons were cloned into the PMD18-T vector (Takara Bio Inc.) and transformed into DH5α competent cells following manufacturer's protocol. A minimum of 16 transformed sub-clones were picked for each individual and then subjected to a PCR test to verify the insert size. Here, each 10 μ l reaction mixture contained 2 × Taq PCR Master Mix, 0.5 μ M specific forward and reverse primers under the following thermocycling conditions: 95 °C for 5 minutes, 33 cycles of denaturation at 94 °C for 30 seconds, annealing at 57 °C for 30 seconds, elongation at 72 °C for 45 seconds and a final extension at 72 °C for 10 minutes. Inserts with an appropriate length (~300 bp), assessed by eye on a 1% agarose gel, were then sequenced using the plasmid primers (M13).
Cloned sequences were compared within individuals and then checked for consistency by comparison with the sequences derived from direct sequencing of the amplicons. Additional subclones were sequenced when not all heterozygous sites (multi-peaks at the same site) observed in the direct sequencing were obtained (sequencing between 4 to 21 subclones were required). To ensure allele calls were not spurious or the result of sequencing error, each haplotype obtained for each individual had to be observed in two separate clonal sequences. Possible chimeras produced in the process of PCR were identified by comparison within individuals and then removed from the sequence pool. Nomenclature of forest musk deer DRB alleles followed Klein et al. 44 , henceforth referred to as Mobe DRB.
Sequence analysis of polymorphism. Sequence alignment and translation were performed using MEGA 5 45 . The putative peptide-binding residues (PBR) were identified based on human HLA molecules 21 . The best-fitting nucleotide substitution model and the corresponding transition/transversion ratio (R) were determined by MEGA with the selected model based on the lowest BIC scores (Bayesian Information Criterion). A maximum likelihood (ML) phylogenetic tree was constructed using MEGA 5.0 and confidence assessed with one thousand bootstrap replicates. We also constructed a phylogenetic tree using MrBayes 3.2 46 and the following criteria: 1,000,000 generations, with tree sampling every 10 generations and burn-in after 25,000 trees. DnaSP 5 47 was used to calculate sequence polymorphism indices, including average number of nucleotide differences (k), number of segregating sites (S), and the nucleotide diversity per site (π) calculated for the whole sequence (π total ), for non-synonymous sites (π n ) and for synonymous sites (π s ).
Test for intragenic recombination. We employed an array of methods to detect recombination in order to minimize false positives 48 . First, we used the RDP3 package 49 , which employs a variety of methods (Table S1) to detect recombination breakpoint locations: GENECONV examines only silent polymorphic sites and is not strongly influenced by mutation variation or selective effects; MAXCHI identifies crossover points that maximize the difference between the proportions of polymorphic sites occupied by the same and by different bases, before and after the crossover; CHIMAERA uses variable sites to evaluate any possible combination of three sequences; RDP examines all possible triplets of sequences in a sliding window approach to estimate the probability of a recombination event; 3SEQ identifies variable sites that support different partitions of the data 47 . We used an alpha value of 0.05 in these analyses to determine significance. Population scaled recombination rate ρ and mutation rate θ were obtained by using LDhat recombination rate scan 50 . Finally, the GARD method 51 was employed to search for possible recombination partitions.
Detecting selection. We estimated the rate of non-synonymous (dN) and synonymous (dS) substitutions among all pairwise comparisons of Mobe DRB exon and corresponding bovine DRB3 alleles using DnaSP. A sliding-window analysis of π and Tajima's D was performed using a window size of 5 bp and a step size of 2 bp for Mobe DRB alleles. Coalescent simulations with 1000 replicates were run to test window analysis significance of Tajima's D value. Metrics were also estimated for the entire exon, PBR and non-PBR designations, respectively. Statistical significance was tested using T-tests implemented in SPSS 19.
We used site-specific models of codon evolution available in the package PAML 52 to assess the selection. Likelihood values were compared between each pair of models: M0 (one ratio) and M3 (discrete), M1 (nearly neutral) versus M2 (positive selection), and M7 (nearly neutral with the beta distribution) versus M8 (positive selection with the beta distribution). Likelihood ratio tests (LRTs) were carried out for the six models. A Bayesian approach implemented in CODEML (part of PAML) was used to identify residues under positive selection in the MHC class II DRB sequences. It should be noted that most traditional methods of detecting selection assume a single lineage and can be misleading in the presence of recombination 53 . The GARD approach avoids this by screening all alignments to locate all non-recombinant fragments (partitions), and allows each partition to have its own phylogenetic history. Thus the GARD results were incorporated into selection analyses so as to account for the presence of recombination. Three methods, single-likelihood ancestor counting (SLAC), fixed effects likelihood (FEL) and random effects likelihood (REL) analyses, were run on Mobe DRB alleles to detect sites under selection. Briefly, the SLAC compares the observed ratio of non-synonymous and synonymous substitutions using a maximum likelihood reconstruction of ancestral codon states; FEL infers model parameters shared by all sites (e.g., branch lengths) using the entire alignment and then fits dS and dN rates individually at every site; REL allows both synonymous and non-synonymous substitution rates to vary among sites 54 .