Main

MHC gene products determine the repertoire of T cell responses that an individual can generate against pathogens and foreign tissues1,2. The genes encoding MHC class I sequences are among the most polymorphic in vertebrate genomes3. Therefore, comprehensive MHC genotyping methods are a major foundation for the study of T cell responses.

Rhesus (Macaca mulatta), cynomolgus (M. fascicularis), and pig-tailed (M. nemestrina) macaque monkeys provide essential preclinical models for infectious disease, vaccine, biodefense and transplantation research4,5,6,7,8,9. Unfortunately, the utility of macaque models for immunological research has been hindered by the unprecedented complexity of their MHCs. Whereas human leukocyte antigen (HLA) haplotypes contain only three classical class I genes (HLA-A, HLA-B and HLA-C), macaque class I loci have undergone a complex series of segmental duplications such that gene content varies between macaque MHC haplotypes10. Genomic sequencing of the MHC region suggests that rhesus and cynomolgus macaques have at least 22 functional class I genes transcribed at varying levels11,12,13,14. Furthermore, MHC class I allelic polymorphisms are largely species specific, with geographically isolated subpopulations of the same species rarely sharing MHC class I sequences15,16,17,18,19. More than 900 macaque MHC class I sequences are currently known, but many more remain to be characterized. Robust genotyping assays are available for less than 5% of these sequences20.

The development of an ultra–high-throughput platform for comprehensive MHC class I genotyping of macaques is needed to maximize the utility of these animals as research models. Here we describe the adaptation of massively parallel pyrosequencing of cDNA-PCR amplicons for MHC genotyping of rhesus, cynomolgus and pig-tailed macaques. This technology reveals that the number of MHC class I transcripts in each macaque is higher than previously recognized, underscores the number of MHC class I sequences yet to be characterized and provides a feasible approach for complete MHC class I genotyping of all macaques used in biomedical research.

Results

Macaque MHC genotyping by pyrosequencing

We designed a universal 190–base pair (bp) cDNA-PCR amplicon with primers based on highly conserved sequences within macaque MHC class IA and IB loci (Fig. 1). This amplicon spans the first of two highly polymorphic peptide binding domains encoded by class I loci1. Diagnostic polymorphisms within this amplicon allow for unambiguous resolution of 175 of 418 (42%) rhesus macaque class I sequences currently available in the Immuno Polymorphism Database21. The vast majority of MHC sequences that cannot be uniquely resolved are closely related variants that can be assigned to distinct class I lineages.
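The notion of "unambiguous resolution" can be illustrated with a small sketch: an allele is uniquely resolved when no other allele has the same sequence within the amplicon. The function name, allele names and sequences below are hypothetical toy values, not data from this study, and the sketch assumes the amplicon region has already been extracted from each reference sequence.

```python
from collections import Counter


def uniquely_resolved(amplicon_by_allele):
    """Return alleles whose amplicon-region sequence is shared with no other allele."""
    counts = Counter(amplicon_by_allele.values())
    return sorted(
        name for name, seq in amplicon_by_allele.items() if counts[seq] == 1
    )


# Toy input: hypothetical allele names with short, made-up amplicon sequences.
toy_alleles = {
    "allele_1": "ACGTAC",
    "allele_2": "ACGTAC",  # identical to allele_1 within the amplicon
    "allele_3": "ACGTTC",
    "allele_4": "TCGTAC",
}

resolved = uniquely_resolved(toy_alleles)
resolved_fraction = len(resolved) / len(toy_alleles)
```

Alleles that are not uniquely resolved in this way would still fall into groups of identical amplicon sequences, which corresponds to the lineage-level assignment described above.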

Figure 1: Polymorphic variation of known Mamu class I gene products.

(a) Domain structure of macaque class I genes. Exon 2 corresponds to the α1 domain. (b) Distribution of amino acid variability for Mamu class I gene products. We aligned predicted amino acid sequences of 418 previously described Mamu-A and Mamu-B alleles and plotted the frequency of differences from consensus for each amino acid residue. Arrows indicate locations of the PCR primers used in this study in highly conserved domains flanking the peptide-binding domain encoded by exon 2.

We performed pyrosequencing of amplicons from 48 cynomolgus, pig-tailed, Indian-origin and Chinese-origin rhesus macaques in a single pilot run on a Genome Sequencer FLX (GS FLX) instrument. We subdivided these amplicons into four pools, each containing products from 12 macaques that were distinguished by 10-bp multiplex identifier (MID) tags, molecular barcodes incorporated during the primary PCR (Supplementary Note). We acquired nearly 500,000 high-quality sequence reads containing a total of just over 100 million high-quality bases. These data translated into an average of 9,315 reads per macaque (range, 7,538–10,769 reads) for the Indian rhesus macaque amplicon pool.
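As an illustration of how pooled reads can be assigned back to individual macaques, the following minimal sketch bins reads by a leading 10-bp tag. It assumes the tag sits at the very start of each read and must match exactly; the tag sequences, animal labels and function name are invented for the example and are not the MID tags used in this study.

```python
from collections import defaultdict

MID_LENGTH = 10  # the MID tags described in the text are 10 bp long


def bin_reads_by_mid(reads, mid_to_animal):
    """Group reads by their leading MID tag and trim the tag from assigned reads.

    Reads whose first 10 bases match no known tag are kept untrimmed under None.
    """
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:MID_LENGTH], read[MID_LENGTH:]
        animal = mid_to_animal.get(tag)
        bins[animal].append(insert if animal is not None else read)
    return dict(bins)


# Hypothetical tags and reads, for illustration only.
toy_tags = {"AAAAACCCCC": "animal_A", "GGGGGTTTTT": "animal_B"}
toy_reads = [
    "AAAAACCCCC" + "ACGTACGT",
    "GGGGGTTTTT" + "TTGACC",
    "AAAAACCCCC" + "ACGTACGA",
    "CCCCCCCCCC" + "ACGT",  # tag not in the table
]

binned = bin_reads_by_mid(toy_reads, toy_tags)
```

Exact matching is the simplest possible choice; a production pipeline might instead tolerate a sequencing error within the tag, which this sketch does not attempt.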

To evaluate the detection of known macaque class I alleles and test the sensitivity of the GS FLX pyrosequencing approach, we first examined four Mauritian cynomolgus macaques that are homozygous for well-characterized MHC haplotypes22. This geographically isolated population has extremely limited MHC diversity because of its recent expansion from a small founder population. We observed all MHC class I A (Mafa-A) and MHC class I B (Mafa-B) sequences previously described for the most frequent Mauritian M1 haplotype, with transcript levels ranging from 27.8% of total class I sequence reads for Mafa-B*0440101 down to 1.4% for Mafa-B*0550101 (Fig. 2a). In addition, we detected five sequences not previously observed by cloning and Sanger sequencing (transcript levels between 0.3% and 2.2% of total sequence reads) (Fig. 2a). We obtained comparable results for the remaining three MHC-homozygous Mauritian cynomolgus macaques as well as for eight heterozygous macaques (Supplementary Figs. 1 and 2). Each of the Mauritian MHC haplotypes carries an average of seven transcribed Mafa-B sequences plus two or three classical Mafa-A and nonclassical Mafa-E class I sequences.

Figure 2: MHC class I transcript abundance profiles.

The frequency of each class I sequence is indicated as a percentage of the total MHC class I sequence reads that we evaluated for each macaque. Open bars indicate MHC class I sequences that have not been described previously. Group-specific designations such as Mafa-A2*05g indicate the large Mafa-A2*05–like family of sequences, which differ by a few nucleotide substitutions outside exon 2. Slashes indicate that a given sequence is ambiguous for two or more class I alleles. (a) Mauritian cynomolgus macaque that is homozygous for the M1 haplotype22. (b) Indian rhesus macaque that is homozygous for the B24 haplotype17. (c) Chinese rhesus macaque that is homozygous for a previously unknown Mamu-B haplotype and expresses several abundant Mamu-B sequences that have not been described previously.

We obtained analogous results from rhesus macaques (Supplementary Figs. 1 and 3). For example, one Indian-origin rhesus macaque (Fig. 2b) is homozygous for a common MHC class I B (Mamu-B) haplotype that we detected in nine unrelated macaques (Supplementary Fig. 3). Together with the abundant transcripts for Mamu-B*02401 and Mamu-B*01901, we detected seven additional Mamu-B–like sequences at relatively low transcript levels (0.4–6.7% of total class I sequence reads) that had not previously been associated with this haplotype17 (Fig. 2b). In contrast to the comparatively well-characterized class I sequences of Indian-origin rhesus macaques, in a homozygous Chinese-origin rhesus macaque (Fig. 2c), four of six Mamu-B–like sequences had not been reported previously; two of these represent the predominant Mamu-B transcripts expressed by this Chinese rhesus macaque. The prevalence of previously undescribed sequences was even more pronounced for pig-tailed macaques, for which only limited class I allele discovery efforts have been described to date. Of the 136 distinct MHC class I sequences observed in 12 pig-tailed macaques, over 100 were previously unknown MHC class I transcripts (Supplementary Figs. 1 and 4).

The success of our pilot study prompted us to examine whether we could maximize the efficiency of GS FLX genotyping for large cohorts by reducing the depth of sequence coverage. In a follow-up study, we pyrosequenced four amplicon pools of 12 rhesus macaques each, with each pool occupying one of 16 regions of a 70 × 75 mm Standard PicoTiterPlate. This decreased the sequencing depth by an order of magnitude, to 800 sequence reads per macaque. Even with this reduced depth of coverage, we identified an average of 20.5 distinct MHC class I sequences per macaque, as compared to 24.3 sequences per macaque in our pilot study. This modest reduction in sensitivity notwithstanding, GS FLX analysis still provides considerably more comprehensive genotyping than existing methods15,16,17,18,19,20. The MHC class I sequences detected for these additional 48 macaques, as well as their relative transcript levels, are shown in Supplementary Figures 1 and 3.

Accuracy of pyrosequencing-based MHC genotyping of macaques

Sequence-based genotyping methods may be confounded by errors that accumulate as a result of polymerase misincorporations or sequencing artifacts. To diminish the number of sequence artifacts evaluated manually for each macaque, we added a simple filtering step, requiring a minimum of five (pilot study) or two (follow-up study) identical reads for a sequence to be included in the downstream Nucleotide Basic Local Alignment Search Tool (BLASTN) analysis (Supplementary Note). More than 98.3% of the resulting filtered reads were consistent with known or previously undescribed MHC class I sequences by BLASTN analysis (Table 1). With the filter step, we reduced the overall error rate of these data to <1.7% of the sequence reads evaluated subsequently, for both the representative macaques illustrated in Figure 2 and the full cohort (detailed analysis available in Supplementary Fig. 5). Excluding this low level of artifacts entails straightforward, manual editing, accomplished by intra- and intermacaque sequence comparison. Thus, the error rate in GS FLX pyrosequencing is acceptably low. We applied this multi-step analysis process to all of the MHC class I genotyping data presented here.
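The read-support filter described above can be sketched as follows. This is a minimal illustration, not the analysis pipeline used in the study: it assumes reads have already been trimmed and assigned to a single macaque, and the function name and toy reads are hypothetical. Only the thresholds of five and two identical reads come from the text.

```python
from collections import Counter


def filter_by_read_support(reads, min_identical):
    """Keep only sequences observed at least min_identical times.

    Returns a dict mapping each retained sequence to its read count.
    """
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n >= min_identical}


# Toy reads for one macaque: one well-supported sequence, one seen twice
# and one singleton (a likely artifact under this filter).
toy_reads = ["AAA"] * 6 + ["AAT"] * 2 + ["ATT"]

pilot_kept = filter_by_read_support(toy_reads, min_identical=5)     # pilot threshold
followup_kept = filter_by_read_support(toy_reads, min_identical=2)  # follow-up threshold
```

The lower threshold in the follow-up setting retains the sequence seen twice, which reflects the trade-off between sensitivity and artifact removal at reduced sequencing depth.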

Table 1 Analysis of sequence artifacts

To exclude the possibility that the MHC sequences detected at low levels represented experimental artifacts, we examined the distribution of MHC class I sequences in pedigreed cynomolgus macaques. These sequences should not be inherited if they result from random errors during reverse transcription or PCR. Each progeny inherited the same haplotype from the sire, whereas the haplotypes of the dam segregated between her offspring (Fig. 3a). The relative abundance of each MHC transcript was remarkably consistent on the haplotypes shared among the offspring and their parents (Fig. 3a). Notably, we detected even those alleles that account for as little as 0.2% of the total class I transcripts for these shared haplotypes (Fig. 3a).

Figure 3: Shared MHC class I transcript abundance profiles.

(a) The four haplotypes in a breeding group of cynomolgus macaques are labeled 1–4. Both progeny inherited haplotype 2 from the sire, whereas haplotypes 3 and 4 of the dam segregated between the offspring. (b) These three Indian rhesus macaques share the Mamu-B11a haplotype, for which a complete genomic sequence has been published. InRh designates animal identification numbers, whereas InB and ChB indicate Mamu-B haplotypes of Indian and Chinese origin, respectively.

As a second approach to examine the accuracy of this genotyping method, we analyzed Indian rhesus macaques that share the B11a haplotype11,17. This haplotype is of special interest as it represents the only complete macaque genomic sequence currently available for this exceptionally complex region12. The B11a haplotype carries 19 Mamu-B–like loci that have the potential to encode at least 14 functional gene products. Previous cDNA cloning and Sanger sequencing identified transcripts for only eight of these loci11,17. However, with the increased sensitivity of GS FLX analysis, we identified messenger RNA transcripts from at least 13 of the loci predicted by genomic sequencing (Fig. 3b). Between six and 13 Mamu-B sequences are transcribed from each of the haplotypes carried by these three macaques (Fig. 3b). As with the cynomolgus macaque breeding group described above, the relative transcript abundance of class I sequences detected from the shared B11a haplotype was very similar, despite the order of magnitude difference in depth of sequencing (Fig. 3b). Furthermore, we consistently observed similar class I transcript profiles for other ancestral haplotypes shared by unrelated macaques (Fig. 3b), suggesting that GS FLX analysis provides at least a semiquantitative representation of the relative class I transcript levels within an individual. We illustrate transcript profiles for additional shared haplotypes in Supplementary Figure 6, further demonstrating the reproducibility of this technique.

Identification of high-frequency Mamu class I sequences

Overall, we generated comprehensive MHC class I genotypes and expression profiles for 68 Indian- and Chinese-origin rhesus macaques obtained from four independent sources. These results allowed us to begin to identify class I sequences that are relatively frequent in rhesus macaques. Of the 287 distinct class I sequences detected within our rhesus macaque cohort, 33 distinct Mamu-A, Mamu-B and Mamu-E sequences were present in at least 10% of this cohort and expressed at relatively high transcript levels (≥4% of the total sequences per macaque) (Table 2). These high-frequency alleles may represent high-priority targets for additional functional immune characterization.
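One way to apply the two thresholds above to a set of per-macaque transcript profiles is sketched below. The text does not specify exactly how the frequency and expression criteria were combined, so this sketch assumes one reading: a sequence qualifies if it reaches at least 4% of reads in at least 10% of the animals. The function name, parameter names and toy profiles are hypothetical.

```python
def common_high_expression(profiles, min_cohort_fraction=0.10, min_percent=4.0):
    """Return sequences reaching min_percent of reads in enough animals.

    profiles maps each animal to a dict of {sequence: percent of class I reads}.
    Assumed criterion: the share of animals in which the sequence is at or
    above min_percent must be at least min_cohort_fraction.
    """
    cohort_size = len(profiles)
    carriers = {}
    for profile in profiles.values():
        for seq, percent in profile.items():
            if percent >= min_percent:
                carriers[seq] = carriers.get(seq, 0) + 1
    return sorted(
        seq for seq, n in carriers.items() if n / cohort_size >= min_cohort_fraction
    )


# Toy cohort of 20 animals with made-up sequence names and percentages.
toy_profiles = {f"animal_{i}": {} for i in range(20)}
for i in range(3):
    toy_profiles[f"animal_{i}"]["seq_X"] = 12.0  # frequent and abundant
for i in range(5):
    toy_profiles[f"animal_{i}"]["seq_Y"] = 2.0   # frequent but low level
toy_profiles["animal_5"]["seq_Z"] = 9.0          # abundant but rare

common = common_high_expression(toy_profiles)
```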

Table 2 Common rhesus macaque class I sequences that are highly expressed

Using these genotype data, we also inferred the gene content of MHC haplotypes (Supplementary Figs. 3 and 7) and considerably extended the number of MHC class I sequences associated with previously described Mamu-A and Mamu-B haplotypes of Indian- and Chinese-origin rhesus macaques11,16,17. Unexpectedly, all but six of 64 haplotypes observed in our Indian rhesus macaques could be accounted for by 12 previously described Indian-origin Mamu-B haplotypes (Supplementary Figs. 3 and 7). Consistent with the greater genetic diversity expected for Chinese-origin rhesus macaques, fewer than one third of the 72 Mamu-B haplotypes in our cohort reflected previously reported configurations17,18. However, we did infer at least eight new Mamu-B haplotypes in these macaques on the basis of the sharing of five or more identical class I sequences between two or more macaques (Supplementary Figs. 3 and 7).
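The sharing criterion for inferring new haplotypes can be illustrated with a pairwise comparison of genotypes. This is a simplified sketch under stated assumptions: each genotype is treated as a plain set of sequence names, only pairs of animals are compared, and no attempt is made to phase sequences onto individual chromosomes. The function name and toy genotypes are hypothetical; only the threshold of five shared sequences comes from the text.

```python
from itertools import combinations


def candidate_shared_haplotypes(genotypes, min_shared=5):
    """Return {(animal_a, animal_b): shared sequences} for qualifying pairs.

    A pair qualifies when the two animals have at least min_shared
    identical class I sequences in common.
    """
    hits = {}
    for a, b in combinations(sorted(genotypes), 2):
        shared = set(genotypes[a]) & set(genotypes[b])
        if len(shared) >= min_shared:
            hits[(a, b)] = sorted(shared)
    return hits


# Toy genotypes with made-up sequence names.
toy_genotypes = {
    "animal_1": {"s1", "s2", "s3", "s4", "s5", "s6"},
    "animal_2": {"s1", "s2", "s3", "s4", "s5", "s9"},
    "animal_3": {"s1", "s2", "s7", "s8"},
}

candidates = candidate_shared_haplotypes(toy_genotypes)
```

In this toy example only the first two animals share enough sequences to suggest a common haplotype; the third shares two sequences with each and is not flagged.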

Discussion

These data demonstrate that massively parallel pyrosequencing can provide comprehensive and cost-effective MHC class I genotyping. We applied this technology to macaques, which have the most complex MHC genetics of any primate species described to date and have frustrated genotyping efforts for more than a decade. Comprehensive MHC genotyping has the potential to revolutionize the use of macaques in infectious disease and transplantation research and to guide functional immunology studies. Retrospective genotyping of macaques previously used in pathogenesis research may provide a more complete understanding of MHC-restricted cellular immune responses that are key in protective immunity and resistance to infectious diseases6,23,24. Prescreening of macaques used in vaccine trials could balance these MHC sequences between experimental groups and reduce complications from overrepresentation of specific sequences that influence the quality of the cellular immune response25. This technology could also rapidly identify the most common MHC class I sequences in every macaque population used in biomedical research, enabling the selection of macaques predicted to share T cell responses or the prioritization of sequences for functional characterization.

There are straightforward ways to improve upon the results obtained here. We designed the 190-bp amplicon to span the most polymorphic region of MHC class I molecules (Fig. 1) while retaining compatibility with current sequencing technology. Longer amplicons would allow for unique discrimination of more alleles and allelic variants, with the ultimate goal of full-length transcript sequencing to unambiguously determine the exact complement of class I sequences in an individual. We have performed preliminary studies with a 367-bp amplicon that uses an alternative reverse primer located in exon 3. This longer amplicon improved resolution between closely related class I alleles and also overcomes concerns about sequence artifacts resulting from contamination with genomic DNA, because it spans an intron26. Pyrosequencing technology is rapidly improving and will soon allow for read lengths up to 500 bp. With this advance in mind, we have designed a new amplicon that spans 477 bp between conserved sequences in exons 2 and 4 of macaque class I genes. Genotyping with this longer amplicon will allow unambiguous resolution of three out of four of the rhesus macaque class I sequences currently available in the Immuno Polymorphism Database21. Additionally, data from overlapping amplicons could be assembled to provide full-length MHC class I sequences. In silico studies with representative Indian rhesus macaques suggest that full-length class I sequences can be reconstructed from three overlapping amplicons once a pyrosequencing read length of at least 400 bp can be achieved (R.W.W., D.H.O., T.H. and B. Simen, unpublished data). Together, these approaches will allow the new sequence fragments identified by genotyping to be resolved into full-length MHC class I transcript sequences.

Pyrosequencing may also be used to dramatically improve upon existing technologies for genotyping other highly polymorphic loci. Obvious candidates include MHC class II, killer immunoglobulin-like receptor or T cell receptor transcripts. This approach may also accelerate HLA class I genotyping of humans. As there are only three classical HLA class I genes per chromosome, each transcribed at roughly equal levels, genotyping can be achieved with far fewer sequence reads than in macaques. Given the yield from our macaque studies, HLA class I genotypes for thousands of individuals could be generated in a single GS FLX instrument run. Such ultra–high-throughput typing may be valuable for tissue donor registry programs as well as genetic epidemiology and whole-genome association studies27.

Methods

Macaque samples. We examined samples from 92 macaques obtained from nine institutions (Supplementary Note). Indian-origin and Chinese-origin rhesus macaques were represented by 32 and 36 samples, respectively, whereas 12 samples each came from cynomolgus and pig-tailed macaques. All macaques were cared for according to the regulations and guidelines of the Institutional Animal Care and Use Committees at their respective institutions (Supplementary Note).

Primary cDNA-PCR and pooling strategy. We converted total cellular RNAs to cDNA using a Superscript III First-Strand Synthesis System (Invitrogen). We generated primary cDNA-PCR amplicons spanning 190 bp of exon 2 of macaque class I sequences with high-fidelity Phusion polymerase (New England Biolabs). Each PCR primer contained one of 12 distinct 10-bp MID tags along with adaptor sequences for 454 Sequencing (Supplementary Note). After purification, we normalized primary amplicons to equimolar concentrations and pooled groups of 12 macaques for GS FLX analysis.

Emulsion PCR and pyrosequencing. We performed the emulsion PCR and pyrosequencing steps with Genome Sequencer FLX instruments (Roche/454 Life Sciences) using GS FLX protocols according to the manufacturer's specifications (454 Life Sciences)27,28 at the 454 Sequencing Center and the University of Illinois at Urbana-Champaign High-Throughput Sequencing Center (Supplementary Note). For the pilot study, we sequenced each amplicon pool of 12 macaques in one-fourth of a 70 × 75 mm Standard PicoTiterPlate (Roche/454 Life Sciences); for the follow-up experiment, we used a one-sixteenth plate region for each of the four pools.

Data analysis. After image processing and base calling with GS FLX software (454 Life Sciences), we binned high-quality sequence reads by their respective MID tags and assembled the reads into contigs with 100% identity for each macaque using SeqMan Pro Version 8.0.2 (DNASTAR). We performed BLASTN analyses for the resulting contigs against a custom in-house database of macaque MHC class I sequences (Supplementary Note). To normalize transcript abundance levels between macaques, we divided the number of sequence reads detected for each distinct class I sequence by the total number of sequence reads that formed contigs in each macaque. We designated MHC class I sequences not previously deposited in GenBank with a species abbreviation and the locus to which they are most similar (for example, Mf-B*nov001 is the first class IB–like sequence identified in cynomolgus macaques). Macaque class I nomenclature has been modified recently to include an extra '0' in the allele lineage designations to maintain consistency with human HLA nomenclature and to accommodate ever-expanding allele lists (for example, Mamu-A*01 is now Mamu-A1*001). Information concerning relationships to previous nomenclature and details for each sequence are available at the Immuno Polymorphism Database (www.ebi.ac.uk/ipd/mhc/nhp/nomenclature.html)21.
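The normalization step described above amounts to expressing each sequence's read count as a share of all reads that formed contigs in the same macaque. A minimal sketch follows; the function name, sequence names and read counts are hypothetical, and the result is reported as a percentage to match the way transcript levels are quoted in the text.

```python
def transcript_percentages(read_counts):
    """Convert per-sequence read counts for one macaque into percentages.

    read_counts maps each distinct class I sequence to the number of reads
    in its contig; the denominator is the total across all contigs.
    """
    total = sum(read_counts.values())
    if total == 0:
        return {}
    return {seq: 100.0 * n / total for seq, n in read_counts.items()}


# Toy read counts for a single macaque (made-up values).
toy_counts = {"seq_A": 278, "seq_B": 708, "seq_C": 14}
toy_percentages = transcript_percentages(toy_counts)
```

Because the denominator is computed per macaque, profiles from animals sequenced at different depths can be compared on the same scale.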

Accession codes.

We deposited new MHC class I sequences identified in this study in GenBank under accession numbers GQ153320–GQ153527 (Supplementary Fig. 1).

Note: Supplementary information is available on the Nature Medicine website.