Introduction

The human leukocyte antigen (HLA) gene cluster, located on the short arm of chromosome 6, is among the most polymorphic regions in human genome with thousands of documented alleles.1, 2 This cluster includes several genes involved in functions of the immune system, including major histocompatibility complex (MHC) classes I and II.3 Both MHC classes are encoded by three major loci (MHC class I: HLA-A, HLA-B and HLA-C; MHC class II: HLA-DP, HLA-DQ and HLA-DR), which are co-dominantly expressed. The polymorphisms in the HLA region have been reported to play critical roles in rejection and graft-versus-host disease of hematopoietic stem cell transplants and the risks of several diseases, including autoimmune diseases.4, 5 Certain HLA genotypes were shown to link with increased risk of drug-induced skin hypersensitivity and liver inquiry.6, 7 In cancer research, HLA information is very important because HLA class I molecules are critical mediators of the cytotoxic T-cell response, presenting antigen peptides on the cell surface to be recognized by the T cell receptor.8, 9 The recognition of HLA-peptide complexes on cancer cells by T-cell receptor of the cytotoxic CD8+ T cells is crucial for anti-tumor immune responses. Indeed, HLA dysfunction caused by genetic and epigenetic alteration in the HLA genes or β2-microglobulin has implicated as a possible mechanism of immune evasion during the development and progression processes of cancer.10, 11, 12, 13, 14 Because of the success of immune checkpoint inhibitors, there is no skepticism against immune-based therapies for cancer and it is almost certain that cancer cells present cancer-specific antigens on HLA molecules. With advances in genome sequencing, it is possible to predict a new class of tumor-specific antigens (‘neoantigens’) derived from somatically mutated proteins that are present uniquely in tumor cells. Immune checkpoint antibody therapies, such as anti-CTLA-4 and anti-PD-1 antibodies, have been used for the treatment of melanoma, non-small cell lung cancer and kidney cancer.15, 16, 17, 18 In fact, accumulative evidence has supported that higher somatic mutation burden and predicted-neoantigen load were strongly associated with better clinical outcome in patients treated with CTLA-4 and PD-1 blockades.19, 20 These lines of evidence support the significance of neoantigen prediction in cancer immunotherapy. Since cancer genome information is usually readily available in these studies, it is important that HLA class I genotype information can be accurately extracted so that possible neoantigens can be predicted, and neoantigen-specific T cells are in a scope for development of next-generation cancer immunotherapies.

HLA allele definition comprises the gene name indicating the locus (that is, A, B or C) followed by successive sets of digits separated by colons.21 While the first two digits (field 1) specify the allele groups by serological activity (allele level resolution, for example, A*01 or A*02), the second field indicates the protein sequence (protein level resolution, for example, A*02:01 or A*02:02). The remaining two sets distinguish synonymous polymorphisms and non-coding variations. HLA typing can be done with different degrees of resolution.22 Conventional HLA typing is performed using serology- and/or PCR-based methods, such as sequence-specific oligonucleotide and sequencing-based typing techniques. These techniques are labor-intensive, time-consuming, and often lead to ambiguous genotyping results because of limitation of oligonucleotide probe design or phase ambiguity for HLA allele assignment.23, 24 Several protocols have recently been established for HLA-targeted multiplexed PCR or long-range PCR methods coupled with next-generation sequencing that enable us to obtain accurate and super high resolution of HLA information up to fourth field level,25, 26, 27 although errors introduced by PCR artifacts, sequencing procedures, bioinformatics or inadequately genotyped sequence references are still inherent problems.28, 29 Recently, several tools to extract HLA allele information from genome-wide sequencing data, including whole-exome, whole-genome and transcriptome data, but it is still challenging to improve the accuracy of these tools.29

Materials and methods

Samples

We used genomic DNA from 12 patients with malignant methothelioma. All samples were obtained under Institutional Review Board approval in the University of Chicago (No. IRB15-0128) and with written informed consent.

HLA typing

Genomic DNA was extracted from peripheral blood mononuclear cells using Qiagen DNA Mini Kit (Qiagen, Valencia, CA, USA). PCR amplicon-based high-resolution typing on MiSeq (Illumina, San Diego, CA, USA) was performed for HLA-A, HLA-B and HLA-C in Scisco Genetics Inc. (Seattle, WA, USA).25

Whole-exome sequencing

DNA libraries were prepared using SureSelect XT Human All Exon V5 (Agilent Technologies, Santa Clara, CA, USA) and whole-exome sequencing was performed by 100-bp paired-end reads on HiSeq2500 (Illumina) according to the manufacturers’ protocol.

Fastq data of the 1000 Genomes Project samples

To compare with PCR-based approaches, we selected a total of 961 samples across 13 different populations in the 1000 Genomes Project database (http://www.1000genomes.org/) that had both exome data, which were sequenced by paired-end reads on Illumina sequencer (Genome Analyzer II or HiSeq 2000), and HLA type information experimentally determined by Sanger sequencing (Supplementary Table 1).30 The fastq files were downloaded from the 1000 Genomes Project database.

Read mapping to hg19 reference genome

After the exclusion of low-quality reads (base quality of<20 for more than 80% of bases) using FASTX toolkit (http://hannonlab.cshl.edu/fastx_toolkit/), sequence reads were mapped to the human reference genome GRCh37/hg19 using Burrows-Wheeler Aligner (v0.7.10).31 BAM files were generated using SAMTools (v0.1.19),32 and possible PCR duplicated reads were removed using Picard v1.117 (http://broadinstitute.github.io/picard/).

HLA typing tools

We have compared publically available algorithms for HLA typing, including OptiType,33 Polysolver,34 PHLAT,35 HLAreporter,36 HLAforest,37 HLAminer38 and seq2HLA.39 All algorithms were run according to respective instructions. For Polysolver analysis, we used BAM files generated as described above as an input. For the other algorithms, we used fastq files after filtering low-quality reads (base quality of<20 for more than 80% of bases) as an input. Multiple predictions for an allele at a locus detected in HLAreporter were considered as ambiguous results, and only first field information was used.

Data analysis

Sanger sequencing-based HLA genotype data were used as a reference.30 The accuracy was simply calculated as the ratio between the number of correctly called alleles and the total number of the alleles. The total number of the alleles were 2 alleles × number of samples for the accuracy calculation in each of the HLA-A, HLA-B or HLA-C gene, or 2 alleles × 3 loci × number of samples to calculate the accuracy among all three HLA class I genes (HLA-A,B,C). Ambiguous results were considered as incorrect predictions. Error rate for each miscall was calculated by dividing the number of miscalls by total number of the alleles in the 961 samples. The significance of the error rate was evaluated by Fisher’s exact test. Since ambiguous HLA alleles existed in Sanger sequencing results,30 we compared all possibilities of ambiguous HLA alleles to test concordance of identified alleles, while we merged ambiguous HLA alleles into the most common HLA allele among ambiguous HLA alleles to calculate allele frequency or error rates. Allelic frequency of HLA in 961 samples from the 1000 Genomes Project was summarized in Supplementary Table 2. HLA coding DNA and genomic nucleotide sequences and feature annotation were obtained from the IMGT/HLA (https://www.ebi.ac.uk/ipd/imgt/hla/)1, 2 or dbMHC database (http://www.ncbi.nlm.nih.gov/projects/gv/mhc/). To visualize the mapping data, we used IGV software.40, 41

Results

As a preliminary screening, we evaluated seven reported algorithms for HLA typing: Optitype,33 Polysolver,34 PHLAT,35 HLAreporter,36 HLAforest,37 HLAminer38 and seq2HLA,39 using exome data from 12 clinical samples. Default parameters were applied for all the algorithms, with BAM files mapped to hg19 reference genome as an input for Polysolver analysis, while with fastq files as an input for the other algorithms according to the manual of each tool. As summarized in Supplementary Table 3, the results were discordant among these algorithms. To evaluate the accuracy of each exome-based software, we genotyped HLA of these 12 samples using PCR amplicon-based high-resolution HLA typing on MiSeq.25 We calculated the accuracies in each of HLA-A, HLA-B and HLA-C genes, and the accuracy of all three HLA class I genes (HLA-A,B,C), which was a percentage of correctly identified alleles among the total 72 alleles (2 alleles × 3 loci × 12 samples; see Materials and methods). The accuracies for HLA-A,B,C in each software varied from 36.1 to 100% at the resolution level of the second field (Supplementary Table 4). Based on this result, we focused on three algorithms, Optitype, Polysolver and PHLAT, which showed more than 90% accuracy, for further analyses to investigate patterns and possible causes of miscalls. These algorithms are most recently reported three, and Figure 1 briefly summarizes the workflow for these software. These three used different aligners and different statistical analysis to determine most-likely HLA alleles, and only two of them, Optitype and Polysolver adopted pre-selection step of HLA reads.

Figure 1
figure 1

Workflows of Optitype, Polysolver and PHLAT. The parts enclosed by dotted lines are the steps included in each software.

We selected 961 samples in the 1000 Genomes Project database that had both fastq files of paired-end Illumina exome data (N=1142) and experimentally determined HLA genotype information by Sanger sequencing technique (N=1274) (Supplementary Table 1),30 then ran exome-based HLA typing algorithms to determine HLA class I genotype (Supplementary Table 5). Among these three algorithms, Optitype showed the highest accuracy of 97.2% for HLA class I alleles (HLA-A,B,C) at the second field level, followed by 94.0% in Polysolver and 85.6% in PHLAT (Figure 2 and Supplementary Table 6). The allele estimation accuracies for each of HLA-A, HLA-B and HLA-C by Optitype were as high as 97.3%, 96.6% and 97.7%, respectively. Polysolver achieved the accuracies as high as 93.4% for HLA-A, 92.5% for HLA-B and 96.1% for HLA-C. However, in PHLAT, the accuracies were significantly lower than the other two methods, with the accuracies of 79.1%, 85.1% and 92.8% for HLA-A, HLA-B and HLA-C, respectively. Importantly, nearly 20% of HLA-A alleles were incorrectly assigned by PHLAT.

Figure 2
figure 2

Performance comparison of three exome-based HLA-typing algorithms. Accuracy of HLA alleles typed at the second field (4-digit) and first field (2-digit) resolution in 961 individuals from the 1000 Genomes Project. Accuracy was calculated by the fraction of total number of alleles that were correctly called.

We next investigated specific patterns of discordance in each algorithm. In Optitype, 44 (86.3%) of 51 errors observed in HLA-A were within the same serological allele group, especially in the HLA-A*02 group (60.8%) (Figures 3a and b and Supplementary Table 7). Only 7 (13.7%) of the 51 errors were across different serological types. Similar to HLA-A, most miscalls found in Optitype were within the same serological allele type in HLA-B (43 of 65 (66.2%)) and in HLA-C (40 of 45 (89.9%)) (Figures 3c–f and Supplementary Tables 8, 9). The discordances across serological allele groups detected in multiple samples were HLA-B*45:01 to HLA-B*44:15 and HLA-C*08:02 to HLA-C*05:01, with the error rate of 10.0% and 4.1%, respectively. HLA-C*08:02 to HLA-C*05:01 was also detected in Polysolver and PHLAT with error rates of 4.1% and 6.1%, respectively (Table 1). Although these alleles showed high sequence similarity (Supplementary Figure 1), exact causes for these miscalls are still unclear.

Figure 3
figure 3figure 3

Distribution of the pattern of genotyping errors in HLA-A, HLA-B and HLA-C. (a, b) The pattern of discordant HLA-A alleles within the same serological allele (a) and across different serological alleles (b). (c, d) The pattern of discordant HLA-B alleles within the same serological allele (c) and across different serological alleles (d). (e, f) The pattern of discordant HLA-C alleles within the same serological allele (e) and across different serological alleles (f). nd; not determined (as shown as gray bars).

Table 1 Summary of miscalls observed in multiple software and at least twice in one software

In Polysolver, 78 (61.9%) of 126 discordances in HLA-A were across different serological alleles, and the most common (14 out of 126) miscalls were HLA-A*25:01 to HLA-A*26:01 with an error rate of 93.3% (P=2.1 × 10−7, Figures 3a and b and Supplementary Table 10). This error was observed in one sample using PHLAT and an opposite miscall of HLA-A*26:01 to HLA-A*25:01 was observed in two samples analyzed with Optitype (Table 1). The HLA-A*25:01 to HLA-A* 26:01 error was also observed in two of the 12 samples in our preliminary screening samples (Supplementary Table 3). This miscall was observed regardless to the allele combination; that is, heterozygous with HLA-A*01:01 (N=3), HLA-A*02:01 (N=3), HLA-A*03:01 (N=3), HLA-A*11:01 (N=2) and HLA-A*24:02 (N=3). As shown in Figure 4a, HLA-A*25:01 and HLA-A*26:01 showed 99.2% similarity in 1098 nucleotides in its coding sequence, with only eight nucleotide differences at the 3′ part (226th to 246th bases) of exon 2. This 3′ part of exon 2 in HLA-A*25:01 also has nine nucleotide differences from HLA-A*03:01, which is the HLA-A allele in the hg19 reference. During the first step of the Polysolver pipeline,34 all exome sequence reads were aligned to the human hg19 reference genome (Figure 1), and then the reads mapped to HLA gene loci were extracted as HLA reads. IGV view of a homozygous sample for HLA-A*25:01 after mapping to the hg19 reference showed that there was only one read mapped to the 3′ part of exon 2 of the HLA-A gene (Figure 4b and Table 2). Similarly, in the heterozygous sample for HLA-A*25:01 and HLA-A*03:01, all reads mapped to this 3′ part of exon 2 matched to the HLA-A*03:01 allele (average 23.7 ×) but not to the HLA-A*25:01 allele. Since the sequence reads in HLA-A*03:01 homozygous sample were correctly mapped with an average of 59.9 × depth, we assumed that the HLA-A*25:01 reads were not efficiently mapped to the hg19 reference genome because of the mismatches between HLA-A*25:01 and HLA-A*03:01 sequences. To examine this possibility, we attempted to replace the 3′ part of HLA-A exon 2 of the hg19 reference genome to the corresponding sequences of HLA-A*25:01 allele, and analyzed the same data using this modified genome sequence (Figure 4c). As a result, we could successfully map the sequence reads of the HLA-A*25:01/A25:01 sample to the 3′ part of exon 2 with an average depth of 37.9. In the heterozygous sample with HLA-A*03:01/A25:01, sequence reads mapped to this region were in average 28.2 depth; however, 90% (25.1 ×) of the reads corresponded to the HLA-A*25:01, and the numbers (in average 3.1 ×) of reads corresponding to HLA-A*03:01 were significantly decreased from 23.7 × when we used the standard hg19 reference sequence with HLA-A*03:01. These results clearly indicated that when reference genome sequence with only a single HLA allele was used, serious bias in the mapping of HLA reads occurred, and the initial extracting process of HLA reads using the hg19 reference genome with only HLA-A03:01 allele was the main cause of miscalls in Polysolver. The second most common error in Polysolver was HLA-A*02 to HLA-A*30 with the error rate as low as 1.5% (P=0.0076). In HLA-B and HLA-C genes, no specific discordant pattern was observed for Polysolver, although there were several miscall patterns observed in multiple samples (Figures 3c–f and Supplementary Tables 11, 12).

Figure 4
figure 4

Bias among the HLA sequences affecting exome-based HLA-typing results. (a) Sequence alignment of exon 2 region of HLA-A*03:01 (a coordinate with the hg19 reference genome), HLA-A*25:01 and HLA-A*26:01. The sequence underlined in red is the region where nucleotides are different between HLA-A*25:01 and HLA-A*26:01, and between HLA-A*25:01 and HLA-A*03:01, which is corresponding to the red lines in (b) and (c). (b) IGV views of HLA-A exon 2 region after mapping to hg19 reference genome (corresponding to HLA-A*03:01 allele) in the samples with HLA-A*25:01/HLA-A*25:01, HLA-A*25:01/HLA-A*03:01 and HLA-A*03:01/HLA-A*03:01. Red lines represent nucleotides different between HLA-A*25:01 and HLA-A*26:01 as shown in (a). (c) IGV views of HLA-A exon 2 region after mapping to modified hg19 reference genome, which was replaced by HLA-A*25:01 exon 2 sequences, in the samples with HLA-A*25:01/HLA-A*25:01, HLA-A*25:01/HLA-A*03:01 and HLA-A*03:01/HLA-A*03:01. In IGV viewing, standard hg19 was used as a reference sequence. Red lines represent nucleotides different between HLA-A*25:01 and HLA-A*26:01 as shown in (a).

Table 2 Sequence depth and proportion of each allele around the 3′ part of exon 2 region in HLA-A gene

In PHLAT, 246 (60.7%) of 405 miscalls were across different serological types when predicting HLA-A genotypes (Figures 3a and b and Supplementary Table 13). Notably, PHLAT erroneously assigned various types of allele to either HLA-A*01:22 or HLA-A*01:81 (that is, the error rates of HLA-A*02:01 to HLA-A*01:22, HLA-A*02:01 to HLA-A*01:81, HLA-A*02:06 to HLA-A*01:81, HLA-A*26:01 to HLA-A*01:22 and HLA-A*68:01 to HLA-A*01:22 were 6.1%, 9.5%, 26.3%, 16.3% and 17.0% with P-values of 4.0 × 10−8, 3.8 × 10−15, 0.0010, 0.0057 and 0.0027, respectively), despite relatively low sequence similarities between HLA-A*01:22 or HLA-A*01:81 and these alleles (Supplementary Figure 2a). These errors were PHLAT platform-specific errors, and not found in Optitype or Polysolver (Supplementary Tables 7, 10). The second frequent miscalls in PHLAT were between HLA-A*33:03 and HLA-A*31:01, which showed high sequence similarities (error rate of 25.4%, P=1.0 × 10−5; Supplementary Figure 2b). Proportion of miscalls across different serological types were 100 (34.7%) of 288 for HLA-B and 33 (23.6%) of 140 for HLA-C. A few specific patterns were observed such as HLA-B*40:01 to HLA-B*07:02 (error rate of 10.3%, P=3.6 × 10−4) and HLA-C*08 to HLA-C*05:01 (error rate of 6.9%, P=0.0069), although HLA-C*08 to HLA-C*05:01 miscalls were also found in Optitype and Polysolver as described above (Figures 3c–f and Supplementary Tables 14, 15).

Discussion

In this study, we evaluated seven algorithms for HLA typing using whole-exome data: Optitype,33 Polysolver,34 PHLAT,35 HLAreporter,36 HLAforest,37 HLAminer38 and seq2HLA,39 using 12 clinical samples, and further evaluated three of them, Optitype, Polysolver and PHLAT, which displayed the highest accuracy (>90%) in the preliminary screening, using 961 samples from the 1000 Genomes Project database.

One of the common platform-specific miscalls is from HLA-A*25:01 to HLA-A*26:01 that was found in Polysolver. This miscall is caused by the lacking of HLA-A*25:01 reads during the extraction of HLA reads using the human hg19 reference genome, which has a single HLA allele of HLA-A*03:01 at HLA-A locus. On the other hand, Optitype used reference sequences that are a collection of all the HLA allele sequences when extracting HLA reads. Although extraction of HLA reads is included in the script of Polysolver, we may be able to improve the accuracy of Polysolver if we apply HLA read extraction using a collection of all the HLA allele sequences.

Among the software we tested, Optitype showed the highest performance with 97.2% accuracy at the second field resolution. However, this value was still lower than the accuracy in HLA typing using PCR-based next-generation sequencing methods, which is nearly 100% accuracy.25, 28 One of the causes of lower accuracy in exome-based HLA typing may be because of the low number of HLA reads. The lower accuracy was found in the samples with lower number of HLA reads (accuracy of 82.0%, 95.4% and 98.0% in the groups with HLA reads of <1000 (N=25), 1000–2000 (N=193) and >2000 (N=743), respectively). Although this explains only a part of the causes of miscalls in Optitype, input number of reads into the analysis is critical for accurate HLA genotype prediction.

In this study, we analyzed the potential problems that contributed to prediction errors in computational analysis, including the mapping process to the reference genome sequence with a single HLA allele. Currently, Optitype showed the highest performance; however, certain patterns of miscalls still occurred and needed to be addressed. In order to apply these techniques into clinical settings, further studies and validations are required to solve these problems to improve the accuracy of HLA genotype estimation.