HLA-VBSeq v2: improved HLA calling accuracy with full-length Japanese class-I panel

Wang, Yen-Yen; Mimori, Takahiro; Khor, Seik-Soon; Gervais, Olivier; Kawai, Yosuke; Hitomi, Yuki; Tokunaga, Katsushi; Nagasaki, Masao

doi:10.1038/s41439-019-0061-y

Software Report
Open access
Published: 19 June 2019

Yen-Yen Wang^1,2,
Takahiro Mimori^2,
Seik-Soon Khor ORCID: orcid.org/0000-0002-6809-4731^3,4,
Olivier Gervais^2,5,6,
Yosuke Kawai^3,4,
Yuki Hitomi ORCID: orcid.org/0000-0002-6713-7875^3,7,
Katsushi Tokunaga^3,4 &
Masao Nagasaki^1,2,5,6

Human Genome Variation volume 6, Article number: 29 (2019)

Abstract

HLA-VBSeq is an HLA calling tool developed to infer the most likely HLA types from high-throughput sequencing data. However, there is still room for improvement in specific genetic groups because of the diversity of HLA alleles in human populations. Here, we present HLA-VBSeq v2, a software application that makes use of a new Japanese HLA reference panel to enhance calling accuracy for Japanese HLA class-I genes. Our analysis showed significant improvements in calling accuracy in all HLA regions, with prediction accuracies achieving over 99.0, 97.8, and 99.8% in HLA-A, B and C, respectively.

Introduction

The human leukocyte antigen (HLA) system is located within the 6p21.3 region on chromosome 6 and encodes the major histocompatibility complex (MHC) proteins that are essential to the immune system. HLA genes are relevant to a wide range of clinical contexts, including cancer, organ transplantation, and autoimmune and infectious diseases^1,2,3. The current version of the IPD-IMGT/HLA Database, which indexes HLA sequences, contains more than 20,000 alleles of HLA subtypes, illustrating the extreme polymorphism of HLA alleles⁴. The accurate inference of HLA genotypes from whole-genome sequencing (WGS) data is therefore expected to contribute significantly not only to disease studies but also to fields such as pharmacogenomics. However, inferring HLA genotypes (HLA calling) remains challenging due to the substantial sequence similarity within the cluster and the exceptionally high variability of the loci.

We previously proposed a statistical method that infers the most probable HLA types from high-throughput sequencing data for each individual based on the optimal alignment of reads to the reference HLA sequences⁵ and developed a computational tool called HLA-VBSeq⁶. While HLA-VBSeq uses the reference sequences of the IPD-IMGT/HLA Database for the alignment of read sequences, it is still possible to improve the prediction accuracy in specific populations, since the allele distribution and haplotype frequencies of HLA genes largely depend on geographic location and genetic group^7,8. In this study, we present an updated version, HLA-VBSeq v2, which adds a Japanese reference panel of class-I HLA genes (the ToMMo HLA panel⁹) to the IPD-IMGT/HLA Database to form a new reference panel.

To evaluate the performance of HLA-VBSeq v2 with the new reference panel, we compared the accuracy of HLA-VBSeq and HLA-VBSeq v2 by using WGS data from the Japanese population. Additionally, we compared the performance of HLA-VBSeq v2 with that of another HLA calling tool, HLA*PRG:LA, which calls alleles at G group resolution (a group of HLA alleles that have identical nucleotide sequences across the exons encoding the peptide-binding domains)¹⁰.

We also provide an exhaustive list of the ambiguous samples observed in our validation datasets. Ambiguous HLA alleles occurred because of the typing strategies. The PCR-SSOP-Luminex method only uses exons 2 and 3 for HLA class-I analysis (the HLA class-I region contains 8 exons), and this may result in more than one possible typing result within a given region. By using the WGS data, for which the HLA region is fully phased, HLA-VBSeq v2 displayed high performance in ambiguous samples; we therefore suggest that HLA-VBSeq v2 can be used to validate the typing results obtained with the PCR-SSOP-Luminex method.

The ToMMo HLA panel is a full-length Japanese reference panel of class-I HLA genes constructed from 208 samples at the Tohoku Medical Megabank Organization⁹. The ToMMo HLA panel consists of 139 alleles that were extended from known alleles in the IPD-IMGT/HLA Database and includes 40 novel alleles compared with the closest subtypes in the IPD-IMGT/HLA Database. We added the ToMMo HLA panel to the IPD-IMGT/HLA 3.31.0 reference sequences to create a new set of sequences to be used as a custom reference panel.
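As a rough illustration of what combining two sequence sets into one custom reference panel can look like, the Python sketch below merges two FASTA-formatted panels. It is not the procedure used for HLA-VBSeq v2: the function names, the header layout (allele name as the first token) and the rule of keeping the longer sequence when an allele appears in both panels are all assumptions made for this example.

```python
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {allele_name: sequence}.

    Assumption: the allele name is the first whitespace-delimited token of
    each header line; real database headers may be laid out differently.
    """
    records: Dict[str, str] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without a name")
            name = fields[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def merge_panels(base: Dict[str, str], extra: Dict[str, str]) -> Dict[str, str]:
    """Add the alleles of `extra` to `base`.

    Hypothetical policy: when an allele exists in both panels, keep the
    longer sequence, so that an extended (full-length) entry replaces a
    partial one.
    """
    merged = dict(base)
    for name, seq in extra.items():
        if name not in merged or len(seq) > len(merged[name]):
            merged[name] = seq
    return merged


def to_fasta(records: Dict[str, str]) -> str:
    """Serialize records back to FASTA text, one sequence per line."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())


# Toy sequences, invented for illustration only.
database_panel = parse_fasta(">C*14:02\nACGT\n>C*14:03\nACGA\n")
population_panel = parse_fasta(">C*14:03\nTTACGAGG\n>C*04:82\nGGCC\n")
custom_panel = merge_panels(database_panel, population_panel)
```

With the toy input, `custom_panel` holds three alleles, and the entry for C*14:03 is the longer sequence from the second panel.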

We used two independent datasets to evaluate and validate the prediction accuracy of HLA-VBSeq v2: the Tokyo Healthy Controls (THC) dataset and the Stevens-Johnson syndrome (SJS) dataset, using WGS data in both cases, i.e. with the HLA region fully phased. The THC dataset consists of 418 healthy Japanese individuals from the Tokyo area, and the SJS dataset consists of 117 Japanese individuals with cold medicine-related Stevens-Johnson syndrome with severe ocular complications.

To evaluate the performance of each calling tool, we calculated the prediction accuracy, defined as the percentage of inferences that are correct relative to the ‘true’ HLA types (the expected HLA types). In this study, we defined the ‘true’ HLA types by using the PCR-SSOP-Luminex method and next-generation sequencing (NGS)-based HLA typing, which we applied to both datasets. NGS-based HLA typing was performed with an NXType™ Class-I NGS HLA typing kit and the AllType™ NGS 11-loci Amplification Kit (Thermo Fisher Scientific, Waltham, MA, USA), which phased the full genes. The PCR-SSOP-Luminex method is a high-resolution genotyping method that combines polymerase chain reaction (PCR) and sequence-specific oligonucleotide probe (SSOP) protocols with the Luminex 100 xMAP technology; at the HLA-A, -B and -C loci, only exons 2 and 3 are amplified by PCR¹¹.
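The accuracy definition above can be made concrete with a short Python sketch. It assumes that accuracy is counted per allele (two alleles per individual and locus) and that the called pair is compared with the ‘true’ pair regardless of order; the text does not spell out these details, and the function names and sample data are invented for illustration.

```python
from collections import Counter
from typing import Dict, Sequence


def matched_alleles(called: Sequence[str], truth: Sequence[str]) -> int:
    """Count called alleles that match the truth, ignoring allele order.

    A multiset intersection is used so that a homozygous truth such as
    (X, X) is only fully matched by a homozygous call.
    """
    return sum((Counter(called) & Counter(truth)).values())


def prediction_accuracy(
    calls: Dict[str, Sequence[str]], truths: Dict[str, Sequence[str]]
) -> float:
    """Percentage of true alleles that were called correctly at one locus.

    Samples without a call contribute zero correct alleles.
    """
    total = 0
    correct = 0
    for sample, truth in truths.items():
        correct += matched_alleles(calls.get(sample, ()), truth)
        total += len(truth)
    if total == 0:
        raise ValueError("no true alleles to compare against")
    return 100.0 * correct / total


# Invented example: sample s1 is fully correct, s2 has one wrong allele.
truths = {"s1": ("A*24:02", "A*02:01"), "s2": ("A*24:02", "A*24:02")}
calls = {"s1": ("A*02:01", "A*24:02"), "s2": ("A*24:02", "A*11:01")}
accuracy = prediction_accuracy(calls, truths)  # 3 of 4 alleles match: 75.0
```

Counting per allele rather than per individual means a sample with one wrong allele still contributes one correct inference.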

In this study, if PCR-SSOP-Luminex and NGS-based HLA typing gave the same result, that result was considered to be the ‘true’ HLA type. By contrast, when samples displayed inconsistent typing results (‘ambiguous’ typing results), they were reexamined by Sanger sequencing-based typing (SBT). Since in all cases SBT gave the same result as NGS-based HLA typing, NGS-based HLA typing was considered to be the most reliable method to determine the ‘true’ HLA types.

In our analysis, we assessed the prediction accuracy of Japanese HLA calling from WGS data at the 2-field resolution, which specifies the amino acid sequence of the encoded protein. We evaluated two HLA alleles (either heterozygous or homozygous) in each of three HLA regions (HLA-A, B and C), i.e., six HLA alleles in total for each individual. Furthermore, we compared the performance of HLA-VBSeq v2 with that of HLA*PRG:LA, another HLA calling software application that infers alleles at G group resolution (the group of HLA alleles that have identical nucleotide sequences across the exons encoding the peptide-binding domains and are denoted by G at the 3-field resolution, e.g., A*01:01:01G)¹⁰. HLA*PRG:LA reports a group of alleles for each gene at the 2-field resolution, but here we only used the first allele as the calling result.
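Comparing calls from tools that report different resolutions requires reducing every allele name to its first two fields. The Python sketch below shows one way to do this; the function name and the regular expression are assumptions for illustration, and the sketch simply drops any trailing letter (a G group marker or an expression suffix such as N), which is a simplification.

```python
import re

# Optional "HLA-" prefix, gene name, "*", colon-separated numeric fields,
# and an optional single trailing letter (e.g. the G of a G group name).
_ALLELE = re.compile(r"^(?:HLA-)?([A-Z0-9]+)\*(\d+(?::\d+)*)([A-Z]?)$")


def to_two_field(allele: str) -> str:
    """Reduce an HLA allele name to 2-field resolution, e.g. C*14:03.

    Simplification: the trailing letter, if any, is discarded.
    """
    match = _ALLELE.match(allele.strip())
    if match is None:
        raise ValueError(f"unrecognized allele name: {allele!r}")
    gene, fields, _suffix = match.groups()
    parts = fields.split(":")
    if len(parts) < 2:
        raise ValueError(f"fewer than two fields: {allele!r}")
    return f"{gene}*{parts[0]}:{parts[1]}"


examples = [to_two_field(a) for a in ("A*01:01:01G", "HLA-C*14:03:01:01", "B*40:06")]
```

Here `examples` evaluates to `["A*01:01", "C*14:03", "B*40:06"]`, so an 8-digit call and a G group call can be compared on the same footing.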

The results obtained using HLA-VBSeq v2 show significant improvements in prediction accuracy in all HLA regions after including the ToMMo HLA panel in both the SJS and THC datasets (Table 1). Regarding the performance of HLA-VBSeq v2 compared to HLA*PRG:LA, though both programs achieved prediction accuracies of over 97.8%, HLA-VBSeq v2 displayed slightly better prediction accuracies for the HLA-A and HLA-C regions (99.0 and 99.8% vs. 98.1 and 98.4%, respectively) but lower performance for HLA-B genes (97.8 vs. 99.6%). This pattern held in both the THC and SJS datasets. Moreover, in the SJS dataset, HLA-VBSeq v2 exhibited 100% prediction accuracy in the HLA-C region (Table 1).

Table 1 Comparison of the prediction accuracy between the software programs studied for each dataset and HLA gene region

This study shows that including the ToMMo HLA panel substantially improves the calling accuracy of HLA class-I genes from WGS data in the Japanese population. Table 2 summarizes the inference results obtained with HLA-VBSeq and HLA-VBSeq v2 that were inconsistent with the ‘true’ genotype. Performance improved significantly with HLA-VBSeq v2 for some alleles, such as B*40:06, B*54:01, B*55:02 and C*14:03. For instance, HLA-VBSeq failed to correctly call C*14:03 (it was identified as C*14:02 in a high number of cases) in both datasets. The reference panel of HLA-VBSeq contains reference sequences for C*14:02 and C*14:03, but not their full genomic sequences (including exons, full introns and regulatory regions). By contrast, HLA-VBSeq v2 uses the ToMMo HLA panel, which contains the full-length reference sequences of C*14:02 and C*14:03 for the Japanese population, and therefore identified C*14:03 correctly. This indicates that full genomic sequences are highly informative in HLA calling and that using HLA-VBSeq v2 leads to more accurate calling results in the Japanese population.

Table 2 Details of the calling results inconsistent with the “true” HLA types. “fail” indicates that the program failed to reach an appropriate calling result

Ambiguous allele combinations occurred in all HLA loci; a possible reason for this may lie in the presence of heterozygous sequences, i.e., the presence of more than one possible pair of alleles within the region analyzed or the existence of a polymorphism outside of the region analyzed¹². Because the PCR-SSOP-Luminex method only covers exons 2 and 3 for HLA class-I analysis (there are 8 exons in HLA-A, B and C regions)¹¹, we also observed some ambiguous samples in our datasets (Table 3).

Table 3 Typing results between the Luminex method and NGS-based HLA typing for all inconsistent samples. Bold font corresponds to the typing results from NGS-based HLA typing

Table 3 shows that HLA-VBSeq v2 performed very well on ambiguous samples in the Japanese population. For instance, since C*04:01 and C*04:82 have identical nucleotide sequences in exons 2 and 3, it is impossible to distinguish them by analyzing exon-2 and -3 sequences alone, leading to an ambiguous result. Because the ToMMo HLA panel includes the full-length reference sequence of C*04:82, HLA-VBSeq v2 distinguished C*04:01 from C*04:82 and made correct calls.
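The source of this kind of ambiguity can be illustrated with a small Python sketch that groups alleles sharing the same sequence over a chosen set of exons. The sequences below are invented placeholders, not real HLA sequences, and the exon in which the two C*04 alleles differ is chosen arbitrarily for the example; the function name is likewise hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def ambiguity_groups(
    alleles: Dict[str, Dict[int, str]], exons: Sequence[int] = (2, 3)
) -> List[List[str]]:
    """Return groups of alleles that are identical over the given exons.

    `alleles` maps an allele name to {exon_number: sequence}. Alleles that
    land in the same group cannot be told apart by a method that only
    inspects those exons.
    """
    groups = defaultdict(list)
    for name, exon_seqs in alleles.items():
        key = tuple(exon_seqs[e] for e in exons)
        groups[key].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Placeholder sequences: the two C*04 alleles differ only outside exons 2-3.
toy_alleles = {
    "C*04:01": {2: "ACGT", 3: "GGCA", 5: "TTTT"},
    "C*04:82": {2: "ACGT", 3: "GGCA", 5: "TTTA"},
    "C*14:03": {2: "ACGA", 3: "GGCA", 5: "TTTT"},
}
exon23_only = ambiguity_groups(toy_alleles)                  # one ambiguous pair
with_more_exons = ambiguity_groups(toy_alleles, (2, 3, 5))   # no ambiguity left
```

Restricting the comparison to exons 2 and 3 leaves the two C*04 alleles in one group, whereas including the additional exon separates them, which mirrors why full-length references help resolve such cases.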

Table 3 also shows that in most instances, HLA*PRG:LA and Luminex inferred the same allele type when dealing with ambiguous cases. Similarly, NGS-based HLA typing and HLA-VBSeq v2 reached the same result in a high number of cases. The reason why HLA*PRG:LA and Luminex tend to provide the same calling results likely lies in the design of HLA*PRG:LA, which calls alleles at G group resolution¹⁰, meaning that both methods focus on exons 2 and 3. By contrast, HLA-VBSeq v2 is designed for 8-digit resolution and includes the full-length HLA sequences in the analysis, making it more likely that the results will coincide with those from NGS-based HLA typing (‘true’ HLA types). Because of these differences, the performance of HLA calling in ambiguous cases varies greatly depending on the software application used. Given the high performance of HLA-VBSeq v2 on ambiguous samples, we suggest reexamining PCR-SSOP-Luminex typing results with HLA-VBSeq v2 to obtain more reliable results regarding the correct HLA type in the Japanese population.

In conclusion, the addition of a reference panel for the Japanese population was highly effective for improving the calling accuracy of HLA class-I genes from WGS data in the Japanese population. This solution is promising for the inference of HLA class-II genes and other highly diverse regions, e.g., killer-cell immunoglobulin-like receptors (KIRs). Additionally, this study suggests that the ToMMo HLA panel is a valuable resource whose use is not limited to HLA-VBSeq v2 but could be expanded to other HLA calling tools.

Software availability

HLA-VBSeq v2 is available from the following URL: http://nagasakilab.csml.org/hla/.

References

1. Shukla, S. A. et al. Comprehensive analysis of cancer-associated somatic mutations in class I HLA genes. Nat Biotechnol 33, 1152–1158 (2015).
2. Flomenberg, N. et al. Impact of HLA class I and class II high-resolution matching on outcomes of unrelated donor bone marrow transplantation: HLA-C mismatching is associated with a strong adverse effect on transplantation outcome. Blood 104, 1923–1930 (2004).
3. Shiina, T., Hosomichi, K., Inoko, H. & Kulski, J. K. The HLA genomic loci map: expression, interaction, diversity and disease. J Hum Genet 54, 15–39 (2009).
4. Robinson, J. et al. The IPD and IPD-IMGT/HLA Database: allele variant databases. Nucleic Acids Res 43, D423–D431 (2015).
5. Nariai, N., Hirose, O., Kojima, K. & Nagasaki, M. TIGAR: transcript isoform abundance estimation method with gapped alignment of RNA-Seq data by variational Bayesian inference. Bioinformatics 29, 2292–2299 (2013).
6. Nariai, N. et al. HLA-VBSeq: accurate HLA typing at full resolution from whole-genome sequencing data. BMC Genomics 16(Suppl 2), S7 (2015).
7. Gourraud, P. A. et al. HLA diversity in the 1000 genomes dataset. PLoS One 9, e97282 (2014).
8. Pappas, D. J., Tomich, A., Garnier, F., Marry, E. & Gourraud, P. A. Comparison of high-resolution human leukocyte antigen haplotype frequencies in different ethnic groups: Consequences of sampling fluctuation and haplotype frequency distribution tail truncation. Hum Immunol 76, 374–380 (2015).
9. Mimori, T. et al. Construction of full-length Japanese reference panel of class I HLA genes with single-molecule, real-time sequencing. Pharmacogenomics J 19, 136–146 (2019).
10. Dilthey, A. T. et al. High-accuracy HLA type inference from whole-genome sequencing data using population reference graphs. PLoS Comput Biol 12, e1005151 (2016).
11. Itoh, Y. et al. High-throughput DNA typing of HLA-A, -B, -C, and -DRB1 loci by a PCR-SSOP-Luminex method in the Japanese population. Immunogenetics 57, 717–729 (2005).
12. Adams, S. D. et al. Ambiguous allele combinations in HLA Class I and Class II sequence-based typing: when precise nucleotide sequencing leads to imprecise allele identification. J Transl Med 2, 30 (2004).

Acknowledgements

This research was supported by the Japan Agency for Medical Research and Development (AMED) under grant number JP19km0405205. We would also like to thank our technical staff, Sachiyo Sugimoto and Yayoi Sekiya, for the performance evaluation of HLA-VBSeq v2.

Author information

Authors and Affiliations

1. Graduate School of Information Sciences, Tohoku University, Sendai, Miyagi, Japan
Yen-Yen Wang & Masao Nagasaki
2. Tohoku Medical Megabank Organization, Tohoku University, Sendai, Miyagi, Japan
Yen-Yen Wang, Takahiro Mimori, Olivier Gervais & Masao Nagasaki
3. The University of Tokyo, Bunkyo-ku, Tokyo, Japan
Seik-Soon Khor, Yosuke Kawai, Yuki Hitomi & Katsushi Tokunaga
4. Toyama Genome Medical Science Project, National Center for Global Health and Medicine, Shinjuku-ku, Tokyo, Japan
Seik-Soon Khor, Yosuke Kawai & Katsushi Tokunaga
5. Graduate School of Medicine, Tohoku University, Sendai, Miyagi, Japan
Olivier Gervais & Masao Nagasaki
6. Center for the Promotion of Interdisciplinary Education and Research, Kyoto University, Sakyo-ku, Kyoto, Japan
Olivier Gervais & Masao Nagasaki
7. Department of Microbiology, Hoshi University School of Pharmacy and Pharmaceutical Sciences, Shinagawa-ku, Tokyo, Japan
Yuki Hitomi

Corresponding author

Correspondence to Masao Nagasaki.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Wang, YY., Mimori, T., Khor, SS. et al. HLA-VBSeq v2: improved HLA calling accuracy with full-length Japanese class-I panel. Hum Genome Var 6, 29 (2019). https://doi.org/10.1038/s41439-019-0061-y

Received: 25 February 2019
Revised: 26 May 2019
Accepted: 27 May 2019
Published: 19 June 2019
DOI: https://doi.org/10.1038/s41439-019-0061-y
