
Cross-ethnicity tagging SNPs for HLA alleles associated with adverse drug reaction


Reduction of adverse drug reaction (ADR) incidence through screening of predisposing human leucocyte antigen (HLA) alleles is a promising approach for many widely used drugs. However, application of these associations has been limited by the cost burden of HLA genotyping. Use of single nucleotide polymorphisms (SNPs) that can approximate (‘tag’) HLA alleles of interest has been proposed as a cost-effective and simple alternative to conventional genotyping. However, most reported SNP tags have not been validated and there is concern regarding clinical utility of this approach due to tagging inconsistency across different populations. We assess the ability of 67 previously reported and 378 novel tagging SNPs, identified here in 5 HLA reference panels, to tag 15 ADR-associated HLA alleles in a panel of 955 ethnically diverse samples. Tags for 8 HLA alleles of interest were identified with 100% sensitivity and >95% specificity. These SNPs may act as a reliable genotyping approach for the routine screening of patients, without the need to account for patient ethnicity.


Adverse drug reactions (ADRs) are a common and sometimes unpredictable effect of treatment with a wide range of medications [1]. ADRs are a leading cause of hospital admissions and deaths in the US, with an estimated 2,000,000 hospitalisations and 100,000 deaths per year [2]. Severe ADRs may result in withdrawal of a drug from the market, despite clinical utility for the majority of patients [1, 3]. Precision medicine, whereby drug exposure is guided by genetic make-up, is a promising approach for addressing ADRs [1]. Most prominently, class I and class II human leucocyte antigen (HLA) alleles have been linked with idiosyncratic ADRs [1]. Table 1 details 15 HLA alleles annotated in the PharmGKB database with replicated associations with ADRs [4]. By stratifying patients using the presence of HLA biomarkers, it is possible to reduce the occurrence of ADRs in clinical practice [1]. Among the reported HLA–ADR associations, 4 alleles are currently annotated by the US Food and Drug Administration (FDA) as pharmacogenomic biomarkers: HLA-A*31:01, HLA-B*15:02, HLA-B*57:01 and HLA-DRB1*07:01 (as part of the DRB1*07:01-DQA1*02:01 haplotype) [5].

Table 1 Replicated HLA–ADR associations in the PharmGKB database

Despite the high negative predictive value of reported HLA–ADR associations (accuracy of prediction that individuals negative for an allele will not experience an ADR), most have low positive predictive value (most individuals positive for the predisposing allele will not experience an ADR). Only HLA-B*57:01 screening before abacavir therapy is commonplace, with a negative predictive value of 97%, and relatively high positive predictive value (~50%) [6] and risk allele frequency (~5%) [7]. For other ADRs, the uptake of screening has been limited due to the technical and economic barriers associated with conventional HLA genotyping methods [1, 8, 9]. Routine HLA typing is typically performed in specialised laboratories using high-resolution polymerase chain reaction (PCR)-based sequence-specific priming (SSP), sequence-based typing (SBT), microbead hybridisation approaches [10] or, more recently, next-generation sequencing approaches [11]. These technologies are time-consuming and expensive, and as hundreds [9] or thousands [8] of individuals may need to be screened to prevent a single ADR, the cost burden of HLA genotyping means that screening is commonly not cost-effective [12].

One strategy that has been proposed to reduce the cost of HLA screening is single nucleotide polymorphism (SNP) tagging of HLA variants [13]. Due to strong linkage disequilibrium (LD) between HLA variants and SNPs in the major histocompatibility complex (MHC) region, SNPs may act as a highly accurate marker, a ‘tag’, for the inference of HLA genotype [14]. SNP genotyping allows high-throughput screening of samples and is a cost-effective alternative to traditional typing for large-scale research and screening programs of auto-immune disease HLA risk alleles [15, 16]. Combined with rapid point-of-care SNP genotyping devices [17], tag SNPs may act as a rapid and cost-effective strategy for HLA typing.

A comprehensive set of SNP tags for HLA alleles was first reported by de Bakker et al. in 2006 [14]. These tags have not been consistently validated in subsequent verification studies [18]. The most prominently tested association has been the rs2395029 SNP in the HCP5 gene as a proxy for HLA-B*57:01 genotyping. While this SNP was found to be in complete LD with HLA-B*57:01 in Mexican [19] and Argentinian [20] populations, as well as in a small set of ethnically diverse samples from the US National Bone Marrow Program [18], reduced tagging sensitivity was observed in both Italians [21] and a large panel from the US [22], raising concerns about the utility of this tag in certain populations. More recently, SNP tags have been identified as part of the construction of large reference panels for HLA and SNP imputation [23,24,25,26]. These tags have not yet been externally validated, nor tested for performance outside of the ethnicity in which they were discovered.

The inability of proposed SNP tags to function across different populations is a key barrier to the clinical relevance of this genotyping approach, mandating multiple tags for each allele to account for patient background [27]. Additionally, misclassification of patient background in a clinical context is not uncommon (~5%) [28] and the use of genetic ancestry to guide the use of HLA biomarkers is a topic of debate [29], further complicating validation and application of proposed tags.

To identify tag SNPs that are appropriate for HLA–ADR stratification across all genetic backgrounds, we evaluate both known and novel tag SNPs to identify tags with 100% sensitivity and >95% specificity for 15 ADR-associated HLA alleles across a multi-ethnic dataset. A total of 378 candidate SNPs are first identified using 5 large European and Asian HLA reference panels. These novel tags, along with 67 tag SNPs identified from previous literature, are validated in a panel of 955 ethnically diverse samples from the 1000 Genomes dataset. The resulting 45 validated tags for 8 ADR-associated HLA alleles may be a reliable alternative to high-resolution HLA genotyping for the routine ADR screening of patients at low cost, without the need to discern genetic background.


Figure 1 provides an overview of the tag discovery and validation performed in the study. We analysed 5 datasets of SNPs and gold-standard HLA typing from different ethnicities to discover novel tag SNPs with perfect sensitivity and high specificity. These novel SNPs, as well as tag SNPs reported in 5 previous studies, were evaluated in the 1000 Genomes dataset. A tag SNP was deemed to be ‘validated’ if it retained perfect sensitivity and greater than 95% specificity on this 1000 Genomes dataset. Tags were also evaluated based on their diagnostic predictive value and Cohen’s kappa.

Fig. 1 Overview of the study design

Data processing—5 discovery datasets

Data for the T1DGC (5193 samples), KOR-REF (413 samples), HAN-MHC (10,689 samples) and HAN-HK (184 samples) panels were provided with HLA variant annotation and could be directly analysed using the discovery method outlined below (Data Access information is detailed in the Supplementary Methods). The PAN-Asian dataset (270 samples) was constructed by combining the Chinese, Indian and Malaysian datasets from the Singapore Genome Variation Project and annotating with the 15 HLA variants of interest from the provided HLA genotyping calls using PLINK [30]. SNP rsIDs were annotated in the HAN-MHC dataset using IDs and positions extracted from the GRCh36 assembly with the BioMart tool. Imputed SNPs, not available on the Illumina Human610-Quad genotyping array, were removed from the HAN-HK dataset to reduce tagging bias from 1000 Genomes-based imputation. A summary of the final datasets is shown in Table 2. Standard quality control (QC) procedures had previously been performed on each dataset as part of its construction prior to public release, and further QC was not considered necessary for this application.

Table 2 Datasets used in the study for discovery and validation of SNP tags

Data processing—validation dataset

Samples from the 1000 Genomes project were used to validate discovered tags due to the ethnic diversity of these populations and the high resolution of SNP annotation in the extended MHC region, encompassing most variants annotated in the discovery cohorts. HLA genotyping information was available for samples from 4 of the 1000 Genomes combined ‘super-populations’: 171 African (AFR), 193 Ad-Mixed American (AMR), 267 East Asian (EAS) and 324 European (EUR) samples, representing a total of 955 samples. Bi-allelic variant calls for these samples were extracted from the 1000 Genomes dataset and annotated with HLA genotyping calls for the 15 HLA alleles of interest. Resolution of HLA genotyping differed by both population subgroup and allele. For some samples, resolution of genotyping was insufficient to exclude all other alleles (Supplementary Table 1). Of the potentially cross-reacting HLA alleles, none were annotated in the Common and Well-documented Alleles catalogue or observed in more than a single population in the Allele Frequency Database [31, 32]. Given this low incidence, resolution of genotyping was considered sufficient for assessment of tagging performance. A summary of the final validation dataset is shown in Table 2.

Tag SNP discovery

In each dataset, PLINK was used to estimate r² and D′ for each pair of SNP and HLA genotype call. Pairs with D′ < 1 were discarded, as these could not represent tags with 100% sensitivity. For the HAN-MHC dataset, an average HLA typing accuracy of 98% was observed with the variant calling pipeline [26]. To accommodate this known inaccuracy, a threshold of D′ < 0.98 was instead used in this dataset (though 100% sensitivity and >95% specificity were still required in validation). Pairs in which the minor allele frequency (MAF) of the tagging variant was less than the frequency of the target HLA allele were discarded, as these represented tags with 100% specificity but imperfect sensitivity. Tagging specificity for the remaining pairs was then estimated as:

$${\rm Specificity} \approx 1 - \frac{{{\rm AF}_{{\rm SNP}} - {\rm AF}_{{\rm HLA}}}}{{1 - {\rm AF}_{{\rm HLA}}}},$$

where $\mathrm{AF}_{\mathrm{SNP}}$ is the MAF of the SNP and $\mathrm{AF}_{\mathrm{HLA}}$ is the frequency of the target HLA allele.
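The approximation can be read off the haplotype-level confusion table. The short derivation below is a sketch rather than the authors' stated reasoning; it assumes perfect sensitivity (every haplotype carrying the HLA allele also carries the tagging SNP allele) and uses FP and TN, notation introduced here, for the false-positive and true-negative haplotype frequencies:

$$\mathrm{FP} = \mathrm{AF}_{\mathrm{SNP}} - \mathrm{AF}_{\mathrm{HLA}}, \qquad \mathrm{TN} + \mathrm{FP} = 1 - \mathrm{AF}_{\mathrm{HLA}},$$

$$\mathrm{Specificity} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}} = 1 - \frac{\mathrm{FP}}{\mathrm{TN} + \mathrm{FP}} = 1 - \frac{\mathrm{AF}_{\mathrm{SNP}} - \mathrm{AF}_{\mathrm{HLA}}}{1 - \mathrm{AF}_{\mathrm{HLA}}}.$$

Under that assumption, all SNP-positive haplotypes in excess of the HLA allele frequency are false positives, and the denominator is simply the frequency of haplotypes that do not carry the HLA allele.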

SNPs with an observed specificity of >95% for a given HLA allele were considered effective tagging variants, in line with previous assessment of tag clinical utility [23].
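The discovery filters described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in the study: the `SnpHlaPair` record, the function names and the example values are hypothetical, and the D′ and frequency estimates are assumed to have been exported from PLINK beforehand.

```python
from dataclasses import dataclass


@dataclass
class SnpHlaPair:
    """One SNP/HLA pair with LD and frequency estimates (hypothetical layout)."""
    snp: str
    hla: str
    d_prime: float
    af_snp: float  # frequency of the tagging SNP allele
    af_hla: float  # frequency of the target HLA allele


def estimated_specificity(af_snp: float, af_hla: float) -> float:
    """Specificity approximation for a tag assumed to have perfect sensitivity."""
    return 1.0 - (af_snp - af_hla) / (1.0 - af_hla)


def candidate_tags(pairs, min_d_prime=1.0, min_specificity=0.95):
    """Apply the D', frequency and specificity filters in the order given in the text."""
    kept = []
    for p in pairs:
        if p.d_prime < min_d_prime:
            continue  # incomplete LD: cannot be a 100% sensitivity tag
        if p.af_snp < p.af_hla:
            continue  # SNP allele rarer than the HLA allele: sensitivity is imperfect
        spec = estimated_specificity(p.af_snp, p.af_hla)
        if spec > min_specificity:
            kept.append((p.snp, p.hla, spec))
    return kept


# Illustrative values only; these are not results from the study.
example_pairs = [
    SnpHlaPair("rs_example_1", "HLA-EXAMPLE", 1.00, 0.06, 0.05),  # passes all filters
    SnpHlaPair("rs_example_2", "HLA-EXAMPLE", 1.00, 0.15, 0.05),  # specificity too low
    SnpHlaPair("rs_example_3", "HLA-EXAMPLE", 0.97, 0.06, 0.05),  # D' below threshold
    SnpHlaPair("rs_example_4", "HLA-EXAMPLE", 1.00, 0.03, 0.05),  # SNP rarer than HLA allele
]

if __name__ == "__main__":
    for snp, hla, spec in candidate_tags(example_pairs):
        print(f"{snp} tags {hla} with estimated specificity {spec:.3f}")
```

For a dataset with known typing inaccuracy, such as HAN-MHC, the relaxed threshold would correspond to calling `candidate_tags(pairs, min_d_prime=0.98)`.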

Reported Tag SNPs

67 unique tagging SNPs for the 15 HLA alleles highlighted in this study were identified in 5 studies (Table 3, Supplementary Table 2). Reported tags from the HAN-HK dataset were not included, as these were derived using imputation from the 1000 Genomes dataset [33]. Reported tags from the Pan-Asian dataset were not included, as this dataset was analysed for tag SNP discovery in this study on the combined Pan-Asian dataset. Reported tagging SNPs from the HAN-MHC dataset were included, as the reported tagging variants were distinct from those identified in this dataset during tag discovery [26].

Table 3 Reported tagging SNPs for ADR-associated alleles of interest

Tag validation method

Performance of reported and discovered tag SNPs was assessed in the 1000 Genomes dataset, with validation requiring 100% sensitivity and >95% specificity. SNP tags which passed validation in this dataset were then reassessed separately for each of the 1000 Genomes ethnic super-populations (AFR, AMR, EAS and EUR). True positive (TP), true negative (TN), false positive (FP) and false negative (FN) haplotype counts for each analysis were estimated using population sample size and PLINK estimations of haplotype frequency. Sensitivity of tagging was calculated as TP/(TP + FN) and specificity of tagging calculated as TN/(FP + TN).
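As a rough illustration of this step, the sketch below converts four estimated haplotype frequencies into haplotype counts and then into sensitivity and specificity. The function name, the argument layout and the use of rounding to whole haplotypes are assumptions made for the example; the source does not specify how fractional counts were handled.

```python
def tagging_performance(n_samples, f_tag_hla, f_tag_other, f_notag_hla, f_notag_other):
    """Estimate TP/FP/FN/TN haplotype counts and derive sensitivity and specificity.

    Each f_* argument is an estimated haplotype frequency:
      f_tag_hla     tag allele present, HLA allele present  (true positive)
      f_tag_other   tag allele present, HLA allele absent   (false positive)
      f_notag_hla   tag allele absent,  HLA allele present  (false negative)
      f_notag_other tag allele absent,  HLA allele absent   (true negative)
    """
    n_haplotypes = 2 * n_samples  # diploid samples carry two haplotypes each
    tp = round(f_tag_hla * n_haplotypes)
    fp = round(f_tag_other * n_haplotypes)
    fn = round(f_notag_hla * n_haplotypes)
    tn = round(f_notag_other * n_haplotypes)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (fp + tn) if (fp + tn) else float("nan")
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn,
            "sensitivity": sensitivity, "specificity": specificity}


def is_validated(result, min_specificity=0.95):
    """Validation rule from the text: 100% sensitivity and >95% specificity."""
    return result["sensitivity"] == 1.0 and result["specificity"] > min_specificity


if __name__ == "__main__":
    # Illustrative frequencies only; not taken from the study.
    result = tagging_performance(955, 0.02, 0.01, 0.0, 0.97)
    print(result, is_validated(result))
```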

Estimation of clinical performance of tag SNPs

Clinical characteristics of best performing tagging SNPs were assessed using the 1000 Genomes dataset. In contrast to tag validation, where we assess the performance of tagging SNPs on estimated haplotype counts, statistics here were assessed using observed SNP and HLA genotypes, in line with the predicted clinical application of this approach [23]. Sensitivity and specificity of tagging were calculated as described above. Positive predictive value was calculated as TP/(TP + FP). Negative predictive value was calculated as TN/(TN + FN). Cohen’s kappa coefficient [34], a measure of agreement between tagging and gold-standard typing, was also calculated for each tagging variant.
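The clinical statistics named above all derive from one 2×2 table of tag call against gold-standard HLA call. The sketch below shows the standard formulas, including the two-rater form of Cohen's kappa; the function name and the example counts are hypothetical, and the study itself used MedCalc and GraphPad QuickCalcs rather than code of this kind.

```python
def clinical_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, NPV and Cohen's kappa from a 2x2 table."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)

    # Observed agreement between tag call and gold-standard typing.
    p_observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (p_observed - p_expected) / (1.0 - p_expected)

    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "kappa": kappa}


if __name__ == "__main__":
    # Illustrative counts only; not taken from the study.
    for name, value in clinical_metrics(tp=20, fp=10, fn=0, tn=925).items():
        print(f"{name}: {value:.3f}")
```

A tag with perfect sensitivity always has a negative predictive value of 1, so for such tags the informative quantities are the positive predictive value and kappa.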

Tag SNP availability on genotyping arrays

Availability of SNPs on commercially available genotyping arrays was determined using the NIH LDlink tool [35].

Data visualisation

HLA allele frequency and sensitivity and specificity of highest performing tags were visualised using GraphPad Prism for Windows (Version 7.02) (GraphPad Software Inc., La Jolla, CA, USA).

Statistical analysis

Statistical analyses of sensitivity, specificity, positive predictive and negative predictive values were performed using MedCalc for Windows, version 15.0 (MedCalc Software, Ostend, Belgium). Kappa values were calculated using GraphPad QuickCalcs (GraphPad Software Inc., La Jolla, CA, USA).


Frequency of ADR-associated alleles in the discovery and validation datasets

Given that the frequency of HLA alleles is known to vary across different ethnic populations, we initially examined the frequency of relevant alleles to determine whether tag SNPs were likely to be found in each dataset. HLA-A, HLA-C and HLA-DRB1 alleles of interest were observed in all datasets, with the exception of the HAN-HK dataset where only HLA-A and HLA-B were genotyped. HLA-B*15:11 was not observed in the T1DGC or Pan-Asian datasets, and HLA-B*59:01 was not observed in the T1DGC, Pan-Asian or HAN-HK datasets. The dataset frequency for the 15 alleles of interest is shown in Fig. 2, with raw counts detailed in Supplementary Table 3. As some of these datasets were constructed as part of large-scale studies of disease, these frequencies may not be representative of true population frequency. The frequency of multiple HLA-B variants of interest was below 1% in multiple populations, suggesting that these variants may be difficult to tag using polymorphisms selected for inclusion on commercial genotyping platforms, which are designed to examine variants with an MAF >1% [36].

Fig. 2 Frequency of HLA target alleles in the 5 discovery populations and the 1000 Genomes validation population. Frequencies are only shown for alleles of interest, and as such do not add to 100%

Discovery of novel tagging SNPs

Given the large collection of HLA reference panels available in this study, we searched for novel tag SNPs within each of the 5 HLA reference panels by evaluating each SNP and haplotype pair. A total of 378 unique tagging SNPs with 100% sensitivity (98% in the HAN-MHC dataset) and >95% specificity were identified across the 5 discovery panels (Table 4). Tags were identified for 13/15 alleles of interest, with no tags identified for HLA-B*35:01 or HLA-B*59:01. These tags were pooled with 67 additional tags from previous reports (Supplementary Table 2), representing a total of 445 prospective tagging SNPs.

Table 4 Tags identified in this study, previously reported and validated in this study for 15 ADR-associated HLA alleles

Validation of tagging SNPs

To determine which of the 445 tag SNPs were suitable across ethnic background, we evaluated the performance of each in the ethnically diverse 1000 Genomes datasets, used here as a validation cohort. Of the 445 tagging SNPs, 45 had 100% sensitivity and >95% specificity when considering all individuals in the 1000 Genomes data, including 36 novel tagging variants (Table 4, Supplementary Table 4). These validated tag SNPs accounted for 8 of the 15 alleles of interest. Of the 67 previously reported tags, only 8 of these variants validated in the combined ethnicity cohort (Supplementary Table 4). A further 19 were effective tags in the super-populations most closely aligned with the population in which this tag was reported, suggesting these may be effective tags for restricted use within these specific populations (Supplementary Table 2).

Sensitivity and specificity of tagging from haplotype estimates for the best performing tags for each allele in the validation cohort are shown in Table 5. To clarify clinical utility, these tags were reassessed on the raw genotype calls to determine the predictive value of testing, and concordance of tagging with PCR genotyping (Table 6).

Table 5 Highest performing SNP tags in the validation cohort with 100% sensitivity and >95% specificity
Table 6 Clinical performance of the highest performing SNP tags

While perfect sensitivity across all ethnic super-populations is guaranteed by 100% sensitivity in the combined populations, a given tag may have reduced specificity in a particular ethnic super-population. To assess this, the performance of each of the top 8 tagging variants was reassessed separately in each super-population to remove any possible bias due to the weighting of ethnicity in the validation cohort (Supplementary Table 5). Except for tagging of HLA-B*15:02 with rs10484555 in the African super-population (specificity 94.7%), observed specificity across all populations was above 95%. Of the 8 SNPs, 7 had been previously assayed on high-throughput SNP arrays, suggesting that these markers are readily testable (Table 5).
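A pooled estimate can hide a subgroup that falls below the threshold, which is why the per-super-population check matters. The sketch below groups haplotype-level records by population label and computes specificity within each group; the record layout, the function name and the toy data are assumptions made for illustration only.

```python
from collections import defaultdict


def specificity_by_group(records):
    """Compute tagging specificity separately for each group.

    records: iterable of (group, tag_positive, hla_positive) tuples,
    one per haplotype. Only HLA-negative haplotypes enter specificity.
    """
    false_pos = defaultdict(int)
    true_neg = defaultdict(int)
    for group, tag_positive, hla_positive in records:
        if hla_positive:
            continue  # HLA-positive haplotypes contribute to sensitivity only
        if tag_positive:
            false_pos[group] += 1
        else:
            true_neg[group] += 1
    groups = set(false_pos) | set(true_neg)
    return {g: true_neg[g] / (true_neg[g] + false_pos[g]) for g in groups}


if __name__ == "__main__":
    # Toy data only: one false positive in AFR, none in EUR.
    toy = (
        [("AFR", True, True)] + [("AFR", True, False)] + [("AFR", False, False)] * 8
        + [("EUR", True, True)] + [("EUR", False, False)] * 9
    )
    for group, spec in sorted(specificity_by_group(toy).items()):
        print(group, round(spec, 3))
```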

Discovery of tagging SNPs in the 1000 Genomes dataset

For some alleles, tags with a higher specificity were observed in the 1000 Genomes dataset among variants not annotated in the discovery datasets. As these tags were found in the validation cohort, they cannot be validated in this work. However, they may be viable tags for assessment in future validation studies. The highest specificity tag with perfect sensitivity for each allele in the 1000 Genomes dataset is shown in Supplementary Table 6 and predicted clinical performance in Supplementary Table 7. While some tags fall below the 95% specificity threshold, these may still be useful for pre-screening samples prior to conventional genotyping.


Stratification of treatment using HLA biomarkers can reduce the occurrence of ADR in clinical practice. Application of this approach has been limited due to the cost burden of conventional HLA genotyping technologies. SNP genotyping, based on strong LD with HLA genotype, can improve the cost-effectiveness of patient screening. The inconsistency of previously reported SNP tags across multiple ethnic populations is a key barrier to the clinical relevance of this genotyping approach [18, 21].

Here we identify tagging SNPs that function across multiple ethnicities. Using 5 large HLA reference panels to discover novel tag SNPs, together with 67 previously reported tag SNPs, we identify 45 tag SNPs that can act as perfect sensitivity, high specificity (>95%) proxies for HLA genotyping across African, Ad-Mixed American, East Asian and European populations from the 1000 Genomes dataset, including tags for 3 of the 4 HLA alleles annotated as pharmacogenomic biomarkers by the FDA. To the best of our knowledge, this is the most extensive cross-ethnicity validation of HLA tagging SNP function to date, and sets the standard for how prospective tagging SNPs should be validated in future studies.

Availability of the HLA tag SNPs reported here will allow for confident screening of HLA risk alleles in situations where existing technologies are not cost-effective, and, unlike existing tags, in populations where tagging has not previously been assessed. The 95% specificity threshold applied here has been previously considered as sufficient for clinical utility of tagging SNPs [23]; however, in situations where this specificity is insufficient, the 100% sensitivity of the reported variants may instead be used to confidently predict a reduced ADR risk for most negative individuals (>90%) prior to gold-standard typing, reducing the overall cost of population screening. Due to the high sensitivity of tagging, the SNPs reported here may be integrated, or, if already present, better interpreted, as part of SNP genotyping and sequencing panels for additional prognostic value within the established testing infrastructure at no additional expense.
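The pre-screening argument can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions, not a result from the study: it assumes perfect tag sensitivity, treats specificity and carrier frequency at the level of individuals, and uses a hypothetical function name and example inputs.

```python
def fraction_spared_typing(specificity, carrier_frequency):
    """Fraction of screened individuals who test tag-negative.

    With perfect sensitivity, every carrier is tag-positive, so only
    non-carriers can be tag-negative, and they are so with probability
    equal to the specificity. Tag-negative individuals need no
    gold-standard typing.
    """
    non_carriers = 1.0 - carrier_frequency
    return specificity * non_carriers


if __name__ == "__main__":
    # Example inputs only: 95% specificity, 5% of individuals carrying the allele.
    spared = fraction_spared_typing(0.95, 0.05)
    print(f"{spared:.1%} of individuals would not need follow-up typing")
```

The remaining individuals, comprising all carriers plus the false positives, would be referred for conventional HLA typing.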

Identification of tag SNPs using the 1000 Genomes datasets demonstrates the potential of reference panels genotyped from whole-genome sequencing for the identification of tagging SNPs for rare HLA variants. The high resolution of annotation (~169,000 variants across the extended MHC region) enabled the identification of a set of 100% sensitivity SNP tags for each ADR-associated allele, despite the small sample size (955 samples). These tags cannot be validated here. However, as high-throughput sequencing datasets with matched HLA haplotype data become more widely available, the quality of these tags, and others generated from sequencing data, may be better evaluated. A limitation of this approach is the possibility that conventional SNP genotyping hybridisation approaches may not be able to recapture variant calls extracted from whole-genome sequencing due to genomic context restrictions, impacting clinical implementation.

Given the cost burden of additional clinical consultation required to implement HLA genotyping results, rapid HLA testing can improve the cost-effectiveness of HLA-based patient stratification [9]. Recently reported point-of-care genetic testing devices can perform SNP genotyping in ~1 hour [17]. In combination with the cross-ethnicity HLA tagging SNPs reported in this work, these tools provide a pathway towards immediate and cost-effective HLA genotyping to aid clinical decision-making.

The success of HLA imputation systems, predicting HLA genotype from a panel of MHC SNPs, suggests that a combination of multiple tagging SNPs may further improve the specificity of the tag SNP approach [37]. As the use of multiple tagging SNPs, mandating multiple testing reactions, reduces the advantage of SNP tagging over established PCR-SSP approaches, we have chosen to focus here on single SNP tags, which allow for higher throughput testing and more rapid translation to the point-of-care. As point-of-care tools grow more robust, the use of multiple SNP tags may be reassessed using the framework described in this work.
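To illustrate why combining tags could raise specificity, the sketch below calls a haplotype positive only when two tagging SNPs are both present. This simple AND rule is an assumption chosen for the example and is far simpler than the imputation methods cited; the helper names and toy haplotypes are hypothetical.

```python
def sensitivity(calls):
    """calls: list of (tag_positive, hla_positive) haplotype records."""
    tp = sum(1 for tag, hla in calls if hla and tag)
    fn = sum(1 for tag, hla in calls if hla and not tag)
    return tp / (tp + fn)


def specificity(calls):
    """calls: list of (tag_positive, hla_positive) haplotype records."""
    tn = sum(1 for tag, hla in calls if not hla and not tag)
    fp = sum(1 for tag, hla in calls if not hla and tag)
    return tn / (tn + fp)


# Toy haplotypes as (snp1_present, snp2_present, hla_present); not study data.
haplotypes = (
    [(True, True, True)] * 2        # HLA haplotypes carry both tag alleles
    + [(True, False, False)]        # false positive for SNP 1 alone
    + [(False, True, False)]        # false positive for SNP 2 alone
    + [(False, False, False)] * 16  # neither tag allele, no HLA allele
)

single_tag = [(s1, hla) for s1, _s2, hla in haplotypes]
combined_tag = [(s1 and s2, hla) for s1, s2, hla in haplotypes]

if __name__ == "__main__":
    print("SNP 1 alone:", sensitivity(single_tag), round(specificity(single_tag), 3))
    print("SNP 1 AND 2:", sensitivity(combined_tag), round(specificity(combined_tag), 3))
```

If each SNP individually has perfect sensitivity, the AND rule keeps it, because every HLA-carrying haplotype carries both tag alleles, while false positives are limited to haplotypes carrying both.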


This study reports tag SNPs for 8 ADR-associated HLA alleles which have perfect sensitivity, high specificity and validate across a diverse set of ethnic cohorts. Use of these tag SNPs may represent a cost-effective alternative to existing HLA genotyping methods. The framework used should be considered as the minimum standard for reporting of tag SNPs in the future, given the high importance of cross-ethnicity performance of tagging variants. Confident and simple tagging of risk variants, without the need to account for patient ethnicity, has potential to improve the application of HLA biomarkers for the stratification of patient treatment and reduce ADR incidence.


1. Yip VL, Alfirevic A, Pirmohamed M. Genetics of immune-mediated adverse drug reactions: a comprehensive and clinical review. Clin Rev Allergy Immunol. 2015;48:165–75.

2. Lazarou J, Pomeranz BH, Corey PN. Incidence of adverse drug reactions in hospitalized patients: a meta-analysis of prospective studies. JAMA. 1998;279:1200–5.

3. Chen Z, Liew D, Kwan P. Effects of a HLA-B*15:02 screening policy on antiepileptic drug use and severe skin reactions. Neurology. 2014;83:2077–84.

4. Whirl-Carrillo M, McDonagh EM, Hebert JM, Gong L, Sangkuhl K, Thorn CF, et al. Pharmacogenomics knowledge for personalized medicine. Clin Pharmacol Ther. 2012;92:414–7.

5. US Food and Drug Administration. Genomics - Table of Pharmacogenomic Biomarkers in Drug Labeling.

6. Hughes DA, Vilar FJ, Ward CC, Alfirevic A, Park BK, Pirmohamed M. Cost-effectiveness analysis of HLA B*5701 genotyping in preventing abacavir hypersensitivity. Pharmacogenetics. 2004;14:335–42.

7. Young B, Squires K, Patel P, Dejesus E, Bellos N, Berger D, et al. First large, multicenter, open-label study utilizing HLA-B*5701 screening for abacavir hypersensitivity in North America. AIDS. 2008;22:1673–5.

8. Plumpton CO, Alfirevic A, Pirmohamed M, Hughes DA. Cost effectiveness analysis of HLA-B*58:01 genotyping prior to initiation of allopurinol for gout. Rheumatology. 2017;56:1729–39.

9. Chen Z, Liew D, Kwan P. Real-world efficiency of pharmacogenetic screening for carbamazepine-induced severe cutaneous adverse reactions. PLoS ONE. 2014;9:e96990.

10. Varney MD, Castley AS, Haimila K, Saavalainen P. Methods for diagnostic HLA typing in disease association and drug hypersensitivity. Methods Mol Biol. 2012;882:27–46.

11. Mayor NP, Robinson J, McWhinnie AJ, Ranade S, Eng K, Midwinter W, et al. HLA typing for the next generation. PLoS ONE. 2015;10:e0127153.

12. Plumpton CO, Roberts D, Pirmohamed M, Hughes DA. A systematic review of economic evaluations of pharmacogenetic testing for prevention of adverse drug reactions. Pharmacoeconomics. 2016;34:771–93.

13. Monsuur AJ, de Bakker PI, Zhernakova A, Pinto D, Verduijn W, Romanos J, et al. Effective detection of human leukocyte antigen risk alleles in celiac disease using tag single nucleotide polymorphisms. PLoS ONE. 2008;3:e2270.

14. de Bakker PI, McVean G, Sabeti PC, Miretti MM, Green T, Marchini J, et al. A high-resolution HLA and SNP haplotype map for disease association studies in the extended human MHC. Nat Genet. 2006;38:1166–72.

15. Barker JM, Triolo TM, Aly TA, Baschal EE, Babu SR, Kretowski A, et al. Two single nucleotide polymorphisms identify the highest-risk diabetes HLA genotype: potential for rapid screening. Diabetes. 2008;57:3152–5.

16. Koskinen L, Romanos J, Kaukinen K, Mustalahti K, Korponay-Szabo I, Barisani D, et al. Cost-effective HLA typing with tagging SNPs predicts celiac disease risk haplotypes in the Finnish, Hungarian, and Italian populations. Immunogenetics. 2009;61:247–56.

17. Vu CL, Chan J, Todaro M, Skafidas S, Kwan P. Point-of-care molecular diagnostic devices: an overview. Pharmacogenomics. 2015;16:1399–409.

18. He Y, Hoskins JM, Clark S, Campbell NH, Wagner K, Motsinger-Reif AA, et al. Accuracy of SNPs to predict risk of HLA alleles associated with drug-induced hypersensitivity events across racial groups. Pharmacogenomics. 2015;16:817–24.

19. Sanchez-Giron F, Villegas-Torres B, Jaramillo-Villafuerte K, Silva-Zolezzi I, Fernandez-Lopez JC, Jimenez-Sanchez G, et al. Association of the genetic marker for abacavir hypersensitivity HLA-B*5701 with HCP5 rs2395029 in Mexican Mestizos. Pharmacogenomics. 2011;12:809–14.

20. Galvan CA, Elbarcha OC, Fernandez EJ, Beltramo DM, Soria NW. Rapid HCP5 single-nucleotide polymorphism genotyping: a simple allele-specific PCR method for prediction of hypersensitivity reaction to abacavir. Clin Chim Acta. 2011;412:1382–4.

21. Badulli C, Sestini R, Sbarsi I, Baroncelli M, Pizzochero C, Martinetti M, et al. Tag SNPs of the ancestral haplotype 57.1 do not substitute HLA-B*57:01 typing for eligibility to abacavir treatment in the Italian population. Pharmacogenomics. 2012;13:247–9.

22. Melis R, Lewis T, Millson A, Lyon E, McMillin GA, Slev PR, et al. Copy number variation and incomplete linkage disequilibrium interfere with the HCP5 genotyping assay for abacavir hypersensitivity. Genet Test Mol Biomarkers. 2012;16:1111–4.

23. Liu X, Sun J, Yu H, Chen H, Wang J, Zou H, et al. Tag SNPs for HLA-B alleles that are associated with drug response and disease risk in the Chinese Han population. Pharmacogenomics J. 2015;15:467–72.

24. Maekawa K, Nakamura R, Kaniwa N, Mizusawa S, Kitamoto A, Kitamoto T, et al. Development of a simple genotyping method for the HLA-A*31:01-tagging SNP in Japanese. Pharmacogenomics. 2015;16:1689–99.

25. Tohkin M, Kaniwa N, Saito Y, Sugiyama E, Kurose K, Nishikawa J, et al. A whole-genome association study of major determinants for allopurinol-related Stevens-Johnson syndrome and toxic epidermal necrolysis in Japanese patients. Pharmacogenomics J. 2013;13:60–9.

26. Zhou F, Cao H, Zuo X, Zhang T, Zhang X, Liu X, et al. Deep sequencing of the MHC region in the Chinese population contributes to studies of complex disease. Nat Genet. 2016;48:740–6.

27. Ghattaoraya GS, Dundar Y, Gonzalez-Galarza FF, Maia MH, Santos EJ, da Silva AL, et al. A web resource for mining HLA associations with adverse drug reactions: HLA-ADR. Database (Oxford). e002882, 2016;2016.

28. Saunders CL, Abel GA, El Turabi A, Ahmed F, Lyratzopoulos G. Accuracy of routinely recorded ethnic group information compared with self-reported ethnicity: evidence from the English Cancer Patient Experience survey. BMJ Open. 2013;3.

29. Payne PW. Ancestry-based pharmacogenomics, adverse reactions and carbamazepine: is the FDA warning correct? Pharmacogenomics J. 2014;14:473–80.

30. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75.

31. Mack SJ, Cano P, Hollenbach JA, He J, Hurley CK, Middleton D, et al. Common and well-documented HLA alleles: 2012 update to the CWD catalogue. Tissue Antigens. 2013;81:194–203.

32. Gonzalez-Galarza FF, Takeshita LY, Santos EJ, Kempson F, Maia MH, da Silva AL, et al. Allele frequency net 2015 update: new features for HLA epitopes, KIR and disease and HLA adverse drug reaction associations. Nucleic Acids Res. 2015;43 Database issue:D784–8.

33. Gui H, Kwok M, Baum L, Sham PC, Kwan P, Cherny SS. SNP-based HLA allele tagging, imputation and association with antiepileptic drug-induced cutaneous reactions in Hong Kong Han Chinese. Pharmacogenomics J. 2018;2:340–346.

34. Berry KJ, Mielke PW Jr. A generalization of Cohen’s kappa agreement measure to interval measurement and multiple raters. Educ Psychol Meas. 1988;48:921–33.

35. Machiela MJ, Chanock SJ. LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants. Bioinformatics. 2015;31:3555–7.

36. Ha NT, Freytag S, Bickeboeller H. Coverage and efficiency in current SNP chips. Eur J Human Genet. 2014;22:1124–30.

37. Motyer A, Vukcevic D, Dilthey A, Donnelly P, McVean G, Leslie S. Practical use of methods for imputation of HLA alleles from SNP genotype data. bioRxiv. 2016: 091009.

38. McCormack M, Alfirevic A, Bourgeois S, Farrell JJ, Kasperaviciute D, Carrington M, et al. HLA-A*3101 and carbamazepine-induced hypersensitivity reactions in Europeans. N Engl J Med. 2011;364:1134–43.

39. Ozeki T, Mushiroda T, Yowang A, Takahashi A, Kubo M, Shirakata Y, et al. Genome-wide association study identifies HLA-A*3101 allele as a genetic risk factor for carbamazepine-induced cutaneous adverse drug reactions in Japanese population. Hum Mol Genet. 2011;20:1034–41.

40. Cristallo AF, Schroeder J, Citterio A, Santori G, Ferrioli GM, Rossi U, et al. A study of HLA class I and class II 4-digit allele level in Stevens-Johnson syndrome and toxic epidermal necrolysis. Int J Immunogenet. 2011;38:303–9.

41. Kang HR, Jee YK, Kim YS, Lee CH, Jung JW, Kim SH, et al. Positive and negative associations of HLA class I alleles with allopurinol-induced SCARs in Koreans. Pharmacogenet Genomics. 2011;21:303–7.

42. Zhang FR, Liu H, Irwanto A, Fu XA, Li Y, Yu GQ, et al. HLA-B*13:01 and the dapsone hypersensitivity syndrome. N Engl J Med. 2013;369:1620–8.

43. Chung WH, Hung SI, Hong HS, Hsih MS, Yang LC, Ho HC, et al. Medical genetics: a marker for Stevens-Johnson syndrome. Nature. 2004;428:486.

44. Kaniwa N, Saito Y, Aihara M, Matsunaga K, Tohkin M, Kurose K, et al. HLA-B*1511 is a risk factor for carbamazepine-induced Stevens-Johnson syndrome and toxic epidermal necrolysis in Japanese patients. Epilepsia. 2010;51:2461–5.

45. Cornejo Castro EM, Carr DF, Jorgensen AL, Alfirevic A, Pirmohamed M. HLA-allelotype associations with nevirapine-induced hypersensitivity reactions and hepatotoxicity: a systematic review of the literature and meta-analysis. Pharmacogenet Genomics. 2015;25:186–98.

46. Chen PL, Shih SR, Wang PW, Lin YC, Chu CC, Lin JH, et al. Genetic determinants of antithyroid drug-induced agranulocytosis by human leukocyte antigen genotyping and genome-wide association study. Nat Commun. 2015;6:7633.

47. Hung SI, Chung WH, Jee SH, Chen WC, Chang YT, Lee WR, et al. Genetic susceptibility to carbamazepine-induced cutaneous adverse drug reactions. Pharmacogenet Genomics. 2006;16:297–306.

48. Mallal S, Phillips E, Carosi G, Molina JM, Workman C, Tomazic J, et al. HLA-B*5701 screening for hypersensitivity to abacavir. N Engl J Med. 2008;358:568–79.

49. Kim SH, Kim M, Lee KW, Kim SH, Kang HR, Park HW, et al. HLA-B*5901 is strongly associated with methazolamide-induced Stevens-Johnson syndrome/toxic epidermal necrolysis. Pharmacogenomics. 2010;11:879–84.

50. Schaid DJ, Spraggs CF, McDonnell SK, Parham LR, Cox CJ, Ejlertsen B, et al. Prospective validation of HLA-DRB1*07:01 allele carriage as a predictive risk factor for lapatinib-induced liver injury. J Clin Oncol. 2014;32:2296–303.

51. Jia X, Han B, Onengut-Gumuscu S, Chen WM, Concannon PJ, Rich SS, et al. Imputing amino acid polymorphisms in human leukocyte antigens. PLoS ONE. 2013;8:e64683.

52. Pillai NE, Okada Y, Saw WY, Ong RT, Wang X, Tantoso E, et al. Predicting HLA alleles from high-resolution SNP data in three Southeast Asian populations. Hum Mol Genet. 2014;23:4443–51.

53. Kim K, Bang SY, Lee HS, Bae SC. Construction and application of a Korean reference panel for imputing classical alleles and amino acids of human leukocyte antigen genes. PLoS ONE. 2014;9:e112546.

54. Gourraud PA, Khankhanian P, Cereb N, Yang SY, Feolo M, Maiers M, et al. HLA diversity in the 1000 genomes dataset. PLoS ONE. 2014;9:e97282.

55. 1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, et al. A global reference for human genetic variation. Nature. 2015;526:68–74.



We thank Adam Kowalczyk for his help in reviewing this manuscript.

Author information

Correspondence to Michael Erlichster.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.
