Abstract
Fine-mapping to plausible causal variation may be more effective in multi-ancestry cohorts, particularly in the MHC, which has population-specific structure. To enable such studies, we constructed a large (n = 21,546) HLA reference panel spanning five global populations based on whole-genome sequences. Despite population-specific long-range haplotypes, we demonstrated accurate imputation at G-group resolution (94.2%, 93.7%, 97.8% and 93.7% in admixed African (AA), East Asian (EAS), European (EUR) and Latino (LAT) populations, respectively). Applying HLA imputation to genome-wide association study data for HIV-1 viral load in three populations (EUR, AA and LAT), we obviated effects of previously reported associations from population-specific HIV studies and discovered a novel association at position 156 in HLA-B. We pinpointed the MHC association to three amino acid positions (97, 67 and 156) marking three consecutive pockets (C, B and D) within the HLA-B peptide-binding groove, explaining 12.9% of trait variance.
Data availability
All data for generating the figures presented in the manuscript are available at https://github.com/immunogenomics/HLA-TAPAS.
Code availability
HLA-TAPAS, https://github.com/immunogenomics/HLA-TAPAS; GATK version 3.6, https://software.broadinstitute.org/gatk/download/archive; HLA*PRG, https://github.com/AlexanderDilthey/MHC-PRG; HLA*LA, https://github.com/DiltheyLab/HLA-PRG-LA; PLINK version 1.90, https://www.cog-genomics.org/plink2; Beagle version 4.1, https://faculty.washington.edu/browning/beagle/b4_1.html; Hapl-o-Mat version 1.1, https://github.com/DKMS/Hapl-o-Mat/; BIGDAWG version 2.3.6, https://cran.r-project.org/web/packages/BIGDAWG/index.html.
Change history
02 November 2021
A Correction to this paper has been published: https://doi.org/10.1038/s41588-021-00979-9
References
International HIV Controllers Study et al. The major genetic determinants of HIV-1 control affect HLA class I peptide presentation. Science 330, 1551–1557 (2010).
Raychaudhuri, S. et al. Five amino acids in three HLA proteins explain most of the association between MHC and seropositive rheumatoid arthritis. Nat. Genet. 44, 291–296 (2012).
Evans, D. M. et al. Interaction between ERAP1 and HLA-B27 in ankylosing spondylitis implicates peptide handling in the mechanism for HLA-B27 in disease susceptibility. Nat. Genet. 43, 761–767 (2011).
Snyder, A. et al. Genetic basis for clinical response to CTLA-4 blockade in melanoma. N. Engl. J. Med. 371, 2189–2199 (2014).
Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).
Horton, R. et al. Gene map of the extended human MHC. Nat. Rev. Genet. 5, 889–899 (2004).
Gourraud, P.-A. et al. HLA diversity in the 1000 Genomes dataset. PLoS ONE 9, e97282 (2014).
Robinson, J. et al. IPD-IMGT/HLA Database. Nucleic Acids Res. 48, D948–D955 (2020).
Gonzalez-Galarza, F. F. et al. Allele frequency net database (AFND) 2020 update: gold-standard data classification, open access genotype data and new query tools. Nucleic Acids Res. 48, D783–D788 (2020).
Dilthey, A. T., Moutsianas, L., Leslie, S. & McVean, G. HLA*IMP—an integrated framework for imputing classical HLA alleles from SNP genotypes. Bioinformatics 27, 968–972 (2011).
Jia, X. et al. Imputing amino acid polymorphisms in human leukocyte antigens. PLoS ONE 8, e64683 (2013).
Zheng, X. et al. HIBAG—HLA genotype imputation with attribute bagging. Pharmacogenomics J. 14, 192–200 (2014).
Hu, X. et al. Additive and interaction effects at three amino acid positions in HLA-DQ and HLA-DR molecules drive type 1 diabetes risk. Nat. Genet. 47, 898–905 (2015).
McLaren, P. J. et al. Polymorphisms of large effect explain the majority of the host genetic contribution to variation of HIV-1 virus load. Proc. Natl Acad. Sci. USA 112, 14658–14663 (2015).
Tian, C. et al. Genome-wide association and HLA region fine-mapping studies identify susceptibility loci for multiple common infections. Nat. Commun. 8, 599 (2017).
Onengut-Gumuscu, S. et al. Type 1 diabetes risk in African-ancestry participants and utility of an ancestry-specific genetic risk score. Diabetes Care 42, 406–415 (2019).
HIV/AIDS (WHO, 2021); https://www.who.int/news-room/fact-sheets/detail/hiv-aids
McLaren, P. J. et al. Fine-mapping classical HLA variation associated with durable host control of HIV-1 infection in African Americans. Hum. Mol. Genet. 21, 4334–4347 (2012).
Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021).
Okada, Y. et al. Deep whole-genome sequencing reveals recent selection signatures linked to evolution and disease risk of Japanese. Nat. Commun. 9, 1631 (2018).
Mitt, M. et al. Improved imputation accuracy of rare and low-frequency variants using population-specific high-coverage WGS-based imputation reference panel. Eur. J. Hum. Genet. 25, 869–876 (2017).
Hirata, J. et al. Genetic and phenotypic landscape of the major histocompatibilty complex region in the Japanese population. Nat. Genet. 51, 470–480 (2019).
1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Nelis, M. et al. Genetic structure of Europeans: a view from the north-east. PLoS ONE 4, e5472 (2009).
Dilthey, A., Cox, C., Iqbal, Z., Nelson, M. R. & McVean, G. Improved genome inference in the MHC using a population reference graph. Nat. Genet. 47, 682–688 (2015).
Dilthey, A. T. et al. High-accuracy HLA type inference from whole-genome sequencing data using population reference graphs. PLoS Comput. Biol. 12, e1005151 (2016).
Dilthey, A. T. et al. HLA*LA-HLA typing from linearly projected graph alignments. Bioinformatics 35, 4394–4396 (2019).
Mellors, J. W. et al. Quantitation of HIV-1 RNA in plasma predicts outcome after seroconversion. Ann. Intern. Med. 122, 573–579 (1995).
Bartha, I. et al. Estimating the respective contributions of human and viral genetic variation to HIV control. PLoS Comput. Biol. 13, e1005339 (2017).
Blanco-Gelaz, M. A. et al. The amino acid at position 97 is involved in folding and surface expression of HLA-B27. Int. Immunol. 18, 211–220 (2006).
Stewart-Jones, G. B. E. et al. Structures of three HIV-1 HLA-B*5703-peptide complexes and identification of related HLAs potentially associated with long-term nonprogression. J. Immunol. 175, 2459–2468 (2005).
Archbold, J. K. et al. Natural micropolymorphism in human leukocyte antigens provides a basis for genetic control of antigen recognition. J. Exp. Med. 206, 209–219 (2009).
Kløverpris, H. N. et al. HIV control through a single nucleotide on the HLA-B locus. J. Virol. 86, 11493–11500 (2012).
Gaiha, G. D. et al. Structural topology defines protective CD8+ T cell epitopes in the HIV proteome. Science 364, 480–484 (2019).
Browning, B. L. & Browning, S. R. A fast, powerful method for detecting identity by descent. Am. J. Hum. Genet. 88, 173–182 (2011).
Hill, A. V. et al. Common West African HLA antigens are associated with protection from severe malaria. Nature 352, 595–600 (1991).
Sanchez-Mazas, A. et al. The HLA-B landscape of Africa: signatures of pathogen-driven selection and molecular identification of candidate alleles to malaria protection. Mol. Ecol. 26, 6238–6252 (2017).
Maiers, M., Gragert, L. & Klitz, W. High-resolution HLA alleles and haplotypes in the United States population. Hum. Immunol. 68, 779–788 (2007).
Chen, J. J. et al. Hardy–Weinberg testing for HLA class II (DRB1, DQA1, DQB1, AND DPB1) loci in 26 human ethnic groups. Tissue Antigens 54, 533–542 (1999).
Tshabalala, M. et al. Human leukocyte antigen-A, B, C, DRB1, and DQB1 allele and haplotype frequencies in a subset of 237 donors in the South African Bone Marrow Registry. J. Immunol. Res. 2018, 2031571 (2018).
Hagenlocher, Y. et al. 6-Locus HLA allele and haplotype frequencies in a population of 1075 Russians from Karelia. Hum. Immunol. 80, 95–96 (2019).
Nothnagel, M., Fürst, R. & Rohde, K. Entropy as a measure for linkage disequilibrium over multilocus haplotype blocks. Hum. Hered. 54, 186–198 (2002).
Okada, Y. et al. Construction of a population-specific HLA imputation reference panel and its application to Graves’ disease risk in Japanese. Nat. Genet. 47, 798–802 (2015).
Okada, Y. eLD: entropy-based linkage disequilibrium index between multiallelic sites. Hum. Genome Var. 5, 29 (2018).
Chikata, T. et al. Host-specific adaptation of HIV-1 subtype B in the Japanese population. J. Virol. 88, 4764–4775 (2014).
Nomura, E. et al. Mapping of a disease susceptibility locus in chromosome 6p in Japanese patients with ulcerative colitis. Genes Immun. 5, 477–483 (2004).
Price, P. et al. The genetic basis for the association of the 8.1 ancestral haplotype (A1, B8, DR3) with multiple immunopathological diseases. Immunol. Rev. 167, 257–274 (1999).
Horton, R. et al. Variation analysis and gene annotation of eight MHC haplotypes: the MHC Haplotype Project. Immunogenetics 60, 1–18 (2008).
Graham, R. R. et al. Visualizing human leukocyte antigen class II risk haplotypes in human systemic lupus erythematosus. Am. J. Hum. Genet. 71, 543–553 (2002).
Miller, F. W. et al. Genome-wide association study identifies HLA 8.1 ancestral haplotype alleles as major genetic risk factors for myositis phenotypes. Genes Immun. 16, 470–480 (2015).
Haapasalo, K. et al. The psoriasis risk allele HLA-C*06:02 shows evidence of association with chronic or recurrent Streptococcal tonsillitis. Infect. Immun. 86, e00304–e00318 (2018).
Salter-Townshend, M. & Myers, S. Fine-scale inference of ancestry segments without prior knowledge of admixing groups. Genetics 212, 869–889 (2019).
Zhou, Q., Zhao, L. & Guan, Y. Strong selection at MHC in Mexicans since admixture. PLoS Genet. 12, e1005847 (2016).
Meyer, D., C Aguiar, V. R., Bitarello, B. D., C Brandt, D. Y. & Nunes, K. A genomic perspective on HLA evolution. Immunogenetics 70, 5–27 (2018).
Norris, E. T. et al. Admixture-enabled selection for rapid adaptive evolution in the Americas. Genome Biol. 21, 29 (2020).
Guan, Y. Detecting structure of haplotypes and local ancestry. Genetics 196, 625–642 (2014).
Maples, B. K., Gravel, S., Kenny, E. E. & Bustamante, C. D. RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. Am. J. Hum. Genet. 93, 278–288 (2013).
Degenhardt, F. et al. Construction and benchmarking of a multi-ethnic reference panel for the imputation of HLA class I and II alleles. Hum. Mol. Genet. 28, 2078–2092 (2019).
Ambardar, S. & Gowda, M. High-resolution full-length HLA typing method using third generation (Pac-Bio SMRT) sequencing technology. Methods Mol. Biol. 1802, 135–153 (2018).
Macdonald, W. A. et al. A naturally selected dimorphism within the HLA-B44 supertype alters class I structure, peptide repertoire, and T cell recognition. J. Exp. Med. 198, 679–691 (2003).
Kloverpris, H. N. et al. HLA-B*57 micropolymorphism shapes HLA allele-specific epitope immunogenicity, selection pressure, and HIV immune control. J. Virol. 86, 919–929 (2012).
Carrington, M. & Walker, B. D. Immunogenetics of spontaneous control of HIV. Annu. Rev. Med. 63, 131–145 (2012).
Khera, A. V. et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 50, 1219–1224 (2018).
Khera, A. V. et al. Polygenic prediction of weight and obesity trajectories from birth to adulthood. Cell 177, 587–596 (2019).
Torkamani, A. & Topol, E. Polygenic risk scores expand to obesity. Cell 177, 518–520 (2019).
Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).
Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinformatics 43, 11.10.1–11.10.33 (2013).
Julg, B. et al. Possession of HLA class II DRB1*1303 associates with reduced viral loads in chronic HIV-1 clade C and B infection. J. Infect. Dis. 203, 803–809 (2011).
Schäfer, C., Schmidt, A. H. & Sauter, J. Hapl-o-Mat: open-source software for HLA haplotype frequency estimation from ambiguous and heterogeneous data. BMC Bioinformatics 18, 284 (2017).
Pappas, D. J., Marin, W., Hollenbach, J. A. & Mack, S. J. Bridging ImmunoGenomic Data Analysis Workflow Gaps (BIGDAWG): an integrated case–control analysis pipeline. Hum. Immunol. 77, 283–287 (2016).
Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
Price, A. L. et al. Long-range LD can confound genome scans in admixed populations. Am. J. Hum. Genet. 83, 132–135 (2008).
Pasaniuc, B. et al. Analysis of Latino populations from GALA and MEC studies reveals genomic loci with biased local ancestry estimation. Bioinformatics 29, 1407–1415 (2013).
McLaren, P. J. et al. Association study of common genetic variants and HIV-1 acquisition in 6,300 infected cases and 7,200 controls. PLoS Pathog. 9, e1003515 (2013).
Okada, Y. et al. Contribution of a non-classical HLA gene, HLA-DOA, to the risk of rheumatoid arthritis. Am. J. Hum. Genet. 99, 366–374 (2016).
Lenz, T. L. et al. Widespread non-additive and interaction effects within HLA loci modulate the risk of autoimmune diseases. Nat. Genet. 47, 1085–1090 (2015).
Acknowledgements
The study was supported by the National Institutes of Health (NIH) TB Research Unit Network, Grant U19 AI111224-01. We thank C. Willer, B. Vanderwerff and B. Klunder from the University of Michigan for their help in making the constructed reference panel available on the Michigan Imputation Server. The views expressed in this manuscript are those of the authors and do not necessarily represent the views of the National Heart, Lung, and Blood Institute (NHLBI); the National Institutes of Health; or the US Department of Health and Human Services. The GaP Registry at The Feinstein Institute for Medical Research provided fresh, de-identified human plasma; blood was collected from control participants under an institutional review board–approved protocol (IRB No. 09-081) and processed to isolate plasma. The GaP is a sub-protocol of the Tissue Donation Program at Northwell Health and a national resource for genotype–phenotype studies (https://www.feinsteininstitute.org/robert-s-boas-center-for-genomics-and-human-genetics/gap-registry/).
For some HIV cohort participants, DNA and data collection was supported by NIH/NIAID AIDS Clinical Trial Group (ACTG) grants UM1 AI068634, UM1 AI068636 and UM1 AI106701, and ACTG clinical research site grants A1069412, A1069423, A1069424, A1069503, AI025859, AI025868, AI027658, AI027661, AI027666, AI027675, AI032782, AI034853, AI038858, AI045008, AI046370, AI046376, AI050409, AI050410, AI050410, AI058740, AI060354, AI068636, AI069412, AI069415, AI069418, AI069419, AI069423, AI069424, AI069428, AI069432, AI069432, AI069434, AI069439, AI069447, AI069450, AI069452, AI069465, AI069467, AI069470, AI069471, AI069472, AI069474, AI069477, AI069481, AI069484, AI069494, AI069495, AI069496, AI069501, AI069501, AI069502, AI069503, AI069511, AI069513, AI069532, AI069534, AI069556, AI072626, AI073961, RR000046, RR000425, RR023561, RR024156, RR024160, RR024996, RR025008, RR025747, RR025777, RR025780, TR000004, TR000058, TR000124, TR000170, TR000439, TR000445, TR000457, TR001079, TR001082, TR001111 and TR024160. Molecular data for the Trans-Omics in Precision Medicine (TOPMed) program was supported by the NHLBI. See the TOPMed omics support table (Supplementary Table 23) for study-specific omics support information. Core support including centralized genomic read mapping and genotype calling, along with variant quality metrics and filtering, was provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1; contract HHSN268201800002I). Core support including phenotype harmonization, data management, sample-identity QC and general program coordination was provided by the TOPMed Data Coordinating Center (R01HL-120393; U01HL-120393; contract HHSN268201800001I). We gratefully acknowledge the studies and participants who provided biological samples and data for TOPMed. The COPDGene project was supported by Award Number U01 HL089897 and Award Number U01 HL089856 from the NHLBI. 
The content is solely the responsibility of the authors and does not necessarily represent the official views of the NHLBI or the National Institutes of Health. The COPDGene project is also supported by the COPD Foundation through contributions made to an Industry Advisory Board comprised of AstraZeneca, Boehringer Ingelheim, GlaxoSmithKline, Novartis, Pfizer, Siemens and Sunovion. A full listing of COPDGene investigators can be found at http://www.copdgene.org/directory. The JHS is supported and conducted in collaboration with Jackson State University (HHSN268201800013I), Tougaloo College (HHSN268201800014I), Mississippi State Department of Health (HHSN268201800015I) and University of Mississippi Medical Center (HHSN268201800010I, HHSN268201800011I and HHSN268201800012I) contracts from the NHLBI and the National Institute on Minority Health and Health Disparities. We also thank the staff and participants of the JHS. MESA and the MESA SHARe project are conducted and supported by the NHLBI in collaboration with MESA investigators. Support for MESA is provided by contracts 75N92020D00001, HHSN268201500003I, N01-HC-95159, 75N92020D00005, N01-HC-95160, 75N92020D00002, N01-HC-95161, 75N92020D00003, N01-HC-95162, 75N92020D00006, N01-HC-95163, 75N92020D00004, N01-HC-95164, 75N92020D00007, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, UL1-TR-000040, UL1-TR-001079 and UL1-TR-001420. MESA Family is conducted and supported by the NHLBI in collaboration with MESA investigators. Support is provided by grants and contracts R01HL071051, R01HL071205, R01HL071250, R01HL071251, R01HL071258 and R01HL071259, and by the National Center for Research Resources, grant UL1RR033176. 
The provision of genotyping data was supported in part by the National Center for Advancing Translational Sciences, CTSI grant UL1TR001881, and the National Institute of Diabetes and Digestive and Kidney Disease Diabetes Research Center (DRC) grant DK063491 to the Southern California Diabetes Endocrinology Research Center. This project has been funded in whole or in part with federal funds from the Frederick National Laboratory for Cancer Research, under Contract No. HHSN261200800001E. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products or organizations imply endorsement by the US Government. This research was supported in part by the Intramural Research Program of the NIH, Frederick National Laboratory, Center for Cancer Research. The Diabetes Heart Study was supported by R01 HL92301, R01 HL67348, R01 NS058700, R01 AR48797, R01 DK071891, R01 AG058921, the General Clinical Research Center of the Wake Forest University School of Medicine (M01 RR07122, F32 HL085989), the American Diabetes Association and a pilot grant from the Claude Pepper Older Americans Independence Center of Wake Forest University Health Sciences (P60 AG10484). A. Metspalu is supported by Gentransmed grant 2014-2020.4.01.15-0012. D.W.H. is supported by NIH grants AI110527, AI077505, TR000445, AI069439 and AI110527. J.T.E. and P.E.S. were supported by NIH/NIAMS R01 AR042742, R01 AR050511 and R01 AR063611. Y.O. was supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI (19H01021, 20K21834), AMED (JP20km0405211, JP20ek0109413, JP20ek0410075, JP20gm4010006 and JP20km0405217), Takeda Science Foundation, JST Moonshot R&D (JPMJMS2021, JPMJMS2024) and the Bioinformatics Initiative of Osaka University Graduate School of Medicine, Osaka University.
Author information
Contributions
Y. Luo and S. Raychaudhuri conceived, designed and performed analyses, wrote the manuscript and supervised the research. M. Kanai implemented the omnibus test for the HIV-1 fine-mapping study. Y. Luo, W.C., M. Kanai, P.E.S., J.T.E. and B. Han contributed to the development of the HLA-TAPAS pipeline. X. Li performed the selection analysis. S. Sakaue performed imputation comparison between Beagle v.4 and Minimac4. L. Forer, S. Schoenherr, C. Fuchsberger and A.V.S. hosted the HLA imputation server. J.T.E., M.G.-A. and P.K.G. helped with the GaP data acquisition. K. Yamamoto, K.O., D.W.H., X.G., N.D.P., Y.-D.I.C., J.I.R., K.D.T., S.S.R., A.C., J.G.W., S. Kathiresan, M.H.C., A. Metspalu, T.E. and Y.O. contributed to the WGS data acquisition. J. Fellay, M. Carrington and P.J.M. contributed to the HIV-1 data acquisition. All authors contributed to the writing of the manuscript.
Ethics declarations
Competing interests
M.H.C. has received consulting or speaking fees from Illumina and AstraZeneca, and grant support from GSK and Bayer. The remaining authors declare no competing interests.
Additional information
Peer review information Nature Genetics thanks Pierre-Antoine Gourraud and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 HLA nomenclature.
Description of a classical HLA allele using current standard nomenclature. The first field corresponds to the serological antigen. The second field distinguishes HLA alleles that differ by one or more missense variants. The third field distinguishes HLA alleles that differ by one or more synonymous variants. The G-group distinguishes HLA alleles that differ by one or more synonymous variants within the exons that encode the peptide-binding groove (exons 2 and 3 for HLA class I genes and exon 2 for HLA class II genes).
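The field structure described above can be illustrated with a small parser. This is a minimal sketch, not part of HLA-TAPAS: the names `HLAAllele`, `parse_hla` and `truncate` are hypothetical, and the pattern handles only colon-delimited numeric fields with an optional trailing `G` (expression suffixes such as `N` or `Q` and P-groups are not covered).

```python
import re
from typing import NamedTuple, Tuple


class HLAAllele(NamedTuple):
    gene: str                 # e.g. "B" or "DRB1"
    fields: Tuple[str, ...]   # e.g. ("57", "01", "01")
    g_group: bool             # True if the name carries a trailing "G"


# Optional "HLA-" prefix, gene name, "*", colon-separated numeric fields, optional "G".
_PATTERN = re.compile(r"^(?:HLA-)?([A-Z0-9]+)\*(\d+(?::\d+)*)(G?)$")


def parse_hla(name: str) -> HLAAllele:
    """Split an allele name such as 'HLA-B*57:01:01G' into gene and fields."""
    match = _PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized HLA allele name: {name!r}")
    gene, fields, suffix = match.groups()
    return HLAAllele(gene, tuple(fields.split(":")), suffix == "G")


def truncate(allele: HLAAllele, n_fields: int) -> str:
    """Reduce an allele to its first n_fields fields (1 = one-field resolution)."""
    if not 1 <= n_fields <= len(allele.fields):
        raise ValueError("n_fields is outside the range of available fields")
    return f"{allele.gene}*{':'.join(allele.fields[:n_fields])}"
```

For example, `truncate(parse_hla("HLA-B*57:01:01G"), 2)` yields the two-field name `B*57:01`, which drops the synonymous-variant field.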
Extended Data Fig. 2 Correlation between imputed and typed dosage (dosage r2) of classical HLA alleles in 1,067 Admixed African HIV-1 individuals.
The x-axis shows the minor allele frequency observed in the SBT dataset. Blue points show G-group HLA alleles. Red points show one-field HLA alleles.
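As an illustration of the accuracy metric in this figure, dosage r2 can be computed as the squared Pearson correlation between imputed and typed allele dosages across individuals. The sketch below assumes dosages on a 0 to 2 scale for a single allele; the function name `dosage_r2` is hypothetical and is not taken from the authors' pipeline.

```python
from math import fsum
from typing import Sequence


def dosage_r2(imputed: Sequence[float], typed: Sequence[float]) -> float:
    """Squared Pearson correlation between imputed and typed dosages of one allele."""
    n = len(imputed)
    if n != len(typed) or n < 2:
        raise ValueError("need two equal-length vectors with at least two individuals")
    mean_i = fsum(imputed) / n
    mean_t = fsum(typed) / n
    sxy = fsum((a - mean_i) * (b - mean_t) for a, b in zip(imputed, typed))
    sxx = fsum((a - mean_i) ** 2 for a in imputed)
    syy = fsum((b - mean_t) ** 2 for b in typed)
    if sxx == 0 or syy == 0:
        # A monomorphic allele has no variance, so the correlation is undefined.
        raise ValueError("dosage vector has zero variance")
    return (sxy * sxy) / (sxx * syy)
```

Values close to 1 indicate that the imputed dosages track the sequence-based typing closely; rare alleles tend to give noisier estimates because few carriers contribute to the variance.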
Extended Data Fig. 3 Association tests within the MHC to HIV-1 viral load.
The x-axis shows the genomic positions of chromosome 6 (build 37), and the y-axis is the −log10(P value) obtained from two-sided regression analyses for SNPs (gray), classical HLA alleles (blue) and amino acids (red). The dashed black line indicates the genome-wide significance threshold (P = 5 × 10⁻⁸). For biallelic markers, results were calculated by a linear regression model including sex, cohort-specific principal components and an ancestry indicator as covariates (circle). Association at amino acid positions with more than two residues was calculated using a multi-degree-of-freedom omnibus test (one-sided F-test) including the same covariates (diamond). The top associated amino acid, classical HLA allele and SNPs are annotated in the figure. a, Of all variants tested, the top hit maps to amino acid position 97 in HLA-B. b, Subsequent conditional analysis controlling for all residues at position 97 in HLA-B revealed an independent association at position 67 in HLA-B. c, Results conditioned on positions 97 and 67 in HLA-B showed a third signal at position 156 in HLA-B. d, Results conditioned on positions 97, 67 and 156 in HLA-B showed that position 77 in HLA-A has the strongest association signal outside HLA-B among all amino acid positions. e, Results conditioned on all amino acid positions in HLA-B. Notably, in each conditional analysis for the three amino acid positions in HLA-B, the amino acid position was more significant than any single SNP or classical HLA allele.
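The omnibus test in this caption can be sketched as a nested-model F-test: a null linear model with covariates only is compared with a full model that adds one dosage column per residue, leaving one residue out as the reference. The code below is a minimal pure-Python illustration under that assumption and is not the authors' implementation. The names `omnibus_f`, `_rss` and `_solve` are hypothetical, and the sketch returns only the F statistic and its degrees of freedom, so converting it to a P value would need an F-distribution routine.

```python
from typing import List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def _rss(design: List[List[float]], y: Sequence[float]) -> float:
    """Residual sum of squares of the least-squares fit of y on the design columns."""
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    beta = _solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(design, y))


def omnibus_f(y: Sequence[float],
              covariates: Sequence[Sequence[float]],
              residues: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df1, df2) for adding the residue dosage columns to the null model.

    covariates: one row per individual, without an intercept column.
    residues: one row per individual, dosages for all residues except the reference.
    """
    null_design = [[1.0] + list(c) for c in covariates]
    full_design = [base + list(r) for base, r in zip(null_design, residues)]
    df1 = len(residues[0])               # number of residue columns tested jointly
    df2 = len(y) - len(full_design[0])   # residual degrees of freedom of the full model
    rss0, rss1 = _rss(null_design, y), _rss(full_design, y)
    return ((rss0 - rss1) / df1) / (rss1 / df2), df1, df2
```

In this formulation, a conditional analysis corresponds to appending the dosage columns of the already-associated positions to `covariates` before testing the next position.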
Extended Data Fig. 4 Effect on set point viral load of individual residues at position 97 in HLA-B.
Mean set point viral load (spVL, RNA copies per milliliter) and its standard error of all six residues at position 97 in HLA-B in three populations independently. Data are presented as mean values ± standard errors. Residues are ranked from the most protective to the riskiest in the overall population. There are 3,901 AA, 7,455 EUR, and 677 LAT independent individuals included in the analysis.
Extended Data Fig. 5 Global diversity of the MHC region.
Principal component analysis of the pairwise IBD distance between 21,546 individuals using MHC region markers. The first two principal components show separation of continental groups.
Extended Data Fig. 6 Diversity of eight classical HLA genes in the constructed multi-ancestry MHC reference panel.
Each gene is stratified by five populations (AA, Admixed African; EAS, East Asian; EUR, European; LAT, Latino; SAS, South Asian). The top two most common alleles within each classical gene of each population are plotted across all panels. Alleles that have frequencies greater than 1% are also labelled in the bar plots. a, Class I genes. b, Class II genes.
Extended Data Fig. 7 Allele diversity of eight classical HLA genes in global populations.
For each gene, the five most frequent alleles across all populations are shown (light blue, most frequent; dark blue, second most frequent; light green, third most frequent; dark green, fourth most frequent; red, fifth most frequent; gray, all other alleles).
Extended Data Fig. 8 Pairwise normalized entropy (ε) among all population groups.
The normalized entropy (ε) measures the difference between the haplotype frequency distributions under linkage disequilibrium and under linkage equilibrium, and takes values between 0 (no LD) and 1 (perfect LD).
Extended Data Fig. 9 Deviation from average genome-wide ancestry in Admixed African and Latino populations.
a,b, The x-axis is the genomic position of chromosome 6. The y-axis shows the local African ancestry deviation measure inferred at a given position for Admixed Africans (a) and Latinos (b). The MHC region (chr6:28Mb-34Mb) is highlighted in red shading. Local ancestries were estimated using RFMix (red) and ELAI (blue). The ancestry deviation measure is the difference between the African ancestry at a given genomic position and the genome-wide average estimated by ADMIXTURE with K = 3, normalized by the standard deviation of the ancestry estimate. The dashed line indicates the genome-wide significance threshold, at ±4.42 standard deviations of the ancestry estimate from the genome-wide average.
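The deviation measure in this caption amounts to a z-score per genomic position. The sketch below is illustrative only: it assumes the standard deviation is taken across the local ancestry estimates along the chromosome (the caption does not state exactly how it was computed), and the function names are hypothetical. The threshold of 4.42 is the value given in the caption.

```python
from statistics import pstdev
from typing import List, Sequence


def ancestry_deviation(local_ancestry: Sequence[float],
                       genome_wide_mean: float) -> List[float]:
    """Deviation of local ancestry from the genome-wide mean, in SD units.

    Assumption: the SD is computed across the positions supplied here.
    """
    sd = pstdev(local_ancestry)
    if sd == 0:
        raise ValueError("local ancestry estimates have zero variance")
    return [(a - genome_wide_mean) / sd for a in local_ancestry]


def flag_deviant_positions(z_scores: Sequence[float],
                           threshold: float = 4.42) -> List[int]:
    """Indices of positions whose absolute deviation exceeds the threshold."""
    return [i for i, z in enumerate(z_scores) if abs(z) > threshold]
```

Positions returned by `flag_deviant_positions` would correspond to points beyond the dashed lines in the figure, given matching inputs.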
Extended Data Fig. 10 Conditional analysis of other previously reported independently associated amino acid positions.
a,b, Manhattan plots of amino acid positions in the six classical HLA genes. Each point shows a single amino acid position and its omnibus P value after controlling for independent positions that are associated with spVL in this study (positions 97, 67 and 156 in HLA-B) (a) and independent positions that are reported only in previous studies (refs. 14,18) and not in the present work (positions 45, 63 and 116 in HLA-B and positions 77 and 95 in HLA-A) (b). Independently associated amino acid positions that are reported only in the European population (ref. 14) are shown in blue. Independently associated amino acid positions that are reported only in the African American population (ref. 18) are shown in purple. Independently associated amino acid positions identified in this study are shown in red.
Supplementary information
Supplementary Information
Supplementary Figs. 1–18, Tables 1–23 and Note.
Supplementary Data 1
Summary of inferred HLA alleles at G-group resolution. For each allele observed in the reference panel, we listed its overall frequency (Freq); P value (Pval) for difference in frequencies across populations (a two-sided chi-square test with four degrees of freedom); frequency within each continental population (European, EUR; AA, admixed African; LAT, Latino; SAS, South Asian; EAS, East Asian) and its accuracy in two validation cohorts (JPN (n = 288 independent individuals) and 1KG (n = 955 independent individuals)).
Supplementary Data 2
Imputation dosage r2 of each allele. Each row shows one imputed classical HLA allele and its imputation accuracy (measured in dosage r2) in three cohorts with validation data (G1K (n = 955), GaP Registry (GAP, n = 75) and HIV (n = 1,067)). We also report the allele frequency using the gold standard and the inferred alleles.
Supplementary Data 3
Association study within the MHC to HIV-1 viral load. For each row, we list the results of association testing for each of the binary markers that we imputed across the extended MHC region. For each marker, its unique identifier (ID), genomic base-pair position in GRCh37 (BP), reference allele (REF), alternative allele (ALT), minor allele frequency (MAF), effect size (BETA) and standard error (SE) in each conditional analysis are listed.
Supplementary Data 4
HLA haplotype frequency based on eight classical HLA alleles inferred at G-group resolution. All haplotypes with count > 1 in each and overall population are listed. AA represents individuals of admixed African ancestry; EAS represents individuals of East Asian ancestry; EUR represents individuals of European ancestry; LAT represents individuals of Native American ancestry; SAS represents individuals of South Asian ancestry.
Cite this article
Luo, Y., Kanai, M., Choi, W. et al. A high-resolution HLA reference panel capturing global population diversity enables multi-ancestry fine-mapping in HIV host response. Nat Genet 53, 1504–1516 (2021). https://doi.org/10.1038/s41588-021-00935-7
DOI: https://doi.org/10.1038/s41588-021-00935-7