The human major histocompatibility complex (MHC) region is associated with numerous diseases. However, pinpointing the causal variants underlying these associations remains a challenge because of the extreme complexity of the region. We therefore sequenced the entire 5-Mb MHC region in 20,635 individuals of Han Chinese ancestry (10,689 controls and 9,946 patients with psoriasis) and constructed a Han-MHC database that includes both variants and high-accuracy HLA gene typing results. We further identified multiple new, independent psoriasis susceptibility loci in HLA-C, HLA-B, HLA-DPB1 and BTNL2, as well as an associated intergenic variant, rs118179173, and confirmed the well-established risk allele HLA-C*06:02. We anticipate that our Han-MHC reference panel, built by deep sequencing of a large number of samples, will serve as a useful tool for investigating the role of the MHC region in a variety of diseases and thus advance understanding of their pathogenesis.
Sequence Read Archive
MHC Sequencing Consortium. Complete sequence and gene map of a human major histocompatibility complex. Nature 401, 921–923 (1999).
Trowsdale, J. The MHC, disease and selection. Immunol. Lett. 137, 1–8 (2011).
Trowsdale, J. & Knight, J.C. Major histocompatibility complex genomics and human disease. Annu. Rev. Genomics Hum. Genet. 14, 301–323 (2013).
Corvin, A. & Morris, D.W. Genome-wide association studies: findings at the major histocompatibility complex locus in psychosis. Biol. Psychiatry 75, 276–283 (2014).
Welter, D. et al. The NHGRI GWAS Catalog, a curated resource of SNP–trait associations. Nucleic Acids Res. 42, D1001–D1006 (2014).
Horton, R. et al. Variation analysis and gene annotation of eight MHC haplotypes: the MHC Haplotype Project. Immunogenetics 60, 1–18 (2008).
de Bakker, P.I.W. et al. A high-resolution HLA and SNP haplotype map for disease association studies in the extended human MHC. Nat. Genet. 38, 1166–1172 (2006).
Raychaudhuri, S. et al. Five amino acids in three HLA proteins explain most of the association between MHC and seropositive rheumatoid arthritis. Nat. Genet. 44, 291–296 (2012).
Patsopoulos, N.A. et al. Fine-mapping the genetic association of the major histocompatibility complex in multiple sclerosis: HLA and non-HLA effects. PLoS Genet. 9, e1003926 (2013).
Kim, K. et al. The HLA-DRβ1 amino acid positions 11–13–26 explain the majority of SLE–MHC associations. Nat. Commun. 5, 5902 (2014).
Cao, H. et al. An integrated tool to study MHC region: accurate SNV detection and HLA genes typing in human MHC region using targeted high-throughput sequencing. PLoS One 8, e69388 (2013).
1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
Robinson, J. et al. The IPD and IMGT/HLA database: allele variant databases. Nucleic Acids Res. 43, D423–D431 (2015).
Gourraud, P.-A. et al. HLA diversity in the 1000 Genomes dataset. PLoS One 9, e97282 (2014).
Prugnolle, F. et al. Pathogen-driven selection and worldwide HLA class I diversity. Curr. Biol. 15, 1022–1027 (2005).
Gragert, L., Madbouly, A., Freeman, J. & Maiers, M. Six-locus high resolution HLA haplotype frequencies derived from mixed-resolution DNA typing for the entire US donor registry. Hum. Immunol. 74, 1313–1320 (2013).
Browning, S.R. & Browning, B.L. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81, 1084–1097 (2007).
Detrait, M. et al. Suggestive evidence of a role of HLA-DRB4 mismatches in the outcome of allogeneic hematopoietic stem cell transplantation with HLA-10/10–matched unrelated donors: a French–Swiss retrospective study. Bone Marrow Transplant. 50, 1316–1320 (2015).
Erlich, H.A. et al. Next generation sequencing reveals the association of DRB3*02:02 with type 1 diabetes. Diabetes 62, 2618–2622 (2013).
Okada, Y. et al. Risk for ACPA-positive rheumatoid arthritis is driven by shared HLA amino acid polymorphisms in Asian and European populations. Hum. Mol. Genet. 23, 6916–6926 (2014).
Kim, K., Bang, S.-Y., Lee, H.-S. & Bae, S.-C. Construction and application of a Korean reference panel for imputing classical alleles and amino acids of human leukocyte antigen genes. PLoS One 9, e112546 (2014).
Jia, X. et al. Imputing amino acid polymorphisms in human leukocyte antigens. PLoS One 8, e64683 (2013).
Tang, H. et al. A large-scale screen for coding variants predisposing to psoriasis. Nat. Genet. 46, 45–50 (2014).
Zhang, X., He, P., Wei, S., Chen, S. & Xu, S. Evidence for a major psoriasis susceptibility locus at 6p21 (PSORS1) and a novel candidate region at 4q31 by genome-wide scan in Chinese Hans. J. Invest. Dermatol. 119, 1361–1366 (2002).
Fan, X. et al. Fine mapping of the psoriasis susceptibility locus PSORS1 supports HLA-C as the susceptibility gene in the Han Chinese population. PLoS Genet. 4, e1000038 (2008).
Feng, B.J. et al. Multiple loci within the major histocompatibility complex confer risk of psoriasis. PLoS Genet. 5, e1000606 (2009).
Okada, Y. et al. Fine mapping major histocompatibility complex associations in psoriasis and its clinical subtypes. Am. J. Hum. Genet. 95, 162–172 (2014).
Chen, P.-L. et al. Comprehensive genotyping in two homogeneous Graves' disease samples reveals major and novel HLA association alleles. PLoS One 6, e16635 (2011).
Arnett, H.A. & Viney, J.L. Immune modulation by butyrophilins. Nat. Rev. Immunol. 14, 559–569 (2014).
Yang, J., Lee, S.H., Goddard, M.E. & Visscher, P.M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
Zuo, X. et al. Whole-exome SNP array identifies 15 new susceptibility loci for psoriasis. Nat. Commun. 6, 6793 (2015).
Chan, C.J., Smyth, M.J. & Martinet, L. Molecular mechanisms of natural killer cell activation in response to cellular stress. Cell Death Differ. 21, 5–14 (2014).
Suh, W.K. et al. Interaction of MHC class I molecules with the transporter associated with antigen processing. Science 264, 1322–1326 (1994).
Myers, E.W. et al. A whole-genome assembly of Drosophila. Science 287, 2196–2204 (2000).
Li, R. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20, 265–272 (2010).
Cao, H. et al. De novo assembly of a haplotype-resolved human genome. Nat. Biotechnol. 33, 617–622 (2015).
Jiang, L. et al. Novel risk loci for rheumatoid arthritis in Han Chinese and congruence with risk variants in Europeans. Arthritis Rheumatol. 66, 1121–1132 (2014).
Dubois, P.C. et al. Multiple common variants for celiac disease influencing immune gene expression. Nat. Genet. 42, 295–302 (2010).
Chu, X. et al. A genome-wide association study identifies two new risk loci for Graves' disease. Nat. Genet. 43, 897–901 (2011).
Li, Y. et al. A genome-wide association study in Han Chinese identifies a susceptibility locus for primary Sjögren's syndrome at 7q11.23. Nat. Genet. 45, 1361–1365 (2013).
Mbarek, H. et al. A genome-wide association study of chronic hepatitis B identified novel risk locus in a Japanese population. Hum. Mol. Genet. 20, 3884–3892 (2011).
Chang, S.-W. et al. A genome-wide association study on chronic HBV infection and its clinical progression in male Han–Taiwanese. PLoS One 9, e99724 (2014).
Hu, Z. et al. New loci associated with chronic hepatitis B virus infection in Han Chinese. Nat. Genet. 45, 1499–1503 (2013).
Zhang, X.-J. et al. Psoriasis genome-wide association study identifies susceptibility variants within LCE gene cluster at 1q21. Nat. Genet. 41, 205–210 (2009).
Yin, X. et al. Genome-wide meta-analysis identifies multiple novel associations and ethnic heterogeneity of psoriasis susceptibility. Nat. Commun. 6, 6916 (2015).
McClellan, J. & King, M.-C. Genetic heterogeneity in human disease. Cell 141, 210–217 (2010).
Zhou, F. et al. Supporting data for “Deep sequencing of human major histocompatibility complex region contributes to studies of complex disease”. GigaScience. Database http://dx.doi.org/10.5524/100156 (2015).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
Adzhubei, I.A. et al. A method and server for predicting damaging missense mutations. Nat. Methods 7, 248–249 (2010).
Kumar, P., Henikoff, S. & Ng, P.C. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat. Protoc. 4, 1073–1081 (2009).
Chun, S. & Fay, J.C. Identification of deleterious mutations within three human genomes. Genome Res. 19, 1553–1561 (2009).
Schwarz, J.M., Rödelsperger, C., Schuelke, M. & Seelow, D. MutationTaster evaluates disease-causing potential of sequence alterations. Nat. Methods 7, 575–576 (2010).
We thank the faculty and staff at Anhui Medical University and BGI-Shenzhen who contributed to the Han-MHC project. We acknowledge grant support from the Key Program of the National Natural Science Foundation of China (81130031), the National Science Fund for Excellent Young Scholars (81222022), the Top-Notch Young Talents Program of China, the Pre-National Basic Research Program of China (973 plan; 2012CB722404), the National Natural Science Foundation of China (81573035, 81273301, 81271747, 81370044, 8157120504 and 81502713), the Natural Science Foundation of Anhui Province (1508085JGD05), the Program of Outstanding Talents of Anhui Medical University and the Shenzhen municipal government of China (CXZZ20140904154910774).
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Distribution of healthy control samples included in the MHC sequencing study.
In total, 10,689 normal individuals were recruited in the sequencing stage.
Supplementary Figure 2 The average depth distribution along the MHC region of the sequenced samples.
The average number of SNPs per sample was close to 15,000, and the average number of indels was close to 2,000.
Allelic frequencies for HLA-B, HLA-DRB1, HLA-C, HLA-A, MICA, HLA-DQB1, HLA-DQA1 and HLA-DPB1 (sorted on the basis of diversity and arranged from highest to lowest) in the Chinese population are given in each chart. Only the top 20 alleles for each gene are shown in the diagram.
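The legend does not say how gene diversity was measured. As a minimal sketch, assuming diversity is taken as expected heterozygosity (1 - sum of p_i^2) and that each individual contributes two typed alleles per gene, allele frequencies, a gene-level diversity ranking and a top-k allele list could be computed as follows. The function names and the allele calls in the usage lines are hypothetical toy data, not results from this study.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from (allele1, allele2) pairs, one pair per individual."""
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())  # 2N chromosomes for N individuals
    return {allele: n / total for allele, n in counts.items()}


def gene_diversity(freqs):
    """Expected heterozygosity, 1 - sum(p_i^2); an assumed diversity measure."""
    return 1.0 - sum(p * p for p in freqs.values())


def top_alleles(freqs, k=20):
    """The k most frequent alleles, highest first (the charts show k = 20)."""
    return sorted(freqs.items(), key=lambda item: item[1], reverse=True)[:k]


# Toy genotypes (hypothetical, for illustration only).
typed = {
    "HLA-B": [("B*46:01", "B*40:01"), ("B*13:02", "B*46:01"), ("B*58:01", "B*15:01")],
    "HLA-DPB1": [("DPB1*05:01", "DPB1*05:01"), ("DPB1*05:01", "DPB1*02:01")],
}
freqs_by_gene = {gene: allele_frequencies(g) for gene, g in typed.items()}
# Rank genes from most to least diverse, as in the chart ordering.
ranking = sorted(freqs_by_gene, key=lambda g: gene_diversity(freqs_by_gene[g]), reverse=True)
print(ranking)  # ['HLA-B', 'HLA-DPB1']
print(top_alleles(freqs_by_gene["HLA-B"], k=2))
```

Expected heterozygosity is used here only because it rewards both many alleles and even frequencies; a count of distinct alleles would be an equally plausible reading of "diversity".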
Haplotypes are given in HLA-A-B-C-DRB1-DQB1 order. Of the extended haplotypes, HLA A30-B13-C06-DR07-DQ02 was the most common, with a frequency of 3.89%.
(a–c) Results are shown for SNPs (a), amino acids and HLA types (b) and MHC haplotypes (c). Green, southern versus central China; black, northern versus central China; blue, southern versus northern China.
Only haplotypes with a frequency of ≥0.5% in the Han Chinese population are shown. The red line represents the overall frequency in the Han Chinese population. The blue, green and brown lines represent the frequency in northern, central and southern Han Chinese populations, respectively.
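Extended-haplotype frequencies such as the 3.89% reported for A30-B13-C06-DR07-DQ02 are counts over 2N chromosomes. The sketch below assumes the five-locus haplotypes have already been phased (the legend does not state the phasing method) and applies a ≥0.5% display threshold like the one used here. The function names and toy data are hypothetical and for illustration only.

```python
from collections import Counter


def haplotype_frequencies(phased_individuals):
    """Frequencies of already-phased haplotypes.

    Each individual is a pair of haplotypes; each haplotype is a tuple of
    alleles in A-B-C-DRB1-DQB1 order, e.g. ("A30", "B13", "C06", "DR07", "DQ02").
    """
    counts = Counter()
    for hap1, hap2 in phased_individuals:
        counts["-".join(hap1)] += 1
        counts["-".join(hap2)] += 1
    n_chromosomes = 2 * len(phased_individuals)
    return {hap: n / n_chromosomes for hap, n in counts.items()}


def common_haplotypes(freqs, min_freq=0.005):
    """Haplotypes at or above min_freq (0.5% by default), most frequent first."""
    kept = [(hap, f) for hap, f in freqs.items() if f >= min_freq]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Toy phased data (hypothetical): two individuals, four chromosomes.
toy = [
    (("A30", "B13", "C06", "DR07", "DQ02"), ("A02", "B46", "C01", "DR09", "DQ03")),
    (("A30", "B13", "C06", "DR07", "DQ02"), ("A11", "B40", "C07", "DR15", "DQ06")),
]
freqs = haplotype_frequencies(toy)
print(freqs["A30-B13-C06-DR07-DQ02"])  # 0.5
```

Running the same count separately on northern, central and southern samples would give the regional curves shown alongside the overall frequency.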
Supplementary Figure 11 The Pearson r2 value between imputed and standard alleles of the five classical HLA genes at different allele frequencies.
The mean r2 value is 0.97 for common alleles, 0.93 for low-frequency alleles and 0.81 for rare alleles.
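The legend reports mean r2 per frequency class but not the class boundaries. A minimal sketch, assuming the conventional cut-offs (common ≥5%, low-frequency 1-5%, rare <1%) and that each allele's imputed dosages are compared with its typed allele counts (0, 1 or 2) across individuals; all names and toy values are hypothetical.

```python
import math


def pearson_r2(x, y):
    """Squared Pearson correlation between imputed dosages x and typed counts y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan  # undefined when either vector is constant
    return sxy * sxy / (sxx * syy)


def frequency_class(freq, common=0.05, low=0.01):
    """Assumed cut-offs; the legend does not state them."""
    if freq >= common:
        return "common"
    if freq >= low:
        return "low-frequency"
    return "rare"


def mean_r2_by_class(alleles):
    """alleles: iterable of (allele_frequency, imputed_dosages, typed_counts)."""
    by_class = {}
    for freq, imputed, typed in alleles:
        r2 = pearson_r2(imputed, typed)
        if not math.isnan(r2):  # skip alleles whose r2 is undefined
            by_class.setdefault(frequency_class(freq), []).append(r2)
    return {cls: sum(v) / len(v) for cls, v in by_class.items()}


# Toy input (hypothetical values, not study data).
toy = [
    (0.10, [0.0, 1.0, 2.0], [0, 1, 2]),
    (0.10, [1.0, 2.0, 3.0], [1, 3, 2]),
    (0.002, [0.0, 0.1, 0.9], [0, 0, 1]),
]
print(mean_r2_by_class(toy))
```

Averaging per-allele r2 within a class, rather than pooling all dosages, keeps a few very common alleles from dominating the summary.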
The first six rows show H3K4me3 (green) data for CD4+ memory primary cells, CD4+ naive primary cells, CD8+ memory primary cells, CD8+ naive primary cells, Treg primary cells and the GM12878 cell line (B lymphocyte, lymphoblastoid; International HapMap Project CEPH/Utah, European Caucasian, Epstein–Barr virus). The next five rows show H3K27ac (blue) data for CD4+ memory primary cells, CD4+ naive primary cells, CD8+ memory primary cells, CD8+ naive primary cells and the GM12878 cell line. The following six rows show H3K36me3 (green) data for CD4+ memory primary cells, CD4+ naive primary cells, CD8+ memory primary cells, CD8+ naive primary cells, Treg primary cells and the GM12878 cell line. The chromatin states displayed are for CD4+ memory primary cells, CD4+ naive primary cells, CD8+ memory primary cells, CD8+ naive primary cells and the GM12878 cell line. The detailed color scheme for the chromatin states accompanies the figure. Briefly, red corresponds to active transcriptional start sites (TSSs), yellow to enhancers, green to transcription and white to quiescent regions. DNase I hypersensitivity sites are shown for CD4+ primary cells, CD8+ primary cells, CD14+ monocytes, Treg, TH1 and TH2 cells, and the GM12878 cell line. All data are publicly available from ENCODE and the NIH Roadmap Epigenomics project. Raw data were plotted using the WashU Epigenome Browser (http://epigenomegateway.wustl.edu/browser/).
Key amino acid positions identified in psoriasis association analysis are highlighted.
(a) The relationship between MAFs and functional prediction scores from SIFT, PolyPhen-2, LRT and MutationTaster. Each prediction score showed a significant negative correlation with MAF in the studied samples. (b) The rare variant excess in most classes of functional sequence, which varies systematically between classes (for example, transcription factor motif variants have a higher rare variant excess than splicing variants). Interestingly, the least conserved nonsynonymous variants show rare variant loads similar to those of UTR and synonymous variants, suggesting that these alternative transcripts are under very weak selective constraint.
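Panel (a) reports a significant negative correlation between MAF and each prediction score but does not name the statistic. As an assumption, a rank-based (Spearman) correlation suits bounded, skewed scores such as these; the standard-library sketch below computes Spearman's rho with average ranks for ties and omits the significance test. Names and toy values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan  # undefined for a constant input
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Toy values (hypothetical): the score falls as MAF rises.
maf = [0.001, 0.004, 0.02, 0.15, 0.40]
score = [0.98, 0.91, 0.60, 0.35, 0.05]
print(spearman_rho(maf, score))  # -1.0
```

The sign of a real correlation depends on each tool's score orientation (for SIFT, lower scores indicate more damaging changes), so each predictor should be checked separately.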
Supplementary Figures 1–15, Supplementary Tables 2, 4, 6, 7 and 9 and Supplementary Note. (PDF 2188 kb)
HLA type frequency. (XLSX 31 kb)
Selected tagging SNPs. (XLSX 33 kb)
MHC haplotype frequency. (XLSX 214 kb)
HLA types from sequencing data and the 1000 Genomes Project. (XLSX 13 kb)
About this article
Cite this article
Zhou, F., Cao, H., Zuo, X. et al. Deep sequencing of the MHC region in the Chinese population contributes to studies of complex disease. Nat Genet 48, 740–746 (2016). https://doi.org/10.1038/ng.3576