Genetic studies have revealed that autoimmune susceptibility variants are over-represented in memory CD4+ T cell regulatory elements [1–3]. Understanding how genetic variation affects gene expression in different T cell physiological states is essential for deciphering genetic mechanisms of autoimmunity [4,5]. Here, we characterized the dynamics of genetic regulatory effects at eight time points during memory CD4+ T cell activation with high-depth RNA-seq in healthy individuals. We discovered widespread, dynamic allele-specific expression across the genome, where the balance of alleles changes over time. These genes were enriched fourfold within autoimmune loci. We found pervasive dynamic regulatory effects within six HLA genes. HLA-DQB1 alleles had one of three distinct transcriptional regulatory programs. Using CRISPR–Cas9 genomic editing, we demonstrated that a promoter variant is causal for T cell–specific control of HLA-DQB1 expression. Our study shows that genetic variation in cis-regulatory elements affects gene expression in a manner dependent on lymphocyte activation status, contributing to the interindividual complexity of immune responses.
Code for key analyses in this study is publicly available on GitHub (https://github.com/immunogenomics/dynamicASE) or upon request to the authors.
Trynka, G. et al. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat. Genet. 45, 124–130 (2013).
Onengut-Gumuscu, S. et al. Fine mapping of type 1 diabetes susceptibility loci and evidence for colocalization of causal variants with lymphoid gene enhancers. Nat. Genet. 47, 381–386 (2015).
Farh, K. K.-H. et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343 (2015).
Simeonov, D. R. et al. Discovery of stimulation-responsive immune enhancers with CRISPR activation. Nature 549, 111–115 (2017).
Gutierrez-Arcelus, M., Rich, S. S. & Raychaudhuri, S. Autoimmune diseases—connecting risk alleles with molecular traits of the immune system. Nat. Rev. Genet. 17, 160–174 (2016).
Raj, T. et al. Polarization of the effects of autoimmune and neurodegenerative risk alleles in leukocytes. Science 344, 519–523 (2014).
Dimas, A. S. et al. Common regulatory variation impacts gene expression in a cell type-dependent manner. Science 325, 1246–1250 (2009).
Gutierrez-Arcelus, M. et al. Passive and active DNA methylation and the interplay with genetic variation in gene regulation. eLife 2, e00523 (2013).
Chen, L. et al. Genetic drivers of epigenetic and transcriptional variation in human immune cells. Cell 167, 1398–1414.e24 (2016).
Ishigaki, K. et al. Polygenic burdens on cell-specific pathways underlie the risk of rheumatoid arthritis. Nat. Genet. 49, 1120–1125 (2017).
Ye, C. J. et al. Intersection of population variation and autoimmunity genetics in human T cell activation. Science 345, 1254665 (2014).
Hu, X. et al. Regulation of gene expression in autoimmune disease loci and the genetic basis of proliferation in CD4+ effector memory T cells. PLoS Genet. 10, e1004404 (2014).
Buil, A. et al. Gene–gene and gene–environment interactions detected by transcriptome sequence analysis in twins. Nat. Genet. 47, 88–91 (2015).
Moyerbrailean, G. A. et al. High-throughput allele-specific expression across 250 environmental conditions. Genome Res. 26, 1627–1638 (2016).
van de Geijn, B., McVicker, G., Gilad, Y. & Pritchard, J. K. WASP: allele-specific software for robust molecular quantitative trait locus discovery. Nat. Methods 12, 1061–1063 (2015).
Knowles, D. A. et al. Allele-specific expression reveals interactions between genetic variation and environment. Nat. Methods 14, 699–702 (2017).
Hu, X. et al. Additive and interaction effects at three amino acid positions in HLA-DQ and HLA-DR molecules drive type 1 diabetes risk. Nat. Genet. 47, 898–905 (2015).
Sollid, L. M. et al. Evidence for a primary association of celiac disease to a particular HLA-DQ alpha/beta heterodimer. J. Exp. Med. 169, 345–350 (1989).
Burmester, G. R., Yu, D. T., Irani, A. M., Kunkel, H. G. & Winchester, R. J. Ia+ T cells in synovial fluid and tissues of patients with rheumatoid arthritis. Arthritis Rheumatol. 24, 1370–1376 (1981).
Yu, D. T. et al. Peripheral blood Ia-positive T cells. Increases in certain diseases and after immunization. J. Exp. Med. 151, 91–100 (1980).
Ko, H. S. Ia determinants on stimulated human T lymphocytes. Occurrence on mitogen- and antigen-activated T cells. J. Exp. Med. 150, 246–255 (1979).
Rao, D. A. et al. Pathologically expanded peripheral T helper cell subset drives B cells in rheumatoid arthritis. Nature 542, 110–114 (2017).
Fonseka, C. Y. et al. Mixed-effects association of single cells identifies an expanded effector CD4+ T cell subset in rheumatoid arthritis. Sci. Transl. Med. 10, eaaq0305 (2018).
Lanzavecchia, A., Roosnek, E., Gregory, T., Berman, P. & Abrignani, S. T cells can present antigens such as HIV gp120 targeted to their own surface molecules. Nature 334, 530–532 (1988).
LaSalle, J. M., Tolentino, P. J., Freeman, G. J., Nadler, L. M. & Hafler, D. A. Early signaling defects in human T cells anergized by T cell presentation of autoantigen. J. Exp. Med. 176, 177–186 (1992).
Brandes, M., Willimann, K. & Moser, B. Professional antigen-presentation function by human γδ T cells. Science 309, 264–268 (2005).
Guo, M. H. et al. Comprehensive population-based genome sequencing provides insight into hematopoietic regulatory mechanisms. Proc. Natl Acad. Sci. USA 114, E327–E336 (2017).
Bild, D. E. et al. Multi-Ethnic Study of Atherosclerosis: objectives and design. Am. J. Epidemiol. 156, 871–881 (2002).
Roadmap Epigenomics Consortium, et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Wong, D. et al. Genomic mapping of the MHC transactivator CIITA using an integrated ChIP–seq and genetical genomics approach. Genome Biol. 15, 494 (2014).
GTEx Consortium, et al. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
Nédélec, Y. et al. Genetic ancestry and natural selection drive population differences in immune responses to pathogens. Cell 167, 657–669.e21 (2016).
Aguiar, V. R. C., César, J., Delaneau, O., Dermitzakis, E. T. & Meyer, D. Expression estimation and eQTL mapping for HLA genes with a personalized pipeline. PLoS Genet. 15, e1008091 (2019).
Javierre, B. M. et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell 167, 1369–1384.e19 (2016).
Schofield, E. C. et al. CHiCP: a web-based tool for the integrative and interactive visualization of promoter capture Hi-C datasets. Bioinformatics 32, 2511–2513 (2016).
Chun, S. et al. Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types. Nat. Genet. 49, 600 (2017).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Castel, S. E. et al. Modified penetrance of coding variants by cis-regulatory variation contributes to disease risk. Nat. Genet. 50, 1327–1334 (2018).
Raychaudhuri, S. et al. Five amino acids in three HLA proteins explain most of the association between MHC and seropositive rheumatoid arthritis. Nat. Genet. 44, 291–296 (2012).
Raj, P. et al. Regulatory polymorphisms modulate the expression of HLA class II molecules and promote autoimmunity. eLife 5, e12089 (2016).
Cavalli, G. et al. MHC class II super-enhancer increases surface expression of HLA-DR and HLA-DQ and affects cytokine production in autoimmune vitiligo. Proc. Natl Acad. Sci. USA 113, 1363–1368 (2016).
Vandiedonck, C. et al. Pervasive haplotypic variation in the spliceo-transcriptome of the human major histocompatibility complex. Genome Res. 21, 1042–1054 (2011).
Pelikan, R. C. et al. Enhancer histone-QTLs are enriched on autoimmune risk haplotypes and influence gene expression within chromatin networks. Nat. Commun. 9, 2905 (2018).
Senju, S. et al. Allele-specific expression of the cytoplasmic exon of HLA-DQB1 gene. Immunogenetics 36, 319–325 (1992).
Baecher-Allan, C., Wolf, E. & Hafler, D. A. MHC class II expression identifies functionally distinct human regulatory T cells. J. Immunol. 176, 4622–4631 (2006).
Reinherz, E. L. et al. Ia determinants on human T-cell subsets defined by monoclonal antibody. Activation stimuli required for expression. J. Exp. Med. 150, 1472–1482 (1979).
Engleman, E. G., Benike, C. J. & Charron, D. J. Ia antigen on peripheral blood mononuclear leukocytes in man. II. Functional studies of HLA-DR-positive T cells activated in mixed lymphocyte reactions. J. Exp. Med. 152, 114s–126s (1980).
Jia, X. et al. Imputing amino acid polymorphisms in human leukocyte antigens. PLoS ONE 8, e64683 (2013).
GAP Registry (The Feinstein Institute for Medical Research, accessed 27 February 2019); https://www.feinsteininstitute.org/robert-s-boas-center-for-genomics-and-human-genetics/gap-registry/
Liao, Y., Smyth, G. K. & Shi, W. The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Res. 41, e108 (2013).
Frankish, A. et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 47, D766–D773 (2019).
R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2013).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739–1740 (2011).
Yu, G., Wang, L.-G., Han, Y. & He, Q.-Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16, 284–287 (2012).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
1000 Genomes Project Consortium, et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Delaneau, O., Marchini, J. & Zagury, J.-F. A linear complexity phasing method for thousands of genomes. Nat. Methods 9, 179–181 (2011).
Howie, B. N., Donnelly, P. & Marchini, J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 5, e1000529 (2009).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Castel, S. E., Levy-Moonshine, A., Mohammadi, P., Banks, E. & Lappalainen, T. Tools and best practices for data processing in allelic expression analysis. Genome Biol. 16, 195 (2015).
Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67, 1–48 (2015).
Storey, J. D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440–9445 (2003).
Robinson, J. et al. The IPD and IMGT/HLA database: allele variant databases. Nucleic Acids Res. 43, D423–D431 (2015).
Dilthey, A., Cox, C., Iqbal, Z., Nelson, M. R. & McVean, G. Improved genome inference in the MHC using a population reference graph. Nat. Genet. 47, 682–688 (2015).
Poplin, R. et al. Scaling accurate genetic variant discovery to tens of thousands of samples. Preprint at bioRxiv https://doi.org/10.1101/201178 (2017).
Corces, M. R. et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat. Methods 14, 959–962 (2017).
Buenrostro, J. D., Wu, B., Chang, H. Y. & Greenleaf, W. J. ATAC-seq: a method for assaying chromatin accessibility genome-wide. Curr. Protoc. Mol. Biol. 109, 21.29.1–21.29.9 (2015).
Schumann, K. et al. Generation of knock-in primary human T cells using Cas9 ribonucleoproteins. Proc. Natl Acad. Sci. USA 112, 10437–10442 (2015).
Richardson, C. D., Ray, G. J., DeWitt, M. A., Curie, G. L. & Corn, J. E. Enhancing homology-directed genome editing by catalytically active and inactive CRISPR–Cas9 using asymmetric donor DNA. Nat. Biotechnol. 34, 339–344 (2016).
Slowikowski, K., Hu, X. & Raychaudhuri, S. SNPsea: an algorithm to identify cell types, tissues and pathways affected by risk loci. Bioinformatics 30, 2496–2497 (2014).
Phipson, B. & Smyth, G. K. Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn. Stat. Appl. Genet. Mol. Biol. https://doi.org/10.2202/1544-6115.1585 (2010).
We are indebted to G. Klein RN for her outstanding management of the Genotype and Phenotype (GaP) registry at the Feinstein Institute, to the Raychaudhuri laboratory members for critical discussions and feedback and to H. Long and P. Cejas for support on primary T cell ATAC-seq experiments. This work was supported by the National Institutes of Health (grant nos. U19AI111224, U01GM092691, U01HG009379 and R01AR063759 to S.R., NHGRI T32 HG002295 to T.A.), the Swiss National Science Foundation (Early Postdoc Mobility Fellowship to M.G.-A.), the Broad Institute through the SPARC mechanism (S.R.), the Estonian Research Council (PUT1660 to T.E.) and the European Union Horizon 2020 (grant no. MP1GI18418R to T.E.). Whole-genome sequencing (WGS) for the Trans-Omics in Precision Medicine (TOPMed) program was supported by the National Heart, Lung and Blood Institute (NHLBI). WGS for ‘NHLBI TOPMed: Multi-Ethnic Study of Atherosclerosis (MESA)’ (accession no. phs001416.v1.p1) was performed at the Broad Institute of MIT and Harvard (grant no. 3U54HG003067-13S1). Centralized read mapping and genotype calling, along with variant quality metrics and filtering were provided by the TOPMed Informatics Research Center (grant no. 3R01HL-117626-02S1; contract no. HHSN268201800002I). Phenotype harmonization, data management, sample identity quality control and general study coordination were provided by the TOPMed Data Coordinating Center (grant no. 3R01HL-120393-02S1; contract no. HHSN268201800001I). We gratefully acknowledge the studies and participants who provided biological samples and data for TOPMed. MESA and the MESA SHARe project are conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with MESA investigators. Support for MESA is provided by contract nos. 
HHSN268201500003I, N01-HC-95159, N01-HC-95160, N01-HC-95161, N01-HC-95162, N01-HC-95163, N01-HC-95164, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, UL1-TR-000040, UL1-TR-001079 and UL1-TR-001420. The provision of genotyping data was supported in part by the National Center for Advancing Translational Sciences, CTSI grant no. UL1TR001881, and the National Institute of Diabetes and Digestive and Kidney Disease Diabetes Research Center (DRC) grant no. DK063491 to the Southern California Diabetes Endocrinology Research Center. The full authorship list for the NHLBI TOPMed consortium can be found in https://www.nhlbiwgs.org/topmed-banner-authorship.
The authors declare no competing interests.
For two individuals, we performed full time-course replicates (same CD4+ memory T cell isolation batch, but independent stimulation experiments and RNA-seq library preparations). For the dynamic ASE events called significant in replicate A at 5% FDR (as explained in the main text and Methods), we examined the corresponding P-values and betas in replicate B. Left plots show the distribution of P-values in replicate B, middle plots show the correlation of betas for time and right plots show the correlation of betas for time squared. a, Individual TB03072560. b, Individual TB03073798.
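The "betas for time" and "betas for time squared" compared between replicates can be illustrated with a minimal sketch. It assumes, purely for illustration, an ordinary least-squares fit of the reference fraction against time and time squared; the study's own model may differ (the reference list includes lme4, which suggests a mixed-effects approach). The function names, time values and counts below are hypothetical.

```python
def reference_fraction(ref_count, alt_count):
    """Fraction of allelic reads that carry the reference allele."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no allelic reads at this time point")
    return ref_count / total


def fit_quadratic(times, fractions):
    """Least-squares fit of fraction ~ b0 + b1*t + b2*t^2 (illustrative only)."""
    if len(times) != len(fractions) or len(set(times)) < 3:
        raise ValueError("need matched inputs with at least three distinct times")
    # Normal equations (X^T X) b = X^T y for design columns [1, t, t^2].
    rows = [[1.0, float(t), float(t) * float(t)] for t in times]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, fractions)) for i in range(3)]
    # Gaussian elimination with partial pivoting on the augmented 3x4 matrix.
    aug = [xtx[i] + [xty[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, 4):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(aug[i][j] * beta[j] for j in range(i + 1, 3))
        beta[i] = (aug[i][3] - tail) / aug[i][i]
    return beta  # [intercept, beta_time, beta_time_squared]


# Hypothetical (reference, alternative) counts at eight made-up time values.
times = [0, 1, 2, 3, 4, 5, 6, 7]
counts = [(50, 50), (55, 45), (61, 39), (66, 34),
          (70, 30), (72, 28), (73, 27), (72, 28)]
fractions = [reference_fraction(r, a) for r, a in counts]
intercept, beta_time, beta_time_sq = fit_quadratic(times, fractions)
print(beta_time, beta_time_sq)
```

Fitting each replicate separately and comparing the two pairs of coefficients is one simple way to picture the middle and right panels described above.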
Examples of a dynamic ASE event significant in individual TB03072560 (a) and TB03073798 (b). Shown are allelic counts for the heterozygous SNP (left) and reference fraction over time (right) for replicate A (top panels) and replicate B (bottom panels).
We assessed whether dynamic ASE replicates across different individuals heterozygous for the same SNP. From the 561 dynamic ASE (dynASE) events at 5% FDR, we took the top 356 unique SNPs (ensuring one heterozygous individual per SNP) and examined the P-values for those 356 SNPs in the other heterozygous individuals. a, Q–Q plot of the observed P-values in the other heterozygous individuals (y axis) against the expected uniform distribution of P-values (x axis). b, Within all 561 significant events at 5% FDR, correlation of betas for time (left) and time squared (right) for all pairwise combinations of heterozygous individuals for the same SNP (het1 and het2 in the x- and y-axis labels).
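The comparison of observed P-values with a uniform expectation can be sketched as follows. This is a minimal illustration, not the authors' plotting code: the -log10 scale and the i/(n+1) plotting positions are assumptions, and the function name and input values are hypothetical.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs, most significant first.

    Assumes every P-value is strictly greater than zero.
    """
    observed = sorted(p_values)
    n = len(observed)
    # Under the null, the i-th smallest of n uniform P-values is expected
    # to fall near i / (n + 1).
    expected = [(i + 1) / (n + 1) for i in range(n)]
    return [(-math.log10(e), -math.log10(o)) for e, o in zip(expected, observed)]


# Hypothetical P-values; points above the diagonal indicate an excess of
# small P-values relative to the uniform expectation.
for exp_val, obs_val in qq_points([0.5, 0.01, 0.2]):
    print(round(exp_val, 3), round(obs_val, 3))
```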
a–c, Shown are gene expression levels across 24 individuals (left), and allele counts (SNP and individual indicated) and reference fraction (P-value and FDR for dynASE indicated) for heterozygous SNPs in the corresponding gene.
Extended Data Fig. 5 Scheme depicting HLA allelic expression quantification with HLA-personalized genome.
To robustly quantify allele-specific expression in the highly polymorphic HLA genes, we first create an HLA-personalized genome for each individual. We do this by inserting into the reference genome the cDNA sequence of each HLA allele as a separate sequence (12 in total, given that we sequenced or typed 6 HLA genes) and masking the exonic sequences corresponding to those cDNAs in chromosome 6 of the reference genome. Next, we map the RNA-seq reads to this HLA-personalized genome, remove PCR duplicates and count the reads that map uniquely to each HLA cDNA allele.
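The scheme above can be sketched on toy data. This is a simplified illustration under stated assumptions, not the authors' pipeline: sequences are plain strings, exon intervals are 0-based and half-open, duplicates are identified only by identical contig, start and strand, and all names, coordinates and the alignment tuple format are hypothetical.

```python
def build_personalized_genome(reference, exon_intervals, allele_cdnas):
    """Mask exonic intervals with N and append each allele cDNA as its own contig.

    reference: {contig_name: sequence}
    exon_intervals: {contig_name: [(start, end), ...]}, 0-based half-open
    allele_cdnas: {allele_name: cdna_sequence}
    """
    genome = {}
    for contig, seq in reference.items():
        bases = list(seq)
        for start, end in exon_intervals.get(contig, []):
            if not 0 <= start <= end <= len(seq):
                raise ValueError(f"interval {(start, end)} outside {contig}")
            bases[start:end] = "N" * (end - start)
        genome[contig] = "".join(bases)
    for name, cdna in allele_cdnas.items():
        if name in genome:
            raise ValueError(f"contig name collision: {name}")
        genome[name] = cdna
    return genome


def count_allele_reads(alignments, allele_names):
    """Count uniquely mapped, de-duplicated reads per allele contig.

    alignments: iterable of (contig, start, strand, is_unique) tuples.
    """
    counts = {name: 0 for name in allele_names}
    seen = set()
    for contig, start, strand, is_unique in alignments:
        if not is_unique or contig not in counts:
            continue
        key = (contig, start, strand)
        if key in seen:  # identical placement is treated as a PCR duplicate
            continue
        seen.add(key)
        counts[contig] += 1
    return counts


# Toy example with made-up sequences and two hypothetical allele contigs.
genome = build_personalized_genome(
    {"chr6": "ACGTACGTAC"},
    {"chr6": [(2, 5)]},
    {"alleleA": "GATTACA", "alleleB": "GATCACA"},
)
print(genome["chr6"])
```

Masking the original exons means reads from the HLA gene can only align to the allele-specific cDNA contigs, which is what allows per-allele counting.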
Allelic fraction over time for the 3 HLA class II genes (a) and 3 HLA class I genes (b), for the two pilot individuals with full time course replicates. Replicate A in black, replicate B in blue.
PCA performed on 48 HLA-DQB1 allelic expression profiles from 24 individuals (log2(FPKM+1) values over time). Allelic profiles are colored by 4-digit classical HLA-DQB1 allele (a) and by the k-means cluster to which they belong (b). Average allelic expression was computed for samples with replicates. The 12-hour time point was removed because of a high number of missing values. These plots show that 4-digit alleles group near each other (a) and that PCA also captures the three distinct cis-regulatory programs (Fluctuating, Constant-Low and Late-Spike) (b).
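Grouping allelic expression profiles with k-means can be sketched in a few lines. The example below is a minimal, dependency-free illustration with caller-supplied starting centroids; the profiles are invented values, and the initialization and distance choices are assumptions rather than the settings used in the study.

```python
import math


def log2_fpkm(profile):
    """Apply the log2(FPKM + 1) transform to one expression profile."""
    return [math.log2(value + 1.0) for value in profile]


def kmeans(points, initial_centroids, max_iterations=50):
    """Plain k-means with squared Euclidean distance and fixed starting centroids."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Invented FPKM profiles over three time points: two flat and low,
# two rising late, two fluctuating.
raw = [[1, 1, 1], [1.2, 0.9, 1.1],
       [1, 1, 30], [1.1, 1, 28],
       [10, 2, 12], [11, 2.5, 10]]
profiles = [log2_fpkm(p) for p in raw]
labels, _ = kmeans(profiles, [profiles[0], profiles[2], profiles[4]])
print(labels)
```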
a, r2 between Late-Spike haplotype dosage and SNPs within 1 Mb of HLA-DQB1 in the Estonian cohort. Orange vertical lines indicate the location of HLA-DQB1. Pink dots are intragenic SNPs in HLA-DQB1, HLA-DRB1 and HLA-DQA1. The right plot is zoomed in on the HLA-DQB1 region to show the top SNPs (reference genome hg19). b, HLA-DQB1 gene expression levels (log2(FPKM+1)) at 72 hours after stimulation for individuals separated by their rs71542466 genotype. c, Same as in a but in the European MESA cohort (reference genome GRCh38). d, r2 comparison between the Estonian and European MESA cohorts, for all SNPs in the region (left) or the subset of SNPs in the region that do not overlap the HLA-DQB1, HLA-DRB1 or HLA-DQA1 start-end genomic coordinates (right). The 6 intergenic SNPs with top r2 in Estonians are highlighted; 3 of them also have top r2 in the European MESA cohort. The identity line is marked. These results show that our top candidate SNP rs71542466 (and the other candidate SNPs) tracks well with the Late-Spike haplotype in both the Estonian cohort and the MESA cohort of individuals of European ancestry recruited in the United States.
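The r2 values in this figure relate haplotype dosage to SNP genotypes. A minimal sketch, assuming r2 is the squared Pearson correlation between per-individual dosage vectors coded 0, 1 or 2 (the coding and the example values are assumptions for illustration):

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length dosage vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r2 is undefined for a constant vector")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical dosages for five individuals: copies of a haplotype
# and copies of the alternative allele at one SNP.
haplotype_dosage = [0, 1, 2, 1, 0]
snp_dosage = [0, 1, 2, 1, 1]
print(r_squared(haplotype_dosage, snp_dosage))
```

A SNP whose dosage matches the haplotype dosage in every individual gives r2 = 1, which is why a high r2 in two independent cohorts supports a variant as a good tag for the haplotype.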
Extended Data Fig. 9 Genomic location of nearest gRNAs to tested causal SNPs and representative flow cytometry plot of CRISPR-Cas9 edited HH cells.
a, Location of SNPs (red) shown relative to the nearest exon (blue), both upstream and downstream of HLA-DQB1. The nearest gRNA sequences used for targeting the regions are highlighted in their corresponding colors (rs71542466, dark green; rs71542467, light purple; rs71542468, purple; rs72844401, beige/orange; rs4279477, blue; rs28451423, light green). Alignments were plotted using SnapGene (v3.2.1). b, Representative staining of HLA-DQ on CRISPR-Cas9 modified HH cells. Cells were modified with the proximal gRNA shown in a and labelled accordingly. Cells were stained as a bulk population with HLA-DQ antibodies 7–10 days after modification.
Extended Data Fig. 10 Sanger sequencing alignment of HH reference and base-edited clones reveals seamless editing.
Genomic DNA from expanded clones was sequenced, aligned to the reference (hg38) and visualized using SnapGene (v3.2.1). The red-colored nucleotide indicates the location of the rs71542466 SNP in the reference. Highlighted red nucleotides indicate mismatches from the reference, and yellow-colored nucleotides indicate unresolved or heterozygous sequences.
Supplementary Figs. 1–19, Note and unprocessed EMSAs from Supplementary Fig. 15
Supplementary Table 1. Dynamic allele-specific expression for SNPs genome wide with FDR < 0.05. Supplementary Table 2. Reported eQTLs for HLA-DQB1 and LD with the Late-Spike regulatory SNP. Supplementary Table 3. Primers, probes and oligonucleotide sequences.
Gutierrez-Arcelus, M., Baglaenko, Y., Arora, J. et al. Allele-specific expression changes dynamically during T cell activation in HLA and other autoimmune loci. Nat Genet 52, 247–253 (2020). https://doi.org/10.1038/s41588-020-0579-4