Explaining the genetics of many diseases is challenging because most associations localize to incompletely characterized regulatory regions. Using new computational methods, we show that transcription factors (TFs) occupy multiple loci associated with individual complex genetic disorders. Application to 213 phenotypes and 1,544 TF binding datasets identified 2,264 relationships between hundreds of TFs and 94 phenotypes, including androgen receptor in prostate cancer and GATA3 in breast cancer. Strikingly, nearly half of systemic lupus erythematosus risk loci are occupied by the Epstein–Barr virus EBNA2 protein and many coclustering human TFs, showing gene–environment interaction. Similar EBNA2-anchored associations exist in multiple sclerosis, rheumatoid arthritis, inflammatory bowel disease, type 1 diabetes, juvenile idiopathic arthritis and celiac disease. Instances of allele-dependent DNA binding and downstream effects on gene expression at plausibly causal variants support genetic mechanisms dependent on EBNA2. Our results nominate mechanisms that operate across risk loci within disease phenotypes, suggesting new models for disease origins.
We thank J. Lee, C. Schroeder, Y. Huang, X. Lu, Z. Patel, E. Zoller and The CCHMC DNA Sequencing and Genotyping Core for experimental support; C. Gunawan, K. Ernst and T. Hong for analytical support; B. Cobb for administrative support; R. Kopan, C. Karp, W. Miller, J. Whitsett, M. Fisher, A. Strauss, S. Hamlin, L. Muglia, H. Singh, J. Oksenberg, I. Chepelev, S. Waggoner, S. Thompson and H. Moncrieffe for constructive feedback and guidance; and Y. Yuan (University of Penn) and D. Thorley-Lawson (Tufts Institute) for generous donation of cell lines (Mutu and IB4, respectively). We also thank our colleagues who have made their data available to us, without which this project and its results would not have been possible. Funding sources: National Institutes of Health (NIH) R01 NS099068, NIH R21 HG008186, Lupus Research Alliance “Novel Approaches”, CCRF Endowed Scholar, CCHMC CpG Pilot study award and CCHMC Trustee Awards to M.T.W.; NIH R01 AI024717, NIH U01 HG008666, NIH U01 AI130830, NIH P30 AR070549, NIH R24 HL105333, NIH KL2 TR001426, NIH R01 AI031584, Kirkland Scholar Award and US Department of Veterans Affairs I01 BX001834 to J.B.H.; NIH R01 DK107502 to L.C.K.; NIH DP2 GM119134 to A.B.
J.B.H., M.T.W. and L.C.K. have a submitted patent application relating to these findings. A.B. is a cofounder of Datirium, LLC.
Supplementary Figures 1–11 and Supplementary Tables 1 and 2
List of all variants for each phenotype. Spreadsheet providing all genetic variants used in this study with associated information.
Sources of functional genomics datasets. Spreadsheet providing information and references for all functional genomics datasets used in this study.
Full RELI results. Spreadsheet providing all RELI results for (1) TF ChIP-seq datasets; (2) non-TF datasets (e.g., histone marks, DNase-seq); (3) Autoimmune ‘fine mapping’ variants; (4) Random ChIP-seq libraries (False Positive Rate estimation).
Locus plots of EBV+/– analysis for all seven EBNA2 disorders. Plots showing the full results of intersections for all TFs with available EBV+ and EBV– B cell ChIP-seq datasets.
Locus plots for additional phenotypes of interest. Full locus plot results for the diseases shown in Figure 1 and other phenotypes.
Full RELI results for EBNA2 cofactor analysis. Spreadsheet providing the RELI results and a summary table identifying potential EBNA2 cofactors occupying the seven EBNA2 disorder loci.
Additional information for allele-dependent EBNA2 autoimmune variants. Table providing additional information for the variants shown in Table 2.
Full MARIO allelic ChIP-seq analysis results. Spreadsheet providing information for all disease-associated genetic variants with allelic EBNA2 binding.
RNA-seq differential expression results. Spreadsheet providing the full results from the differential expression analysis between EBV+ and EBV– Ramos B cells.
Allelic RNA-seq results. Spreadsheet providing information for all genetic variants with allelic RNA-seq reads.
Full RELI cell type results broken down by data type and disease. Plots showing the significance of the intersection between the loci of each of the seven EBNA2 disorders and various markers of active regulatory regions across cell types (related to Fig. 4a,b).
Locus plots broken into EBV-infected B cell and T cell datasets for the seven EBNA2 disorders. Plots showing the presence and absence of ChIP-seq peaks in B and T cells at the loci of each of the EBNA2 disorders (related to Fig. 1).
Phenotypes examined in this study, with associated information. Spreadsheet providing information for all phenotypes examined in this study.
Cite this article
Harley, J.B., Chen, X., Pujato, M. et al. Transcription factors operate across disease loci, with EBNA2 implicated in autoimmunity. Nat Genet 50, 699–707 (2018). https://doi.org/10.1038/s41588-018-0102-3