Genome-wide association studies (GWAS) have identified thousands of noncoding loci that are associated with human diseases and complex traits, each of which could reveal insights into the mechanisms of disease1. Many of the underlying causal variants may affect enhancers2,3, but we lack accurate maps of enhancers and their target genes to interpret such variants. We recently developed the activity-by-contact (ABC) model to predict which enhancers regulate which genes and validated the model using CRISPR perturbations in several cell types4. Here we apply this ABC model to create enhancer–gene maps in 131 human cell types and tissues, and use these maps to interpret the functions of GWAS variants. Across 72 diseases and complex traits, ABC links 5,036 GWAS signals to 2,249 unique genes, including a class of 577 genes that appear to influence multiple phenotypes through variants in enhancers that act in different cell types. In inflammatory bowel disease (IBD), causal variants are enriched in predicted enhancers by more than 20-fold in particular cell types such as dendritic cells, and ABC achieves higher precision than other regulatory methods at connecting noncoding variants to target genes. These variant-to-function maps reveal an enhancer that contains an IBD risk variant and that regulates the expression of PPIF to alter the membrane potential of mitochondria in macrophages. Our study reveals principles of genome regulation, identifies genes that affect IBD and provides a resource and generalizable strategy to connect risk variants of common diseases to their molecular and cellular functions.
Data for the immune cell line ATAC-seq and H3K27ac ChIP–seq analyses can be found in the NCBI GEO under accession number GSE155555. gRNA counts from CRISPRi screens can be found in Supplementary Tables 3, 14. UK Biobank fine-mapping data for 71 traits are available from https://www.finucanelab.org/data. ABC predictions in 131 biosamples can be found at https://www.engreitzlab.org/abc/.
The ABC model is available on GitHub (https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction). This is the codebase used to generate the ABC predictions for this manuscript, and can be used to run the ABC model on new biosamples. ABC-Max and paper-specific analyses can be found on GitHub (https://github.com/EngreitzLab/ABC-GWAS-Paper). This repository implements the ABC-Max pipeline and can be used to reproduce specific analyses in this study.
Claussnitzer, M. et al. A brief history of human disease genetics. Nature 577, 179–189 (2020).
Farh, K. K.-H. et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343 (2015).
Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
Fulco, C. P. et al. Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).
Westra, H.-J. & Franke, L. From genome to function by studying eQTLs. Biochim. Biophys. Acta 1842, 1896–1902 (2014).
Gasperini, M., Tome, J. M. & Shendure, J. Towards a comprehensive catalogue of validated and target-linked human enhancers. Nat. Rev. Genet. 21, 292–310 (2020).
van Arensbergen, J., van Steensel, B. & Bussemaker, H. J. In search of the determinants of enhancer–promoter interaction specificity. Trends Cell Biol. 24, 695–702 (2014).
The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Huang, H. et al. Fine-mapping inflammatory bowel disease loci to single-variant resolution. Nature 547, 173–178 (2017).
Maller, J. B. et al. Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat. Genet. 44, 1294–1301 (2012).
Ulirsch, J. C. et al. Interrogation of human hematopoiesis at single-cell and single-variant resolution. Nat. Genet. 51, 683–693 (2019).
Fulco, C. P. et al. Systematic mapping of functional enhancer–promoter connections with CRISPR interference. Science 354, 769–773 (2016).
Rescigno, M. & Di Sabatino, A. Dendritic cells in intestinal homeostasis and disease. J. Clin. Invest. 119, 2441–2450 (2009).
Graham, D. B. & Xavier, R. J. Pathway paradigms revealed from the genetics of inflammatory bowel disease. Nature 578, 527–539 (2020).
Mountjoy, E. et al. Open Targets Genetics: an open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci. Preprint at https://doi.org/10.1101/2020.09.16.299271 (2020).
Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).
Stacey, D. et al. ProGeM: a framework for the prioritization of candidate causal genes at molecular quantitative trait loci. Nucleic Acids Res. 47, e3 (2019).
Chun, S. et al. Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types. Nat. Genet. 49, 600–605 (2017).
Carvalho-Silva, D. et al. Open Targets Platform: new developments and updates two years on. Nucleic Acids Res. 47, D1056–D1065 (2019).
Barbeira, A. N. et al. Integrating predicted transcriptome from multiple tissues improves association detection. PLoS Genet. 15, e1007889 (2019).
Hauberg, M. E. et al. Large-scale identification of common trait and disease variants affecting gene expression. Am. J. Hum. Genet. 100, 885–894 (2017).
Javierre, B. M. et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell 167, 1369–1384 (2016).
Pers, T. H. et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat. Commun. 6, 5890 (2015).
Cao, Q. et al. Reconstruction of enhancer–target networks in 935 samples of human primary cells, tissues and cell lines. Nat. Genet. 49, 1428–1436 (2017).
Liu, Y., Sarkar, A., Kheradpour, P., Ernst, J. & Kellis, M. Evidence of reduced recombination rate in human regulatory domains. Genome Biol. 18, 193 (2017).
Granja, J. M. et al. Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nat. Biotechnol. 37, 1458–1465 (2019).
Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).
Sheffield, N. C. et al. Patterns of regulatory activity across diverse human cell types predict tissue identity, transcription factor binding, and long-range interactions. Genome Res. 23, 777–788 (2013).
Thurman, R. E. et al. The accessible chromatin landscape of the human genome. Nature 489, 75–82 (2012).
Gao, T. & Qian, J. EnhancerAtlas 2.0: an updated resource with enhancer annotation in 586 tissue/cell types across nine species. Nucleic Acids Res. 48, D58–D64 (2020).
Whalen, S., Truty, R. M. & Pollard, K. S. Enhancer-promoter interactions are encoded by complex genomic signatures on looping chromatin. Nat. Genet. 48, 488–496 (2016).
GTEx Consortium. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
Engreitz, J. M. et al. Local regulation of gene expression by lncRNA promoters, transcription and splicing. Nature 539, 452–455 (2016).
Wainberg, M. et al. Opportunities and challenges for transcriptome-wide association studies. Nat. Genet. 51, 592–599 (2019).
Franke, A. et al. Genome-wide meta-analysis increases to 71 the number of confirmed Crohn’s disease susceptibility loci. Nat. Genet. 42, 1118–1125 (2010).
Jostins, L. et al. Host–microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature 491, 119–124 (2012).
Linares, P. M. & Gisbert, J. P. Role of growth factors in the development of lymphangiogenesis driven by inflammatory bowel disease: a review. Inflamm. Bowel Dis. 17, 1814–1821 (2011).
Wang, X. & Goldstein, D. B. Enhancer domains predict gene pathogenicity and inform gene discovery in complex disease. Am. J. Hum. Genet. 106, 215–233 (2020).
Imielinski, M. et al. Common variants at five new loci associated with early-onset inflammatory bowel disease. Nat. Genet. 41, 1335–1340 (2009).
Elrod, J. W. & Molkentin, J. D. Physiologic functions of cyclophilin D and the mitochondrial permeability transition pore. Circ. J. 77, 1111–1122 (2013).
Ip, W. K. E., Hoshi, N., Shouval, D. S., Snapper, S. & Medzhitov, R. Anti-inflammatory effect of IL-10 mediated by metabolic reprogramming of macrophages. Science 356, 513–519 (2017).
Bick, A. G. et al. Inherited causes of clonal haematopoiesis in 97,691 whole genomes. Nature 586, 763–768 (2020).
Buenrostro, J. D., Wu, B., Chang, H. Y. & Greenleaf, W. J. ATAC-seq: a method for assaying chromatin accessibility genome-wide. Curr. Protoc. Mol. Biol. 109, 21.29.1–21.29.9 (2015).
Zhu, J. et al. Genome-wide chromatin state transitions associated with developmental and environmental cues. Cell 152, 642–654 (2013).
Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Amemiya, H. M., Kundaje, A. & Boyle, A. P. The ENCODE blacklist: identification of problematic regions of the genome. Sci. Rep. 9, 9354 (2019).
Vierstra, J. et al. Global reference mapping of human transcription factor footprints. Nature 583, 729–736 (2020).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Liu, J. Z. et al. Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat. Genet. 47, 979–986 (2015).
de Lange, K. M. et al. Genome-wide association study implicates immune activation of multiple integrin genes in inflammatory bowel disease. Nat. Genet. 49, 256–261 (2017).
Loh, P. R., Kichaev, G., Gazal, S., Schoech, A. P. & Price, A. L. Mixed-model association for biobank-scale datasets. Nat. Genet. 50, 906–908 (2018).
Zhou, W. et al. Efficiently controlling for case–control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 50, 1335–1341 (2018).
Benner, C. et al. Prospects of fine-mapping trait-associated genomic regions by using summary statistics from genome-wide association studies. Am. J. Hum. Genet. 101, 539–551 (2017).
Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. B 82, 1273–1300 (2020).
McLaren, W. et al. The Ensembl variant effect predictor. Genome Biol. 17, 122 (2016).
Fujita, P. A. et al. The UCSC genome browser database: update 2011. Nucleic Acids Res. 39, D876–D882 (2011).
Carrillo-de-Santa-Pau, E. et al. Automatic identification of informative regions with epigenomic changes associated to hematopoiesis. Nucleic Acids Res. 45, 9244–9259 (2017).
Astle, W. J. et al. The allelic landscape of human blood cell trait variation and links to common complex disease. Cell 167, 1415–1429 (2016).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Chen, L. et al. Genetic drivers of epigenetic and transcriptional variation in human immune cells. Cell 167, 1398–1414 (2016).
Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739–1740 (2011).
Kerimov, N. et al. eQTL Catalogue: a compendium of uniformly processed human gene expression and splicing QTLs. Preprint at https://doi.org/10.1101/2020.01.29.924266 (2021).
de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLOS Comput. Biol. 11, e1004219 (2015).
Kerpedjiev, P. et al. HiGlass: web-based visual exploration and analysis of genome interaction maps. Genome Biol. 19, 125 (2018).
Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
Novakovic, B. et al. β-Glucan reverses the epigenetic state of LPS-induced immunological tolerance. Cell 167, 1354–1368 (2016).
Donnard, E. et al. Comparative analysis of immune cells reveals a conserved regulatory lexicon. Cell Syst. 6, 381–394 (2018).
This work was supported by the Broad Institute (E.S.L.); an NIH Pathway to Independence Award (K99HG009917 and R00HG009917 to J.M.E.); an NHGRI Genomic Innovator Award (R35HG011324 to J.M.E.); the Harvard Society of Fellows (J.M.E.); Gordon and Betty Moore and the BASE Research Initiative at the Lucile Packard Children’s Hospital at Stanford University (J.M.E.); NHGRI P50HG006193 (N.H.); NIDDK P30DK043351 (R.J.X.); NIH U01CA200059 (H.P.); NHGRI U01HG009379, NIMH R01MH101244 and NIMH R37MH107649 (A.K.P.); NIDDK K01DK114379 (H.H.); the Zhengxu and Ying He Foundation, the Stanley Center for Psychiatric Research and NIAID K22AI153648 (J.P.R.); NHGRI U24HG009446 (A.K.); an NSF Graduate Research Fellowship (DGE-1656518 to B.R.D.); and a Siebel Scholarship (F.L.). We thank L. Schweitzer, M. Gentili, M. Biton, C. Smillie, A. Regev, M. Kanai, D. Graham, N. Shoresh, S. Gazal, B. Cleary, R. Cui, P. Rogers, V. Subramanian, G. Schnitzler, R. Gupta, M. Claussnitzer, N. Sinnott-Armstrong, T. Majarian, A. Manning and members of the Lander lab, Hacohen lab and Variant-to-Function Initiative for discussions or technical assistance. This research has been conducted using the UK Biobank Resource.
J.M.E., C.P.F. and E.S.L. are inventors on a patent application on CRISPR methods filed by the Broad Institute related to this work (16/337,846). Until recently, E.S.L. served on the Board of Directors for Codiak BioSciences and Neon Therapeutics; served on the Scientific Advisory Board of F-Prime Capital Partners and Third Rock Ventures; was affiliated with several non-profit organizations including serving on the Board of Directors of the Innocence Project, Count Me In and Biden Cancer Initiative, and the Board of Trustees for the Parker Institute for Cancer Immunotherapy; and served on various federal advisory committees. E.S.L. is currently on leave from MIT and Harvard. C.P.F. is now an employee of Bristol Myers Squibb. T.A.P. is now an employee of Boston Consulting Group. R.J.X. is a cofounder of Jnana Therapeutics and Celsius Therapeutics. M.J.D. is a founder of Maze Therapeutics. N.H. holds equity in BioNTech and consults for Related Therapeutics. All other authors declare no competing interests.
Peer review information Nature thanks Annique Claringbould, Judith Zaugg and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
a, Overview of approach. b, ABC predictions connect two IBD GWAS signals to IL10. Signal tracks show DNase-seq or ATAC-seq (based on availability of data). Red arrows represent ABC predictions connecting variants to IL10. Dashed line shows the TSS. Grey bars highlight fine-mapped variants that overlap with ABC enhancers in at least one cell type. Credible set 1 contains two variants, both of which overlap with enhancers predicted to regulate IL10 in various cell types. Credible set 2 contains four variants, one of which overlaps with an enhancer predicted to regulate IL10 in monocytes stimulated with LPS.
a, Cumulative fraction of the number of ABC enhancers within each biosample (median = 17,605). b, Cumulative fraction of the number of enhancer–gene connections within each biosample (median = 48,441). c, Cumulative fractions of the number of enhancers predicted to regulate each gene across all biosamples (black line; median = 2, mean = 2.8) and the mean number of enhancers predicted to regulate each gene within each biosample (red line; median = 2.8). d, Cumulative fractions of the number of genes regulated by each ABC enhancer across all genes and all biosamples (black line; median = 1, mean = 2.7) and the mean number of genes regulated by each ABC enhancer within each biosample (red line; median = 2.7). e, Cumulative fractions of the genomic distances between the enhancer and the gene for each predicted enhancer–gene connection across all genes and all biosamples (black line; median = 62,929 bp) and the median genomic distance between each enhancer–gene connection within each biosample (red line; median = 62,782 bp). f, Number of ABC enhancers predicted in 131 biosamples stratified by whether the epigenomic data for the biosample is derived from one or multiple donors. We do not observe significant differences between these distributions (two-sided Wilcoxon rank-sum test, P = 0.10). Box plot displays median, 25th and 75th percentiles. g, Summary of ABC predictions in K562. Plot includes 122,410 non-promoter DNase hypersensitive sites (DHS elements) in K562. Each element is classified as an ‘ABC enhancer’ if the element is predicted to regulate at least one gene, or ‘other accessible region’ otherwise. The x axis represents the distance from the element to the closest TSS of an expressed gene. The y axis represents the percentile bin of the activity of the element (in terms of DHS and H3K27ac signals) among these 122,410 elements. 
The colouring of the heat map represents the fraction of elements in the corresponding distance and activity bins that are ABC enhancers.
a, Distinctness of predictions across biosamples. Biosample versus biosample (131 × 131) heat map. The colour of the (i,j) pixel in the heat map represents the fraction of enhancer–gene connections (‘E-G connections’—which are defined to be an element–gene pair for which the ABC score is greater than 0.015) in biosample i that have a corresponding overlapping prediction in biosample j. Two connections are considered overlapping if the predicted genes are the same and the enhancer elements overlap. Rows and columns are ordered by hierarchical clustering. A median of 19% (median of row medians) of enhancer–gene connections are shared across distinct biosamples. b, Distribution of shared connections by relatedness of samples. Distribution of the fraction of shared connections in a stratified by the relatedness of the samples. Each pair of biosamples is classified as: ‘same cell line’, which indicates the same cell line under different perturbation conditions or from different compendia; ‘same primary tissue type’, which indicates the same tissue type from different compendia; ‘same lineage’, which indicates samples from the same lineage classification as in a; ‘other’ refers to all other pairs of samples. c, Quantitative reproducibility of ABC predictions. ABC scores computed using independent biological replicates of epigenomic data (ATAC-seq and H3K27ac ChIP–seq) from the BJAB cell line. Each data point is an element–gene pair. d, Fraction of shared enhancer–gene connections between replicates increases as ABC score cut-off increases. x axis, cut-off on the ABC score; y axis, for a given cut-off of the ABC score, the fraction of element–gene pairs with an ABC score greater than the cut-off in sample 1 that have an ABC score > 0.015 in sample 2. 
Each biosample is classified as: ‘multiple donors’, which indicates that the epigenetic data for this biosample is derived from different donors, or ‘single donor’, which indicates that the epigenetic data for this biosample is derived from the same donor or cell line. For ‘single donor’ biosamples, replicates represent independent epigenomic experiments from the same donor or cell line; for ‘multiple donor’ biosamples, replicates represent epigenomic experiments from different donors. Separate curves are computed for each biosample and then the average across biosamples is plotted. e, Fraction of shared enhancer–gene connections increases as reproducibility of underlying epigenetic data increases. Each data point represents a biosample. x axis, geometric mean of correlation of ATAC-seq (or DNase-seq) and H3K27ac ChIP–seq signal in candidate regions computed using replicate epigenetic experiments. y axis, fraction of enhancer–gene connections with ABC score > 0.015 in replicate 1 that also have ABC score > 0.015 in replicate 2. Colours as in d.
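The shared-connection fraction used in panels a, d and e can be illustrated with a minimal Python sketch. It assumes that each connection is stored as an enhancer interval plus a target gene and an ABC score; the `Connection` record, the half-open coordinate convention and the toy values are hypothetical and are not taken from the ABC codebase. Only the 0.015 cut-off and the overlap rule (same gene, overlapping elements) come from the legend.

```python
from typing import NamedTuple

ABC_THRESHOLD = 0.015  # cut-off stated in the legend


class Connection(NamedTuple):
    """Hypothetical record for one element-gene pair (not the ABC file format)."""
    chrom: str
    start: int  # assumed 0-based, half-open interval
    end: int
    gene: str
    abc_score: float


def elements_overlap(a: Connection, b: Connection) -> bool:
    """True if the two enhancer intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def fraction_shared(sample_i, sample_j, threshold=ABC_THRESHOLD):
    """Fraction of connections called in sample_i (score > threshold) that have
    a called connection in sample_j with the same gene and an overlapping element."""
    called_i = [c for c in sample_i if c.abc_score > threshold]
    called_j = [c for c in sample_j if c.abc_score > threshold]
    if not called_i:
        return float("nan")
    shared = sum(
        any(c.gene == d.gene and elements_overlap(c, d) for d in called_j)
        for c in called_i
    )
    return shared / len(called_i)


# Toy data, invented for illustration only.
sample_i = [
    Connection("chr1", 100, 600, "GENE_A", 0.050),
    Connection("chr1", 1000, 1500, "GENE_A", 0.020),
    Connection("chr1", 5000, 5500, "GENE_B", 0.030),
    Connection("chr1", 9000, 9500, "GENE_B", 0.010),  # below the cut-off
]
sample_j = [
    Connection("chr1", 400, 900, "GENE_A", 0.040),    # overlaps, same gene
    Connection("chr1", 1200, 1700, "GENE_B", 0.050),  # overlaps, different gene
    Connection("chr1", 5100, 5400, "GENE_B", 0.012),  # below the cut-off
]

print(fraction_shared(sample_i, sample_j))  # one of three called connections is shared
print(fraction_shared(sample_j, sample_i))  # one of two called connections is shared
```

The measure is asymmetric, which is consistent with the heat map in a being indexed by an ordered pair (i, j). The quadratic scan is adequate for a toy example; a real comparison across 131 biosamples would probably need an interval index.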
Extended Data Fig. 4 ABC performs well at identifying regulatory enhancer–gene connections in CRISPR datasets.
a, Comparison of enhancer–gene predictors to experimental CRISPR data in K562 cells. Each of these predictors makes K562-specific predictions. Curves represent continuous predictors. Dots represent binary predictors as follows: E, each gene is predicted to be regulated only by the element closest to its TSS; G, each element is predicted to regulate only the nearest (to TSS) expressed gene; T, TargetFinder method31; L, elements and genes at opposite ends of HiCCUPS loops derived from Hi-C data are predicted as a connection67; D, an element–gene pair is a predicted positive if and only if the element and the gene are contained within the same contact domain67. Red dot on the ABC score curve, precision and recall achieved using a threshold on the ABC score of 0.015. Dashed black line, rate of experimental positives. b, Comparison of ABC predictions using a binary distance threshold to experimental CRISPR data in K562 cells. ‘Activity (<X kb)’ represents a model in which the score for an element–gene pair is the activity of the element (in terms of DHS and H3K27ac signals) multiplied by a binary indicator (1 if the distance is <X kb, and 0 otherwise). The ABC model using quantitative Hi-C outperforms the models based on binary thresholds, indicating that Hi-C data are a critical component of the ABC model. c, Comparison of ABC and other enhancer–gene predictors in the full CRISPR dataset. Comparison of enhancer–gene predictors to experimental CRISPR data in K562, GM12878, NCCIT, BJAB (with or without stimulation), Jurkat (with or without stimulation), THP1 (with or without stimulation) cells and primary hepatocytes. For ABC, we used the predictions in the cell type corresponding to the CRISPR experiments. Because ABC is the only method that makes predictions in all of these cell types, we used this plot to compare ABC to other methods that make predictions without cell-type information. 
We consider each enhancer–gene pair predicted by these methods to be a prediction in all cell types. d, Comparison of ABC and Ernst-Roadmap predictions25. Comparison of enhancer–gene predictors to experimental CRISPR data in K562, GM12878 and unstimulated Jurkat, BJAB and THP1 cells. The red line represents a comparison of ABC scores computed using epigenetic data from the same cell type in which the CRISPR experiment was performed. To compare Roadmap predictions to CRISPR data, we made cell-type substitutions because the Roadmap predictions did not include BJAB, Jurkat and THP1 cells: for BJAB CRISPR data we compared to predictions in the Roadmap B cell sample (E032); for THP1 data we used the Roadmap monocyte sample (E124); and for Jurkat data we used the Roadmap T cell sample (E034). To directly compare the performance of ABC and Ernst-Roadmap methods in matched cell types, we also calculated ABC performance using the same cell-type substitutions (green line); for example, CRISPR data in BJAB cells were compared to ABC scores computed using epigenetic data from the Roadmap B cell sample (E032). e, Comparison of ABC to PC-Hi-C. Comparison of enhancer–gene predictors to experimental CRISPR data in K562 and unstimulated BJAB, THP1 and Jurkat cells. The red line represents a comparison of ABC scores computed using epigenetic data from the same cell type in which the CRISPR experiment was performed. To compare PC-Hi-C CHiCAGO predictions (purple line) to CRISPR data, we made cell-type substitutions because PC-Hi-C data are not available for K562, BJAB, Jurkat and THP1 cells: for K562 CRISPR data we compared to CHiCAGO scores in erythroblasts; for BJAB CRISPR data we compared to total B cells; for THP1 data we compared to monocytes; and for Jurkat data we compared to activated CD4+ T cells. To directly compare the performance of ABC and PC-Hi-C methods in matched cell types, we also calculated ABC performance using the same cell-type substitutions (green lines). 
The solid green line represents ABC scores for which the contact component is derived from the average Hi-C dataset used throughout this study. The dashed green line represents ABC scores for which the contact component is derived from the raw counts in PC-Hi-C experiments (see Methods). f–h, Comparison of ABC to PC-Hi-C stratified by distance. These panels represent the comparison of the same predictors as in e while stratifying the experimental dataset in e based on the distance between the tested element and gene TSS. Of the 4,078 element–gene connections in the experimental dataset, 398 are at a distance of <50 kb (of which 94 are experimental positives, 24% positive rate), 1,102 are between 50 kb and 200 kb (20 positives, 2% positive rate) and 2,578 are at a distance of >200 kb (10 positives, 0.4% positive rate). Given the differences in positive rates between the stratifications (indicated by dashed black lines), it is appropriate to compare precision–recall curves within each stratification, but it is not appropriate to compare the precision–recall curve of the same predictor across stratifications.
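The positive rates quoted for panels f–h can be checked with a few lines of Python. The counts below are copied from the legend; the dictionary layout and the stratum labels are illustrative choices. The sketch also shows why curves should not be compared across strata: the expected precision of a predictor that guesses at random equals the positive rate, and that baseline differs by roughly 60-fold between the nearest and the farthest stratum.

```python
# (tested element-gene pairs, experimental positives) per distance stratum,
# as quoted in the legend.
strata = {
    "under 50 kb": (398, 94),
    "50-200 kb": (1102, 20),
    "over 200 kb": (2578, 10),
}

total_pairs = sum(n for n, _ in strata.values())
total_positives = sum(p for _, p in strata.values())

# Positive rate = expected precision of a random predictor within the stratum.
positive_rate = {name: p / n for name, (n, p) in strata.items()}

for name, rate in positive_rate.items():
    print(f"{name}: {rate:.1%} positive")

print(f"all pairs: {total_pairs}, overall positive rate {total_positives / total_pairs:.1%}")

# Ratio of baselines between the closest and the most distant stratum.
baseline_ratio = positive_rate["under 50 kb"] / positive_rate["over 200 kb"]
print(f"baseline precision differs by about {baseline_ratio:.0f}-fold")
```

The three strata sum to the 4,078 connections stated in the legend, and the computed rates round to the quoted 24%, 2% and 0.4%.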
a, Number of credible sets analysed for 72 diseases and complex traits. Light grey shows the total number of fine-mapped credible sets. Dark grey shows the number of such credible sets with no coding or splice site variants, and at least one variant with PIP ≥ 10%. Red shows the number of credible sets for which ABC-Max makes a prediction (that is, a variant with PIP ≥ 10% overlaps an ABC enhancer in a biosample that shows global enrichment for that trait). See Supplementary Table 7 for trait descriptions and additional statistics. b, Enrichment of fine-mapped variants (PIP ≥ 10%) associated with four blood cell traits in ABC enhancers in the corresponding blood cell types or progenitors. Enrichment = (fraction of fine-mapped variants/fraction of all common variants) overlapping regions in each cell type. Numbers of biosamples in each category are shown in parentheses. c, Enrichment of fine-mapped IBD variants (PIP ≥ 10%) in ABC enhancers and other sets of previously defined enhancers. Cumulative density function shows distribution across cell types. d, Enrichment of fine-mapped variants (PIP ≥ 10%) in ABC enhancers resized in different ways. Regions of at least 500-bp (blue line) are used to count reads, as defined previously. Regions were then shrunk by 150-bp on each side (minimum size of element = 200 bp) for overlapping with variants. Grey lines show alternative sizes, which do not appear to notably affect enrichments of fine-mapped variants. e, Percentage of noncoding variants across all traits that overlap an ABC enhancer in an enriched biosample, as a function of the number of cell types analysed. Biosamples (131) were grouped into 74 cell types or tissues; and analysed in random order. Black line, mean across 20 random orderings. Dashed grey lines, 95% confidence intervals. 
f, Fraction of variants or heritability for all 72 traits contained in different categories of genomic regions: coding sequences (CDS), untranslated regions (UTR), splice sites (within 10 bp of an intron–exon junction of a protein-coding gene), promoters (±250 bp from the gene TSS), ABC enhancers in 131 biosamples, other accessible regions not called as ABC enhancers, and other intronic or intergenic regions. In cases in which a variant overlaps more than one category, the variant was assigned to the first category that it overlapped (that is, variants in coding sequences were not also counted in the ABC category; Methods). Left, all common variants or heritability (h², as estimated by S-LDSC in inverse-variance-weighted meta-analysis across 72 traits). Right, fraction of variants above a threshold on the fine-mapping PIP.
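The enrichment statistic defined in panel b, (fraction of fine-mapped variants overlapping a set of regions) divided by (fraction of all common variants overlapping the same regions), can be sketched in Python as follows. The function names, the half-open interval convention and the toy coordinates are assumptions made for illustration; a real analysis would work per chromosome and would probably use an interval library rather than a linear scan.

```python
def count_in_regions(positions, regions):
    """Count variant positions that fall inside any (start, end) interval.
    Intervals are treated as half-open, which is an assumption of this sketch."""
    return sum(any(start <= pos < end for start, end in regions) for pos in positions)


def enrichment(fine_mapped, common, regions):
    """Ratio of the overlap fraction for fine-mapped variants to the overlap
    fraction for all common variants."""
    frac_fine_mapped = count_in_regions(fine_mapped, regions) / len(fine_mapped)
    frac_common = count_in_regions(common, regions) / len(common)
    if frac_common == 0:
        return float("nan")  # undefined when no common variant overlaps
    return frac_fine_mapped / frac_common


# Toy single-chromosome data, invented for illustration only.
enhancers = [(100, 600), (2000, 2500)]
fine_mapped_variants = [150, 2100, 7000, 9000]  # 2 of 4 overlap
common_variants = list(range(0, 10000, 100))    # 10 of 100 overlap

toy_enrichment = enrichment(fine_mapped_variants, common_variants, enhancers)
print(f"enrichment = {toy_enrichment:.1f}")
```

In this toy case half of the fine-mapped variants but only a tenth of the common variants overlap the regions, so the ratio is 5.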
a, ABC predictions for IBD credible sets linked to IL10. Heat map shows ABC scores for each gene within 1 Mb in selected primary immune cell types. Credible set 1 is linked by ABC to multiple genes, but IL10 (red) has the strongest ABC score in any cell type. b, Cumulative density plot showing enrichment for gene sets in MSigDB among the genes prioritized by each method63. Methods are coloured and categorized as in Fig. 1c. For each method, we first identified the top 5 most enriched significant gene sets in the predictions of that method (82 gene sets total). Then, we calculated the levels of enrichment of all 82 gene sets in the predictions of each method. c, Comparison of predictions for the 37 IBD credible sets near known genes. Fraction predictions shared = (credible sets for which both methods predict the same gene)/(credible sets for which both methods make a prediction). For example, 16 credible sets have predictions from both ABC-Max and ChromHMM-RNA correlation, and the two methods predict the same gene in 14 out of 16 credible sets. d, Enrichment of likely causal genes for 10 blood traits (defined by common coding variants) for various prediction methods. Enrichment reflects the number of correctly predicted genes identified divided by the baseline of choosing random genes in each of the loci with a prediction. e, Precision–recall plot for identifying known IBD-associated genes, comparing additional variations on the prediction methods (related to Fig. 1c). For ABC, we compared ABC-Max (assigning each credible set to the gene with the maximum ABC score, red circle), ABC-Max excluding all immune and gut tissue biosamples (orange circle) and ABC-All (assigning each credible set to all genes linked to enhancers, red triangle). For other methods that provided quantitative scores, we similarly compared choosing the gene with the best score per locus (circles) with choosing all genes above the global thresholds previously reported in each study (triangles). 
In most cases, the best gene per locus outperformed using a global threshold.
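The ABC-Max assignment rule from panel e and the 'fraction predictions shared' statistic from panel c can be written out as a short Python sketch. This is not the pipeline from the ABC-GWAS-Paper repository; the dictionary layout, the function names and the gene labels are hypothetical. The only values taken from the legend are the rule itself (pick the gene with the maximum ABC score per credible set) and the 14-of-16 example.

```python
import math


def abc_max(scores_by_credible_set):
    """Assign each credible set to the gene with the highest ABC score.
    Input (assumed layout): {credible_set: {gene: best ABC score}}.
    Credible sets with no linked gene receive no prediction."""
    return {
        cs: max(gene_scores, key=gene_scores.get)
        for cs, gene_scores in scores_by_credible_set.items()
        if gene_scores
    }


def fraction_predictions_shared(pred_a, pred_b):
    """(credible sets where both methods predict the same gene) divided by
    (credible sets where both methods make a prediction)."""
    both = pred_a.keys() & pred_b.keys()
    if not both:
        return math.nan
    same = sum(1 for cs in both if pred_a[cs] == pred_b[cs])
    return same / len(both)


# Toy scores, invented for illustration only.
toy_scores = {
    "cs1": {"GENE_A": 0.12, "GENE_B": 0.04},
    "cs2": {"GENE_C": 0.02, "GENE_D": 0.09},
    "cs3": {},  # no enhancer-gene link, so no prediction
}
abc_predictions = abc_max(toy_scores)
print(abc_predictions)

# Synthetic predictions arranged to mirror the 14-out-of-16 example in panel c.
method_a = {f"cs{i}": f"GENE_{i}" for i in range(18)}
method_b = {f"cs{i}": f"GENE_{i}" for i in range(16)}
method_b["cs0"] = "OTHER_X"
method_b["cs1"] = "OTHER_Y"
shared = fraction_predictions_shared(method_a, method_b)
print(shared)
```

Credible sets for which only one method makes a prediction are excluded from the denominator, as in the legend's definition.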
a, b, ABC-Max predictions and chromatin state in primary immune cells and fetal colon tissue at two IBD loci: LRRC32 (a) and RASL11A (b). Red marks variants, enhancer–gene connections and target genes identified by ABC-Max. Grey bars highlight the variants overlapping ABC enhancers. Vertical dotted lines represent TSSs. ‘DCs + LPS’, dendritic cells stimulated with bacterial LPS for 4 h.
a, A comparison of the number of biosample groups (cell type lineages) in which the gene promoter is active versus the number of categories in which a variant is predicted to regulate the gene by ABC-Max. b, Heat map of ABC scores for predicted IBD-associated genes in resting and stimulated mononuclear phagocytes (from epigenomic data in monocytes68 and dendritic cells69). IRF4 and PDGFB (bold) are two examples for which ABC predictions are specific to a particular stimulated state (+LPS) and are not observed in unstimulated states. c, Enrichment for top gene sets identified when performing enrichment analysis among the 23 IBD-associated genes predicted by ABC-Max in mononuclear phagocytes (dark grey), versus when performing the same analysis among the 43 IBD-associated genes predicted in any biosample (light grey). The enrichment for a given gene set is calculated as the ratio of the frequency at which ABC-predicted genes belong to the gene set, compared to the frequency at which all genes within 1 Mb of these loci belong to the gene set (Methods). d, A variant in an intron of ANKRD55 is predicted by the ABC model to regulate IL6ST in thymus. The grey bar highlights the variant overlapping the predicted ABC enhancer. Vertical dotted lines represent TSSs. The red arc at the top denotes the ABC-Max prediction. The red arc at the bottom denotes that CRISPRi of the highlighted enhancer significantly affects the expression of IL6ST only in Jurkat cells.
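The enrichment ratio described for panel c compares how often predicted genes belong to a gene set with how often all genes near the same loci belong to it. A minimal Python sketch under that reading follows; the function name and the toy gene labels are hypothetical, and the significance testing mentioned in Methods is not reproduced here.

```python
def gene_set_enrichment(predicted_genes, background_genes, gene_set):
    """Ratio of the membership frequency among predicted genes to the
    membership frequency among all background genes (for example, all genes
    within 1 Mb of the loci)."""
    predicted = set(predicted_genes)
    background = set(background_genes)
    members = set(gene_set)
    freq_predicted = len(predicted & members) / len(predicted)
    freq_background = len(background & members) / len(background)
    if freq_background == 0:
        return float("nan")  # undefined when no background gene is in the set
    return freq_predicted / freq_background


# Toy labels, invented for illustration only.
background = [f"G{i}" for i in range(20)]  # all genes near the loci
predicted = ["G0", "G1", "G5", "G6"]       # genes chosen by the predictor
pathway = ["G0", "G1", "G2", "G3"]         # members of one gene set

toy_ratio = gene_set_enrichment(predicted, background, pathway)
print(f"enrichment = {toy_ratio:.2f}")
```

Here 2 of 4 predicted genes and 4 of 20 background genes are in the set, giving a ratio of 2.5.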
a, ABC links IKZF1 to 2 traits by variants in 18 credible sets. Red boxes mark enhancers predicted to regulate IKZF1. The thick black line marks the IKZF1 TSS. Black dots mark fine-mapped noncoding variants (PIP ≥ 10%) associated with one or more traits linked to IKZF1 by ABC-Max. b, Genes linked to different traits via different variants have more complex enhancer landscapes. Cumulative distribution plots show the number of ABC enhancer–gene connections in all 131 biosamples (left) and the distance between the TSSs of the two closest neighbouring genes on either side of a gene, for each gene linked by ABC-Max to zero traits, one trait, or two or more traits through different variants (right). c, The complexity of the enhancer landscape of a gene is correlated with the odds of the gene being linked to multiple GWAS traits. The x axis shows the Wald odds ratio that a gene is connected to multiple GWAS traits, comparing genes in the top decile versus all other deciles of the corresponding enhancer complexity metric. The three enhancer complexity metrics are defined for each gene: the total number of enhancers linked to the gene by ABC in any biosample, the number of enhancers linked to a gene per biosample in which the promoter of the gene is active, and the genomic distance to the closest neighbouring TSS on either side of the gene. Dot, mean of the top decile genes (n = 1,838) versus all others (n = 16,550). Whiskers, 95% confidence intervals.
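For readers unfamiliar with the Wald odds ratio in panel c, the following sketch shows the standard calculation from a 2 × 2 table, with a confidence interval computed on the log scale. The function name and the counts are made up for illustration; the legend reports only the group sizes, not the per-cell counts, so none of these numbers come from the study.

```python
import math
from statistics import NormalDist


def wald_odds_ratio(a: int, b: int, c: int, d: int, level: float = 0.95):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: top-decile genes linked to multiple traits
    b: top-decile genes not linked to multiple traits
    c: other genes linked to multiple traits
    d: other genes not linked to multiple traits
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # standard error of log(OR)
    z = NormalDist().inv_cdf(0.5 + level / 2)       # about 1.96 for 95%
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )


# Hypothetical counts, NOT taken from the paper.
odds_ratio, ci_low, ci_high = wald_odds_ratio(30, 70, 10, 90)
print(f"OR = {odds_ratio:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

The interval is symmetric on the log scale, so the whiskers on a linear odds-ratio axis are asymmetric around the point estimate.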
a, ABC predictions for variants near PPIF. Black dots represent either fine-mapped variants (PIP ≥ 10%) for IBD and UK Biobank traits, or lead variants for any phenotype from the GWAS Catalog16 (the latter to show the approximate locations of signals for traits for which fine-mapping is not yet available). The ‘IBD’ label points to rs1250566. The ‘MS’ (multiple sclerosis) label points to rs1250568 (fine-mapped in ref. 2). Red boxes mark enhancers predicted to regulate PPIF. Thick black lines mark TSSs. Thin black lines mark selected variants. b, CRISPRi-FlowFISH data for PPIF in seven immune cell lines and stimulated states. Red boxes mark distal enhancers (CRISPR gRNAs lead to a significant decrease in the expression of PPIF). Dark grey box marks the gene body of PPIF, for which CRISPRi cannot accurately assess the effects of putative regulatory elements4. c, Chromatin accessibility in 5-kb regions around the PPIF enhancer (e-PPIF). Signal tracks show ATAC-seq (for THP1 and BJAB) or DNase-seq (for GM12878 and Jurkat) data in reads per million. Arrows show the locations of variants associated with multiple sclerosis and lymphocyte count (Lym, rs1250568) and with IBD (rs1250566), which overlap with enhancers that regulate PPIF in distinct sets of cell types. d, Effect of each tested gRNA on PPIF expression, as measured by CRISPRi-FlowFISH (Methods). Dots, gRNAs for which the effect estimate is >0% (black) or <0% (red). Red bars show regions for which gRNAs have a significant effect on gene expression (FDR < 0.05), compared by a two-sided t-test to negative control gRNAs. e, Effects of eight individual gRNAs on PPIF expression in THP1 cells, as measured by CRISPRi and qPCR (Methods). PPIF expression is normalized to expression of GAPDH and to cells expressing negative control, non-targeting gRNAs (Ctrl). Error bars, 95% confidence intervals of the mean (n = 6 replicates per gRNA).
f, Schema of pooled CRISPRi screen to examine the effects of PPIF and e-PPIF on mitochondrial membrane potential (Δψm). Cells expressing a pool of gRNAs were stained with MitoTracker Red and MitoTracker Green and sorted into three bins of increasing red:green ratios. gRNAs from cells in each bin were PCR-amplified, sequenced and counted. g, Effects of CRISPRi gRNAs (targeting e-PPIF, PPIF promoter or negative controls) on Δψm, quantified as the frequency of THP1 cells carrying those gRNAs with low or medium versus high MitoTracker Red signal (corresponding to bins 1, 2 and 3, respectively; superset of data in Fig. 4d). We tested THP1 cells in unstimulated conditions, stimulated with LPS, and differentiated with PMA and stimulated with LPS (Methods). Error bars, 95% confidence intervals for the mean of 40, 9, and 5 gRNAs for control, PPIF and e-PPIF, respectively. Two-sided Wilcoxon rank-sum test versus control; *P = 0.0163, **P = 0.00426, ***P = 0.000356. h, Ratios of MitoTracker Red (mitochondrial membrane potential) to MitoTracker Green (mitochondrial mass) signal in THP1 cells at baseline, stimulated with LPS and differentiated into macrophages with PMA and stimulated with LPS in biological duplicate (from left to right, n = 8,044, 99,683, 99,982, 99,968, 99,886 and 99,878; replicates were cultured, stimulated, stained and flow-sorted independently). Box represents median and interquartile range; whiskers show minimum and maximum. Stimulation with either LPS alone or both PMA and LPS leads to a reduction in red:green signal, indicating a reduction in mitochondrial membrane potential normalized to mitochondrial mass.
This file contains Supplementary Notes 1-5, Supplementary Figures 1-2, full legends for Supplementary Tables 1-14 and Supplementary References.
Epigenomic data collected in immune cell lines.
Metrics for ABC predictions in 131 biosamples.
This zipped file contains CRISPRi-FlowFISH data: data per guide.
CRISPRi-FlowFISH data: summary per candidate element.
Comparison of CRISPR data to enhancer-gene predictions.
Enrichment of GWAS variants in ABC enhancers across biosamples.
Summary of diseases and traits.
ABC predictions for IBD GWAS loci.
Causal gene predictions in IBD GWAS loci with known genes.
ABC-Max predictions for 72 diseases and complex traits.
References linking predicted IBD genes to effects on experimental colitis.
ABC and ABC-Max metrics for all genes.
PPIF and mitochondrial membrane potential: CRISPRi data per guide.
About this article
Cite this article
Nasser, J., Bergman, D.T., Fulco, C.P. et al. Genome-wide enhancer maps link risk variants to disease genes. Nature 593, 238–243 (2021). https://doi.org/10.1038/s41586-021-03446-x