Main

Genome-wide association studies (GWAS) have identified more than 200 k variants spanning over 3,000 human complex traits and diseases1. However, over 93% of lead single-nucleotide polymorphisms (SNPs) in GWAS loci lie in noncoding regions and are coinherited with many other variants in regions of linkage disequilibrium (LD). Thus, it is challenging to recognize the context and gene-regulatory circuits through which these disease-associated genetic variants act2,3,4,5, such as (1) the specific contexts including tissue/cell types where the variants function6,7,8,9, (2) the causal genetic variants that drive the alterations10,11,12,13,14 and (3) the impact of genetic variants on regulatory elements and target genes at varying genomic distances15,16,17,18,19.

Tissue-specific functional genomics data characterize molecular effects that may mediate the impact of genetic variants on human diseases/traits, enabling us to address the above challenges. The Genotype-Tissue Expression (GTEx) project20,21,22 and other efforts23,24 identified cis-acting genetic variants affecting gene expression and splicing, that is, expression of quantitative trait loci (eQTLs) and splicing QTL (sQTLs), in multiple human tissues. The enhancing GTEx (eGTEx) project16 has expanded the GTEx map with protein25, telomere-length26, DNA methylation27,28 and the epitranscriptomic mark m6A29. These resources connect genetic and phenotypic variations via rich molecular phenotypes. The rationale for eGTEx is twofold—it enables understanding the impact of genetic variation on molecular traits and the diverse selective forces acting on the human genome, and it elucidates the functions of GWAS variants, providing more stepping stones when they are also captured by eQTLs or indicating context of action when their impacts are condition-/cell-type-specific and missed by eQTL in steady-state postmortem tissues.

Notably, the current eGTEx resource lacks information on promoter/enhancer activity variation. Multiple studies have shown that tissue-specific enhancers are also strongly enriched for genetic variants associated with tissue-relevant complex traits and diseases3,9,18,19,30,31,32, and interindividual enhancer variation can help interpret disease-associated loci33,34,35,36,37,38. The epigenomic mark most directly associated with enhancer activity is H3K27ac, found primarily at both active enhancers and promoters.

Here we expand the eGTEx collection with H3K27ac chromatin immunoprecipitation followed by high-throughput sequencing (ChIP–seq) of 387 samples across brain, heart, muscle and lung from 256 GTEx participants and characterize promoter/enhancer variation associated with tissue/cell types, sex, genetic variants, transcription variation and clinical traits (Fig. 1a). We detect 282 k active regulatory elements (AREs) and classify them into 14 groups based on coactivity patterns across 240 reference epigenomes18,19. We also identify 2,436 sex-biased AREs and 130 k distinct cis-acting histone acetylation quantitative trait loci (haQTLs) targeting 5,397 distinct genetically influenced AREs (gAREs) across tissues. Leveraging these findings, we investigate the genetics of complex diseases and traits with 55 GWAS. We predict candidate tissues/cell types of action for diseases/traits and prioritize 614 distinct GWAS–haQTL-colocalized gAREs. Finally, we calculate ARE–gene linking scores based on genetic evidence (gLink scores) and prioritize 228 target genes for 161 GWAS–haQTL-colocalized gAREs across four tissues.

Fig. 1: Study overview and lower-dimensional projection of H3K27ac profiles.
figure 1

a, Study overview. Left, H3K27ac samples in this study; middle, regulation of AREs and target genes; right, functional implications in human disease/traits genetics. Organ sketch plots were created with BioRender.com. b, Lower-dimensional projection of 387 samples from this study and 240 samples from the reference epigenomes. Different colors represent different tissues or cell types, and different shapes represent different studies; boxed texts label the samples of brain, heart, muscle and lung from reference epigenomes; the top left inset shows the sample sizes of the H3K27ac experiments from various sample sources.

Results

H3K27ac ChIP–seq profiles in brain, heart, muscle and lung

We selected the following four tissues for epigenomic profiling: brain (prefrontal cortex), heart (left ventricle), muscle (skeletal) and lung. These tissues show strong relevance for multiple diseases, including psychiatric, cardiac, metabolic and respiratory traits, and also capture developmental diversity as they span all three germ layers of ectoderm (brain), mesoderm (heart and muscle) and endoderm (lung). We performed H3K27ac ChIP–seq, preprocessed the data with the ENCODE pipeline39, selected higher-quality samples and generated a peak set in each tissue using the highest-quality samples (Methods).

We reported a total of 387 genome-wide H3K27ac profiles, including 113 brain, 100 heart, 108 muscle and 66 lung samples selected for analysis after QC (Fig. 1a and Supplementary Table 1). They outnumbered the H3K27ac experiments previously undertaken by the Roadmap Epigenomics31 (n = 124), ENCODE18 (n = 98) and Genomics of Gene Regulation40 (n = 18) consortia (total: n = 240, combined in the EpiMap resource19; Fig. 1b, inset), providing an important resource to the field of epigenomics and gene regulation.

To understand the relationship between our sample profiles and those of other consortia, we projected all 627 H3K27ac profiles into a lower-dimensional embedding space41 (Fig. 1b and Methods). We found that our H3K27ac profiles from the same tissues are clustered closely to tissue-matched samples from Roadmap, ENCODE and GGR (Fig. 1b and Supplementary Note). We also confirmed that tissue-matched pairs of the H3K27ac profiles from this study and those from previous studies show high correlations (Extended Data Fig. 1 and Methods).

ARE activity across tissues, cell types and sex

We identified AREs from brain (190 k), heart (132 k), muscle (143 k) and lung (107 k) samples based on H3K27ac ChIP–seq peaks (Methods). We then aggregated overlapping AREs across tissues as a reference set of 282 k distinct AREs, of which 14% (39 k) are fully shared across all four tissues, 24% (68 k) are partially shared in two or three tissues and 62% are tissue-specific (brain, 86 k; heart, 31 k; muscle, 37 k and lung, 21 k; Fig. 2a and Extended Data Fig. 2a). Fully shared AREs were enriched for promoter annotations (54–80% in promoters versus 26–44% in background across tissues), while tissue-specific AREs were enriched for enhancers (79–93% enhancers versus 56–74% expected).

Fig. 2: ARE activity across tissues and cell types.
figure 2

a, ARE detection from brain, heart, muscle and lung in this study. Left, schematic diagram showing the AREs detected from each tissue and how they are merged as a reference set, with the number of AREs shown on the right; right, ARE-sharing and specificity across the four tissues, with promoter and enhancer AREs in different colors. b, 282 k AREs identified in this study forming 14 groups of 127 modules based on coactivity across 240 reference epigenomes. Top, 127 ARE modules and module size; middle, proportion of AREs annotated as promoter/enhancer-related chromatin states and different genomic features for each module; bottom, average ARE activity of each module (by column) across 240 reference epigenomes (by row). Representative sample names labeled for each sample cluster on the right; 14 ARE groups labeled at the bottom; red, green, orange and blue boxes showing blood/immune (G3), skeletal/cardiac muscle (G9), brain/neuron (G12) modules and lung-specific modules.

To distinguish brain/heart/muscle/lung-specific AREs, broadly active AREs or AREs that are primarily active in other tissues, we leveraged the diversity of tissues and cell types of H3K27ac profiles in Roadmap/ENCODE/GGR and clustered our 282 k AREs based on their H3K27ac activity patterns across 240 reference epigenomes, resulting in 127 modules, which we further clustered into 1,413 submodules and also aggregated into 14 groups (G1–G14; Fig. 2b, Extended Data Fig. 2b and Methods). We found that 19 modules (74 k AREs, 26%) were broadly or multitissue active. Only 25 of the 127 modules (67 k, 24%) were active primarily in our four tissues, while 82 modules of these AREs (112 k, 40%) were primarily active in other tissues from the reference samples, such as those in blood/immune (G3). Lastly, one module of AREs (G14, 29 k, 10%) did not show activity in the previous reference epigenomes and was newly detected here.

We next annotated these 127 modules from ARE groups based on enrichments of chromatin states, genomic annotations, transcription factors (TFs) binding sites and proximal gene function annotations (Fig. 2b and Extended Data Figs. 2c and 3a; Methods). These annotations are consistent with their primary tissue of activity. For example, broadly active AREs (G1) were enriched for transcription start site (TSS)-proximal regions, promoter/enhancer chromatin states and housekeeping genes (P < 2.2 × 10−16); brain/neuron AREs (G12) were enriched for brain enhancers/promoters, trans-synaptic signaling pathways and neuronal-function TFs binding sites, such as NEUROG2 (ref. 42). Blood/immune AREs (G3) were enriched for immune functions and immune-related TFs binding sites, including IRF5 and IRF8. Newly detected AREs (G14), showing weak activities in both datasets (Extended Data Fig. 2b), required more samples to detect than other AREs (Extended Data Fig. 3b). G14 were enriched for brain-specific AREs, intergenic regions, synaptic-related pathways and neuronal-function TFs binding sites such as NFATC1 (ref. 43), indicating the brain as the primary detection tissue.

Fig. 3: Genetic drivers of ARE activity and their tissue specificity.
figure 3

a, Overlaid Manhattan plot of lead haQTLs from brain, heart, muscle and lung; x axis shows the genomic position and y axis shows the -log10(empirical P value); empirical P value is haQTL P value adjusted for multiple SNPs for each loci as described in Methods; examples 1–5 in b are marked with numbers in colored circles. b, Examples for type I (1–2), type II (3) and type III (4–5), along with the information of ARE group, ARE, lead haQTL, gARE activity and SNP alleles for each sample across tissues and number of samples (n); boxes = 25th to 75th percentile (that is, IQR); line = median; whiskers = 1.5× IQR; haQTLs P values shown in b and c are all nominal P values based on linear regression (two-sided) implemented in FastQTL as described in Methods. c, Directionality consistency-based haQTL tissue-sharing analysis. Lower, between-tissue directionality consistency (y axis on the right) of the lead haQTL effect size (y axis on the left) increases as the P value significance of the haQTLs in the replication tissue (x axis) increases, separated by positive effect (right half-plane) and negative effect (left half-plane) of the matched haQTL–ARE pair in the discovery tissue; the examples 1–3 (shared ARE between tissues) from a are labeled. Upper, density of lead haQTLs in each of the nominal P value bins; the proportion and the corresponding consistency rates of each group are shown in parentheses in the figure legend. d, The estimated proportion of haQTL tissue sharing for each tissue pair. e, The number of types I, II and III gAREs in each tissue. Colors for different types of gAREs are shown in the legend below the panel. f, Enrichment of different types of gAREs in 14 ARE groups (defined in Fig. 2b). x axis, each ARE group; y axis, percentage of different types of AREs in each ARE group; asterisk shows the significance level of enrichment (two-sided Fisher’s exact test, only for odds ratio > 1) compared to all AREs in each tissue (exact P value shown in Supplementary Table 4); colors for different types of gAREs are shown in the legend above the panel. IQR, interquartile range.

We then identified 192 (brains), 1,211 (hearts), 1,214 (muscles) and 23 (lung) sex-biased AREs after controlling for unwanted variations, including tissue-archetype fractions (estimated by our deconvolution approach; Extended Data Fig. 4a,b) and phenotypic variables (Extended Data Fig. 5a, Supplementary Table 2 and Supplementary Note). These sex-biased AREs, enriched near previously identified sex-biased genes (Extended Data Fig. 5b), show consistent directionality with those sex-biased genes in the matched tissues from GTEx44 (one-sided proportion test, P = 2.8 × 10−10 for female-biased and 1.9 × 10−5 for male-biased AREs; heart as an example in Extended Data Fig. 5c), validating our results across the modalities of transcriptomes and epigenome. Gene Ontology (GO) annotation shows that genes near the sex-biased AREs were enriched for housekeeping processes such as endoplasmic reticulum-unfolded protein response and tissue-specific processes such as sensory perception of smell for the brain (Extended Data Fig. 5b), suggesting that sex can affect both types of programs across tissues.

Genetic drivers of ARE activity in each tissue

We next discover cis-acting haQTLs associated with variations of enhancer/promoter activity (genetically influenced AREs and gAREs) across individuals in each tissue. Following the GTEx eQTL calling pipeline, we removed top peer factors45 and other covariates and mapped haQTL using FastQTL46 within 100 kb of AREs (Methods and Extended Data Fig. 6a–d). We found a total of 83 k brain haQTLs (targeting 2,162 gAREs), 44 k heart haQTLs (1,311 gAREs), 62 k muscle haQTLs (1,816 gAREs) and 8 k lung haQTLs (537 gAREs; Fig. 3a and Supplementary Table 3).

To understand the tissue specificity of genetic effects on epigenomic regulation, we next studied the sharing of haQTLs and gAREs between tissues. We distinguished three types of gAREs, which are as follows: (1) haQTL-shared gARE, where both gARE and its haQTL are detected in another tissue (examples 1 and 2 in Fig. 3b); (2) haQTL-specific gARE, where the ARE is present in another tissue but the haQTL effect is tissue-specific (example 3 in Fig. 3b) and (3) ARE-specific gARE, where the ARE is tissue-specific and so is the haQTL (examples 4 and 5 in Fig. 3b).

We assessed pairwise haQTL tissue sharing for the shared gAREs by evaluating the directionality consistency between ‘discovery’ and ‘replication’ tissues (Methods; Fig. 3c and Extended Data Fig. 7a), a strategy that mitigates limitations due to QTL detection power29. We evaluated this consistency at varying levels of haQTL significance in the replication tissue, which are as follows: strongly replicated (P < 10−5; examples 1 and 2 in Fig. 3b), medium-replicated (10−5 < P < 10−3), weakly replicated (10−3 < P < 0.1) and no-effect (P > 0.1; example 3 in Fig. 3b). Strongly and medium-replicated haQTLs showed over 98% directionality consistency. Even weakly replicated haQTLs showed 82% consistency, suggesting an underestimation of haQTL effect sharing if applying haQTL discovery threshold for both tissues (Fig. 3c). Furthermore, weakly replicated haQTLs show higher effect size similarity with the discovery tissue than those from the no-effect bin (Extended Data Fig. 7b,c; Methods), confirming the tissue sharing of subthreshold haQTLs in the replication tissue.

We then calculated the proportion of haQTL tissue sharing for each pair of tissues based on directionality consistency and found 18% to 49% haQTL tissue sharing (Fig. 3d; Methods). The haQTL tissue sharing between muscle and heart is higher compared to other tissue pairs (Fig. 3d), consistent with shared germ layer origin. For individual gAREs, we defined haQTL-shared gAREs based on the nominal P-value threshold (P ≤ 0.02) in the replication tissues (Extended Data Fig. 7d and Supplementary Table 4; Methods), and then looked at the proportions of all three gARE types in each tissue. The haQTL-specific gARE is the most common type except in the brain, while ARE-specific gAREs comprise a much larger proportion in the brain than in other tissues (Fig. 3e), consistent with a larger proportion of tissue-specific AREs in the brain (Fig. 2a).

We then tested the enrichment of ARE groups for each of the three gARE types (Fig. 3f). The haQTL-shared and haQTL-specific gAREs were primarily enriched in broadly active AREs (G1; examples 1 and 3). haQTL-specific gAREs were particularly enriched in multitissue AREs (G2) for brain and muscle. ARE-specific gAREs were enriched in newly detected AREs (G14; except lung) and tissue-specific ARE groups, such as G12 for brain (example 5), G9 for heart and muscle, and G7 and G8 AREs for lungs (example 4). These results showed that haQTL-specific and ARE-specific gAREs provide tissue-specific genetic regulation with distinct mechanisms. Separately, we also revealed significantly positive associations of tissue specificity between haQTL and eQTL in all tissues except for lungs (Extended Data Fig. 7e and Supplementary Note).

ARE regulation reveals mechanisms for disease/trait genetics

Next, we leveraged ARE regulation associated with tissues/cell types and genetics to gain insight into disease genetics. We first studied the enrichment of common genetic variants associated with diverse traits from 55 GWAS (Supplementary Table 5) in 14 ARE groups with distinct primary tissues of activity using LD score regression analysis9. Cardio-metabolic traits (Fig. 4a, top group), including coronary artery disease and hypertension, were enriched in broadly active (G1) and multitissue (G2) AREs detected in multiple tissues in our study, and AREs from other groups mainly detected in our heart samples47. Allergy/immune-related traits (Fig. 4a, second group) were enriched exclusively in broadly active (G1), multitissue (G2) or blood/immune (G3) AREs, consistent with the immune-related functions for these ARE groups (Extended Data Fig. 2c). Most psychiatric-neurologic-associated traits (Fig. 4a, third group) were enriched in brain/neuron (G12) and newly detected (G14) AREs active in our brain samples, consistent with brain-related functions for these groups (Extended Data Fig. 2c). Schizophrenia (SCZ) and bipolar disorder (BD) also showed enrichment for blood/immune (G3) AREs, consistent with increasingly recognized immune roles in the diseases48,49. Lastly, lung-function-associated traits (Fig. 4a, bottom group) were enriched in broadly active (G1), multitissue (G2) and smooth muscle (G11) AREs only detected in our nonbrain samples, highlighting the importance of smooth muscle in respiratory traits50,51.

Fig. 4: ARE regulation reveals mechanisms for disease/trait genetics.
figure 4

a, Enrichment of GWAS signals for diseases and traits in 14 ARE groups, partitioned by the tissue presence for each ARE. Rows represent GWAS studies (full names in Supplementary Table 5) from four categories (right label); columns represent different ARE groups; different triangles in each square denote whether the ARE in a group from a tissue is substantially enriched for the corresponding trait. b, Overview of 54 GWAS–haQTL-colocalized gARE loci for schizophrenia. Top annotations include the ARE group that each gARE belongs to, whether it is tissue-specific colocalization (an example in d), whether the GWAS signal is missed by eQTL from top GWAS–haQTL colocalization tissue (an example in e), and whether there is a different causal SNP from the lead SNP suggested by haQTL (an example in b). c, An example of different GWAS causal SNPs suggested by colocalization. x axis represents genomic location; y axis for each panel from top to bottom, schizophrenia GWAS signal (−log10(P)) and haQTL signals (−log10(P)) for SNPs in brain; P values in Fig. 4 are all nominal P (two-sided) for either GWAS (chi-squared) or QTL (linear regression); the colors of data points represent the correlation (r2) with lead GWAS SNP and symbols denote different types of SNPs as shown in the legend underneath e; the inset indicates the alternative causal SNPs. d, An example of tissue-specific regulation based on colocalization. Panels similar as c; panels for both haQTL and eQTL signals in brain for tissue-specific colocalization are boxed in orange. e, An example of bulk-eQTL-missing GWAS–haQTL-colocalized gARE loci. Bottom, the strongest GWAS–eQTL colocalization detected for RALGAPB; other panels similar as c. f, The counts of bulk-eQTL-missing and -capturing GWAS–haQTL-colocalized gAREs for each tissue.

We then applied coloc, a Bayesian framework of colocalization analysis52, to estimate the sharing of causal genetic variants between GWAS and haQTL signals within our gARE loci and to recognize potentially impacted AREs (Supplementary Note). We also used the same analysis for GTEx eQTLs within our gARE loci, to reveal the disease loci that show epigenomic and/or transcriptional effects (Extended Data Fig. 8a). We found 1,070 GWAS–haQTL colocalized events (614 unique AREs, coloc PP4 ≥ 0.5) across our 4 tissues and 46 traits, 403 of which (38%) are also confirmed by Mendelian randomization (MR, adjust P < 0.2; Supplementary Table 6 and Supplementary Note).

As a proof of principle, we focused specifically on the 54 GWAS–haQTL-colocalized gAREs for schizophrenia (Fig. 4b), finding that most of them show tissue-specific colocalization (46, 85%) primarily in brain tissue (30, 56%), consistent with the expected tissue of action. For 11 gARE loci, haQTL pinpoints causal SNPs different from the lead GWAS SNPs. For example, the GWAS lead SNP rs77831 was not detected as a significant haQTL in brain tissue (Fig. 4c), while the SNPs identified as significant haQTLs (green box) may impact gene-regulatory alterations in the brain related to schizophrenia etiology.

For 34 of 54 gARE loci (63%), haQTLs–GWAS-colocalized variants also impacted gene expression in the matched tissue. For example, a brain/neuron (G12) ARE, haQTL detected only in the brain (type II ‘haQTL-specific’ gARE), shows GWAS–haQTL colocalization in the brain, together with GWAS–eQTL colocalization only in the brain (orange box, Fig. 4d). These results together reveal a brain-specific SNP–ARE–gene regulatory circuit involved in schizophrenia.

For the other 20 of 54 gARE loci (37%), we did not observe any eQTL colocalization effect (coloc PP4 < 0.1, ‘bulk-eQTL-missing GWAS loci’ in Fig. 4b). For example, the lead schizophrenia GWAS SNP rs6065094, also an haQTL of a brain/neuron (G12) ARE, does not show eQTL effect on any genes nearby (Fig. 4e). Using SuSIE followed by colocalization53, we confirmed that the majority of these eQTL-missing loci are not due to multiple independent eQTLs confounding the analysis (Extended Data Fig. 8b and Supplementary Note). We then carried out colocalization analysis using cell-type-level eQTLs from a brain single-nucleus RNA-seq dataset54, to check the possibility that the bulk-eQTL-missing may arise from lower power to capture cell-type-specific effect. In total, 10 out of 13 brain bulk-eQTL-missing loci could be explained by cell-type-level eQTLs (Extended Data Fig. 8c), implying that bulk haQTLs may have better power than bulk eQTLs to reveal cell-type-level function for these disease genetic variants. For all GWAS–haQTL colocalization events identified across traits, we found that 13–32% of them were missed by bulk eQTL across four tissues (Fig. 4f).

gLink scores prioritize gARE–eGene pairs based on genetic variations

We next identified potential target genes for gAREs to uncover gene-regulatory circuits underlying disease genetics. We first linked a gARE to a gene if it is proximal to the gene’s fine-mapped eQTLs (FMeQTLs), which are more likely to pinpoint causal SNPs than eQTLs after incorporating the LD structure. As expected, gAREs tend to be proximal to eQTLs (2 kb window) compared to various backgrounds, and this enrichment is even higher for FMeQTLs (Fig. 5a, Extended Data Fig. 9a and Supplementary Note). FMeQTLs located in gAREs are also more likely to interrupt TF binding sites than those at other AREs across tissues (P < 1 × 10−20), providing independent evidence for FMeQTLs functioning within gAREs (Extended Data Fig. 9b and Supplementary Note). These results affirmed the functional dependency between FMeQTL and gARE, suggesting that we can use FMeQTLs to pinpoint gAREs for target genes. We thus used the genomic distance between an FMeQTL and its proximal gARE (gARE-dist-to-FMeQTL) and the most significant P value of gARE-proximal eQTLs of a target gene (gARE-proximal-eQTL) as metrics to quantify the gARE–gene linkage.

Fig. 5: gLink scores as a novel framework to prioritize gARE–eGene linking.
figure 5

a, gLink approach 1, quantifying gARE–eGene linking based on eQTL proximity to ARE. Top, a schematic view of the approach. Bottom, enrichment of the QTL-proximal regions of gAREs over other regions; x axis, percentage of QTL-proximal regions among all target regions; y axis, fold enrichment of QTL-proximal regions in target regions over those in shuffled genomic regions (the fourth background genomic region in the legend); colors denote types of QTL and symbols denote different target regions as shown in the legend. b, gLink approach 2, quantifying ARE–gene linking based on shared genetic regulation. Top, a schematic view of the approach. Middle, solid dots denote the gAREs linked to MADCAM1 supported by different scores (by row) in different colors; gLink scores are grouped by two approaches with label on the left; the gARE labeled by black dashed vertical line (right one) is supported by scores from both approaches; the ARE labeled by green dashed vertical line (left one) is supported by scores only from approach 2. Bottom, correlation of polygenic gene expression of MADCAM1 (y axis) and activity of the ARE only captured by scores from approach 2 (x axis) across individuals, with r2 and P value (two-sided based on linear regression) labeled, and regression line (solid red) and its 95% confidence interval (dashed blue) shown. c, Comparison of gLink scores and other scores to the ABC score as the benchmark dataset. Top, a schematic view of candidate gAREs and different linking scores considered; Bottom, truncated PRC (x axis: 0–0.2) for gLink scores and EpiMap across four tissues; the inset shows the AUPRC (x axis) for different scores and background; the black dashed line shows the background proportion of the positive links based on the ABC score.

We also noticed the limitation of this approach—a substantial part of these gARE–gene links do not show shared genetic regulation (26–61% across four tissues) at either the locus level (coloc PP4 < 0.1) or the SNP level (haQTL P > 0.1; Extended Data Fig. 9c). Alternatively, we directly inferred gARE–gene linking based on shared genetic regulation33 with four linking scores (Supplementary Note) as follows: (1) coloc PP4, the posterior probability of sharing the same causal variant for gARE–gene pair determined by coloc analysis52; (2) coloc PP4/PP3, the ratio of the posterior probability of the pair sharing the same causal variant (PP4) versus having different causal variants (PP3) based on coloc; (3) MR, the causal effect of gARE activity on gene expression based on haQTL and eQTL and (4) ExpPGS-gARE-corr., the correlation between the gARE activity and the genetically inferred gene expression. These scores can capture additional gARE–eGene linkages that are missed by the first approach (Fig. 5b).

We next compiled gLink scores with six gARE–eGene linking scores from the two approaches above to analyze all candidate gARE–eGene (genes with eQTL) pairs, where gAREs are located between 2 kb and 1 Mb from the eGene’s TSS (Supplementary Table 7 and Supplementary Note). We then tested their performance by using each of the gLink scores as the benchmark. We found that scores from the second approach show higher consistency among each other compared to the scores from the first approach, and gLink scores show overall higher performance than the score based on the distance between ARE and gene (ARE-gene-dist; Extended Data Fig. 9d). We also compared gLink scores with the following two state-of-art linking scores: (1) EpiMap score19, which is based on the correlation between gene expression and ARE activity across different tissues and cell types and the genomic distance between the pair and (2) ABC score17, which is based on enhancer–gene chromatin interaction frequency weighted by the enhancer activities. Coloc PP4 and gARE-distance-to-FMeQTL consistently perform better than EpiMap and other gLink scores based on area under precision–recall curve (AUPRC) across tissues with ABC score as the benchmark (Fig. 5c). However, all gLink scores showed poor performance with EpiMap score as the benchmark (Extended Data Fig. 9e). These results suggest that gLink and EpiMap scores may prioritize different subsets of enhancer–gene links.

Downstream ARE–gene circuits for GWAS variants

As gLink scores identify gARE–eGene pairs based on genetic variation, we hypothesized that gLink scores might prioritize gAREs associated with disease/trait impactful genetic variants. This was confirmed in 19 GWAS results, where at least one tissue-matched gLink score substantially prioritized higher fractions of GWAS–haQTL-colocalized gAREs based on the linking scores to target genes compared to the background, showing consistently higher performance than ARE-gene-dist, EpiMap and ABC score (Fig. 6a, Extended Data Fig. 10a and Supplementary Note).

Fig. 6: gLink scores prioritize gARE–gene circuits for diseases and traits.
figure 6

a, Enrichment of GWAS–haQTL colocalized gAREs in gAREs with predicted links for brain. X axes, proportion of GWAS–haQTL-colocalized gAREs in gAREs with predicted ARE–gene links based on different linking scores (per mile); different rows in each panel denote different linking scores (gLink scores are shown in red text). Asterisk shows the significance level (one-sided proportion test); exact P value is shown in Supplementary Table 7. b, Comparison of the number of target genes from different linking scores with GWAS–eQTL colocalization. The heatmap shows the proportion of target genes prioritized by each linking score (by row) that are supported by GWAS–eQTL colocalization in brain tissue for each of ten diseases/traits in a (by column); the overlapped gene number is shown in each cell; the total gene number for each row is shown on the right. c, Schizophrenia GWAS–haQTL-colocalized gARE–gene circuits in the brain. The heatmap shows the genetic evidence of association between target gene and schizophrenia for each gARE–gene circuit (by row) in each tissue (by column); genomic position of ARE and ARE group is shown on the left; for each cell, the upper triangle shows the evidence supported by GWAS–eQTL colocalization (PP4), and the lower triangle shows the number of gLink scores that connect GWAS–haQTL-colocalized gARE to the same gene. d, Schizophrenia GWAS and brain QTL signals at CORO7NMRAL1 locus prioritized in c. From top to bottom, gene and gARE-genes linking annotation, schizophrenia GWAS signals, haQTL signals for gARE (chr16:4409428–4410046) and eQTL signals for CORO7 and NMRAL1 in brain with colocalization PP4 shown in parenthesis; P values are all nominal P (two-sided) for either GWAS (Chi-squared) or QTL (linear regression). AD, Alzheimer’s disease; NEUROT_UKB, neuroticism; ADHD, attention-deficit/hyperactivity disorder; SLEEP, sleep duration; CHRONO, chronotype; FIS, fluid intelligence score, EDU, education years; UKB, GWAS from UK biobank (detailed information in Supplementary Table 5);.

We then integrated GWAS–haQTL-colocalized gAREs (PP4 ≥ 0.1) and different gARE–gene link scores to prioritize target genes. For brain-related traits, gLink scores perform better in identifying disease target genes supported by GWAS–eQTL-colocalization (Fig. 6b). We then compared the genes we prioritized to the target genes predicted previously for brain-related traits55, which were based on GWAS summary statistics and gene features, independent of eQTL. For five brain-related traits where any of the scores show substantially enriched overlap with the predictions from the literature, we found that at least one of our gLink scores showed significant and higher enrichment of benchmark genes than ABC and EpiMap scores (Extended Data Fig. 10b). In addition, we compared target genes for brain-related GWAS predicted based on different linking scores. Results from gLink scores approach 2 are more consistent with each other compared to others (Extended Data Fig. 10c). Overall, gLink scores tend to prioritize distal genes compared to ABC score and ARE-gene-dist (Extended Data Fig. 10d), suggesting that they provide a unique enhancer–gene link framework to understand cis-regulatory circuits associated with disease genetics.

We then identified 1,284 (brain), 1,005 (heart), 2,006 (muscle) and 734 (lung) high-confidence gARE–eGene circuits using a unified score derived from gLink scores (Supplementary Table 8 and Supplementary Note) and looked into specific circuits for schizophrenia. They include 11 gAREs linking 14 genes, with 12 of those genes supported by GWAS–eQTL colocalization, suggesting their potential roles in schizophrenia etiology (Fig. 6c). Although most of these gAREs are tissue-shared, 13 out of 17 disease-associated gARE–eGene circuits are brain-specific. This underscores the influence of multiple factors in tissue specificity of genetic regulation, including genetic regulation of an ARE, its association with GWAS signal and gene–ARE linking. For example, at a schizophrenia GWAS locus on chromosome 16, a brain/neuron (G12) gARE links to two target genes CORO7 and NMRAL1, both previously genetically correlated to the disease56 (Fig. 6d). Their regulation of vesicle transport57 and cellular response to redox changes58, respectively, and their links to the same gARE imply a coordinated dysregulation of both processes in schizophrenia etiology59,60. We also detected 11 regulatory circuits in the heart or muscle potentially contributing to schizophrenia genetics (Extended Data Fig. 10e), also supported by GWAS–eQTL colocalization analysis across multiple heart or muscle tissues in GTEx (Extended Data Fig. 10f). This implies that haQTL signals and gLink scores in muscle and heart may capture disease genetic signals in tissue-shared vascular cells. Two of these six genes, WBP1L and MFHAS1, were previously identified as marker genes for brain perivascular fibroblast subtype and capillary endothelial cell subtype61, respectively, indicating their potential roles in vascular homeostasis in psychiatric disorders62.

Discussion

In this study, we generated a multitissue, multi-individual epigenomic dataset with 387 H3K27ac maps in brain, heart, muscle and lung GTEx samples. We characterized the association of 282 k AREs with tissues/cell types, sex and genetics and inferred enhancer–gene links using genetic, epigenomic and transcriptomic covariation. Finally, we used these integrative analyses to gain new insights into diseases/traits genetics.

We investigated the primary tissues/cell types of activity for AREs detected in this study (Fig. 2e), providing the epigenomic contexts for the overlapping GWAS variants and helping prioritize tissues/cell types of action for diseases and traits (Fig. 4a). For example, we identified smooth muscle cells for cardiac disease and lung functions and immune cells for autoimmune and psychiatric diseases. Our large sample size also enabled us to identify newly detected AREs (G14) missed by the reference epigenomes. Despite their low activity, G14 was substantially enriched for neuron-related pathways (Extended Data Fig. 2c) and GWAS signals from psychiatric or neurologic traits (Fig. 4a), suggesting their potential as proto-enhancers in neural development63. We also investigated the genetic regulation of AREs, revealing three types of gAREs, particularly ‘ARE-specific’ and ‘haQTL-specific’ gAREs. They show different mechanisms of tissue-specific genetic regulation and were reported to provide crucial context information for GWAS signals in previous31,64 and current studies (Fig. 4a,c), highlighting an important direction for future research.

We found that most of the ‘bulk-eQTL-missing GWAS loci’ (Fig. 4e,f) could be captured by cell-type-level eQTL signals. However, we applied stringent and permissive cutoffs for GWAS–haQTL (PP4 < 0.5) and GWAS–eQTL (PP4 < 0.1) colocalization, respectively, to detect robust loci missed by eQTL signal. Using the same cutoff of 0.5 for both, we found 11 GWAS loci captured by haQTL but not by either level of eQTLs (Extended Data Fig. 8c and Supplementary Note). Weak GWAS–eQTL colocalization signals might be due to eQTLs from nondisease-relevant conditions missing condition-specific regulation. This, coupled with cell-type-specific factors, may affect eQTL detection, explaining the ‘missing regulation’ bias noted in a previous report65. These results also indicate that bulk epigenomic variation may capture the impact of genetic variants (as shown here by haQTL effects) with greater power than bulk eQTL for certain loci where the effect on gene expression may only become visible in specific cell types under specific conditions.

We proposed gLink scores, a framework of enhancer–gene linking based on genetics. Our contributions include the following factors: (1) distinguishing two existing enhancer–gene linking methods based on eQTL66 and shared genetic regulation33; (2) refining the first one by focusing on gAREs; (3) providing a uniform framework of gLink scores to integrate the two and (4) discriminating gLink scores from other state-of-art approaches based on prediction consistency, distribution of distances between each pair and ability to pinpoint genes for GWAS–haQTL-colocalized gAREs. Unlike EpiMap, gLink scores prioritize enhancer–gene pairs based on interindividual genetic variation within the same tissue, rather than intertissue/cell type variation. gLink scores show stronger enrichment for GWAS–haQTL-colocalized gAREs compared to gene–ARE distance, EpiMap and ABC scores, likely due to their capacity to capture gARE-proximal genetic signals related to gene expression. Therefore, gLink scores offer a unique and complementary perspective on the enhancer–gene regulatory network.

Despite the unique opportunity provided to bridge genetic and phenotypic variation, our study has limitations. Our sample size and read depth (Extended Data Fig. 6b,c) were underpowered to detect haQTL signals, affecting haQTL sharing estimation (Fig. 3d,e), tissue specificity correlation between haQTL and eQTL for lung (Extended Data Fig. 7e), GWAS–haQTL colocalization (Extended Data Fig. 8c) and gLink scores’ performance (especially in the lung; Fig. 5c). Future research could increase the power by augmenting sample sizes or utilizing deep models to borrow regulation information across the genome67,68. Secondly, our epigenomic signals lacked cell-type resolution. Although we developed a deconvolution strategy for bulk samples and mitigated confounding factors due to cell fraction changes, we could not accurately deconvolve ARE activity and estimate cell-type-level haQTL without cell-sorted H3K27ac ChIP–seq data. Future studies could address this by generating cell-type-level reference epigenomic profiles for deconvolution or by mapping cell-type-level haQTL from enough cell-sorted samples across individuals. Thirdly, the majority of GTEx samples was from healthy individuals, mostly of European descent. Future studies should incorporate samples from disease-relevant contexts and diverse ancestries to enhance power and accuracy in identifying disease-related and ancestry-relevant GWAS–haQTL-colocalized gAREs and regulatory links for precision medicine purpose69,70.

Taken together, our dataset and integrative analyses, including ARE groups, enriched TFs binding sites in each group, sex-biased AREs, gAREs, haQTLs and their tissue sharing, gLink scores, GWAS–haQTL-colocalized gAREs and gARE–gene linking, provide a comprehensive view of ARE regulation in human primary tissues and their implications in disease genetics. Our analyses help elucidate functions of GWAS variants, boosting the efforts to accurately identify the context of function. Our dataset represents a unique resource, together with the GTEx dataset and other eGTEx projects, to explore the genetics of gene expression from different regulatory layers with samples of the same cohort.

Methods

Samples for H3K27ac ChIP–seq

Samples were collected by the GTEx Consortium. The donor enrollment and consent, informed consent approval, histopathological review procedures, and biospecimen procurement methods and fixation were the same as previously described22. No compensation was provided to the families of participants. Massachusetts Institute of Technology Committee on the Use of Humans as Experimental Participants approved that this study does not involve human participants as defined by federal regulations. H3K27ac profiling was carried out across four human primary tissues, including brain prefrontal cortex (n = 113), heart left ventricle (n = 100), skeletal muscle (n = 108) and lung (n = 66; Supplementary Table 1).

Statistics and reproducibility

No statistical method was used to predetermine sample size, which was mostly based on the availability of the samples in the GTEx cohort. The experiments were not randomized, and the investigators were not blinded to allocation during experiments and outcome assessment, as the study is not a randomized controlled trial study. We used data from 387 samples that passed the quality control as described in the section below in the downstream analysis. We carried out most statistical tests with R (v3.6.0) unless specified, including Fisher’s exact test and proportion test for enrichment analysis with basic function fisher.test and prop.test in R, respectively, as well as test for sex-biased ARE identification with R packages limma voom71 and sva72. We mapped haQTL with linear regression implemented in FastQTL46. Please refer to the corresponding sections in Methods or Supplementary Note for details on statistical tests.

Nuclei isolation and crosslinking protocol

Tissues were sliced up in seven to eight small chunks while kept on ice at all times. The pieces were transferred to 2 ml Safe-Lock Eppendorf Tubes (round bottom). NF-1 buffer (1 ml; 0.5% Triton X-100, 0.1 M sucrose, 5 mM MgCl2, 1 mM EDTA and 10 mM Tris–HCL pH 8.0) with proteinase inhibitors (Roche complete tablets), and sterile RNAse- and DNAse-free steel beads were added to the tube.

Qiagen Tissuelyser allowed for the processing of 48 samples at the time and was used to perform the following lysing of the tissue. Heart and muscle tissues were processed for 2–2.5 min at 30 Hz. Lung and brain tissues were processed for 2.5–3 min at 30 Hz. The samples were then transferred to 1.5 ml tubes and centrifuged at 40 °C at 1,000g for 10 min. The supernatant was discarded, and 937.5 µl of PBS-containing proteinase inhibitor was added. Crosslinking was performed by adding 62.5 µl of 16% formaldehyde (final concentration of 1%) and put on an elliptical rotator for 15 min at room temperature. The formaldehyde crosslinking was stopped/quenched by adding 125 mM glycine (50 µl of 2.5 M glycine stock solution) and left at room temperature for 5 min. The samples were then spun at 2,200 rpm for 10 min at 4 °C. The supernatant was then discarded, and the pellet was resuspended in 1 ml PBS-containing proteinase inhibitor. The sample was spun again at 2,200 rpm for 10 min at 4 °C. The supernatant was discarded, and the pellet was resuspended in 200 µl of PBS-containing proteinase inhibitor. The samples were then transferred to a 2D barcoded tube, flash frozen, combined in 96 samples 2D barcoded sample racks and sent for ChIP processing.

ChIP–seq protocol

Sample plates containing the cross-linked samples were completely thawed on ice. The cross-linked sample pellets were resuspended in 100–300 µl of 1% SDS lysis buffer + protease inhibitors for 10 min on ice (1% SDS, 10 mM EDTA and 50 mM Tris–HCL pH 8.1). The volume was then adjusted to 1 ml with ChIP dilution buffer + protease inhibitors (16.7 mM Tris–HCl pH 8.1, 167 mM NaCl, 0.01% SDS, 1.1% Triton X-100 and 1.2 mM EDTA). The samples were then sheared following the previously optimized protocol using a Covaris LE-200 sonicator.

The desired amount of sheared chromatin (equivalent to 1 million cells per ml) was normalized to the desired concentration and volume, using ChIP dilution buffer (16.7 mM Tris–HCl pH 8.1, 167 mM NaCl, 0.01% SDS, 1.1% Triton X-100 and 1.2 mM EDTA) keeping the final concentration of SDS to less than 0.1%. The desired antibody (1 µg; Cell Signaling Technology, 8173; 1:100 dilution) was added to each tube per well and incubated overnight at 4 °C.

Protein A/G (50 µl; 50/50) magnetic beads per reaction were independently added with blocking buffer (1 ml) containing proteinase inhibitor and incubated on a 360° rotator at 4 °C. The tubes were placed on a magnet, the supernatant was discarded and an additional 100 µl of blocking buffer was added to each tube per well. This step was repeated one more time. The tubes per well were incubated at 4 °C on head over tail rotator at 4 °C for 1 h. The ChIPs (chromatin + antibody complex) from rotation at 4 °C were combined with the blocked beads and incubated for 1 h at 4 °C. The ChIPs were removed from rotation and briefly spun to remove any sample from the caps and placed on a magnet. The supernatant was discarded. The ChIPs/beads complexes were removed from the magnet and washed with 175 µl of RIPA/140 mM NaCl buffer (cold; 0.1% DOC, 0.1% SDS, 1% Triton X-100, 140 mM NaCl, 1 mM EDTA and 20 mM Tris–HCl pH 8.1). The ChIPs/beads complexes were washed two more times with 200 µl of RIPA/500 mM NaCl Buffer (cold; 0.1% DOC, 0.1% SDS, 1% Triton X-100, 500 mM NaCl, 1 mM EDTA and 20 mM Tris–HCl pH 8.1). The ChIPs/beads complexes were washed twice with 200 µl LiCl Buffer (cold; 0.25 M LiCl, 1% NP40, 1% Na deoxycholate, 1 mM EDTA and 10 mM Tris–HCl pH 8.1). Then, washed twice with a TE buffer. ChIP Elution Buffer (50 µl; 10 mM Tris–HCl pH 8.0, 5 mM EDTA, 300 mM NaCl, 0.1% SDS and directly before use 5 mM DTT) and reverse crosslinking buffer (8 µl; 250 mM Tris–HCl pH 6.5, 1.25 M NaCl, 62.5 mM EDTA, 5 mg ml−1 proteinase K, and 62.5 μg ml−1 RNAse A) was added to each well. The samples were incubated at 65 °C for 3 h or overnight. Note that the negative control ChIP samples were parallelly reverse cross-linked at that step.

The reversed cross-linked samples were placed on a magnet. The supernatant was transferred to a new tube/plate. Each sample was successively 1.8× SPRI beads cleaned and 2.2× SPRI beads cleaned. Each sample was washed twice with 200 µl 70% ethanol. The ChIP material was eluted in 22 µl of 10 mM Tris–HCl pH 8.0, at room temperature for 3 min.

We quantified ChIP material using Qubit, constructed ChIP–seq library with Nextera XT DNA Library Prep Kit (Illumina) according to the manufacturer’s instruction and sequenced for 101-bp paired-end reads by Illumina HiSeq 2500 at Broad Technology Labs (BTL) in the Broad Institute. BTL implemented ChIP–seq as a robust, standardized, low-variability, automatically optimized procedure. We have automated all steps of the ChIP–seq protocol, including shearing, immunoprecipitation and library preparation, with all steps after shearing handled in series on a single liquid-handling robot with very limited human intervention. This has allowed major sources of variability and failure to be minimized or eliminated, including overcoming lot-to-lot variability in antibody performance.

Quality control of H3K27ac ChIP–seq

We carried out peak calling with the ENCODE pipeline (https://github.com/kundajelab/chipseq_pipeline)18. Briefly, we mapped reads to human genome assembly hg19 (bwa, v0.5.9 (ref. 73), command: bwa aln Homo_sapiens_assembly19.fasta -q 5 -l 32 -k 2 -t $NSLOTS -o 1 -f), filtered low-quality and multiple-mapped reads (samtools, v1.3.1 (ref. 74)) and applied MACS2 (v2.1.1)75 for peak calling. Samples were then filtered based on the following three QC metrics (Supplementary Table 1): (1) relative strand cross-correlation coefficient (RSC), based on the (shifted) agreement between forward-strand and reverse-strand reads, providing a measure of signal-to-noise enrichment ratio; (2) sequencing depth, counted as total reads; (3) H3K27ac signal correlation between our samples and the samples from Roadmap epigenomes based on 405 k tissue-specific AREs identified from Roadmap. We defined ‘Tier 1’ samples as those with RSC score ≥ 0.8, total reads ≥ 10 M and for which their matched Roadmap tissue was in the top three most correlated Roadmap epigenomes. We used high-quality samples of ‘Tier 1’ to generate a peak set in each tissue. We defined ‘Tier 2’ samples with relatively permissive cutoffs—RSC score ≥ 0.6, total reads ≥ 5 M and the same criteria of matched-tissue correlation. Also, we specifically included lung samples with RSC score ≥ 0.8 and total reads ≥ 5 M in ‘Tier 2’ samples, regardless of whether they passed the criteria of matched-tissue correlation, considering the limited lung samples in Roadmap. These criteria resulted in 113 brain, 100 heart, 108 muscle and 66 lung samples, with median total unique reads of 10.8 M, 11.7 M, 9.9 M and 10.5 M for each tissue, respectively.

ARE detection, reference ARE set generation, ARE activity quantification and normalization

We only included H3K27ac ChIP–seq peaks (q-value ≤ 0.01) present in at least two ‘Tier 1’ samples of each tissue in the tissue ARE set or overlapping with peaks from a tissue-matched reference sample from the ROADMAP project, resulting in 335 k, 363 k, 227 k and 148 k AREs in brain, heart, muscle and lung, respectively. We further filtered AREs based on their overlap with promoter- and enhancer-related chromatin states in tissue-matched samples in Roadmap (overlapping region ≥ 50 bp), resulting in 189,681, 131,952, 142,822 and 107,151 AREs for brain, heart, muscle and lung, respectively. AREs from the four tissues were merged as a reference set of 282 k AREs (BEDTools, v2.26.0 (ref. 76)).

We also carried out ARE replication analysis, by repeating ARE detection for each tissue with the same procedure in 80% of Tier 1 samples randomly selected 10 times, and checked the proportions of these AREs that are also detected in the rest of Tier 1 and all Tier 2 samples. We observed a replication rate of AREs at 84% in brain, 83% in heart, 88% in muscle and 78% in lung. For newly detected AREs (G14), these numbers are slightly lower, as expected, because these AREs vary more across samples and are thus less likely to replicate using our metrics—they are 80% in brain, 68% in heart, 77% in muscle and 55% in lung.

We quantified the activity of each ARE for each sample based on fragment coverage on a specific ARE divided by its length:

$${{\rm{Activity}}}_{i,k}={\sum }_{j=1}^{N}({\rm{overlap}}({\rm{Frag}}_{j,k},{{\rm{ARE}}}_{i}))/{{\rm{length}}}_{i},$$

where Activityi,k represents activity of AREi in sample j, Fragj,k represents a fragment j extended from a read j toward the 3' end to the same length as the estimated fragment length in sample k, overlap(Fragj,k, AREi) represents the length of the overlapping part between Fragj,k and AREi, and lengthi is the length of AREi. ARE activity was normalized for read depths of different samples, by a factor estimated for each sample based on geometry mean of read coverage of all AREs as implemented in DESeq2 package77 and then corrected for bias due to GC-content78.

Identification of ARE modules, submodules and groups

We identified ARE modules based on a previous approach19. Briefly, we extracted -log10P signal from 240 EpiMap H3K27ac samples for each ARE in our reference set (bigWigAverageOverBed, v2 (ref. 79)), binarized the matrix by a cutoff of 2 and clustered by a k-centroids algorithm implemented in R package flexclust80 (V1.4-0) with the Jaccard index as the distance metric and cluster number set as 141 (corresponding to 2,000 AREs per module on average). To check the robustness of our modules, we further clustered AREs from each module into submodules following the same approach above, with the cluster number set corresponding to 200 AREs per submodule on average. The resultant 1,413 submodules showed largely homogeneous ARE activities across submodules of the same module both for our samples and for the reference epigenomes, indicating that further clustering might not lead to new biologically interpretable subclusters (Extended Data Fig. 2b).

We then classified ARE modules into 14 groups with multiple steps as follows: (1) defining ubiquitous score as the proportion of samples with average ARE activity ≥ 0.2 for each ARE module; (2) defining ARE modules as broadly active (G1) if their ubiquitous scores were above 0.5; (3) defining ARE modules as multitissue (G2) if their ubiquitous scores were between 0.1 and 0.5; (4) defining the ARE module with no activity in any of the samples as newly identified (G14); (5) clustering 240 reference samples into 11 clusters based on average ARE signals across 127 modules by hierarchical clustering with 1-Pearson correlation coefficient as the distance metric, naming each sample cluster manually based on the names of majority samples and (6) grouping the rest of the ARE modules based on the sample cluster where the ARE module shows the strongest signal. The ARE modules are visualized by R package ComplexHeatmap (v2.4.3, R-4.0)81 in Fig. 2b.

Annotation of ARE modules

We annotated ARE modules based on following different functional categories: (1) housekeeping gene82—enrichment calculated for the genes assigned for each ARE module (assign gene to an ARE if within 2 kb); (2) chromatin states—we applied the results from an 18 chromatin states model from matched tissue in EpiMap19 to annotate ARE (overlap ≥ 50 bp); ‘TssA’ and ‘TssBiv’ are counted as promoter states, while ‘EnhG1’, ‘EnhG2’, ‘EnhA1’, ‘EnhA2’, ‘EnhWk’, and ‘EnhBiv’ are counted as enhancer states (Fig. 2b); (3) genomic regions—annotation calculated by R package annotatr (v1.6.0)83 (Fig. 2b); (4) biological processes from GO—enrichment of nearby genes is calculated by R package rGREAT (v1.2.1)84,85 (Extended Data Fig. 2c) and (5) TF binding sites—we used position weight matrix of binding sites for TF families collected in EpiMap and scanned ARE modules for enriched motifs by Homer (v4.11.1)86 with background autoselected, we filtered TF motifs based on adjust P value (cutoff of 0.05) and odds ratio (cutoff of 1.2) and we visualized the enrichment for each TF family (mean odds ratio) (Extended Data Fig. 3a).

We found that specifically broadly active AREs (G1) were enriched for TSS-proximal regions, promoter and enhancer chromatin states, and significantly associated with housekeeping genes (P < 2.2 × 10−16). Multitissue AREs (G2) were enriched for TSS-distal enhancer chromatin states and biological processes, including immune response, posttranscriptional regulation and epithelial tube morphogenesis. Blood/immune AREs (G3, red box) were enriched for immune functions and binding sites of immune-related TFs, including IRF5 and IRF8. Skeletal/cardiac muscle AREs (G9, green box) were enriched for muscle and heart enhancers and promoters, associated with cardiac muscle tissue development pathways and enriched for binding sites of skeletal muscle function-related TF, such as STAT3 (ref. 87). Brain/neuron AREs (G12, orange box) were enriched for brain enhancers and promoters, associated with trans-synaptic signaling, and enriched for binding sites of neuronal-function TFs, such as NEUROG2 (ref. 29). Lung-active AREs (modules 59 and 92, blue box) were enriched for the binding sites of lung homeostasis-related TFs, such as ERG88, FOXA2 (ref. 89) and FOXL1 (ref. 90).

Map haQTL

We inferred the latent factors for our samples in each of the four tissues by Peer45 (R package v1.0) and identified the optimum number of Peer factors (5, 5, 10 and 2 for brain, heart, muscle and lung) to correct for during haQTL calling. Additionally, we took into account age, batches for H3K27ac ChIP–seq experiment, top five genotype PCs, and those technical effects related to platform and PCR. We found strong correlations between the top peer factors and the tissue-archetype fractions estimated, confirming the importance of their removal to discover cis-acting genetic variants (brain as an example in Extended Data Fig. 6a). We applied FastQTL (v2.184)46 to map haQTLs within 100 kb to the center of each ARE on autosomes for variants with minor allele frequency ≥ 0.05. An empirical P value for each ARE was estimated based on the lead nominal P value and permutation results (--permute 1000 10000) by fitting a β distribution to account for multiple variants tested. We used an empirical P value threshold of 0.005 to identify gAREs as in a previous study29, and then for each gARE, we applied the nominal P value threshold corresponding to the empirical P value of 0.005 for each locus to identify haQTLs (Fig. 3a).

We used two strategies to estimate and summarize the power of haQTL calling (Extended Data Fig. 6b,c). The first one is using powereQTL (v0.3.4)91 with the parameters observed from our haQTLs, including sample size, variance of H3K27ac signal and haQTL mean effect size, as well as different values of minor allele frequency. The second strategy is to sequentially and randomly downsample the 109 brain samples to smaller sample sizes ten times each, and measure the fraction of haQTLs that can be recapitulated with these randomly selected samples. As expected, haQTLs peaked within 25 kb of their target gAREs (Extended Data Fig. 6d), validating our choice of 100-kb distance cutoff.

Tissue sharing of ARE genetic regulation and identification of three types of gAREs

We defined the following three types of ARE/gARE based on their tissue-sharing: type I ‘haQTL-shared’, where both the ARE and gARE are shared between tissues; type II ‘haQTL-specific’, where the ARE is shared while the gARE is specific to a certain tissue and type III ‘ARE-specific’, where the ARE is specific to a certain tissue and thus the gARE is specific too. For type I/II AREs, we estimated haQTL tissue-sharing using a previously reported method which is based on directionality consistency and is less dependent on the discovery power29. The underlying reasoning is that the tissue-sharing haQTLs would show consistent directionality between tissues, even if they did not pass the P-value threshold due to the power. Specifically, for all the lead haQTLs that are significant in at least one tissue (discovery tissue), we tested whether they show consistent directionality of haQTL effect in each of the rest of the tissues (replication tissues). The directionality consistency was quantified as the percentage of gARE–haQTL pairs that show the same effect directionality (either positive or negative) between the discovery tissue and the replication tissues. We grouped the haQTLs into the following four bins based on their nominal P value in the replication tissues: (1) strong-replicated (P < 10−5), (2) medium-replicated (10−5 < P < 10−3), (3) weak-replicated (10−3 < P < 0.1) and (4) no-effect (P > 0.1) and calculated the directionality consistency for each of these P-value bins. The proportion of tissue-sharing (Fig. 3d) haQTLs was quantified as follows: Tissue-sharing \(={\sum }_{(i=1)}^{n{\rm{bin}}}{\rm{Per}}{\rm{c}}_{i}\times [{\rm{D{C}}}_{i}-(1-{\rm{D{C}}}_{i})]\), where Perci represents the percentage of haQTLs of bin i, and DCI represents the haQTL directionality consistency of bin i.

To define haQTL-shared AREs between discovery and replication tissues, we set the nominal P-value threshold equal to 0.02 in the replication tissue, which makes the directionality consistency between the two tissues over 95% (Extended Data Fig. 7d).

As an additional approach to detect subthreshold haQTL tissue sharing, we quantified the similarity of haQTL effect size between the discovery tissue and the replication tissue in the same bins as above, defined as the regression coefficient of the effect size between the replication tissue and the discovery tissue. We observed that the haQTLs with stronger P values in the replication tissue tend to show greater similarity in effect size with the discovery tissue. We also observed that the haQTLs from the weak-replicated bin (10−3 < P < 0.1) show greater similarities (0.3–0.6) than those from the no-effect bin (~0.05; Extended Data Fig. 7b,c). These results further confirm the tissue sharing of subthreshold haQTLs in the replication tissue.

GWAS signal enrichment in ARE groups by LDSC

We applied LD score regression (v1.0.0)9,92 with default parameters to estimate the enrichment of GWAS signals from 55 datasets (Supplementary Table 5) for AREs from each group active in each of the four tissues as genomic annotations. We carried out Benjamini–Hochberg correction for testing multiple ARE groups of each tissue for each trait (adjust P cutoff 0.05).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.