Genome-wide characterization of mammalian promoters with distal enhancer functions


Gene expression in mammals is precisely regulated by the combination of promoters and gene-distal regulatory regions, known as enhancers. Several studies have suggested that some promoters might have enhancer functions. However, how widespread such promoters are and whether they actually regulate the expression of distal genes have remained elusive. Here, by exploiting a high-throughput enhancer reporter assay, we identify a set of mammalian promoters displaying enhancer activity. These promoters have distinct genomic and epigenomic features and frequently interact with other gene promoters. Extensive CRISPR–Cas9 genomic manipulation demonstrated the involvement of these promoters in the cis regulation of distal gene expression at their natural loci. Our results have important implications for the understanding of complex gene regulation in normal development and disease.


Figure 1: Comparison of proximal and distal DHSs with enhancer activity in two mouse cell lines.
Figure 2: CapStarr-seq with human promoters.
Figure 3: Genomic and epigenomic properties of Epromoters.
Figure 4: Expression of neighboring genes and promoter–promoter interactions.
Figure 5: Epromoters function as bona fide enhancers and regulate distal gene expression.
Figure 6: Epromoters are involved in a long-range response to IFN-α signaling.
Figure 7: eQTL association within Epromoters.

Accession codes

Primary accessions

Gene Expression Omnibus




We thank J.-C. Andrau and J. Imbert for critical reading of the manuscript. We thank the IBiSA 'Transcriptomics and Genomics Marseille-Luminy' (TGML) platform for sequencing of CapStarr-seq samples and the cell biology platform for management of cell culture. Work in the laboratory of S.S. was supported by recurrent funding from INSERM and Aix-Marseille University and by specific grants from the European Union's FP7 Programme (282510-BLUEPRINT), ARC (PJA 20151203149) and A*MIDEX (ANR-11-IDEX-0001-02). L.T.M.D., A.G. and G.C. were supported, respectively, by Vietnam International Education Development (911), CONACYT and FRM.

Author information

L.T.M.D. and S.S. conceptualized and designed the experiments. L.T.M.D. performed most experimental work. A.O.G.A. performed most bioinformatics analyses. J.A.C.-M. and J.v.H. performed motif analysis. C.A.-S., T.S., D.M. and E.S. performed 4C–seq experiments and analyses. C.S., A.G. and L.V. performed and analyzed data from mouse CapStarr-seq. J.A., M.T. and N.F. contributed to CRISPR screening and analyses of allelic expression. G.C. and D.P. performed ChIA-PET analyses. A.M.R. performed eQTL analysis. All authors contributed to reading, discussion and commenting on the manuscript. L.T.M.D. and S.S. wrote the manuscript.

Correspondence to Salvatore Spicuglia.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Analysis of DHS STARR-seq in the P5424 cell line.

(a) Luciferase enhancer assays of proximal DHSs defined as active or inactive enhancers by STARR-seq in P5424 cells. For each candidate, both orientations were tested. Data represent the normalized fold change over the vector control. Error bars show s.d. from three independent transfections (***P < 0.001, **P < 0.01, *P < 0.1; two-sided Student’s t test). (b) Enrichment score of lymphoid transcription factors at distal and proximal DHSs based on ChIP–seq data from developing thymocytes. The enrichment score was calculated as the −log10 (P value) obtained with a hypergeometric test (depletion is represented by negative values).
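The enrichment score in (b) is described as −log10 of a hypergeometric P value, with depletion shown as negative values. A minimal stdlib-only sketch of such a signed score follows. It assumes that enrichment uses the upper tail and depletion the lower tail, with the direction chosen by comparing the observed overlap with its expectation; the function and argument names are hypothetical and are not taken from the authors' pipeline.

```python
from math import comb, log10


def _sum_pmf(M, n, N, i_values):
    """Sum of hypergeometric probabilities P(X = i) over the given i values."""
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in i_values) / total


def hypergeom_upper(k, M, n, N):
    """P(X >= k) for a population of M items, n marked (e.g. TF-bound DHSs),
    N drawn (e.g. distal DHSs)."""
    return _sum_pmf(M, n, N, range(max(k, 0), min(n, N) + 1))


def hypergeom_lower(k, M, n, N):
    """P(X <= k)."""
    lo = max(0, N - (M - n))
    return _sum_pmf(M, n, N, range(lo, min(k, n, N) + 1))


def enrichment_score(k, M, n, N, floor=1e-300):
    """Signed score: -log10(P_upper) if enriched, log10(P_lower) (<= 0) if depleted.

    `floor` avoids log10(0) when a probability underflows to zero.
    """
    expected = N * n / M
    if k >= expected:
        return -log10(max(hypergeom_upper(k, M, n, N), floor))
    return log10(max(hypergeom_lower(k, M, n, N), floor))
```

With 20 DHSs, 5 of them bound by a factor, and 4 drawn, observing all 4 bound gives a score near 3 (P = 5/4845), while observing none gives a small negative score.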

Supplementary Figure 2 CapStarr-seq experimental control and epigenomic profiles of Epromoters in K562 and HeLa cells.

(a) IGV screenshots of STARR-seq signals for four STARR-seq-positive controls in HeLa cells. (b) FACS analysis of GFP expression in K562 (left) and HeLa (right) cells transfected with a human promoter library or empty vector. Controls were untransfected cells. The increase in GFP expression in transfected cells with the promoter library indicates potential enhancer activity in the pooled library. (c) Overlap with CpG islands (50%) and regions conserved in placental mammals (10%) using the EpiExplorer tool. The control is non-Epromoters with equal levels of gene expression as Epromoters in the same cell type. (d) Average profiles of epigenomic features for Epromoters and control promoters with the same expression pattern of associated genes. Statistical significance was calculated in a region centered on the TSS (±1 kb) using two-sided Mann–Whitney U tests. Only significant differences (P < 0.001) are shown. (e) Percentage of chromatin state 4 (strong enhancers) found in K562 Epromoters (left) and HeLa Epromoters (right) across ENCODE cell lines using the EpiExplorer tool. (f,g) Density plots of TSS positions corresponding to the selected promoter regions using CAGE peaks from ENCODE data in HeLa (f, top) and K562 (f, bottom) cells and data from FANTOM3 (g, top) and FANTOM5 (g, bottom) (Kolmogorov test). (h) Percentage of TSSs assigned to RefSeq-defined TSSs using different CAGE databases (from data in Supplementary Table 2b). (i) Comparison of the number of different RefSeq-defined TSSs per coding gene (one-sided Mann–Whitney U test).
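Panel (d) compares Epromoters and control promoters with two-sided Mann–Whitney U tests in a ±1 kb window around the TSS. The sketch below assumes each promoter has already been reduced to a single value (for example, the mean signal in that window, which is an assumption about the summary used). It computes U with average ranks for ties and a normal-approximation two-sided P value without tie or continuity correction; in practice a library routine such as scipy.stats.mannwhitneyu would usually be preferred.

```python
from math import erfc, sqrt


def average_ranks(values):
    """1-based ranks, with tied values receiving the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided normal-approximation P value)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mu) / sigma
    return u1, erfc(abs(z) / sqrt(2))
```

The normal approximation is coarse for very small groups, but genome-wide promoter sets are large enough for it to be reasonable.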

Supplementary Figure 3 Assessment of the IFN-αβ signaling pathway.

(a) Cumulative plot of normalized RNA levels (FPKM) for genes from the IFN-αβ signaling pathway (Reactome), based on RNA–seq data from 23 cell lines. The HeLa and K562 cell lines are highlighted (Kolmogorov test). (b) Heat map showing RNA–seq relative expression (FPKM) for genes from the IFN-αβ signaling pathway (Reactome) expressed at significantly higher levels in HeLa cells as compared to the 22 remaining cell lines (SAM analysis; α = 0.5). (c) Transcription signatures related to stress/interferon response significantly enriched in the set of Epromoter-associated genes in HeLa cells (GREAT tool).
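The cumulative plot in (a), like the TSS density comparisons in Supplementary Figure 2, is assessed with a Kolmogorov test, presumably the two-sample Kolmogorov–Smirnov test. As an illustrative sketch only, the statistic D below is the largest vertical gap between the two empirical cumulative distributions. The P value uses the asymptotic Kolmogorov distribution, which is only approximate for small samples; the function names are hypothetical.

```python
from bisect import bisect_right
from math import exp, sqrt


def ecdf(sorted_vals, t):
    """Fraction of values <= t (right-continuous empirical CDF)."""
    return bisect_right(sorted_vals, t) / len(sorted_vals)


def ks_2samp_d(x, y):
    """Two-sample KS statistic; the supremum is attained at a sample point."""
    xs, ys = sorted(x), sorted(y)
    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in xs + ys)


def ks_pvalue(d, n1, n2, terms=100):
    """Asymptotic P value: Q(lam) = 2 * sum_k (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    lam = sqrt(n1 * n2 / (n1 + n2)) * d
    if lam < 0.3:
        # The series converges poorly here, and Q(lam) is essentially 1.
        return 1.0
    s = sum(2 * (-1) ** (k - 1) * exp(-2 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(max(s, 0.0), 1.0)
```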

Supplementary Figure 4 Enrichment of transcription factors at Epromoters.

(a,b) Average profiles of ChIP–seq signals for ENCODE transcription factors enriched at Epromoters in K562 (a) and HeLa (b) cells. Statistical significance was calculated in a region centered on the TSS (±250 bp) using two-sided Mann–Whitney U tests.

Supplementary Figure 5 Motif enrichment at Epromoters.

(a,b) Heat maps showing the enrichment distribution (log2 (observed/expected)) of the non-redundant collection of motifs obtained by combining transcription factor binding motif (TFBM) databases (Jaspar vertebrates and Hocomoco Human). TFBMs were used to scan the regions extending from −1 kb to +1 kb around Epromoter-associated TSSs, and clustering was performed based on the binding profiles in K562 (a) and HeLa (b) cells. Motifs enriched around the TSS (black line) were selected. (c,d) Significantly enriched motifs in K562 (c) and HeLa (d) cells were identified by comparing the binding enrichment within the promoter region (−200 bp to +50 bp with respect to the TSS; highlighted as orange boxes) between Epromoters and non-Epromoters. Binding site distributions (left), motif logos (middle) and E values (right) are shown only for significantly enriched motifs (E < 0.001; χ2 test). (e) Enrichment of Epromoters and non-Epromoters as a function of the number of different TFBMs found. The enrichment score was calculated as the −log10 (P value) obtained by hypergeometric test.
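Panels (a,b) report positional motif enrichment as log2(observed/expected). One plausible way to compute such a profile is to bin motif hit positions relative to the TSS across the −1 kb to +1 kb window and compare each bin with the mean bin count. This sketch assumes a uniform positional background and adds a pseudocount; neither choice is stated in the legend, and the bin size and pseudocount are hypothetical defaults.

```python
from math import log2


def positional_log2_oe(hit_positions, window=(-1000, 1000), bin_size=50,
                       pseudocount=1.0):
    """log2((observed + pc) / (expected + pc)) per bin of TSS-relative positions.

    `hit_positions` are motif hit coordinates relative to the TSS (bp),
    pooled over all scanned promoters. Expected = mean count per bin.
    """
    start, end = window
    n_bins = (end - start) // bin_size
    counts = [0] * n_bins
    for pos in hit_positions:
        if start <= pos < end:
            counts[(pos - start) // bin_size] += 1
    expected = sum(counts) / n_bins
    return [log2((c + pseudocount) / (expected + pseudocount)) for c in counts]
```

Hits concentrated in the bin containing the TSS give a clear positive peak there, whereas uniformly spread hits give a flat profile at zero.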

Supplementary Figure 6 Proximal and distal correlations of Epromoters with gene expression.

(a) Scatterplots showing the Pearson correlation between the STARR-seq signal of Epromoters and the expression of associated genes. (b) Examples of consistent promoter–promoter interactions observed with different ChIA-PET data sets in K562 cells.

Supplementary Figure 7 Generation of knockout and knock-in cell clones via CRISPR–Cas9.

(a) General strategy for the generation of (E)promoter knockouts. Two gRNAs, G1 and G2, were designed flanking the genomic target to delete the intervening DNA segment. The CRISPR–Cas9 system creates two double-strand breaks (DSBs) 3–4 nt upstream of the PAM sequences (red) and releases the excised DNA (purple). The resulting DSBs are repaired by the NHEJ pathway. The genomic deletion is detected by PCR using primers P1 and P2. (b–h) Assessment of (E)promoter knockouts. Left, IGV screenshots showing the DNase-seq (ENCODE) and RefSeq tracks for targeted regions. The locations of gRNAs (red boxes) and the expected sizes of deleted regions are indicated. Right, PCR validation of biallelic deletion in corresponding cell clones. Details on the gRNA sequences, PCR primers and expected PCR fragments are provided in Supplementary Table 9. (i) Strategy for the generation of the inverted FAF2 Epromoter knock-in. The two gRNAs, G1 and G2, used to generate DSBs are as in the knockout experiment. The repair template contains upstream and downstream homologous arms (HAs) flanking the inverted FAF2 Epromoter. The HDR-mediated repair pathway generates the inverted FAF2 Epromoter knock-in, which is detected by PCR with the combination of two primer pairs (1 + 2) and (2 + 3). (j) PCR validation of a successful inverted FAF2 Epromoter knock-in cell clone using the combination of primers shown in i. (k) RT–PCR detection of antisense (AS) transcription in an Inv.Ep.FAF2 clone. GAPDH was used as a cDNA loading control.
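The deletion strategy in (a) relies on two Cas9 cuts flanking the target. A minimal sketch, assuming 20-nt SpCas9 guides with an NGG PAM and a blunt cut 3 nt upstream of the PAM (the common case within the 3–4 nt range quoted above), locates each guide on either strand of a reference sequence and returns the breakpoint. It then derives the idealized deletion length, ignoring NHEJ indels. The expected P1–P2 product in a biallelic knockout is the wild-type amplicon length minus this deletion. All names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def cut_site(ref: str, guide: str) -> int:
    """0-based breakpoint (between ref[i-1] and ref[i]) for a 20-nt SpCas9 guide.

    Assumes an NGG PAM and a blunt cut 3 nt upstream of the PAM.
    """
    if len(guide) != 20:
        raise ValueError("sketch assumes a 20-nt protospacer")
    i = ref.find(guide)
    if i != -1 and ref[i + 21:i + 23] == "GG":
        # Guide on the + strand: PAM (NGG) occupies ref[i+20:i+23].
        return i + 17
    j = ref.find(revcomp(guide))
    if j >= 3 and ref[j - 3:j - 1] == "CC":
        # Guide on the - strand: its PAM reads CCN at ref[j-3:j] on the + strand.
        return j + 3
    raise ValueError("guide with an adjacent NGG PAM not found")


def expected_deletion(ref: str, guide1: str, guide2: str) -> int:
    """Idealized deletion length between the two breakpoints (clean end joining)."""
    a, b = sorted((cut_site(ref, guide1), cut_site(ref, guide2)))
    return b - a


def deletion_amplicon(wt_amplicon_len: int, deletion_len: int) -> int:
    """Expected P1-P2 product size after the deletion."""
    return wt_amplicon_len - deletion_len
```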

Supplementary Figure 8 Interaction and epigenetic co-regulation of FAF2 and RNF44.

(a) Genomic tracks showing the 4C–seq analysis of interactions between the FAF2 (a) and RNF44 (b) promoters in WT and knockout K562 clones. The viewpoint from the FAF2 Epromoter is indicated by an arrowhead. The specific interaction between the FAF2 and RNF44 promoters is highlighted by the orange box. (b) UCSC Genome Browser tracks for H3K4me3, H3K27ac and H3K36me3 at the FAF2 locus and nearby regions across the HMEC, K562, HeLa and DND41 cell lines.

Supplementary Figure 9 Additional validations of distal gene regulation by Epromoters.

(a,b) IGV tracks for RNA–seq, ChIP–seq and ChIA-PET Pol II data in K562 and HeLa cells at the CSDE1 (a) and TAGLN2 (b) loci and nearby regions. The promoter–promoter interactions for Epromoters are highlighted in red. (c,d) qPCR analysis of gene expression in WT, ΔEp.CSDE1 (c) and ΔEp.TAGLN2 (d) clones. The number following the gene name is the number of independent cell clones. (e) Allelic frequency of the A versus T variant (chr1:160000435) in PIGM transcripts in WT, ΔEp.TAGLN2 homozygous and ΔEp.TAGLN2 heterozygous K562 clones. The total number of reads is indicated for each sample. The significant deviation of allelic frequency in heterozygous clones with respect to homozygous samples was calculated by performing a one-sided Student’s t test. (f) IGV screenshot showing tracks for RNA–seq, ChIP–seq and ChIA-PET Pol II data in K562 cells at the BAZ2B locus and the nearby region. (g) qPCR analysis of gene expression in WT and ΔEp.BAZ2B clones. Knockout of the BAZ2B Epromoter resulted in significant reduction of MARCH7 expression but had no effect on the nearby gene WDSUB1 or BAZ2B (using primers 1 and 2 shown in h). (h) UCSC Genome Browser tracks showing the different BAZ2B transcripts and primers used in g and i. (i) Alternative promoter usage for the BAZ2B gene was assessed by RT–PCR in K562 cells. The smaller fragment size observed in ΔEp.BAZ2B clones corresponds to the deletion of exon 1 (asterisk in h). (j) IGV tracks for FANTOM3 and ENCODE CAGE data, CapStarr-seq regions and RNA junctions around the TSS of the indicated gene. The red color in CapStarr-seq tracks represents active Epromoters. For c, d and g, error bars show s.d. (n = 3 independent RNA/cDNA preparations; ***P < 0.001, **P < 0.01, *P < 0.1, two-sided Student’s t test).

Supplementary Figure 10 Epromoters involved in IFN-α signaling in K562 cells.

(a) Distribution of expression correlations for ChIA-PET interacting gene pairs including at least one Epromoter (red) or excluding Epromoters (gray) and randomly rewired gene pairs (green) using RNA–seq data from ENCODE. Statistical significance was assessed by Kolmogorov test. (b) Examples of clusters of interferon response genes (green labels) associated with Epromoters (red arrows) in HeLa cells. (c–e) qPCR analysis of gene expression in WT, ΔEp.YPEL4 (c,d) and ΔEp.METTL21A (e) cell clones. Error bars show s.d. (n = 3 independent RNA/cDNA preparations). (f) Allelic frequency of the T versus C variant (chr11:57319339) in UBE2L6 transcripts in WT, ΔEp.YPEL4 homozygous and ΔEp.YPEL4 heterozygous K562 clones. The total number of reads is indicated for each sample. The significant deviation of allelic frequency in heterozygous clones with respect to homozygous samples was calculated by performing a one-sided Student’s t test.
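Panel (f), like panel (e) of Supplementary Figure 9, compares allelic frequencies between heterozygous and homozygous knockout clones with a one-sided Student's t test. A minimal sketch, assuming per-clone read counts for the two alleles at the informative variant, computes each clone's allele fraction and the pooled-variance Student t statistic with its degrees of freedom. The one-sided P value could then be read from a t distribution, for example with scipy.stats.ttest_ind(a, b, alternative="greater"). Function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def allele_fraction(allele1_reads: int, allele2_reads: int) -> float:
    """Fraction of reads carrying allele 1 at a heterozygous variant."""
    total = allele1_reads + allele2_reads
    if total == 0:
        raise ValueError("no reads covering the variant")
    return allele1_reads / total


def student_t(a, b):
    """Pooled-variance (Student) t statistic for mean(a) - mean(b), and its df."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df
```

With toy fractions [0.60, 0.62, 0.64] versus [0.50, 0.52, 0.48] (illustrative values, not data from this study), the statistic is about 7.35 on 4 degrees of freedom.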

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–10 and Supplementary Note. (PDF 2563 kb)

Supplementary Table 1

Mouse DHS CapStarr-seq data. (XLSX 418 kb)

Supplementary Table 2

Human promoter CapStarr-seq data. (XLSX 7980 kb)

Supplementary Table 3

Functional enrichment of Epromoter-associated genes. (XLSX 18 kb)

Supplementary Table 4

Transcription factor enrichment. (XLSX 40 kb)

Supplementary Table 5

Summary of gene pairs with promoter–promoter interactions detected by ChIA-PET. (XLSX 361 kb)

Supplementary Table 6

List of gene pairs with ChIA-PET interactions that are involved in the interferon response. (XLSX 10 kb)

Supplementary Table 7

ChIA-PET gene pairs that harbor an eQTL within the ±1.5-kb expanded promoter region and that affect the expression of the distal gene. (XLSX 5009 kb)

Supplementary Table 8

Summary of CapStarr-seq sequencing data. (XLSX 8 kb)

Supplementary Table 9

Summary of ENCODE data used in this study. (XLSX 57 kb)

Supplementary Table 10

Oligonucleotide sequences used in this study. (XLSX 23 kb)

About this article

Cite this article

Dao, L., Galindo-Albarrán, A., Castro-Mondragon, J. et al. Genome-wide characterization of mammalian promoters with distal enhancer functions. Nat Genet 49, 1073–1081 (2017) doi:10.1038/ng.3884
