Complementary Alu sequences mediate enhancer–promoter selectivity

A Publisher Correction to this article was published on 31 July 2023

This article has been updated

Abstract

Enhancers determine spatiotemporal gene expression programs by engaging with long-range promoters (refs. 1–4). However, it remains unknown how enhancers find their cognate promoters. We recently developed an RNA in situ conformation sequencing technology to identify enhancer–promoter connectivity using pairwise interacting enhancer RNAs and promoter-derived noncoding RNAs (refs. 5,6). Here we apply this technology to generate high-confidence enhancer–promoter RNA interaction maps in six additional cell lines. Using these maps, we discover that 37.9% of the enhancer–promoter RNA interaction sites overlap with Alu sequences. These pairwise interacting Alu and non-Alu RNA sequences tend to be complementary and potentially form duplexes. Knockout of Alu elements compromises enhancer–promoter looping, whereas Alu insertion or CRISPR–dCasRx-mediated Alu tethering to unregulated promoter RNAs can create new loops to homologous enhancers. Mapping 535,404 noncoding risk variants back to the enhancer–promoter RNA interaction maps enabled us to construct variant-to-function maps for interpreting their molecular functions, including 15,318 deletions or insertions in 11,677 Alu elements that affect 6,497 protein-coding genes. We further demonstrate that polymorphic Alu insertion at the PTK2 enhancer can promote tumorigenesis. Our study uncovers a principle for determining enhancer–promoter pairing specificity and provides a framework to link noncoding risk variants to their molecular functions.
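As a rough illustration of what "complementary" means for a pair of RNA fragments, the sketch below counts the fraction of positions at which one fragment can pair with the other read in the antiparallel direction, allowing Watson–Crick and G·U wobble pairs. It is a toy, ungapped score and not the authors' analysis; the function name and both sequences are hypothetical.

```python
# Toy sketch (assumption: ungapped, antiparallel pairing; sequences are invented).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def paired_fraction(rna_a: str, rna_b: str) -> float:
    """Fraction of positions where rna_a (5'->3') pairs with rna_b read 3'->5'."""
    a = rna_a.upper().replace("T", "U")
    b = rna_b.upper().replace("T", "U")[::-1]  # reverse so the strands are antiparallel
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum((x, y) in PAIRS for x, y in zip(a, b))
    return matches / n


# Hypothetical fragments: the second is the exact reverse complement of the first.
enhancer_fragment = "GGCUGAGGCAGG"
promoter_fragment = "CCUGCCUCAGCC"
print(paired_fraction(enhancer_fragment, promoter_fragment))
```

A realistic assessment of duplex formation would use a thermodynamic folding model rather than this positional count.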

Fig. 1: EPRI map construction and validation.
Fig. 2: EPRI junctions enrich Alu elements.
Fig. 3: Alu elements dictate enhancer–promoter selectivity.
Fig. 4: Tethering Alu and non-Alu RNAs dictate enhancer–promoter selectivity.
Fig. 5: Alu deletions influence tumorigenesis by modulating EPRIs.

Data availability

RIC-seq data for all the cell lines and chromatin RNA-seq data for HeLa cells shown in this study are available in the GEO under the accession number GSE190214. Source data are provided with this paper.

Code availability

Custom code used for data analysis in this paper can be found at https://github.com/liangliangibp/Alu.

References

  1. Schoenfelder, S. & Fraser, P. Long-range enhancer–promoter contacts in gene expression control. Nat. Rev. Genet. 20, 437–455 (2019).

  2. Sanyal, A., Lajoie, B. R., Jain, G. & Dekker, J. The long-range interaction landscape of gene promoters. Nature 489, 109–113 (2012).

  3. Vakoc, C. R. et al. Proximity among distant regulatory elements at the β-globin locus requires GATA-1 and FOG-1. Mol. Cell 17, 453–462 (2005).

  4. Amano, T. et al. Chromosomal dynamics at the Shh locus: limb bud-specific differential regulation of competence and active transcription. Dev. Cell 16, 47–57 (2009).

  5. Cai, Z. et al. RIC-seq for global in situ profiling of RNA–RNA spatial interactions. Nature 582, 432–437 (2020).

  6. Cao, C. et al. Global in situ profiling of RNA–RNA spatial interactions with RIC-seq. Nat. Protoc. 16, 2916–2946 (2021).

  7. Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci. USA 106, 9362–9367 (2009).

  8. Corradin, O. & Scacheri, P. C. Enhancer variants: evaluating functions in common disease. Genome Med. 6, 85 (2014).

  9. Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).

  10. Nasser, J. et al. Genome-wide enhancer maps link risk variants to disease genes. Nature 593, 238–243 (2021).

  11. Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).

  12. Li, G. et al. Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation. Cell 148, 84–98 (2012).

  13. Mumbach, M. R. et al. Enhancer connectome in primary human cells identifies target genes of disease-associated DNA elements. Nat. Genet. 49, 1602–1612 (2017).

  14. Xue, Y. et al. Sequential regulatory loops as key gatekeepers for neuronal reprogramming in human cells. Nat. Neurosci. 19, 807–815 (2016).

  15. Morf, J. et al. RNA proximity sequencing reveals the spatial organization of the transcriptome in the nucleus. Nat. Biotechnol. 37, 793–802 (2019).

  16. International HapMap Consortium et al. A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–861 (2007).

  17. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  18. GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).

  19. Tang, Z. et al. CTCF-mediated human 3D genome architecture reveals chromatin topology for transcription. Cell 163, 1611–1627 (2015).

  20. Hu, G. et al. Systematic screening of CTCF binding partners identifies that BHLHE40 regulates CTCF genome-wide distribution and long-range chromatin interactions. Nucleic Acids Res. 48, 9606–9620 (2020).

  21. Weintraub, A. S. et al. YY1 is a structural regulator of enhancer–promoter loops. Cell 171, 1573–1588.e28 (2017).

  22. Fulco, C. P. et al. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).

  23. Deng, W. et al. Controlling long-range genomic interactions at a native locus by targeted tethering of a looping factor. Cell 149, 1233–1244 (2012).

  24. Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).

  25. Nora, E. P. et al. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485, 381–385 (2012).

  26. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).

  27. Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).

  28. Deininger, P. Alu elements: know the SINEs. Genome Biol. 12, 236 (2011).

  29. Lu, Z. et al. RNA duplex map in living cells reveals higher-order transcriptome structure. Cell 165, 1267–1279 (2016).

  30. Wu, Y. et al. Correction of a genetic disease in mouse via use of CRISPR–Cas9. Cell Stem Cell 13, 659–662 (2013).

  31. Konermann, S. et al. Transcriptome engineering with RNA-targeting type VI-D CRISPR effectors. Cell 173, 665–676.e14 (2018).

  32. MacArthur, J. et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 45, D896–D901 (2017).

  33. Law, M. H. et al. Genome-wide meta-analysis identifies five new susceptibility loci for cutaneous malignant melanoma. Nat. Genet. 47, 987–995 (2015).

  34. Sud, A. et al. Genome-wide association study of classical Hodgkin lymphoma identifies key regulators of disease susceptibility. Nat. Commun. 8, 1892 (2017).

  35. Harty, L. C. et al. HLA-DR, HLA-DQ, and TAP genes in familial Hodgkin disease. Blood 99, 690–693 (2002).

  36. Van Nostrand, E. L. et al. A large-scale binding and functional map of human RNA-binding proteins. Nature 583, 711–719 (2020).

  37. Payer, L. M. et al. Structural variants caused by Alu insertions are associated with risks for many human diseases. Proc. Natl Acad. Sci. USA 114, E3984–E3992 (2017).

  38. Sudmant, P. H. et al. An integrated map of structural variation in 2,504 human genomes. Nature 526, 75–81 (2015).

  39. Sulzmaier, F. J., Jean, C. & Schlaepfer, D. D. FAK in cancer: mechanistic findings and clinical applications. Nat. Rev. Cancer 14, 598–610 (2014).

  40. Ye, R., Cao, C. & Xue, Y. Enhancer RNA: biogenesis, function, and regulation. Essays Biochem. 64, 883–894 (2020).

  41. Lam, M. T. et al. Rev-Erbs repress macrophage gene expression by inhibiting enhancer-directed transcription. Nature 498, 511–515 (2013).

  42. Li, W. et al. Functional roles of enhancer RNAs for oestrogen-dependent transcriptional activation. Nature 498, 516–520 (2013).

  43. Schaukowitch, K. et al. Enhancer RNA facilitates NELF release from immediate early genes. Mol. Cell 56, 29–42 (2014).

  44. Lai, F. et al. Activating RNAs associate with Mediator to enhance chromatin architecture and transcription. Nature 494, 497–501 (2013).

  45. Li, X. et al. GRID-seq reveals the global RNA–chromatin interactome. Nat. Biotechnol. 35, 940–950 (2017).

  46. Henninger, J. E. et al. RNA-mediated feedback control of transcriptional condensates. Cell 184, 207–225.e24 (2021).

  47. Sharp, P. A., Chakraborty, A. K., Henninger, J. E. & Young, R. A. RNA in formation and regulation of transcriptional condensates. RNA 28, 52–57 (2022).

  48. Quinodoz, S. A. et al. RNA promotes the formation of spatial compartments in the nucleus. Cell 184, 5775–5790.e30 (2021).

  49. Bhat, P., Honson, D. & Guttman, M. Nuclear compartmentalization as a mechanism of quantitative control of gene expression. Nat. Rev. Mol. Cell Biol. 22, 653–670 (2021).

  50. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).

  51. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).

  52. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).

  53. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

  54. Yu, G. C., Wang, L. G., Han, Y. Y. & He, Q. Y. clusterProfiler: an R package for comparing biological themes among gene clusters. Omics 16, 284–287 (2012).

  55. Huang da, W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57 (2009).

  56. Lai, F., Gardini, A., Zhang, A. & Shiekhattar, R. Integrator mediates the biogenesis of enhancer RNAs. Nature 525, 399–403 (2015).

  57. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).

  58. Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).

  59. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  60. Ramirez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).

  61. Robinson, J. T. et al. Integrative Genomics Viewer. Nat. Biotechnol. 29, 24–26 (2011).

  62. Gaidatzis, D., Burger, L., Florescu, M. & Stadler, M. B. Analysis of intronic and exonic reads in RNA-seq data characterizes transcriptional and post-transcriptional regulation. Nat. Biotechnol. 33, 722–729 (2015).

  63. Concordet, J. P. & Haeussler, M. CRISPOR: intuitive guide selection for CRISPR/Cas9 genome editing experiments and screens. Nucleic Acids Res. 46, W242–W245 (2018).

  64. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).

  65. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).

  66. Kryuchkova-Mostacci, N. & Robinson-Rechavi, M. A benchmark of gene expression tissue-specificity metrics. Brief Bioinform. 18, 205–214 (2017).

  67. Bailey, T. L. et al. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 37, W202–W208 (2009).

  68. Kent, W. J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).

  69. Grant, C. E., Bailey, T. L. & Noble, W. S. FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011).

  70. Reuter, J. S. & Mathews, D. H. RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics 11, 129 (2010).

  71. Licatalosi, D. D. et al. HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature 456, 464–469 (2008).

  72. Boix, C. A., James, B. T., Park, Y. P., Meuleman, W. & Kellis, M. Regulatory genomic circuitry of human disease loci by integrative epigenomics. Nature 590, 300–307 (2021).

  73. Thurman, R. E. et al. The accessible chromatin landscape of the human genome. Nature 489, 75–82 (2012).

  74. Cao, X. et al. Polymorphic mobile element insertions contribute to gene expression and alternative splicing in human tissues. Genome Biol. 21, 185 (2020).

Acknowledgements

We thank F. Lai for critically reading this manuscript; and B. Zhou, J. Su and W. Zhai for helpful suggestions on handling repeat elements. This work was supported by the National Key Research and Development Program of China (2022YFA1303300), the National Natural Science Foundation of China (91940306, 32025008, 32130064, 91740201 and 81921003), the Strategic Priority Program of Chinese Academy of Sciences (XDB37000000), and the K.C. Wong Education Foundation (GJTD-2020-06) to Y.X.; by the National Key Research and Development Program of China (2018YFA0108700) and the Guangdong Provincial Special Support Program for Prominent Talents (2021JC06Y656) to P.Z.; by the National Natural Science Foundation of China (31900465 and 32070620) to C.C.; and by the National Natural Science Foundation of China (31970610) to L.J.

Author information

Contributions

Y.X. conceived and supervised the project. Z.C. cultured cells and constructed the RIC-seq library. L.L. and C.C. performed the bioinformatics analysis and prepared the figures. R.Y. and L.J. performed the LNA ASO knockdown and qPCR. L.J. and J.C. built the Alu-knockout and Alu-knock-in cell lines, and performed the Alu tethering assay with the help of X. Yu, J.Z., Z.B. and R.W. D.W. performed the chromatin RNA-seq. L.J., D.W. and Z.C. did the 3C–qPCR experiments. L.J. performed the DNA and RNA FISH, the colony formation assay, quantified the proliferation rate and invasion ability of HeLa cells, and generated the nude mouse xenograft model. X. Yang and P.Z. advised on bioinformatics analysis. Y.X. wrote the manuscript with the help of C.C. and L.L.

Corresponding author

Correspondence to Yuanchao Xue.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 EPRIs identification and colocalization of the corresponding RNAs.

a, Diagram of strategies for defining enhancer and promoter regions, assigning chimeric reads to enhancer-promoter pairs, and identifying high-confidence EPRIs. b, Pie chart showing the number of RIC-seq identified enhancer-promoter and promoter-promoter RNA interactions for each cell line. c, EPRI maps deduced by RIC-seq. Chr. represents chromosome. The individual chromosome is shown in different colors. d, Spanning distance of EPRIs in the linear genome. Mb, megabases. e, Percentage of enhancers paired with different numbers of promoters. f, Percentage of promoters paired with different numbers of enhancers. g, smFISH showing pairwise interacting TE6189-MGST1 (linear distance: 4.0 Mb), TE6189-TUBA1B (linear distance: 28.8 Mb), SE351-CFD (linear distance: 68.0 Kb), SE351-CNN2 (linear distance: 234.9 Kb), SE351-PTBP1 (linear distance: 5.7 Kb), SE351-POLRMT (linear distance: 134.9 Kb), SE135-CD44 (linear distance: 30.0 Mb), SE135-RBM4 (linear distance: 1.2 Mb), SE638-EXT1 (linear distance: 9.1 Mb), and SE138-SYT12 (linear distance: 25.0 Kb) noncoding RNAs are co-localized in HeLa cells. DAPI, blue; eRNA, green; promoter RNA, red. Scale bar, 5 μm. h, TE6189 and SE351 are not co-localized with their non-target promoter-derived uaRNAs (SLCO1B1 and FGF22). The linear distance for TE6189-SLCO1B1 and SE351-FGF22 is 584.6 Kb and 147.3 Kb, respectively. Scale bar, 5 μm. The experiments in g and h were independently repeated three times with similar results. i, Percentage of cells showing colocalization for examined eRNA and uaRNA pairs within the nucleus in 16 cells. j, DNA FISH showing TE6189 and TUBA1B loci are co-localized, whereas TE6189 and the non-target gene SLC6A12 (linear distance: 20.2 Mb) are not co-localized. Scale bar, 5 μm. k, Percentage of cells showing colocalization for TE6189 and TUBA1B loci in 20 cells. TE6189 non-target gene SLC6A12 serves as a negative control.

Extended Data Fig. 2 Validation of RIC-seq detected EPRIs.

a–c, qPCR shows reduced transcription of genes linked to super-enhancer SE135 (a), SE638 (b), and SE138 (c) after knocking down the corresponding eRNAs with LNA ASOs. The nearby genes CHORDC1, PCAT1, and STIP1 are locus-specific negative controls. The blue arc lines represent chimeric reads linking enhancers and promoters. Genes on the Watson or Crick strands are shown as red and blue boxes, respectively. A non-targeting LNA ASO and LNA ASOs targeting unrelated RNAs (Unrelated LNA) serve as negative controls. Data are mean ± s.d.; n = 3 biological replicates, two-tailed unpaired Student’s t-test. d, Scatter plots showing that the two biological replicates of chromatin RNA-seq are highly correlated. R, Pearson correlation coefficient. RPM, reads per million mapped reads. e, Scatter plot showing changes in intronic read counts after knocking down SE351 eRNA, measured by chromatin RNA-seq (ChromRNA-seq). Red dots, SE351-linked genes; Pink dot, SE351 eRNA; Blue dot, GAPDH; Green dot, ACTB. f, Bar graph showing that the average fold change of SE351-linked genes is significantly lower than that of an equal number of randomly selected genes across the entire genome. n = 1000 (repeated 1000 times). Data are the mean ± s.e.m. P-value is calculated by a one-sided permutation test. g, Paternal or maternal mutations (red lines) that influence EPRIs usually lead to lower gene expression in an allele-specific manner. Randomly selected control genes are shown in black lines. The median value for each group is marked. P-values are determined by the one-tailed Kolmogorov–Smirnov test. h, Genes whose EPRIs are affected by polymorphic variants in the human populations showed larger expression variances and higher Gini indexes. Blue, randomly selected abundance-matched gene sets. Data are the mean ± s.d.; n = 1000 (repeated 1000 times).
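The one-sided permutation test mentioned in panel f can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name, the add-one correction and the toy fold-change values are all hypothetical, and the background is simply a list of values from which gene sets of matching size are drawn.

```python
import random
from statistics import mean


def permutation_p_lower(linked_values, background_values, n_iter=1000, seed=0):
    """One-sided P-value that the linked set has a lower mean than random sets.

    Assumption: random sets are drawn without replacement from background_values
    and have the same size as linked_values.
    """
    rng = random.Random(seed)
    observed_mean = mean(linked_values)
    k = len(linked_values)
    hits = 0
    for _ in range(n_iter):
        sample = rng.sample(background_values, k)
        if mean(sample) <= observed_mean:
            hits += 1
    # Add-one correction (an assumption here) avoids reporting P = 0.
    return (hits + 1) / (n_iter + 1)


# Hypothetical log2 fold changes after eRNA knockdown.
linked = [-2.0, -1.8, -2.2]
background = [0.1 * i - 0.5 for i in range(60)]  # values from -0.5 to 5.4
p_value = permutation_p_lower(linked, background)
print(p_value)
```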

Extended Data Fig. 3 Comparison between EPRIs and enhancer-promoter loops identified by DNA proximity methods.

a, Genomic spanning distances of EPRIs identified by RIC-seq and enhancer-promoter loops detected by DNA proximity methods. Dashed lines indicate the peaks of spanning distances. b, Venn diagram showing the overlap between enhancer-promoter loops identified by DNA proximity contacts (Pol II ChIA-PET, in situ Hi-C, and HiChIP) and EPRIs detected by RIC-seq. c, EPRIs with more RIC-seq chimeric reads show higher overlapping percentages with enhancer-promoter loops detected by Pol II ChIA-PET, Hi-C, and HiChIP. d, Intra-chromosomal EPRIs over 2 Mb apart (left panel) or inter-chromosomal EPRIs (right panel) showed higher Hi-C signals than randomly paired enhancers and promoters. Data are the mean ± s.e.m., one-tailed unpaired Student’s t-test. e, EPRIs with more RIC-seq chimeric reads have higher DNA-DNA interaction strength measured by Hi-C in GM12878, IMR90, and K562 cells. Data are the mean ± s.e.m., two-tailed unpaired Student’s t-test. f, Bar graph showing that the effect size (fractional effect on gene expression upon CRISPRi perturbation of an enhancer) for enhancer-gene pairs supported by EPRIs is lower than that for pairs without EPRIs. Data are the mean ± s.e.m., one-tailed, Wilcoxon rank-sum test. g, The cumulative curve of ABC scores for enhancer-gene pairs supported by EPRIs versus those without EPRIs. P-values are determined by the two-tailed Kolmogorov–Smirnov test. h, Contact frequency of EPRIs derived from RIC-seq in seven cell lines as a function of genomic distance across the genome shows a power law scaling with a slope of −0.65.
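A power-law scaling such as the one in panel h corresponds to a straight line on log-log axes, so its exponent can be estimated as the least-squares slope of log frequency against log distance. The sketch below shows that calculation on synthetic data generated with an exponent of −0.65; the function name and the data are illustrative assumptions, and the binning the authors applied is not reproduced.

```python
import math


def loglog_slope(distances, frequencies):
    """Least-squares slope of log10(frequency) versus log10(distance)."""
    xs = [math.log10(d) for d in distances]
    ys = [math.log10(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic contact frequencies that follow distance ** -0.65 exactly.
toy_distances = [1e4, 1e5, 1e6, 1e7]
toy_frequencies = [d ** -0.65 for d in toy_distances]
slope = loglog_slope(toy_distances, toy_frequencies)
print(round(slope, 2))
```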

Extended Data Fig. 4 EPRI features.

a, Enhancer-regulated genes tend to be functionally related compared with randomly selected genes. n = 7 cell lines. The center line of the box plot represents the median, the box borders represent the first (Q1) and third (Q3) quartiles, and the whiskers are the most extreme data points within 1.5× the interquartile range (from Q1 to Q3). Two-tailed unpaired Student’s t-test. b, Enhancer-regulated functional clusters. Orange dots, enhancers; outside dots, target genes. c, qPCR showing reduced transcription of target genes linked to TE19551 after eRNA knockdown using the mixture of two independent LNA ASOs. Non-target gene HLA-DQA1 serves as a negative control. Data are the mean ± s.d.; n = 3 biological replicates, two-tailed unpaired Student’s t-test. d,e, Pairwise interacting RNA fragments from enhancers (d) and promoters (e) enrich motifs (red bars) that can cover most of the consensus sequence of the Alu element. The color scale indicates E-value for each motif calculated by MEME. f–j, The relative distribution of LINE/L1 (f), LINE/L2 (g), SINE/MIR (h), LTR/ERV1 (i), and LTR/ERVL-MaLR (j) repeat elements around pairwise interacting RNA fragments in enhancers (left) and promoters (right). k, The relative distribution of mouse B1 elements around EPRIs in MEF cells. l, Bar graph showing the percentage of Hi-C and HiChIP detected enhancer-promoter loops that contained at least one Alu element. m,n, Enhancer-gene pairs supported by EPRIs (red line, n = 58,594) have higher nascent RNA levels than those supported by Hi-C (m, blue line, n = 3,749) or HiChIP (n, blue line, n = 300,734). Two-tailed Kolmogorov–Smirnov test. o–q, Alu elements overlapped by EPRI chimeric reads tend to have higher ATAC-seq (o), DNase-seq (p), and GRO-seq signals (q) than other Alu elements from the same enhancer-promoter pairs. Data are the mean ± s.e.m., two-tailed unpaired Student’s t-test.

Extended Data Fig. 5 EPRIs contain complementary Alus from the same subfamily.

a, Pie chart showing enriched Alu subtypes at interacting enhancer-promoter RNAs. b, Heatmap showing different Alu sequences enriched at pairwise interacting enhancer-promoter RNAs. Enhancer RNA, left; promoter RNA, right. Color scale, the proportion of Alu elements in each EPRI. c, EPRIs have more Alu elements from the same subfamily than randomly shuffled enhancer-promoter pairs (red bars). Data are the mean ± s.d.; n = 1000 (repeat 1000 times), one-sided permutation test. d, Clustering of Alu elements based on sequences from RepeatMasker. Alu subfamilies with lengths over 250 bp are used for this analysis. e, Alu RNAs from the same subfamilies tend to interact with each other. Color scale, fold enrichment of the observed to expected Alu-Alu RNA interactions; Circle size, −log10 P-values; two-tailed binomial test. Chimeric reads linking Alu sequences are 144,358 for 7 cell lines. Row, enhancer Alus; Column, promoter Alus. f, Box plots showing the EPRI-covered regions tend to contain reverse-oriented Alu sequences in seven cell lines (n = 7). Each point represents one cell line. The center line of the box plot represents the median, the box borders represent the first (Q1) and third (Q3) quartiles, and the whiskers are the most extreme data points within 1.5 × the interquartile range (from Q1 to Q3). Two-tailed paired Student’s t-test. g, h, PARIS chimeric reads are enriched at Alu sequences of pairwise interacting eRNAs (g) and uaRNAs (or promoter RNAs) (h) detected by RIC-seq. i, EPRIs overlapped with PARIS signals. Black arc lines, PARIS chimeric reads; Blue arc lines, RIC-seq chimeric reads; Alu, purple or red boxes. j, Diagram of the MFE calculation strategy. k, Pairwise interacting Alus (orange line) have a lower minimum free energy density than the shuffled controls (blue line). Dashed lines represent the median values.
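The observed-to-expected fold enrichment in panel e can be illustrated with a simple contingency calculation. The sketch below assumes that the expected count for a subfamily pair is the product of its row and column totals divided by the number of chimeric reads (independence of enhancer and promoter subfamilies); the legend does not state how the expectation was derived, so this is an assumption, and the read counts are invented. The binomial test is omitted.

```python
from collections import Counter


def fold_enrichment(pairs):
    """Observed/expected ratio for each (enhancer Alu, promoter Alu) subfamily pair.

    Assumption: expected = row_total * column_total / total_reads.
    """
    total = len(pairs)
    observed = Counter(pairs)
    row_totals = Counter(enhancer for enhancer, _ in pairs)
    col_totals = Counter(promoter for _, promoter in pairs)
    return {
        pair: count / (row_totals[pair[0]] * col_totals[pair[1]] / total)
        for pair, count in observed.items()
    }


# Hypothetical chimeric reads, one tuple per read.
toy_reads = (
    [("AluSx", "AluSx")] * 4
    + [("AluY", "AluY")] * 4
    + [("AluSx", "AluY")]
    + [("AluY", "AluSx")]
)
enrichment = fold_enrichment(toy_reads)
print(enrichment[("AluSx", "AluSx")], enrichment[("AluSx", "AluY")])
```

In this toy example, same-subfamily pairs are enriched above 1 and cross-subfamily pairs fall below 1.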

Extended Data Fig. 6 Alu elements modulate the transcription of enhancer-linked genes.

a, qPCR showing the reduced transcription of AEBP2 after Alu knockout from the linked enhancer TE6189 (left) or its promoter (right). The nearby genes PYROXD1 and ETNK1 serve as negative controls. A genome browser view of DNase-seq, H3K27ac, GRO-seq, and RIC-seq signals around the ablated Alu elements in the grey shadow region is shown as inserts in the middle. Red boxes represent Alu elements. b, qPCR showing the reduced transcription of SPTBN2 after Alu knockout from the super-enhancer SE138 (left) or its promoter (right). The nearby genes GRK2 and SYT12 serve as negative controls. c, qPCR showing the reduced transcription of SET after Alu knockout from TE30536 (left) or its promoter (right). The nearby genes NUP188 and ENDOG serve as negative controls. d, qPCR showing the reduced transcription of ID1 after Alu knockout from the super-enhancer SE424 (left) or its promoter (right). The nearby genes KIF3B and HM13 serve as negative controls. e–g, qPCR showing the increased transcription of FOXS1 (e), FRG1BP (f), and REM1 (g) after tethering Alu RNA to their promoter-derived uaRNAs. The genes ID1 and KIF3B serve as negative controls. sg-empty, sgRNA only; sgNT-Alu, non-targeting sgRNA but fused with Alu; sgRNA-GFP, targeting sgRNA but fused with GFP RNA. Data in a–g are the mean ± s.d.; n = 3 independent replicates from two different cell lines, two-tailed unpaired Student’s t-test.

Extended Data Fig. 7 Non-Alu complementary sequences can mediate EPRIs.

a, Pairwise interacting eRNA-uaRNA fragments without overlapped Alu sequences (red line, n = 580,281 chimeric reads) showed significantly lower MFE values than randomly shuffled sequences (blue line). Two-tailed Kolmogorov–Smirnov test was used to calculate the P-value. b, Heatmap showing the purine content around the pairwise interacting eRNA (left) and uaRNA (or promoter RNA) fragments (right). The line plots with different colors illustrate the nucleotide content of each non-Alu RNA fragment. The color scale indicates purine content. c, The frequency of pyrimidine in eRNA fragments (left, cytosine; right, uracil) positively correlates with the frequency of purine in uaRNA fragments (left, guanine; right, adenine). ρ, Spearman correlation coefficient. Data are the mean ± s.d., and the two-tailed correlation test (cor. test) in R was used to calculate the P-value, n = 580,281 chimeric reads. d, RIC-seq detected complementary base pairing between TE11866 eRNA and CCDC200 promoter RNA was supported by PARIS data. Dark and light blue arc lines represent PARIS or RIC-seq chimeric reads, respectively. The RNA duplex formed between TE11866 eRNA and CCDC200 promoter RNA is shown at the bottom panel. ΔG, free energy.
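The correlation in panel c compares, across interacting pairs, the pyrimidine frequency of the eRNA fragment with the frequency of the matching purine in the uaRNA fragment. A minimal pure-Python sketch of a Spearman coefficient for cytosine versus guanine follows; the helper names and the four fragment pairs are hypothetical, and the P-value calculation is not reproduced.

```python
def base_frequency(seq, base):
    """Fraction of positions in seq equal to base (seq must be non-empty)."""
    return seq.upper().count(base) / len(seq)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def spearman(xs, ys):
    """Spearman rho computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical interacting fragment pairs (eRNA, uaRNA).
toy_pairs = [("CCCCAU", "GGGGAU"), ("CCAUAU", "GGGAUA"),
             ("CAUAUA", "GAUAUA"), ("AUAUAU", "AUAUAA")]
cytosine = [base_frequency(e, "C") for e, _ in toy_pairs]
guanine = [base_frequency(u, "G") for _, u in toy_pairs]
rho = spearman(cytosine, guanine)
print(rho)
```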

Extended Data Fig. 8 Regulatory mechanisms of GWAS identified noncoding variants.

a–b, The overlapped variant-gene pairs between GWAS variants-to-function maps and the maps built by EpiMap, ENCODE, or ABC model across all biosamples (a) and the 7 cell types (b) used in our study. c–d, Distribution of EPRI chimeric reads around DHSs located at enhancers (c) and promoters (d). e–f, GRO-seq signals around DHSs located at enhancers (e) and promoters (f) in HeLa cells. g, Mapping GWAS-associated noncoding variants into EPRI map to construct a variant-to-function map. Mutated Alus in enhancers or promoters are shown as red triangles and circles. h, Enriched KEGG pathways for the affected genes in panel g. i, GWAS variant rs6914598 in SE1838 is linked to the promoter of BCAT2. j, The rs9269081 in TE19551 is linked to the promoter of HLA-DRB5. Orange and green boxes represent enhancers and promoters (Pro.). k,l, RBP-binding profiles around the interacting enhancer and promoter RNA fragments in K562 (k) and HepG2 cells (l). The color scale indicates normalized complexity. m, Schematic diagram of Jaccard similarity index used to analyze RBPs co-bound at interacting eRNAs and uaRNAs (or promoter RNAs). n,o, Boxplot showing EPRIs have a higher Jaccard index (red box) than randomly paired enhancers and promoters (blue box) in K562 (n) and HepG2 cells (o). The center line of the box plot represents the median, the box borders represent the first (Q1) and third (Q3) quartiles, and the whiskers are the most extreme data points within 1.5× the interquartile range (from Q1 to Q3). P-values are determined by a two-tailed unpaired Student’s t-test. EPRI chimeric reads are 47,876 and 30,181 for K562 and HepG2 cells, respectively. p,q, SNVs were preferentially positioned around the eCLIP peaks for RBPs enriched on eRNAs and uaRNAs in K562 (p) and HepG2 cells (q).
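The Jaccard similarity index in panel m is the size of the intersection of two sets divided by the size of their union. A minimal sketch for RBPs bound at an interacting eRNA and uaRNA is shown below; the function name is hypothetical and the binding sets are invented for illustration rather than taken from the eCLIP data.

```python
def jaccard_index(rbps_a, rbps_b):
    """Jaccard index |A ∩ B| / |A ∪ B|; returns 0.0 when both sets are empty."""
    a, b = set(rbps_a), set(rbps_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical RBP sets bound at one interacting eRNA-uaRNA pair.
erna_rbps = {"HNRNPK", "PTBP1", "ILF3"}
uarna_rbps = {"PTBP1", "ILF3", "FUS"}
print(jaccard_index(erna_rbps, uarna_rbps))
```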

Extended Data Fig. 9 Alu insertions or deletions at enhancers can influence tumorigenesis.

a–f, The variant-to-function maps were individually constructed for six different cell types, including GM12878 (a), H1 hESC (b), IMR90 (c), HepG2 (d), hNPC (e), and K562 (f). g, Enriched KEGG pathways for genes potentially affected by mutated Alu elements through disruption of EPRIs (red bar). Randomly selected EPRI-regulated genes served as the control (blue bar, repeated 100 times). Data are the mean ± s.e.m., two-tailed unpaired Student’s t-test. h, Meta plot showing that ICGC variants tend to be depleted around Alu elements near known oncogenes (n = 691) but enriched around tumor suppressor genes (TSG, n = 878). The P-value was determined by a one-sample, two-tailed Student’s t-test. Alu mutations may impair enhancer-promoter connectivity to influence the transcription of cancer-related genes and induce tumorigenesis. i, qPCR showing reduced expression of PTK2 after knockout of the Alu element from the corresponding enhancer (blue bar); subsequent ectopic expression of PTK2 fully rescued the reduction (red bar). j, MTT assay showing reduced cell proliferation after knockout of the Alu element from TE29256. Ectopic expression of PTK2 reversed the proliferation phenotype. n = 6 biological replicates. k, Colony formation assay showing that knockout of the Alu element from TE29256 in HeLa cells reduces cell proliferation. In contrast, ectopic expression of PTK2 fully reversed the proliferation phenotype. l, Transwell assay showing that knockout of the Alu element from TE29256 decreases cell metastasis, and ectopic expression of PTK2 can reverse the effect. Scale bar, 50 μm. The experiments in i, k, and l were independently repeated three times with similar results. Data in i–l are the mean ± s.d. The P-values in panels i, k, and l were calculated using a two-tailed unpaired Student’s t-test. The P-values in panel j were calculated using a one-tailed paired Student’s t-test.

Extended Data Fig. 10 eRNA interacted intronic sites enrich Alu elements.

a, Alu elements are enriched around the eRNA interacting sites in introns. b, The interaction preferences of enhancer Alu RNA sequences and intronic Alu RNA sequences revealed by RIC-seq.

Supplementary information

Supplementary Information

This file contains Supplementary Notes 1 and 2, and additional references. Supplementary Note 1: Chromatin RNA-seq validates SE351-linked genes. This file contains a discussion of the regulatory specificity of EPRIs. Supplementary Note 2: Functional implication of Alu-associated genetic variants in EPRIs. This file contains a discussion of the potential functions of Alu-associated genetic variants in cancer.

Reporting Summary

Supplementary Table 1

Mapping results of RIC-seq libraries. This table contains the numbers of raw reads, clean reads, uniquely mapped reads, uniquely mapped chimeric reads, and the percentage of chimeric reads.

Supplementary Table 2

Risk variants affected EPRIs. This table contains genes that may be affected by ICGC variants, GWAS variants, or Alu deletions by impairing EPRIs.

Supplementary Table 3

List of primers, probes, sgRNAs, and LNA ASOs used in this study. This table contains the sequence of primers, probes, sgRNAs, and LNA ASOs used in this study.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liang, L., Cao, C., Ji, L. et al. Complementary Alu sequences mediate enhancer–promoter selectivity. Nature 619, 868–875 (2023). https://doi.org/10.1038/s41586-023-06323-x

  • DOI: https://doi.org/10.1038/s41586-023-06323-x
