Abstract
Enhancers determine spatiotemporal gene expression programs by engaging with long-range promoters1,2,3,4. However, it remains unknown how enhancers find their cognate promoters. We recently developed an RNA in situ conformation sequencing technology to identify enhancer–promoter connectivity using pairwise interacting enhancer RNAs and promoter-derived noncoding RNAs5,6. Here we apply this technology to generate high-confidence enhancer–promoter RNA interaction maps in six additional cell lines. Using these maps, we discover that 37.9% of the enhancer–promoter RNA interaction sites overlap with Alu sequences. These pairwise interacting Alu and non-Alu RNA sequences tend to be complementary and potentially form duplexes. Knockout of Alu elements compromises enhancer–promoter looping, whereas Alu insertion or CRISPR–dCasRx-mediated Alu tethering to unregulated promoter RNAs can create new loops to homologous enhancers. Mapping 535,404 noncoding risk variants back to the enhancer–promoter RNA interaction maps enabled us to construct variant-to-function maps for interpreting their molecular functions, including 15,318 deletions or insertions in 11,677 Alu elements that affect 6,497 protein-coding genes. We further demonstrate that polymorphic Alu insertion at the PTK2 enhancer can promote tumorigenesis. Our study uncovers a principle for determining enhancer–promoter pairing specificity and provides a framework to link noncoding risk variants to their molecular functions.
Data availability
RIC-seq data for all the cell lines and chromatin RNA-seq for HeLa cells shown in this study are available in the GEO under the accession number GSE190214. Source data are provided with this paper.
Code availability
Custom codes used for data analysis in this paper can be found at https://github.com/liangliangibp/Alu.
Change history
31 July 2023
A Correction to this paper has been published: https://doi.org/10.1038/s41586-023-06475-w
References
Schoenfelder, S. & Fraser, P. Long-range enhancer–promoter contacts in gene expression control. Nat. Rev. Genet. 20, 437–455 (2019).
Sanyal, A., Lajoie, B. R., Jain, G. & Dekker, J. The long-range interaction landscape of gene promoters. Nature 489, 109–113 (2012).
Vakoc, C. R. et al. Proximity among distant regulatory elements at the β-globin locus requires GATA-1 and FOG-1. Mol. Cell 17, 453–462 (2005).
Amano, T. et al. Chromosomal dynamics at the Shh locus: limb bud-specific differential regulation of competence and active transcription. Dev. Cell 16, 47–57 (2009).
Cai, Z. et al. RIC-seq for global in situ profiling of RNA–RNA spatial interactions. Nature 582, 432–437 (2020).
Cao, C. et al. Global in situ profiling of RNA–RNA spatial interactions with RIC-seq. Nat. Protoc. 16, 2916–2946 (2021).
Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci. USA 106, 9362–9367 (2009).
Corradin, O. & Scacheri, P. C. Enhancer variants: evaluating functions in common disease. Genome Med. 6, 85 (2014).
Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
Nasser, J. et al. Genome-wide enhancer maps link risk variants to disease genes. Nature 593, 238–243 (2021).
Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
Li, G. et al. Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation. Cell 148, 84–98 (2012).
Mumbach, M. R. et al. Enhancer connectome in primary human cells identifies target genes of disease-associated DNA elements. Nat. Genet. 49, 1602–1612 (2017).
Xue, Y. et al. Sequential regulatory loops as key gatekeepers for neuronal reprogramming in human cells. Nat. Neurosci. 19, 807–815 (2016).
Morf, J. et al. RNA proximity sequencing reveals the spatial organization of the transcriptome in the nucleus. Nat. Biotechnol. 37, 793–802 (2019).
International HapMap Consortium et al. A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–861 (2007).
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
Tang, Z. et al. CTCF-mediated human 3D genome architecture reveals chromatin topology for transcription. Cell 163, 1611–1627 (2015).
Hu, G. et al. Systematic screening of CTCF binding partners identifies that BHLHE40 regulates CTCF genome-wide distribution and long-range chromatin interactions. Nucleic Acids Res. 48, 9606–9620 (2020).
Weintraub, A. S. et al. YY1 is a structural regulator of enhancer–promoter loops. Cell 171, 1573–1588.e28 (2017).
Fulco, C. P. et al. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).
Deng, W. et al. Controlling long-range genomic interactions at a native locus by targeted tethering of a looping factor. Cell 149, 1233–1244 (2012).
Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).
Nora, E. P. et al. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485, 381–385 (2012).
Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
Deininger, P. Alu elements: know the SINEs. Genome Biol. 12, 236 (2011).
Lu, Z. et al. RNA duplex map in living cells reveals higher-order transcriptome structure. Cell 165, 1267–1279 (2016).
Wu, Y. et al. Correction of a genetic disease in mouse via use of CRISPR–Cas9. Cell Stem Cell 13, 659–662 (2013).
Konermann, S. et al. Transcriptome engineering with RNA-targeting type VI-D CRISPR effectors. Cell 173, 665–676.e14 (2018).
MacArthur, J. et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 45, D896–D901 (2017).
Law, M. H. et al. Genome-wide meta-analysis identifies five new susceptibility loci for cutaneous malignant melanoma. Nat. Genet. 47, 987–995 (2015).
Sud, A. et al. Genome-wide association study of classical Hodgkin lymphoma identifies key regulators of disease susceptibility. Nat. Commun. 8, 1892 (2017).
Harty, L. C. et al. HLA-DR, HLA-DQ, and TAP genes in familial Hodgkin disease. Blood 99, 690–693 (2002).
Van Nostrand, E. L. et al. A large-scale binding and functional map of human RNA-binding proteins. Nature 583, 711–719 (2020).
Payer, L. M. et al. Structural variants caused by Alu insertions are associated with risks for many human diseases. Proc. Natl Acad. Sci. USA 114, E3984–E3992 (2017).
Sudmant, P. H. et al. An integrated map of structural variation in 2,504 human genomes. Nature 526, 75–81 (2015).
Sulzmaier, F. J., Jean, C. & Schlaepfer, D. D. FAK in cancer: mechanistic findings and clinical applications. Nat. Rev. Cancer 14, 598–610 (2014).
Ye, R., Cao, C. & Xue, Y. Enhancer RNA: biogenesis, function, and regulation. Essays Biochem. 64, 883–894 (2020).
Lam, M. T. et al. Rev-Erbs repress macrophage gene expression by inhibiting enhancer-directed transcription. Nature 498, 511–515 (2013).
Li, W. et al. Functional roles of enhancer RNAs for oestrogen-dependent transcriptional activation. Nature 498, 516–520 (2013).
Schaukowitch, K. et al. Enhancer RNA facilitates NELF release from immediate early genes. Mol. Cell 56, 29–42 (2014).
Lai, F. et al. Activating RNAs associate with Mediator to enhance chromatin architecture and transcription. Nature 494, 497–501 (2013).
Li, X. et al. GRID-seq reveals the global RNA–chromatin interactome. Nat. Biotechnol. 35, 940–950 (2017).
Henninger, J. E. et al. RNA-mediated feedback control of transcriptional condensates. Cell 184, 207–225.e24 (2021).
Sharp, P. A., Chakraborty, A. K., Henninger, J. E. & Young, R. A. RNA in formation and regulation of transcriptional condensates. RNA 28, 52–57 (2022).
Quinodoz, S. A. et al. RNA promotes the formation of spatial compartments in the nucleus. Cell 184, 5775–5790.e30 (2021).
Bhat, P., Honson, D. & Guttman, M. Nuclear compartmentalization as a mechanism of quantitative control of gene expression. Nat. Rev. Mol. Cell Biol. 22, 653–670 (2021).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Yu, G. C., Wang, L. G., Han, Y. Y. & He, Q. Y. clusterProfiler: an R package for comparing biological themes among gene clusters. Omics 16, 284–287 (2012).
Huang da, W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57 (2009).
Lai, F., Gardini, A., Zhang, A. & Shiekhattar, R. Integrator mediates the biogenesis of enhancer RNAs. Nature 525, 399–403 (2015).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Ramirez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).
Robinson, J. T. et al. Integrative Genomics Viewer. Nat. Biotechnol. 29, 24–26 (2011).
Gaidatzis, D., Burger, L., Florescu, M. & Stadler, M. B. Analysis of intronic and exonic reads in RNA-seq data characterizes transcriptional and post-transcriptional regulation. Nat. Biotechnol. 33, 722–729 (2015).
Concordet, J. P. & Haeussler, M. CRISPOR: intuitive guide selection for CRISPR/Cas9 genome editing experiments and screens. Nucleic Acids Res. 46, W242–W245 (2018).
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Kryuchkova-Mostacci, N. & Robinson-Rechavi, M. A benchmark of gene expression tissue-specificity metrics. Brief Bioinform. 18, 205–214 (2017).
Bailey, T. L. et al. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 37, W202–W208 (2009).
Kent, W. J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).
Grant, C. E., Bailey, T. L. & Noble, W. S. FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011).
Reuter, J. S. & Mathews, D. H. RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics 11, 129 (2010).
Licatalosi, D. D. et al. HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature 456, 464–469 (2008).
Boix, C. A., James, B. T., Park, Y. P., Meuleman, W. & Kellis, M. Regulatory genomic circuitry of human disease loci by integrative epigenomics. Nature 590, 300–307 (2021).
Thurman, R. E. et al. The accessible chromatin landscape of the human genome. Nature 489, 75–82 (2012).
Cao, X. et al. Polymorphic mobile element insertions contribute to gene expression and alternative splicing in human tissues. Genome Biol. 21, 185 (2020).
Acknowledgements
We thank F. Lai for critically reading this manuscript; and B. Zhou, J. Su and W. Zhai for helpful suggestions on handling repeat elements. This work was supported by the National Key Research and Development Program of China (2022YFA1303300), the National Natural Science Foundation of China (91940306, 32025008, 32130064, 91740201 and 81921003), the Strategic Priority Program of Chinese Academy of Sciences (XDB37000000), and the K.C. Wong Education Foundation (GJTD-2020-06) to Y.X.; by the National Key Research and Development Program of China (2018YFA0108700) and the Guangdong Provincial Special Support Program for Prominent Talents (2021JC06Y656) to P.Z.; by the National Natural Science Foundation of China (31900465 and 32070620) to C.C.; and by the National Natural Science Foundation of China (31970610) to L.J.
Author information
Contributions
Y.X. conceived and supervised the project. Z.C. cultured cells and constructed the RIC-seq library. L.L. and C.C. performed the bioinformatics analysis and prepared the figures. R.Y. and L.J. performed the LNA ASO knockdown and qPCR. L.J. and J.C. built the Alu-knockout and Alu-knock-in cell lines, and performed the Alu tethering assay with the help of X.Yu., J.Z., Z.B. and R.W. D.W. performed the chromatin RNA-seq. L.J., D.W. and Z.C. did the 3C–qPCR experiments. L.J. performed the DNA and RNA FISH, the colony formation assay, quantified the proliferation rate and invasion ability of HeLa cells, and generated the nude mouse xenograft model. X.Yang and P.Z. advised on bioinformatics analysis. Y.X. wrote the manuscript with the help of C.C. and L.L.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature thanks the anonymous reviewers for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 EPRIs identification and colocalization of the corresponding RNAs.
a, Diagram of strategies for defining enhancer and promoter regions, assigning chimeric reads to enhancer-promoter pairs, and identifying high-confidence EPRIs. b, Pie chart showing the number of RIC-seq identified enhancer-promoter and promoter-promoter RNA interactions for each cell line. c, EPRI maps deduced by RIC-seq. Chr. represents chromosome. The individual chromosome is shown in different colors. d, Spanning distance of EPRIs in the linear genome. Mb, megabases. e, Percentage of enhancers paired with different numbers of promoters. f, Percentage of promoters paired with different numbers of enhancers. g, smFISH showing pairwise interacting TE6189-MGST1 (linear distance: 4.0 Mb), TE6189-TUBA1B (linear distance: 28.8 Mb), SE351-CFD (linear distance: 68.0 Kb), SE351-CNN2 (linear distance: 234.9 Kb), SE351-PTBP1 (linear distance: 5.7 Kb), SE351-POLRMT (linear distance: 134.9 Kb), SE135-CD44 (linear distance: 30.0 Mb), SE135-RBM4 (linear distance: 1.2 Mb), SE638-EXT1 (linear distance: 9.1 Mb), and SE138-SYT12 (linear distance: 25.0 Kb) noncoding RNAs are co-localized in HeLa cells. DAPI, blue; eRNA, green; promoter RNA, red. Scale bar, 5 μm. h, TE6189 and SE351 are not co-localized with their non-target promoter-derived uaRNAs (SLCO1B1 and FGF22). The linear distance for TE6189-SLCO1B1 and SE351-FGF22 is 584.6 Kb and 147.3 Kb, respectively. Scale bar, 5 μm. The experiments in g and h were independently repeated three times with similar results. i, Percentage of cells showing colocalization for examined eRNA and uaRNA pairs within the nucleus in 16 cells. j, DNA FISH showing TE6189 and TUBA1B loci are co-localized, whereas TE6189 and the non-target gene SLC6A12 (linear distance: 20.2 Mb) are not co-localized. Scale bar, 5 μm. k, Percentage of cells showing colocalization for TE6189 and TUBA1B loci in 20 cells. TE6189 non-target gene SLC6A12 serves as a negative control.
Extended Data Fig. 2 Validation of RIC-seq detected EPRIs.
a,b,c, qPCR shows reduced transcription of genes linked to super-enhancer SE135 (a), SE638 (b), and SE138 (c) after knocking down the corresponding eRNAs with LNA ASOs. The nearby genes CHORDC1, PCAT1, and STIP1 are locus-specific negative controls. The blue arc lines represent chimeric reads linking enhancers and promoters. Genes on the Watson or Crick strands are shown as red and blue boxes, respectively. A non-targeting LNA ASO and LNA ASOs targeting unrelated RNAs (Unrelated LNA) serve as negative controls. Data are mean ± s.d.; n = 3 biological replicates, two-tailed unpaired Student’s t-test. d, Scatter plots showing the two biological replicates of chromatin RNA-seq are highly correlated. R, Pearson correlation coefficient. RPM, reads per million mapped reads. e, Scatter plots showing the changed expression of intronic read counts after knocking down SE351 eRNA by chromatin RNA-seq (ChromRNA-seq). Red dots, SE351-linked genes; Pink dot, SE351 eRNA; Blue dot, GAPDH; Green dot, ACTB. f, Bar graph showing the average fold change of SE351-linked genes is significantly lower than that of an equal number of randomly selected genes across the entire genome. n = 1000 (repeat 1000 times). Data are the mean ± s.e.m. P-value is calculated by a one-sided permutation test. g, Paternal or maternal mutations (red lines) that influence EPRIs usually lead to lower gene expression in an allele-specific manner. Randomly selected control genes are shown in black lines. The median value for each group is marked. P-values are determined by the one-tailed Kolmogorov–Smirnov test. h, Genes whose EPRIs are affected by polymorphic variants in the human populations showed larger expression variances and higher Gini indexes. Blue, randomly selected abundance-matched gene sets. Data are the mean ± s.d.; n = 1000 (repeat 1000 times).
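The one-sided permutation test mentioned for panel f can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `one_sided_permutation_p`, the add-one correction, the fixed random seed, and all toy values are assumptions made for illustration. The idea is to compare the mean fold change of the enhancer-linked genes with the means of equally sized random gene sets drawn from a genome-wide background.

```python
import random
from statistics import mean


def one_sided_permutation_p(linked, genome_wide, n_iter=1000, seed=0):
    """Empirical one-sided P-value that mean(linked) is lower than the mean
    of equally sized random draws from genome_wide.

    The add-one correction (hypothetical choice) keeps P above zero.
    """
    rng = random.Random(seed)
    observed = mean(linked)
    k = len(linked)
    # Count random gene sets whose mean is at least as low as the observed one.
    hits = sum(
        mean(rng.sample(genome_wide, k)) <= observed
        for _ in range(n_iter)
    )
    return (hits + 1) / (n_iter + 1)


# Toy fold-change values (hypothetical, not taken from the paper).
linked_genes = [0.2, 0.3, 0.25, 0.4]
background = [1.0 + 0.01 * i for i in range(100)]
p_value = one_sided_permutation_p(linked_genes, background)
print(f"one-sided permutation P = {p_value:.4f}")
```

With 1,000 iterations the smallest P-value this sketch can report is 1/1001, which is why the number of repeats bounds the reported significance.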
Extended Data Fig. 3 Comparison between EPRIs and enhancer-promoter loops identified by DNA proximity methods.
a, Genomic spanning distances of EPRIs identified by RIC-seq and enhancer-promoter loops detected from DNA proximity methods. Dashed lines indicate the peaks of spanning distances. b, Venn diagram showing the overlap between enhancer-promoter loops identified by DNA proximity contacts (Pol II ChIA-PET, in situ Hi-C, and HiChIP) and EPRIs detected by RIC-seq. c, EPRIs with more RIC-seq chimeric reads show higher overlapping percentages with enhancer-promoter loops detected by Pol II ChIA-PET, Hi-C, and HiChIP. d, Intra-chromosomal EPRIs over 2 Mb apart (left panel) or inter-chromosomal EPRIs (right panel) showed higher Hi-C signals than randomly paired enhancers and promoters. Data are the mean ± s.e.m., one-tailed unpaired Student’s t-test. e, EPRIs with more RIC-seq chimeric reads have higher DNA-DNA interaction strength measured by Hi-C in GM12878, IMR90, and K562 cells. Data are the mean ± s.e.m., two-tailed unpaired Student’s t-test. f, Bar graph showing the effect size (fractional effect on gene expression upon CRISPRi perturbation of an enhancer) for enhancer-gene pairs supported by EPRIs is lower than for those without EPRIs. Data are the mean ± s.e.m., one-tailed, Wilcoxon rank-sum test. g, The cumulative curve of ABC scores for enhancer-gene pairs supported by EPRIs versus those without EPRIs. P-values are determined by the two-tailed Kolmogorov–Smirnov test. h, Contact frequency of EPRIs derived from RIC-seq in seven cell lines as a function of genomic distance across the genome shows a power law scaling with a slope of −0.65.
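The power law scaling in panel h means that contact frequency falls off roughly as distance raised to the slope, so the slope is the gradient of a straight line on log-log axes. The sketch below shows one common way to estimate such a slope, by ordinary least squares on log10-transformed values. The fitting procedure, the function name `loglog_slope`, and the synthetic data are assumptions for illustration; the legend does not state how the authors fitted the curve.

```python
import math


def loglog_slope(distances, frequencies):
    """Least-squares slope and intercept of log10(frequency) vs log10(distance)."""
    xs = [math.log10(d) for d in distances]
    ys = [math.log10(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic contacts generated from an exact power law (hypothetical data).
distances_bp = [1e4, 1e5, 1e6, 1e7]
contact_freq = [d ** -0.65 for d in distances_bp]
slope, intercept = loglog_slope(distances_bp, contact_freq)
print(f"fitted slope = {slope:.2f}")
```

On real data the distances would usually be binned logarithmically before fitting, which is a design choice left out of this sketch.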
Extended Data Fig. 4 EPRI features.
a, Enhancer-regulated genes tend to be functionally related compared with randomly selected genes. n = 7 cell lines. The center line of the box plot represents the median, the box borders represent the first (Q1) and third (Q3) quartiles, and the whiskers are the most extreme data points within 1.5× the interquartile range (from Q1 to Q3). Two-tailed unpaired Student’s t-test. b, Enhancers-regulated functional clusters. Orange dots, enhancers; outside dots, target genes. c, qPCR showing reduced transcription of target genes linked to TE19551 after eRNA knockdown using the mixture of two independent LNA ASOs. Non-target gene HLA-DQA1 serves as a negative control. Data are the mean ± s.d.; n = 3 biological replicates, two-tailed unpaired Student’s t-test. d–e, Pairwise interacting RNA fragments from enhancers (d) and promoters (e) enrich motifs (red bars) that can cover the most consensus sequence of the Alu element. The color scale indicates E-value for each motif calculated by MEME. f–j, The relative distribution of LINE/L1 (f), LINE/L2 (g), SINE/MIR (h), LTR/ERV1 (i), and LTR/ERVL-MaLR (j) repeat elements around pairwise interacting RNA fragments in enhancers (left) and promoters (right). k, The relative distribution of mouse B1 elements around EPRIs in MEF cells. l, Bar graph showing the percentage of Hi-C and HiChIP detected enhancer-promoter loops that contained at least one Alu element. m–n, Enhancer-gene pairs supported by EPRIs (red line, n = 58,594) have higher nascent RNA levels than that supported by Hi-C (m, blue line, n = 3,749) or HiChIP (n, blue line, n = 300,734). Two-tailed Kolmogorov–Smirnov test. o–q, EPRI chimeric reads overlapped Alu elements tend to have higher ATAC-seq (o), DNase-seq (p), and GRO-seq signals (q) than other Alu elements from the same enhancer-promoter pairs. Data are the mean ± s.e.m., two-tailed unpaired Student’s t-test.
Extended Data Fig. 5 EPRIs contain complementary Alus from the same subfamily.
a, Pie chart showing enriched Alu subtypes at interacting enhancer-promoter RNAs. b, Heatmap showing different Alu sequences enriched at pairwise interacting enhancer-promoter RNAs. Enhancer RNA, left; promoter RNA, right. Color scale, the proportion of Alu elements in each EPRI. c, EPRIs have more Alu elements from the same subfamily than randomly shuffled enhancer-promoter pairs (red bars). Data are the mean ± s.d.; n = 1000 (repeat 1000 times), one-sided permutation test. d, Clustering of Alu elements based on sequences from RepeatMasker. Alu subfamilies with lengths over 250 bp are used for this analysis. e, Alu RNAs from the same subfamilies tend to interact with each other. Color scale, fold enrichment of the observed to expected Alu-Alu RNA interactions; Circle size, −log10 P-values; two-tailed binomial test. Chimeric reads linking Alu sequences are 144,358 for 7 cell lines. Row, enhancer Alus; Column, promoter Alus. f, Box plots showing the EPRI-covered regions tend to contain reverse-oriented Alu sequences in seven cell lines (n = 7). Each point represents one cell line. The center line of the box plot represents the median, the box borders represent the first (Q1) and third (Q3) quartiles, and the whiskers are the most extreme data points within 1.5 × the interquartile range (from Q1 to Q3). Two-tailed paired Student’s t-test. g, h, PARIS chimeric reads are enriched at Alu sequences of pairwise interacting eRNAs (g) and uaRNAs (or promoter RNAs) (h) detected by RIC-seq. i, EPRIs overlapped with PARIS signals. Black arc lines, PARIS chimeric reads; Blue arc lines, RIC-seq chimeric reads; Alu, purple or red boxes. j, Diagram of the MFE calculation strategy. k, Pairwise interacting Alus (orange line) have a lower minimum free energy density than the shuffled controls (blue line). Dashed lines represent the median values.
Extended Data Fig. 6 Alu elements modulate the transcription of enhancer-linked genes.
a, qPCR showing the reduced transcription of AEBP2 after Alu knockout from the linked enhancer TE6189 (left) or its promoter (right). The nearby genes PYROXD1 and ETNK1 serve as negative controls. A genome browser view of DNase-seq, H3K27ac, GRO-seq, and RIC-seq signals around the ablated Alu elements in the grey shadow region is shown as inserts in the middle. Red boxes represent Alu elements. b, qPCR showing the reduced transcription of SPTBN2 after Alu knockout from the super-enhancer SE138 (left) or its promoter (right). The nearby genes GRK2 and SYT12 serve as negative controls. c, qPCR showing the reduced transcription of SET after Alu knockout from TE30536 (left) or its promoter (right). The nearby genes NUP188 and ENDOG serve as negative controls. d, qPCR showing the reduced transcription of ID1 after Alu knockout from the super-enhancer SE424 (left) or its promoter (right). The nearby genes KIF3B and HM13 serve as negative controls. e–g, qPCR showing the increased transcription of FOXS1 (e), FRG1BP (f), and REM1 (g) after tethering Alu RNA to their promoter-derived uaRNAs. The genes ID1 and KIF3B serve as negative controls. sg-empty, sgRNA only; sgNT-Alu, non-targeting sgRNA but fused with Alu; sgRNA-GFP, targeting sgRNA but fused with GFP RNA. Data in a–g are the mean ± s.d.; n = 3 independent replicates from two different cell lines, two-tailed unpaired Student’s t-test.
Extended Data Fig. 7 Non-Alu complementary sequences can mediate EPRIs.
a, Pairwise interacting eRNA-uaRNA fragments without overlapped Alu sequences (red line, n = 580,281 chimeric reads) showed significantly lower MFE values than randomly shuffled sequences (blue line). Two-tailed Kolmogorov–Smirnov test was used to calculate the P-value. b, Heatmap showing the purine content around the pairwise interacting eRNA (left) and uaRNA (or promoter RNA) fragments (right). The line plots with different colors illustrate the nucleotide content of each non-Alu RNA fragment. The color scale indicates purine content. c, The frequency of pyrimidine in eRNA fragments (left, cytosine; right, uracil) positively correlates with the frequency of purine in uaRNA fragments (left, guanine; right, adenine). ρ, Spearman correlation coefficient. Data are the mean ± s.d., and the two-tailed correlation test (cor. test) in R was used to calculate the P-value, n = 580,281 chimeric reads. d, RIC-seq detected complementary base pairing between TE11866 eRNA and CCDC200 promoter RNA was supported by PARIS data. Dark and light blue arc lines represent PARIS or RIC-seq chimeric reads, respectively. The RNA duplex formed between TE11866 eRNA and CCDC200 promoter RNA is shown at the bottom panel. ΔG, free energy.
Extended Data Fig. 8 Regulatory mechanisms of GWAS identified noncoding variants.
a–b, The overlapped variant-gene pairs between GWAS variants-to-function maps and the maps built by EpiMap, ENCODE, or ABC model across all biosamples (a) and the 7 cell types (b) used in our study. c–d, Distribution of EPRI chimeric reads around DHSs located at enhancers (c) and promoters (d). e–f, GRO-seq signals around DHSs located at enhancers (e) and promoters (f) in HeLa cells. g, Mapping GWAS-associated noncoding variants into EPRI map to construct a variant-to-function map. Mutated Alus in enhancers or promoters are shown as red triangles and circles. h, Enriched KEGG pathways for the affected genes in panel g. i, GWAS variant rs6914598 in SE1838 is linked to the promoter of BCAT2. j, The rs9269081 in TE19551 is linked to the promoter of HLA-DRB5. Orange and green boxes represent enhancers and promoters (Pro.). k,l, RBP-binding profiles around the interacting enhancer and promoter RNA fragments in K562 (k) and HepG2 cells (l). The color scale indicates normalized complexity. m, Schematic diagram of Jaccard similarity index used to analyze RBPs co-bound at interacting eRNAs and uaRNAs (or promoter RNAs). n,o, Boxplot showing EPRIs have a higher Jaccard index (red box) than randomly paired enhancers and promoters (blue box) in K562 (n) and HepG2 cells (o). The center line of the box plot represents the median, the box borders represent the first (Q1) and third (Q3) quartiles, and the whiskers are the most extreme data points within 1.5× the interquartile range (from Q1 to Q3). P-values are determined by a two-tailed unpaired Student’s t-test. EPRI chimeric reads are 47,876 and 30,181 for K562 and HepG2 cells, respectively. p,q, SNVs were preferentially positioned around the eCLIP peaks for RBPs enriched on eRNAs and uaRNAs in K562 (p) and HepG2 cells (q).
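The Jaccard similarity index in panel m measures how much the set of RBPs bound at an eRNA fragment overlaps the set bound at its partner promoter RNA fragment: the size of the intersection divided by the size of the union. A minimal Python sketch follows. The function name `jaccard_index`, the convention of returning 0.0 for two empty sets, and the example RBP names are illustrative assumptions, not values from the paper.

```python
def jaccard_index(rbps_a, rbps_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of RBP names."""
    a, b = set(rbps_a), set(rbps_b)
    union = a | b
    if not union:
        # Neither fragment has any bound RBP; defined here as no similarity.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical RBP sets for one interacting eRNA/promoter RNA pair.
erna_rbps = {"HNRNPK", "PTBP1", "ELAVL1"}
uarna_rbps = {"PTBP1", "ELAVL1", "FUS"}
print(jaccard_index(erna_rbps, uarna_rbps))
```

Computing the same index for randomly paired enhancers and promoters gives the background distribution against which the interacting pairs in panels n and o are compared.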
Extended Data Fig. 9 Alu insertions or deletions at enhancers can influence tumorigenesis.
a–f, The variant-to-function maps were individually constructed for six different cells, including GM12878 (a), H1 hESC (b), IMR90 (c), HepG2 (d), hNPC (e), and K562 (f). g, Enriched KEGG pathways for genes potentially affected by mutated Alu through disrupting EPRIs (red bar). Randomly selected EPRI-regulated genes served as control (blue bar, repeated 100 times). Data are the mean ± s.e.m., two-tailed unpaired Student’s t-test. h, Meta plot shows ICGC variants tend to be depleted around Alu elements near known oncogenes (n = 691) but enriched around tumor suppressor genes (TSG, n = 878). P-value is determined by one-sample, two-tailed, Student’s t-test. Alu mutations may impair enhancer-promoter connectivity to influence the transcription of cancer-related genes and induce tumorigenesis. i, qPCR showing the reduced expression of PTK2 after Alu knockout from the corresponding enhancer (blue bar); subsequent ectopic expression of PTK2 can fully rescue the reduction (red bar). j, MTT assay showing reduced cell proliferation after Alu knockout from TE29256. Ectopic expression of PTK2 can reverse the cell proliferation phenotype. n = 6 biological replicates. k, Colony formation assay shows that knocking out the Alu element from TE29256 in HeLa cells reduces cell proliferation. In contrast, ectopic expression of PTK2 fully reversed the proliferation phenotype. l, Transwell assay showing that knocking out the Alu element from TE29256 decreases cell metastasis, and ectopic expression of PTK2 can reverse the effect. Scale bar, 50 μm. The experiments in i,k, and l were independently repeated three times with similar results. Data in i–l are the mean ± s.d. The P-values in panels i,k, and l are calculated using a two-tailed unpaired Student’s t-test. The P-values in panel j are calculated using a one-tailed paired Student’s t-test.
Extended Data Fig. 10 eRNA interacted intronic sites enrich Alu elements.
a, Alu elements are enriched around the eRNA interacting sites in introns. b, The interaction preferences of enhancer Alu RNA sequences and intronic Alu RNA sequences revealed by RIC-seq.
Supplementary information
Supplementary Information
This file contains Supplementary Notes 1 and 2, and additional references. Supplementary Note 1: Chromatin RNA-seq validates SE351-linked genes. This file contains a discussion of the regulatory specificity of EPRIs. Supplementary Note 2: Functional implication of Alu-associated genetic variants in EPRIs. This file contains a discussion of the potential functions of Alu-associated genetic variants in cancer.
Supplementary Table 1
Mapping results of RIC-seq libraries. This table contains the numbers of raw reads, clean reads, uniquely mapped reads, uniquely mapped chimeric reads, and the percentage of chimeric reads.
Supplementary Table 2
Risk variants affected EPRIs. This table contains genes that may be affected by ICGC variants, GWAS variants, or Alu deletions by impairing EPRIs.
Supplementary Table 3
List of primers, probes, sgRNAs, and LNA ASOs used in this study. This table contains the sequence of primers, probes, sgRNAs, and LNA ASOs used in this study.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liang, L., Cao, C., Ji, L. et al. Complementary Alu sequences mediate enhancer–promoter selectivity. Nature 619, 868–875 (2023). https://doi.org/10.1038/s41586-023-06323-x
DOI: https://doi.org/10.1038/s41586-023-06323-x