Sequences enriched in Alu repeats drive nuclear localization of long RNAs in human cells

Lubelsky, Yoav; Ulitsky, Igor

doi:10.1038/nature25757

Letter
Published: 24 January 2018

Nature volume 555, pages 107–111 (2018)

Abstract

Long noncoding RNAs (lncRNAs) are emerging as key parts of multiple cellular pathways¹, but their modes of action and how these are dictated by sequence remain unclear. lncRNAs tend to be enriched in the nuclear fraction, whereas most mRNAs are overtly cytoplasmic², although several studies have found that hundreds of mRNAs in various cell types are retained in the nucleus³,⁴. It is thus conceivable that some mechanisms that promote nuclear enrichment are shared between lncRNAs and mRNAs. Here, to identify elements in lncRNAs and mRNAs that can force nuclear localization, we screened libraries of short fragments tiled across nuclear RNAs, which were cloned into the untranslated regions of an efficiently exported mRNA. The screen identified a short sequence derived from Alu elements and bound by HNRNPK that increased nuclear accumulation. Binding of HNRNPK to C-rich motifs outside Alu elements is also associated with nuclear enrichment in both lncRNAs and mRNAs, and this mechanism is conserved across species. Our results thus identify a pathway for regulation of RNA accumulation and subcellular localization that has been co-opted to regulate the fate of transcripts with integrated Alu elements.

**Figure 1: NucLibA analysis and the SIRLOIN element.**

**Figure 3: Effect of SIRLOIN sequence changes on element activity.**

**Figure 4: HNRNPK drives nuclear localization of SIRLOIN-containing transcripts.**

Accession codes

Primary accessions

Sequence Read Archive

SRP111756

References

1. Ulitsky, I. & Bartel, D. P. lincRNAs: genomics, evolution, and mechanisms. Cell 154, 26–46 (2013)
2. Derrien, T. et al. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 22, 1775–1789 (2012)
3. Bahar Halpern, K. et al. Nuclear retention of mRNA in mammalian tissues. Cell Reports 13, 2653–2662 (2015)
4. Battich, N., Stoeger, T. & Pelkmans, L. Control of transcript variability in single mammalian cells. Cell 163, 1596–1610 (2015)
5. Miyagawa, R. et al. Identification of cis- and trans-acting factors involved in the localization of MALAT-1 noncoding RNA to nuclear speckles. RNA 18, 738–751 (2012)
6. Zhang, B. et al. A novel RNA motif mediates the strict nuclear localization of a long noncoding RNA. Mol. Cell. Biol. 34, 2318–2329 (2014)
7. Chen, L. L., DeCerbo, J. N. & Carmichael, G. G. Alu element-mediated gene silencing. EMBO J. 27, 1694–1705 (2008)
8. Prasanth, K. V. et al. Regulating gene expression through RNA nuclear retention. Cell 123, 249–263 (2005)
9. Schueler, M. et al. Differential protein occupancy profiling of the mRNA transcriptome. Genome Biol. 15, R15 (2014)
10. Versteeg, R. et al. The human transcriptome map reveals extremes in gene density, intron length, GC content, and repeat pattern for domains of highly and weakly expressed genes. Genome Res. 13, 1998–2004 (2003)
11. Lev-Maor, G., Sorek, R., Shomron, N. & Ast, G. The birth of an alternatively spliced exon: 3′ splice-site selection in Alu exons. Science 300, 1288–1291 (2003)
12. Chen, C., Ara, T. & Gautheret, D. Using Alu elements as polyadenylation sites: A case of retroposon exaptation. Mol. Biol. Evol. 26, 327–334 (2009)
13. Tajnik, M. et al. Intergenic Alu exonisation facilitates the evolution of tissue-specific transcript ends. Nucleic Acids Res. 43, 10492–10505 (2015)
14. Zarnack, K. et al. Direct competition between hnRNP C and U2AF65 protects the transcriptome from the exonization of Alu elements. Cell 152, 453–466 (2013)
15. Kelley, D. R. & Rinn, J. L. Transposable elements reveal a stem cell specific class of long noncoding RNAs. Genome Biol. 13, R107 (2012)
16. Gong, C. & Maquat, L. E. lncRNAs transactivate STAU1-mediated mRNA decay by duplexing with 3′ UTRs via Alu elements. Nature 470, 284–288 (2011)
17. Johnson, R. & Guigó, R. The RIDL hypothesis: transposable elements as functional domains of long noncoding RNAs. RNA 20, 959–976 (2014)
18. Dimitrova, N. et al. LincRNA-p21 activates p21 in cis to promote Polycomb target gene expression and to enforce the G1/S checkpoint. Mol. Cell 54, 777–790 (2014)
19. Chu, C. et al. Systematic discovery of Xist RNA binding proteins. Cell 161, 404–416 (2015)
20. Maticzka, D., Lange, S. J., Costa, F. & Backofen, R. GraphProt: modeling binding preferences of RNA-binding proteins. Genome Biol. 15, R17 (2014)
21. Choi, H. S. et al. Poly(C)-binding proteins as transcriptional regulators of gene expression. Biochem. Biophys. Res. Commun. 380, 431–436 (2009)
22. Paziewska, A., Wyrwicz, L. S., Bujnicki, J. M., Bomsztyk, K. & Ostrowski, J. Cooperative binding of the hnRNP K three KH domains to mRNA targets. FEBS Lett. 577, 134–140 (2004)
23. Baron-Benhamou, J., Gehring, N. H., Kulozik, A. E. & Hentze, M. W. Using the λN peptide to tether proteins to RNAs. Methods Mol. Biol. 257, 135–154 (2004)
24. Akef, A., Lee, E. S. & Palazzo, A. F. Splicing promotes the nuclear export of β-globin mRNA by overcoming nuclear retention elements. RNA 21, 1908–1920 (2015)
25. Giulietti, M., Milantoni, S. A., Armeni, T., Principato, G. & Piva, F. ExportAid: database of RNA elements regulating nuclear RNA export in mammals. Bioinformatics 31, 246–251 (2015)
26. Roy, D., Bhanja Chowdhury, J. & Ghosh, S. Polypyrimidine tract binding protein (PTB) associates with intronic and exonic domains to squelch nuclear export of unspliced RNA. FEBS Lett. 587, 3802–3807 (2013)
27. Shukla, C. J. et al. High-throughput identification of RNA nuclear enrichment sequences. EMBO J. e98452 (2018)
28. Cabili, M. N. et al. Localization and abundance analysis of human lncRNAs at single-cell and single-molecule resolution. Genome Biol. 16, 20 (2015)
29. Bomsztyk, K., Denisenko, O. & Ostrowski, J. hnRNP K: one protein multiple processes. BioEssays 26, 629–638 (2004)
30. Braunschweig, U. et al. Widespread intron retention in mammals functionally tunes transcriptomes. Genome Res. 24, 1774–1786 (2014)
31. Durocher, Y., Perret, S. & Kamen, A. High-level and high-throughput recombinant protein production by transient transfection of suspension-growing human 293-EBNA1 cells. Nucleic Acids Res. 30, E9 (2002)
32. Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011)
33. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014)
34. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013)
35. Gagliardi, M. & Matarazzo, M. R. RIP: RNA immunoprecipitation. Methods Mol. Biol. 1480, 73–86 (2016)
36. George, T. C. et al. Quantitative measurement of nuclear translocation events using similarity analysis of multispectral cellular images obtained in flow. J. Immunol. Methods 311, 117–129 (2006)
37. Katz, Y., Wang, E. T., Airoldi, E. M. & Burge, C. B. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat. Methods 7, 1009–1015 (2010)
38. Tilgner, H. et al. Deep sequencing of subcellular RNA fractions shows splicing to be predominantly co-transcriptional in the human genome but inefficient for lncRNAs. Genome Res. 22, 1616–1625 (2012)
39. Fagerberg, L. et al. Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Mol. Cell. Proteomics 13, 397–406 (2014)

Acknowledgements

We thank R. Pillai, S. Schwarz, S. Itzkovitz, N. Stern Ginossar, and members of the Ulitsky laboratory for discussions and comments on the manuscript, and Z. Porat from the Weizmann Institute FACS core for assistance with Imaging Flow Cytometry. This research was supported by the Israeli Centers for Research Excellence (1796/12); Israel Science Foundation (1242/14 and 1984/14); European Research Council lincSAFARI; Minerva Foundation; Lapon Raymond; and the Abramson Family Center for Young Scientists. I.U. is incumbent of the Sygnet Career Development Chair for Bioinformatics.

Author information

Authors and Affiliations

Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel
Yoav Lubelsky & Igor Ulitsky

Contributions

Y.L. and I.U. conceived and designed the study. Y.L. carried out all experiments and I.U. carried out computational analysis. Y.L. and I.U. wrote the manuscript.

Corresponding author

Correspondence to Igor Ulitsky.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Reviewer Information: Nature thanks J. Ule, C. Wahlestedt and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Figure 1 Effects of tiles in NucLibA on localization.

Nuclear/cytoplasmic ratios for all the tiles in the indicated lncRNAs when cloned into the indicated region of AcGFP. Tiles overlapping repetitive elements are in black and other tiles are in grey. Regions with segments containing the SIRLOIN elements are shaded.

Extended Data Figure 2 Nuclear/cytoplasmic ratios for lncRNAs and mRNAs.

a, Nuclear/cytoplasmic expression ratios for mRNAs and lncRNAs containing the indicated number of SIRLOIN elements or SIRLOIN reverse complement (antisense) in MCF7 cells (ENCODE data). Otherwise, as in Fig. 1f. b, Nuclear/cytoplasmic expression ratio in mRNAs (left) and lncRNAs (right) in ten ENCODE cell lines. Transcripts without SIRLOIN elements, with two SIRLOIN elements, with a SIRLOIN element in one of the internal exons, or with two antisense SIRLOIN elements are compared. Asterisks above the ‘2 SIRLOIN’ and ‘Internal SIRLOIN’ boxes indicate P < 0.01 by two-sided Wilcoxon test relative to the genes with no SIRLOIN elements. Asterisks above the ‘2 Antisense’ boxes indicate P < 0.01 relative to the ‘2 SIRLOIN’ boxes. Otherwise, as indicated in Fig. 1g. n = 29–16,694 genes per group. c, A rearranged version of part of b that facilitates visual comparison between mRNAs with at least two SIRLOIN elements and lncRNAs without SIRLOIN elements. Asterisks indicate P < 0.01 by two-sided Wilcoxon test. n = 50–16,694 genes per group.

Extended Data Figure 3 Comparison of splicing isoforms, chromatin enrichment and mRNA half-lives.

a, Semi-quantitative RT–PCR in the indicated fraction (left) and gene structures (right) of genes with multiple alternative isoforms, some of which contain SIRLOIN elements. Gene structures taken from RefSeq, UCSC or GENCODE genes; Alu annotations are from the UCSC genome browser. Arrows indicate positions of the primers used for RT–PCR. RNA-seq coverage taken from MCF-7 RNA-seq from the ENCODE project. Experiments were repeated once. b, RNA-seq-based expression ratios comparing nuclear/cytoplasmic, chromatin/nucleoplasmic, and nucleolar/nucleoplasmic RNA fractions for RNAs with the indicated number of SIRLOIN or antisense SIRLOIN elements (data from the ENCODE project). n = 82–17,481 genes per group. Boxplots show 5th, 25th, 50th, 75th and 95th percentiles. P values computed using two-sided Wilcoxon test. c, Comparison of half-lives in MCF-7 cells⁹ of protein-coding genes with the indicated number of SIRLOIN elements. Boxplots show 5th, 25th, 50th, 75th and 95th percentiles.
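The grouping and boxplot conventions used in these legends (nuclear/cytoplasmic ratios binned by number of SIRLOIN elements, boxes drawn at the 5th, 25th, 50th, 75th and 95th percentiles) can be illustrated with a minimal Python sketch. The function names, the pseudocount, the capping of element counts at two and the input layout are assumptions made for illustration only; they are not taken from the authors' analysis code.

```python
import math
from collections import defaultdict
from statistics import quantiles

def log2_ratio(nuclear, cytoplasmic, pseudocount=0.1):
    """log2 nuclear/cytoplasmic expression ratio (the pseudocount is an assumption)."""
    return math.log2((nuclear + pseudocount) / (cytoplasmic + pseudocount))

def boxplot_summary(values):
    """Return the 5th, 25th, 50th, 75th and 95th percentiles of values."""
    # n=20 yields cut points at 5%, 10%, ..., 95%; index p // 5 - 1 selects p%.
    cuts = quantiles(values, n=20, method="inclusive")
    return {p: cuts[p // 5 - 1] for p in (5, 25, 50, 75, 95)}

def summarize_by_element_count(genes, max_bin=2):
    """genes: iterable of (n_elements, nuclear_expr, cytoplasmic_expr) tuples.

    Genes with max_bin or more elements share one bin (a hypothetical choice).
    Bins with fewer than two genes are skipped because no percentiles exist.
    """
    groups = defaultdict(list)
    for n_elements, nuc, cyt in genes:
        groups[min(n_elements, max_bin)].append(log2_ratio(nuc, cyt))
    return {n: boxplot_summary(v) for n, v in sorted(groups.items()) if len(v) >= 2}
```

Calling `summarize_by_element_count` on a table of per-gene expression values would return one percentile dictionary per bin, which is the information a boxplot of this style displays.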

Extended Data Figure 4 Nuclear/cytoplasmic ratios for NucLibB tiles.

Effects of all the tiles in the indicated lncRNAs and mRNAs on nuclear/cytoplasmic expression ratios when cloned into the 3′ UTR of AcGFP. Tiles overlapping other repetitive elements are in black, and other tiles are in grey. Regions of consecutive tiles overlapping SIRLOIN elements are shaded grey.

Extended Data Figure 5 Statistics and mutagenesis results in NucLibB.

a, Correlation between nuclear/cytoplasmic ratios and expression levels of sequences in NucLibB (only wild-type sequences were analysed and two replicates were pooled; n = 2,930 for all data and n = 104 for SIRLOIN data). SIRLOIN-containing sequences are circled. Colouring indicates the local point density. Both replicates are plotted together and correlations and P values were computed using Spearman correlation. b, c, Nuc/Cyt ratios and expression levels of 109-nt fragments containing repetitions of the indicated motifs from JPX#9 (b) and NucLibA PVT1#22 (c), separated by AT dinucleotides. Horizontal lines indicate levels of NucLibB JPX#9 (b) and PVT1#23 (c) sequence (the closest to the NucLibA PVT1#22 sequence). d, Effects of mutations in JPX#9 (2nd replicate) on localization (top) and expression levels (bottom). e, Effects of mutations in PVT1#22. Note that the wild-type sequence of PVT1#22 from NucLibA was not included in NucLibB, and so the values of PVT1#23, which is the closest to that sequence in NucLibB, are shown. f, Effect of shuffles of the indicated kmers in PVT1#22 sequence. g, Predicted secondary structures of full-length AcGFP mRNA with the indicated inserts. Secondary structure prediction by the Vienna package RNAfold server with default parameters. The SIRLOIN elements are indicated in blue.

Extended Data Figure 6 HNRNPK regulation in K562 and MCF7 cells.

a, Ratios of expression levels in the nucleus/cytoplasm for transcripts with the indicated number of eCLIP clusters for HNRNPK in K562 cells. P values indicate significant difference by two-sided Wilcoxon test between genes with the indicated number of clusters and genes without eCLIP clusters. Boxplots show 5th, 25th, 50th, 75th and 95th percentiles. b, Knockdown of HNRNPK in MCF7 cells using siRNAs assessed by western blot. Experiment was performed twice and both replicates are shown. c, HNRNPK mRNA levels measured by qRT–PCR. Each bar presents a single independent experiment. d, Correlation between nuclear/cytoplasmic ratios in MCF7 cells in the ENCODE data and in the RNA-seq data collected in this study. Hexagons are coloured according to the number of genes. n = 13,235. e, Ratios of expression levels in the nucleus/cytoplasm for transcripts with the indicated number of eCLIP clusters for HNRNPK in HepG2 cells, separately for lncRNAs and mRNAs. Otherwise as in Fig. 4f. f, Effects of HNRNPK knockdown on expression levels in the indicated sample, for lncRNAs and mRNAs containing the indicated number of SIRLOIN elements. Otherwise as in Fig. 4f. g, Spearman correlation coefficients between the number of appearances of hexamers in the internal exons of transcripts and the change in nuclear/cytoplasmic ratios following HNRNPK knockdown (average of two replicates). Hexamers are grouped by the number of C bases in the hexamer. Colouring indicates the local point density.

Extended Data Figure 7 Correlations between hexamer occurrences and nuclear/cytoplasmic ratios in human and mouse cells.

a, Spearman correlation coefficients between hexamers with the indicated number of bases and nuclear/cytoplasmic ratios in human HepG2 cells. Each point represents a hexamer sequence, the occurrences of which were counted in all exons of all >200-nt RefSeq transcripts. Colouring indicates the local point density. R values indicate Spearman correlation coefficients between the number of indicated bases in the hexamer and the nuclear/cytoplasmic ratios. b, Spearman correlation coefficients computed as in a for each of ten human ENCODE cell lines and three mouse cell types, when examining the indicated transcript types and the indicated exon subsets.
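The hexamer analysis described in this legend (counting each hexamer in the exonic sequence of every transcript, then correlating the counts with nuclear/cytoplasmic ratios using Spearman correlation) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the function names and the input format of (exonic sequence, log-ratio) pairs are hypothetical, overlapping occurrences are counted, and ties receive their mean rank.

```python
from collections import Counter
from itertools import product

def count_kmers(seq, k=6):
    """Count overlapping k-mers in a sequence; U is treated as T."""
    seq = seq.upper().replace("U", "T")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def hexamer_correlations(transcripts, k=6):
    """transcripts: list of (exonic_sequence, log_nuc_cyt_ratio) pairs.

    Returns {kmer: Spearman r} for every k-mer whose count varies.
    """
    counts = [count_kmers(seq, k) for seq, _ in transcripts]
    ratios = [ratio for _, ratio in transcripts]
    result = {}
    for kmer in ("".join(p) for p in product("ACGT", repeat=k)):
        occurrences = [c[kmer] for c in counts]
        if len(set(occurrences)) > 1:
            result[kmer] = spearman(occurrences, ratios)
    return result
```

Grouping the resulting coefficients by the number of C bases in each hexamer would then give one point per hexamer within each group, as plotted in panel a.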

Extended Data Figure 8 HNRNPK knockdown in HeLa cells.

a, Representative western blot (top) and qRT–PCR quantification (bottom) of HNRNPK knockdown (each bar shows the level in a single experiment). Experiment was repeated twice. b, Changes in nuclear/cytoplasmic ratios of genes with the indicated number of HNRNPK eCLIP peaks following HNRNPK knockdown (DESeq2 analysis of two independent replicates). Otherwise as in Fig. 4f. n = 91–11,016 genes. c, As in b, separately for mRNA and lncRNAs. n = 85–10,077 mRNAs per group, 6–939 lncRNAs per group. d, Changes in nuclear/cytoplasmic ratios of mRNAs and lncRNAs with the indicated number of SIRLOIN elements. Otherwise as in Fig. 4f. n = 53–11,172 mRNAs per group, 10–891 lncRNAs per group.

Extended Data Figure 9 SIRLOIN-like element in mouse.

a, Comparison of nuclear/cytoplasmic ratios for orthologous genes in human HepG2 cells (ENCODE data) and mouse liver³. n = 10,160 conserved genes. Colouring indicates the local point density. Correlation computed using Spearman correlation. b, Top, sequences of the Mlxipl#71/72 tiles in NucLibA. Centre, nuclear/cytoplasmic ratios for each tile with a sufficient number of reads in NucLibA. Bottom, WCE/input expression levels for each tile. The region of tiles #71–72 is shaded in grey. c, Left, difference in log-transformed nuclear/cytoplasmic ratios between orthologous genes with the indicated number of human SIRLOIN elements or mouse SIRLOIN-like elements. Right, differences in overall expression levels; human liver expression data taken from the Human Proteome Atlas³⁹, mouse liver expression data taken from the ENCODE project. P values computed using two-sided Wilcoxon test, comparing to genes without SIRLOIN elements. Boxplots show 5th, 25th, 50th, 75th and 95th percentiles.

Extended Data Figure 10 RNA editing in SIRLOIN elements.

a, Frequencies of all base substitutions in each fraction, normalized to their occurrence in the input libraries. Samples from NucLibB were analysed, and all these samples were sequenced in the same NextSeq sequencing run. The number of substitutions was tallied for reads with fewer than six mutations relative to the reference library sequence. The count of each mutation type in each sample was then divided by the total number of mutations in the sample, the two replicates were averaged and the number of mutations in each sample type was divided by the number in the input library. b, For each reference A base in the indicated sample and the indicated group of segments, we computed the fraction of reads that contained a mutation to a G. The positions were then binned in 1% bins, and the heat map shows that fraction of bases mapping to each bin. In all cases, >90% of As were found in the <1% editing bin, which is not shown. Each row corresponds to a single experiment. c, In each heat map, each row corresponds to an A in a segment containing a SIRLOIN element, and the fraction of reads containing an A→G or A→T/C mutation is shown in each sample. Each column corresponds to a single experiment. d, As in c, but showing a specific SIRLOIN-containing segment. e, Correlation between high levels of RNA editing, localization (left) and expression levels (right). ‘High A→X’ are those segments in which the mean A→X mutation levels were at least three times higher than the median across all segments, in at least three of the four libraries (nucleus/cytoplasm, two replicates). Numbers near each group name correspond to sample size. Nuc/Cyt and WCE/Input levels were averaged across the two replicates. Boxplots show 5th, 25th, 50th, 75th and 95th percentiles and P values were computed using a two-sided Wilcoxon test.
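The normalization described in panel a can be written out as a short Python sketch: substitutions are tallied only in reads with fewer than six mismatches to the reference, each substitution type is expressed as a fraction of all substitutions in its sample, replicates are averaged, and the result is divided by the corresponding value in the input library. The function names, the pre-aligned equal-length read representation and the handling of input-library replicates are assumptions for illustration; the legend does not specify these details.

```python
from collections import Counter

MAX_MISMATCHES = 5  # legend: reads with fewer than six mutations were tallied

def tally_substitutions(read_pairs):
    """Count substitution types over (reference, read) pairs of equal length.

    Reads with more than MAX_MISMATCHES mismatches are skipped entirely.
    """
    counts = Counter()
    for ref, read in read_pairs:
        subs = [(r, q) for r, q in zip(ref, read) if r != q]
        if len(subs) <= MAX_MISMATCHES:
            counts.update(f"{r}>{q}" for r, q in subs)
    return counts

def to_fractions(counts):
    """Divide each substitution count by the total substitutions in the sample."""
    total = sum(counts.values())
    return {sub: n / total for sub, n in counts.items()} if total else {}

def mean_fractions(replicates):
    """Average per-substitution fractions across replicates (missing counts as 0)."""
    keys = set().union(*replicates)
    return {k: sum(rep.get(k, 0.0) for rep in replicates) / len(replicates)
            for k in keys}

def normalize_to_input(sample_replicates, input_replicates):
    """Ratio of the replicate-averaged sample fraction to the input-library fraction."""
    sample = mean_fractions([to_fractions(c) for c in sample_replicates])
    reference = mean_fractions([to_fractions(c) for c in input_replicates])
    # Substitution types absent from the input library are omitted (assumption).
    return {k: sample[k] / reference[k] for k in sample if reference.get(k, 0) > 0}
```

A value above 1 for a substitution type such as `A>G` in a given fraction would indicate that it is over-represented relative to the input library, which is the kind of signal the panel is designed to expose.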

Supplementary information

Supplementary Information

This file contains the Supplementary Notes 1-6, Supplementary References and the uncropped gels. (PDF 1579 kb)

Life Sciences Reporting Summary (PDF 73 kb)

Supplementary Tables

This zipped file contains Supplementary Tables 1-12 and a Supplementary Table guide. (ZIP 2823 kb)

About this article

Cite this article

Lubelsky, Y., Ulitsky, I. Sequences enriched in Alu repeats drive nuclear localization of long RNAs in human cells. Nature 555, 107–111 (2018). https://doi.org/10.1038/nature25757

Received: 14 May 2017
Accepted: 29 December 2017
Published: 24 January 2018
Issue Date: 01 March 2018
DOI: https://doi.org/10.1038/nature25757
