Long noncoding RNAs (lncRNAs) are emerging as key parts of multiple cellular pathways [1], but their modes of action and how these are dictated by sequence remain unclear. lncRNAs tend to be enriched in the nuclear fraction, whereas most mRNAs are overtly cytoplasmic [2], although several studies have found that hundreds of mRNAs in various cell types are retained in the nucleus [3,4]. It is thus conceivable that some mechanisms that promote nuclear enrichment are shared between lncRNAs and mRNAs. Here, to identify elements in lncRNAs and mRNAs that can force nuclear localization, we screened libraries of short fragments tiled across nuclear RNAs, which were cloned into the untranslated regions of an efficiently exported mRNA. The screen identified a short sequence derived from Alu elements and bound by HNRNPK that increased nuclear accumulation. Binding of HNRNPK to C-rich motifs outside Alu elements is also associated with nuclear enrichment in both lncRNAs and mRNAs, and this mechanism is conserved across species. Our results thus identify a pathway for regulation of RNA accumulation and subcellular localization that has been co-opted to regulate the fate of transcripts with integrated Alu elements.
Ulitsky, I. & Bartel, D. P. lincRNAs: genomics, evolution, and mechanisms. Cell 154, 26–46 (2013)
Derrien, T. et al. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 22, 1775–1789 (2012)
Bahar Halpern, K. et al. Nuclear retention of mRNA in mammalian tissues. Cell Reports 13, 2653–2662 (2015)
Battich, N., Stoeger, T. & Pelkmans, L. Control of transcript variability in single mammalian cells. Cell 163, 1596–1610 (2015)
Miyagawa, R. et al. Identification of cis- and trans-acting factors involved in the localization of MALAT-1 noncoding RNA to nuclear speckles. RNA 18, 738–751 (2012)
Zhang, B. et al. A novel RNA motif mediates the strict nuclear localization of a long noncoding RNA. Mol. Cell. Biol. 34, 2318–2329 (2014)
Chen, L. L., DeCerbo, J. N. & Carmichael, G. G. Alu element-mediated gene silencing. EMBO J. 27, 1694–1705 (2008)
Prasanth, K. V. et al. Regulating gene expression through RNA nuclear retention. Cell 123, 249–263 (2005)
Schueler, M. et al. Differential protein occupancy profiling of the mRNA transcriptome. Genome Biol. 15, R15 (2014)
Versteeg, R. et al. The human transcriptome map reveals extremes in gene density, intron length, GC content, and repeat pattern for domains of highly and weakly expressed genes. Genome Res. 13, 1998–2004 (2003)
Lev-Maor, G., Sorek, R., Shomron, N. & Ast, G. The birth of an alternatively spliced exon: 3′ splice-site selection in Alu exons. Science 300, 1288–1291 (2003)
Chen, C., Ara, T. & Gautheret, D. Using Alu elements as polyadenylation sites: A case of retroposon exaptation. Mol. Biol. Evol. 26, 327–334 (2009)
Tajnik, M. et al. Intergenic Alu exonisation facilitates the evolution of tissue-specific transcript ends. Nucleic Acids Res. 43, 10492–10505 (2015)
Zarnack, K. et al. Direct competition between hnRNP C and U2AF65 protects the transcriptome from the exonization of Alu elements. Cell 152, 453–466 (2013)
Kelley, D. R. & Rinn, J. L. Transposable elements reveal a stem cell specific class of long noncoding RNAs. Genome Biol. 13, R107 (2012)
Gong, C. & Maquat, L. E. lncRNAs transactivate STAU1-mediated mRNA decay by duplexing with 3′ UTRs via Alu elements. Nature 470, 284–288 (2011)
Johnson, R. & Guigó, R. The RIDL hypothesis: transposable elements as functional domains of long noncoding RNAs. RNA 20, 959–976 (2014)
Dimitrova, N. et al. LincRNA-p21 activates p21 in cis to promote Polycomb target gene expression and to enforce the G1/S checkpoint. Mol. Cell 54, 777–790 (2014)
Chu, C. et al. Systematic discovery of Xist RNA binding proteins. Cell 161, 404–416 (2015)
Maticzka, D., Lange, S. J., Costa, F. & Backofen, R. GraphProt: modeling binding preferences of RNA-binding proteins. Genome Biol. 15, R17 (2014)
Choi, H. S. et al. Poly(C)-binding proteins as transcriptional regulators of gene expression. Biochem. Biophys. Res. Commun. 380, 431–436 (2009)
Paziewska, A., Wyrwicz, L. S., Bujnicki, J. M., Bomsztyk, K. & Ostrowski, J. Cooperative binding of the hnRNP K three KH domains to mRNA targets. FEBS Lett. 577, 134–140 (2004)
Baron-Benhamou, J., Gehring, N. H., Kulozik, A. E. & Hentze, M. W. Using the λN peptide to tether proteins to RNAs. Methods Mol. Biol. 257, 135–154 (2004)
Akef, A., Lee, E. S. & Palazzo, A. F. Splicing promotes the nuclear export of β-globin mRNA by overcoming nuclear retention elements. RNA 21, 1908–1920 (2015)
Giulietti, M., Milantoni, S. A., Armeni, T., Principato, G. & Piva, F. ExportAid: database of RNA elements regulating nuclear RNA export in mammals. Bioinformatics 31, 246–251 (2015)
Roy, D., Bhanja Chowdhury, J. & Ghosh, S. Polypyrimidine tract binding protein (PTB) associates with intronic and exonic domains to squelch nuclear export of unspliced RNA. FEBS Lett. 587, 3802–3807 (2013)
Shukla, C. J. et al. High-throughput identification of RNA nuclear enrichment sequences. EMBO J. e98452 (2018)
Cabili, M. N. et al. Localization and abundance analysis of human lncRNAs at single-cell and single-molecule resolution. Genome Biol. 16, 20 (2015)
Bomsztyk, K., Denisenko, O. & Ostrowski, J. hnRNP K: one protein multiple processes. BioEssays 26, 629–638 (2004)
Braunschweig, U. et al. Widespread intron retention in mammals functionally tunes transcriptomes. Genome Res. 24, 1774–1786 (2014)
Durocher, Y., Perret, S. & Kamen, A. High-level and high-throughput recombinant protein production by transient transfection of suspension-growing human 293-EBNA1 cells. Nucleic Acids Res. 30, E9 (2002)
Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011)
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014)
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013)
Gagliardi, M. & Matarazzo, M. R. RIP: RNA immunoprecipitation. Methods Mol. Biol. 1480, 73–86 (2016)
George, T. C. et al. Quantitative measurement of nuclear translocation events using similarity analysis of multispectral cellular images obtained in flow. J. Immunol. Methods 311, 117–129 (2006)
Katz, Y., Wang, E. T., Airoldi, E. M. & Burge, C. B. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat. Methods 7, 1009–1015 (2010)
Tilgner, H. et al. Deep sequencing of subcellular RNA fractions shows splicing to be predominantly co-transcriptional in the human genome but inefficient for lncRNAs. Genome Res. 22, 1616–1625 (2012)
Fagerberg, L. et al. Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Mol. Cell. Proteomics 13, 397–406 (2014)
We thank R. Pillai, S. Schwarz, S. Itzkovitz, N. Stern Ginossar, and members of the Ulitsky laboratory for discussions and comments on the manuscript, and Z. Porat from the Weizmann Institute FACS core for assistance with Imaging Flow Cytometry. This research was supported by the Israeli Centers for Research Excellence (1796/12); Israel Science Foundation (1242/14 and 1984/14); European Research Council lincSAFARI; Minerva Foundation; Lapon Raymond; and the Abramson Family Center for Young Scientists. I.U. is incumbent of the Sygnet Career Development Chair for Bioinformatics.
The authors declare no competing financial interests.
Reviewer Information Nature thanks J. Ule, C. Wahlestedt and the other anonymous reviewer(s) for their contribution to the peer review of this work.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Nuclear/cytoplasmic ratios for all the tiles in the indicated lncRNAs when cloned into the indicated region of AcGFP. Tiles overlapping repetitive elements are in black and other tiles are in grey. Regions with segments containing the SIRLOIN elements are shaded.
a, Nuclear/cytoplasmic expression ratios for mRNAs and lncRNAs containing the indicated number of SIRLOIN elements or SIRLOIN reverse complement (antisense) in MCF7 cells (ENCODE data). Otherwise, as in Fig. 1f. b, Nuclear/cytoplasmic expression ratio in mRNAs (left) and lncRNAs (right) in ten ENCODE cell lines. Transcripts without SIRLOIN elements, with two SIRLOIN elements, with a SIRLOIN element in one of the internal exons, or with two antisense SIRLOIN elements are compared. Asterisks above the ‘2 SIRLOIN’ and ‘Internal SIRLOIN’ boxes indicate P < 0.01 by two-sided Wilcoxon test relative to the genes with no SIRLOIN elements. Asterisks above the ‘2 Antisense’ boxes indicate P < 0.01 relative to the ‘2 SIRLOIN’ boxes. Otherwise, as indicated in Fig. 1g. n = 29–16,694 genes per group. c, A rearranged version of part of b that facilitates visual comparison between mRNAs with at least two SIRLOIN elements and lncRNAs without SIRLOIN elements. Asterisks indicate P < 0.01 by two-sided Wilcoxon test. n = 50–16,694 genes per group.
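The group comparisons in this legend rely on a two-sided Wilcoxon test. As a rough illustration only, the sketch below implements the unpaired rank-sum form with a tie-corrected normal approximation; the choice of the rank-sum variant, the function names and the toy values are assumptions and are not taken from the study.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no continuity correction).

    Assumption: the legend does not say which Wilcoxon variant was used;
    the unpaired rank-sum form is chosen here because the groups are unpaired.
    Returns (U statistic for x, two-sided P value).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction for the variance of U.
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value


# Made-up log2 nuclear/cytoplasmic ratios for two hypothetical gene groups.
no_sirloin = [-1.9, -1.4, -1.1, -0.8, -0.5]
two_sirloin = [0.1, 0.4, 0.9, 1.3, 1.7]
u_stat, p = rank_sum_test(no_sirloin, two_sirloin)
```

For the large group sizes reported here the normal approximation is usually adequate; for very small groups an exact test would be the safer choice.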
a, Semi-quantitative RT–PCR in the indicated fraction (left) and gene structures (right) of genes with multiple alternative isoforms, some of which contain SIRLOIN elements. Gene structures taken from RefSeq, UCSC or GENCODE genes; Alu annotations are from the UCSC genome browser. Arrows indicate positions of the primers used for RT–PCR. RNA-seq coverage taken from MCF-7 RNA-seq from the ENCODE project. Experiments were repeated once. b, RNA-seq-based expression ratios comparing nuclear/cytoplasmic, chromatin/nucleoplasmic, and nucleolar/nucleoplasmic RNA fractions for RNAs with the indicated number of SIRLOIN or antisense SIRLOIN elements (data from the ENCODE project). n = 82–17,481 genes per group. Boxplots show 5th, 25th, 50th, 75th and 95th percentiles. P values computed using two-sided Wilcoxon test. c, Comparison of half-lives in MCF-7 cells [9] of protein-coding genes with the indicated number of SIRLOIN elements. Boxplots show 5th, 25th, 50th, 75th and 95th percentiles.
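The boxplots in these legends summarise each group by its 5th, 25th, 50th, 75th and 95th percentiles. A minimal sketch of such a summary follows; the linear-interpolation rule and the helper names are assumptions, since the legend does not state how percentiles were computed.

```python
def percentile(sorted_values, q):
    """Percentile q (0-100) of an already sorted list, by linear interpolation.

    Assumption: interpolation between neighbouring order statistics;
    other conventions give slightly different values for small groups.
    """
    position = (len(sorted_values) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def box_summary(values):
    """Return the five percentiles drawn in the boxplots, keyed by percentile."""
    ordered = sorted(values)
    return {q: percentile(ordered, q) for q in (5, 25, 50, 75, 95)}


# Made-up values standing in for per-gene expression ratios.
summary = box_summary(range(101))
```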
Effects of all the tiles in the indicated lncRNAs and mRNAs on nuclear/cytoplasmic expression ratios when cloned into the 3′ UTR of AcGFP. Tiles overlapping other repetitive elements are in black, and other tiles are in grey. Regions of consecutive tiles overlapping SIRLOIN elements are shaded grey.
a, Correlation between nuclear/cytoplasmic ratios and expression levels of sequences in NucLibB (only wild-type sequences were analysed and two replicates were pooled; n = 2,930 for all data and n = 104 for SIRLOIN data). SIRLOIN-containing sequences are circled. Colouring indicates the local point density. Both replicates are plotted together and correlations and P values were computed using Spearman correlation. b, c, Nuc/Cyt ratios and expression levels of 109-nt fragments containing repetitions of the indicated motifs from JPX#9 (b) and NucLibA PVT1#22 (c), separated by AT dinucleotides. Horizontal lines indicate levels of NucLibB JPX#9 (b) and PVT1#23 (c) sequence (the closest to the NucLibA PVT1#22 sequence). d, Effects of mutations in JPX#9 (2nd replicate) on localization (top) and expression levels (bottom). e, Effects of mutations in PVT1#22. Note that the wild-type sequence of PVT1#22 from NucLibA was not included in NucLibB, and so the values of PVT1#23, which is the closest to that sequence in NucLibB, are shown. f, Effect of shuffles of the indicated kmers in PVT1#22 sequence. g, Predicted secondary structures of full-length AcGFP mRNA with the indicated inserts. Secondary structure prediction by the Vienna package RNAfold server with default parameters. The SIRLOIN elements are indicated in blue.
a, Ratios of expression levels in the nucleus/cytoplasm for transcripts with the indicated number of eCLIP clusters for HNRNPK in K562 cells. P values indicate significant difference by two-sided Wilcoxon test between genes with the indicated number of clusters and genes without eCLIP clusters. Boxplots show 5th, 25th, 50th, 75th and 95th percentiles. b, Knockdown of HNRNPK in MCF7 cells using siRNAs assessed by western blot. Experiment was performed twice and both replicates are shown. c, HNRNPK mRNA levels measured by qRT–PCR. Each bar presents a single independent experiment. d, Correlation between nuclear/cytoplasmic ratios in MCF7 cells in the ENCODE data and in the RNA-seq data collected in this study. Hexagons are coloured according to the number of genes. n = 13,235. e, Ratios of expression levels in the nucleus/cytoplasm for transcripts with the indicated number of eCLIP clusters for HNRNPK in HepG2 cells, separately for lncRNAs and mRNAs. Otherwise as in Fig. 4f. f, Effects of HNRNPK knockdown on expression levels in the indicated sample, for lncRNAs and mRNAs containing the indicated number of SIRLOIN elements. Otherwise as in Fig. 4f. g, Spearman correlation coefficients between the number of appearances of hexamers in the internal exons of transcripts and the change in nuclear/cytoplasmic ratios following HNRNPK knockdown (average of two replicates). Hexamers are grouped by the number of C bases in the hexamer. Colouring indicates the local point density.
Extended Data Figure 7: Correlations between hexamer occurrences and nuclear/cytoplasmic ratios in human and mouse cells.
a, Spearman correlation coefficients between hexamers with the indicated number of bases and nuclear/cytoplasmic ratios in human HepG2 cells. Each point represents a hexamer sequence, the occurrences of which were counted in all exons of all >200-nt RefSeq transcripts. Colouring indicates the local point density. R values indicate Spearman correlation coefficients between the number of indicated bases in the hexamer and the nuclear/cytoplasmic ratios. b, Spearman correlation coefficients computed as in a for each of ten human ENCODE cell lines and three mouse cell types, when examining the indicated transcript types and the indicated exon subsets.
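The analysis described in this legend counts occurrences of each hexamer in transcript exons and correlates those counts with nuclear/cytoplasmic ratios. The sketch below shows one way this could be done for a single hexamer; counting overlapping occurrences, the helper names and the toy sequences and ratios are all assumptions made for illustration.

```python
import math


def count_kmer(sequence, kmer):
    """Count occurrences of kmer in sequence (overlapping matches included, an assumption)."""
    k = len(kmer)
    return sum(1 for i in range(len(sequence) - k + 1) if sequence[i:i + k] == kmer)


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if spread_x == 0 or spread_y == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return covariance / (spread_x * spread_y)


# Made-up exon sequences and log2 nuclear/cytoplasmic ratios.
exons = {"tx1": "ACCCCCCCGT", "tx2": "ACCCCCCTGA", "tx3": "AGTGATTGCA"}
nuc_cyt = {"tx1": 1.8, "tx2": 0.4, "tx3": -1.2}
names = sorted(exons)
counts = [count_kmer(exons[name], "CCCCCC") for name in names]
rho = spearman(counts, [nuc_cyt[name] for name in names])
```

Repeating the last three lines for every possible hexamer would give one coefficient per hexamer, which is the quantity plotted against base content in the panels.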
a, Representative western blot (top) and qRT–PCR quantification (bottom) of HNRNPK knockdown (each bar shows the level in a single experiment). Experiment was repeated twice. b, Changes in nuclear/cytoplasmic ratios of genes with the indicated number of HNRNPK eCLIP peaks following HNRNPK knockdown (DESeq2 analysis of two independent replicates). Otherwise as in Fig. 4f. n = 91–11,016 genes. c, As in b, separately for mRNA and lncRNAs. n = 85–10,077 mRNAs per group, 6–939 lncRNAs per group. d, Changes in nuclear/cytoplasmic ratios of mRNAs and lncRNAs with the indicated number of SIRLOIN elements. Otherwise as in Fig. 4f. n = 53–11,172 mRNAs per group, 10–891 lncRNAs per group.
a, Comparison of nuclear/cytoplasmic ratios for orthologous genes in human HepG2 cells (ENCODE data) and mouse liver [3]. n = 10,160 conserved genes. Colouring indicates the local point density. Correlation computed using Spearman correlation. b, Top, sequences of the Mlxipl#71/72 tiles in NucLibA. Centre, nuclear/cytoplasmic ratios for each tile with a sufficient number of reads in NucLibA. Bottom, WCE/input expression levels for each tile. The region of tiles #71–72 is shaded in grey. c, Left, difference in log-transformed nuclear/cytoplasmic ratios between orthologous genes with the indicated number of human SIRLOIN elements or mouse SIRLOIN-like elements. Right, differences in overall expression levels; human liver expression data taken from the Human Protein Atlas [39], mouse liver expression data taken from the ENCODE project. P values computed using two-sided Wilcoxon test, comparing to genes without SIRLOIN elements. Boxplots show 5th, 25th, 50th, 75th and 95th percentiles.
a, Frequencies of all base substitutions in each fraction, normalized to their occurrence in the input libraries. Samples from NucLibB were analysed, and all these samples were sequenced in the same NextSeq sequencing run. The number of substitutions was tallied for reads with fewer than six mutations relative to the reference library sequence. The count of each mutation type in each sample was then divided by the total number of mutations in the sample, the two replicates were averaged and the number of mutations in each sample type was divided by the number in the input library. b, For each reference A base in the indicated sample and the indicated group of segments, we computed the fraction of reads that contained a mutation to a G. The positions were then binned in 1% bins, and the heat map shows that fraction of bases mapping to each bin. In all cases, >90% of As were found in the <1% editing bin, which is not shown. Each row corresponds to a single experiment. c, In each heat map, each row corresponds to an A in a segment containing a SIRLOIN element, and the fraction of reads containing an A→G or A→T/C mutation is shown in each sample. Each column corresponds to a single experiment. d, As in c, but showing a specific SIRLOIN-containing segment. e, Correlation between high levels of RNA editing, localization (left) and expression levels (right). ‘High A→X’ are those segments in which the mean A→X mutation levels were at least three times higher than the median across all segments, in at least three of the four libraries (nucleus/cytoplasm, two replicates). Numbers near each group name correspond to sample size. Nuc/Cyt and WCE/Input levels were averaged across the two replicates. Boxplots show 5th, 25th, 50th, 75th and 95th percentiles and P values were computed using a two-sided Wilcoxon test.
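The normalization described for panel a (per-sample mutation fractions, averaged over two replicates and divided by the input library) can be sketched as follows. The function names, the mutation labels and the counts are hypothetical and serve only to make the arithmetic concrete.

```python
def mutation_fractions(counts):
    """Divide the count of each mutation type by the total number of mutations in the sample."""
    total = sum(counts.values())
    return {mutation: count / total for mutation, count in counts.items()}


def normalized_frequencies(replicate_1, replicate_2, input_library):
    """Average the per-replicate fractions, then divide by the input-library fraction.

    Assumption: mutation types absent from a replicate count as zero, and
    only mutation types observed in the input library are reported.
    """
    f1 = mutation_fractions(replicate_1)
    f2 = mutation_fractions(replicate_2)
    f_input = mutation_fractions(input_library)
    result = {}
    for mutation, input_fraction in f_input.items():
        if input_fraction == 0:
            continue  # ratio is undefined without input-library observations
        mean_fraction = (f1.get(mutation, 0.0) + f2.get(mutation, 0.0)) / 2
        result[mutation] = mean_fraction / input_fraction
    return result


# Made-up substitution counts for two nuclear replicates and the input library.
nuclear_rep1 = {"A>G": 30, "C>T": 70}
nuclear_rep2 = {"A>G": 50, "C>T": 50}
input_counts = {"A>G": 20, "C>T": 80}
enrichment = normalized_frequencies(nuclear_rep1, nuclear_rep2, input_counts)
```

In this toy case A>G substitutions make up 40% of mutations on average in the nuclear samples against 20% in the input, so their normalized frequency is 2.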
Cite this article
Lubelsky, Y., Ulitsky, I. Sequences enriched in Alu repeats drive nuclear localization of long RNAs in human cells. Nature 555, 107–111 (2018). https://doi.org/10.1038/nature25757