Sequences enriched in Alu repeats drive nuclear localization of long RNAs in human cells


Long noncoding RNAs (lncRNAs) are emerging as key parts of multiple cellular pathways1, but their modes of action and how these are dictated by sequence remain unclear. lncRNAs tend to be enriched in the nuclear fraction, whereas most mRNAs are overtly cytoplasmic2, although several studies have found that hundreds of mRNAs in various cell types are retained in the nucleus3,4. It is thus conceivable that some mechanisms that promote nuclear enrichment are shared between lncRNAs and mRNAs. Here, to identify elements in lncRNAs and mRNAs that can force nuclear localization, we screened libraries of short fragments tiled across nuclear RNAs, which were cloned into the untranslated regions of an efficiently exported mRNA. The screen identified a short sequence derived from Alu elements and bound by HNRNPK that increased nuclear accumulation. Binding of HNRNPK to C-rich motifs outside Alu elements is also associated with nuclear enrichment in both lncRNAs and mRNAs, and this mechanism is conserved across species. Our results thus identify a pathway for regulation of RNA accumulation and subcellular localization that has been co-opted to regulate the fate of transcripts with integrated Alu elements.
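The tiling step of the screen can be illustrated with a minimal sketch. It is not the library design used in the study: the 109-nt tile length is borrowed from the fragment length quoted in the Extended Data Figure 5 legend, while the 25-nt step and the function name `tile_sequence` are assumptions made only for illustration.

```python
def tile_sequence(seq, tile_len=109, step=25):
    """Return (start, tile) pairs covering seq with overlapping windows.

    tile_len and step are illustrative assumptions, not the published design.
    """
    if tile_len <= 0 or step <= 0:
        raise ValueError("tile_len and step must be positive")
    if len(seq) <= tile_len:
        # A transcript shorter than one tile is returned whole.
        return [(0, seq)]
    starts = list(range(0, len(seq) - tile_len + 1, step))
    # Add a final tile flush with the 3' end so the tail is not dropped.
    last = len(seq) - tile_len
    if starts[-1] != last:
        starts.append(last)
    return [(s, seq[s:s + tile_len]) for s in starts]


if __name__ == "__main__":
    toy_rna = "ACGT" * 75  # hypothetical 300-nt transcript
    for start, tile in tile_sequence(toy_rna)[:3]:
        print(start, tile[:12] + "...")
```

Each tile would then be cloned into the reporter and scored separately, so overlapping windows let the signal from a short element be narrowed down to the tiles that share it.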


Figure 1: NucLibA analysis and the SIRLOIN element.
Figure 2: NucLibB analysis.
Figure 3: Effect of SIRLOIN sequence changes on element activity.
Figure 4: HNRNPK drives nuclear localization of SIRLOIN-containing transcripts.

Accession codes

Primary accessions

Sequence Read Archive


References

  1. Ulitsky, I. & Bartel, D. P. lincRNAs: genomics, evolution, and mechanisms. Cell 154, 26–46 (2013)
  2. Derrien, T. et al. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 22, 1775–1789 (2012)
  3. Bahar Halpern, K. et al. Nuclear retention of mRNA in mammalian tissues. Cell Reports 13, 2653–2662 (2015)
  4. Battich, N., Stoeger, T. & Pelkmans, L. Control of transcript variability in single mammalian cells. Cell 163, 1596–1610 (2015)
  5. Miyagawa, R. et al. Identification of cis- and trans-acting factors involved in the localization of MALAT-1 noncoding RNA to nuclear speckles. RNA 18, 738–751 (2012)
  6. Zhang, B. et al. A novel RNA motif mediates the strict nuclear localization of a long noncoding RNA. Mol. Cell. Biol. 34, 2318–2329 (2014)
  7. Chen, L. L., DeCerbo, J. N. & Carmichael, G. G. Alu element-mediated gene silencing. EMBO J. 27, 1694–1705 (2008)
  8. Prasanth, K. V. et al. Regulating gene expression through RNA nuclear retention. Cell 123, 249–263 (2005)
  9. Schueler, M. et al. Differential protein occupancy profiling of the mRNA transcriptome. Genome Biol. 15, R15 (2014)
  10. Versteeg, R. et al. The human transcriptome map reveals extremes in gene density, intron length, GC content, and repeat pattern for domains of highly and weakly expressed genes. Genome Res. 13, 1998–2004 (2003)
  11. Lev-Maor, G., Sorek, R., Shomron, N. & Ast, G. The birth of an alternatively spliced exon: 3′ splice-site selection in Alu exons. Science 300, 1288–1291 (2003)
  12. Chen, C., Ara, T. & Gautheret, D. Using Alu elements as polyadenylation sites: A case of retroposon exaptation. Mol. Biol. Evol. 26, 327–334 (2009)
  13. Tajnik, M. et al. Intergenic Alu exonisation facilitates the evolution of tissue-specific transcript ends. Nucleic Acids Res. 43, 10492–10505 (2015)
  14. Zarnack, K. et al. Direct competition between hnRNP C and U2AF65 protects the transcriptome from the exonization of Alu elements. Cell 152, 453–466 (2013)
  15. Kelley, D. R. & Rinn, J. L. Transposable elements reveal a stem cell specific class of long noncoding RNAs. Genome Biol. 13, R107 (2012)
  16. Gong, C. & Maquat, L. E. lncRNAs transactivate STAU1-mediated mRNA decay by duplexing with 3′ UTRs via Alu elements. Nature 470, 284–288 (2011)
  17. Johnson, R. & Guigó, R. The RIDL hypothesis: transposable elements as functional domains of long noncoding RNAs. RNA 20, 959–976 (2014)
  18. Dimitrova, N. et al. LincRNA-p21 activates p21 in cis to promote Polycomb target gene expression and to enforce the G1/S checkpoint. Mol. Cell 54, 777–790 (2014)
  19. Chu, C. et al. Systematic discovery of Xist RNA binding proteins. Cell 161, 404–416 (2015)
  20. Maticzka, D., Lange, S. J., Costa, F. & Backofen, R. GraphProt: modeling binding preferences of RNA-binding proteins. Genome Biol. 15, R17 (2014)
  21. Choi, H. S. et al. Poly(C)-binding proteins as transcriptional regulators of gene expression. Biochem. Biophys. Res. Commun. 380, 431–436 (2009)
  22. Paziewska, A., Wyrwicz, L. S., Bujnicki, J. M., Bomsztyk, K. & Ostrowski, J. Cooperative binding of the hnRNP K three KH domains to mRNA targets. FEBS Lett. 577, 134–140 (2004)
  23. Baron-Benhamou, J., Gehring, N. H., Kulozik, A. E. & Hentze, M. W. Using the λN peptide to tether proteins to RNAs. Methods Mol. Biol. 257, 135–154 (2004)
  24. Akef, A., Lee, E. S. & Palazzo, A. F. Splicing promotes the nuclear export of β-globin mRNA by overcoming nuclear retention elements. RNA 21, 1908–1920 (2015)
  25. Giulietti, M., Milantoni, S. A., Armeni, T., Principato, G. & Piva, F. ExportAid: database of RNA elements regulating nuclear RNA export in mammals. Bioinformatics 31, 246–251 (2015)
  26. Roy, D., Bhanja Chowdhury, J. & Ghosh, S. Polypyrimidine tract binding protein (PTB) associates with intronic and exonic domains to squelch nuclear export of unspliced RNA. FEBS Lett. 587, 3802–3807 (2013)
  27. Shukla, C. J. et al. High-throughput identification of RNA nuclear enrichment sequences. EMBO J. e98452 (2018)
  28. Cabili, M. N. et al. Localization and abundance analysis of human lncRNAs at single-cell and single-molecule resolution. Genome Biol. 16, 20 (2015)
  29. Bomsztyk, K., Denisenko, O. & Ostrowski, J. hnRNP K: one protein multiple processes. BioEssays 26, 629–638 (2004)
  30. Braunschweig, U. et al. Widespread intron retention in mammals functionally tunes transcriptomes. Genome Res. 24, 1774–1786 (2014)
  31. Durocher, Y., Perret, S. & Kamen, A. High-level and high-throughput recombinant protein production by transient transfection of suspension-growing human 293-EBNA1 cells. Nucleic Acids Res. 30, E9 (2002)
  32. Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011)
  33. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014)
  34. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013)
  35. Gagliardi, M. & Matarazzo, M. R. RIP: RNA immunoprecipitation. Methods Mol. Biol. 1480, 73–86 (2016)
  36. George, T. C. et al. Quantitative measurement of nuclear translocation events using similarity analysis of multispectral cellular images obtained in flow. J. Immunol. Methods 311, 117–129 (2006)
  37. Katz, Y., Wang, E. T., Airoldi, E. M. & Burge, C. B. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat. Methods 7, 1009–1015 (2010)
  38. Tilgner, H. et al. Deep sequencing of subcellular RNA fractions shows splicing to be predominantly co-transcriptional in the human genome but inefficient for lncRNAs. Genome Res. 22, 1616–1625 (2012)
  39. Fagerberg, L. et al. Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Mol. Cell. Proteomics 13, 397–406 (2014)



We thank R. Pillai, S. Schwarz, S. Itzkovitz, N. Stern Ginossar, and members of the Ulitsky laboratory for discussions and comments on the manuscript, and Z. Porat from the Weizmann Institute FACS core for assistance with Imaging Flow Cytometry. This research was supported by the Israeli Centers for Research Excellence (1796/12); Israel Science Foundation (1242/14 and 1984/14); European Research Council lincSAFARI; Minerva Foundation; Lapon Raymond; and the Abramson Family Center for Young Scientists. I.U. is incumbent of the Sygnet Career Development Chair for Bioinformatics.

Author information




Y.L. and I.U. conceived and designed the study. Y.L. carried out all experiments and I.U. carried out computational analysis. Y.L. and I.U. wrote the manuscript.

Corresponding author

Correspondence to Igor Ulitsky.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Reviewer information: Nature thanks J. Ule, C. Wahlestedt and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Figure 1 Effects of tiles in NucLibA on localization.

Nuclear/cytoplasmic ratios for all the tiles in the indicated lncRNAs when cloned into the indicated region of AcGFP. Tiles overlapping repetitive elements are in black and other tiles are in grey. Regions with segments containing the SIRLOIN elements are shaded.

Extended Data Figure 2 Nuclear/cytoplasmic ratios for lncRNAs and mRNAs.

a, Nuclear/cytoplasmic expression ratios for mRNAs and lncRNAs containing the indicated number of SIRLOIN elements or SIRLOIN reverse complement (antisense) in MCF7 cells (ENCODE data). Otherwise, as in Fig. 1f. b, Nuclear/cytoplasmic expression ratio in mRNAs (left) and lncRNAs (right) in ten ENCODE cell lines. Transcripts without SIRLOIN elements, with two SIRLOIN elements, with a SIRLOIN element in one of the internal exons, or with two antisense SIRLOIN elements are compared. Asterisks above the ‘2 SIRLOIN’ and ‘Internal SIRLOIN’ boxes indicate P < 0.01 by two-sided Wilcoxon test relative to the genes with no SIRLOIN elements. Asterisks above the ‘2 Antisense’ boxes indicate P < 0.01 relative to the ‘2 SIRLOIN’ boxes. Otherwise, as indicated in Fig. 1g. n = 29–16,694 genes per group. c, A rearranged version of part of b that facilitates visual comparison between mRNAs with at least two SIRLOIN elements and lncRNAs without SIRLOIN elements. Asterisks indicate P < 0.01 by two-sided Wilcoxon test. n = 50–16,694 genes per group.

Extended Data Figure 3 Comparison of splicing isoforms, chromatin enrichment and mRNA half-lives.

a, Semi-quantitative RT–PCR in the indicated fraction (left) and gene structures (right) of genes with multiple alternative isoforms, some of which contain SIRLOIN elements. Gene structures taken from RefSeq, UCSC or GENCODE genes; Alu annotations are from the UCSC genome browser. Arrows indicate positions of the primers used for RT–PCR. RNA-seq coverage taken from MCF-7 RNA-seq from the ENCODE project. Experiments were repeated once. b, RNA-seq-based expression ratios comparing nuclear/cytoplasmic, chromatin/nucleoplasmic, and nucleolar/nucleoplasmic RNA fractions for RNAs with the indicated number of SIRLOIN or antisense SIRLOIN elements (data from the ENCODE project). n = 82–17,481 genes per group. Boxplots show 5th, 25th, 50th, 75th and 95th percentiles. P values computed using two-sided Wilcoxon test. c, Comparison of half-lives in MCF-7 cells9 of protein-coding genes with the indicated number of SIRLOIN elements. Boxplots show 5th, 25th, 50th, 75th and 95th percentiles.

Extended Data Figure 4 Nuclear/cytoplasmic ratios for NucLibB tiles.

Effects of all the tiles in the indicated lncRNAs and mRNAs on nuclear/cytoplasmic expression ratios when cloned into the 3′ UTR of AcGFP. Tiles overlapping other repetitive elements are in black, and other tiles are in grey. Regions of consecutive tiles overlapping SIRLOIN elements are shaded grey.

Extended Data Figure 5 Statistics and mutagenesis results in NucLibB.

a, Correlation between nuclear/cytoplasmic ratios and expression levels of sequences in NucLibB (only wild-type sequences were analysed and two replicates were pooled; n = 2,930 for all data and n = 104 for SIRLOIN data). SIRLOIN-containing sequences are circled. Colouring indicates the local point density. Both replicates are plotted together and correlations and P values were computed using Spearman correlation. b, c, Nuc/Cyt ratios and expression levels of 109-nt fragments containing repetitions of the indicated motifs from JPX#9 (b) and NucLibA PVT1#22 (c), separated by AT dinucleotides. Horizontal lines indicate levels of NucLibB JPX#9 (b) and PVT1#23 (c) sequence (the closest to the NucLibA PVT1#22 sequence). d, Effects of mutations in JPX#9 (2nd replicate) on localization (top) and expression levels (bottom). e, Effects of mutations in PVT1#22. Note that the wild-type sequence of PVT1#22 from NucLibA was not included in NucLibB, and so the values of PVT1#23, which is the closest to that sequence in NucLibB, are shown. f, Effect of shuffles of the indicated kmers in PVT1#22 sequence. g, Predicted secondary structures of full-length AcGFP mRNA with the indicated inserts. Secondary structure prediction by the Vienna package RNAfold server with default parameters. The SIRLOIN elements are indicated in blue.
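The motif-repeat constructs in panels b and c can be pictured with a small sketch. It assumes that copies of a motif are joined by AT spacers until no further full copy fits within 109 nt; how any remaining positions were filled is not stated in the legend, so the sketch leaves the fragment unpadded. The function name `repeat_motif` and the example motif are hypothetical.

```python
def repeat_motif(motif, max_len=109, spacer="AT"):
    """Join as many full copies of motif as fit in max_len, separated by spacer.

    The result may be shorter than max_len; padding is deliberately not guessed.
    """
    if not motif:
        raise ValueError("motif must be non-empty")
    # n copies occupy n*len(motif) + (n-1)*len(spacer) nucleotides.
    copies = (max_len + len(spacer)) // (len(motif) + len(spacer))
    return spacer.join([motif] * copies)


if __name__ == "__main__":
    fragment = repeat_motif("CCTCCC")  # hypothetical C-rich motif
    print(len(fragment), fragment[:24] + "...")
```

Separating the copies with a fixed dinucleotide keeps adjacent motifs from fusing into a new, unintended sequence at their junctions.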

Extended Data Figure 6 HNRNPK regulation in K562 and MCF7 cells.

a, Ratios of expression levels in the nucleus/cytoplasm for transcripts with the indicated number of eCLIP clusters for HNRNPK in K562 cells. P values indicate significant difference by two-sided Wilcoxon test between genes with the indicated number of clusters and genes without eCLIP clusters. Boxplots show 5th, 25th, 50th, 75th and 95th percentiles. b, Knockdown of HNRNPK in MCF7 cells using siRNAs assessed by western blot. Experiment was performed twice and both replicates are shown. c, HNRNPK mRNA levels measured by qRT–PCR. Each bar presents a single independent experiment. d, Correlation between nuclear/cytoplasmic ratios in MCF7 cells in the ENCODE data and in the RNA-seq data collected in this study. Hexagons are coloured according to the number of genes. n = 13,235. e, Ratios of expression levels in the nucleus/cytoplasm for transcripts with the indicated number of eCLIP clusters for HNRNPK in HepG2 cells, separately for lncRNAs and mRNAs. Otherwise as in Fig. 4f. f, Effects of HNRNPK knockdown on expression levels in the indicated sample, for lncRNAs and mRNAs containing the indicated number of SIRLOIN elements. Otherwise as in Fig. 4f. g, Spearman correlation coefficients between the number of appearances of hexamers in the internal exons of transcripts and the change in nuclear/cytoplasmic ratios following HNRNPK knockdown (average of two replicates). Hexamers are grouped by the number of C bases in the hexamer. Colouring indicates the local point density.

Extended Data Figure 7 Correlations between hexamer occurrences and nuclear/cytoplasmic ratios in human and mouse cells.

a, Spearman correlation coefficients between hexamers with the indicated number of bases and nuclear/cytoplasmic ratios in human HepG2 cells. Each point represents a hexamer sequence, the occurrences of which were counted in all exons of all >200-nt RefSeq transcripts. Colouring indicates the local point density. R values indicate Spearman correlation coefficients between the number of indicated bases in the hexamer and the nuclear/cytoplasmic ratios. b, Spearman correlation coefficients computed as in a for each of ten human ENCODE cell lines and three mouse cell types, when examining the indicated transcript types and the indicated exon subsets.
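The hexamer analysis in this legend amounts to counting each hexamer per transcript and rank-correlating those counts with the nuclear/cytoplasmic ratios. The following is a minimal sketch of that idea on toy data, using only the standard library. The function names, the toy sequences and the ratios are assumptions for illustration, not the analysis code or data of the study.

```python
def count_kmer(seq, kmer):
    """Count overlapping occurrences of kmer in seq."""
    k = len(kmer)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            out[idx] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rho computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kmer_correlations(transcripts, log_ratios, kmers):
    """Map each k-mer to the Spearman rho between its counts and log_ratios."""
    return {
        k: spearman([count_kmer(t, k) for t in transcripts], log_ratios)
        for k in kmers
    }


if __name__ == "__main__":
    # Hypothetical exon sequences and log nuclear/cytoplasmic ratios.
    toy_transcripts = ["CCCCCCCC", "CCCCCCA", "ACGTACGT", "AAAAAAAA"]
    toy_log_ratios = [2.0, 1.0, -0.5, -1.0]
    print(kmer_correlations(toy_transcripts, toy_log_ratios, ["CCCCCC", "AAAAAA"]))
```

A rank-based coefficient is a natural fit here because hexamer counts are heavily tied at zero and the ratios are not expected to scale linearly with the counts.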

Extended Data Figure 8 HNRNPK knockdown in HeLa cells.

a, Representative western blot (top) and qRT–PCR quantification (bottom) of HNRNPK knockdown (each bar shows the level in a single experiment). Experiment was repeated twice. b, Changes in nuclear/cytoplasmic ratios of genes with the indicated number of HNRNPK eCLIP peaks following HNRNPK knockdown (DESeq2 analysis of two independent replicates). Otherwise as in Fig. 4f. n = 91–11,016 genes. c, As in b, separately for mRNA and lncRNAs. n = 85–10,077 mRNAs per group, 6–939 lncRNAs per group. d, Changes in nuclear/cytoplasmic ratios of mRNAs and lncRNAs with the indicated number of SIRLOIN elements. Otherwise as in Fig. 4f. n = 53–11,172 mRNAs per group, 10–891 lncRNAs per group.

Extended Data Figure 9 SIRLOIN-like element in mouse.

a, Comparison of nuclear/cytoplasmic ratios for orthologous genes in human HepG2 cells (ENCODE data) and mouse liver3. n = 10,160 conserved genes. Colouring indicates the local point density. Correlation computed using Spearman correlation. b, Top, sequences of the Mlxipl#71/72 tiles in NucLibA. Centre, nuclear/cytoplasmic ratios for each tile with a sufficient number of reads in NucLibA. Bottom, WCE/input expression levels for each tile. The region of tiles #71–72 is shaded in grey. c, Left, difference in log-transformed nuclear/cytoplasmic ratios between orthologous genes with the indicated number of human SIRLOIN elements or mouse SIRLOIN-like elements. Right, differences in overall expression levels; human liver expression data taken from the Human Protein Atlas39, mouse liver expression data taken from the ENCODE project. P values computed using two-sided Wilcoxon test, comparing to genes without SIRLOIN elements. Boxplots show 5th, 25th, 50th, 75th and 95th percentiles.

Extended Data Figure 10 RNA editing in SIRLOIN elements.

a, Frequencies of all base substitutions in each fraction, normalized to their occurrence in the input libraries. Samples from NucLibB were analysed, and all these samples were sequenced in the same NextSeq sequencing run. The number of substitutions was tallied for reads with fewer than six mutations relative to the reference library sequence. The count of each mutation type in each sample was then divided by the total number of mutations in the sample, the two replicates were averaged and the number of mutations in each sample type was divided by the number in the input library. b, For each reference A base in the indicated sample and the indicated group of segments, we computed the fraction of reads that contained a mutation to a G. The positions were then binned in 1% bins, and the heat map shows that fraction of bases mapping to each bin. In all cases, >90% of As were found in the <1% editing bin, which is not shown. Each row corresponds to a single experiment. c, In each heat map, each row corresponds to an A in a segment containing a SIRLOIN element, and the fraction of reads containing an A→G or A→T/C mutation is shown in each sample. Each column corresponds to a single experiment. d, As in c, but showing a specific SIRLOIN-containing segment. e, Correlation between high levels of RNA editing, localization (left) and expression levels (right). ‘High A→X’ are those segments in which the mean A→X mutation levels were at least three times higher than the median across all segments, in at least three of the four libraries (nucleus/cytoplasm, two replicates). Numbers near each group name correspond to sample size. Nuc/Cyt and WCE/Input levels were averaged across the two replicates. Boxplots show 5th, 25th, 50th, 75th and 95th percentiles and P values were computed using a two-sided Wilcoxon test.
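The normalization described for panel a can be sketched as follows. This is a minimal illustration under stated assumptions: reads are already aligned to their reference segment at equal length (indels are ignored), and the input library is represented by a single set of fractions. The function names and toy reads are hypothetical and do not reproduce the pipeline used in the study.

```python
from collections import Counter


def substitution_fractions(read_pairs, max_mutations=5):
    """Fraction of each substitution type among all substitutions in a sample.

    read_pairs: iterable of (reference, read) strings of equal length.
    Reads with six or more mismatches are skipped (fewer than six are kept).
    """
    counts = Counter()
    for ref, read in read_pairs:
        mismatches = [(r, q) for r, q in zip(ref, read) if r != q]
        if len(mismatches) > max_mutations:
            continue
        counts.update(f"{r}>{q}" for r, q in mismatches)
    total = sum(counts.values())
    return {sub: n / total for sub, n in counts.items()} if total else {}


def normalize_to_input(replicates, input_fractions):
    """Average replicate fractions, then divide by the input-library fraction."""
    out = {}
    for sub, base in input_fractions.items():
        if base == 0:
            continue  # ratio undefined when the input shows no such substitution
        mean = sum(rep.get(sub, 0.0) for rep in replicates) / len(replicates)
        out[sub] = mean / base
    return out


if __name__ == "__main__":
    rep1 = substitution_fractions([("AAAA", "AGAA"), ("AAAA", "AAAT")])
    rep2 = substitution_fractions([("AAAA", "GGAA")])
    toy_input = {"A>G": 0.25, "A>T": 0.5}  # hypothetical input-library fractions
    print(normalize_to_input([rep1, rep2], toy_input))
```

Dividing by the input library separates substitutions that arose in the cell, such as A-to-G changes consistent with editing, from errors already present in the synthesized or sequenced library.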

Supplementary information

Supplementary Information

This file contains the Supplementary Notes 1-6, Supplementary References and the uncropped gels. (PDF 1579 kb)

Life Sciences Reporting Summary (PDF 73 kb)

Supplementary Tables

This zipped file contains Supplementary Tables 1-12 and a Supplementary Table guide. (ZIP 2823 kb)


Cite this article

Lubelsky, Y., Ulitsky, I. Sequences enriched in Alu repeats drive nuclear localization of long RNAs in human cells. Nature 555, 107–111 (2018).
