Deep transcriptome profiling of mammalian stem cells supports a regulatory role for retrotransposons in pluripotency maintenance

The importance of microRNAs and long noncoding RNAs in the regulation of pluripotency has been documented; however, the noncoding components of stem cell gene networks remain largely unknown. Here we investigate the role of noncoding RNAs in the pluripotent state, with particular emphasis on nuclear and retrotransposon-derived transcripts. We have performed deep profiling of the nuclear and cytoplasmic transcriptomes of human and mouse stem cells, identifying a class of previously undetected stem cell–specific transcripts. We show that long terminal repeat (LTR)-derived transcripts contribute extensively to the complexity of the stem cell nuclear transcriptome. Some LTR-derived transcripts are associated with enhancer regions and are likely to be involved in the maintenance of pluripotency.

Figure 1: High nuclear complexity of the stem cell transcriptome and identification of stem cell–specific transcripts.
Figure 2: Characterization of NASTs.
Figure 3: LTR-derived transcripts enriched in NASTs.
Figure 4: LTR-associated stem cell–specific regulatory elements.
Figure 5: Implication of repeat-associated NASTs in pluripotency maintenance.
Figure 6: Genomic characteristics of the newly identified stem cell–specific transcripts.


1. Carninci, P. et al. The transcriptional landscape of the mammalian genome. Science 309, 1559–1563 (2005).
2. Cheng, J. et al. Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 308, 1149–1154 (2005).
3. Birney, E. et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799–816 (2007).
4. Kapranov, P. et al. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 316, 1484–1488 (2007).
5. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
6. Djebali, S. et al. Landscape of transcription in human cells. Nature 489, 101–108 (2012).
7. Faulkner, G.J. et al. The regulated retrotransposon transcriptome of mammalian cells. Nat. Genet. 41, 563–571 (2009).
8. Santoni, F.A., Guerra, J. & Luban, J. HERV-H RNA is abundant in human embryonic stem cells and a precise marker for pluripotency. Retrovirology 9, 111 (2012).
9. Peaston, A.E. et al. Retrotransposons regulate host genes in mouse oocytes and preimplantation embryos. Dev. Cell 7, 597–606 (2004).
10. Macfarlan, T.S. et al. Embryonic stem cell potency fluctuates with endogenous retrovirus activity. Nature 487, 57–63 (2012).
11. Bourque, G. et al. Evolution of the mammalian transcription factor binding repertoire via transposable elements. Genome Res. 18, 1752–1762 (2008).
12. Kunarso, G. et al. Transposable elements have rewired the core regulatory network of human embryonic stem cells. Nat. Genet. 42, 631–634 (2010).
13. Kelley, D. & Rinn, J. Transposable elements reveal a stem cell–specific class of long noncoding RNAs. Genome Biol. 13, R107 (2012).
14. Kolle, G. et al. Deep-transcriptome and ribonome sequencing redefines the molecular networks of pluripotency and the extracellular space in human embryonic stem cells. Genome Res. 21, 2014–2025 (2011).
15. Loewer, S. et al. Large intergenic non-coding RNA-RoR modulates reprogramming of human induced pluripotent stem cells. Nat. Genet. 42, 1113–1117 (2010).
16. Guttman, M. et al. lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature 477, 295–300 (2011).
17. Ng, S.Y., Johnson, R. & Stanton, L.W. Human long non-coding RNAs promote pluripotency and neuronal differentiation by association with chromatin modifiers and transcription factors. EMBO J. 31, 522–533 (2012).
18. Marson, A. et al. Connecting microRNA genes to the core transcriptional regulatory circuitry of embryonic stem cells. Cell 134, 521–533 (2008).
19. Wang, Y. et al. Embryonic stem cell–specific microRNAs regulate the G1-S transition and promote rapid proliferation. Nat. Genet. 40, 1478–1483 (2008).
20. Melton, C., Judson, R.L. & Blelloch, R. Opposing microRNA families regulate self-renewal in mouse embryonic stem cells. Nature 463, 621–626 (2010).
21. Okita, K., Ichisaka, T. & Yamanaka, S. Generation of germline-competent induced pluripotent stem cells. Nature 448, 313–317 (2007).
22. Takahashi, H., Lassmann, T., Murata, M. & Carninci, P. 5′ end–centered expression profiling using cap-analysis gene expression and next-generation sequencing. Nat. Protoc. 7, 542–561 (2012).
23. Plessy, C. et al. Linking promoters to functional transcripts in small samples with nanoCAGE and CAGEscan. Nat. Methods 7, 528–534 (2010).
24. Cloonan, N. et al. Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat. Methods 5, 613–619 (2008).
25. Efroni, S. et al. Global transcription in pluripotent embryonic stem cells. Cell Stem Cell 2, 437–447 (2008).
26. Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515 (2010).
27. Robinson, M.D., McCarthy, D.J. & Smyth, G.K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
28. Guttman, M. et al. Ab initio reconstruction of cell type–specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat. Biotechnol. 28, 503–510 (2010).
29. FANTOM Consortium and the RIKEN PMI and CLST (DGT). A promoter level mammalian expression atlas. Nature 507, 462–470 (2014).
30. Sigova, A.A. et al. Divergent transcription of long noncoding RNA/mRNA gene pairs in embryonic stem cells. Proc. Natl. Acad. Sci. USA 110, 2876–2881 (2013).
31. Min, I.M. et al. Regulating RNA polymerase pausing and transcription elongation in embryonic stem cells. Genes Dev. 25, 742–754 (2011).
32. Hinrichs, A.S. et al. The UCSC Genome Browser Database: update 2006. Nucleic Acids Res. 34, D590–D598 (2006).
33. Rebollo, R., Romanish, M.T. & Mager, D.L. Transposable elements: an abundant and natural source of regulatory sequences for host genes. Annu. Rev. Genet. 46, 21–42 (2012).
34. Rowe, H.M. & Trono, D. Dynamic control of endogenous retroviruses during development. Virology 411, 273–287 (2011).
35. Karimi, M.M. et al. DNA methylation and SETDB1/H3K9me3 regulate predominantly distinct sets of genes, retroelements, and chimeric transcripts in mESCs. Cell Stem Cell 8, 676–687 (2011).
36. Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).
37. Rowe, H.M. et al. KAP1 controls endogenous retroviruses in embryonic stem cells. Nature 463, 237–240 (2010).
38. Kim, T.K. et al. Widespread transcription at neuronal activity–regulated enhancers. Nature 465, 182–187 (2010).
39. Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).
40. Neph, S. et al. Circuitry and dynamics of human transcription factor regulatory networks. Cell 150, 1274–1286 (2012).
41. Schmidt, D. et al. Waves of retrotransposon expansion remodel genome organization and CTCF binding in multiple mammalian lineages. Cell 148, 335–348 (2012).
42. Rada-Iglesias, A. et al. A unique chromatin signature uncovers early developmental enhancers in humans. Nature 470, 279–283 (2011).
43. Creyghton, M.P. et al. Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc. Natl. Acad. Sci. USA 107, 21931–21936 (2010).
44. Zhang, Y. et al. Chromatin connectivity maps reveal dynamic promoter-enhancer long-range associations. Nature 504, 306–310 (2013).
45. Chambers, I. et al. Functional expression cloning of Nanog, a pluripotency sustaining factor in embryonic stem cells. Cell 113, 643–655 (2003).
46. Mitsui, K. et al. The homeoprotein Nanog is required for maintenance of pluripotency in mouse epiblast and ES cells. Cell 113, 631–642 (2003).
47. St Laurent, G. et al. VlincRNAs controlled by retroviral elements are a hallmark of pluripotency and cancer. Genome Biol. 14, R73 (2013).
48. Brons, I.G. et al. Derivation of pluripotent epiblast stem cells from mammalian embryos. Nature 448, 191–195 (2007).
49. Tesar, P.J. et al. New cell lines from mouse epiblast share defining features with human embryonic stem cells. Nature 448, 196–199 (2007).
50. Xie, M. et al. DNA hypomethylation within specific transposable element families associates with tissue-specific enhancer landscape. Nat. Genet. 45, 836–841 (2013).
51. Shimizukawa, R. et al. Establishment of a new embryonic stem cell line derived from C57BL/6 mouse expressing EGFP ubiquitously. Genesis 42, 47–52 (2005).
52. Wakayama, T. et al. Differentiation of embryonic stem cell lines generated from adult somatic cells by nuclear transfer. Science 292, 740–743 (2001).
53. Ying, Q.L. et al. The ground state of embryonic stem cell self-renewal. Nature 453, 519–523 (2008).
54. Takahashi, K. & Yamanaka, S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126, 663–676 (2006).
55. Fusaki, N., Ban, H., Nishiyama, A., Saeki, K. & Hasegawa, M. Efficient induction of transgene-free human pluripotent stem cells using a vector based on Sendai virus, an RNA virus that does not integrate into the host genome. Proc. Jpn. Acad., Ser. B, Phys. Biol. Sci. 85, 348–362 (2009).
56. Salimullah, M., Sakai, M., Plessy, C. & Carninci, P. NanoCAGE: a high-resolution technique to discover and interrogate cell transcriptomes. Cold Spring Harb. Protoc. 2011, pdb.prot5559 (2011).
57. Lassmann, T., Hayashizaki, Y. & Daub, C.O. TagDust—a program to eliminate artifacts from next generation sequencing data. Bioinformatics 25, 2839–2840 (2009).
58. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
59. Frith, M.C. et al. A code for transcription initiation in mammalian genomes. Genome Res. 18, 1–12 (2008).
60. Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012).
61. Pruitt, K.D., Tatusova, T. & Maglott, D.R. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 33, D501–D504 (2005).
62. Hsu, F. et al. The UCSC Known Genes. Bioinformatics 22, 1036–1046 (2006).
63. Cabili, M.N. et al. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 25, 1915–1927 (2011).
64. Yamasaki, C. et al. H-InvDB in 2009: extended database and data mining resources for human genes and transcripts. Nucleic Acids Res. 38, D626–D632 (2010).
65. Flicek, P. et al. Ensembl 2013. Nucleic Acids Res. 41, D48–D55 (2013).
66. Heintzman, N.D. et al. Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature 459, 108–112 (2009).
67. Trapnell, C., Pachter, L. & Salzberg, S.L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).
68. Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 7, 562–578 (2012).
69. Zhang, B., Kirov, S. & Snoddy, J. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res. 33, W741–W748 (2005).
70. Maglott, D., Ostell, J., Pruitt, K.D. & Tatusova, T. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. 33, D54–D58 (2005).
71. Hasegawa, Y. et al. CC chemokine ligand 2 and leukemia inhibitory factor cooperatively promote pluripotency in mouse induced pluripotent cells. Stem Cells 29, 1196–1205 (2011).
72. Livak, K.J. & Schmittgen, T.D. Analysis of relative gene expression data using real-time quantitative PCR and the 2^(−ΔΔCT) method. Methods 25, 402–408 (2001).

The authors thank the RIKEN GeNAS sequencing platform for sequencing of the libraries. This work was supported by a grant to P.C. from the Japan Society for the Promotion of Science (JSPS) through the Funding Program for Next-Generation World-Leading Researchers (NEXT) initiated by the Council for Science and Technology Policy (CSTP), by a grant-in-aid for scientific research from JSPS to P.C. and A.F., and by a research grant from the Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT) to the RIKEN Center for Life Science Technologies. FANTOM5 was made possible by a research grant for the RIKEN Omics Science Center from MEXT Japan to Y. Hayashizaki and by a grant for Innovative Cell Biology by Innovative Technology (Cell Innovation Program) from MEXT to Y. Hayashizaki. A.F. was supported by a JSPS long-term fellowship (P10782) and by a Swiss National Science Foundation Fellowship for Advanced Researchers (PA00P3_142122). K.H. was supported by European Union Framework Programme 7 (MODHEP project) for P.C. A.B. was supported by the Sigrid Juselius Foundation Fellowship. D.Y. and H.K. were supported by the Japan Science and Technology Agency CREST. R.A. and A. Sandelin were supported by funds from FP7/2007-2013/ERC grant agreement 204135, the Novo Nordisk Foundation, the Lundbeck Foundation and the Danish Cancer Society.

Author information

P.C. led the project and oversaw the analyses. P.C., Y.H. and A.R.R.F. contributed to the design of the study. A.F., D.Y., M.S., C.A.K., A. Saxena, A.B., H.S., H.K., Y.N. and Y.H. contributed to data generation. A.F., K.H., I.V., N.B., M.d.H., A.R.R.F., A.K., R.A. and A. Sandelin contributed to data processing and analyses. C.-H.W. and C.-L.W. produced and analyzed ChIA-PET data. A.F., A.B., A.R.R.F. and P.C. wrote the manuscript with input from all authors.

Correspondence to Piero Carninci.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

A full list of members and affiliations appears in the Supplementary Note.

Integrated supplementary information

Supplementary Figure 1 Dedifferentiation of human and mouse iPSCs.

(a,b) Normalized CAGE expression levels (tpm, tags per million) for stem cell marker genes in hiPS.F (a) and miPS.T (b). Expression levels for ESCs (blue, n = 3) and somatic cell types (purple) used for iPSC derivation are shown. (c,d) Immunofluorescence analyses for the expression of Ssea1, Oct4 and Nanog in miPS.T (c) and TRA-1-60, TRA-1-81 and SSEA4 in hiPS.F (d). ESCs are used as controls. (e) Hematoxylin and eosin–stained histological sections of teratomas formed 4 weeks after subcutaneous injection of hiPS.F cells into nude mice. Tissues representative of the three germ layers (mesoderm, ectoderm and endoderm) developed from hiPS.F cells. (f) Chimeric mouse derived from miPS.T cells and germline transmission.
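
The tpm unit used in these captions can be illustrated with a minimal sketch. It assumes the simplest form of the normalization, in which each cluster's raw tag count is scaled by the library total; the function name and the toy counts are hypothetical, and the study's own pipeline may apply further normalization steps.

```python
def tags_per_million(counts):
    """Scale raw CAGE tag counts so that the library sums to one million.

    Assumption: plain library-size scaling, without any additional
    between-sample normalization.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("library contains no tags")
    return [count * 1_000_000 / total for count in counts]


# Toy library with two tag clusters (hypothetical counts).
example_tpm = tags_per_million([1, 3])
```

Expressing counts in tpm makes libraries of different sequencing depth comparable, which is why the captions report expression in this unit.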

Supplementary Figure 2 CAGE tag cluster expression across stem cell samples and sample clustering.

Number of stem cell samples in which CAGE tag clusters are found expressed in human (a) and mouse (b). A value of 0 corresponds to differentiated cell type samples. (c,d) Hierarchical clustering based on Spearman coefficients calculated from CAGE tag cluster expression values for human (c) and mouse (d).
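
As an illustration of the sample comparison in panels c and d, the sketch below computes Spearman coefficients between expression vectors and converts them to 1 − ρ distances, a common input for hierarchical clustering. The distance definition, the function names and the sample labels are assumptions made for illustration; the caption does not state which distance or linkage the authors used.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_distance_matrix(samples):
    """Map each pair of sample names to 1 - rho (assumed distance)."""
    names = list(samples)
    return {a: {b: 1.0 - spearman(samples[a], samples[b]) for b in names}
            for a in names}


# Hypothetical expression vectors over the same four tag clusters.
toy = {"ES_1": [1, 2, 3, 4], "ES_2": [10, 20, 30, 40], "Dif_1": [4, 3, 2, 1]}
distances = spearman_distance_matrix(toy)
```

Because Spearman coefficients depend only on ranks, samples with the same expression ordering are at distance 0 regardless of sequencing depth.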

Supplementary Figure 3 Comparison of nuclear and cytoplasmic assembled transcripts.

Numbers and cellular distribution of transcripts identified from RNA-seq assemblies for human (a) and mouse (b) data sets.

Supplementary Figure 4 Differential expression analyses.

(a–d) M-A plots of differentially expressed CAGE clusters (edgeR (ref. 27), FDR < 0.01 indicated in red) for the mouse (a,b) and human (c,d) nuclear and cytoplasmic data sets. (e) Proportion of CAGE clusters significantly upregulated in stem cells (Up-Stem) at FDR < 0.01.
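
For readers unfamiliar with M-A plots, the following sketch computes the two coordinates in their conventional form: M is the log2 ratio of expression between two conditions and A is the mean of the two log2 expression values. The function name and the pseudocount are assumptions for illustration; the differential expression statistics in the figure were computed with edgeR, which this sketch does not reproduce.

```python
from math import log2


def ma_values(stem_tpm, diff_tpm, pseudocount=0.5):
    """Return (M, A) for one CAGE cluster.

    M: log2 fold change, stem cells over differentiated cells.
    A: average log2 expression across the two conditions.
    Assumption: a pseudocount is added so that zero counts stay finite.
    """
    log_stem = log2(stem_tpm + pseudocount)
    log_diff = log2(diff_tpm + pseudocount)
    return log_stem - log_diff, (log_stem + log_diff) / 2


# Cluster expressed at 8 tpm in stem cells and 2 tpm in differentiated cells.
m_value, a_value = ma_values(8, 2, pseudocount=0)
```

Clusters upregulated in stem cells have positive M under this sign convention, which is an assumption of the sketch.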

Supplementary Figure 5 NAST expression features.

(a–d) Number of stem cell samples that expressed NASTs for human (a,b) and mouse (c,d) data sets. Panels show NASTs identified in the nuclear (Nu), cytoplasmic (Cy) compartments or both (Nu/Cy). (e) Percentage of CAGE clusters overlapped by CAGE-scan 5' tags. (f) Number of tissues and differentiated cell type samples from the FANTOM5 expression atlas (ref. 29) in which annotated CAGE clusters overexpressed in stem cells are expressed. Bin width = 1. (g) Nuclear (x axis) and cytoplasmic (y axis) normalized expression (tpm, tags per million) for the human CAGE clusters overexpressed in stem cells. Similar plots are shown in h,i for a set of mouse (h) and human (i) nuclear (red) and cytoplasmic (blue) transcripts. (j,k) GRO-Seq (refs. 30,31) signal enrichment at human (j) and mouse (k) NAST positions.

Supplementary Figure 6 Histone marks at NAST genomic loci.

NASTs were classified as enhancers, promoters or others based on specific combinations of histone marks (Online Methods), using ChIP-seq signal from the ENCODE Project (ref. 5) for the mouse ES-Bruce4 and ES-E14 cell lines as well as for the human H1-ES cell line. Normalized tag frequencies for all histone marks are plotted for each category and cell line.

Supplementary Figure 7 Expression levels and putative processing of NASTs.

(a,b) CAGE-based normalized expression (tpm, tags per million) for human (a) and mouse (b) NASTs (red) and annotated CAGE tag clusters (blue), identified in the nucleus (Nu) or cytoplasm (Cy) or in both cellular compartments (Nu/Cy). (c) Ct values for five NASTs and Gapdh are shown together with spiked firefly RNAs, used as a reference to evaluate copy number per cell. n = 3; error bars, s.d. (d) Transcript length as defined by RNA-seq assembly for NASTs and annotated genes compared for three expression groups (≥5, 1–5 and 0.1–1 tpm). n, number of clusters or transcripts per group. P values for two-sided Wilcoxon and Mann-Whitney tests are shown. (e) Fraction of NASTs and annotated CAGE clusters, as defined by CAGE-scan, overlapping short RNA-seq (15- to 40-bp fraction) clusters, grouped by expression levels.

Supplementary Figure 8 Histone marks and transcription factor binding at repeat-associated NAST loci.

(a,b) Normalized expression (tpm, tags per million) is plotted for mouse (a) and human (b) annotated genes and NASTs carrying promoter-associated histone marks. P values for Wilcoxon and Mann-Whitney tests are shown. n, number of CAGE clusters per group. (c) Frequency plots of normalized ChIP-seq tag counts (ENCODE data, ref. 5) for H3K4me3 (promoter) and H3K9me3 (repressive) marks at NAST-associated and non-expressed (N.Exp.) MaLR elements. (d–f) ChIP-seq normalized tag counts for stem cell–specific transcription factors at NAST-associated and non-expressed (N.Exp.) mouse ERVK (d), mouse MaLR (e) and human ERV1 (f) elements. Values for non-expressed elements are shown in gray (dotted lines).

Supplementary Figure 9 Human LTR-derived transcripts.

(a) Repeat family normalized expression values (tpm, tags per million) are plotted for human ESCs, iPSCs and differentiated cells (Dif.). Error bars, s.d. (b–d) Mouse ERVK (b) and MaLR (c) as well as human ERV1 (d) normalized nuclear expression is plotted for ESCs, iPSCs and differentiated cells (Dif.). N, number of CAGE tag clusters carrying promoter-associated histone marks. (e) Normalized expression for selected human subfamily repeats is plotted against the associated FDR (calculated with edgeR, ref. 27). (f) The number of repeat elements with at least five CAGE tags is plotted against copy number found in the genome for human LTRs.

Supplementary Figure 10 Stem cell–specific enhancers associated with LTRs.

(a,b) Relative CAGE tag distributions along mouse ERVK-RLTR9E (a) and human ERV1-LTR7 and HERVH-int (b) elements. Gray bars mark the 5' and 3' extremities of each repeat element. Green and purple bars indicate CAGE tags mapping to the plus and minus strands, respectively. (c) Density plot for directionality scores at loci showing divergent transcription overlapping intergenic LTRs (red) and from annotated TSSs (blue). (d–f) Density plots of normalized tag counts for human DNase I footprints (ref. 40) (d) and ChIP-seq (refs. 5,41) (e,f) at loci presenting divergent transcription patterns and overlapping LTRs. (f) Promoters, NASTs associated with LTRs and classified as promoters in Figure 2b; enhancers, loci presenting divergent transcription patterns and overlapping LTRs. (g) Number of tissue and differentiated cell type samples from the FANTOM5 expression atlas (ref. 29) in which LTR enhancer-associated CAGE tag clusters are expressed. Enlarged plots are shown for the first five bins. Bin width = 1 sample. (h) Frequency distribution of the distances between interacting loci identified by ChIA-PET.
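
The directionality score in panel c is not defined in this caption. The sketch below assumes the definition commonly used for CAGE-based enhancer detection, D = (F − R) / (F + R), where F and R are the tag counts on the forward and reverse strands flanking a locus. Both the formula and the function name are assumptions made for illustration and may differ from the scoring used in the study.

```python
def directionality(forward_tags, reverse_tags):
    """Return a strand-balance score between -1 and 1.

    Assumed definition: (F - R) / (F + R). Values near 0 indicate balanced
    bidirectional transcription; values near +1 or -1 indicate transcription
    from predominantly one strand.
    """
    total = forward_tags + reverse_tags
    if total == 0:
        raise ValueError("no tags at this locus")
    return (forward_tags - reverse_tags) / total


# Balanced divergent transcription versus a strongly one-sided locus.
balanced = directionality(10, 10)
one_sided = directionality(9, 1)
```

Under this assumed definition, balanced divergent loci cluster around 0 while unidirectional loci approach the extremes.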

Supplementary Figure 11 Multiple negative controls for knockdown experiments in Nanog-GFP iPS_MEF-Ng-20D17 cells.

The normalized Nanog-GFP–positive population, adjusted to the mock control (black bar), quantified by flow cytometry analysis 48 h after siRNA transfections is shown for 12 negative control siRNAs: 2 scrambled sequences, 1 siRNA targeting the luciferase transcript and 7 siRNAs targeting LTR, LINE and SINE elements not expressed in our data set, as well as 2 siRNAs targeting mRNAs originating from genes (Sdr16c6, Wfdc6a) with promoters overlapping LTR elements. Positive control siRNAs (green bars) targeting Nanog and Sox2 are shown for comparison. n = 3; error bars, s.d.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–11 and Supplementary Tables 1–5 (PDF 7729 kb)

Supplementary Data Set 1

Human stem cell transcriptome profiling data sets. (ZIP 28724 kb)

Supplementary Data Set 2

Mouse stem cell transcriptome profiling data sets. (ZIP 22435 kb)

Supplementary Note

Supplementary Note (XLS 54 kb)

About this article

Cite this article

Fort, A., Hashimoto, K., Yamada, D. et al. Deep transcriptome profiling of mammalian stem cells supports a regulatory role for retrotransposons in pluripotency maintenance. Nat Genet 46, 558–566 (2014) doi:10.1038/ng.2965
