Long intergenic noncoding RNAs (lincRNAs) are derived from thousands of loci in mammalian genomes and are frequently enriched in transposable elements (TEs). Although families of TE-derived lincRNAs have recently been implicated in the regulation of pluripotency, little is known of the specific functions of individual family members. Here we characterize three new individual TE-derived human lincRNAs, human pluripotency-associated transcripts 2, 3 and 5 (HPAT2, HPAT3 and HPAT5). Loss-of-function experiments indicate that HPAT2, HPAT3 and HPAT5 function in preimplantation embryo development to modulate the acquisition of pluripotency and the formation of the inner cell mass. CRISPR-mediated disruption of the genes for these lincRNAs in pluripotent stem cells, followed by whole-transcriptome analysis, identifies HPAT5 as a key component of the pluripotency network. Protein binding and reporter-based assays further demonstrate that HPAT5 interacts with the let-7 microRNA family. Our results indicate that unique individual members of large primate-specific lincRNA families modulate gene expression during development and differentiation to reinforce cell fate.
- Supplementary Figure 1: All 23 HPATs are significantly enriched for TEs.
(a) The different classes of TEs are color-coded; corresponding colors are used in b–e. (b,c) Coverage of different TE classes on the genome (exons + introns) and transcript (only exons) levels. The percentage of total length for each TE is represented for all 23 HPATs. Three control genes are included. (d) The 23 HPATs with embedded TEs and genomic length. Displayed are the most highly expressed isoforms for each HPAT gene. Genomic DNA is represented as a black line, and exons are represented by gray boxes. TEs are represented by the colored boxes underneath. Each exon is exonized with TEs (exons overlap TEs). The length of each genomic locus is not to scale.
- Supplementary Figure 2: Molecular and functional analysis of HPAT2, HPAT3 and HPAT5 during preimplantation development.
(a) Magnified view of the ICM of a human blastocyst showing a specific staining pattern. Stars depict HPAT3 signal. Arrows depict HPAT5 signal. (b) HPAT2, HPAT3 and HPAT5 are significantly downregulated in human blastocysts injected with siRNAs compared to siScrambled controls (n = 3 blastocysts; data are shown with s.e.m.). (c) Blastomeres with knockdown of HPAT2, HPAT3 and HPAT5 during human embryo development did not contribute to the ICM. The presence of the ICM was validated with OCT4 and SOX2 staining. The ICM is highlighted by a yellow dashed circle. (d) Fluorescence-positive ICM in blastocysts with HPAT knockdown and control blastocysts.
- Supplementary Figure 3: Primer validation and quality control of single-cell gene expression data.
(a) Histological sections stained with hematoxylin and eosin from teratomas derived from established iPSCs (iPSCs derived from BJ fibroblasts are termed fully established iPSCs and were used as the last time point for collection; see b). (b) Tracking of morphological changes of BJ fibroblasts during mRNA reprogramming with the Yamanaka factors. Depicted are the days at which cells were collected. Fibroblasts transfected with GFP only for five consecutive days are shown as well with GFP signal (images are representative). (c) Representative example of a dilution series for all 96 assays. Ct values were plotted as a function of the dilution factors (1:2) on a log scale. Linear regression analysis is depicted by the red line. Eight assays with R² < 0.97 were excluded, thus leaving 88 assays. (d) Distribution histogram of calculated primer efficiencies for 88 DELTAgene assays estimated from the slopes of standard curve plots. The average efficiency is 1.02 with s.d. = 0.06. (e) Quantile-quantile plot with experimentally estimated efficiencies (y axis) and the values expected for a normal distribution with mean efficiency = 1.02 and s.d. = 0.06 (x axis). The black line indicates the values expected for a normal distribution (y = x). Efficiency values that were derived from plots with three points in the standard curve are depicted in blue. Values derived from plots with >3 points in the standard curve are depicted in red. (f) Microscopic view of two capture sites on the C1 Single-Cell Auto Prep System microfluidic chip. The left capture site has no cell, and the right capture site has one captured cell (red arrow). (g) Representative example of primer specificity evaluation using melting curve analysis (here with HPAT2). The graph shows the relative change in fluorescence signal (EvaGreen) over the temperature range for all 96 cells on a single array.
The area in red depicts false positive signals with incorrect melting curve temperatures (determined with bulk RNA and based on the melting curve temperature provided by Fluidigm). The area in green depicts the correct melting curves. Data outside the correct melting curve were set to 0. (h) Correlation analysis of mean Ct values generated from the 96 cells of each of three dynamic IFC arrays (single cells of (i) BJ fibroblasts, (ii) BJ fibroblasts transfected with mRNA encoding GFP for 2 d, (iii) BJ fibroblasts transfected with mRNA encoding GFP for 5 d). Genes that were detected in at least 20% of the 96 cells for each dynamic IFC array are considered. Shown are all three comparisons. Outliers are shown in green (GFP) and red. The assays in red (total of six) were excluded from subsequent analysis due to a non-correlative pattern among the arrays, leaving the 82 assays that are listed in Supplementary Table 2. (i) Schematic overview of the quality assessment before normalization of single-cell gene expression. Nine dynamic IFC arrays (96.96 Fluidigm chips) were used for gene expression analysis. Two GFP control chips along with one fibroblast chip were used for correlation analysis (h) followed by initial quality assessment. Processed chips were used for a second round of quality assessment, resulting in 578 normalized single cells.
- Supplementary Figure 4: Single-cell gene expression analysis during nuclear reprogramming and reactivation of HPAT expression during in vitro transdifferentiation from fibroblasts into neurons.
(a–c) Heat map plot of single-cell gene expression of different markers during nuclear reprogramming. Single cells are in rows. Genes are in columns. Fibroblast markers decrease over time, and pluripotency-specific markers, including selected HPATs, increase over time as cells progress toward iPSCs (n = 87, 85, 72, 70, 86, 83 and 95 for fibroblasts, day 2, day 5, day 7, day 10, day 12 and iPSCs, respectively). Normalization was performed accordingly (the Supplementary Note provides details). White color indicates no expression. (d) PCA of 578 single cells collected at different time points during nuclear reprogramming. (e) Heat map and unsupervised clustering for 578 single-cell gene expression values resulted in clustering of novel genes, implicating a similar biological context during reprogramming. Samples are color-coded according to the specific gene groups (horizontal) and the day at which single cells were collected (vertical). (f) The pluripotency marker POU5F1 (red) and HPAT2 (red; representative of all HPATs in this study) were exclusively expressed in H9 cells (hESCs) but not in (i) cDNA from colon, liver and lung (endoderm) or (ii) during neuronal transdifferentiation from fibroblasts (gray; samples collected at day 5 and day 30 are labeled iN-D5 and iN-D30, respectively) or cDNA from brain (all ectoderm). EN2 and PAX6, included as ectoderm control markers, were detected during neuron differentiation and in brain samples (n = 3; data shown with s.e.m.). (g) Heat map of bicluster analysis illustrating a different bicluster within each plot (Supplementary Table 3). Three different algorithms for bicluster calculation were applied, resulting in the identification of five clusters, four clusters and 16 clusters.
- Supplementary Figure 5: Overexpression and silencing constructs for HPAT2, HPAT3 and HPAT5, NANOG ChIP-seq and regulation of HPAT5 during hESC differentiation.
(a) Validation of siRNAs targeting HPAT2, HPAT3 and HPAT5, respectively, in hESCs. Gene expression and P values were measured relative to siGlo control 48 h after transfection (n = 9). Orange color depicts expected gene downregulation. (b) Validation of the overexpression vectors. BJ fibroblasts were transfected with HPAT2, HPAT3 and HPAT5. Gene expression and P values were measured 48 h after transfection relative to those in GFP-transfected fibroblasts (n = 9). Blue color depicts expected gene upregulation. (c) ChIP-qPCR analysis in H9 cells (hESCs) using NANOG. Signals were quantified using primer sets specific to a subset of HPATs or two ‘negative’ intergenic, non-repetitive regions. Two enhancers around SOX2 are included as positive controls (n = 3; data are shown with s.e.m.). (d) Three snapshots of the UCSC browser (genome location indicated) aligned with the NANOG-binding region for HPAT2, HPAT3 and HPAT5 from ChIP-seq analysis. (e,f) Overexpression constructs and validation of the HPAT5-OE and mCherry-OE lines. HPAT5 was significantly upregulated in hESC-OE cells compared to control cells. mCherry protein expression was also confirmed. n = 3; data are shown with s.e.m. (g) The increase in differentiation markers representing all three germ layers was significantly repressed in HPAT5-OE cells. P values are calculated for comparison of the mCherry and HPAT5-OE lines on the same days.
- Supplementary Figure 6: Protein microarray with HPAT2, HPAT3 and HPAT5.
(a) Formaldehyde agarose RNA gel of the Cy5-labeled lincRNAs before hybridization to the protein array. (b) Representative image of a ProtoArray and fluorescence intensity for HPAT2 and HPAT3 (positive) and HPAT5 (negative) on OCT4 protein in duplicate. (c) Heat map of HPAT2-, HPAT3- and HPAT5-binding proteins with RISC proteins and OCT4 highlighted (z score > 2.5). (d) Total number of candidate proteins identified with the three HPATs (with and without common RNA-binding proteins). (e) Validation of the findings by Lu et al. that HERV-H–derived lincRNAs (HPAT2 and HPAT3) bind to specific OCT4, coactivators and mediators.
- Supplementary Figure 7: Loss-of-function analyses in hESCs.
(a) Predicted let-7 binding sites in the HPAT5 transcript. Shown is HPAT5 with embedded TEs along the genomic length (black line). Exons are shown as gray boxes. TEs are shown as colored boxes underneath. let-7 binding sites are within a SINE element (Alu). Bases in red are point mutations and confer HPAT5 specificity. (b) Gene expression analysis of endogenous pre-let-7 and mature let-7 in fibroblasts. n = 3; data are shown with s.e.m. (c) Schematic overview of the HPAT5 locus in genomic DNA from subcloned hESCs that were treated with CRISPR pairs 2 and 5 (gRNA2/5). Forward and reverse primers (in red) were designed to amplify a region of genomic DNA that is inside the deleted HPAT5 locus. (d) Agarose gel illustrating successful derivation of the HPAT5-knockout hESC line. Genomic DNA from hESCs (passage 4 after subcloning) did not result in specific amplification. The controls included a negative control (treatment with only one CRISPR arm, gRNA2), wild-type hESCs and a no-template control (NT). (e) Gene expression analysis of endogenous HPAT5 in hESCs. n = 3; data are shown with s.e.m. (f) Endogenous let-7 levels do not reach the levels in differentiated cells during 48 h of hESC differentiation. Endogenous let-7 levels are significantly increased 48 h after differentiation with bFGF removal (tenfold). The levels of endogenous let-7 are still significantly higher in human fibroblasts (100-fold) compared to differentiated hESCs. HPAT5 knockout increases endogenous let-7 levels to ones similar to those found in hESCs differentiated for 24 h. Overexpression of let-7 in hESCs results in a ~50-fold increase compared to human fibroblasts. let-7 levels were normalized to Hs-RNU6-2. n = 3; data are shown with s.e.m. (g) Differentiation of hESCs into secondary fibroblasts followed by episomal reprogramming into iPSCs. (h) Percentage of AP- and TRA-1-81–positive cells in HPAT5-WT and HPAT5-KO cells 25 d after reprogramming. n = 3; data are shown with s.e.m.
(i) Endogenous let-7 and HPAT5 levels during nuclear reprogramming at day 10. n = 3; data are shown with s.e.m.
- Supplementary Figure 8: HPAT5 regulates let-7 in hESCs during differentiation.
(a) Heat map of differentially expressed genes (P < 0.05) after let-7 overexpression in four different samples. (b–d) Enrichment of let-7 seed sites in transcripts that were downregulated in hESC-HPAT5-KO cells. Overexpression from HPAT5-WT transcript rescued let-7–mediated differentiation. The word cluster plot shows sequences in genes ranked by differential expression after let-7 transfection. Each dot represents a word, summarizing z scores and enrichment specificity indices of the enrichment profiles of negatively correlated 6- and 7-mer words. Triangles annotate known seed sites of human miRNAs. (i) A zoomed-in view (top) from the cluster plot. (e) Endogenous HPAT2, HPAT3 and HPAT5 expression in hESCs with let-7 overexpressed. n = 3; data are shown with s.e.m. (f) Immunoblot confirming specific AGO2 pulldown. OE, overexpression. n = 3 samples; data are shown with s.e.m.
- Supplementary Text and Figures
Supplementary Figures 1–8, Supplementary Tables 1–4 and Supplementary Note.
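
The primer validation summarized in the legend of Supplementary Figure 3 (c,d) estimates each assay's efficiency from the slope of a standard curve and excludes assays with R² < 0.97. A minimal sketch of that kind of calculation follows. It assumes the common convention E = 10^(−1/slope) − 1 with Ct regressed on log10 of relative concentration, so that E = 1.0 corresponds to perfect doubling per cycle; the legend does not state the exact formula used. The function names and the dilution data are hypothetical and for illustration only.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

def primer_efficiency(relative_conc, ct_values):
    """Estimate efficiency from a standard curve (assumed convention, see lead-in)."""
    log_conc = [math.log10(c) for c in relative_conc]
    slope, _, r_squared = fit_line(log_conc, ct_values)
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return efficiency, r_squared

# Hypothetical twofold (1:2) dilution series in which Ct rises by exactly
# one cycle per dilution step, i.e. an idealized assay.
relative_conc = [1.0, 0.5, 0.25, 0.125, 0.0625]
ct_values = [20.0, 21.0, 22.0, 23.0, 24.0]

efficiency, r_squared = primer_efficiency(relative_conc, ct_values)
keep_assay = r_squared >= 0.97  # exclusion threshold quoted in the legend
print(f"efficiency={efficiency:.3f}, R^2={r_squared:.3f}, keep={keep_assay}")
```

Under this convention, an assay whose efficiency deviates strongly from 1.0 or whose standard curve fits poorly would be flagged for exclusion before single-cell analysis.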