The primate-specific noncoding RNA HPAT5 regulates pluripotency during human preimplantation development and nuclear reprogramming

Journal name: Nature Genetics
Volume: 48
Pages: 44–52
DOI: doi:10.1038/ng.3449

Abstract

Long intergenic noncoding RNAs (lincRNAs) are derived from thousands of loci in mammalian genomes and are frequently enriched in transposable elements (TEs). Although families of TE-derived lincRNAs have recently been implicated in the regulation of pluripotency, little is known of the specific functions of individual family members. Here we characterize three new individual TE-derived human lincRNAs, human pluripotency-associated transcripts 2, 3 and 5 (HPAT2, HPAT3 and HPAT5). Loss-of-function experiments indicate that HPAT2, HPAT3 and HPAT5 function in preimplantation embryo development to modulate the acquisition of pluripotency and the formation of the inner cell mass. CRISPR-mediated disruption of the genes for these lincRNAs in pluripotent stem cells, followed by whole-transcriptome analysis, identifies HPAT5 as a key component of the pluripotency network. Protein binding and reporter-based assays further demonstrate that HPAT5 interacts with the let-7 microRNA family. Our results indicate that unique individual members of large primate-specific lincRNA families modulate gene expression during development and differentiation to reinforce cell fate.

At a glance

Figures

  1. Figure 1: Molecular single-cell gene expression and functional analyses of HPAT2, HPAT3 and HPAT5 during human embryo development.

    (a) Overview of single-cell gene expression analysis for human blastocysts. A total of 24 blastocysts were pooled and run on four C1 chips. (b–d) Single-cell gene expression analyses. RT, reverse transcription. (b) Three HPATs had significantly higher expression in ICM than in trophectoderm (TROPH). (c,d) ICM (c) and trophectoderm (d) markers are also shown. Box plots are shown for each group. The whiskers are the minimum and maximum data points. The bottom and top of each box are the first and third quartiles, respectively. n = 46 single trophectoderm cells and n = 67 single ICM cells. (e) Immunohistochemistry and RNA FISH for OCT4 (green) and lincRNAs (red), respectively, in human blastocysts. Sections are counterstained with DAPI (blue). ICMs are circled by dotted white lines. Entire human or mouse blastocysts are circled by dotted yellow lines in the merged images. lincRNA signal was specific to the ICM. Speckles (red) are nonspecific and are observed in all human blastocysts. Mouse blastocysts initiated hatching when fixed. Images are representative (n = 9 human blastocysts for HPAT3, n = 11 human blastocysts for HPAT5 and n = 3 mouse blastocysts; n = 2 independent experiments). Scale bar, 100 μm. (f) Blastomeres with reduced expression of HPAT2, HPAT3 and HPAT5 during human embryo development did not contribute to ICM. The presence of ICM was validated with OCT4 and SOX2 staining. ICMs are circled by yellow dashed lines (n = 3 blastocysts). RRX, Rhodamine Red-X; siHPAT2/3/5, combination of siRNAs targeting HPAT2, HPAT3 and HPAT5. Scale bar, 100 μm.

  2. Figure 2: Single-cell expression analysis of HPAT transcripts during nuclear reprogramming.

    (a) Dynamics in single-cell expression of HPAT transcripts during nuclear reprogramming shown as box plots. HPATs are grouped according to activation pattern (with two examples representing each group). (b) Bicluster analysis identifies five biclusters, P1–P5, and correlates HPAT2, HPAT3 and HPAT5 (gene names in orange) with core pluripotency markers (method by Lazzeroni and Owen, ref. 52). (c) Correlation analysis across single cells identifies positively and negatively correlated gene pairs (see Supplementary Table 3 for additional details). (d) Bayesian network analysis predicts a central role for HPAT2, HPAT3 and HPAT5 (orange circles) within the core regulatory network (yellow circles). The hierarchical view predicts that SALL4 (red circle) triggers a cascade of key pluripotency gene activation. Arrow thickness and circle size increase with the confidence level of interactions in the calculated network. Only pluripotency genes and lincRNAs were included. Data in a–d represent n = 578 single cells.

  3. Figure 3: Functional analyses of HPAT2, HPAT3 and HPAT5 during nuclear reprogramming.

    (a) Experimental scheme of functional analysis of HPAT2, HPAT3 and HPAT5 (HPAT2/3/5) during iPSC reprogramming. AP, alkaline phosphatase. (b) Immunostaining of TRA-1-60 during reprogramming with HPAT2, HPAT3 and HPAT5 in combination. Representative images are shown (n = 8). KD, knockdown. Scale bar, 100 μm. (c) Calculated percentage of TRA-1-60–positive cells at different points during reprogramming (n = 8). Data are represented as means + s.e.m. (d) Representative images of colony size appearance at day 12 during reprogramming (n = 8). Scale bar, 100 μm. (e) Alkaline phosphatase staining at day 12. Shown are the wells of a six-well plate from one experiment (n = 2 independent experiments). (f) iPSC colony number counted on the basis of alkaline phosphatase staining (n = 3). Data are represented as means + s.e.m. (g) Cell number relative to control cells during reprogramming (n = 3). Data are represented as means + s.e.m. NS, not significant. (h) Alkaline phosphatase staining at day 12 during reprogramming (n = 2 for each condition). (i) Percentage of TRA-1-60–positive cells at day 12 of single knockdown of HPAT2, HPAT3 or HPAT5. siGlo was used as a control. Data are represented as means + s.e.m. (j) Reprogramming with POU5F1 and HPAT2, HPAT3 and HPAT5. POU5F1 only is used as a control. (k) Epigenetic and gene expression analysis of HPAT2, HPAT3 and HPAT5. NANOG mRNA was transfected into BJ fibroblasts treated or not with 5-Aza-2′-deoxycytidine (5-Aza) with gene expression measurement at 48 h (n = 6). Data are represented as means + s.e.m.

  4. Figure 4: HPAT5 binds directly to let-7.

    (a–c) HPAT5-OE hESCs suppress differentiation mediated by siRNA to POU5F1 and bFGF removal. (a) Outline of the protocol. (b,c) The expression of key pluripotency markers decreases with a delay in comparison to the mCherry-OE control line. P values are calculated for the comparison of the mCherry-OE and HPAT5-OE lines on the same days (c), with this evidence supported by morphological changes (after day 3) (b) (n = 2 independent experiments). Scale bar, 50 μm. (d) Two predicted let-7 binding positions in HPAT5 identified by RegRNA 2.0 (miRanda). (e,f) Target validation using luciferase reporters in HEK293 cells. The relative luciferase activity (shown as fold change relative to empty vector) was assayed 48 h after cotransfection of cells with the indicated miRNAs or control (scrambled miRNA) together with wild-type reporter (e) or the let-7 miRNAs together with wild-type or mutant reporter (f). (g) The point mutations (in red) introduced into two mutant let-7 mimics and two mutant HPAT5 reporters for compensatory analysis. WT, wild type. (h) Analysis of the effects of the point mutations using luciferase reporters in HEK293 cells. Relative luciferase activity (shown as fold change relative to empty vector) of wild-type or mutant reporters was assayed 48 h after cotransfection of cells with the indicated miRNAs or scrambled miRNA. Representative results from n = 2 (c,e,f,h) independent experiments; n = 3 samples; data are shown with s.e.m.

  5. Figure 5: HPAT5 regulates let-7 in hESCs during differentiation.

    (a) Morphological changes 48 h after let-7 overexpression in HPAT5-WT and HPAT5-KO lines. (b) Table listing findings from enrichment analysis with cWords (Supplementary Table 8). NA, not applicable. (c) HPAT5 regulates let-7 activity. Expression levels of mature let-7a and let-7d in undifferentiated hESCs (H1) transiently transfected for 48 h with wild-type (WT) or mutant HPAT5 or with knockdown with siHPAT5. Empty vector or scrambled miRNA was used as a negative control. RNA levels (ln) and P values are shown relative to negative controls. n = 3 samples; data are shown with s.e.m. (d) RIP-qPCR showing interaction between AGO2 and HPAT5 but not GAPDH in hESCs transfected with let-7. n = 3 samples; data are shown with s.e.m.

  6. Supplementary Fig. 1: All 23 HPATs are significantly enriched for TEs.

    (a) The different classes of TEs are color-coded; corresponding colors are used in b–e. (b,c) Coverage of different TE classes on the genome (exons + introns) and transcript (only exons) levels. The percentage of total length for each TE is represented for all 23 HPATs. Three control genes are included. (d) The 23 HPATs with embedded TEs and genomic length. Displayed are the most highly expressed isoforms for each HPAT gene. Genomic DNA is represented as a black line, and exons are represented by gray boxes. TEs are represented by the colored boxes underneath. Each exon is exonized with TEs (exons overlap TEs). The length of each genomic locus is not to scale.

  7. Supplementary Fig. 2: Molecular and functional analysis of HPAT2, HPAT3 and HPAT5 during preimplantation development.

    (a) Magnified view of the ICM in human blastocyst demonstrates a specific staining pattern in the ICM of human blastocysts. Stars depict HPAT3 signal. Arrows depict HPAT5 signal. (b) HPAT2, HPAT3 and HPAT5 are significantly downregulated in human blastocysts injected with siRNAs compared to siScrambled controls (n = 3 blastocysts; data are shown with s.e.m.). (c) Blastomeres with knockdown of HPAT2, HPAT3 and HPAT5 during human embryo development did not contribute to ICM. The presence of ICM was validated with OCT4 and SOX2 staining. The ICM is highlighted by a yellow dashed circle. (d) Fluorescence-positive ICM in blastocysts with HPAT knockdown and control blastocysts.

  8. Supplementary Fig. 3: Primer validation and quality control of single-cell gene expression data.

    (a) Histological sections stained with hematoxylin and eosin from teratomas derived from established iPSCs (iPSCs that resulted from derivation from BJ fibroblasts are termed fully established iPSCs and were used as the last time point for collection (see b)). (b) Tracking of morphological changes of BJ fibroblasts during mRNA reprogramming with the Yamanaka factors. Depicted are the days at which cells were collected. Fibroblasts transfected with GFP only for five consecutive days are shown as well with GFP signal (images are representative). (c) Representative example of a dilution series for all 96 assays. Ct values were plotted as a function of the dilution factors (1:2) on a log scale. Linear regression analysis is depicted by the red line. Eight assays with R² < 0.97 were excluded, thus leaving 88 assays. (d) Distribution histogram of calculated primer efficiencies for 88 DELTAgene assays estimated from the slopes of standard curve plots. The average efficiency is 1.02 with s.d. = 0.06. (e) Quantile-quantile plot with experimentally estimated efficiencies (y axis) and the values expected for a normal distribution with mean efficiency = 1.02 and s.d. = 0.06 (x axis). The black line indicates the values expected for a normal distribution (y = x). Efficiency values that were derived from plots with three points in the standard curve are depicted in blue. Values derived from plots with >3 points in the standard curve are depicted in red. (f) Microscopic view of two capture sites on the C1 Single-Cell Auto Prep System microfluidic chip. The left capture site has no cell, and the right capture site has one captured cell (red arrow). (g) Representative example of primer specificity evaluation using melting curve analysis (here with HPAT2). The graph shows the relative change in fluorescence signal (EvaGreen) over the temperature range for all 96 cells on a single array. The area in red depicts false positive signals with incorrect melting curve temperatures (determined with bulk RNA and based on the melting curve temperature provided by Fluidigm). The area in green depicts the correct melting curves. Data outside the correct melting curve were set to 0. (h) Correlation analysis of mean Ct values generated from 96 cells of three dynamic IFC arrays (single cells of (i) BJ fibroblasts, (ii) BJ fibroblasts transfected with mRNA encoding GFP for 2 d, (iii) BJ fibroblasts transfected with mRNA encoding GFP for 5 d). Genes that were detected in at least 20% of the 96 cells for each dynamic IFC array are considered. Shown are all three comparisons. Outliers are shown in green (GFP) and red. The assays in red (total of six) were excluded from subsequent analysis due to a non-correlative pattern among the arrays, leaving the 82 assays that are listed in Supplementary Table 2. (i) Schematic overview of the quality assessment before normalization of single-cell gene expression. Nine dynamic IFC arrays (96.96 Fluidigm chips) were used for gene expression analysis. Two GFP control chips along with one fibroblast chip were used for correlation analysis (h) followed by initial quality assessment. Processed chips were used for a second round of quality assessment, resulting in 578 normalized single cells.

  9. Supplementary Fig. 4: Single-cell gene expression analysis during nuclear reprogramming and reactivation of HPAT expression during in vitro transdifferentiation from fibroblasts into neurons.

    (a–c) Heat map plot of single-cell gene expression of different markers during nuclear reprogramming. Single cells are in rows. Genes are in columns. Fibroblast markers decrease over time, and pluripotency-specific markers, including selected HPATs, increase over time as cells progress toward iPSCs (n = 87, 85, 72, 70, 86, 83 and 95 for fibroblasts, day 2, day 5, day 7, day 10, day 12 and iPSCs, respectively). Normalization was performed accordingly (the Supplementary Note provides details). White color indicates no expression. (d) PCA of 578 single cells collected at different time points during nuclear reprogramming. (e) Heat map and unsupervised clustering for 578 single-cell gene expression values resulted in clustering of novel genes implicating a similar biological context during reprogramming. Samples are color-coded according to the specific gene groups (horizontal) and the day at which single cells were collected (vertical). (f) The pluripotency marker POU5F1 (red) and HPAT2 (red; representative of all HPATs in this study) were expressed exclusively in H9 cells (hESCs) and not in (i) cDNA from colon, liver and lung (endoderm) or (ii) samples from neuronal transdifferentiation of fibroblasts (gray; samples collected at day 5 and day 30 are labeled iN-D5 and iN-D30, respectively) or cDNA from brain (all ectoderm). EN2 and PAX6, included as ectoderm control markers, were detected during neuron differentiation and in brain samples (n = 3; data shown with s.e.m.). (g) Heat map of bicluster analysis illustrating a different bicluster within each plot (Supplementary Table 3). Three different algorithms for bicluster calculation were applied, resulting in the identification of five clusters, four clusters and 16 clusters.

  10. Supplementary Fig. 5: Overexpression and silencing constructs for HPAT2, HPAT3 and HPAT5, NANOG ChIP-seq and regulation of HPAT5 during hESC differentiation.

    (a) Validation of siRNAs targeting HPAT2, HPAT3 and HPAT5, respectively, in hESCs. Gene expression and P values were measured relative to siGlo control 48 h after transfection (n = 9). Orange color depicts expected gene downregulation. (b) Validation of the overexpression vectors. BJ fibroblasts were transfected with HPAT2, HPAT3 and HPAT5. Gene expression and P values were measured 48 h after transfection relative to those in GFP-transfected fibroblasts (n = 9). Blue color depicts expected gene upregulation. (c) ChIP-qPCR analysis in H9 cells (hESCs) using NANOG. Signals were quantified using primer sets specific to a subset of HPATs or two ‘negative’ intergenic, non-repetitive regions. Two enhancers around SOX2 are included as positive controls (n = 3; data are shown with s.e.m.). (d) Three snapshots of the UCSC browser (genome location indicated) aligned with the NANOG-binding region for HPAT2, HPAT3 and HPAT5 from ChIP-seq analysis. (e,f) Overexpression constructs and validation of the HPAT5-OE and mCherry-OE lines. HPAT5 was significantly upregulated in hESC-OE cells compared to control cells. mCherry protein expression was also confirmed. n = 3; data are shown with s.e.m. (g) The increase in differentiation markers representing all three germ layers is significantly repressed in HPAT5-OE cells. P values are calculated for comparison of the mCherry and HPAT5-OE lines on the same days.

  11. Supplementary Fig. 6: Protein microarray with HPAT2, HPAT3 and HPAT5.

    (a) Formaldehyde agarose RNA gel of the Cy5-labeled lincRNAs before hybridization to the protein array. (b) Representative image of a ProtoArray and fluorescence intensity for HPAT2 and HPAT3 (positive) and HPAT5 (negative) on OCT4 protein in duplicate. (c) Heat map of HPAT2-, HPAT3- and HPAT5-binding proteins with RISC proteins and OCT4 highlighted (z score > 2.5). (d) Total number of candidate proteins identified with the three HPATs (with and without common RNA-binding proteins). (e) Validation of the findings by Lu et al. that HERV-H–derived lincRNAs (HPAT2 and HPAT3) bind to specific OCT4, coactivators and mediators.

  12. Supplementary Fig. 7: Loss-of-function analyses in hESCs.

    (a) Predicted let-7 binding sites in HPAT5 transcript. Shown is HPAT5 with embedded TEs along the genomic length (black line). Exons are shown as gray boxes. TEs are shown as colored boxes underneath. let-7 binding sites are within a SINE element (Alu). Bases in red are point mutations and confer HPAT5 specificity. (b) Gene expression analysis of endogenous pre-let-7 and mature let-7 in fibroblasts. n = 3; data are shown with s.e.m. (c) Schematic overview of the HPAT5 locus in genomic DNA from subcloned hESCs that were treated with CRISPR pairs 2 and 5 (gRNA2/5). Forward and reverse primers (in red) were designed to amplify a region of genomic DNA that is inside the deleted HPAT5 locus. (d) Agarose gel illustrating successful derivation of the HPAT5-knockout hESC line. Genomic DNA from hESCs (passage 4 after subcloning) did not result in specific amplification. The controls included negative control (treatment only with one CRISPR arm, gRNA2), wild-type hESCs and no-template control (NT). (e) Gene expression analysis of endogenous HPAT5 in hESCs. n = 3; data are shown with s.e.m. (f) Endogenous let-7 levels do not reach the levels in differentiated cells during 48 h of hESC differentiation. Endogenous let-7 levels are significantly increased 48 h after differentiation with bFGF removal (tenfold). The levels of endogenous let-7 are still significantly higher in human fibroblasts (100-fold) compared to differentiated hESCs. HPAT5 knockout increases endogenous let-7 levels to ones similar to those found in hESCs differentiated for 24 h. Overexpression of let-7 in hESCs results in a ~50-fold increase compared to human fibroblasts. let-7 levels were normalized to Hs-RNU6-2. n = 3; data are shown with s.e.m. (g) Differentiation of hESCs into secondary fibroblasts followed by episomal reprogramming into iPSCs. (h) Percentage of AP- and TRA-1-81–positive cells in HPAT5-WT and HPAT5-KO cells 25 d after reprogramming. n = 3; data are shown with s.e.m. (i) Endogenous let-7 and HPAT5 levels during nuclear reprogramming at day 10. n = 3; data are shown with s.e.m.

  13. Supplementary Fig. 8: HPAT5 regulates let-7 in hESCs during differentiation.

    (a) Heat map of differentially expressed genes (P < 0.05) after let-7 overexpression in four different samples. (b–d) Enrichment of let-7 seed sites in transcripts that were downregulated in hESC-HPAT5-KO cells. Overexpression from HPAT5-WT transcript rescued let-7–mediated differentiation. The Word cluster plot shows sequences in genes ranked by differential expression, after let-7 transfection. Each dot represents a word, summarizing z scores, and enrichment specificity indices of the enrichment profiles of negatively correlated 6- and 7-mer words. Triangles annotate known seed sites of human miRNAs. (i) A zoomed-in view (top) from the cluster plot. (e) Endogenous HPAT2, HPAT3 and HPAT5 expression in hESCs with let-7 overexpressed. n = 3; data are shown with s.e.m. (f) Immunoblot confirming specific AGO2 pulldown. OE, overexpression. n = 3 samples; data are shown with s.e.m.
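
The primer quality control summarized in the caption of Supplementary Fig. 3 (panels c and d: Ct values plotted against a 1:2 dilution series, a linear regression per assay, exclusion of assays with R2 below 0.97, and efficiency estimated from the slope) can be sketched roughly as follows. This is a minimal illustration and not the authors' code. The function names and the example Ct values are hypothetical, and the formula E = 2^(-1/slope) - 1 is an assumption: it is the standard qPCR relation when Ct is regressed on log2 relative concentration, under which an efficiency of 1.0 corresponds to perfect doubling per cycle. Only the 1:2 dilution scheme and the 0.97 threshold are taken from the caption.

```python
def standard_curve_fit(dilution_steps, ct_values):
    """Least-squares fit of Ct against log2(relative concentration).

    dilution_steps: number of successive 1:2 dilutions for each point (0, 1, 2, ...).
    Returns (slope, intercept, r_squared).
    """
    # Each 1:2 step halves the template, so log2(relative concentration) = -step.
    x = [-float(step) for step in dilution_steps]
    y = [float(ct) for ct in ct_values]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, R^2 is the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


def amplification_efficiency(slope):
    """Assumed relation for a log2 x axis: E = 2^(-1/slope) - 1."""
    if slope >= 0:
        raise ValueError("Ct should decrease as template concentration increases")
    return 2.0 ** (-1.0 / slope) - 1.0


def passes_qc(r_squared, threshold=0.97):
    """Keep an assay only if its standard curve fits well (threshold from the caption)."""
    return r_squared >= threshold


# Hypothetical assay: Ct rises by exactly one cycle per twofold dilution.
steps = [0, 1, 2, 3, 4]
cts = [18.0, 19.0, 20.0, 21.0, 22.0]
slope, intercept, r2 = standard_curve_fit(steps, cts)
efficiency = amplification_efficiency(slope)
print(f"slope={slope:.2f} R2={r2:.3f} efficiency={efficiency:.2f} keep={passes_qc(r2)}")
```

In this idealized example the slope is -1, so the estimated efficiency is 1.0; real assays scatter around that value, which is consistent with the reported mean of 1.02 and s.d. of 0.06.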

Accession codes

Primary accessions

ArrayExpress

Gene Expression Omnibus

References

  1. Derrien, T. et al. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 22, 1775–1789 (2012).
  2. Jia, H. et al. Genome-wide computational identification and manual annotation of human long noncoding RNA genes. RNA 16, 1478–1487 (2010).
  3. Ørom, U.A. et al. Long noncoding RNAs with enhancer-like function in human cells. Cell 143, 46–58 (2010).
  4. Ulitsky, I. & Bartel, D.P. lincRNAs: genomics, evolution, and mechanisms. Cell 154, 26–46 (2013).
  5. Kapusta, A. et al. Transposable elements are major contributors to the origin, diversification, and regulation of vertebrate long noncoding RNAs. PLoS Genet. 9, e1003470 (2013).
  6. Zhao, J., Sun, B.K., Erwin, J.A., Song, J.J. & Lee, J.T. Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science 322, 750–756 (2008).
  7. Martin, L. & Chang, H.Y. Uncovering the role of genomic “dark matter” in human disease. J. Clin. Invest. 122, 1589–1595 (2012).
  8. Mercer, T.R., Dinger, M.E., Sunkin, S.M., Mehler, M.F. & Mattick, J.S. Specific expression of long noncoding RNAs in the mouse brain. Proc. Natl. Acad. Sci. USA 105, 716–721 (2008).
  9. Khalil, A.M. et al. Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl. Acad. Sci. USA 106, 11667–11672 (2009).
  10. Koziol, M.J. & Rinn, J.L. RNA traffic control of chromatin complexes. Curr. Opin. Genet. Dev. 20, 142–148 (2010).
  11. Rinn, J.L. et al. Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129, 1311–1323 (2007).
  12. Rinn, J.L. & Chang, H.Y. Genome regulation by long noncoding RNAs. Annu. Rev. Biochem. 81, 145–166 (2012).
  13. Au, K.F. et al. Characterization of the human ESC transcriptome by hybrid sequencing. Proc. Natl. Acad. Sci. USA 110, E4821–E4830 (2013).
  14. Kelley, D. & Rinn, J. Transposable elements reveal a stem cell–specific class of long noncoding RNAs. Genome Biol. 13, R107 (2012).
  15. Ulitsky, I., Shkumatava, A., Jan, C.H., Sive, H. & Bartel, D.P. Conserved function of lincRNAs in vertebrate embryonic development despite rapid sequence evolution. Cell 147, 1537–1550 (2011).
  16. Wang, J. et al. Primate-specific endogenous retrovirus–driven transcription defines naive-like stem cells. Nature 516, 405–409 (2014).
  17. Lu, X. et al. The retrovirus HERVH is a long noncoding RNA required for human embryonic stem cell identity. Nat. Struct. Mol. Biol. 21, 423–425 (2014).
  18. Göke, J. et al. Dynamic transcription of distinct classes of endogenous retroviral elements marks specific populations of early human embryonic cells. Cell Stem Cell 16, 135–141 (2015).
  19. Grow, E.J. et al. Intrinsic retroviral reactivation in human preimplantation embryos and pluripotent cells. Nature 522, 221–225 (2015).
  20. Fort, A. et al. Deep transcriptome profiling of mammalian stem cells supports a regulatory role for retrotransposons in pluripotency maintenance. Nat. Genet. 46, 558–566 (2014).
  21. Harada, F., Tsukada, N. & Kato, N. Isolation of three kinds of human endogenous retrovirus–like sequences using tRNAPro as a probe. Nucleic Acids Res. 15, 9153–9162 (1987).
  22. Rogers, J. & Gibbs, R.A. Comparative primate genomics: emerging patterns of genome content and dynamics. Nat. Rev. Genet. 15, 347–359 (2014).
  23. Loewer, S. et al. Large intergenic non-coding RNA-RoR modulates reprogramming of human induced pluripotent stem cells. Nat. Genet. 42, 1113–1117 (2010).
  24. Santoni, F.A., Guerra, J. & Luban, J. HERV-H RNA is abundant in human embryonic stem cells and a precise marker for pluripotency. Retrovirology 9, 111 (2012).
  25. Moignard, V. et al. Characterization of transcriptional networks in blood stem and progenitor cells using high-throughput single-cell gene expression analysis. Nat. Cell Biol. 15, 363–372 (2013).
  26. Buganim, Y. et al. Single-cell expression analyses during cellular reprogramming reveal an early stochastic and a late hierarchic phase. Cell 150, 1209–1222 (2012).
  27. Boyer, L.A. et al. Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 122, 947–956 (2005).
  28. Gagliardi, A. et al. A direct physical interaction between Nanog and Sox2 regulates embryonic stem cell self-renewal. EMBO J. 32, 2231–2247 (2013).
  29. Loh, Y.H. et al. The Oct4 and Nanog transcription network regulates pluripotency in mouse embryonic stem cells. Nat. Genet. 38, 431–440 (2006).
  30. Wu, Q. et al. Sall4 interacts with Nanog and co-occupies Nanog genomic sites in embryonic stem cells. J. Biol. Chem. 281, 24090–24094 (2006).
  31. Zhang, J. et al. Sall4 modulates embryonic stem cell pluripotency and early embryonic development by the transcriptional regulation of Pou5f1. Nat. Cell Biol. 8, 1114–1123 (2006).
  32. Kunarso, G. et al. Transposable elements have rewired the core regulatory network of human embryonic stem cells. Nat. Genet. 42, 631–634 (2010).
  33. Alcid, E.A. & Tsukiyama, T. ATP-dependent chromatin remodeling shapes the long noncoding RNA landscape. Genes Dev. 28, 2348–2360 (2014).
  34. Zhu, Y., Rowley, M.J., Bohmdorfer, G. & Wierzbicki, A.T. A SWI/SNF chromatin-remodeling complex acts in noncoding RNA–mediated transcriptional silencing. Mol. Cell 49, 298–309 (2013).
  35. Siprashvili, Z. et al. Identification of proteins binding coding and non-coding human RNAs using protein microarrays. BMC Genomics 13, 633 (2012).
  36. Chendrimada, T.P. et al. TRBP recruits the Dicer complex to Ago2 for microRNA processing and gene silencing. Nature 436, 740–744 (2005).
  37. Li, J. et al. Identifying mRNA sequence elements for target recognition by human Argonaute proteins. Genome Res. 24, 775–785 (2014).
  38. Melton, C., Judson, R.L. & Blelloch, R. Opposing microRNA families regulate self-renewal in mouse embryonic stem cells. Nature 463, 621–626 (2010).
  39. Worringer, K.A. et al. The let-7/LIN-41 pathway regulates reprogramming to human induced pluripotent stem cells by controlling expression of prodifferentiation genes. Cell Stem Cell 14, 40–52 (2014).
  40. Ohms, S., Lee, S.H. & Rangasamy, D. LINE-1 retrotransposons and let-7 miRNA: partners in the pathogenesis of cancer? Front. Genet. 5, 338 (2014).
  41. Ohms, S. & Rangasamy, D. Silencing of LINE-1 retrotransposons contributes to variation in small noncoding RNA expression in human cancer cells. Oncotarget 5, 4103–4117 (2014).
  42. Hagan, J.P., Piskounova, E. & Gregory, R.I. Lin28 recruits the TUTase Zcchc11 to inhibit let-7 maturation in mouse embryonic stem cells. Nat. Struct. Mol. Biol. 16, 1021–1025 (2009).
  43. Heo, I. et al. Lin28 mediates the terminal uridylation of let-7 precursor microRNA. Mol. Cell 32, 276–284 (2008).
  44. Wang, Y., Medvid, R., Melton, C., Jaenisch, R. & Blelloch, R. DGCR8 is essential for microRNA biogenesis and silencing of embryonic stem cell self-renewal. Nat. Genet. 39, 380–385 (2007).
  45. Hockemeyer, D. et al. A drug-inducible system for direct reprogramming of human somatic cells to pluripotency. Cell Stem Cell 3, 346–353 (2008).
  46. Karginov, F.V. et al. A biochemical approach to identifying microRNA targets. Proc. Natl. Acad. Sci. USA 104, 19291–19296 (2007).
  47. Giordano, J. et al. Evolutionary history of mammalian transposons determined by genome-wide defragmentation. PLoS Comput. Biol. 3, e137 (2007).
  48. Tanaka, Y., Chung, L. & Park, I.H. Impact of retrotransposons in pluripotent stem cells. Mol. Cells 34, 509–516 (2012).
  49. Holdt, L.M. et al. Alu elements in ANRIL non-coding RNA at chromosome 9p21 modulate atherogenic cell functions through trans-regulation of gene networks. PLoS Genet. 9, e1003588 (2013).
  50. Gong, C. & Maquat, L.E. lncRNAs transactivate STAU1-mediated mRNA decay by duplexing with 3′ UTRs via Alu elements. Nature 470, 284–288 (2011).
  51. Spengler, R.M., Oakley, C.K. & Davidson, B.L. Functional microRNAs and target sites are created by lineage-specific transposition. Hum. Mol. Genet. 23, 1783–1793 (2014).
  52. Lazzeroni, L. & Owen, A. Plaid models for gene expression data. Stat. Sin. 12, 61–86 (2002).
  53. Bengtsson, M., Stahlberg, A., Rorsman, P. & Kubista, M. Gene expression profiling in single cells from the pancreatic islets of Langerhans reveals lognormal distribution of mRNA levels. Genome Res. 15, 1388–1392 (2005).
  54. Pang, Z.P. et al. Induction of human neuronal cells by defined transcription factors. Nature 476, 220–223 (2011).
  55. Goff, L.A. et al. Ago2 immunoprecipitation identifies predicted microRNAs in human embryonic stem cells and neural precursors. PLoS One 4, e7192 (2009).
  56. Raj, A., van den Bogaard, P., Rifkin, S.A., van Oudenaarden, A. & Tyagi, S. Imaging individual mRNA molecules using multiple singly labeled probes. Nat. Methods 5, 877–879 (2008).

Author information

  1. Present address: Department of Cell Biology and Neurosciences, Montana State University, Bozeman, Montana, USA.

    • Renee A Reijo Pera
  2. These authors contributed equally to this work.

    • Jens Durruthy-Durruthy &
    • Vittorio Sebastiano

Affiliations

  1. Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, California, USA.

    • Jens Durruthy-Durruthy,
    • Vittorio Sebastiano,
    • Mark Wossidlo,
    • Diana Cepeda,
    • Jun Cui &
    • Renee A Reijo Pera
  2. Department of Genetics, Stanford University, Stanford, California, USA.

    • Jens Durruthy-Durruthy,
    • Vittorio Sebastiano,
    • Mark Wossidlo,
    • Diana Cepeda,
    • Jun Cui,
    • Edward J Grow &
    • Renee A Reijo Pera
  3. Department of Obstetrics and Gynecology, Stanford University, Stanford, California, USA.

    • Jens Durruthy-Durruthy,
    • Vittorio Sebastiano,
    • Mark Wossidlo,
    • Diana Cepeda,
    • Jun Cui &
    • Renee A Reijo Pera
  4. Department of Pathology, Stanford University, Stanford, California, USA.

    • Jonathan Davila &
    • Moritz Mall
  5. Department of Statistics, Stanford University, Stanford, California, USA.

    • Wing H Wong &
    • Kin Fai Au
  6. Department of Chemical and Systems Biology, Stanford University, Stanford, California, USA.

    • Joanna Wysocka
  7. Department of Developmental Biology, Stanford University, Stanford, California, USA.

    • Joanna Wysocka

Contributions

J.D.-D., V.S., W.H.W., K.F.A. and R.A.R.P. conceived the project, designed experiments and wrote the manuscript, with input from all authors. J.D.-D., V.S. and D.C. performed siRNA knockdown experiments. J.C. designed and tested CRISPR constructs. M.W. conducted the human embryo experiments. E.J.G. performed ChIP experiments. J.D. and M.M. performed RIP and immunoblot experiments. J.W. helped with manuscript writing.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Figures

  1. Supplementary Figure 1: All 23 HPATs are significantly enriched for TEs. (475 KB)

    (a) The different classes of TEs are color-coded; corresponding colors are used in b–e. (b,c) Coverage of different TE classes on the genome (exons + introns) and transcript (only exons) levels. The percentage of total length for each TE is represented for all 23 HPATs. Three control genes are included. (d) The 23 HPATs with embedded TEs and genomic length. Displayed are the most highly expressed isoforms for each HPAT gene. Genomic DNA is represented as a black line, and exons are represented by gray boxes. TEs are represented by the colored boxes underneath. Each exon is exonized with TEs (exons overlap TEs). The length of each genomic locus is not to scale.

  2. Supplementary Figure 2: Molecular and functional analysis of HPAT2, HPAT3 and HPAT5 during preimplantation development. (296 KB)

    (a) Magnified view of the ICM in a human blastocyst demonstrates a specific staining pattern. Stars depict HPAT3 signal. Arrows depict HPAT5 signal. (b) HPAT2, HPAT3 and HPAT5 are significantly downregulated in human blastocysts injected with siRNAs compared to siScrambled controls (n = 3 blastocysts; data are shown with s.e.m.). (c) Blastomeres with knockdown of HPAT2, HPAT3 and HPAT5 during human embryo development did not contribute to ICM. The presence of ICM was validated with OCT4 and SOX2 staining. The ICM is highlighted by a yellow dashed circle. (d) Fluorescence-positive ICM in blastocysts with HPAT knockdown and control blastocysts.

  3. Supplementary Figure 3: Primer validation and quality control of single-cell gene expression data. (578 KB)

    (a) Histological sections stained with hematoxylin and eosin from teratomas derived from established iPSCs (iPSCs that resulted from derivation from BJ fibroblasts are termed fully established iPSCs and were used as the last time point for collection (see b)). (b) Tracking of morphological changes of BJ fibroblasts during mRNA reprogramming with the Yamanaka factors. Depicted are the days at which cells were collected. Fibroblasts transfected with GFP only for five consecutive days are shown as well with GFP signal (images are representative). (c) Representative example of a dilution series for all 96 assays. Ct values were plotted as a function of the dilution factors (1:2) on a log scale. Linear regression analysis is depicted by the red line. Eight assays with R2 < 0.97 were excluded, thus leaving 88 assays. (d) Distribution histogram of calculated primer efficiencies for 88 DELTAgene assays estimated from the slopes of standard curve plots. The average efficiency is 1.02 with s.d. = 0.06. (e) Quantile-quantile plot with experimentally estimated efficiencies (y axis) and the values expected for a normal distribution with mean efficiency = 1.02 and s.d. = 0.06 (x axis). The black line indicates the values expected for a normal distribution (y = x). Efficiency values that were derived from plots with three points in the standard curve are depicted in blue. Values derived from plots with >3 points in the standard curve are depicted in red. (f) Microscopic view of two capture sites on the C1 Single-Cell Auto Prep System microfluidic chip. The left capture site has no cell, and the right capture site has one captured cell (red arrow). (g) Representative example of primer specificity evaluation using melting curve analysis (here with HPAT2). The graph shows the relative change in fluorescence signal (EvaGreen) over the temperature range for all 96 cells on a single array. The area in red depicts false-positive signals with incorrect melting curve temperatures (determined with bulk RNA and based on the melting curve temperature provided by Fluidigm). The area in green depicts the correct melting curves. Data outside the correct melting curve were set to 0. (h) Correlation analysis of mean Ct values generated from 96 cells of three dynamic IFC arrays (single cells of (i) BJ fibroblasts, (ii) BJ fibroblasts transfected with mRNA encoding GFP for 2 d, (iii) BJ fibroblasts transfected with mRNA encoding GFP for 5 d). Genes that were detected in at least 20% of the 96 cells for each dynamic IFC array are considered. Shown are all three comparisons. Outliers are shown in green (GFP) and red. The assays in red (total of six) were excluded from subsequent analysis due to a non-correlative pattern among the arrays, leaving the 82 assays that are listed in Supplementary Table 2. (i) Schematic overview of the quality assessment before normalization of single-cell gene expression. Nine dynamic IFC arrays (96.96 Fluidigm chips) were used for gene expression analysis. Two GFP control chips along with one fibroblast chip were used for correlation analysis (h) followed by initial quality assessment. Processed chips were used for a second round of quality assessment, resulting in 578 normalized single cells.

  4. Supplementary Figure 4: Single-cell gene expression analysis during nuclear reprogramming and reactivation of HPAT expression during in vitro transdifferentiation from fibroblasts into neurons. (679 KB)

    (a–c) Heat map plot of single-cell gene expression of different markers during nuclear reprogramming. Single cells are in rows. Genes are in columns. Fibroblast markers decrease over time, and pluripotency-specific markers, including selected HPATs, increase over time as cells progress toward iPSCs (n = 87, 85, 72, 70, 86, 83 and 95 for fibroblasts, day 2, day 5, day 7, day 10, day 12 and iPSCs, respectively). Normalization was performed accordingly (the Supplementary Note provides details). White color indicates no expression. (d) PCA of 578 single cells collected at different time points during nuclear reprogramming. (e) Heat map and unsupervised clustering for 578 single-cell gene expression values resulted in clustering of novel genes implicating a similar biological context during reprogramming. Samples are color-coded according to the specific gene groups (horizontal) and the day at which single cells were collected (vertical). (f) The pluripotency marker POU5F1 (red) and HPAT2 (red; representative of all HPATs in this study) were exclusively expressed in H9 cells (hESCs) but not in (i) cDNA from colon, liver and lung (endoderm) and (ii) during neuronal transdifferentiation from fibroblasts (gray; samples collected at day 5 and day 30 are labeled iN-D5 and iN-D30, respectively) or cDNA from brain (all ectoderm). EN2 and PAX6, included as ectoderm control markers, were detected during neuron differentiation and in brain samples (n = 3; data shown with s.e.m.). (g) Heat map of bicluster analysis illustrating a different bicluster within each plot (Supplementary Table 3). Three different algorithms for bicluster calculation were applied, resulting in the identification of five clusters, four clusters and 16 clusters.

  5. Supplementary Figure 5: Overexpression and silencing constructs for HPAT2, HPAT3 and HPAT5, NANOG ChIP-seq and regulation of HPAT5 during hESC differentiation. (388 KB)

    (a) Validation of siRNAs targeting HPAT2, HPAT3 and HPAT5, respectively, in hESCs. Gene expression and P values were measured relative to siGlo control 48 h after transfection (n = 9). Orange color depicts expected gene downregulation. (b) Validation of the overexpression vectors. BJ fibroblasts were transfected with HPAT2, HPAT3 and HPAT5. Gene expression and P values were measured 48 h after transfection relative to those in GFP-transfected fibroblasts (n = 9). Blue color depicts expected gene upregulation. (c) ChIP-qPCR analysis in H9 cells (hESCs) using NANOG. Signals were quantified using primer sets specific to a subset of HPATs or two ‘negative’ intergenic, non-repetitive regions. Two enhancers around SOX2 are included as positive controls (n = 3; data are shown with s.e.m.). (d) Three snapshots of the UCSC browser (genome location indicated) aligned with the NANOG-binding region for HPAT2, HPAT3 and HPAT5 from ChIP-seq analysis. (e,f) Overexpression constructs and validation of the HPAT5-OE and mCherry-OE lines. HPAT5 was significantly upregulated in hESC-OE cells compared to control cells. mCherry protein expression was also confirmed. n = 3; data are shown with s.e.m. (g) The increase in differentiation markers representing all three germ layers was significantly repressed in HPAT5-OE cells. P values are calculated for comparison of the mCherry and HPAT5-OE lines on the same days.

  6. Supplementary Figure 6: Protein microarray with HPAT2, HPAT3 and HPAT5. (319 KB)

    (a) Formaldehyde agarose RNA gel of the Cy5-labeled lincRNAs before hybridization to the protein array. (b) Representative image of a ProtoArray and fluorescence intensity for HPAT2 and HPAT3 (positive) and HPAT5 (negative) on OCT4 protein in duplicate. (c) Heat map of HPAT2-, HPAT3- and HPAT5-binding proteins with RISC proteins and OCT4 highlighted (z score > 2.5). (d) Total number of candidate proteins identified with the three HPATs (with and without common RNA-binding proteins). (e) Validation of the findings by Lu et al. that HERV-H–derived lincRNAs (HPAT2 and HPAT3) bind to specific OCT4, coactivators and mediators.

  7. Supplementary Figure 7: Loss-of-function analyses in hESCs. (395 KB)

    (a) Predicted let-7 binding sites in HPAT5 transcript. Shown is HPAT5 with embedded TEs along the genomic length (black line). Exons are shown as gray boxes. TEs are shown as colored boxes underneath. let-7 binding sites are within a SINE element (Alu). Bases in red are point mutations and confer HPAT5 specificity. (b) Gene expression analysis of endogenous pre-let-7 and mature let-7 in fibroblasts. n = 3; data are shown with s.e.m. (c) Schematic overview of the HPAT5 locus in genomic DNA from subcloned hESCs that were treated with CRISPR pairs 2 and 5 (gRNA2/5). Forward and reverse primers (in red) were designed to amplify a region of genomic DNA that is inside the deleted HPAT5 locus. (d) Agarose gel illustrating successful derivation of the HPAT5-knockout hESC line. Genomic DNA from hESCs (passage 4 after subcloning) did not result in specific amplification. The controls included negative control (treatment only with one CRISPR arm, gRNA2), wild-type hESCs and no-template control (NT). (e) Gene expression analysis of endogenous HPAT5 in hESCs. n = 3; data are shown with s.e.m. (f) Endogenous let-7 levels do not reach the levels in differentiated cells during 48 h of hESC differentiation. Endogenous let-7 levels are significantly increased 48 h after differentiation with bFGF removal (tenfold). The levels of endogenous let-7 are still significantly higher in human fibroblasts (100-fold) compared to differentiated hESCs. HPAT5 knockout increases endogenous let-7 levels to ones similar to those found in hESCs differentiated for 24 h. Overexpression of let-7 in hESCs results in a ~50-fold increase compared to human fibroblasts. let-7 levels were normalized to Hs-RNU6-2. n = 3; data are shown with s.e.m. (g) Differentiation of hESCs into secondary fibroblasts followed by episomal reprogramming into iPSCs. (h) Percentage of AP- and TRA-1-81–positive cells in HPAT5-WT and HPAT5-KO cells 25 d after reprogramming. n = 3; data are shown with s.e.m. (i) Endogenous let-7 and HPAT5 levels during nuclear reprogramming at day 10. n = 3; data are shown with s.e.m.

  8. Supplementary Figure 8: HPAT5 regulates let-7 in hESCs during differentiation. (402 KB)

    (a) Heat map of differentially expressed genes (P < 0.05) after let-7 overexpression in four different samples. (b–d) Enrichment of let-7 seed sites in transcripts that were downregulated in hESC-HPAT5-KO cells. Overexpression from HPAT5-WT transcript rescued let-7–mediated differentiation. The word cluster plot shows sequences in genes ranked by differential expression after let-7 transfection. Each dot represents a word, summarizing z scores and enrichment specificity indices of the enrichment profiles of negatively correlated 6- and 7-mer words. Triangles annotate known seed sites of human miRNAs. (i) A zoomed-in view (top) from the cluster plot. (e) Endogenous HPAT2, HPAT3 and HPAT5 expression in hESCs with let-7 overexpressed. n = 3; data are shown with s.e.m. (f) Immunoblot confirming specific AGO2 pulldown. OE, overexpression. n = 3 samples; data are shown with s.e.m.

PDF files

  1. Supplementary Text and Figures (14,558 KB)

    Supplementary Figures 1–8, Supplementary Tables 1–4 and Supplementary Note.

Excel files

  1. Supplementary Table 5 (817 KB)

    ChIP analysis with NANOG.

  2. Supplementary Table 6 (16,940 KB)

    Protein microarray analysis.

  3. Supplementary Table 7 (45,198 KB)

    Prediction of miRNA binding sites.

  4. Supplementary Table 8 (20 KB)

    Microarray and cWords analyses.

Additional data