Somatic cell reprogramming to a pluripotent state continues to challenge many of our assumptions about cellular specification, and despite major efforts, we lack a complete molecular characterization of the reprogramming process. To address this gap in knowledge, we generated extensive transcriptomic, epigenomic and proteomic data sets describing the reprogramming routes leading from mouse embryonic fibroblasts to induced pluripotency. Through integrative analysis, we reveal that cells transition through distinct gene expression and epigenetic signatures and bifurcate towards reprogramming transgene-dependent and -independent stable pluripotent states. Early transcriptional events, driven by high levels of reprogramming transcription factor expression, are associated with widespread loss of histone H3 lysine 27 trimethylation (H3K27me3), representing a general opening of the chromatin state. Maintenance of high transgene levels leads to re-acquisition of H3K27me3 and a stable pluripotent state that is alternative to the embryonic stem cell (ESC)-like fate. Lowering transgene levels at an intermediate phase, however, guides the process to the acquisition of ESC-like chromatin and DNA methylation signatures. Our data provide a comprehensive molecular description of the reprogramming routes and are accessible through the Project Grandiose portal at http://www.stemformatics.org.
Sequencing data have been deposited in the NCBI Sequence Read Archive (SRA) under accession number SRP046744 for all RNA-seq and ChIP-seq experiments, and in the European Bioinformatics Institute under the European Nucleotide Archive (ENA) accession number ERP004116 for MethylC-sequencing. The global and cell surface mass spectrometry proteomics raw data have been deposited in the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository under data set identifiers PXD000413 and PXD001456, respectively.
Takahashi, K. & Yamanaka, S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126, 663–676 (2006)
Mikkelsen, T. S. et al. Dissecting direct reprogramming through integrative genomic analysis. Nature 454, 49–55 (2008)
Graf, T. & Enver, T. Forcing cells to change lineages. Nature 462, 587–594 (2009)
Tonge, P. D. et al. Divergent reprogramming routes lead to alternative stem-cell states. Nature http://dx.doi.org/10.1038/nature14047 (this issue)
Samavarchi-Tehrani, P. et al. Functional genomics reveals a BMP-driven mesenchymal-to-epithelial transition in the initiation of somatic cell reprogramming. Cell Stem Cell 7, 64–77 (2010)
Polo, J. M. et al. A molecular roadmap of reprogramming somatic cells into iPS cells. Cell 151, 1617–1632 (2012)
Golipour, A. et al. A late transition in somatic cell reprogramming requires regulators distinct from the pluripotency network. Cell Stem Cell 11, 769–782 (2012)
O’Malley, J. et al. High-resolution analysis with novel cell-surface markers identifies routes to iPS cells. Nature 499, 88–91 (2013)
Nagy, A. Secondary cell reprogramming systems: as years go by. Curr. Opin. Genet. Dev. 23, 534–539 (2013)
Woltjen, K. et al. piggyBac transposition reprograms fibroblasts to induced pluripotent stem cells. Nature 458, 766–770 (2009)
Buganim, Y. et al. Single-cell expression analyses during cellular reprogramming reveal an early stochastic and a late hierarchic phase. Cell 150, 1209–1222 (2012)
Belteki, G. et al. Conditional and inducible transgene expression in mice through the combinatorial use of Cre-mediated recombination and tetracycline induction. Nucleic Acids Res. 33, e51 (2005)
Wells, C. A. et al. Stemformatics: visualisation and sharing of stem cell gene expression. Stem Cell Res. 10, 387–395 (2013)
Clancy, J. L. et al. Small RNA changes en route to distinct cellular states of induced pluripotency. Nature Commun. http://dx.doi.org/10.1038/ncomms6522 (2014)
Benevento, M. et al. Proteome adaptation in cell reprogramming proceeds via distinct transcriptional networks. Nature Commun. http://dx.doi.org/10.1038/ncomms6613 (2014)
Polo, J. M. et al. Cell type of origin influences the molecular and functional properties of mouse induced pluripotent stem cells. Nature Biotechnol. 28, 848–855 (2010)
Ohi, Y. et al. Incomplete DNA methylation underlies a transcriptional memory of somatic cells in human iPS cells. Nature Cell Biol. 13, 541–549 (2011)
Schug, J. et al. Promoter features related to tissue specificity as measured by Shannon entropy. Genome Biol. 6, R33 (2005)
Li, R. et al. A mesenchymal-to-epithelial transition initiates and is required for the nuclear reprogramming of mouse fibroblasts. Cell Stem Cell 7, 51–63 (2010)
Kojima, Y. et al. The transcriptional and functional properties of mouse epiblast stem cells resemble the anterior primitive streak. Cell Stem Cell 14, 107–120 (2014)
Li, B., Carey, M. & Workman, J. L. The role of chromatin during transcription. Cell 128, 707–719 (2007)
Simon, J. A. & Kingston, R. E. Occupying chromatin: polycomb mechanisms for getting to genomic targets, stopping transcriptional traffic, and staying put. Mol. Cell 49, 808–824 (2013)
Mansour, A. A. et al. The H3K27 demethylase Utx regulates somatic and germ cell epigenetic reprogramming. Nature 488, 409–413 (2012)
Pereira, C. F. et al. ESCs require PRC2 to direct the successful reprogramming of differentiated cells toward pluripotency. Cell Stem Cell 6, 547–556 (2010)
Wong, J. J.-L. et al. Orchestrated intron retention regulates normal granulocyte differentiation. Cell 154, 583–595 (2013)
Fadloun, A. et al. Chromatin signatures and retrotransposon profiling in mouse embryos reveal regulation of LINE-1 by RNA. Nature Struct. Mol. Biol. 20, 332–338 (2013)
Tang, S.-J. Chromatin organization by repetitive elements (CORE): a genomic principle for the higher-order structure of chromosomes. Genes 2, 502–515 (2011)
Lunyak, V. V. et al. Developmentally regulated activation of a SINE B2 repeat as a domain boundary in organogenesis. Science 317, 248–251 (2007)
Rebollo, R., Romanish, M. T. & Mager, D. L. Transposable elements: an abundant and natural source of regulatory sequences for host genes. Annu. Rev. Genet. 46, 21–42 (2012)
Bernstein, B. E. et al. A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125, 315–326 (2006)
Mikkelsen, T. S. et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553–560 (2007)
Jørgensen, H. F. et al. Stem cells primed for action: polycomb repressive complexes restrain the expression of lineage-specific regulators in embryonic stem cells. Cell Cycle 5, 1411–1414 (2006)
Voigt, P. et al. Asymmetrically modified nucleosomes. Cell 151, 181–193 (2012)
Schmitges, F. W. et al. Histone methylation by PRC2 is inhibited by active chromatin marks. Mol. Cell 42, 330–341 (2011)
Yuan, W. et al. H3K36 methylation antagonizes PRC2-mediated H3K27 methylation. J. Biol. Chem. 286, 7983–7989 (2011)
Voigt, P., Tee, W. W. & Reinberg, D. A double take on bivalent promoters. Genes Dev. 27, 1318–1338 (2013)
Lee, D.-S. et al. DNA methylation as a reprogramming modulator: an epigenomic roadmap to induced pluripotency. Nature Commun. http://dx.doi.org/10.1038/ncomms6619 (2014)
Guttman, M. et al. Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nature Biotechnol. 28, 503–510 (2010)
Cabili, M. N. et al. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 25, 1915–1927 (2011)
Khalil, A. M. et al. Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl Acad. Sci. USA 106, 11667–11672 (2009)
Guttman, M. et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223–227 (2009)
Guttman, M. et al. lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature 477, 295–300 (2011)
Behringer, R. R., Gertsenstein, M., Nagy-Vintersten, K. & Nagy, A. Manipulating the Mouse Embryo: A Laboratory Manual (Cold Spring Harbor Laboratory Press, 2013)
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010)
Kong, L. et al. CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 35, W345–W349 (2007)
Anders, S., Reyes, A. & Huber, W. Detecting differential usage of exons from RNA-seq data. Genome Res. 22, 2008–2017 (2012)
Xie, W. et al. Epigenomic analysis of multilineage differentiation of human embryonic stem cells. Cell 153, 1134–1148 (2013)
Robinson, M. D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25 (2010)
O’Geen, H., Echipare, L. & Farnham, P. J. in Epigenetics Protocols 791, 265–286 (Humana, 2011)
Gaspar-Maia, A. et al. Chd1 regulates open chromatin and pluripotency of embryonic stem cells. Nature 460, 863–868 (2009)
Wang, T. et al. The histone demethylases Jhdm1a/1b enhance somatic cell reprogramming in a vitamin-C-dependent manner. Cell Stem Cell 9, 575–587 (2011)
Roberts, A., Trapnell, C., Donaghey, J., Rinn, J. L. & Pachter, L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 12, R22 (2011)
Feng, J., Liu, T., Qin, B., Zhang, Y. & Liu, X. S. Identifying ChIP-seq enrichment using MACS. Nature Protocols 7, 1728–1740 (2012)
Hawkins, R. D. et al. Distinct epigenomic landscapes of pluripotent and lineage-committed human cells. Cell Stem Cell 6, 479–491 (2010)
Shen, L., Shao, N., Liu, X. & Nestler, E. ngs.plot: Quick mining and visualization of next-generation sequencing data by integrating genomic databases. BMC Genomics 15, 284 (2014)
Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011)
Gauci, S. et al. Lys-N and trypsin cover complementary parts of the phosphoproteome in a refined SCX-based approach. Anal. Chem. 81, 4493–4501 (2009)
Wollscheid, B. et al. Mass-spectrometric identification and relative quantification of N-linked cell surface glycoproteins. Nature Biotechnol. 27, 378–386 (2009)
Kislinger, T. et al. PRISM, a generic large scale proteomic investigation strategy for mammals. Mol. Cell. Proteomics 2, 96–106 (2003)
We thank M. Gertsenstein and M. Pereira for chimaera production, C. Monetti for cell culture, R. Cowling for DNA purification, and K. Harpal for chimaera embryo sectioning and staining. We acknowledge the intellectual contributions of P. P. L. Tam and R. P. Harvey. A.N. is Tier 1 Canada Research Chair in Stem Cells and Regeneration. This work was supported by grants awarded to A.N., I.M.R. and P.W.Z. from the Ontario Research Fund Global Leadership Round in Genomics and Life Sciences grants (GL2-01-028), to A.N. from the Canadian stem cell network (9/5254 (TR3)) and from the Canadian Institutes of Health Research (CIHR MOP102575). This work received support from the Korean Ministry of Knowledge Economy (grant 10037410 to J.-S.S.), from the SNUCM Research Fund (grant 0411-20100074 to J.-S.S.), and from Macrogen Inc. (grant MGR03-11 and MGR03-12). The Stemformatics resource is supported by an Australian Research Council special research grant to Stem Cells Australia (C.A.W. and S.M.G.). The analysis of the miRNA was supported by grants from the National Health and Medical Research Council of Australia (1024852 to J.L.C. and T.P.) and the Australian Research Council (DP1300101928 to T.P.). W.R. is a Cancer Institute of NSW Fellow and with J.E.J.R. receives support from the Cancer Council of NSW and National Health & Medical Research Council (571156 and 1061906). J.E.J.R. receives funding from Cure the Future & Tour de Cure. K.-A.L.C. is supported, in part, by the Wound Management Innovation CRC (established and supported under the Australian Government’s Cooperative Research Centres Program). S.M.G. received support from the Australian Research Council (SR110001002). C.A.W. is a QLD Smart Futures Fellow. M.B., J.M. and A.J.R.H. are supported by the Netherlands Proteomics Centre, and by the European Community’s Seventh Framework Programme (FP7/2007-2013) by the PRIME-XS project grant agreement number 262067. P.W.Z. is the Canada Research Chair in Stem Cell Bioengineering. S.M.I.H. received a fellowship from the McEwen Centre of Regenerative Medicine.
The authors declare no competing financial interests.
Extended data figures and tables
a, Frequency of doxycycline-independent pluripotent cells obtained when 1B secondary MEFs were reprogrammed in 1,500 ng ml−1 doxycycline until the indicated day. b, Morphology of cells at day 15 after lowering the doxycycline concentration from 1,500 ng ml−1 to levels as indicated on day 8 of reprogramming. c, Clonal efficiency measurement at day 15 of reprogramming after lowering the doxycycline concentration on day 8 to the level indicated. d, e, 1B secondary iPSCs show widespread contribution to all germ layers of chimaeric embryos. Whole-mount view (d) and transverse section of E10.5 diploid chimaera (e). Embryo is representative of n = 6 chimaeric embryos with strong (>75%) iPSC donor cell contribution. h, heart; hg, hindgut; nt, neural tube. Scale bars, 750 μm (d) and 400 µm (e). f, RNA-seq analysis of transgene and endogenous expression levels during reprogramming. CPM, counts per million.
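Panel f reports expression in CPM (counts per million). A minimal sketch of that normalization follows, assuming plain library-size scaling with no further adjustment; the function name and the read counts are hypothetical and are not taken from the study.

```python
def counts_per_million(counts):
    """Scale raw read counts so that the library sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no mapped reads")
    return {gene: 1e6 * n / total for gene, n in counts.items()}


# Hypothetical raw counts for one sample (illustrative only).
sample = {"Pou5f1": 150, "Sox2": 50, "Col1a1": 800}
cpm = counts_per_million(sample)
```

Because CPM depends only on each library's total, it lets transgene and endogenous reads be compared across samples sequenced to different depths.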
Read coverage histograms representing gene expression and epigenetic status at the genomic loci of selected ESC-associated genes.
Extended Data Figure 3 Hierarchical clustering and principal component analysis (PCA) for multi-omics analyses.
a, Pearson correlation complete linkage hierarchical clustering of long RNA-seq data set. Colour coding indicates the grouping of samples based on clustering. b–d, PCA performed on each platform (10 neighbours for k-value nearest neighbour (KNN) imputation). Short RNA-seq platform PCA was performed on miRNAs (b). Long RNA-seq platform PCA was performed on protein-coding transcripts (b). Cell surface proteome PCA represents proteins detected by cell surface focused mass spectrometry analysis (b). c, PCA of global CpG methylation analysis. Red arrow follows the high-doxycycline sample trajectory; black dashed arrow follows D8H through low-doxycycline trajectory. Low-doxycycline samples D21L and D21 are highlighted in blue to indicate that compared to other platforms they do not project with ESC/iPSC (see text for further details). d, H3K4me3, H3K36me3 and H3K27me3 PCAs represent genome-wide enriched regions at annotated genes.
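The clustering named in panel a can be sketched as follows. This is an illustration only, assuming that the dissimilarity between two samples is 1 − r (r being the Pearson correlation of their expression profiles) and that clusters are merged by complete linkage; the helper names and the expression values are hypothetical, and the authors' actual pipeline may differ.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def complete_linkage(profiles, n_clusters):
    """Agglomerative clustering on 1 - r with complete linkage."""
    names = list(profiles)
    dist = {frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
            for a, b in combinations(names, 2)}
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        # Complete linkage: the distance between two clusters is the
        # largest pairwise distance between their members.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: max(dist[frozenset((a, b))]
                               for a in clusters[ij[0]]
                               for b in clusters[ij[1]]),
        )
        clusters[i] += clusters[j]
        del clusters[j]  # safe because i < j
    return [sorted(c) for c in clusters]


# Hypothetical expression profiles over four genes (illustrative only).
profiles = {
    "MEF": [9, 8, 1, 1],
    "D2H": [8, 9, 2, 1],
    "iPSC": [1, 1, 9, 8],
    "ESC": [1, 2, 8, 9],
}
groups = complete_linkage(profiles, n_clusters=2)
```

With these toy profiles the two fibroblast-like samples fall into one group and the two pluripotent samples into the other.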
Extended Data Figure 4 Integration of gene expression data from 1B reprogramming and other transcriptome data sets.
a, Distribution of the entropy score of protein-coding gene expression for individual samples (blue) and sample groups (red) indicated as probability density curve. b, Pearson correlation analysis of 1B secondary reprogramming sample protein-coding gene expression with transcriptomes of early embryonic stages and epiblast stem cells (EpiSCs) derived from a range of developmental stages (ref. 20). c, Pearson correlation analysis of 1B secondary reprogramming sample protein-coding gene expression with transcriptome of sorted secondary reprogramming intermediates (ref. 8). d, Expression of CD44 and Icam1 markers during 1B reprogramming. Error bars represent standard error of the mean. e, Pearson correlation analysis of 1B reprogramming sample protein-coding gene expression with sorted reprogramming and pluripotent cells from the Col1a1 primary reprogramming system (ref. 6).
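The entropy score in panel a can be illustrated with a short sketch. It assumes a Shannon entropy computed per gene over its relative expression across samples, in the spirit of Schug et al. in the reference list; the exact definition used for the figure is not given here, and the function name and values below are hypothetical.

```python
import math


def shannon_entropy(expression):
    """Shannon entropy (in bits) of one gene's expression across samples."""
    total = sum(expression)
    if total <= 0:
        raise ValueError("gene is not expressed in any sample")
    # Relative expression per sample; zero entries contribute nothing.
    probs = [x / total for x in expression if x > 0]
    return -sum(p * math.log2(p) for p in probs)


# Hypothetical expression values over four samples (illustrative only).
uniform = shannon_entropy([5.0, 5.0, 5.0, 5.0])    # evenly expressed gene
specific = shannon_entropy([20.0, 0.0, 0.0, 0.0])  # gene confined to one sample
```

Under this definition a gene expressed evenly across samples has high entropy, whereas a gene restricted to a single sample has entropy close to zero.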
Extended Data Figure 5 Effect of Oct4, Sox2, Klf4 and Myc expression level on reprogramming outcomes.
a, Pearson correlation analysis of RNA-seq data from 1B reprogramming samples and reprogramming clones from ref. 7 that are competent or incompetent to become factor-independent secondary iPSC (SC and SI clones, respectively). b, Transgene and endogenous gene expression determined by RNA-seq for Myc, Pou5f1 (Oct4), Sox2 and Klf4 in SC and SI clones (ref. 7). Bar graphs represent average expression of doxycycline-treated samples or SC iPSCs. Error bars represent standard error of the mean. Student’s t-test was used for statistics. c, PCA of protein-coding stage-specific genes from Fig. 2c, comparing 1B reprogramming samples and secondary reprogramming clones from ref. 7. F-class cells cluster separately from SI and SC clones. Moreover, 1B reprogramming follows a different trajectory than SI and SC clones towards iPSCs. Colour coding indicates the grouping of samples. d, Pearson correlation complete linkage hierarchical clustering of 1B reprogramming samples and SI and SC secondary reprogramming clones. Clustering was performed on protein-coding stage-specific genes and based on FPKM values normalized to the averaged ESC/iPSCs values from the respective study. Heat maps show stage-specific protein-coding gene expression belonging to iPSC/ESC (top heat map) and F-class (bottom heat map) genes. Clusters and genes on the right of each heat map highlight genes that show a different expression pattern between F-class and doxycycline-treated SI clones. For gene lists associated with d, refer to Supplementary Table 1.
Extended Data Figure 6 Global analysis of histone mark and intron retention changes during reprogramming.
a, Intensity plots of genes associated with H3K4me3 (green) and H3K27me3 (red) ±10 kb of annotated TSSs. b, Heat map representation of PRC2 components and histone demethylase expression at the RNA (RNA-seq) and protein level. c, Correlation of gene transcription with protein and intron retention for genes that exhibit intron retention from Fig. 2c. d, Correlation of intron retention, RNA expression and protein level for Kdm6a. e, Violin plots comparing observed and random Pearson correlations of intron retention versus gene FPKM at reprogramming stages. Bars represent average Pearson correlation coefficients. Error bars represent standard error of the mean. Student’s t-test was used for statistics. f, Number of expressed transposable elements during reprogramming.
Extended Data Figure 7 Tracking secondary MEF histone mark changes during reprogramming from one sample to another.
a, Pie-chart diagram tracking the histone mark changes using secondary MEF and secondary iPSCs as reference points. Each histone mark is colour coded: H3K4me3, green; H3K4me3H3K27me3, orange; H3K27me3, red; no mark, grey. Loci were tracked from their start (2°MEF) and end (2°iPSCs) histone signatures. b–g, Tracking bar graphs of histone mark changes. The histone mark change is shown at the top of each set of 12 histograms. Bars represent number of genes whose mark changed for the time point indicated at the top of the individual histogram, and which of these genes carry the same mark at the other time points (x axis). For example, in b ‘2°MEF (H3K4me3/H3K27me3→H3K4me3)’ the histogram shows the number of genes that were bivalent in secondary MEFs but changed to H3K4me3 monovalent at another time point. In the case of the small histogram labelled D2H, the black-framed green bar represents the number of loci that showed this change from secondary MEFs at D2H and the bars for all the other samples indicate how many of these D2H loci were also H3K4me3+ in the other samples.
Extended Data Figure 8 Determining expression threshold for defining bivalent loci and bivalency in other reprogramming systems.
a, RNA-seq expression value (log2 of FPKM) distribution (as represented by density curves) of four categories of genes: (1) genes marked by H3K4me3 and H3K36me3 (blue line); (2) genes marked by H3K4me3 alone (green line); (3) genes marked by H3K27me3 alone (red line); and (4) genes marked by H3K4me3 and H3K27me3, but not H3K36me3 (orange line). Genes were combined from all the samples to identify each category. Expression threshold was defined as the 10th percentile expression boundary of genes marked by H3K4me3 and H3K36me3. Genes that were expressed at lower levels than this threshold were considered not expressed in subsequent analyses. b, Assessment of cellular heterogeneity in 1B reprogramming by chromatin mark and expression association of two cell surface markers: CD24 and CD73. Upper scatter plots show H3K27me3 versus H3K36me3 enrichment in individual samples. Lower plot shows percentage of cells expressing each marker for same samples as determined by FACS analysis. Active locus: H3K4me3+H3K36me3+H3K27me3−. Heterogeneous locus: H3K4me3+H3K36me3+H3K27me3+. c, Absolute number (primary y axis) and proportion (secondary y axis) of false (heterogeneous) bivalent loci during secondary reprogramming. The presence of H3K36me3 distinguishes false bivalent loci (H3K4me3+H3K27me3+H3K36me3+) that represent heterogeneity from true bivalent loci that are transcriptionally repressed (H3K36me3−). d, Tracking of histone mark status of secondary MEF heterogeneous loci. Heterogeneous loci resolve into silent and active loci during reprogramming. e, Total number of detected bivalent loci as defined by lack of H3K36me3 mark and expression levels below the threshold as shown in panel a. Dark and light green bar graphs highlight proportion shared among all samples and with secondary MEFs, respectively. f, Sequential addition of novel bivalent marks with respect to stages of reprogramming, as indicated by colours. g, h, Corresponding bivalent loci identified in 1B samples and two independent data sets (refs 6, 31). i, Tracking of bivalent loci for Polo et al. reprogramming system (ref. 6). For gene lists related to e, refer to Supplementary Table 2.
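The thresholding and locus definitions in this legend can be sketched as below. The sketch assumes a nearest-rank percentile (one of several common conventions) and boolean mark calls per locus; the function names, category labels and expression values are hypothetical and are not taken from the study.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile of a list of values (pct in 0..100)."""
    if not values:
        raise ValueError("no values supplied")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def classify_locus(k4me3, k27me3, k36me3, log2_fpkm, threshold):
    """Assign a locus to a category using the definitions in the legend."""
    if k4me3 and k27me3 and k36me3:
        return "heterogeneous"  # false bivalent: all three marks present
    if k4me3 and k27me3 and not k36me3 and log2_fpkm < threshold:
        return "bivalent"       # true bivalent: no H3K36me3, not expressed
    if k4me3 and k36me3 and not k27me3:
        return "active"
    return "other"              # any combination not defined in the legend


# Hypothetical log2(FPKM) values of H3K4me3+/H3K36me3+ genes (illustrative only).
active_expression = [0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
threshold = percentile_nearest_rank(active_expression, 10)

calls = [
    classify_locus(True, True, False, -1.0, threshold),
    classify_locus(True, True, True, 3.0, threshold),
    classify_locus(True, False, True, 4.0, threshold),
]
```

Requiring both the absence of H3K36me3 and expression below the threshold separates transcriptionally repressed bivalent loci from loci that appear bivalent only because the population is heterogeneous.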
a, Determination of expression threshold for lncRNA genes using H3K4me3 and H3K36me3 chromatin mark. b, Distribution of the entropy of non-coding gene expression for individual samples (blue) and sample groups (red) indicated as probability density curve. c, Percentage of unannotated transcripts with listed genomic features. d, Analysis of unannotated lncRNA transcripts for coding potential using coding potential calculator (CPC). (See Supplementary Information for details.) e, RNA and protein expression profiles of three novel coding transcripts.
Extended Data Figure 10 Comparison of lncRNA expression in 1B secondary reprogramming and other reprogramming systems.
a, Pearson correlation analysis of differentially expressed unannotated RNA transcripts for 1B reprogramming samples and secondary reprogramming clones that are competent or incompetent to become factor-independent secondary iPSCs (SC and SI clones, respectively; ref. 7). b, Pearson correlation analysis of differentially expressed unannotated RNA transcripts for 1B reprogramming samples and sorted reprogramming intermediates from ref. 8. c, Heat map of differentially expressed novel RNAs from 1B reprogramming samples with secondary reprogramming clones that are competent or incompetent to become factor-independent secondary iPSCs (SC and SI clones, respectively; ref. 7). For gene lists related to c, refer to Supplementary Table 4. d, Read coverage histograms representing gene expression and epigenetic status of unannotated lncRNAs observed in F-class (D16H) and ESC-like state (secondary iPSCs). e, GO analysis results for genes downregulated in F-class state (FDR <1%), but unchanged in ESC-like state, from D8H (combined groups 3, 6 and 9). f, GO analysis results for genes upregulated in ESC-like state (FDR <1%), but unchanged in F-class state, from D8H (combined groups 1b, 4b and 7b). For gene lists, full GO term analyses and P values associated with e, f refer to Supplementary Table 5.
Hussein, S., Puri, M., Tonge, P. et al. Genome-wide characterization of the routes to pluripotency. Nature 516, 198–206 (2014). https://doi.org/10.1038/nature14046