• A Corrigendum to this article was published on 17 June 2015

This article has been updated


Somatic cell reprogramming to a pluripotent state continues to challenge many of our assumptions about cellular specification, and despite major efforts, we lack a complete molecular characterization of the reprogramming process. To address this gap in knowledge, we generated extensive transcriptomic, epigenomic and proteomic data sets describing the reprogramming routes leading from mouse embryonic fibroblasts to induced pluripotency. Through integrative analysis, we reveal that cells transition through distinct gene expression and epigenetic signatures and bifurcate towards reprogramming transgene-dependent and -independent stable pluripotent states. Early transcriptional events, driven by high levels of reprogramming transcription factor expression, are associated with widespread loss of histone H3 lysine 27 trimethylation (H3K27me3), representing a general opening of the chromatin state. Maintenance of high transgene levels leads to re-acquisition of H3K27me3 and a stable pluripotent state that is alternative to the embryonic stem cell (ESC)-like fate. Lowering transgene levels at an intermediate phase, however, guides the process to the acquisition of an ESC-like chromatin and DNA methylation signature. Our data provide a comprehensive molecular description of the reprogramming routes and are accessible through the Project Grandiose portal at http://www.stemformatics.org.


Change history

  • 10 December 2014

    A minor addition was made to the Acknowledgements in the HTML and PDF versions.


Primary accessions

European Nucleotide Archive

Sequence Read Archive

Data deposits

Sequencing data have been deposited in the NCBI Sequence Read Archive (SRA) under accession number SRP046744 for all RNA-seq and ChIP-seq experiments, and in the European Bioinformatics Institute under the European Nucleotide Archive (ENA) accession number ERP004116 for MethylC-sequencing. The global and cell surface mass spectrometry proteomics raw data have been deposited in the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository under data set identifiers PXD000413 and PXD001456, respectively.
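The deposits listed above can be collected into a small lookup table when scripting against these repositories. The following is a minimal sketch: the accession numbers are taken from the statement above, while the dictionary layout, the assay labels used as keys, and the helper name `accessions_by_repository` are illustrative assumptions rather than anything defined by the article or the repositories.

```python
# Accession identifiers copied from the "Data deposits" statement.
# The dictionary layout, key labels and helper name are illustrative
# assumptions, not part of the published article.
DEPOSITS = {
    "RNA-seq": ("NCBI Sequence Read Archive", "SRP046744"),
    "ChIP-seq": ("NCBI Sequence Read Archive", "SRP046744"),
    "MethylC-seq": ("European Nucleotide Archive", "ERP004116"),
    "global proteome": ("ProteomeXchange (PRIDE)", "PXD000413"),
    "cell surface proteome": ("ProteomeXchange (PRIDE)", "PXD001456"),
}


def accessions_by_repository(deposits):
    """Group accession numbers by repository, listing each accession once."""
    grouped = {}
    for repository, accession in deposits.values():
        bucket = grouped.setdefault(repository, [])
        if accession not in bucket:
            bucket.append(accession)
    return grouped


if __name__ == "__main__":
    for repository, accessions in accessions_by_repository(DEPOSITS).items():
        print(repository, ", ".join(accessions))
```

Grouping by repository keeps the shared SRA accession for the RNA-seq and ChIP-seq experiments from being listed twice.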




We thank M. Gertsenstein and M. Pereira for chimaera production, C. Monetti for cell culture, R. Cowling for DNA purification, and K. Harpal for chimaera embryo sectioning and staining. We acknowledge the intellectual contributions of P. P. L. Tam and R. P. Harvey. A.N. is Tier 1 Canada Research Chair in Stem Cells and Regeneration. This work was supported by grants awarded to A.N., I.M.R. and P.W.Z. from the Ontario Research Fund Global Leadership Round in Genomics and Life Sciences grants (GL2-01-028), to A.N. from the Canadian stem cell network (9/5254 (TR3)) and from the Canadian Institutes of Health Research (CIHR MOP102575). This work received support from the Korean Ministry of Knowledge Economy (grant 10037410 to J.-S.S.), from the SNUCM Research Fund (grant 0411-20100074 to J.-S.S.), and from Macrogen Inc. (grant MGR03-11 and MGR03-12). The Stemformatics resource is supported by an Australian Research Council special research grant to Stem Cells Australia (C.A.W. and S.M.G.). The analysis of the miRNA was supported by grants from the National Health and Medical Research Council of Australia (1024852 to J.L.C. and T.P.) and the Australian Research Council (DP1300101928 to T.P.). W.R. is a Cancer Institute of NSW Fellow and with J.E.J.R. receives support from the Cancer Council of NSW and National Health & Medical Research Council (571156 and 1061906). J.E.J.R. receives funding from Cure the Future & Tour de Cure. K.-A.L.C. is supported, in part, by the Wound Management Innovation CRC (established and supported under the Australian Government’s Cooperative Research Centres Program). S.M.G. received support from the Australian Research Council (SR110001002). C.A.W. is a QLD Smart Futures Fellow. M.B., J.M. and A.J.R.H. are supported by the Netherlands Proteomics Centre, and by the European Community’s Seventh Framework Programme (FP7/2007-2013) by the PRIME-XS project grant agreement number 262067. P.W.Z. is the Canada Research Chair in Stem Cell Bioengineering. S.M.I.H. received a fellowship from the McEwen Centre of Regenerative Medicine.

Author information

Author notes

    • Samer M. I. Hussein, Mira C. Puri & Peter D. Tonge

    These authors contributed equally to this work.

    • Javier Munoz & Kim-Anh Lê Cao

    Present addresses: Proteomics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain (J.M.); The University of Queensland Diamantina Institute, Translational Research Institute, 37 Kent Street, Princess Alexandra Hospital, Brisbane, Queensland 4102, Australia (K.-A.L.C.).


  1. Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada

    • Samer M. I. Hussein, Mira C. Puri, Peter D. Tonge, Andrew J. Corso, Mira Li, Ian M. Rogers & Andras Nagy
  2. Department of Medical Biophysics, University of Toronto, Toronto, Ontario M5T 3H7, Canada

    • Mira C. Puri
  3. Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands

    • Marco Benevento, Javier Munoz & Albert J. R. Heck
  4. Netherlands Proteomics Centre, Padualaan 8, 3584CH Utrecht, The Netherlands

    • Marco Benevento, Javier Munoz & Albert J. R. Heck
  5. Institute of Medical Science, University of Toronto, Toronto, Ontario M5T 3H7, Canada

    • Andrew J. Corso & Andras Nagy
  6. Genome Biology Department, The John Curtin School of Medical Research, The Australian National University, Acton (Canberra), ACT 2601, Australia

    • Jennifer L. Clancy, Hardip R. Patel & Thomas Preiss
  7. Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, Queensland 4072, Australia

    • Rowland Mosbergen, Othmar Korn & Christine A. Wells
  8. Genomic Medicine Institute, Medical Research Center, Seoul National University, Seoul 110-799, South Korea

    • Dong-Sung Lee, Jong-Yeon Shin, Jong-Il Kim & Jeong-Sun Seo
  9. Department of Biomedical Sciences and Biochemistry, Seoul National University College of Medicine, Seoul 110-799, South Korea

    • Dong-Sung Lee, Jong-Il Kim & Jeong-Sun Seo
  10. Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, The University of Queensland, St Lucia, Queensland 4072, Australia

    • Nicole Cloonan, David L. A. Wood, Maely E. Gauthier, Kim-Anh Lê Cao & Sean M. Grimmond
  11. Gene and Stem Cell Therapy Program and Bioinformatics Lab, Centenary Institute, Camperdown 2050, NSW, Australia & Sydney Medical School, 31 University of Sydney 2006, New South Wales, Australia

    • Robert Middleton, William Ritchie & John E. J. Rasko
  12. Genome Discovery Unit, The John Curtin School of Medical Research, The Australian National University, Acton (Canberra) 2601, ACT, Australia

    • Hardip R. Patel
  13. Institute of Biomaterials and Biomedical Engineering (IBBME), University of Toronto, Toronto M5S-3G9, Canada

    • Carl A. White, Nika Shakiba & Peter W. Zandstra
  14. The Donnelly Centre for Cellular and Biomolecular Research (CCBR), University of Toronto, Toronto M5S 3E1, Canada

    • Carl A. White & Peter W. Zandstra
  15. Life Science Institute, Macrogen Inc., Seoul 153-781, South Korea

    • Jong-Yeon Shin & Jeong-Sun Seo
  16. Department of Systems & Computational Biology, Albert Einstein College of Medicine of Yeshiva University, Bronx, New York 10461, USA

    • Jessica C. Mar
  17. Cell and Molecular Therapies, Royal Prince Alfred Hospital, Camperdown 2050, New South Wales, Australia

    • John E. J. Rasko
  18. College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G12 8TA, UK

    • Christine A. Wells
  19. Victor Chang Cardiac Research Institute, Darlinghurst (Sydney), New South Wales 2010, Australia

    • Thomas Preiss
  20. Department of Physiology, University of Toronto, Toronto, Ontario M5S 1A8, Canada

    • Ian M. Rogers
  21. Department of Obstetrics and Gynaecology, University of Toronto, Toronto, Ontario M5S 1E2, Canada

    • Ian M. Rogers & Andras Nagy
  22. QIMR Berghofer Medical Research Institute, Genomic Biology Lab, 300 Herston Road, Herston, Queensland 4006, Australia

    • Nicole Cloonan




S.M.I.H., M.C.P., P.D.T. and A.N. conceived, designed and carried out most of the experiments, interpreted results and wrote the manuscript. P.W.Z. contributed to study design. T.P., C. A. Wells, I.M.R., P.W.Z., C. A. White, N.S., A.J.C. and J.C.M. assisted with data interpretation and manuscript writing. M.L., S.M.I.H. and M.C.P. performed ChIP. M.C.P., S.M.I.H., N.C., O.K., D.L.A.W., M.E.G. and S.M.G. produced and analysed RNA-seq data. S.M.I.H., D.-S.L., M.C.P., J.-Y.S., J.-I.K. and J.-S.S. produced and analysed MethylC-seq and ChIP-seq data. J.E.J.R., W.R. and R.Mi. performed the IR analysis and interpretation, and contributed to the manuscript writing. C. A. Wells, R.Mo., O.K., K.-A.L.C. and J.C.M. provided support for bioinformatics analyses and data visualization. M.B., J.M. and A.J.R.H. performed the LC-MS analysis and proteomic data analysis. H.R.P. mapped the miRNA Next Generation Sequencing (NGS) data and provided support for bioinformatics analyses and data visualization. J.L.C. and T.P. analysed and interpreted the miRNA NGS data. C.A.W. performed the CSC proteomics. C.A.W., N.S. and P.W.Z. analysed CSC proteome data.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Andras Nagy.

Supplementary information

Zip files

  1.

    Supplementary Data

    This zipped file contains Supplementary Tables 1-6.
