  • Analysis

Ranking reprogramming factors for cell differentiation

Abstract

Transcription factor over-expression is a proven method for reprogramming cells to a desired cell type for regenerative medicine and therapeutic discovery. However, a general method for the identification of reprogramming factors to create an arbitrary cell type is an open problem. Here we examine the success rate of methods and data for differentiation by testing the ability of nine computational methods (CellNet, GarNet, EBseq, AME, DREME, HOMER, KMAC, diffTF and DeepAccess) to discover and rank candidate factors for eight target cell types with known reprogramming solutions. We compare methods that use gene expression, biological networks and chromatin accessibility data, and comprehensively test parameter settings and preprocessing of input data to optimize performance. We find that the best factor identification methods identify an average of 50–60% of reprogramming factors within the top ten candidates, and that methods using chromatin accessibility perform the best. Among the chromatin accessibility methods, the complex methods DeepAccess and diffTF have higher correlation with the ranked significance of transcription factor candidates within reprogramming protocols for differentiation. We provide evidence that AME and diffTF are optimal methods for transcription factor recovery that will allow for systematic prioritization of transcription factor candidates to aid in the design of new reprogramming protocols.
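
The headline measure in the abstract, the share of known reprogramming factors that appear among a method's top ten candidates, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `recall_at_k`, the placeholder factor names and the example ranking are all hypothetical.

```python
from typing import Sequence, Set


def recall_at_k(ranked_candidates: Sequence[str], known_factors: Set[str], k: int = 10) -> float:
    """Fraction of known reprogramming factors found among the top-k candidates."""
    if not known_factors:
        raise ValueError("known_factors must not be empty")
    top_k = set(ranked_candidates[:k])
    return len(top_k & known_factors) / len(known_factors)


# Hypothetical ranking produced by some method (placeholder names only).
ranking = ["TF_A", "TF_B", "TF_C", "TF_D", "TF_E", "TF_F",
           "TF_G", "TF_H", "TF_I", "TF_J", "TF_K", "TF_L"]
known = {"TF_B", "TF_K"}  # hypothetical known reprogramming factors

# TF_B is ranked 2nd (inside the top ten), TF_K is ranked 11th (outside it).
print(recall_at_k(ranking, known, k=10))  # 0.5
```

Averaging this value over target cell types would give a figure comparable in spirit to the 50–60% quoted above, under the assumption that recovery is scored per cell type and then averaged.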

Fig. 1: Identifying transcription factors that reprogram starting cells to target cell types.
Fig. 2: Selection of genomic regions affects traditional DNA sequence-based methods for identification of transcription factors from chromatin accessibility.
Fig. 3: Use of histone mark and EP300 annotation does not significantly affect transcription factor recovery in liver cells.
Fig. 4: Complex chromatin methods are top performers for transcription factor recovery and significance ranking.

Data availability

Normalized area under rank recall curve values for all methods are available in Supplementary Tables 2–6 and for epigenomic marks of liver cells in Source Data Extended Data Fig. 3. Ranks of each reprogramming factor for all methods are available in Supplementary Table 8. The consensus mouse transcription factor motif database derived from the mouse HOCOMOCO v11 database (ref. 50), shared mouse enhancer sequences and a list of mouse transcription factors are available at https://cgs.csail.mit.edu/ReprogrammingRecovery/. Publicly available ATAC-seq and RNA-seq samples were downloaded as fastqs from Nucleotide Read Archive (Supplementary Table 1) and processed as described in the sections on ATAC-seq processing and RNA-seq processing. Uniformly processed gene count and peak files are also available at https://cgs.csail.mit.edu/ReprogrammingRecovery/. Data collection software used were conda/bioconda (v.4.9.0), bedtools (v.2.29.2), Trimgalore (v.21032019), cutadapt (v.0.6.2), samtools (v.1.7), bwa (v.0.7.17), MACS2 (v.2.2.7.1), FASTQC (v.0.11.8), STAR (v.2.5.2b), RSEM (v.1.3.0), R (v.3.6.1), python (v.3.6.9), DeepAccess (v.0.0.1), EBseq (v.1.2.0), CellNet (v.0.1.0), GarNet (v.0.5.0), HOMER (v.4.9.1), AME/DREME/TomTom (v.5.0.5), KMAC (GEM v.3.4), diffTF (v.1.7.1), PWMScan (v.1.1.1), HOCOMOCO (v.11), GENCODE (v.m24) and mouse genome (mm10), and are cited in Supplementary Table 17. Source data are provided with this paper.

Code availability

The custom script for performing motif discovery with AME, DREME, HOMER and KMAC is available at: https://cgs.csail.mit.edu/ReprogrammingRecovery/.

References

  1. Pellegrino, M. et al. RNA-seq following PCR-based sorting reveals rare cell transcriptional signatures. BMC Genomics 17, 361 (2016).

  2. Habib, N. et al. Div-Seq: single-nucleus RNA-seq reveals dynamics of rare adult newborn neurons. Science 353, 925–928 (2016).

  3. Corces, M. R. et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat. Methods 14, 959–962 (2017).

  4. Rai, V. et al. Single-cell ATAC-seq in human pancreatic islets and deep learning upscaling of rare cells reveals cell-specific type 2 diabetes regulatory signatures. Mol. Metab. 32, 109–121 (2020).

  5. Sasagawa, Y. et al. Quartz-Seq: a highly reproducible and sensitive single-cell RNA-seq reveals non-genetic gene expression heterogeneity. Genome Biol. 14, 3097 (2013).

  6. Dixit, A. et al. Perturb-seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell 167, 1853–1866.e17 (2016).

  7. Angermueller, C. et al. Parallel single-cell sequencing links transcriptional and epigenetic heterogeneity. Nat. Methods 13, 229–232 (2016).

  8. Grün, D. et al. Single-cell messenger RNA sequencing reveals rare intestinal cell types. Nature 525, 251–255 (2015).

  9. Pijuan-Sala, B. et al. A single-cell molecular map of mouse gastrulation and early organogenesis. Nature 566, 490–495 (2019).

  10. Lake, B. B. et al. Integrative single-cell analysis of transcriptional and epigenetic states in the human adult brain. Nat. Biotechnol. 36, 70–80 (2018).

  11. Pijuan-Sala, B. et al. Single-cell chromatin accessibility maps reveal regulatory programs driving early mouse organogenesis. Nat. Cell Biol. 22, 487–497 (2020).

  12. Satpathy, A. T. et al. Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion. Nat. Biotechnol. 37, 925–936 (2019).

  13. Wichterle, H., Lieberam, I., Porter, J. A. & Jessell, T. M. Directed differentiation of embryonic stem cells into motor neurons. Cell 110, 385–397 (2002).

  14. Marson, A. et al. Wnt signaling promotes reprogramming of somatic cells to pluripotency. Cell Stem Cell 3, 132–135 (2008).

  15. Ichida, J. K. et al. A small-molecule inhibitor of Tgf-β signaling replaces Sox2 in reprogramming by inducing Nanog. Cell Stem Cell 5, 491–503 (2009).

  16. Oh, Y. & Jang, J. Directed differentiation of pluripotent stem cells by transcription factors. Mol. Cells 42, 200–209 (2019).

  17. Mazzoni, E. O. et al. Synergistic binding of transcription factors to cell-specific enhancers programs motor neuron identity. Nat. Neurosci. 16, 1219–1227 (2013).

  18. Takahashi, K. & Yamanaka, S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126, 663–676 (2006).

  19. Rackham, O. J. L. et al. A predictive computational framework for direct reprogramming between human cell types. Nat. Genet. 48, 331–335 (2016).

  20. Heinäniemi, M. et al. Gene-pair expression signatures reveal lineage control. Nat. Methods 10, 577–583 (2013).

  21. Roost, M. S. et al. KeyGenes, a tool to probe tissue differentiation using a human fetal transcriptional atlas. Stem Cell Rep. 4, 1112–1124 (2015).

  22. Lang, A. H., Li, H., Collins, J. J. & Mehta, P. Epigenetic landscapes explain partially reprogrammed cells and identify key reprogramming genes. PLoS Comput. Biol. 10, e1003734 (2014).

  23. D’Alessio, A. C. et al. A systematic approach to identify candidate transcription factors that control cell identity. Stem Cell Rep. 5, 763–775 (2015).

  24. Sharma, N. et al. The emergence of transcriptional identity in somatosensory neurons. Nature 577, 392–398 (2020).

  25. Morris, S. A. et al. Dissecting engineered cell types and enhancing cell fate conversion via CellNet. Cell 158, 889–902 (2014).

  26. Cahan, P. et al. CellNet: network biology applied to stem cell engineering. Cell 158, 903–915 (2014).

  27. Radley, A. H. et al. Assessment of engineered cells using CellNet and RNA-seq. Nat. Protoc. 12, 1089–1102 (2017).

  28. Bonneau, R. et al. The Inferelator: an algorithm for learning parsimonious regulatory networks from systems-biology data sets de novo. Genome Biol. 7, R36 (2006).

  29. Miraldi, E. R. et al. Leveraging chromatin accessibility for transcriptional regulatory network inference in T Helper 17 Cells. Genome Res. 29, 449–463 (2019).

  30. Tuncbag, N. et al. Network-based interpretation of diverse high-throughput datasets through the omics integrator software package. PLoS Comput. Biol. 12, e1004879 (2016).

  31. Kedaigle, A. J. & Fraenkel, E. in Cancer Systems Biology (ed. Stechow, L.) 13–26 (Springer, 2018).

  32. Leng, N. et al. EBseq: an empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics 29, 1035–1043 (2013).

  33. Machanick, P. & Bailey, T. L. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27, 1696–1697 (2011).

  34. Bailey, T. L. DREME: motif discovery in transcription factor ChIP-seq data. Bioinformatics 27, 1653–1659 (2011).

  35. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).

  36. Guo, Y., Tian, K., Zeng, H., Guo, X. & Gifford, D. K. A novel k-mer set memory (KSM) motif representation improves regulatory variant prediction. Genome Res. 28, 891–900 (2018).

  37. Hammelman, J., Krismer, K., Banerjee, B., Gifford, D. K. & Sherwood, R. I. Identification of determinants of differential chromatin accessibility through a massively parallel genome-integrated reporter assay. Genome Res. 30, 1468–1480 (2020).

  38. Hammelman, J. & Gifford, D. K. Discovering differential genome sequence activity with interpretable and efficient deep learning. PLoS Comput. Biol. 17, e1009282 (2021).

  39. Berest, I. et al. Quantification of differential transcription factor activity and multiomics-based classification into activators and repressors: diffTF. Cell Rep. 29, 3147–3159 (2019).

  40. De Dieuleveult, M. et al. Genome-wide nucleosome specificity and function of chromatin remodellers in ES cells. Nature 530, 113–116 (2016).

  41. Ferrari, F. et al. DOT1L-mediated murine neuronal differentiation associates with H3K79me2 accumulation and preserves SOX2-enhancer accessibility. Nat. Commun. 11, 5200 (2020).

  42. Cheloufi, S. et al. The histone chaperone CAF-1 safeguards somatic cell identity. Nature 528, 218–224 (2015).

  43. Ramachandran, K. et al. Dynamic enhancers control skeletal muscle identity and reprogramming. PLoS Biol. 17, e3000467 (2019).

  44. Quaife-Ryan, G. A. et al. Multicellular transcriptional analysis of mammalian heart regeneration. Circulation 136, 1123–1139 (2017).

  45. Lawlor, N., Youn, A., Kursawe, R., Ucar, D. & Stitzel, M. L. Alpha TC1 and Beta-TC-6 genomic profiling uncovers both shared and distinct transcriptional regulatory features with their primary islet counterparts. Sci. Rep. 7, 11959 (2017).

  46. McClymont, S. A. et al. Parkinson-associated SNCA enhancer variants revealed by open chromatin in mouse dopamine neurons. Am. J. Hum. Genet. 103, 874–892 (2018).

  47. Closser, M. et al. An expansion of the non-coding genome and its regulatory potential underlies vertebrate neuronal diversity. Neuron 110, 70–85.e6 (2022).

  48. Cernilogar, F. M. et al. Pre-marked chromatin and transcription factor co-binding shape the pioneering activity of Foxa2. Nucleic Acids Res. 47, 9069–9086 (2019).

  49. Gupta, S., Stamatoyannopoulos, J. A., Bailey, T. L. & Noble, W. S. Quantifying similarity between motifs. Genome Biol. 8, R24 (2007).

  50. Kulakovskiy, I. V. et al. HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis. Nucleic Acids Res. 46, D252–D259 (2018).

  51. Frey, B. J. & Dueck, D. Clustering by passing messages between data points. Science 315, 972–976 (2007).

  52. Zhang, Y. et al. Model-based analysis of ChIP-seq (MACS). Genome Biol. 9, R137 (2008).

  53. Shen, Y. et al. A map of the cis-regulatory sequences in the mouse genome. Nature 488, 116–120 (2012).

  54. Fu, S. et al. Differential analysis of chromatin accessibility and histone modifications for predicting mouse developmental enhancers. Nucleic Acids Res. 46, 11184–11201 (2018).

  55. Wamstad, J. A., Wang, X., Demuren, O. O. & Boyer, L. A. Distal enhancers: new insights into heart development and disease. Trends Cell Biol. 24, 294–302 (2014).

  56. Soufi, A. et al. Pioneer transcription factors target partial DNA motifs on nucleosomes to initiate reprogramming. Cell 161, 555–568 (2015).

  57. Yamamizu, K. et al. Identification of transcription factors for lineage-specific ESC differentiation. Stem Cell Rep. 1, 545–559 (2013).

  58. Simeonov, K. P. & Uppal, H. Direct reprogramming of human fibroblasts to hepatocyte-like cells by synthetic modified mRNAs. PLoS ONE 9, e100134 (2014).

  59. Bai, F. et al. Directed differentiation of embryonic stem cells into cardiomyocytes by bacterial injection of defined transcription factors. Sci. Rep. 5, 15014 (2015).

  60. Jin, Y. et al. Enhanced differentiation of human pluripotent stem cells into cardiomyocytes by bacteria-mediated transcription factors delivery. PLoS ONE 13, e0194895 (2018).

  61. Pistocchi, A. et al. Conserved and divergent functions of NFIX in skeletal muscle development during vertebrate evolution. Development 140, 1528–1536 (2013).

  62. Messina, G. et al. NFIX regulates fetal-specific transcription in developing skeletal muscle. Cell 140, 554–566 (2010).

  63. De Vas, M. G. et al. Hnf1b controls pancreas morphogenesis and the generation of Ngn3+ endocrine progenitors. Development 142, 871–882 (2015).

  64. Ait-Lounis, A. et al. The transcription factor Rfx3 regulates beta-cell differentiation, function, and glucokinase expression. Diabetes 59, 1674–1685 (2010).

  65. Piccand, J. et al. Rfx6 maintains the functional identity of adult pancreatic β cells. Cell Rep. 9, 2219–2232 (2014).

  66. Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015).

  67. Kelley, D. R., Snoek, J. & Rinn, J. L. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 26, 990–999 (2016).

  68. Kelley, D. R. et al. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Res. 28, 739–750 (2018).

  69. Koo, P. K., Anand, P., Paul, S. B. & Eddy, S. R. Inferring sequence-structure preferences of RNA-binding proteins with convolutional residual networks. Preprint at bioRxiv 418459 (2018).

  70. Avsec, Ž. et al. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat. Genet. 53, 354–366 (2021).

  71. Kim, D. et al. The dynamic, combinatorial cis-regulatory lexicon of epidermal differentiation. Nat. Genet. 53, 1564–1576 (2021).

  72. Kelley, D. R. Cross-species regulatory sequence activity prediction. PLoS Comput. Biol. 16, e1008050 (2020).

  73. Minnoye, L. et al. Cross-species analysis of enhancer logic using deep learning. Genome Res. 30, 1815–1834 (2020).

  74. Jung, S., Appleton, E., Ali, M., Church, G. M. & del Sol, A. A computer-guided design tool to increase the efficiency of cellular conversions. Nat. Commun. 12, 1659 (2021).

  75. Alipanahi, B., Delong, A., Weirauch, M. T. & Frey, B. J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 33, 831–838 (2015).

  76. Liu, Y. et al. CRISPR activation screens systematically identify factors that drive neuronal fate and reprogramming. Cell Stem Cell 23, 758–771 (2018).

  77. Yang, J. et al. Genome-scale CRISPRa screen identifies novel factors for cellular reprogramming. Stem Cell Rep. 12, 757–771 (2019).

  78. Black, J. B. et al. Master regulators and cofactors of human neuronal cell fate specification identified by CRISPR gene activation screens. Cell Rep. 33, 108460 (2020).

  79. Genga, R. M. J. et al. Single-cell RNA-sequencing-based CRISPRi screening resolves molecular drivers of early human endoderm development. Cell Rep. 27, 708–718.e10 (2019).

  80. Ng, A. H. M. et al. A comprehensive library of human transcription factors for cell fate engineering. Nat. Biotechnol. 39, 510–519 (2020).

  81. Nakatake, Y. et al. Generation and profiling of 2,135 human ESC lines for the systematic analyses of cell states perturbed by inducing single transcription factors. Cell Rep. 31, 107655 (2020).

  82. Frankish, A. et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 47, D766–D773 (2019).

  83. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at arXiv:1303.3997 (2013).

  84. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. J. 17, 10–12 (2011).

  85. Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).

  86. Dobin, A. & Gingeras, T. R. Mapping RNA‐seq reads with STAR. Curr. Protoc. Bioinforma. 51, 11–14 (2015).

  87. Lambert, S. A. et al. The human transcription factors. Cell 172, 650–665 (2018).

  88. Ambrosini, G., Groux, R. & Bucher, P. PWMScan: a fast tool for scanning entire genomes with a position-specific weight matrix. Bioinformatics 34, 2483–2484 (2018).

  89. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

  90. Waskom, M. L. Seaborn: statistical data visualization. J. Open Source Softw. 6, 3021 (2021).

  91. Grüning, B. et al. Practical computational reproducibility in the life sciences. Cell Syst. 6, 631–635 (2018).

Acknowledgements

We thank members of the Gifford and Wichterle laboratories for helpful discussions. We gratefully acknowledge funding from 1R01HG008363 (D.G.), 1R01HG008754 (D.G.), 1R01NS109217 (D.G. and H.W.), R01NS116141 (H.W.), NINDS Postdoctoral NRSA Fellowship (F32NS105372) (T.P.), Brain Initiative K99 (1K99NS121136) (T.P.) and National Science Foundation Graduate Research Fellowship (1122374) (J.H.).

Author information

Contributions

Data curation and visualization were carried out by J.H. The original draft was written by J.H. Reviewing and editing of the draft were carried out by J.H., T.P., M.C., H.W. and D.G. D.G. and H.W. supervised the work. Funding was acquired by J.H., T.P., H.W. and D.G.

Corresponding author

Correspondence to David Gifford.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Madhura Mukhopadhyay was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 A consensus database of 107 transcription factor motifs.

a, HOCOMOCO v11 mouse transcription factor core motif database is used as input. Motif PWM similarity to the HOCOMOCO database is computed using Tomtom. b, For each pair of motifs, Pearson correlation between Tomtom scores is computed, resulting in a symmetric correlation matrix. Affinity propagation clustering is applied to the correlation matrix, resulting in 107 clusters of transcription factor motifs with one motif being selected as the representative motif of the cluster. c, Cluster representing OCT/SOX heterodimer-like motifs with SOX2 motif selected as the representative. d, Cluster representing LIM-like motifs with LHX3 motif selected as the representative.
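
The clustering steps in panels a and b can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: the pairwise Tomtom scores are assumed to be already computed and are replaced here by a small synthetic matrix, the helper name `cluster_motifs` is hypothetical, and the use of scikit-learn's `AffinityPropagation` for this step is an assumption.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation


def cluster_motifs(similarity_scores: np.ndarray, random_state: int = 0):
    """Cluster motifs from an (n_motifs x n_motifs) matrix of pairwise similarity scores.

    Row i holds the scores of motif i against every motif in the database.
    """
    # Pearson correlation between the score profiles of every pair of motifs.
    # np.corrcoef treats each row as one variable, so the result is symmetric.
    correlation = np.corrcoef(similarity_scores)
    # Use the correlation matrix directly as the similarity input.
    model = AffinityPropagation(affinity="precomputed", random_state=random_state)
    labels = model.fit_predict(correlation)
    # cluster_centers_indices_ gives the exemplar (representative) motif per cluster.
    return labels, model.cluster_centers_indices_


# Synthetic stand-in for a Tomtom score matrix: six motifs in two families of three.
rng = np.random.default_rng(0)
block = np.array([[10.0, 1.0], [1.0, 10.0]])
scores = np.kron(block, np.ones((3, 3))) + rng.normal(scale=0.5, size=(6, 6))

labels, exemplars = cluster_motifs(scores)
print("cluster labels:", labels)
print("representative motif indices:", exemplars)
```

Affinity propagation chooses the number of clusters itself and returns one exemplar per cluster, which matches the description of one motif being selected as the representative of each cluster.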

Extended Data Fig. 2 Comparing input features and methods for transcription factor recovery from chromatin accessibility data.

a, Effects on reprogramming factor recovery estimated by linear models over the decision axes for inputs to chromatin-based methods, measured as AURC of the top 100 ranked factor motifs. Predictions of stem cell reprogramming factors are excluded so that the effect of using fibroblast or stem cell as the source cell type can be estimated, alongside the effect of selecting cell type-specific regions versus selecting top regions without eliminating regions that are accessible in the source cell type. b, Cell type AURC for top 100 ranked factor motifs stratified by decision axis and marginalized over other axes. Box plots show median and quartile values. Whiskers extend to the rest of the data distribution, except for outliers, which are defined as values lying more than 1.5 times the inter-quartile range beyond the quartiles and are plotted as individual points.
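
One way to estimate the effect of each decision axis with a linear model is to encode every binary choice as a 0/1 indicator and fit ordinary least squares of AURC on those indicators; each fitted coefficient is then the estimated change in AURC from making that choice. The sketch below assumes this setup. The axis names and AURC values are synthetic and hypothetical, and the exact model used in the paper is not reproduced.

```python
import itertools

import numpy as np

# Hypothetical binary decision axes (illustrative names, not from the paper's code).
axes = ["stem_cell_source", "cell_type_specific_regions"]

# All four combinations of the two binary choices, one row per configuration.
design = np.array(list(itertools.product([0, 1], repeat=len(axes))), dtype=float)

# Synthetic AURC values generated from a known baseline and known effects,
# so the fitted coefficients can be checked against the generating values.
baseline = 0.40
true_effects = np.array([0.05, 0.10])
aurc = baseline + design @ true_effects

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(len(design)), design])
coef, *_ = np.linalg.lstsq(X, aurc, rcond=None)

print(f"intercept: {coef[0]:.3f}")
for name, effect in zip(axes, coef[1:]):
    print(f"{name}: estimated effect on AURC = {effect:+.3f}")
```

With real data each configuration would contribute one AURC value per cell type, and the coefficients would be estimates with uncertainty rather than exact recoveries.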

Extended Data Fig. 3 Comparing chromatin accessibility overlapping histone mark and EP300 epigenomic data for transcription factor recovery.

AURC for the top 100 ranked factor motifs in liver, per method, using chromatin accessibility (ATAC-seq) alone and chromatin accessibility overlapping H3K27ac, EP300, H3K4me1, H3K4me3 or all three enhancer markers (EP300, H3K27ac and H3K4me1). For DREME, HOMER and KMAC, performance is worst with ATAC + H3K4me3, a mark correlated with promoter activity. For all methods, performance is similar with ATAC, ATAC + H3K27ac and ATAC + H3K4me1; H3K27ac and H3K4me1 mark enhancers.
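
Restricting accessible regions to those that overlap a histone mark amounts to an interval intersection. The sketch below shows that operation on half-open, BED-style coordinates. The coordinates and the helper name `overlapping_regions` are hypothetical; bedtools is listed among the software used, but whether it performed this exact step is an assumption.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open as in BED


def overlapping_regions(accessible: List[Interval], marked: List[Interval]) -> List[Interval]:
    """Return the accessible regions that overlap at least one marked region."""
    kept = []
    for chrom, start, end in accessible:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(c == chrom and s < end and start < e for c, s, e in marked):
            kept.append((chrom, start, end))
    return kept


# Hypothetical coordinates for illustration only.
atac_peaks = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 50, 150)]
h3k27ac_peaks = [("chr1", 250, 500), ("chr2", 150, 400)]

print(overlapping_regions(atac_peaks, h3k27ac_peaks))  # [('chr1', 100, 300)]
```

The chr2 peak is excluded because half-open intervals that merely touch at position 150 do not overlap. The nested scan is quadratic, which is fine for a sketch; genome-scale peak sets would call for sorted or indexed intervals.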

Extended Data Fig. 4 Fibroblast and stem cell as starting cell type comparing each method.

Chromatin methods use optimal input features for each background. a, Normalized area under the rank recall curve for top 10 ranked motifs averaged over cell types. b, Scatter plot of normalized area under the rank recall curve for fibroblast (x axis) and stem cell (y axis); each dot represents the normalized area under the rank recall curve for top 10 ranked motifs for one cell type and one method, and color represents the method. c, Scatter plot of normalized area under the rank recall curve for fibroblast (x axis) and stem cell (y axis); each dot represents the normalized area under the rank recall curve for top 100 ranked motifs for one cell type and one method, and color represents the method.
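
The normalized area under the rank recall curve is not defined in this section, so the sketch below uses one plausible definition as an assumption: recall at cutoff r is the fraction of known factors ranked at or above r, and the area is the mean recall over cutoffs 1 to k. The paper's exact normalization may differ, and the function name and ranks are hypothetical.

```python
from typing import Dict


def normalized_aurc(factor_ranks: Dict[str, int], k: int) -> float:
    """Mean recall over rank cutoffs 1..k (assumed definition, see lead-in).

    factor_ranks maps each known reprogramming factor to its 1-based rank.
    """
    if k < 1 or not factor_ranks:
        raise ValueError("k must be >= 1 and factor_ranks must not be empty")
    n_factors = len(factor_ranks)
    area = 0.0
    for cutoff in range(1, k + 1):
        recovered = sum(1 for rank in factor_ranks.values() if rank <= cutoff)
        area += recovered / n_factors
    # Dividing by k bounds the value between 0 and 1.
    return area / k


ranks = {"TF_A": 1, "TF_B": 5, "TF_C": 40}  # hypothetical ranks
print(normalized_aurc(ranks, k=10))
print(normalized_aurc(ranks, k=100))
```

Under this definition a factor ranked 40th contributes nothing to the top 10 score but does raise the top 100 score, which is why the two cutoffs in panels b and c can order methods differently.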

Extended Data Fig. 5 GarNet distance thresholds do not majorly impact performance for transcription factor recovery.

Fraction of reprogramming factors recovered by GarNet over eight target cell types with 2 kb, 10 kb and 100 kb thresholds for the maximum distance between a transcription factor binding site and a gene transcription start site.
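
The role of the distance threshold can be illustrated with a small sketch that links binding sites to genes whose transcription start site (TSS) lies within a maximum distance. This does not use or reproduce GarNet's own API; the function name, transcription factor names, gene names and positions are all hypothetical.

```python
from typing import List, Tuple

Site = Tuple[str, str, int]  # (transcription factor, chromosome, binding-site position)
Tss = Tuple[str, str, int]   # (gene, chromosome, TSS position)


def link_sites_to_genes(sites: List[Site], tss_list: List[Tss], max_distance: int) -> List[Tuple[str, str]]:
    """Pair each binding site with every gene whose TSS is within max_distance bp."""
    links = []
    for tf, site_chrom, site_pos in sites:
        for gene, tss_chrom, tss_pos in tss_list:
            if site_chrom == tss_chrom and abs(site_pos - tss_pos) <= max_distance:
                links.append((tf, gene))
    return links


# Hypothetical positions for illustration only.
sites = [("TF_A", "chr1", 10_500), ("TF_B", "chr1", 58_000), ("TF_C", "chr1", 140_000)]
genes = [("Gene1", "chr1", 10_000), ("Gene2", "chr1", 50_000)]

# The three thresholds compared in the figure: 2 kb, 10 kb and 100 kb.
for threshold in (2_000, 10_000, 100_000):
    print(threshold, link_sites_to_genes(sites, genes, threshold))
```

Larger thresholds add more distal site-to-gene links (one, two and five links for these toy positions), so the finding that recovery changes little across thresholds suggests that the extra distal links add limited information for this task.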

Extended Data Fig. 6 Deciding input and methods for ranking reprogramming transcription factors.

Decision chart for performing optimal reprogramming factor recovery given chromatin accessibility data for a desired target cell type.

Supplementary information

Supplementary Information

Supplementary table legends.

Reporting Summary

Supplementary Tables

Supplementary Tables 1–17.

Source data

Source Data Fig. 2

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Extended Data Fig. 1

Statistical source data.

Source Data Extended Data Fig. 2

Statistical source data.

Source Data Extended Data Fig. 3

Statistical source data.

Source Data Extended Data Fig. 4

Statistical source data.

Source Data Extended Data Fig. 5

Statistical source data.

About this article

Cite this article

Hammelman, J., Patel, T., Closser, M. et al. Ranking reprogramming factors for cell differentiation. Nat Methods 19, 812–822 (2022). https://doi.org/10.1038/s41592-022-01522-2

  • DOI: https://doi.org/10.1038/s41592-022-01522-2
