Article | Published:

Single-cell mapping of lineage and identity in direct reprogramming


Direct lineage reprogramming involves the conversion of cellular identity. Single-cell technologies are useful for deconstructing the considerable heterogeneity that emerges during lineage conversion. However, lineage relationships are typically lost during cell processing, complicating trajectory reconstruction. Here we present ‘CellTagging’, a combinatorial cell-indexing methodology that enables parallel capture of clonal history and cell identity, in which sequential rounds of cell labelling enable the construction of multi-level lineage trees. CellTagging and longitudinal tracking of fibroblast to induced endoderm progenitor reprogramming reveals two distinct trajectories: one leading to successfully reprogrammed cells, and one leading to a ‘dead-end’ state, paths determined in the earliest stages of lineage conversion. We find that expression of a putative methyltransferase, Mettl7a1, is associated with the successful reprogramming trajectory; adding Mettl7a1 to the reprogramming cocktail increases the yield of induced endoderm progenitors. Together, these results demonstrate the utility of our lineage-tracing method for revealing the dynamics of direct reprogramming.


Data availability

All source data, including sequencing reads and single-cell expression matrices, are available from the Gene Expression Omnibus (GEO) under accession code GSE99915.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


References

  1. Vierbuchen, T. & Wernig, M. Direct lineage conversions: unnatural but useful? Nat. Biotechnol. 29, 892–907 (2011).

  2. Cahan, P. et al. CellNet: network biology applied to stem cell engineering. Cell 158, 903–915 (2014).

  3. Morris, S. A. et al. Dissecting engineered cell types and enhancing cell fate conversion via CellNet. Cell 158, 889–902 (2014).

  4. Buganim, Y. et al. Single-cell expression analyses during cellular reprogramming reveal an early stochastic and a late hierarchic phase. Cell 150, 1209–1222 (2012).

  5. Treutlein, B. et al. Dissecting direct reprogramming from fibroblast to neuron using single-cell RNA-seq. Nature 534, 391–395 (2016).

  6. Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495–502 (2015).

  7. Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32, 381–386 (2014).

  8. Rodriguez-Fraticelli, A. E. et al. Clonal analysis of lineage fate in native haematopoiesis. Nature 553, 212–216 (2018).

  9. McKenna, A. et al. Whole-organism lineage tracing by combinatorial and cumulative genome editing. Science 353, aaf7907 (2016).

  10. Porter, S. N., Baker, L. C., Mittelman, D. & Porteus, M. H. Lentiviral and targeted cellular barcoding reveals ongoing clonal dynamics of cell lines in vitro and in vivo. Genome Biol. 15, R75 (2014).

  11. Yao, Z. et al. A single-cell roadmap of lineage bifurcation in human ESC models of embryonic brain development. Cell Stem Cell 20, 120–134 (2017).

  12. Alemany, A., Florescu, M., Baron, C. S., Peterson-Maduro, J. & van Oudenaarden, A. Whole-organism clone tracing using single-cell sequencing. Nature 556, 108–112 (2018).

  13. Spanjaard, B. et al. Simultaneous lineage tracing and cell-type identification using CRISPR–Cas9-induced genetic scars. Nat. Biotechnol. 36, 469–473 (2018).

  14. Raj, B. et al. Simultaneous single-cell profiling of lineages and cell types in the vertebrate brain. Nat. Biotechnol. 36, 442–450 (2018).

  15. Wagner, D. E. et al. Single-cell mapping of gene expression landscapes and lineage in the zebrafish embryo. Science 360, 981–987 (2018).

  16. Sekiya, S. & Suzuki, A. Direct conversion of mouse fibroblasts to hepatocyte-like cells by defined factors. Nature 475, 390–393 (2011).

  17. Macosko, E. Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).

  18. Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).

  19. Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).

  20. Chen, T. et al. m6A RNA methylation is regulated by microRNAs and promotes reprogramming to pluripotency. Cell Stem Cell 16, 289–301 (2015).

  21. Batista, P. J. et al. m6A RNA modification controls cell fate transition in mammalian embryonic stem cells. Cell Stem Cell 15, 707–719 (2014).

  22. Polo, J. M. et al. A molecular roadmap of reprogramming somatic cells into iPS cells. Cell 151, 1617–1632 (2012).

  23. Hanna, J. et al. Direct cell reprogramming is a stochastic process amenable to acceleration. Nature 462, 595–601 (2009).

  24. Guo, S. et al. Nonstochastic reprogramming from a privileged somatic cell state. Cell 156, 649–662 (2014).

  25. Babos, K. N. et al. Balancing dynamic tradeoffs to drive cellular reprogramming. Preprint at (2018).

  26. Rais, Y. et al. Deterministic direct reprogramming of somatic cells to pluripotency. Nature 502, 65–70 (2013).

  27. Di Stefano, B. et al. C/EBPα poises B cells for rapid reprogramming into induced pluripotent stem cells. Nature 506, 235–239 (2014).

  28. Di Stefano, B. et al. C/EBPα creates elite cells for iPSC reprogramming by upregulating Klf4 and increasing the levels of Lsd1 and Brd4. Nat. Cell Biol. 18, 371–381 (2016).

  29. Yunusova, A. M., Fishman, V. S., Vasiliev, G. V. & Battulin, N. R. Deterministic versus stochastic model of reprogramming: new evidence from cellular barcoding technique. Open Biol. 7, (2017).

  30. Schiebinger, G. et al. Reconstruction of developmental landscapes by optimal-transport analysis of single-cell gene expression sheds light on cellular reprogramming. Preprint at (2017).

  31. van Galen, P. et al. The unfolded protein response governs integrity of the haematopoietic stem-cell pool during stress. Nature 510, 268–272 (2014).

  32. Alles, J. et al. Cell fixation and preservation for droplet-based single-cell transcriptomics. BMC Biol. 15, 44 (2017).

  33. McCarthy, D. J., Campbell, K. R., Lun, A. T. L. & Wills, Q. F. Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics 347, btw777 (2017).

  34. Zorita, E., Cuscó, P. & Filion, G. J. Starcode: sequence clustering based on all-pairs search. Bioinformatics 31, 1913–1919 (2015).



We thank members of the Morris laboratory, and T. Druley and R. Mitra for critical discussions; S. McCarroll, E. Macosko and M. Goldman for advice establishing Drop-seq; B. Treutlein for quadratic programming assistance; J. Dick for the gift of the pSMAL backbone; K. Kniepkamp for help with CellTag Viz; and The Genome Technology Access Center in the Department of Genetics. This work was funded by National Institutes of Health (NIH) grants R01-GM126112, R21-HG009750; P30-DK052574; Silicon Valley Community Foundation, Chan Zuckerberg Initiative Grants HCA-A-1704-01646 and HCA2-A-1708-02799; The Children’s Discovery Institute of Washington University and St. Louis Children’s Hospital MI-II-2016-544. S.A.M. is supported by a Vallee Scholar Award; B.A.B.: NIH-T32HG000045-18; C.G.: NIH-5T32GM007200-42; S.E.W.: NIH-5T32GM007067-44; K.K.: Japan Society for the Promotion of Science Postdoctoral Fellowship.

Reviewer information

Nature thanks L. Perié, M. Porteus, L. Vallier and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

B.A.B. and S.A.M. conceived the research. S.A.M. led experimental work, assisted by B.A.B., W.K., C.G., S.E.W. and T.S. B.A.B. and W.K. led computational analysis, assisted by K.K. and supervised by S.A.M. All authors participated in interpretation of data and writing the manuscript.

Competing interests

The authors declare no competing interests.

Correspondence to Samantha A. Morris.

Extended data figures and tables

  1. Extended Data Fig. 1 CellTag processing and species-mixing validations.

    a, Schematic of the CellTag processing and filtering pipeline: CellTag sequences are first extracted from aligned sequencing reads, followed by construction of a matrix of CellTag expression in each cell. To mitigate potential artefacts arising as a result of PCR and sequencing errors, we implemented an error-correction step, collapsing similar barcodes one edit-distance apart, on a cell-by-cell basis. An initial filtering step then removes any CellTags that do not appear on a whitelist of CellTags that are confirmed to exist in the complex lentiviral library. A second filtering step removes cells expressing fewer than two or more than 20 unique CellTags. Using this filtered dataset, Jaccard analysis is then applied (using the R package, Proxy) to identify related cells, based on CellTag signature similarity, allowing clones to be called. b, Generation of the CellTag whitelist. Following CellTag lentiviral plasmid sequencing, CellTags were extracted from the raw fastq files via identification of the adjacent motifs (see Methods, ‘CellTag demultiplexing’). A 90th percentile cut-off in terms of reads reporting each CellTag was used to select CellTags for inclusion on the whitelist. Of a possible 65,536 unique combinations, we detected 19,973 sequences passing this 90th percentile of read counts. Data for CellTag version 1 (CellTagMEF) are shown here. Whitelist creation was also performed for CellTag versions 2 (CellTagD3) and 3 (CellTagD13). c, d, CellTag frequency (c), that is, how many times each CellTag is detected in a population of transduced cells, before (black) and after (red) removal of CellTags that do not feature on the whitelist. This whitelisting predominantly results in the removal of CellTags that appear only once: singletons that are likely to arise owing to sequencing and PCR errors.
This is reflected in the histogram in d, showing that only 60% of singleton CellTags detected are retained, whereas over 90% of CellTags appearing in two or more cells are retained. e, Mean CellTags per cell pre- and post-CellTag pipeline filtering. Cells in this figure correspond to the cells shown in Fig. 1b, c (replicate 1: n = 8,535 cells; replicate 2: n = 11,997 cells). f, Pairwise correlation scores (Jaccard similarity) and hierarchical clustering of 10 major clones arising from this tag and trace experiment. Hierarchical clustering is based on each cell’s Jaccard correlation relationships with other cells, where each defined ‘block’ of cells represents a clone. Left, scoring and clustering of pairwise correlations, before whitelisting and filtering. Right, after whitelisting and filtering, pairwise correlations are stronger and more cells are detected within each clone (n = 869 cells). g, CellTag frequency metric: each detected CellTag appears, on average, in fewer than two cells (n = 9,072 cells in total) at the start of the experiment. The library is therefore not dominated by any abundant CellTags, which would potentially generate false-positive results. h, A species-mixing experiment, consisting of a mixture of human 293T cells and MEFs (left), labelled with ~3–5 CellTags per cell and expressing GFP as a result. A fibroblast (white arrow) is visible within a colony of 293T cells. Scale bar, 50 μm. Seventy-two hours after transduction, cells were collected and processed for Drop-seq. Right, following sequencing and alignment, cells were assigned to their corresponding species, revealing a low rate of doublet formation (n = 4,631 human cells, 312 mouse cells, 36 mixed). i, Mean CellTags per cell for human and mouse cells in the species-mixing experiment. CellTag transcripts were detected in 70% of cells (n = 3,493/4,979 cells).
Of the tagged population, each cell expressed 5 CellTags on average: 3.800 ± 0.002 in human cells, and 5.90 ± 0.02 in mouse cells (mean ± s.e.m.). j, For each cell, CellTag signatures were extracted and Jaccard similarity analysis was performed to assess the frequency of CellTag signature overlap between the two species. To establish a false-positive baseline, we initially compared CellTag overlap between mouse and human populations, as these cells are not related. From the analysis of 4,943 cells, we identified 200 instances of mouse–human cell pairings out of a possible 1.5 × 10^7 pairs sharing the same individual CellTags. This demonstrates that reliance on only one CellTag per cell does not uniquely label cells with high confidence. Excluding cells represented by only one CellTag removes this noise, resulting in no detection of cross-species CellTag signatures (Jaccard similarity index <0.7). This highlights the importance of combinatorial labelling, and the efficacy of our approach in uniquely labelling unrelated cells.

  2. Extended Data Fig. 2 CellTagging does not perturb cell physiology or reprogramming efficiency.

    To assess the potential effect of CellTagging on cell physiology we performed scRNA-seq on CellTag-labelled cells and unlabelled control cells 72 h after tagging. a, Left, fluorescent image of CellTag-labelled, GFP-expressing, pre-B cell line, HAFTL-1. Right, 10x Genomics-based scRNA-seq of CellTag-labelled (n = 3,943 cells) and non-tagged control cells (n = 2,067 cells). Cells were clustered using Seurat, resulting in a t-SNE plot with 6 clusters of transcriptionally distinct cells. CellTag-labelled and control cells were evenly distributed across these populations. b, The CellTag-labelled B-cell population expresses a mean of 3.50 ± 0.02 CellTags per cell. c, We detect no observable differences in numbers of genes or UMIs per cell in either population. d, Average gene expression values between CellTag-labelled and control cells are highly correlated (r = 0.999, Pearson’s correlation), demonstrating that our labelling approach does not induce significant changes in gene expression. These experiments were performed independently twice with similar results. e, To assess the potential effect of CellTagging on reprogramming outcome, we induced lineage conversion (MEF to iEP) of CellTagged cells in parallel with unbarcoded control cells, followed by three weeks of culture and processing on the Drop-seq platform (n = 773 cells passing quality control). A mean of 3.30 ± 0.09 CellTags per cell are expressed in a labelled reprogrammed cell population. f, There are no observable differences in numbers of genes or UMIs per cell in either the labelled or unlabelled populations. g, Average gene expression values between CellTagged and control cells are highly correlated (r = 0.98, Pearson’s correlation), again demonstrating that our labelling approach does not induce significant changes in gene expression. h, Seurat clustering of cells, in which cells in fibroblast (Col1a2-high), transition, and fully reprogrammed (Apoa1-high) states can be identified. 
Right, barcoded and control cells are distributed fairly evenly across these reprogramming stages. Some variation is expected between these independent biological replicates. These experiments were performed independently twice with similar results.

  3. Extended Data Fig. 3 scRNA-seq metrics and quality control of cell clustering.

    a, Numbers of genes and UMIs per cell for 10x Genomics-based (time course 1: n = 30,733 cells; time course 2: n = 54,277 cells) and Drop-seq-based (time course 3: n = 5,932 cells; time course 4: n = 5,414 cells) reprogramming time courses. In these cross-platform comparisons, we apply more stringent filtering of Drop-seq data to include only those cells with 1,000 or more UMIs. For Drop-seq experiments, with a cell capture rate of 5%, 2 × 10^5 MEFs were initially seeded for reprogramming. For 10x Genomics experiments, with a cell encapsulation rate of up to 60%, 5 × 10^4 MEFs were initially seeded for reprogramming. b, Mean numbers of UMIs per cell at each captured time point during reprogramming (5,570.0 ± 2.2), in two independent biological replicates (10x Genomics, time courses 1 and 2): cells were captured at days 3, 6, 9, 12, 15, 21 and 28, along with the initial MEF population (day 0). c, Average gene expression values of 10x Genomics and Drop-seq replicates are highly correlated at day 0, demonstrating technical consistency (r = 0.99 and r = 0.98, respectively, Pearson’s correlation). d, Alignment of independent 10x Genomics replicates (time courses 1 and 2) with Drop-seq replicates (time courses 3 and 4) using canonical correlation analysis (ref. 19). Left, expression of MEF marker Col1a2. Right, iEP marker Apoa1. Overlay of data from these two sources demonstrates a high level of technical and biological consistency between the two technologies. e, Alignment of 10x Genomics replicates (time courses 1 and 2) using canonical correlation analysis. Expression of Col1a2 (left), Apoa1 (right). Integration of these two replicates demonstrates a high level of technical and biological consistency. f, Projections of cell cycle phase and UMIs per cell onto t-SNE alignment of time courses 1 and 2 show that clustering is independent of these factors.
g, Reprogramming factor expression (using detection of bicistronic Hnf4a-T2A-Foxa1 transgene expression) and CellTag expression across time courses 1 and 2.

  4. Extended Data Fig. 4 CellTag expression metrics.

    a, Mean counts of CellTags expressed per cell, following whitelisting and filtering for time course 1 (n = 19,581 cells passing filtering) and 2 (n = 38,943 cells passing filtering), broken down by time point and CellTag version. Red dashed lines denote time of CellTag transduction. b, Mean number of CellTags expressed per cell, post-whitelisting and filtering, for each round of barcoding across time courses 1 and 2. CellTagMEF: 3.40 ± 0.01 CellTags per cell, n = 37,612 cells; CellTagD3: 4.50 ± 0.02 CellTags per cell, n = 32,176 cells; CellTagD13: 3.20 ± 0.02 CellTags per cell, n = 10,212 cells. Sixty-five per cent of sequenced cells pass the ≥2 CellTag expression threshold to support tracking. c, Mean CellTags per cell following whitelisting and filtering for both Drop-seq time courses, broken down by time point. All cells with 200 or more genes were included in this analysis (time course 1: n = 10,038 cells; time course 2: n = 9,839 cells). CellTags were introduced only in MEFs, before reprogramming in these experiments. In Drop-seq time courses, we detected a mean of 7.80 ± 0.07 CellTags per cell, across 61% of cells (12,086/19,877 cells) passing the tracking threshold.

  5. Extended Data Fig. 5 Assignment of cluster identities based on mRNA and protein expression.

    a, Top enriched gene expression associated with each cluster, projected onto the reprogramming t-SNE plot (n = 85,010 cells). b, Left, expression of Col1a2, projected onto the t-SNE plot. Top right, violin plot of Col1a2 expression levels in each cluster. Bottom right, violin plot of Apoa1 expression levels in each cluster, ordered by gain of expression over the course of reprogramming. Clusters are classified as one of four reprogramming stages: fibroblast, clusters 5, 6, 7, 11; early transition, clusters 0, 3; transition, clusters 1, 4, 8, 9, 10, 12; and reprogrammed, cluster 2. Apoa1 is not expressed in the fibroblast clusters. c, Top, expression of the iEP marker (refs. 3, 16) Cdh1 (E-cadherin), projected onto the t-SNE plot, highlighting the location of fully reprogrammed cells. Bottom, staining of CDH1 protein in iEP colonies emerging following three weeks of reprogramming (control shown is from Fig. 4d). Scale bar, 20 μm. d, Top, expression of the novel iEP marker, apolipoprotein A1, Apoa1, projected onto the t-SNE plot. Bottom, immunofluorescence of APOA1 protein in an iEP colony, following three weeks of reprogramming. APOA1 (red) is localized to vesicles. This is a representative image selected from five independent biological replicates. Scale bar, 20 μm. e, Top, co-expression of Apoa1 and Cdh1 at the transcript level within the same individual cells in the fully reprogrammed cluster confirms Apoa1 as a marker of iEP emergence. Bottom, immunofluorescence of APOA1 and CDH1 protein in iEPs. White arrows mark emerging iEP colonies co-expressing both proteins. APOA1 expression (red) is found localized to vesicles of CDH1-positive cells (green), where the most intense CDH1 staining is observed at cell–cell junctions. This is a representative image selected from three independent biological replicates. Scale bar, 20 μm.

  6. Extended Data Fig. 6 Combinatorial CellTag labelling to identify clonally related cells.

    a, Heat map showing scaled expression of individual CellTags in 20 major clones from cells labelled with CellTagD3 (n = 10 representative cells per clone, time courses 1 and 2). The dashed yellow line marks separation between the two time courses. Dashed red lines mark separation between independent clones. Although some CellTags are shared between these independent biological replicates, the combined CellTag signatures are unique. b, Expression levels of individual CellTags per cell over three weeks in a representative clone labelled by four unique CellTags. Expression diminishes over time, but is not completely silenced. c, To assess CellTag silencing, we selected 10 major clones (n = 6,728 cells), defining the intact CellTag signature for each clone at reprogramming day 6. We then assessed loss, or ‘dropout’ of CellTags from each signature over the time course to day 28. By week 4, expression of an individual CellTag is lost in 1 out of 10 cells—that is, expected CellTag expression was not detected in 11 ± 2% of cells. Conversely, CellTag expression is retained in almost 90% of cells by day 28. Later rounds of CellTag labelling (CellTagD13) are less prone to this effect, with CellTags dropping out in only 3.0 ± 1.5% of cells. d, We mapped CellTag expression across four representative clones, in which expression of each CellTag is plotted over time. The y axis denotes the percentage of cells within each clone in which expression of specific CellTags has dropped out. Typically, only one CellTag exhibits dropout, and expression of the other CellTags is maintained. We do not observe complete silencing, that is, loss of expected CellTag expression in 100% of cells. This demonstrates the advantage of our CellTag combinatorial indexing method to reliably label cells and track them over an extended period of time. For example, reliance on the expression of a single, longer barcode would not be effective following integration into a region that later becomes silenced.

  7. Extended Data Fig. 7 Visualizing growth of clones and gene expression correlation within clones.

    a, Connected bar plots showing individual clones as a proportion of all clones at each reprogramming time point for time course 2, for each round of CellTagging (n = 14,088 cells across 1,120 clones). Connected bars denote clonal expansion and growth over time. b, Average number of cells per clone, per time point, for each round of CellTag labelling (time course 2, n = 1,120 clones). c, Number of clones detected at each time point, for each round of CellTagging over reprogramming time courses 1 (n = 1,031 clones) and 2 (n = 1,120 clones). The number of clones detected gradually increases over time as the probability of capture increases with clonal growth. The number of clones then begins to decrease as the growth of some individual clones out-competes other clones, which are lost from the population over time. d, Connected bar plots showing individual clones as a proportion of all clones called at each reprogramming time point for Drop-seq replicate 1 (n = 103 clones) and Drop-seq replicate 2 (n = 37 clones). In replicate 2, a single clone progressively dominates the culture over 10 weeks of growth. In our viral integration analyses (Supplementary Table 5), we detect three viral integration sites in the cells of this clone. We did not detect any differential expression of genes proximal to these integration sites. Similarly, analysis of gene expression enrichment in 12 dominant clones across two biological replicates does not reveal any common signature of these clones to explain their rapid expansion (data not shown). This suggests that the clonal growth we observe is a normal part of the iEP reprogramming process, in which the cells enter a progenitor-like state. Even so, these analyses do not exclude the acquisition of genetic and epigenetic changes endowing these expanding clones with increased fitness. e, Correlation of principal component (PC) scores in clonally related cells (clone 2315, n = 58 cells) relative to a random sampling of cells. 
Correlation between PC scores was used as a proxy for transcriptional similarity between cells. Clonally related cells were much more closely correlated, relative to randomly selected cells. f, Quantification of correlation analysis for all time course 2 clones consisting of 10 cells or more, for CellTagMEF (n = 78 clones, 3,963 cells) and CellTagD3-labelled clones (n = 109 clones, 6,265 cells). Mean correlation scores for clonally related cells are significantly higher than random cell groupings (P < 0.001, t-test, one-sided). We tagged cells both before and after the 72-h reprogramming window, expecting substantial heterogeneity to be introduced by serial viral transduction. On the contrary, there is only a slight but insignificant increase in PC score correlation between CellTagMEF and CellTagD3-labelled, clonally related cells.

  8. Extended Data Fig. 8 Reconstruction and visualization of lineages via force-directed graph drawing.

    a, b, Force-directed graph of all clonally related cells and lineages reconstructed from time course 1 (1,031 clones, 12,932 cells) (a) and time course 2 (1,120 clones, 14,088 cells) (b). All lineages and clone distributions can be interactively explored via our companion website, CellTag Viz. c, In this tree, we follow CellTagMEF clone 487 from time course 1 and its descendants. Each node represents an individual cell, and edges represent clonal relationships between cells. Purple, CellTagMEF clones; blue, CellTagD3 clones; yellow, CellTagD13 clones. In the lineage highlighted in red, we follow the CellTagMEF clone (n = 678 cells), branching into two CellTagD3 lineages (clone 204 (n = 363 cells) and clone 240 (n = 260 cells)). d, Contour plots, representing cell density of each clone, projected onto the t-SNE plot, for the lineage shown in c. Top left, cells belonging to clone 487 (CellTagMEF). Clones 204 and 240 (CellTagD3) descend from this first clone, exhibiting a high degree of overlap within 2D space, on the t-SNE plot. An unrelated CellTagD3 clone, 329 (n = 38 cells), does not overlap with this lineage, demonstrating the high degree of similarity between cells belonging to the same lineage.

  9. Extended Data Fig. 9 Mapping reprogramming trajectories and timing of cell fate decisions.

    a, Projection of all clones (yellow, n = 2,151 clones, 27,020 cells) across reprogramming time courses 1 and 2 (n = 85,010 cells). A subset of clusters with the highest density of detected clones, outlined in red (clusters 0, 1, 2, 4, 8, and 12), were extracted from this larger dataset and re-clustered to generate a higher-resolution t-SNE plot, focusing on reprogramming days 6 to 28 (n = 48,515 cells). b, Left, original cluster identities of all cells (n = 85,010 cells). Right, subset of 48,515 cells, coloured by original cluster identity. c, Contour plots of iEP-depleted clone distribution (top panels; n = 7 clones, 1,037 cells) and iEP-enriched clone distribution (bottom panels; n = 7 clones, 2,270 cells) broken down by reprogramming day, and across days 9–28 (far right). These specific clones were selected from the larger iEP-depleted and iEP-enriched groups, as they included cells distributed across all time points, enabling their trajectories to be defined. In these distributions, clusters 8, 4 and 3 are iEP-depleted, thus representing the dead-end trajectory. Conversely, clusters 2, 6 and 1 are iEP-enriched, representing the reprogramming trajectory. These trajectories divide cluster 0 into two halves, but re-clustering does not increase resolution (data not shown). Deeper sequencing of a larger number of cells may provide further insights into this cluster in future studies. d, Monocle2 pseudotemporal ordering of cells in the subset of cells (n = 48,515 cells), coloured by day of reprogramming (left panel), Seurat cluster ID (middle panel) and Apoa1 expression (right panel). Monocle2 uses dimension reduction to represent each single cell in 2D space and effectively ‘connects the dots’ to construct a reprogramming trajectory. In this analysis, we performed semi-supervised ordering using Col1a2 (marking fibroblast identity) expression as a start point and Apoa1 expression (marking iEP identity) as an endpoint.
The branched trajectory generated by Monocle2 is in general agreement with our clonal analyses. e, Restriction of CellTagD13 clones (time course 1, n = 79 clones, 240 cells; time course 2, n = 30 clones, 148 cells) to either the reprogrammed cluster (cluster 1), or the dead-end cluster (cluster 3) at day 28. Of the clones from these two biological replicates, 88 ± 8% exhibit adherence to one of these trajectories by day 13 of reprogramming. f, We identified lineages in which multiple CellTagD3-labelled clones share a common CellTagD0-labelled ancestor. The proportion of each clone on the reprogramming trajectory (defined as occupancy of clusters 2, 6 and 1 on the t-SNE plot of the subset of clusters), and proportion of each clone on the dead-end trajectory (defined as occupancy of clusters 8, 4 and 3) was calculated. We then plotted the proportion of each CellTagMEF-labelled clone on the reprogramming trajectory against that of its CellTagD3-labelled descendants (r = 0.71, Pearson’s correlation, n = 13 lineages, 57 clones, 6,035 cells).

  10. Extended Data Fig. 10 Mettl7a1 expression is upregulated on the reprogramming trajectory, and promotes iEP generation.

    a, Violin plots of significantly different gene expression between reprogramming and dead-end trajectories (n = 2,074 cells). b, Projection of gene expression onto the t-SNE plot (n = 48,515 cells). Wnt4 and Spint2 expression is significantly upregulated along the reprogramming trajectory (P < 0.001, permutation test, one-sided, n = 1,037 cells). Dlk1 and Peg3 expression is significantly upregulated along the dead-end trajectory (P < 0.001, permutation test, one-sided, n = 1,037 cells). Expression of the Foxa1-Hnf4a transgene is significantly downregulated along the dead-end trajectory (P < 0.001, permutation test, one-sided, n = 1,037 cells). c, Mean numbers of genes and transcripts per cell following 10x Genomics-based scRNA-seq analysis: Foxa1-Hnf4a reprogrammed cells (n = 6,559 cells) and Foxa1-Hnf4a-Mettl7a1 reprogrammed cells (n = 10,161 cells), collected 14 days after initiation of reprogramming. For subsequent analyses, the Foxa1-Hnf4a-Mettl7a1 experimental group was randomly downsampled for direct comparison to the Foxa1-Hnf4a experimental group (n = 6,559 cells for both groups). d, The Foxa1-Hnf4a and Foxa1-Hnf4a-Mettl7a1 scRNA-seq datasets were merged with cells from time course 2, using canonical correlation analysis (ref. 19), to help place these two experimental groups on the previously defined trajectories. Expression levels of Apoa1 are projected onto this t-SNE plot. e, Confirmation of Mettl7a1 expression by qRT–PCR, following transduction of cells with Foxa1-Hnf4a-GFP versus Foxa1-Hnf4a-Mettl7a1 retroviruses (**P = 5.3 × 10^−3, t-test, one-sided). f, Violin plot of mean Apoa1 expression in cells reprogrammed with Foxa1-Hnf4a and Foxa1-Hnf4a-Mettl7a1. Addition of Mettl7a1 to the reprogramming cocktail results in a significant increase in Apoa1 expression, supporting observations that this factor increases the yield of fully reprogrammed cells (P < 0.001, permutation test, one-sided).
g, Plot of identity scores of Foxa1-Hnf4a (purple) and Foxa1-Hnf4a-Mettl7a1 (green) reprogrammed cells. Cells are ordered according to an increase in iEP identity. Red dashed line indicates a cut-off of 0.75; above this score cells are considered as iEPs. Threefold-more Foxa1-Hnf4a-Mettl7a1 cells classify as iEPs, relative to Foxa1-Hnf4a cells, represented as a significant increase in iEP score (P < 0.001, permutation test, one-sided). h, Box plot of mean CellTag expression between Foxa1-Hnf4a (3 ± 0.05 CellTags per cell) and Foxa1-Hnf4a-Mettl7a1 (2.5 ± 0.04 CellTags per cell) experimental groups. The box plots show the median, first and third quantile, and error bar with outliers. i, Box plot of cells per clone for Foxa1-Hnf4a and Foxa1-Hnf4a-Mettl7a1 experimental groups, following data processing via our CellTag demultiplexing and clone calling pipeline. Clone size does not significantly differ between these two groups: Foxa1-Hnf4a, 6.0 ± 0.4 cells per clone (n = 99 clones, 595 cells); Foxa1-Hnf4a-Mettl7a1: 6.30 ± 0.65 cells per clone (n = 43 clones, 277 cells), demonstrating that the addition of Mettl7a1 enhances iEP yield by increasing the number of unique reprogramming events. For comparison, average clone size at ~day 14 for time course replicates 1 and 2 is ~8 cells per clone.
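The legend above relies on three simple operations: random downsampling of the larger group, classification of cells against the 0.75 identity-score cut-off, and a one-sided permutation test. The sketch below illustrates them in plain Python under stated assumptions. The function names (`downsample`, `fraction_above`, `permutation_test_greater`), the choice of the difference in means as the test statistic, the number of permutations and the toy scores are all hypothetical and are not taken from the authors' pipeline.

```python
import random
from statistics import mean


def downsample(cells, n, seed=0):
    """Randomly pick n cells without replacement (e.g. 10,161 -> 6,559)."""
    return random.Random(seed).sample(list(cells), n)


def fraction_above(scores, cutoff=0.75):
    """Fraction of cells whose identity score exceeds the cut-off."""
    return sum(s > cutoff for s in scores) / len(scores)


def permutation_test_greater(a, b, n_perm=2000, seed=0):
    """One-sided permutation test for mean(a) > mean(b).

    Group labels are shuffled n_perm times; the P value is the share of
    shuffles whose difference in means is at least the observed one.
    The statistic (difference in means) is an assumption of this sketch.
    """
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean(pooled[:n_a]) - mean(pooled[n_a:]) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy identity scores (made up for illustration only).
with_factor = [0.9] * 30 + [0.2] * 70
without_factor = [0.9] * 10 + [0.2] * 90

print(fraction_above(with_factor), fraction_above(without_factor))
print(permutation_test_greater(with_factor, without_factor))
```

With real data, the two score lists would be the per-cell iEP identity scores of the downsampled groups, and the ratio of the two `fraction_above` values would give the fold difference in cells classified as iEPs.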

Supplementary information

  1. Reporting Summary

  2. Supplementary Table 1

    This file contains details of each scRNA-seq experiment included in this manuscript, including number of cells passing quality control, mean number of genes, UMIs, and confidently mapped reads per cell. Numbers of CellTagged cells passing filtering are also included.

  3. Supplementary Table 2

    This file contains details of the differential gene expression analysis used to identify signatures of the cell clusters found across time courses 1 and 2 (n = 85,010 cells). Genes significantly enriched over a log-fold change of 0.25 are shown (likelihood-ratio test for single-cell gene expression).

  4. Supplementary Table 3

    This file contains Gene Ontology (GO) terms associated with each cluster from Fig. 1e and Extended Data Fig. 5. During reprogramming, GO terms associated with cell signaling, developmental processes, and cell adhesion are significantly enriched (Fisher's exact test). The Panther Classification System was used to identify GO terms, based on cluster-enriched gene expression (Supplementary Table 2).
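For readers unfamiliar with the enrichment statistic named above, the following is a minimal sketch of a one-sided Fisher's exact test for over-representation, written as a hypergeometric upper tail. It is not the Panther implementation, whose exact settings are not described here; the function name `fisher_enrichment_p` and its parameter names are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    k: cluster-enriched genes annotated with the GO term
    n: cluster-enriched genes in total
    K: background genes annotated with the GO term
    N: background genes in total
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Small worked case: 3 of 4 selected genes carry a term that 4 of 8
# background genes carry, giving P = 17/70.
print(fisher_enrichment_p(3, 4, 4, 8))
```

A small P value indicates that the term appears among the cluster-enriched genes more often than expected by chance; in practice a multiple-testing correction would normally be applied across all GO terms tested.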

  5. Supplementary Table 4

    This file contains metadata for each cell passing quality control, including number of genes, UMIs, and percent of mitochondrial RNA detected per cell (n = 85,010 cells). Information on cell cycle phase, time course and day of collection is also included. Clonal identity information for CellTagMEF, CellTagD3 and CellTagD13 is also contained within this table.
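As an illustration of how the per-cell quality-control metrics listed above are typically derived, the sketch below computes them from a gene-to-UMI-count mapping for a single cell. It is a hedged example rather than the authors' code: the function name `cell_qc_metrics` is hypothetical, and it assumes that mouse mitochondrial genes are identified by the `mt-` symbol prefix.

```python
def cell_qc_metrics(counts, mito_prefix="mt-"):
    """Compute per-cell QC metrics from a {gene: UMI count} mapping.

    Assumes mitochondrial genes share the given symbol prefix
    ("mt-" for mouse gene symbols); this is an assumption of the sketch.
    """
    n_umis = sum(counts.values())
    n_genes = sum(1 for c in counts.values() if c > 0)
    mito_umis = sum(c for g, c in counts.items() if g.startswith(mito_prefix))
    percent_mito = 100.0 * mito_umis / n_umis if n_umis else 0.0
    return {"n_genes": n_genes, "n_umis": n_umis, "percent_mito": percent_mito}


# Toy cell (counts made up for illustration only).
cell = {"Apoa1": 6, "mt-Co1": 2, "Col1a2": 0, "Wnt4": 2}
print(cell_qc_metrics(cell))
```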

  6. Supplementary Table 5

    This file contains CellTag integration sites detected in clone 1, time course 4. Peaks with >10-fold enrichment are shown. Identities of genes and locations of integration sites are shown, along with fold-change expression in clone 1. We did not detect any differential expression of genes proximal to these integration sites.

  7. Supplementary Table 6

    This file contains lineages where multiple CellTagD3-labelled clones share a common CellTagD0-labelled ancestor. The proportions of each clone on reprogramming and dead-end trajectories are shown. Lower: CellTagMEF-labelled cells split and reprogrammed in two biological replicates. Of 84 clone pairs appearing in both replicates, only 4 pairs (4.8%) reprogrammed in both replicates, and not at the same rate (n = 4 pairs, 1,862 cells).

  8. Supplementary Table 7

    This file contains differential gene expression analysis for reprogramming and dead-end trajectories across all time points and in cells collected at reprogramming days 21 and 9 (n = 14 clones, 2,074 cells). Significantly differentially expressed genes with a log-transformed fold-change of >0.5 are shown (likelihood-ratio test for single-cell gene expression).

Fig. 1: CellTagging: clonal tracking applied to reprogramming.
Fig. 2: Tracking clonal dynamics of reprogramming and constructing lineage trees.
Fig. 3: Mapping reprogramming trajectories and timing of cell fate commitment.
Fig. 4: Molecular hallmarks of reprogramming trajectories.
Extended Data Fig. 1: CellTag processing and species-mixing validations.
Extended Data Fig. 2: CellTagging does not perturb cell physiology or reprogramming efficiency.
Extended Data Fig. 3: scRNA-seq metrics and quality control of cell clustering.
Extended Data Fig. 4: CellTag expression metrics.
Extended Data Fig. 5: Assignment of cluster identities based on mRNA and protein expression.
Extended Data Fig. 6: Combinatorial CellTag labelling to identify clonally related cells.
Extended Data Fig. 7: Visualizing growth of clones and gene expression correlation within clones.
Extended Data Fig. 8: Reconstruction and visualization of lineages via force-directed graph drawing.
Extended Data Fig. 9: Mapping reprogramming trajectories and timing of cell fate decisions.
Extended Data Fig. 10: Mettl7a1 expression is upregulated on the reprogramming trajectory, and promotes iEP generation.

