Synthetic recording and in situ readout of lineage information in single cells

Abstract

Reconstructing the lineage relationships and dynamic event histories of individual cells within their native spatial context is a long-standing challenge in biology. Many biological processes of interest occur in optically opaque or physically inaccessible contexts, necessitating approaches other than direct imaging. Here we describe a synthetic system that enables cells to record lineage information and event histories in the genome in a format that can subsequently be read out of single cells in situ. This system, termed memory by engineered mutagenesis with optical in situ readout (MEMOIR), is based on a set of barcoded recording elements termed scratchpads. The state of a given scratchpad can be irreversibly altered by CRISPR/Cas9-based targeted mutagenesis, and later read out in single cells through multiplexed single-molecule RNA fluorescence in situ hybridization (smFISH). As a proof of principle, we engineered mouse embryonic stem cells to contain multiple scratchpads and other recording components. In these cells, scratchpads were altered in a progressive and stochastic fashion as the cells proliferated. Analysis of the final states of scratchpads in single cells in situ enabled reconstruction of lineage information from cell colonies. Combining analysis of endogenous gene expression with lineage reconstruction in the same cells further allowed inference of the dynamic rates at which embryonic stem cells switch between two gene expression states. Finally, using simulations, we show how parallel MEMOIR systems operating in the same cell could enable recording and readout of dynamic cellular event histories. MEMOIR thus provides a versatile platform for information recording and in situ, single-cell readout across diverse biological systems.

Figure 1: The MEMOIR system for recording and in situ readout of cell lineage.
Figure 2: In situ readout of scratchpad state.
Figure 3: MEMOIR enables lineage reconstruction in ES cell colonies.
Figure 4: MEMOIR enables inference of gene expression dynamics and the recording of cellular events.

References

1. Frumkin, D., Wasserstrom, A., Kaplan, S., Feige, U. & Shapiro, E. Genomic variability within an organism exposes its cell lineage tree. PLOS Comput. Biol. 1, e50 (2005)
2. Salipante, S. J. & Horwitz, M. S. Phylogenetic fate mapping. Proc. Natl Acad. Sci. USA 103, 5448–5453 (2006)
3. Behjati, S. et al. Genome sequencing of normal cells reveals developmental lineages and mutational processes. Nature 513, 422–425 (2014)
4. Wasserstrom, A. et al. Reconstruction of cell lineage trees in mice. PLoS One 3, e1939 (2008)
5. Lodato, M. A. et al. Somatic mutation in single human neurons tracks developmental and transcriptional history. Science 350, 94–98 (2015)
6. Evrony, G. D. et al. Cell lineage analysis in human brain using endogenous retroelements. Neuron 85, 49–59 (2015)
7. Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823 (2013)
8. Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823–826 (2013)
9. Jinek, M. et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816–821 (2012)
10. Lubeck, E., Coskun, A. F., Zhiyentayev, T., Ahmad, M. & Cai, L. Single-cell in situ RNA profiling by sequential hybridization. Nat. Methods 11, 360–361 (2014)
11. Shah, S., Lubeck, E., Zhou, W. & Cai, L. In situ transcription profiling of single cells reveals spatial organization of cells in the mouse hippocampus. Neuron 92, 342–357 (2016)
12. Larson, D. R., Zenklusen, D., Wu, B., Chao, J. A. & Singer, R. H. Real-time observation of transcription initiation and elongation on an endogenous yeast gene. Science 332, 475–478 (2011)
13. Ding, S. et al. Efficient transposition of the piggyBac (PB) transposon in mammalian cells and mice. Cell 122, 473–483 (2005)
14. Banaszynski, L. A., Chen, L.-C., Maynard-Smith, L. A., Ooi, A. G. L. & Wandless, T. J. A rapid, reversible, and tunable method to regulate protein function in living cells using synthetic small molecules. Cell 126, 995–1004 (2006)
15. Molenaar, M. et al. XTcf-3 transcription factor mediates beta-catenin-induced axis formation in Xenopus embryos. Cell 86, 391–399 (1996)
16. Raj, A., van den Bogaard, P., Rifkin, S. A., van Oudenaarden, A. & Tyagi, S. Imaging individual mRNA molecules using multiple singly labeled probes. Nat. Methods 5, 877–879 (2008)
17. Lubeck, E. & Cai, L. Single-cell systems biology by super-resolution imaging and combinatorial labeling. Nat. Methods 9, 743–748 (2012)
18. Chapal-Ilani, N. et al. Comparing algorithms that reconstruct cell lineage trees utilizing information on microsatellite mutations. PLOS Comput. Biol. 9, e1003297 (2013)
19. Sokal, R. R. & Michener, C. D. A statistical method for evaluating systematic relationships. Univ. Kans. Sci. Bull. 28, 1409–1438 (1958)
20. McKenna, A. et al. Whole-organism lineage tracing by combinatorial and cumulative genome editing. Science 353, aaf7907 (2016)
21. van den Berg, D. L. et al. Estrogen-related receptor beta interacts with Oct4 to positively regulate Nanog gene expression. Mol. Cell. Biol. 28, 5986–5995 (2008)
22. Kumar, R. M. et al. Deconstructing transcriptional heterogeneity in pluripotent stem cells. Nature 516, 56–61 (2014)
23. Singer, Z. S. et al. Dynamic heterogeneity and DNA methylation in embryonic stem cells. Mol. Cell 55, 319–331 (2014)
24. Hormoz, S. et al. Inferring cell-state transition dynamics from lineage trees and endpoint single-cell measurements. Cell Syst. 3, 419–433 (2016)
25. Hormoz, S., Desprat, N. & Shraiman, B. I. Inferring epigenetic dynamics from kin correlations. Proc. Natl Acad. Sci. USA 112, E2281–E2289 (2015)
26. Bonnet, J., Subsoontorn, P. & Endy, D. Rewritable digital data storage in live cells via engineered control of recombination directionality. Proc. Natl Acad. Sci. USA 109, 8884–8889 (2012)
27. Perli, S. D., Cui, C. H. & Lu, T. K. Continuous genetic recording with self-targeting CRISPR-Cas in human cells. Science 353, aag0511 (2016)
28. Shipman, S. L., Nivala, J., Macklis, J. D. & Church, G. M. Molecular recordings by directed CRISPR spacer acquisition. Science 353, aaf1175 (2016)
29. Hsiao, V., Hori, Y., Rothemund, P. W. K. & Murray, R. M. A population-based temporal logic gate for timing and recording chemical events. Mol. Syst. Biol. 12, 869 (2016)
30. Farzadfard, F. & Lu, T. K. Genomically encoded analog memory with precise in vivo DNA writing in living cell populations. Science 346, 1256272 (2014)
31. Gao, Y. & Zhao, Y. Self-processing of ribozyme-flanked RNAs into guide RNAs in vitro and in vivo for CRISPR-mediated genome editing. J. Integr. Plant Biol. 56, 343–349 (2014)
32. Nissim, L., Perli, S. D., Fridkin, A., Perez-Pinera, P. & Lu, T. K. Multiplexed and programmable regulation of gene networks with an integrated RNA and CRISPR/Cas toolkit in human cells. Mol. Cell 54, 698–710 (2014)
33. Ventura, A. et al. Cre-lox-regulated conditional RNA interference from transgenes. Proc. Natl Acad. Sci. USA 101, 10380–10385 (2004)
34. Saitou, N. & Nei, M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425 (1987)
35. Camin, J. H. & Sokal, R. R. A method for deducing branching sequences in phylogeny. Evolution 19, 311–326 (1965)
36. Chung, J. H., Whiteley, M. & Felsenfeld, G. A 5′ element of the chicken beta-globin domain serves as an insulator in human erythroid cells and protects against position effect in Drosophila. Cell 74, 505–514 (1993)
37. Sharova, L. V. et al. Database for mRNA half-life of 19 977 genes obtained by DNA microarray analysis of pluripotent and differentiating mouse embryonic stem cells. DNA Res. 16, 45–58 (2009)
38. Friedel, C. C., Dölken, L., Ruzsics, Z., Koszinowski, U. H. & Zimmer, R. Conserved principles of mammalian transcriptional regulation revealed by RNA half-life. Nucleic Acids Res. 37, e115 (2009)
39. Clark, M. B. et al. Genome-wide analysis of long noncoding RNA stability. Genome Res. 22, 885–898 (2012)
40. Robinson, D. F. & Foulds, L. R. Comparison of phylogenetic trees. Math. Biosci. 53, 131–147 (1981)
41. Percharde, M. et al. Ncoa3 functions as an essential Esrrb coactivator to sustain embryonic stem cell self-renewal and reprogramming. Genes Dev. 26, 2286–2298 (2012)
42. Uranishi, K., Akagi, T., Sun, C., Koide, H. & Yokota, T. Dax1 associates with Esrrb and regulates its function in embryonic stem cells. Mol. Cell. Biol. 33, 2056–2066 (2013)

Acknowledgements

We thank M. Budd and H. Li for helpful suggestions. We thank R. Kishony and members of the Elowitz and Cai laboratories for discussions and comments on the manuscript. This research was supported by the Allen Distinguished Investigator Program, through The Paul G. Allen Frontiers Group, NIH R01HD075605 and K99GM118910 (to S.H.), the Gordon and Betty Moore Foundation Grant GBMF2809 to the Caltech Programmable Molecular Technology Initiative, and the Beckman Institute pilot program.

Author information

Contributions

K.L.F. and J.M.L. performed the experiments with assistance from S.H., J.C., K.K.C. and Z.S.S.; K.L.F. and S.H. analysed the data; S.H. performed the simulations; M.B.E. and L.C. supervised the project. All authors wrote the manuscript.

Corresponding authors

Correspondence to Michael B. Elowitz or Long Cai.

Ethics declarations

Competing interests

The biotechnology associated with MEMOIR is the subject of a patent application (14/650,133). The authors declare no other competing financial interests.

Additional information

Reviewer Information

Nature thanks A. Raj and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Extended data figures and tables

Extended Data Figure 1 MEM-01 consistently expresses short-lived transcripts from multiple integrated barcoded scratchpads.

a, The barcoded scratchpad transposon is composed of the following elements (left to right): the PiggyBac 5′ terminal repeat (triangle), the chicken HS4 insulator (ref. 36), a PGK promoter driving expression of the hygromycin resistance coding sequence, a 5′ FRT site, the PP7 scratchpad array consisting of 10 repeats, a 3′ FRT site, a barcode sequence (Supplementary Table 1), a priming region for sequencing and PCR, the BGH polyA, and the PiggyBac 3′ terminal repeat (triangle). b, Unique genomic integrations for the MEM-01 cell line were detected by qPCR. Bars show mean ± s.d. of four biological repeats with individual data points marked. c, The relative RNA expression levels of barcode integrations were quantified by RT–qPCR. Bars show mean ± s.d. of three biological repeats with individual data points marked. d, Scratchpad expression profiles remain constant over 1.3 months of passaging. Low- and high-passage cultures of MEM-01 cells (light and dark bars, respectively) were assayed for RNA expression levels by RT–qPCR. The unchanged expression levels indicate that most barcoded scratchpads are expressed at a consistent level and are not routinely silenced over time. Bars show values from single biological samples; error bars were calculated by combining, in quadrature, the technical-replicate variation in the quantitation cycle (Cq) values of the barcode and the normalizer. e–g, RNA half-lives assessed by RT–qPCR analysis of transcript levels after blocking transcription with actinomycin D (10 μg ml−1). e, Barcoded scratchpad transcripts were assayed with two different sets of qPCR primers (left and right panels). These data indicate a half-life of approximately 2 h. f, g, Myc and Sdha, which are known to have short and long mRNA half-lives, respectively, were assessed as controls for comparison (refs 37–39). The Myc half-life (f) of 1 h was shorter than the other measured half-lives, whereas Sdha (g) was longer lived. 
For Sdha, the measured half-life value (indicated with an asterisk) is expected to overestimate the true value, as Sdha levels were determined relative to those of the similarly long-lived gene Atp5e, whose transcript levels were also decaying over the time course. A previous estimate of Sdha half-life in mESCs was 8–13 h (ref. 37). All sample transcript levels were assessed relative to those of Atp5e (refs 37–39). Transcript abundances were normalized to 1 at time zero. Decay curves were fit assuming one-phase exponential decay using weighted nonlinear least-squares regression (e, f) or assuming a linear approximation to exponential decay (g). Half-lives were determined from the best-fit decay constants, with ranges reported as 95% confidence intervals (shown in parentheses). Data represent two biological replicates with multiple technical replicates; error bars show standard deviations.
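The half-life fits described above can be sketched in a few lines. This is a minimal sketch, not the authors' analysis code: it fits a one-phase decay y = exp(−k·t) to transcript levels already normalized to 1 at time zero, using weighted Gauss-Newton least squares on the single decay constant k, and reports t½ = ln 2 / k. Weights `w` would typically be 1/σ² per time point. The function name `fit_one_phase_decay` is hypothetical, and the interval uses a normal approximation (z = 1.96) rather than the t-based interval a statistics package would report.

```python
import math


def fit_one_phase_decay(t, y, w=None, max_iter=100, tol=1e-12):
    """Fit y = exp(-k * t) to data normalized to 1 at t = 0.

    Weighted Gauss-Newton least squares on the single decay constant k.
    Returns (k, half_life, (hl_low, hl_high)); the interval is an approximate
    95% range from a normal approximation (an assumption of this sketch).
    """
    if w is None:
        w = [1.0] * len(t)
    # Starting guess: least-squares slope of ln(y) against t, forced through the origin.
    pts = [(ti, yi) for ti, yi in zip(t, y) if ti > 0 and yi > 0]
    k = -sum(ti * math.log(yi) for ti, yi in pts) / sum(ti * ti for ti, _ in pts)

    def linearize(k):
        model = [math.exp(-k * ti) for ti in t]
        resid = [yi - mi for yi, mi in zip(y, model)]
        jac = [-ti * mi for ti, mi in zip(t, model)]  # d(model)/dk
        return resid, jac

    for _ in range(max_iter):
        resid, jac = linearize(k)
        # Gauss-Newton step: minimizes sum(w * (r - J * step)^2) over step.
        step = (sum(wi * ji * ri for wi, ji, ri in zip(w, jac, resid))
                / sum(wi * ji * ji for wi, ji in zip(w, jac)))
        k += step
        if abs(step) < tol:
            break

    resid, jac = linearize(k)
    dof = max(len(t) - 1, 1)  # one fitted parameter
    s2 = sum(wi * ri * ri for wi, ri in zip(w, resid)) / dof
    se = math.sqrt(s2 / sum(wi * ji * ji for wi, ji in zip(w, jac)))
    half_life = math.log(2) / k
    k_hi, k_lo = k + 1.96 * se, k - 1.96 * se
    hl_low = math.log(2) / k_hi
    hl_high = math.log(2) / k_lo if k_lo > 0 else math.inf
    return k, half_life, (hl_low, hl_high)
```

For a long-lived transcript such as Sdha, where little decay occurs over the time course, the linear approximation y ≈ 1 − k·t amounts to taking k as minus the slope of a straight-line fit.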

Extended Data Figure 2 Barcoded scratchpads collapse to truncated products in activated cells and are stable in full-length and collapsed forms.

a, Agarose gel electrophoresis of PCR-amplified scratchpads reveals scratchpad collapse after gRNA induction. Full-length scratchpads were amplified from plasmid DNA (lane 1), as well as from cells without gRNA constructs (lane 3) or with uninduced gRNAs (lane 4). By contrast, cells expressing gRNA showed shorter products (lane 5). Cells with no scratchpads are also shown as a negative control (lane 2). Bands corresponding to the full-length scratchpad and the collapsed scratchpad are indicated (arrows). Note that the laddering effect seen in all lanes and gels is due in part to PCR amplification artefacts with the repetitive arrays. For gel source data, see Supplementary Fig. 1. b, The lowest-molecular-weight band from scratchpad collapse, as shown in lane 5 in a, was extracted and subcloned into a vector, and nine of the resulting clones were sequenced. They aligned to a single repeat unit with 5′ and 3′ flanking regions, suggesting complete collapse of the repeats owing to Cas9 activity. Six of the nine sequencing reads showed collapse to a perfect single repeat (with a possible point mutation in the scratchpad sequence associated with barcode 2), and the remaining three had additional small deletions in the scratchpad. c, Scratchpad collapse requires induction of both Cas9 and gRNA. The gel shows scratchpad states for MEM-01 cells treated with no ligand, with Shield1 (to stabilize Cas9 protein), with Wnt3a (to induce gRNA expression), or with both Wnt3a (100 ng ml−1) and Shield1 (100 nM), all after 48 h. d, Scratchpad collapse increased with increasing gRNA activation, as assessed by using smFISH to measure scratchpad co-localization with four highly expressed barcodes. Cells were analysed either without gRNA activation or 48 h after gRNA activation by addition of Wnt3a and Shield1 (same concentrations as in c). gRNA expression was measured by the intensity of co-expressed nuclear mTurquoise signal. 
Box plots show median (red bar), first and third quartiles (box), and extrema of distributions; n = 1,826, 1,081, 345, 191 cells, left to right. Related to Fig. 2c. e–g, Scratchpad states remain stable over extended periods. e, Unactivated MEM-01 cells maintained uncollapsed scratchpads over timescales of months. f, To check the stability of individual barcoded scratchpad variants over time, multiple subclones of MEM-01 were isolated without activation (control; top panels) and after a 24-h pulse of activation (Wnt3a 100 ng ml−1, Shield1 100 nM; bottom panels). Subclones were assessed for the states of different barcoded scratchpad types after initial isolation (0 month relative age, left) and after one month of maintenance (right). The apparent collapse states (from uncollapsed to fully collapsed) of the barcoded scratchpad types differed between subclones and remained stable over a month, indicating that scratchpad states are stable over these timescales. g, Barcoded scratchpads are also stable over long periods as assessed by smFISH readout. The fraction per cell of barcode transcripts (from four distinct barcode types) that co-localized with scratchpad signal was essentially unchanged between an unactivated low-passage cell culture and one maintained for over a month. The imperfect co-localization fraction is largely the result of errors in smFISH detection rather than gradual scratchpad collapse. Box plots as in d; n = 1,826, 983 cells, left to right.
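The per-cell co-localization fraction used in d and g can be illustrated with a simple nearest-spot test. This sketch assumes smFISH spots have already been detected and localized, and that a barcode transcript counts as co-localized when any scratchpad spot lies within a fixed radius; the function name and the `radius` parameter are hypothetical, and the published pipeline may define co-localization differently (see Methods).

```python
import math


def colocalized_fraction(barcode_spots, scratchpad_spots, radius):
    """Fraction of barcode smFISH spots lying within `radius` of any scratchpad spot.

    Spots are (x, y) centroids from one cell, in the same units as `radius`.
    Returns None for a cell with no detected barcode spots.
    """
    if not barcode_spots:
        return None
    hits = sum(
        1
        for bx, by in barcode_spots
        if any(math.hypot(bx - sx, by - sy) <= radius for sx, sy in scratchpad_spots)
    )
    return hits / len(barcode_spots)
```

Pooling the spots of the four barcode types for one cell before calling the function gives a single per-cell value of the kind summarized in the box plots.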

Extended Data Figure 3 Scratchpad collapse works with an alternative gRNA, and in multiple cell types.

a–d, A Cre-recombinase-activated gRNA is effective at inducing collapse events. a, Schematic of the Cre-activated gRNA system. The construct contains a constitutive PGK promoter driving expression of a histone 2B (H2B)–mTurquoise fusion protein (the H2B provides nuclear localization). This is followed by a U6 TATA-lox promoter (ref. 33) driving expression of an shRNA against mTurquoise, followed in turn by a polyT (T6) transcriptional stop, and then a gRNA directed against scratchpad regions. Before Cre expression, the shRNA keeps mTurquoise levels low (brown dashed line), and the T6 stop prevents expression of the gRNA. After the introduction of Cre, the shRNA-stop cassette is removed, allowing mTurquoise and gRNA expression. Thus, mTurquoise provides a visual marker of gRNA expression. This type of gRNA architecture could allow MEMOIR activation in specific tissues expressing Cre. b, PCR analysis shows that Cre can induce scratchpad collapse. The gel shows genomic DNA from a clonal cell line harbouring the construct in a. Scratchpads appear uncollapsed in untransfected cells (left lane), but show significant collapse after transfection with mRNA encoding Cre protein (right lane, approximately 52 h after transfection). Note that the laddering effect seen in all lanes and gels is due in part to PCR amplification artefacts with the repetitive arrays. c, smFISH analysis reveals Cre-activated scratchpad collapse. Quantification of barcode–scratchpad co-localization fractions as measured by smFISH. Cre transfection reduced scratchpad and barcode co-localization levels in cells that showed evidence of Cre activity, as assessed by mTurquoise expression (right). Untransfected cells (left) and transfected cells that were mTurquoise-negative or mTurquoise-low (middle) retained high co-localization levels. Co-localization levels per cell were assessed based on the co-localization of four expressed barcodes with scratchpad transcripts. 
Box plots show median (red bar), first and third quartiles (box), and extrema of distributions; n = 995, 643, 649 cells, left to right. d, Example smFISH images of scratchpad and barcode co-localization detected in single cells containing the Cre-activated gRNA. Some activated cells (top panels, mTurquoise expression ‘on’) show loss of co-localized signal for a specific barcode (top panels, lower cell). Unactivated cells, as assessed by low mTurquoise expression, typically show no loss of co-localization (bottom panels). Scale bars, 10 μm. e, f, Scratchpads in CHO-K1 cells and yeast also undergo Cas9/gRNA-dependent collapse. e, Cas9- and gRNA-expressing plasmids were transiently transfected into Chinese hamster ovary (CHO-K1) cells containing stably integrated scratchpads. Gel analysis reveals Cas9- and gRNA-dependent scratchpad collapse (middle lane), whereas transfection with a Cas9-expressing plasmid alone or with control plasmids resulted in no collapse (left and right lanes, respectively). f, Scratchpad collapse was tested in a yeast strain with doxycycline-inducible Cas9 and gRNA and integrated scratchpads. Before induction of Cas9–gRNA expression (lanes 1 and 3), the scratchpads were intact. After Cas9–gRNA induction with 2 μg ml−1 doxycycline for 11 h, scratchpads appeared collapsed (lanes 2 and 4). The left two lanes (lanes 1 and 2) and right two lanes (lanes 3 and 4) correspond to two biological replicates. Note that the scratchpads in CHO-K1 and yeast cells have a PP7 array similar to that used elsewhere but different flanking sequences, so their absolute PCR product lengths differ. For gel source data, see Supplementary Fig. 1.

Extended Data Figure 4 Examples of lineage reconstruction for ten colonies.

Data are shown for ten colonies whose lineage reconstructions correctly identified >70% of pairwise relationships. The bubble chart shows the number of barcode transcripts detected (bubble size) and the uncollapsed fraction (colour scale). Matrices of cell-to-cell barcode distance (dissimilarity) scores were computed from the data; low (blue) values indicate more similar barcoded scratchpad collapse patterns. Note that sisters and cousins tend to have lower distance scores than second cousins, creating a block-diagonal pattern in the distance matrix. Lineage trees were reconstructed from the distance matrix using an agglomerative hierarchical clustering algorithm (see Methods). Cluster distances from the reconstruction algorithm are shown as branch heights in the reconstructed linkage trees. Percentages on the linkage trees represent frequencies of clade occurrence from a barcode-resampling bootstrap. The percentage of pairwise relationships correctly identified by each reconstruction is shown, and the actual tree is reported as [(x y)(x y)][(x y)(x y)], where sister pairs are denoted as (x y) and cousins are grouped in brackets ([...]).
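The distance-matrix-then-cluster procedure can be sketched as follows. The exact dissimilarity score and linkage rule are given in the paper's Methods and are not reproduced here. As an assumption, this sketch scores two cells by the mean absolute difference of their per-barcode uncollapsed fractions (the quantity in the bubble charts), skipping barcodes not detected in both cells, and then applies UPGMA, the average-linkage method of Sokal and Michener (ref. 19). All function names are hypothetical.

```python
def distance_matrix(profiles, missing_distance=1.0):
    """Pairwise dissimilarity between cells.

    profiles: one list per cell of per-barcode uncollapsed fractions (0 to 1),
    with None where a barcode was not detected. The score is the mean absolute
    difference over barcodes detected in both cells (a hypothetical metric).
    """
    n = len(profiles)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = [(a, b) for a, b in zip(profiles[i], profiles[j])
                      if a is not None and b is not None]
            if shared:
                dij = sum(abs(a - b) for a, b in shared) / len(shared)
            else:
                dij = missing_distance
            d[i][j] = d[j][i] = dij
    return d


def upgma(d):
    """Average-linkage (UPGMA) agglomerative clustering.

    Returns the tree as nested 2-tuples of cell indices.
    """
    n = len(d)
    clusters = {i: (i, 1) for i in range(n)}  # cluster id -> (subtree, size)
    dist = {(i, j): d[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        a, b = min(dist, key=dist.get)  # closest pair of clusters (a < b)
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        merged = {}
        for c in clusters:
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            # Size-weighted average of distances to the two merged clusters.
            merged[c] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        dist = {key: v for key, v in dist.items() if a not in key and b not in key}
        for c, v in merged.items():
            dist[(c, next_id)] = v  # existing ids are always < next_id
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

Cells that share early collapse events are merged first, which is why a block-diagonal distance matrix yields a clean tree. Resampling barcode columns with replacement and re-running the clustering would give a clade-frequency bootstrap of the kind shown on the trees.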

Extended Data Figure 5 Analysis of reconstruction failure modes.

These ten colonies showed reconstruction accuracies similar to those of random data. Bubble charts, distance matrices and linkage trees are shown as in Extended Data Fig. 4. Note the relative lack of block-diagonal structure in the distance matrices; in better-reconstructed colonies, such structure reflects close sister and cousin relationships and less similar second cousins. Poor reconstructions result from insufficiently informative or inconsistent collapse patterns, which can occur in several ways. First, colonies may have too many collapsed scratchpads (for example, row 2, column 2), leading to degeneracy that eliminates differences between clades. Second, and more often, colonies have too few collapsed scratchpads (for example, row 3, column 2) to reconstruct the full tree accurately. Third, colonies can provide inconsistent or incomplete lineage information such that the data do not point to one consistent lineage hypothesis (for example, row 5, column 1). Inconsistent information can arise from convergent collapse events, in which the same scratchpad randomly collapses in separate branches of the lineage; such noise is inherent to this method of lineage tracking but can be significantly reduced by increasing the number of barcoded scratchpads. Additionally, variability in scratchpad expression can generate conflicting information, both through stochastic expression of individual barcoded scratchpads and through apparent inconsistencies due to expression from multiple integrations of the same barcoded scratchpad. Despite these issues, colonies can in many cases provide information about some lineage relationships. For example, for the colony in row 5, column 1, all the sister pairs are correctly identified, but they are not definitively placed in the lineage tree owing to conflicting readouts at the cousin level (for example, collapse events in barcodes 9 and 14). 
Similarly, for the colony in row 5, column 2, cells 3 and 4 are readily identified as sisters because of a common collapse event in barcode 9. However, there is little additional information, such as a collapse event at the two-cell stage, that would allow the cousins to be correctly identified. These and other sources of noise affecting colony reconstruction are analysed in more detail in Extended Data Fig. 7 and Supplementary Information, and could be addressed in future implementations of MEMOIR.
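A back-of-the-envelope model helps explain why both too few and too many collapses make clades indistinguishable, and why adding scratchpads helps. Assume, as a simplification not taken from the text, that each of n barcoded scratchpads is independently collapsed with the same probability q at readout in each of two unrelated clades. The chance that the two clades show identical collapse patterns is then:

```latex
P_{\text{identical}} = \left[q^{2} + (1-q)^{2}\right]^{n} = \left[1 - 2q(1-q)\right]^{n},
\qquad
P_{\text{identical}}\big|_{n=8,\,q=1/2} = 2^{-8} \approx 0.004,
\qquad
P_{\text{identical}}\big|_{n=8,\,q=0.1} = 0.82^{8} \approx 0.20
```

The bracketed base equals 1/2 at q = 1/2 and approaches 1 as q → 0 or q → 1, so in this simplified model the risk of degenerate patterns is lowest at intermediate collapse levels, rises in both the too-few and too-many regimes, and falls geometrically as n grows.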

Extended Data Figure 6 Bootstrap reconstruction score enriches for colonies that exhibit more accurate lineage reconstruction.

a, A bootstrap procedure (Methods) was used to determine the robustness of clade reconstruction to resampling of barcode data for each colony. The frequency with which the first-cousin clades were reconstructed was then used to rank all 108 colonies. Colonies with higher reconstruction robustness were enriched for more accurate lineage reconstructions, even though no information about accuracy was used to identify these colonies. The top 20% of colonies by bootstrap score were termed subset 1 (left of blue line; n = 22); reconstructions in this group correctly identified an average of 72% of relationships. The top 40% of colonies were termed subset 2 (left of green line; n = 43) and correctly identified an average of 67% of relationships. The grey region indicates the range of correct relationships expected from random guessing of trees (mean ± s.d. indicated by line and shading). The bootstrap metric effectively filters out colonies that have insufficient or inconsistent scratchpad collapse information and thus do not robustly generate the same reconstruction. Noise sources that affect the data include convergent scratchpad collapse, imperfect collapse rates that may not produce collapse events in every generation, and variable scratchpad expression that limits readout signal or introduces ambiguities due to expression from multiple integrations of the same barcode type (see Extended Data Fig. 7 and Supplementary Information). b, Cumulative distributions show the fraction of pairwise sister, first-cousin and second-cousin relationships correctly identified in each colony. Reconstruction accuracies for all of these relationship types are similar to predictions from the simulated model with eight scratchpads (no noise included), indicating that reconstruction is accurate across all levels of relationship. Related to Fig. 3g.
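Scoring a reconstruction by the fraction of correctly identified pairwise relationships, and the random-guessing baseline in a, can be sketched as follows. Trees are written as nested tuples, so the notation [(0 1)(2 3)][(4 5)(6 7)] used in Extended Data Fig. 4 becomes (((0, 1), (2, 3)), ((4, 5), (6, 7))), and a pair's relationship class is the depth of its lowest common ancestor. The function names are hypothetical, and the baseline assumes that guesses are random labellings of the same full three-generation tree shape.

```python
from collections import Counter
from itertools import combinations


def leaf_paths(tree, path=()):
    """Map each leaf of a nested-tuple tree to its root-to-leaf path of child indices."""
    if not isinstance(tree, tuple):
        return {tree: path}
    paths = {}
    for i, child in enumerate(tree):
        paths.update(leaf_paths(child, path + (i,)))
    return paths


def lca_depth(paths, a, b):
    """Depth of the lowest common ancestor of leaves a and b (root = 0).

    In a full three-generation tree: 2 = sisters, 1 = first cousins, 0 = second cousins.
    """
    depth = 0
    for x, y in zip(paths[a], paths[b]):
        if x != y:
            break
        depth += 1
    return depth


def fraction_correct(true_tree, reconstructed_tree):
    """Fraction of leaf pairs whose relationship class matches between two trees."""
    tp, rp = leaf_paths(true_tree), leaf_paths(reconstructed_tree)
    pairs = list(combinations(sorted(tp), 2))
    matches = sum(lca_depth(tp, a, b) == lca_depth(rp, a, b) for a, b in pairs)
    return matches / len(pairs)


def random_guess_expectation(tree):
    """Expected fraction_correct when leaves are placed on the same tree shape at random."""
    paths = leaf_paths(tree)
    pairs = list(combinations(sorted(paths), 2))
    counts = Counter(lca_depth(paths, a, b) for a, b in pairs)
    return sum((c / len(pairs)) ** 2 for c in counts.values())
```

For a full three-generation tree, the 28 cell pairs comprise 4 sister, 8 first-cousin and 16 second-cousin pairs, so a random labelling scores (4² + 8² + 16²)/28² = 3/7 ≈ 43% on average.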

Extended Data Figure 7 Comprehensive error analysis identifies scratchpad expression variability as the key source of noise in MEMOIR experiments.

a, Overall reconstruction errors result from three types of noise: the inherent stochasticity of recording lineage information through random scratchpad collapse events; recording noise (due to fluctuations in the expression levels of Cas9 and gRNA); and readout noise (due to fluctuations in the expression levels of the barcoded scratchpads, variable expression from multiple integrations of the same barcoded scratchpad species (BC), and the fidelity of smFISH imaging readout). b, Cell–cell variability can be decomposed into intra-colony and inter-colony components, as shown schematically. For each hypothetical colony, the relative amounts of each type of variability are plotted (also schematic). c, Plots show experimentally measured intra- and inter-colony noise in gRNA activity (from the fluorescent signal of the Wnt reporter, left), Cas9 expression (from transcript counts by smFISH, middle), and scratchpad expression (from transcript counts by smFISH, right). These plots represent data from individual cells of all 108 MEM-01 colonies (see Supplementary Information for details). d, Recording noise results in a small decrease in reconstruction accuracy. The plot on the left shows the cumulative distribution of reconstruction accuracies of 500 simulated colonies composed of trees of three generations, with an average scratchpad collapse rate of 0.1 and 13 scratchpads. The heat map on the right shows the average reconstruction accuracy of 500 simulated colonies for a range of average collapse rates and numbers of scratchpads. e, Fluctuations in scratchpad (SP) expression levels substantially reduce reconstruction accuracy. Simulation results are plotted as in d, but with readout noise, rather than recording noise, added to the idealized simulations. 
The readout noise is added as two separate components: scratchpad expression level fluctuations, which significantly increase error, and noise due to smFISH imaging fidelity, which contributes minimally to reconstruction error. The curves are for two integration sites per barcode. f, Cumulative distribution of reconstruction accuracy of 500 simulated colonies with all three components of noise included for different numbers of integration sites per barcode. The thick blue line is the experimental distribution obtained from the 108 MEM-01 colonies. The simulated distribution is consistent with the experimentally observed distribution, especially for two effective integrations per barcode. No fitting parameters were used.
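One standard way to split cell-to-cell variability into the intra-colony and inter-colony components of b and c is the law of total variance: the variance across all cells equals the size-weighted mean of within-colony variances plus the size-weighted variance of colony means around the grand mean. The paper's exact noise definitions are given in its Supplementary Information; this is a sketch under that simpler assumption, and `decompose_variance` is a hypothetical name.

```python
from statistics import fmean, pvariance


def decompose_variance(colonies):
    """Split cell-to-cell variance into intra- and inter-colony parts.

    colonies: list of lists, one list of per-cell measurements per colony.
    Uses population variances weighted by colony size, so that
    total == intra + inter exactly (law of total variance).
    """
    cells = [x for colony in colonies for x in colony]
    n = len(cells)
    grand_mean = fmean(cells)
    # Mean spread of cells around their own colony mean.
    intra = sum(len(col) * pvariance(col) for col in colonies) / n
    # Spread of colony means around the grand mean.
    inter = sum(len(col) * (fmean(col) - grand_mean) ** 2 for col in colonies) / n
    return {"total": pvariance(cells), "intra": intra, "inter": inter}
```

Dividing each component by the squared grand mean would give a dimensionless, CV²-style noise measure that can be compared across the gRNA, Cas9 and scratchpad readouts.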

Extended Data Figure 8 Performance analysis on deeper trees and trees with missing cells.

a, Simulations of reconstruction accuracy of full binary trees for varying numbers of unique barcoded scratchpads, varying collapse rates, and varying numbers of generations (N). The colour of the heat maps corresponds to the fraction of all pairwise lineage relationships correctly identified in the reconstructed tree, averaged over many simulated trees (Fig. 3h in the main text, also see Methods). Even at greater depth (for example, N = 10), trees can be reconstructed accurately with approximately 50 scratchpads. b, The collapse rate that maximizes reconstruction accuracy depends on the number of generations to be tracked, but is only weakly dependent on the number of scratchpads. This is because maximal lineage information is recorded when each scratchpad has a probability of 0.5 of having collapsed by the final time point, regardless of the total number of scratchpads. The plot shows the optimal collapse rate as a function of tree depth, as determined from the simulations (dots) as well as the theoretical expectation of a cumulative collapse probability of 0.5 per scratchpad (dashed line). The theory curve contains no fitting parameters. c, Simulations of reconstruction accuracy for binary trees of three generations as a function of the number of scratchpads and the scratchpad collapse rate for trees with one (left), two (middle), or three (right) randomly chosen endpoint cells missing. Compare with reconstruction accuracy for trees with no missing cells in Fig. 3h. The schematic above each panel shows the topology and branch lengths of trees with the given number of missing cells. A modified neighbour joining algorithm34 was used to exhaustively score all 315 possible reconstructions. 
To distinguish between reconstructions that share a tree topology but differ in branch lengths (two such trees are shown bracketed in the schematic of the middle panel), we modified the reconstruction algorithm to estimate the branch lengths connecting a pair of cells based on the Hamming distance between their barcoded scratchpad collapse patterns (see Supplementary Information). For example, two cells whose collapse patterns differ substantially would be estimated to have a longer lineage distance between them than would cells with more similar patterns. In general, trees with missing leaves can be reconstructed with accuracy similar to that of full binary trees (Fig. 3h). As the number of missing cells increases, the reconstruction accuracy decreases because fewer cells remain in the tree to provide lineage information.
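
The three quantitative ideas in this legend (the optimal collapse rate in panel b, the count of 315 candidate reconstructions, and the Hamming-distance comparison of collapse patterns) can be checked with a short Python sketch. It assumes that every scratchpad collapses independently with the same probability r in each generation, so the cumulative collapse probability after N generations is 1 - (1 - r)^N; setting this equal to 0.5 gives r* = 1 - 2^(-1/N). The figure of 315 coincides with 8!/2^7, the number of distinct ways to place eight labelled cells on a full three-generation binary tree; this is offered as arithmetic context, not as the authors' enumeration procedure. The 0/1 encoding of collapse patterns and all function names are hypothetical, and the mapping from Hamming distance to branch length described in the Supplementary Information is not reproduced here.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple


def optimal_collapse_rate(n_generations: int) -> float:
    """Per-generation collapse probability r that gives a cumulative
    collapse probability of 0.5 after n_generations.

    Assumption: each scratchpad collapses independently with the same
    probability r in every generation, so P(collapsed by N) = 1 - (1 - r)**N.
    """
    if n_generations < 1:
        raise ValueError("need at least one generation")
    return 1.0 - 0.5 ** (1.0 / n_generations)


def balanced_tree_labelings(n_generations: int) -> int:
    """Distinct ways to place 2**N labelled cells on a full binary tree of
    depth N; swapping the two children of any internal node gives the same tree."""
    n_leaves = 2 ** n_generations
    n_internal = n_leaves - 1
    return math.factorial(n_leaves) // 2 ** n_internal


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of scratchpads whose state differs between two cells
    (hypothetical encoding: 1 = collapsed, 0 = intact)."""
    if len(a) != len(b):
        raise ValueError("collapse patterns must cover the same scratchpads")
    return sum(x != y for x, y in zip(a, b))


def pairwise_hamming(patterns: Dict[str, Sequence[int]]) -> Dict[Tuple[str, str], int]:
    """Hamming distance for every unordered pair of cells, keyed by sorted names."""
    return {
        (i, j): hamming(patterns[i], patterns[j])
        for i, j in combinations(sorted(patterns), 2)
    }


if __name__ == "__main__":
    for n in (3, 6, 10):
        print(n, round(optimal_collapse_rate(n), 3))
    print(balanced_tree_labelings(3))  # 315 = 8! / 2**7
    cells = {"c1": (1, 0, 0, 1), "c2": (1, 0, 0, 0), "c3": (0, 1, 1, 0)}
    print(pairwise_hamming(cells))
```

Under the stated assumption, the formula gives r* ≈ 0.067 per generation for N = 10, which illustrates why deeper trees call for slower collapse. In the toy example, c1 and c2 differ at one scratchpad and would be placed closer together than either is to c3.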

Extended Data Figure 9 Simulations show that MEMOIR can operate at low collapse rates to reconstruct sparse trees.

We simulated MEMOIR in the sparse recording regime, in which collapse events for any given lineage occur, on average, once every few generations. Trees were simulated and then reconstructed using a maximum parsimony approach (see Supplementary Information). Experimentally, sparse tree regimes in which collapse events occur infrequently could be achieved with low Cas9 and/or gRNA expression levels or rare expression events (for example, by using weak promoters, occasionally activated promoters, or protein degradation domains), or with decreased Cas9-mediated affinity for target scratchpads (for example, by decreasing the complementarity between the gRNA and its target). a, Cartoon of sparse collapse events on a full binary tree. Each collapse changes the state of each scratchpad (arrays of red or black boxes, shown only at nodes where new collapse events occur). At the final generation, there are five populations of cells with distinct collapse patterns, each shown in a different colour. In the sparse representation of the tree (right), each collapse event corresponds to a new branch, and the five leaves correspond to the five subpopulations of cells with distinct collapse patterns. b, Possible source of reconstruction errors. Unrelated clades can converge independently to the same collapse pattern and thus become indistinguishable, resulting in reconstruction errors (tree on the left), but the probability of such coincidences decreases as the number of scratchpads increases (all clades are distinguishable for the tree on the right). c, A simulated sparse tree with 30 leaves and an average depth of 2.4 ± 1.3. The depth of the tree is defined as the cumulative number of collapse events experienced by each leaf, averaged over all the leaves of the tree. The statistics of this tree shape are approximately equivalent to those of a sparse tree generated by a collapse rate of 0.33 per cell per generation on a full tree of six generations.
The heat map shows the status of the scratchpad sites for all the leaves. Each column corresponds to a particular barcoded scratchpad, and each row to a leaf. d, Same as in c, but for a simulated sparse tree with 100 leaves and a depth of 3.1 ± 1.6, which is approximately equivalent to a collapse rate of 0.275 per cell per generation on a full tree of eight generations. e, The fraction of correctly identified tree partitions (defined using the Robinson–Foulds metric40) is shown as a function of the number of scratchpads, normalized by its value in the limit of an infinite number of distinct scratchpads (where a unique collapse pattern is generated for every collapse event). Sparse trees of three different sizes (that is, different numbers of leaves and depths) were generated. Each dot corresponds to one simulated tree. Tree size was held constant as the number of scratchpads was increased, which required a fixed collapse rate per cell but a collapse rate per scratchpad that scaled inversely with scratchpad number. Trees with fewer leaves and lower depth required fewer scratchpads for accurate reconstruction. Even larger trees, however, could recover close to the maximal lineage information using only a modest number of scratchpads.
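
The sparse regime and the partition-based score in panel e can be illustrated with a minimal Python sketch. It assumes that each cell experiences a collapse event with probability p_cell per generation and that each event hits one uniformly chosen scratchpad, which makes the per-scratchpad rate p_cell / M, in line with the inverse scaling described above; depth is counted here as the number of state-changing events per full-tree leaf, and the number of distinct patterns plays the role of the sparse tree's leaf count. The score is a rooted, clade-based reading of the Robinson–Foulds comparison: the fraction of the reference tree's clades that reappear in the reconstruction. The maximum parsimony reconstruction and the normalization by the infinite-scratchpad limit are not reproduced, and all names are hypothetical.

```python
import random
from typing import FrozenSet, List, Set, Tuple, Union

Tree = Union[str, tuple]  # a leaf label, or a tuple of subtrees


def simulate_sparse_leaves(n_generations: int, n_scratchpads: int, p_cell: float,
                           seed: int = 0) -> Tuple[List[FrozenSet[int]], List[int]]:
    """Collapse patterns and depths for the 2**n_generations leaves of a full tree.

    Assumption: each cell collapses with probability p_cell per generation, and
    each event targets one uniformly chosen scratchpad (rate p_cell / M per
    scratchpad). Hitting an already collapsed scratchpad changes nothing.
    """
    if n_scratchpads < 1:
        raise ValueError("need at least one scratchpad")
    rng = random.Random(seed)
    cells: List[Tuple[FrozenSet[int], int]] = [(frozenset(), 0)]
    for _generation in range(n_generations):
        daughters = []
        for collapsed, depth in cells:
            for _daughter in range(2):  # each division yields two daughters
                pattern, d = collapsed, depth
                if rng.random() < p_cell:
                    target = rng.randrange(n_scratchpads)
                    if target not in pattern:  # count only state-changing events
                        pattern, d = pattern | {target}, depth + 1
                daughters.append((pattern, d))
        cells = daughters
    return [p for p, _ in cells], [d for _, d in cells]


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Non-trivial clades: leaf sets below each internal node except the root."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    found.discard(walk(tree))  # the root clade is shared by every tree
    return found


def fraction_correct_partitions(reference: Tree, reconstructed: Tree) -> float:
    """Share of the reference tree's clades recovered in the reconstruction."""
    ref = clusters(reference)
    if not ref:
        return 1.0
    return len(ref & clusters(reconstructed)) / len(ref)


if __name__ == "__main__":
    patterns, depths = simulate_sparse_leaves(6, 20, 0.33, seed=1)
    print(len(set(patterns)), "distinct collapse patterns among", len(patterns), "leaves")
    print("mean depth:", sum(depths) / len(depths))
    true_tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
    guess = ((("a", "c"), ("b", "d")), (("e", "f"), ("g", "h")))
    print(fraction_correct_partitions(true_tree, guess))  # 4 of 6 clades recovered
```

Representing trees as nested tuples keeps the example dependency-free and also handles the multifurcating nodes that arise in sparse trees.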

Extended Data Figure 10 The Esrrb expression level distribution is stationary.

a, Distribution of the number of Esrrb transcripts in individual cells in populations of MEM-01 ES cells activated by the addition of Wnt3a and Shield1 (same conditions as the colonies analysed in Figs. 3 and 4) for different amounts of time (0, 24, and 48 h from top to bottom). The distribution of Esrrb transcript counts does not change significantly over 48 h of Wnt3a exposure as quantified by the P value of the Kolmogorov–Smirnov (KS) test. The Kolmogorov–Smirnov test was performed for the observed distributions at 24 and 48 h with respect to the reference distribution at 0 h. The cumulative distribution functions (bottom) similarly show that the fraction of cells in the low (or high) Esrrb expression state does not change significantly over 48 h of Wnt3a activation. A stationary Esrrb distribution implies that transitions between the low and high Esrrb expression states must be reversible. b, LIF removal changes the Esrrb distribution. Same as in panel a but with LIF removed from the media at t = 0. The distributions show a significant change during the 48 h period, with the fraction of cells in the low Esrrb expression state increasing over time, as expected41,42.
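
The stationarity check in panel a compares each later distribution with the 0 h reference using a two-sample Kolmogorov–Smirnov test. A minimal, dependency-free Python sketch of that comparison follows. It computes D = max |F_0h(v) − F_t(v)| over the two empirical cumulative distribution functions, stepping over tied values as a block (ties are common for integer transcript counts), and converts D to a p-value with the large-sample Kolmogorov approximation Q(λ) = 2 Σ_{k≥1} (−1)^(k−1) exp(−2k²λ²), with λ = sqrt(nm/(n+m))·D. The legend does not state which implementation the authors used; scipy.stats.ks_2samp is the usual library route and also offers exact p-values for small samples. The counts in the usage example are hypothetical placeholders, not data from this study.

```python
import math
from typing import Sequence


def ks_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_x(v) - F_y(v)|."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        # Advance past every copy of v in both samples before comparing CDFs.
        while i < n and xs[i] == v:
            i += 1
        while j < m and ys[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d: float, n: int, m: int) -> float:
    """Asymptotic two-sided p-value from the Kolmogorov distribution."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return 1.0  # Q(lambda) is indistinguishable from 1 in this range
    q = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return min(1.0, max(0.0, q))


if __name__ == "__main__":
    counts_0h = [3, 5, 8, 40, 52, 61, 4, 47]   # hypothetical Esrrb counts per cell
    counts_48h = [2, 6, 9, 38, 55, 66, 7, 44]  # hypothetical Esrrb counts per cell
    d = ks_statistic(counts_0h, counts_48h)
    print(d, ks_pvalue(d, len(counts_0h), len(counts_48h)))
```

With samples as small as these placeholder lists, the asymptotic p-value is only a rough guide, and an exact computation would be preferable; for populations of hundreds of cells the approximation is usually adequate.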

Supplementary information

Supplementary Information

This file contains Supplementary Methods, describing the distance matrix computation, noise source analysis, sparsely sampled tree simulations, and switching rates inference. It also contains Supplementary Figure 1, the uncropped gels. (PDF 1024 kb)

Supplementary Table 1

A table of sequences for 28 scratchpad barcodes. (XLSX 25 kb)

Supplementary Table 2

A table of oligo sequences for probes used in smFISH and targeting either the PP7 array or one of 28 scratchpad barcodes. (XLSX 49 kb)

Cite this article

Frieda, K., Linton, J., Hormoz, S. et al. Synthetic recording and in situ readout of lineage information in single cells. Nature 541, 107–111 (2017). https://doi.org/10.1038/nature20777
