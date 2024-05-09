Chromatin accessibility enables single-cell age estimation

Although the molecular mechanism that generates age-associated DNAm changes is unclear, the methylation state of ClockDML might be affected by chromatin accessibility. Conversely, the methylation state of ClockDML might regulate chromatin accessibility. In either case, the chromatin accessibility on ClockDML could be used to deduce cell age (Fig. 1a). However, the dynamics of chromatin accessibility on ClockDML during aging are currently unknown.

Fig. 1: ChrAcc change associated with irreversible DNAm drift on ClockDML enables cell age estimation. a, Schematic diagram of the underlying epigenetic mechanism of cell mitotic age tracing using ChrAcc on ClockDML. b, Correlation between the DNAm level on G8-group ClockDML and sample age in human PBMCs. The 95% CI is shown as a gray area around the linear regression line. R = −0.978, P < 2.2 × 10⁻¹⁶. c, Enrichment of mitosis-associated ClockDML (Mitosis, size = 1,934 bp), actual age-associated ClockDML (Chronology, size = 58,7801 bp) and solo-WCGW loci (size = 5 Mbp) in each class of ATAC peaks (size = 281 Mbp (TCGA Pan-cancer); 164 Mbp (bladder cancer); 462 Mbp (normal cell); 218 Mbp (placenta); and 295 Mbp (hematopoietic cell)). A two-sided Fisher’s exact test was performed for the expected against observed overlapping region size. Points show the resulting odds ratio of observed over expected size, and whiskers show the 95% CI. d, Overview of the EpiTrace algorithm. e, UMAP of the human early embryonic development scATAC dataset. Color indicates the developmental stage of the human embryo. f, Total chromatin accessibility on ClockDML (ClockAcc), HMM-smoothed ClockAcc, and initial and iterative EpiTrace ranking results (EpiTrace age) for the embryonic dataset. Numbers of biologically independent samples: n = 1 (Oocyte); 1 (Sperm); 2 (Zygote); 5 (two-cell); 1 (four-cell); 6 (eight-cell); 2 (Morula); 5 (ICM); 4 (Naive hESC); 7 (Primed hESC); 2 (TE); and 3 (Differentiated trophoblast). The lower and upper bounds of boxes show the 25th and 75th percentiles of the data, and the horizontal line in each box shows the median. Whiskers extend to the farthest data points within 1.5 × IQR of the box bounds. Correlation R and P value: Pearson’s. Tiny P values resulting in numerical underflow are shown as ‘<2.2 × 10⁻¹⁶’. tropho., trophoblast.

Literature-documented ClockDML18,25,26,38 were identified mainly with methylation-specific microarrays and thus represent only a tiny fraction of possible genome-wide DNAm variation during aging. Moreover, ATAC-seq data, and scATAC-seq data even more so, are too sparse to fully cover these loci. To enable accurate tracking of cell mitotic age by ATAC signal, we determined 126,420 ClockDML in the human genome by bisulfite capture sequencing of CpG island regions in peripheral blood mononuclear cells (PBMCs) from a panel of donors of different ages and computing the correlation coefficient between age and the methylation level (beta) at each locus (Supplementary Table 1: ClockDML). The DNAm status of these ClockDML correlated closely with age in the training cohort (Fig. 1b). Both the general linear model (GLM) and the probability model (TimeSeq)31 built on the beta values of our ClockDML predicted donor age with good precision in an additional validation cohort (R = 0.85 (GLM) and 0.7998 (TimeSeq); Supplementary Fig. 1a–c), indicating that the DNAm status of these loci drifts stably with age.
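The locus-selection step above amounts to a per-locus correlation scan between methylation level and donor age. The sketch below is a minimal, dependency-free illustration assuming a table of beta values (one list per locus, ordered like the donor ages). The helper names, the toy loci and the |r| ≥ 0.5 cutoff are hypothetical, and the study's actual selection criteria (for example, significance testing or multiple-testing control) are not reproduced here.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def call_clock_loci(
    beta: Dict[str, List[float]], ages: List[float], min_abs_r: float = 0.5
) -> Dict[str, float]:
    """Keep loci whose beta-vs-age correlation satisfies |r| >= min_abs_r.

    min_abs_r is a hypothetical cutoff chosen for illustration only.
    """
    selected = {}
    for locus, values in beta.items():
        r = pearson(values, ages)
        if abs(r) >= min_abs_r:
            selected[locus] = r  # sign of r: gain (+) or loss (-) of methylation with age
    return selected


# Toy data: beta values of three loci across five donors of known age.
ages = [20, 35, 50, 65, 80]
beta = {
    "chr1:1000": [0.80, 0.72, 0.61, 0.55, 0.43],  # loses methylation with age
    "chr2:5000": [0.30, 0.31, 0.29, 0.30, 0.31],  # essentially flat
    "chr3:700": [0.10, 0.18, 0.25, 0.33, 0.41],   # gains methylation with age
}
clock = call_clock_loci(beta, ages)
print(sorted(clock))
```

Keeping the signed r, rather than only |r|, records whether each locus gains or loses methylation with age.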

Functional annotation suggests that ClockDML are enriched in open, accessible chromatin regions of the genome across different cell types and organs39,40,41 (Fig. 1c). Mirroring their DNAm status, the fraction of opened ClockDML (hereafter, ClockAcc) shifts with cell aging (Supplementary Notes and Supplementary Figs. 2–5; GSE74912 (ref. 42), GSE89895 (ref. 43) and GSE179606 (ref. 44)). We established an algorithm, called EpiTrace, that predicts sample age by counting the fraction of opened ClockDML in bulk ATAC-seq datasets (Supplementary Notes). Validation experiments using bulk ATAC-seq of FACS-sorted blood cells, induced pluripotent stem cell (iPSC) induction experiments and native immune cells showed that EpiTrace accurately predicted sample age in concordance with known developmental trajectories (Supplementary Notes and Supplementary Figs. 2–5). We then adapted the algorithm to scATAC-seq data. In brief, a reference ClockDML set is provided to the algorithm, and ClockAcc, the total chromatin accessibility on this reference set, is measured for each cell. To reduce noise in the single-cell measurement, ClockAcc is smoothed with a hidden Markov model (HMM)-mediated diffusion approach that borrows information from similar cells: cells are clustered by correlation of the top variable ATAC peaks to form a cell‒cell similarity matrix, which is then used for diffusion-regression iterations of ClockAcc until convergence. After iteration, the regularized and smoothed ClockAcc values of the single cells are ranked, and this rank denotes relative mitotic cell age. To overcome sampling sparseness in scATAC-seq, we reasoned that age-dependent chromatin accessibility (ChrAcc) need not always be accompanied by DNAm changes (Supplementary Notes).
Thus, we performed stepwise iterations: additional open regions with a high correlation coefficient with the estimated single-cell age were extracted and combined with the reference loci to form a new set of reference clock-like loci for the next round of analysis, until the age prediction converged (Fig. 1d). The EpiTrace algorithm does not require known sample age. It simply leverages the fact that the heterogeneity of the given reference ClockDML decreases during cell replication and uses this information as an intermediate proxy variable to infer cell age.
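The single-cell procedure described above (ClockAcc, similarity-based smoothing, ranking and iterative growth of the reference set) can be illustrated with a toy, dependency-free sketch, assuming a binary cell × peak matrix. It is not the EpiTrace implementation: the HMM-mediated smoothing is replaced by a plain diffusion average, x ← αx₀ + (1 − α)Wx, over a row-normalized, non-negative cell-cell correlation kernel computed on all peaks, and the helper names, the weight `alpha` and the correlation cutoff `r_min` are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

Matrix = List[List[int]]  # cells x peaks, 1 = peak open in that cell


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation (0.0 if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def clock_acc(m: Matrix, ref: Set[int]) -> List[float]:
    """Fraction of reference loci (peak indices) that are open in each cell."""
    return [sum(row[j] for j in ref) / len(ref) for row in m]


def similarity(m: Matrix) -> List[List[float]]:
    """Row-normalized, non-negative cell-cell correlation used as a diffusion kernel."""
    n = len(m)
    w = [[1.0 if i == k else max(pearson(m[i], m[k]), 0.0) for k in range(n)]
         for i in range(n)]
    out = []
    for row in w:
        total = sum(row)  # >= 1 because the diagonal is 1
        out.append([v / total for v in row])
    return out


def smooth(x0: List[float], w: List[List[float]], alpha: float = 0.5,
           tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Iterate x <- alpha*x0 + (1 - alpha)*W x until the update falls below tol."""
    x = list(x0)
    n = len(x)
    for _ in range(max_iter):
        new = [alpha * x0[i] + (1 - alpha) * sum(w[i][k] * x[k] for k in range(n))
               for i in range(n)]
        done = max(abs(a - b) for a, b in zip(new, x)) < tol
        x = new
        if done:
            break
    return x


def rank01(x: List[float]) -> List[float]:
    """Ranks scaled to [0, 1]; ties are broken by input order for simplicity."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    for pos, i in enumerate(order):
        ranks[i] = pos / (len(x) - 1) if len(x) > 1 else 0.0
    return ranks


def epitrace_like(m: Matrix, ref: Set[int], r_min: float = 0.8,
                  max_rounds: int = 10) -> Tuple[List[float], Set[int]]:
    """Smooth and rank ClockAcc, then grow the reference with age-correlated peaks."""
    w = similarity(m)
    ref = set(ref)  # do not mutate the caller's set
    columns = [[row[j] for row in m] for j in range(len(m[0]))]
    age = rank01(smooth(clock_acc(m, ref), w))
    for _ in range(max_rounds):
        extra = {j for j, col in enumerate(columns) if pearson(col, age) >= r_min}
        if extra <= ref:  # no new clock-like loci: the prediction has converged
            break
        ref |= extra
        age = rank01(smooth(clock_acc(m, ref), w))
    return age, ref


# Toy data: 4 cells x 6 peaks; peaks 0-2 form the starting reference set.
cells = [
    [0, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [1, 1, 0, 1, 0, 1],
    [1, 1, 1, 1, 0, 1],
]
age, ref = epitrace_like(cells, ref={0, 1, 2})
print(age, sorted(ref))
```

The loop stops as soon as a round adds no new peaks, because the ranked age is then a deterministic function of an unchanged reference set. No known sample age enters the computation, consistent with the description above.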

We first validated the EpiTrace algorithm on in vitro models. ChrAcc in single mouse cells was profiled with the simultaneous high-throughput ATAC and RNA expression with sequencing (SHARE-seq) assay (Supplementary Fig. 6), and DNAm age of the bulk population was determined by DNAm sequencing (Supplementary Fig. 1d–g). In asynchronous immortalized mouse embryonic fibroblast (MEF) cells, progression through the cell cycle reduced the EpiTrace-predicted age (Supplementary Fig. 7), suggesting that EpiTrace tracks an epigenomic modification that is diluted during genome replication (as a newly synthesized copy of the genome emerges). This phenomenon persisted in primary MEF (pMEF) cells (Supplementary Fig. 8a,b,d,h). However, as the cells were passaged in vitro, the EpiTrace age increased steadily (Supplementary Fig. 8b,i). This mitosis age-dependent increase in EpiTrace age outweighs the replication-mediated dilution effect (Supplementary Fig. 8) and correlates well with DNAm-based age prediction of the same batch of samples (Supplementary Fig. 9). Finally, in cells pharmacologically arrested in specific cell-cycle phases (GSE65360 (ref. 1)), EpiTrace age increases from G1 to S and G2/M phase (Supplementary Fig. 10), suggesting that errors accumulating while epigenomic modifications are copied to the newly synthesized genome increase the EpiTrace age prediction over mitosis. In large in vivo single-cell datasets without cell-cycle synchronization, the cell cycle had little effect on EpiTrace age prediction (Supplementary Fig. 11; GSE163579). Together, these data indicate that EpiTrace reports mitotic age.

As a proof of concept, we gathered ATAC data from studies of early human embryonic development, from gametes to blastula19,45 (Fig. 1e and Supplementary Fig. 12; PRJNA494280 and PRJNA394846), each generated from only a few cells, and subjected them to EpiTrace analysis without batch correction (Fig. 1f). The total ClockAcc of each sample positively correlates with known cell mitotic age. Although the initial EpiTrace age prediction is noisy, iterative optimization improved the signal-to-noise ratio and yielded a biologically plausible trajectory of age resetting during early embryonic development: starting from the zygote, cell mitotic age gradually decreases to near ground state at the time of zygotic genome activation (ZGA) at the morula stage, before rebounding in the inner cell mass (ICM), trophectoderm (TE) and embryonic stem cells (ESCs).

Inferring cell age across cell types and animal species

For many cell types and animal species, ClockDML have not been experimentally determined. The fact that ClockDML derived from human PBMCs could be used to predict the sample age not only of human blood cells but also of cells of the non-hematopoietic lineage (Fig. 1 and Supplementary Fig. 5) suggests that clock-like ChrAcc on the ClockDML genomic region might be universal across cell lineages. To test whether we could extend known ClockDML to other species or cell types for EpiTrace prediction, we mapped human ClockDML to the mouse genome using genomic synteny and computed EpiTrace age for the mouse scATAC-seq dataset using mouse ClockDML or ‘human-guided’ clock-like loci (Supplementary Fig. 13a). We found that the EpiTrace prediction results using the reference ‘human-guided’ clock-like loci closely approximated the prediction results using the reference mouse ClockDML (R = 0.81; Supplementary Fig. 13b; GSE137115 (ref. 46)).

To further validate the concordance between EpiTrace predictions starting from different reference loci, we tested a mouse scATAC-seq dataset of T cells under chronic or acute virus infection (Supplementary Fig. 14a; GSE164978 (ref. 47)). EpiTrace age prediction using the mouse reference ClockDML agrees with the known developmental trajectory of these immune cells (Supplementary Fig. 14b). In line with their tissue of origin, clock-like loci inferred from genomic synteny of human PBMC ClockDML overlap known immune cell exhaustion genes, such as Pdcd1, Havcr2, Tox and Eomes, whereas mouse ClockDML48 (derived from pan-body DNAm interrogation) do not (Supplementary Fig. 14c). Nevertheless, single-cell age inferred by EpiTrace with mouse ClockDML as reference correlates well with that inferred with ‘human-guided’ clock-like loci as reference (Supplementary Fig. 14d). The associations between ATAC peak chromatin accessibility and single-cell age from the two predictions are highly concordant (R = 0.92; Supplementary Fig. 14e), and many known immune exhaustion genes are identified as positively correlated with cell age. This correlation does not depend on whether a peak overlaps a reference clock-like locus (Supplementary Fig. 14f). Furthermore, peaks overlapping both ‘human-guided’ clock-like loci and mouse ClockDML showed the greatest age-dependent ChrAcc shift (Supplementary Fig. 14g). These results indicate that EpiTrace can use ClockDML from different tissues of origin to predict single-cell age, even in a cross-species scenario.

We then mapped human ClockDML to the zebrafish genome using a similar synteny-guided approach (Fig. 2a) and tested EpiTrace prediction on a zebrafish scATAC-seq dataset spanning fertilization to the adult stage (GSE178969 (ref. 49)), using these ‘human-guided’ clock-like loci as reference. The mean EpiTrace age prediction at each stage closely approximated the known sample age (R = 0.97; Fig. 2b). For each cell type, the EpiTrace prediction closely matches its time of emergence (Fig. 2c). Similar results were obtained with ‘mouse-guided’ clock-like loci (Supplementary Fig. 13c; GSE152423 (ref. 50)).

Fig. 2: Mapping ClockDML orthologous genomic regions across species facilitates single-cell age estimation using ChrAcc. a, Schematic of the experiment. Human ClockDML are mapped to the zebrafish genome by homology to produce ‘human-guided reference clock-like loci’, which are then used to infer zebrafish neural crest cell mitotic age. Because the data were provided as a one-hot matrix, we adopted the bulk ATAC-like algorithm output. b, Linear regression of predicted mean mitotic age (y axis) against log-transformed (log-ed) days post-fertilization (dpf) of the sample (x axis). The 95% CI is shown as a gray area around the linear regression line. c, Single-cell mitotic age of each cell type (left) and cell-type-specific prevalence (Z-scaled, Z) in samples of different ages (right). Numbers of biologically independent cells: n = 11,234 (UN, undefined); 2,223 (gill progenitor); 2,408 (gill stroma); 2,877 (dorsal mesenchyme); 1,363 (frontal mesenchyme); 2,060 (dorsal stroma); 1,412 (ventral stroma); 2,373 (perichondrium); 265 (dermal FB); 3,252 (cartilage); 1,656 (bone); 1,716 (mesenchyme); 1,262 (stroma); 382 (perivascular); 623 (teeth); 2,123 (periosteum); 2,825 (perichondrium); 1,329 (stroma/teeth); 1,009 (ventral stroma 2); and 7,127 (gill). d, Schematic of defining putative counterparts of human clock genomic loci in the Drosophila genome. Human ClockDML falling within ±100 bp of a gene transcription start site (TSS) were defined as ‘Promoter ClockDML’. For human genes that have both a promoter ClockDML and one or more Drosophila ortholog gene(s), any Drosophila scATAC peaks falling within ±100 bp of the TSSs of these Drosophila orthologs were defined as putative clock-like genomic loci. These loci were subsequently used for EpiTrace analysis in the Drosophila dataset. e, Diagram showing the number of ClockDML and scATAC peaks falling in each category. H–D, human–Drosophila pair.
f, EpiTrace age of Drosophila embryonic development time series samples taken every 2 hours after egg laying (GSE190130 (ref. 53)). Corresponding embryo sketches are shown on the right. Number of biologically independent cells: n = 20,000 for each time slot. For box plots, the lower and upper bounds of boxes show the 25th and 75th percentiles of the data, and the horizontal line in each box shows the median. Whiskers extend to the farthest data points within 1.5 × IQR of the box bounds. The violin plot shows the empirically estimated density distribution of the data. m, months; w, weeks.

For many animal species, the genome lacks active DNAm. For example, fewer than 1% of CpGs are methylated in the Drosophila melanogaster genome51,52. We hypothesized that if clock-like chromatin accessibility is a universal phenomenon in ClockDML-homologous genomic regions, then identifying such regions in these species might be sufficient for age prediction using EpiTrace.

Because these animal species are evolutionarily too distant from humans, only gene-level orthologous relationships can be reliably identified between their genomes and the human genome. To overcome this problem, we used an orthology-guided approach: we first identified human–animal orthologous genes whose promoters encompass ClockDML in the human genome and then identified the corresponding promoter loci in the distant animal genome (Fig. 2d). For the D. melanogaster genome, we identified 1,556 such loci (Fig. 2e). We then used these loci as ‘human-guided’ clock-like loci in the Drosophila genome for reference in EpiTrace age prediction in a Drosophila embryonic development scATAC-seq dataset. The prediction results were highly concordant with the known sampling times (Fig. 2f; GSE190130 (ref. 53)).
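The orthology-guided mapping described above (and in Fig. 2d) can be sketched as two window lookups joined through an ortholog table. This is an illustration under stated assumptions: coordinates are plain (chromosome, position) tuples on the respective assemblies, a peak is taken to ‘fall within ±100 bp’ of a TSS if its interval overlaps that window, and all gene names and coordinates are placeholders rather than real orthologs or loci.

```python
from typing import Dict, List, Set, Tuple

Locus = Tuple[str, int]      # (chromosome, position)
Peak = Tuple[str, int, int]  # (chromosome, start, end)
WINDOW = 100                 # ±100 bp around the TSS, as in Fig. 2d


def promoter_clock_genes(clock_dml: List[Locus], human_tss: Dict[str, Locus],
                         window: int = WINDOW) -> Set[str]:
    """Human genes with at least one ClockDML within ±window bp of their TSS."""
    return {
        gene
        for gene, (chrom, tss) in human_tss.items()
        if any(c == chrom and abs(pos - tss) <= window for c, pos in clock_dml)
    }


def fly_clock_like_peaks(clock_dml: List[Locus], human_tss: Dict[str, Locus],
                         orthologs: Dict[str, List[str]], fly_tss: Dict[str, Locus],
                         fly_peaks: List[Peak], window: int = WINDOW) -> Set[int]:
    """Indices of fly peaks overlapping ±window bp of an ortholog's TSS."""
    genes = promoter_clock_genes(clock_dml, human_tss, window)
    # TSSs of fly orthologs of human genes that carry a promoter ClockDML.
    targets = [fly_tss[f] for h in genes for f in orthologs.get(h, []) if f in fly_tss]
    hits = set()
    for i, (chrom, start, end) in enumerate(fly_peaks):
        if any(c == chrom and start <= tss + window and end >= tss - window
               for c, tss in targets):
            hits.add(i)
    return hits


# Placeholder coordinates and gene names (not real loci or orthologs).
clock_dml = [("chr1", 1050), ("chr2", 9000)]
human_tss = {"HGENE_A": ("chr1", 1000), "HGENE_B": ("chr2", 5000), "HGENE_C": ("chr3", 200)}
orthologs = {"HGENE_A": ["fly_a1", "fly_a2"], "HGENE_B": ["fly_b"]}
fly_tss = {"fly_a1": ("2L", 10000), "fly_a2": ("3R", 500), "fly_b": ("X", 700)}
fly_peaks = [("2L", 9950, 10200), ("2L", 20000, 20500), ("3R", 620, 900), ("X", 650, 800)]
hits = fly_clock_like_peaks(clock_dml, human_tss, orthologs, fly_tss, fly_peaks)
print(sorted(hits))
```

In this toy input, peak 2 starts 20 bp beyond the window of its ortholog's TSS, and peak 3 belongs to an ortholog of a gene without a promoter ClockDML, so only peak 0 is retained.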

ChrAcc change is upstream of DNAm shift on ClockDML

Drosophila is an invertebrate species that lacks canonical DNA methyltransferases. Only 0.4% (1 hour post-fertilization (hpf)) to 0.1% (12 hpf) of cytosines in the Drosophila genome are methylated52, compared with 6–8% in humans, and most methylated cytosines in Drosophila occur in CpT/CpA contexts. The fact that EpiTrace works on the Drosophila genome suggests that clock-like chromatin accessibility might be independent of clock-like DNA methylation. To test this model, we first tracked ChrAcc and DNAm on the same DNA molecule using the human embryonic development single-cell chromatin overall omic-scale landscape sequencing (scCOOL-seq) dataset (Supplementary Fig. 15; GSE100272 (ref. 54)) and a long-read nanopore sequencing of nucleosome occupancy and methylome (nanoNOME) dataset (Supplementary Fig. 16; GSE183760 (ref. 55)). In both datasets, ChrAcc shifts before ClockDML DNAm changes on the same molecule, indicating that clock-like DNAm is not necessary for clock-like ChrAcc.

We then performed forced transcriptional activation around ClockDML to test whether changes in ChrAcc would influence DNAm in these regions. We transduced HEK293 cells stably expressing the dCas9–p300 transactivator with single guide RNA (sgRNA) lentivirus targeting human G8-group ClockDML loci (shown in Fig. 1b) (Supplementary Fig. 17a). Gain of ChrAcc around these loci resulted in DNA hypomethylation of neighboring ClockDML (Supplementary Fig. 17b,c), indicating that clock-like DNAm could be driven by ChrAcc shifts.

In another conceptually similar scenario, we measured the DNAm changes under forced transcription activation around mouse Sox1 loci and found them to linearly correlate with the age-dependent DNAm shift coefficient of corresponding loci in the human genome (Supplementary Fig. 18; PRJNA490128 (ref. 56)). Hence, changes of ChrAcc on ClockDML are sufficient to drive clock-like differential DNAm.

To test whether changes in DNAm affect ChrAcc on ClockDML, we tracked ChrAcc in a dataset in which forced DNAm was performed with a ZNF-DNMT3A artificial methylator (Supplementary Fig. 19a; GSE102395 (ref. 57)). Although ZNF-DNMT3A induction results in irreversible DNAm on ClockDML around its binding site (Supplementary Fig. 19b,e), it changes neither the overall ChrAcc on these loci (Supplementary Fig. 19c,d; GSE102395 (ref. 57) and GSE103590 (ref. 58)) nor the EpiTrace age of these cells (Supplementary Fig. 19f,g; GSE102395 (ref. 57) and GSE103590 (ref. 58)).

Together, these data indicate that clock-like ChrAcc occurs upstream of the DNAm shift on ClockDML. Genomic regions exhibiting clock-like ChrAcc can also be identified in animals without active DNAm. In other words, clock-like ChrAcc is an innate property of clock-like loci, which usually harbor ClockDML, and it is independent of DNAm.

The reversal of epigenetic age during iPSC induction

We tested EpiTrace on a single-cell multiome (scMultiomic) sequencing dataset (CNP0001454 (ref. 59)) of primed human embryonic stem cell (‘Primed’ hESC) cultures undergoing chemical reprogramming through a ‘4CL naive PSC’ state toward an eight-cell-like (‘8CL’) state (Fig. 3a). We measured EpiTrace age in single cells and compared it with biological age predicted by whole-genome bisulfite sequencing (WGBS) of the same cultures59. Both the DNAm-based prediction of sample age (Fig. 3b) and the single-cell age predicted by EpiTrace (Fig. 3c) suggest that mitotic age increases as the cells transform, with single-cell age gradually increasing along the evolutionary trajectory toward the 8CL state (Supplementary Fig. 20). Furthermore, the biological age estimates from DNAm and EpiTrace were closely correlated (correlation coefficient of mean single-cell (sc) EpiTrace age × mean DNAm age: 0.998 (P = 0.04); scEpiTrace age × mean DNAm age: 0.526 (P = 1.9 × 10⁻³⁸)) (Fig. 3d). Whereas RNA velocity projections on these cells showed erroneous evolution trajectories rooted at differentiated single cells (Supplementary Fig. 21), combining RNA velocity and EpiTrace age of the same cells yielded more biologically plausible evolution trajectories, with primed hESCs as the root of all other cells (Fig. 3e and Supplementary Fig. 21). These results suggest that ChrAcc on ClockDML predicts single-cell biological age at least as well as the DNAm-based age estimator, even in an age-reversal scenario.
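The P = 0.04 attached to r = 0.998 can look surprising. Assuming that the mean-level correlation pairs the three group means shown in Fig. 3b,c (n = 3) and uses the standard t-test for Pearson's r, only n − 2 = 1 degree of freedom remains, and the t distribution with one degree of freedom is a Cauchy distribution with a closed-form tail. The short check below (helper names are hypothetical) reproduces the reported value under that assumption.

```python
import math


def pearson_t(r: float, n: int) -> float:
    """Test statistic for Pearson's r: t = r * sqrt(n - 2) / sqrt(1 - r**2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def two_sided_p_df1(t: float) -> float:
    """Two-sided p-value for Student's t with 1 degree of freedom.

    With df = 1 the t distribution is the standard Cauchy distribution, whose CDF
    is 1/2 + atan(t)/pi, so P(|T| >= |t|) = 1 - 2*atan(|t|)/pi.
    """
    return 1 - 2 * math.atan(abs(t)) / math.pi


# Three paired group means (n = 3) leave n - 2 = 1 degree of freedom.
t = pearson_t(0.998, n=3)
p = two_sided_p_df1(t)
print(f"t = {t:.1f}, p = {p:.3f}")  # t = 15.8, p = 0.040
```

A near-perfect correlation over so few points therefore carries only modest statistical weight, which is why the single-cell-level comparison is reported alongside it.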

Fig. 3: Inferring single-cell age reversal in iPSC induction with EpiTrace. a, Schematic overview of the in vitro chemical induction of human pluripotent stem cells (‘Primed’) back to the 8-cell-like cell (8CLC) state through serial culture in 4-cell-like medium (4CL, three passages (P3)) and enhanced 4CL medium (e4CL). b, DNAm age of D0 (day 0, Primed) and D12 (day 12, 4CL) cultures and of sorted 8CLCs from the D17 (day 17) culture, from WGBS data. n = 2 independent biological replicates in each group. c, Single-cell age estimated with EpiTrace from the D17 scMultiomic dataset. Numbers of biologically independent cells: n = 483 (primed); 33 (interim); and 61 (8CLC). d, Correlation of age inferred from DNAm (y axis; whiskers denote minimum/maximum, central point denotes median) with single-cell EpiTrace age (x axis; whiskers denote 25th/75th percentiles, central point denotes median) from the same set of cells. Correlation R and P value: Pearson’s. Sample numbers of WGBS and single cells were as in b and c. e, UMAP of the scMultiomic-sequenced D17 culture with single-cell evolution trajectories built with kernels combining EpiTrace age and RNA velocity information. f, Schematic overview of the in vitro chemical induction of human adult fibroblasts toward chemically induced pluripotent stem cells (CiPSCs). Both the uninduced and intermediate stage II cultures were sequenced by scATAC. 5-azaC, 5-azacytidine; C6NYSA, combination of CHIR99021, 616452, TTNPB, Y27632, SAG and ABT869; hADSC, human adipose stromal cell (mesenchymal stromal cell); HEF, human embryonic fibroblast; JNKIN8, c-Jun N-terminal kinase inhibitor; T5J, tranylcypromine, JNKIN8 and 5-azaC. g, Single-cell age estimated with EpiTrace from f. The induced cultures were either subjected to the full induction paradigm (+Chem: C6NYSA + T5J) or had 5-azaC or JNKin8 removed (−5aza, −JNKin8).
Numbers of biologically independent cells: n = 8,826 (uninduced); 4,667 (+Chem, −JNKin8); 10,257 (+Chem, −5aza); and 8,671 (+Chem stage II). Statistical comparisons between groups: two-sided Wilcoxon test. h, Correlation coefficient between the ChrAcc on each ATAC peak and EpiTrace age estimated from the MSC experiment (x axis) or the FB experiment (y axis). Peaks of interest are labeled and colored by genomic location class. i, Prediction of sample age by DNAm from WGBS data of chemical induction of iPSCs. Chemical reprogramming induces genome-wide demethylation and an increase in DNAm age, as reported previously31; the addition of 5-azaC globally reduces DNAm, which increases DNAm age, and removal of 5-azaC prevents the DNAm age increase. Numbers of biologically independent samples: n = 4 (uninduced); 2 (C6NYSA); 1 (−JNKIN8); 1 (−5azaC); 2 (C6NYSA + T5J); and 4 (iPSC/hESC). j, Scatter plot of WGBS DNAm age (x axis) and mean single-cell EpiTrace age (y axis) of the same sample. For box plots, the lower and upper bounds of boxes show the 25th and 75th percentiles of the data, and the horizontal line in each box shows the median. Whiskers extend to the farthest data points within 1.5 × IQR of the box bounds. The violin plot shows the empirically estimated density distribution of the data. Corr.coef., correlation coefficient.

We measured EpiTrace age in an additional scATAC-seq dataset of cells undergoing the early stages of chemical reprogramming from differentiated somatic cells (fibroblasts (FB) or mesenchymal stem cells (MSC)) toward iPSCs (Fig. 3f; GSE178324 (ref. 60)) and compared it with WGBS-predicted ages from the same study. Compared with uninduced cells, the EpiTrace age of single cells at stage II of the full C6NYSA + T5J treatment was significantly decreased (Fig. 3g), indicating that these cells are ‘rejuvenated’ as expected. Removal of 5-azaC from the treatment only slightly impaired this ChrAcc age rejuvenation, as reflected by EpiTrace age, whereas removal of the JNK inhibitor impaired rejuvenation more substantially (Fig. 3g). The age–peak association from the MSC reprogramming experiment is highly similar to that from the FB reprogramming experiment (Fig. 3h). These results suggest a link between the observed mitotic age resetting and cell fate reprogramming.

The DNAm-predicted age of cells during the chemical induction procedure shows that biological age first increases at induction stage II (Fig. 3i; GSE178966 (ref. 60)) before decreasing to near zero in the pluripotent state. Removing 5-azaC from the induction formula blocks the DNAm age increase at stage II, indicating that the apparent DNAm age increase results from global DNA demethylation31. Consistent with our previous observations, comparing DNAm age and EpiTrace age predictions for the same cell sets suggests that ChrAcc on ClockDML is independent of ClockDML DNAm change (Fig. 3j).

Epigenetic age determines future cell expansion potential

To test EpiTrace age estimation in genetically defined cell lineages, we took advantage of a mitochondria-enhanced scATAC-seq dataset (GSE142745 (ref. 8)) of cultured CD34+ hematopoietic stem cells (HSCs) that underwent in vitro expansion for 14 d before being forced to differentiate toward myeloid/erythroid lineages under SCF/IL3/EPO for an additional 6 d (Fig. 4a). These cells were sequenced at day 8 (D8), day 14 (D14) and day 20 (D20). Cells were clustered by their transcriptomic (scRNA) phenotype as progenitor (Prog), differentiated (Diff) or terminally differentiated (Terminal) cells, which gradually emerged over days in culture (Fig. 4b and Supplementary Fig. 22). Furthermore, cells were assigned to lineages (clones) arising from the same progenitor on the basis of mitochondrial single-nucleotide variants (SNVs).

Fig. 4: Single-cell age estimation reveals that epigenomic age determines clonal expansion potential. a, Schematic of the experiment. CD34+ HSCs were used in the in vitro expansion/differentiation experiment. Cells were first expanded to D8 (CD34_500) or D14 (CD34_800) and then differentiated with SCF, IL-3 and EPO until D20. Mitochondrial mutations from the scATAC experiment were used to track cells derived from the same clones. Cell phenotypes were determined by the scATAC profile. b, Cells from experiments performed on D8, D14 and D20, showing a gradual transition toward terminally differentiated myeloid (my4) and erythroid (ery6) cells. c, Mean EpiTrace age of each myeloid cell clone at each timepoint. Numbers of independent biological clones: n = 35 (Prog D8–D20); 10 (Diff D8–D20); and 67 (Terminal D8–D20). d, Ratio of rejuvenated clones (clone age decreasing over time) among all myeloid clones. The terminally differentiated cells are dominated by rejuvenated clones. e, Number of terminal myeloid cells derived from young proliferator clones (mean initial clonal EpiTrace age < 0.7) and old proliferator clones (mean initial clonal EpiTrace age ≥ 0.7) at three timepoints. f, Scatter plot of the log clonal expansion ratio on D14 (y axis) against the mean initial clonal EpiTrace age of the same clone (x axis). Clonal types are color-labeled. g, EpiTrace age (color) of single cells derived from the same clone. Three clones with different fates are shown as examples. The CD34_800_42 clone was a myeloid-specific clone that generated only myeloid cells. The CD34_800_8 clone was a bipotent clone that generated both myeloid and erythroid descendants. The CD34_800_10 clone was an erythroid-specific clone that generated predominantly erythroid cells.
h, Relative contribution of young proliferator clones (mean initial clonal EpiTrace age < 0.7) and old proliferator clones (mean initial clonal EpiTrace age ≥ 0.7) to the terminal myeloid cell population at three timepoints. i, Scatter plot of the log clonal expansion ratio on D20 (y axis) against the mean initial clonal EpiTrace age of the same clone (x axis). Clonal types are color-labeled. Correlation statistics (R and P value): Pearson’s. Group statistics: two-sided t-test. For box plots, the lower and upper bounds of boxes show the 25th and 75th percentiles of the data, and the horizontal line in each box shows the median. Whiskers extend to the farthest data points within 1.5 × IQR of the box bounds. The violin plot shows the empirically estimated density distribution of the data.

We used EpiTrace to predict mitosis age in these cells, separately for myeloid lineage and erythroid lineage cells. Because the single-cell age prediction by EpiTrace could be affected by highly biased cell composition (Supplementary Fig. 23), we selected a relatively balanced CD34_800 dataset for erythroid lineage cell age prediction. Both CD34_500 and CD34_800 datasets were used for myeloid lineage cell age prediction.

The age prediction is highly concordant with known sampling days across cell types and enables tracking of the mitotic age of individual cells derived from the same clone (Fig. 4g). For the myeloid lineage, the EpiTrace age of cells from the same clone increased from progenitor to terminal myeloid cells (Fig. 4c,d). Forced differentiation increased the age of differentiated cells, as expected, but decreased the age of terminal cells (Fig. 4c,d). To explore this phenomenon in depth, we classified clones according to their relative age change between days of culture (Supplementary Fig. 24): clones whose age increased from D8 to D14 within a cluster were classified as ‘Aged’, and those whose age decreased from D8 to D14 were classified as ‘Rejuvenated’. Although most progenitor and differentiated cells show clonal aging during induction, clonal rejuvenation dominates the terminally differentiated clusters (Fig. 4c,d). The increasing proportion of rejuvenated clones among terminal cells (Fig. 4d), in line with their differentiation state, suggests that these terminally differentiated cells were derived from younger hematopoietic progenitors rather than from existing intermediate differentiated cells.
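The Aged/Rejuvenated labelling above is a comparison of clone-mean EpiTrace ages between two days. In this sketch, the input layout (clone → day → single-cell ages within one cluster), the skipping of clones not observed at both days and the ‘Unchanged’ label for ties are assumptions made for illustration; the clone data are invented.

```python
from statistics import mean
from typing import Dict, List, Optional


def classify_clone(ages_by_day: Dict[str, List[float]],
                   start: str = "D8", end: str = "D14") -> Optional[str]:
    """'Aged' if the clone's mean EpiTrace age rises from start to end, 'Rejuvenated' if it falls."""
    if not ages_by_day.get(start) or not ages_by_day.get(end):
        return None  # clone not observed at both timepoints
    delta = mean(ages_by_day[end]) - mean(ages_by_day[start])
    if delta > 0:
        return "Aged"
    if delta < 0:
        return "Rejuvenated"
    return "Unchanged"  # assumption: the text defines only the two classes above


# Hypothetical clones: single-cell EpiTrace ages per day within one cluster.
clones = {
    "c1": {"D8": [0.40, 0.50], "D14": [0.60, 0.70]},
    "c2": {"D8": [0.80, 0.90], "D14": [0.50]},
    "c3": {"D8": [0.30], "D14": [0.20, 0.25]},
    "c4": {"D14": [0.50]},  # missing D8: skipped
}
labels = {}
for clone_id, ages in clones.items():
    label = classify_clone(ages)
    if label is not None:
        labels[clone_id] = label
frac_rejuvenated = sum(lab == "Rejuvenated" for lab in labels.values()) / len(labels)
print(labels, round(frac_rejuvenated, 3))
```

The fraction of rejuvenated clones computed this way, per cluster, is the kind of quantity summarized in Fig. 4d.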

To test this hypothesis, we analyzed the expansion capability of clones containing different types of proliferating cells, including progenitor (Prog) cells and intermediate (Int) differentiated cells, from the CD34_800 experiment (which was sequenced at all three timepoints). We first classified the clones according to their cell composition on D8 (the first timepoint): clones with Prog cells but no Int cells were classified as ‘Prog-only’; clones with Int cells but no Prog cells as ‘Int-only’; and clones with both Int and Prog cells as ‘Both’ (Supplementary Fig. 25a). The initial clonal age was measured as the mean EpiTrace age of the clone's cells on D8 (Supplementary Fig. 25b,d). We then tracked clonal derivatives at the later timepoints (D14 and D20) to determine whether each clone expanded (defined as an increase in terminally differentiated cell number at a later timepoint compared with D8). For each expanded clone, we calculated the clonal expansion ratio, defined as the increase in the number of terminally differentiated cells divided by the total cell number on D8.
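The D8 composition classes and the clonal expansion ratio defined above reduce to a few counts per clone. The sketch assumes per-clone cell counts are already available; the function names and the example clone are hypothetical, and the natural logarithm is assumed for the log ratio because the base is not stated.

```python
import math
from typing import Optional


def d8_class(n_prog: int, n_int: int) -> Optional[str]:
    """Composition class of a clone on D8 from its Prog and Int cell counts."""
    if n_prog > 0 and n_int > 0:
        return "Both"
    if n_prog > 0:
        return "Prog-only"
    if n_int > 0:
        return "Int-only"
    return None  # no proliferating cells on D8: not classified


def expansion_ratio(terminal_d8: int, terminal_later: int, total_d8: int) -> Optional[float]:
    """Gain in terminally differentiated cells divided by the clone's total D8 cell count.

    Returns None for clones that did not expand (no gain in terminal cells).
    """
    gain = terminal_later - terminal_d8
    if gain <= 0 or total_d8 <= 0:
        return None
    return gain / total_d8


# Hypothetical clone: 3 Prog + 0 Int + 1 terminal cell on D8 (4 cells), 9 terminal cells on D14.
label = d8_class(n_prog=3, n_int=0)
ratio = expansion_ratio(terminal_d8=1, terminal_later=9, total_d8=4)
log_ratio = math.log(ratio)  # natural log assumed; the text does not state the base
print(label, ratio, round(log_ratio, 3))
```

Returning None for non-expanded clones keeps them out of the log transform, matching the text's restriction of the ratio to expanded clones.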

At both D14 and D20, the log clonal expansion ratio was inversely correlated with the initial EpiTrace age of the clone (Fig. 4f,i and Supplementary Fig. 25c,e): the correlation between the log clonal expansion ratio and initial clonal age was R = −0.66 (P = 2.3 × 10⁻⁸) for D14 and R = −0.57 (P = 0.00058) for D20. Although Int-only clones expanded better than Prog-only clones at the earlier timepoint (Supplementary Fig. 25c), the Prog-only clones caught up at the later timepoint and showed improved expansion potential (Supplementary Fig. 25e).

We then re-classified the clones by their mean clonal age on D8 into ‘young progenitor-derived clones’ (mean EpiTrace age < 0.7) and ‘old progenitor-derived clones’ (mean EpiTrace age ≥ 0.7). The number of terminal cells derived from young clones steadily increased during the stimulation time course, outnumbering the terminal cells derived from old clones on D20 (Fig. 4e). As a result, the relative contribution of young clones to the terminal cell population also increased over time (Fig. 4h and Supplementary Fig. 25f,g), explaining the observed decrease in terminal cell EpiTrace age (Fig. 4c,d).
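The young/old split and the resulting share of terminal cells contributed by young clones can be sketched as follows. Only the 0.7 threshold comes from the text; the input layout, one (mean D8 EpiTrace age, terminal cell count per day) pair per clone, and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

AGE_THRESHOLD = 0.7  # mean D8 EpiTrace age separating young (<) from old (>=) clones


def young_fraction_by_day(clones: Iterable[Tuple[float, Dict[int, int]]]) -> Dict[int, float]:
    """Fraction of terminal cells on each day that derive from young clones."""
    young = defaultdict(int)
    total = defaultdict(int)
    for mean_age, terminal_by_day in clones:
        is_young = mean_age < AGE_THRESHOLD
        for day, n_terminal in terminal_by_day.items():
            total[day] += n_terminal
            if is_young:
                young[day] += n_terminal
    return {day: young[day] / total[day] for day in sorted(total) if total[day] > 0}
```

With one young clone contributing 2, 10 and 30 terminal cells and one old clone contributing 8, 10 and 10 on D8, D14 and D20, the young share rises from 0.2 to 0.5 to 0.75, mirroring the qualitative trend described above.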

Combining these observations, we conclude that clonal expansion potential is better explained by clonal epigenetic age than by the initial phenotype of the proliferating cells in the clone. Interestingly, clones with both Prog and Int cells were significantly older at the initial timepoint than Prog-only or Int-only clones (Supplementary Fig. 25b,d). These clones expanded the least at both timepoints (Fig. 4f,i and Supplementary Fig. 25c,e). This result indicates that cells in these clones, although phenotypically classified as capable of proliferation, are at the end of their expansion potential.

Together, these results support the model that, during stimulated in vitro expansion of HSCs, terminally differentiated cells are preferentially derived from younger progenitors. In other words, younger hematopoietic progenitor cells are much more capable of expansion and differentiation. In the seminal study in which Hayflick determined the in vitro passage limit of cultured cells61, he co-cultured 46,XX and 46,XY cells with different in vitro passage numbers. By counting the karyotypes of cells in the final passage population, he found that the ‘younger’ cells, with fewer starting passages, always dominated the final passage population. The experiment analyzed here is similar by design to Hayflick’s original experiment, using a genetic marker, mitochondrial mutation, to track each clone. By measuring the ‘clonal age’ of these single cells, EpiTrace quantitatively relates the current age of a clone to its future expansion potential. A pioneering study showed that the genome-wide DNAm level decreases during cell culture passage62, and this phenomenon was later used to propose a method to infer Hayflick’s limit for individual cell lines63. Our result provides experimental evidence supporting these pioneering theoretical works.

Elucidating T cell markers underlying anti-PD1 response

The CD34 dataset described above represents an idealized in vitro scenario, with cells cultured in an isolated dish: the cultures start from a similar state, and cells proliferate and die in the dish without exchange with the external environment. To test EpiTrace on a more complex cell population in vivo, with possible influx, efflux and proliferation, we applied EpiTrace to an scATAC-seq dataset comprising biopsies of basal cell carcinoma taken before and after anti-PD1 treatment (Fig. 5a; GSE129785 (ref. 64)). After anti-PD1 treatment, cytotoxic T cells with exhaustion markers were significantly increased in anti-PD1 responders (R) but not in non-responders (NR). More immature exhausted T (Tex) cells were present in non-responders, and this difference was exaggerated after anti-PD1 treatment, whereas overall maturity did not change in responders. The EpiTrace age of interim and mature Tex cells in responders did not change after anti-PD1 treatment, suggesting that the increased cell number might not be solely due to local proliferation of pre-treatment mature Tex cells (Fig. 5b).

Fig. 5: Single-cell age estimation facilitates the discovery of molecular markers of peripheral influx T cells underlying the anti-PD1 response. a, Schematic overview of the experiment. Biopsies were taken from patients with basal cell carcinoma before (pre) and after (post) anti-PD1 treatment and subjected to scATAC-seq. b, Cell number (above) and EpiTrace age (below) of Tex cells, separated by treatment response (R: responder; NR: non-responder) and T cell phenotypic maturity (Immature/Interim/Mature). Sample numbers of biologically independent cells: n = 322 (NR group, Immature cell, Pre-PD1); 596 (NR group, Immature cell, Post-PD1); 77 (NR group, Interim cell, Pre-PD1); 108 (NR group, Interim cell, Post-PD1); 97 (NR group, Mature cell, Pre-PD1); 14 (NR group, Mature cell, Post-PD1); 77 (R group, Immature cell, Pre-PD1); 237 (R group, Immature cell, Post-PD1); 222 (R group, Interim cell, Pre-PD1); 923 (R group, Interim cell, Post-PD1); 378 (R group, Mature cell, Pre-PD1); and 2,452 (R group, Mature cell, Post-PD1). c, Heatmap showing scATAC peak activity in pseudobulk single cells grouped by phenotype (R/NR), sampling time (pre-PD1 or post-anti-PD1) and EpiTrace age. Correlations between peak activity and EpiTrace age are shown on the left. Peaks were clustered according to their activity profile into response-specific, non-response-specific and age-associated clusters. d, Correlation coefficient between clusters of peaks and treatment (PD1: pre/post = 0/1), response (R: NR/R = 0/1) and cell age (Age). Non-significant correlations are labeled with ‘X’. e, GO enrichment of the C1 cluster peaks as in c. Enrichment was tested by one-sided Fisher’s exact test. −logP values were adjusted for multiple comparisons. f, ChrAcc (top) and cross-correlation between peaks (bottom) of CD109 loci from pseudobulk single cells grouped by phenotype (R/NR), sample (pre-PD1 or post-anti-PD1) and EpiTrace age (Young/Interim/Mature).
The association of the CD109 promoter ChrAcc across age is shown in the right panel. g, ChrAcc (top) and cross-correlation between peaks (bottom) of CHRNA1 loci from pseudobulk single cells grouped as in f. The association of the CHRNA1 promoter ChrAcc across age is shown in the right panel. Correlation test: Pearson’s. For box plots, the upper and lower bounds of boxes show 25% and 75% percentiles of the data. The median of data is shown as the horizontal line in the box. The distribution minima and maxima, defined as farthest data point distanced ≤1.5 IQR from the box bounds, are shown by the whiskers. The violin plot shows the empirically estimated density distribution of data. Corr.coef., correlation coefficient.

New post-anti-PD1 mature Tex cells could be derived either from pre-anti-PD1 immature Tex cells or from an influx of peripheral T cells. To distinguish between these alternatives, we correlated ChrAcc on Tex differentially accessible peaks with cell age. Hierarchical clustering of the peak openness of pseudobulk cells of similar age and phenotype segregated peaks into three clusters: C1, response-specific and age-independent; C2, response-irrelevant and age-associated; and C3, non-response-specific and weakly associated with age (Fig. 5c,d). Interestingly, known markers of activated (TIGIT, LAYN and HAVCR2) and tumor-reactive (ENTPD1) T cells were segregated into different clusters. The T cell markers TOX2, ID2 and MAFA and the tissue-resident marker CXCR6 belong to the group of scATAC peaks that are mainly associated with age but not with PD1 response (Fig. 5c,d). The anti-PD1 response is associated not with cell age but with C1 peak accessibility. In contrast, cell age is associated with C2/C3 peaks, which are related to non-response to anti-PD1. Gene Ontology (GO) enrichment of the C1 cluster, in contrast to C2/C3 cluster genes, showed particular enrichment in the ‘cytokine receptor’ and ‘immune receptor’ pathways (Fig. 5e and Supplementary Fig. 26), highlighting genes such as IL4R, CD74, IFNGR2 and IFNAR2, which might be implicated in the anti-PD1 response. Finally, we found that the cis-regulatory loci of CD109, a co-receptor and negative regulator of TGF-β (Fig. 5f), and of the nicotinic acetylcholine receptor CHRNA1 (Fig. 5g) are specifically activated in response-associated Tex cells, suggesting targets for future research.
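The peak-versus-age correlation and the clustering of peaks into three groups can be sketched as below, assuming a peaks × pseudobulk accessibility matrix and one EpiTrace age per pseudobulk. The function names are hypothetical, and Ward linkage on z-scored peak profiles is an illustrative choice, not necessarily the clustering used for Fig. 5c.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def peak_age_correlation(acc, age):
    """Pearson correlation of each peak (row of acc) with pseudobulk EpiTrace age.

    Assumes no peak is constant across pseudobulks (that would divide by zero).
    """
    acc = np.asarray(acc, dtype=float)
    age = np.asarray(age, dtype=float)
    acc_c = acc - acc.mean(axis=1, keepdims=True)
    age_c = age - age.mean()
    num = acc_c @ age_c
    den = np.sqrt((acc_c ** 2).sum(axis=1) * (age_c ** 2).sum())
    return num / den


def cluster_peaks(acc, n_clusters=3):
    """Hierarchically cluster z-scored peak profiles (Ward linkage) into n_clusters groups."""
    acc = np.asarray(acc, dtype=float)
    z = (acc - acc.mean(axis=1, keepdims=True)) / acc.std(axis=1, keepdims=True)
    tree = linkage(z, method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

Z-scoring each peak first means that clusters reflect the shape of the activity profile across pseudobulks rather than overall peak height.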

Revealing developmental history during cortical gyrification

To test how mitotic age estimation might complement RNA-based developmental analysis, we applied EpiTrace to a scMultiomic dataset from the post-conception week (pcw) 21 human fetal brain cortex (GSE162170 (ref. 65)) to study the trajectory of glutamatergic neuron (GluN) development (Fig. 6a). GluNs develop from radial glia (RG) through cycling progenitor (Cyc. Prog) cells into neuronal intermediate progenitor cells (nIPCs), before undergoing a cascade of maturation (GluN1 > GluN2 > GluN3 > GluN4 > GluN5)65,66,67. We modeled the cell fate transition with CellRank68 using kernels built with RNA velocity, CytoTRACE (an RNA-based index of cell differentiation state), EpiTrace age or a combined kernel with all three estimators. Although RNA velocity and CytoTRACE produced inconsistent transition trajectories that pointed toward a group of nIPCs (Fig. 6b, i and ii), the kernel with EpiTrace age revealed the correct direction of development from nIPCs toward terminally differentiated neurons (Fig. 6b, iii). The combined kernel of all three estimators resulted in a biologically plausible transition trajectory that starts from RG and bifurcates into two branches, each giving rise to a distinct nIPC population that differentiates into mature neurons (Fig. 6a).
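To illustrate how an age estimate can orient a cell-cell transition matrix and how kernels are combined, here is a toy NumPy sketch. It is not CellRank code and not necessarily how EpiTrace age was supplied to CellRank: the forward-in-age edge filter, the self-loop rule and the function names are our assumptions. CellRank's own pseudotime-based kernel uses a related but more elaborate thresholding scheme.

```python
import numpy as np


def age_biased_transitions(adjacency, age, tolerance=0.0):
    """Row-stochastic transition matrix that keeps only kNN edges pointing
    forward in (EpiTrace) age, i.e. to neighbours with age >= own age - tolerance."""
    adj = np.asarray(adjacency, dtype=float)
    age = np.asarray(age, dtype=float)
    forward = age[None, :] >= age[:, None] - tolerance  # forward[i, j]: j is not younger than i
    w = adj * forward                                   # drop edges that go backward in age
    stuck = np.flatnonzero(w.sum(axis=1) == 0)          # cells with no allowed neighbour
    w[stuck, stuck] = 1.0                               # give them a self-loop (absorbing)
    return w / w.sum(axis=1, keepdims=True)


def combine_kernels(matrices, weights):
    """Weighted sum of row-stochastic matrices; weights must be non-negative and sum to 1."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(wt * np.asarray(m, dtype=float) for wt, m in zip(weights, matrices))
```

A convex combination of row-stochastic matrices is itself row-stochastic, so the combined kernel is still a valid Markov chain; CellRank exposes a comparable weighted-sum syntax for its kernel objects (check the documentation of the installed version).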

Fig. 6: EpiTrace reveals the developmental history during human cortical gyrification. a, UMAP-projected cell evolution trajectory built with CellRank using a hybrid kernel of EpiTrace, CytoTRACE and RNA velocity for an scMultiomic-seq dataset from a pcw21 human brain. EC, endothelial cell; IN, inhibitory GABAergic neuron; mGPC/OPC, medial ganglionic eminence progenitor/oligodendrocyte precursor cell; SP, subplate neuron. SPs and ECs are not shown in the figure due to space limitations. b, Trajectories built with only CytoTRACE (i) or RNA velocity (ii) resulted in unrealistic ‘sinks’ and ‘saddles’ on the map. In contrast, EpiTrace age (iii) provided a unidirectional reference of time, revealing that the ‘sink’ nIPC population is mitotically active and resolving the ‘nIPC stall’. c, Scatter plot of the differential gene expression estimate (−logP value, x axis) and differential TFBS-specific ChrAcc estimate (−logP value, y axis) in the GluN cells. The most significantly differentially expressed transcription factors, NR2F1 and TCF4, are highlighted. Differential expression was estimated by non-parametric Wilcoxon rank-sum test. d, UMAP of TFBS-specific ChrAcc of NR2F1. e, Expression of NR2F1 on UMAP. f, Expression of LMO3 on UMAP. g, EpiTrace age of cells belonging to the LMO3+ or NR2F1+ population. Sample numbers of biologically independent cells: 808 (LMO3+) and 1,198 (NR2F1+). P < 2.2 × 10−16 (Wilcoxon test, two-sided; the P value resulted in numerical underflow). h, CytoTRACE of cells belonging to the LMO3+ or NR2F1+ population. Sample numbers as in g. P = 0.017 (Wilcoxon test, two-sided). i, Mitotic clock (EpiTrace) and differentiation potential (CytoTRACE) of the same cell in scMultiomic-seq. The CytoTRACE score was reversed to show differentiation from left to right to facilitate comparison with EpiTrace. Sample numbers of biologically independent cells: n = 646 (RG); 341 (Cyc. Prog.); 2,348 (nIPC/GluN1); 1,546 (GluN2); 798 (GluN3); 459 (GluN4); 223 (GluN5); 190 (SP); 359 (mGPC/OPC); 301 (IN3); 780 (IN2); 959 (IN1); and 31 (EC/Peric.). j, Excitatory neuron phylogeny built with the mitotic clock, showing that GluN4/GluN5 are likely direct, early-born progenies of RG, whereas GluN2/GluN3 are likely late-born, immature progenies of nIPC. k, Overall model of corticogenesis in the light of EpiTrace. Data source: Trevino et al.65. For box plots, the upper and lower bounds of boxes show 25% and 75% percentiles of the data. The median of data is shown as the horizontal line in the box. The distribution minima and maxima, defined as farthest data point distanced ≤1.5 IQR from the box bounds, are shown by the whiskers. The violin plot shows the empirically estimated density distribution of data.

Two transcription factors, TCF4 and NR2F1 (which encodes the transcription factor COUP-TFI), were differentially expressed between the branches. They also exhibit significant differential binding activities in these neurons (Fig. 6c). Interestingly, NR2F1 is mainly expressed in the gyrus of the human cortex, and hereditary NR2F1 loss-of-function mutations are associated with mental retardation and the polymicrogyri phenotype69,70,71. NR2F1 TFBS-associated peaks are open in a branch (Fig. 6d) that is NR2F1 negative (Fig. 6e) and LMO3 positive (Fig. 6f), suggesting that NR2F1 acts as a transcriptional repressor in nIPCs. The EpiTrace age of NR2F1+ branch nIPCs was significantly higher than that of LMO3+ nIPCs, suggesting increased mitotic activity (Fig. 6g). Consistent with this, the CytoTRACE score of NR2F1+ nIPCs was lower than that of LMO3+ nIPCs (Fig. 6h), suggesting increased differentiation. These results indicate that nIPCs are divided into NR2F1+ clones that support earlier neurogenesis and LMO3+/NR2F1− clones that expand relatively later, linking the gyrus-specific expression pattern of NR2F1 to its function in cortical gyrification69.

We compared the EpiTrace age of the neurons with their CytoTRACE score (Fig. 6i). Although the CytoTRACE score of GluNs correlates with their differentiation, the EpiTrace age of these cells is inversely correlated with their maturity. To explain this inconsistency, we built a ‘phylogenetic tree’ of single-cell clusters from ClockDML ChrAcc (EpiTrace phylogeny). We reasoned that cells traverse the phenotype manifold along branched trajectories while they undergo mitosis. As they evolve, ChrAcc on ClockDML converges to a specific state that should be lineage dependent because such changes are irreversible. Hence, it should be possible to infer cell lineage trees using phylogenetic-like methods. This analysis revealed a birth sequence of GluNs: GluN5 diverges first from RG, followed by GluN4, GluN2, GluN3 and GluN1/nIPC (Fig. 6j), indicating that neurons that formed earlier undergo longer post-mitotic maturation (Supplementary Fig. 27). Consistent with this observation, analysis of scRNA expression in the same cells showed that, whereas the later-born nIPC/GluN1 and GluN2 cells still retained residual expression of proliferating-cell genes, such as SOX11, SOX4, MALAT1 and NFIB, the earlier-born, ‘more mature’ GluN5 and GluN4 cells showed significantly increased expression of mature neuron markers, including synaptic proteins, such as SYT4, SYT11, FABP7, APP, GAP43 and PCDH17; mature neuron cytoskeleton proteins, such as TUBB2A and NEFL; and transcription factors that function in post-mitotic neurons, such as MEF2C (Supplementary Fig. 28). Furthermore, in concordance with the known ‘inside-out’ developmental paradigm of the cortex72, the earlier-born GluN5 population specifically expresses the layer V/VI marker genes SCUBE1 and SEMA3E73, whereas the younger GluN4 population expresses relatively higher levels of the layer III/IV marker genes NTNG1 and MME73 (Supplementary Fig. 29).
Hence, the dynamics of post-mitotic neurons undergoing continuous differentiation could be captured by combining mitotic age with other modality measurements.
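A tree of cell clusters from ClockDML accessibility can be sketched as follows, assuming a cells × ClockDML accessibility matrix and one cluster label per cell. This uses UPGMA (SciPy average linkage on Euclidean distances between cluster-mean profiles) as an illustrative stand-in; it is not necessarily the tree-building method behind Fig. 6j, the function name is hypothetical, and UPGMA's ultrametric assumption is itself a simplification.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage


def clock_phylogeny(acc, cluster_labels):
    """UPGMA-style tree of cell clusters from ClockDML accessibility.

    acc: cells x ClockDML matrix of accessibility values;
    cluster_labels: one cluster name per cell.
    Returns the SciPy linkage matrix, the cluster names (row order) and the leaf order.
    """
    acc = np.asarray(acc, dtype=float)
    labels = np.asarray(cluster_labels)
    names = sorted(set(labels.tolist()))
    # mean ClockDML accessibility per cluster: one profile per cluster
    profiles = np.vstack([acc[labels == name].mean(axis=0) for name in names])
    # average linkage on Euclidean distances is UPGMA
    tree = linkage(profiles, method="average", metric="euclidean")
    leaf_order = dendrogram(tree, labels=names, no_plot=True)["ivl"]
    return tree, names, leaf_order
```

Each row of the returned linkage matrix records one merge (the two merged nodes, their distance and the size of the new node), so the earliest-merging clusters are those with the most similar ClockDML accessibility.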

Together, these results demonstrate that EpiTrace age complements RNA velocity and stemness prediction in characterizing complex organ development, indicate a long post-mitotic maturation of neurons and reveal a molecular mechanism by which NR2F1 controls human nIPC proliferation to underlie cortical gyrification (Fig. 6k).

Inferring gene function in kidney from a static snapshot

We have shown that EpiTrace can track development in developing tissues. To test whether EpiTrace can recover epigenomic changes during development from a single static snapshot of a fully developed adult tissue, we applied EpiTrace to an scATAC-seq dataset from adult human kidney (Extended Data Fig. 1a; GSE166547 (ref. 74)). The birth sequence of kidney cells inferred by EpiTrace phylogeny analysis suggests an endothelial origin of kidney tubules and delineates a cell-type-specific generation cascade during nephrogenesis (Extended Data Fig. 1b), which correlates with the spatial position of the cells (Supplementary Fig. 30). The distribution of EpiTrace age for each cell type suggests a distal-to-proximal genesis cascade of nephron tubules with a late expansion of proximal tubules (PTs) (Extended Data Fig. 1c).

In the PT lineage, the EpiTrace age-derived phylogeny could be orthogonally validated with a single-nucleus RNA-seq (snRNA-seq)-derived phylogeny (Supplementary Fig. 31; GSE121862 (ref. 75)). The correlation between EpiTrace age and peak openness clearly segregated peaks opened in progenitor PT cells from those opened in differentiated PT cells (Extended Data Fig. 1d). Notably, this association is not guided by known cell type information, indicating the power of EpiTrace in positioning single cells along their evolutionary trajectory. Interestingly, the translocation renal cell carcinoma (TRCC) driver gene TFEB is specifically activated in progenitor cells and shows an age-dependent decrease in activity. In contrast, the hereditary renal dysgenesis (congenital anomalies of the kidney and urinary tract, CAKUT) genes FGF8, FGFR2, SLIT3, GDNF and NHS are all associated with differentiated cell-specific peaks whose accessibility increases with age. These results suggest that CAKUT is linked to genes functioning in terminal PT cell fate determination and function, whereas TRCC oncogenesis is linked to the mis-expression of progenitor-specific transcription factors, possibly forcing the dedifferentiation of terminally differentiated PT cells into a stem-like state.
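The age-dependent segregation of peaks can be illustrated with a per-peak correlation test followed by Benjamini-Hochberg correction. This is a hedged sketch: the input layout (peaks × cells or pseudobulks, plus one EpiTrace age per column), the function names and the 0.05 FDR cutoff are assumptions, not the published settings.

```python
import numpy as np
from scipy.stats import pearsonr


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values (step-up FDR control)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity: each adjusted value is the minimum over all larger ranks
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(scaled, 1.0)
    return q


def split_peaks_by_age(acc, age, alpha=0.05):
    """Correlate each peak (row of acc) with EpiTrace age and label it.

    'differentiated': opens with age; 'progenitor': closes with age; 'ns': FDR >= alpha.
    Constant peaks make pearsonr return NaN and should be filtered out beforehand.
    """
    acc = np.asarray(acc, dtype=float)
    age = np.asarray(age, dtype=float)
    r = np.empty(acc.shape[0])
    p = np.empty(acc.shape[0])
    for i, row in enumerate(acc):
        r[i], p[i] = pearsonr(row, age)
    q = benjamini_hochberg(p)
    labels = np.where(q >= alpha, "ns", np.where(r > 0, "differentiated", "progenitor"))
    return r, q, labels
```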

Tracking glioblastoma clonal evolution

Finally, we analyzed an individual tumor sample (CGY2349) in a human glioblastoma (GBM) scATAC-seq dataset to study whether EpiTrace age analysis could trace cell evolution during oncogenesis (Extended Data Fig. 2a; GSE139136, GSE163655 and GSE163656 (ref. 76)). In this tumor, copy number variation (CNV) analysis showed that MDM4 amplification dominates the malignant clones, which additionally have either EGFR or PDGFRA amplification, resulting in increased ChrAcc around these genes (Extended Data Fig. 2b–e). With EpiTrace, we identified a pre-malignant cluster (7) that is younger than all malignant clones (4/6/5/0/3) but shows accelerated aging (that is, a higher estimated mitosis count) compared to the ‘normal’ clones (1/9) (Extended Data Fig. 2b,f), has lower MDM4 amplification (Extended Data Fig. 2c) and lacks both EGFR and PDGFRA amplification (Extended Data Fig. 2d,e).

Interestingly, some MDM4+ cells had both EGFR and PDGFRA amplification (Supplementary Fig. 32). EpiTrace age analysis revealed that MDM4+-only cells are ancestral to triple-positive (MDM4+/EGFR+/PDGFRA+) cells, which in turn lose either EGFR or PDGFRA amplification in their progeny (Extended Data Fig. 2f). This is further supported by EpiTrace phylogeny analysis (Extended Data Fig. 2g). Branched evolution of MDM4+/EGFR+ and MDM4+/PDGFRA+ cells was initiated at the beginning of malignant transformation (Supplementary Fig. 32). Together, these results characterize the evolutionary trajectory of malignancy from the MDM4+ pre-malignant clone to the earliest malignant cell population, with amplification of MDM4, PDGFRA and EGFR in a catastrophic genomic instability event, which then bifurcated into heterogeneous clones with either PDGFRA or EGFR addiction (Extended Data Fig. 2h). Thus, EpiTrace age analysis revealed the pre-malignant state of this tumor and suggested branching evolution, indicating that heterogeneous cancer clones arise early in malignant transformation.

It is known that telomere crisis and mitotic mis-segregation can cause catastrophic events in a single mitosis, most notably chromothripsis77,78, chromoplexy79 and kataegis80. Multiple structural variations across the genome can occur simultaneously during such events, resulting in a synchronous, punctuated burst of chromosomal copy number aberrations77,81. Timing analyses of these mutational events have shown that they occur early during oncogenesis82,83. PDGFRA and EGFR amplifications were reported to exist in different single-cell clones that coexist in a mosaic manner in GBM tumors84. Although most reports suggest that these mutations are mutually exclusive in single GBM-derived cell lines or tumor sphere cultures85, these clones coexist within the same tumor and share common somatic mutations, such as deletion of PTEN and CDKN2A84,86, indicating that they were derived from the same ancestral clone. scRNA-seq studies87,88 suggest that PDGFRA+/EGFR+ double-positive cells exist in GBM. Single-positive PDGFRA+ or EGFR+ descendant clones could emerge from double-positive parental clones without specific selection86. These observations are similar to ours with EpiTrace. In our analysis, although we sampled only a fraction of the tumor, the similar cell age estimated for MDM4+/EGFR+ and MDM4+/PDGFRA+ clones suggested that neither clone gained a selective advantage during tumor growth; instead, they appear to be under neutral evolution. Further experiments with higher-resolution clonal tracing, possibly using a genetic marker, are necessary to confirm this observation.