Single-cell multi-omics identifies chronic inflammation as a driver of TP53-mutant leukemic evolution

Understanding the genetic and nongenetic determinants of tumor protein 53 (TP53)-mutation-driven clonal evolution and subsequent transformation is a crucial step toward the design of rational therapeutic strategies. Here we carry out allelic resolution single-cell multi-omic analysis of hematopoietic stem/progenitor cells (HSPCs) from patients with a myeloproliferative neoplasm who transform to TP53-mutant secondary acute myeloid leukemia (sAML). All patients showed dominant TP53 ‘multihit’ HSPC clones at transformation, with a leukemia stem cell transcriptional signature strongly predictive of adverse outcomes in independent cohorts, across both TP53-mutant and wild-type (WT) AML. Through analysis of serial samples, antecedent TP53-heterozygous clones and in vivo perturbations, we demonstrate a hitherto unrecognized effect of chronic inflammation, which suppressed TP53 WT HSPCs while enhancing the fitness advantage of TP53-mutant cells and promoted genetic evolution. Our findings will facilitate the development of risk-stratification, early detection and treatment strategies for TP53-mutant leukemia, and are of broad relevance to other cancer types.

Understanding the genetic and nongenetic determinants of tumor protein 53 (TP53)-mutation-driven clonal evolution and subsequent transformation is a crucial step toward the design of rational therapeutic strategies. Here we carry out allelic resolution single-cell multi-omic analysis of hematopoietic stem/progenitor cells (HSPCs) from patients with a myeloproliferative neoplasm who transform to TP53-mutant secondary acute myeloid leukemia (sAML). All patients showed dominant TP53 'multihit' HSPC clones at transformation, with a leukemia stem cell transcriptional signature strongly predictive of adverse outcomes in independent cohorts, across both TP53-mutant and wild-type (WT) AML. Through analysis of serial samples, antecedent TP53-heterozygous clones and in vivo perturbations, we demonstrate a hitherto unrecognized effect of chronic inflammation, which suppressed TP53 WT HSPCs while enhancing the fitness advantage of TP53-mutant cells and promoted genetic evolution. Our findings will facilitate the development of risk-stratification, early detection and treatment strategies for TP53-mutant leukemia, and are of broad relevance to other cancer types.
Tumor protein 53 (TP53) is the most frequently mutated gene in human cancer, typically occurring as a multihit process with a point mutation in one allele and loss of the other wild-type (WT) allele 1,2 . TP53 mutations are also strongly associated with copy number alterations (CNA) and structural variants, reflecting the role of p53 in the maintenance of genomic integrity 2,3 . In myeloid malignancies, the presence of a TP53 mutation defines a distinct clinical entity 1 , associated with complex CNA, lack of response to conventional therapy and dismal outcomes 2,4,5 . Understanding the mechanisms by which TP53 mutations drive clonal evolution and disease progression is a crucial step toward Article https://doi.org/10.1038/s41588-023-01480-1 cloning and computational analysis (Extended Data Fig. 1h-j). Integration of index-sorting data revealed that dominant TP53 multihit clones were enriched in progenitor populations as previously described in de novo AML 19 , whereas TP53-mutant cells were less frequent in the HSC compartment (Extended Data Fig. 3a). CNA analysis using single-cell transcriptomes showed that all TP53 multihit clones harbored at least one highly clonally-dominant CNA, with very few TP53-mutant cells without evidence of a CNA (3.4 ± 1.2%) and an additional 5 of 14 (36%) patients also showing cytogenetically-distinct subclones (Fig. 1f,g and Extended Data Fig. 2p,q).
To confirm that dominant HSPC clones were functional LSCs, we established patient-derived xenografts (PDX) for two TP53-sAML patients (Fig. 1h). Mice developed leukemia in 27-31 weeks with high numbers of human CD34 + myeloid blast cells in the bone marrow (BM; Extended Data Fig. 3b-d), with a progenitor phenotype, TP53 mutations and CNAs similar to the dominant clone from patients' primary cells (Fig. 1i and Extended Data Fig. 3e-l). In patient IF0131, a monosomy 7 subclone (Fig. 1f) preferentially expanded in PDX models (Fig. 1i). Monosomy 7 cells showed a distinct transcriptional profile with increased WNT, RAS, MAPK signaling and cell cycle associated transcription (Extended Data Fig. 3m,n). Together, these data are compatible with a fitness advantage of monosomy 7 cells, a recurrent event in TP53-sAML (Extended Data Fig. 1b,c), driven by activation of signaling pathways that may relate to deletion of chromosome 7 genes such as EZH2 (ref. 20 ). In summary, the dominant leukemic clones in TP53-sAML were invariably characterized by multiple hits affecting TP53 (multihit state), indicating strong selective pressure for complete loss of WT TP53, together with the gain of CNAs and complex cytogenetic evolution, with very few TP53 multihit cells with a normal karyotype (Fig. 1j).

Molecular signatures of TP53-mutant-mediated transformation
To understand the cellular and molecular framework through which TP53 mutations drive clonal evolution, we next analyzed single-cell RNA-seq data from 10,459 TP53-sAML HSPCs alongside 2,056 MF and 5,002 HD HSPCs passing quality control. Force-directed graph analysis revealed separate clustering of TP53-mutant HSPC in comparison with healthy and MF donor cells, with a high degree of interpatient heterogeneity (Extended Data Fig. 4a) as observed in other hematopoietic malignancies 21 . This could potentially be explained by patient-specific cooperating mutations and cytogenetic alterations (Extended Data Fig. 1). TARGET-seq analysis uniquely enabled the comparison of TP53 multihit HSPC to TP53-WT preleukemic stem cells ('pre-LSCs') from the same TP53-sAML patients as well as healthy and MF donors, to derive a specific TP53 multihit signature including known p53-pathway genes (Extended Data Fig. 4b,c).
Integration of single-cell transcriptomes and diffusion map analysis of HSPCs from TP53-sAML patients showed that TP53 multihit HSPCs clustered separately from TP53-WT pre-LSCs in two distinct populations with enrichment of LSC and erythroid-associated transcription, respectively (Fig. 2a and Supplementary Table 3), and a differentiation trajectory toward the erythroid-biased population (Fig. 2b), an unexpected finding given that erythroleukemia is uncommon in TP53-sAML (refs. 22,23). Sorted CD34 + TP53-multi-hit cells exhibited potential for erythroid differentiation in vivo and in vitro, supporting that this occurs downstream of the LSC population (Extended Data Fig. 5a-c). TP53 multihit LSCs showed enrichment of cell cycle, inflammatory, signaling pathways and LSC-associated transcription, whereas TP53 multihit erythroid cells were depleted of the latter (Extended Data Fig. 4d).
To further explore this erythroid-biased population, we projected TP53 multihit cells onto a previously published HD hematopoietic hierarchy 24 . TP53-sAML differed from de novo AML with an enrichment into HSC and early erythroid populations, whereas de novo AML was enriched in myeloid progenitors (Fig. 2c,d) 25 . A similar enrichment was observed for TP53 multihit cells when mapped on a Lin -CD34 + MF the development of rational strategies to diagnose, stratify, treat and potentially prevent this condition.
Myeloproliferative neoplasms (MPN) arise in hematopoietic stem cells (HSC) through the acquisition of mutations in JAK/STAT signaling pathway genes (JAK2, CALR or MPL), leading to aberrant proliferation of myeloid lineages 6 . Progression to secondary acute myeloid leukemia (sAML) occurs in 10-20% of MPN and is characterized by cytopenias, increased myeloid blasts, acquisition of aberrant leukemia stem cell (LSC) properties by hematopoietic stem/progenitor cells (HSPCs) and median survival of less than 1 year 7,8 . TP53 mutations are detected in approximately 20-35% of post-MPN sAML [9][10][11] (collectively termed TP53-sAML), often in association with loss of the remaining WT allele 12 and multiple CNAs 13 . Furthermore, deletion of Trp53 combined with JAK2 V617F mutation leads to highly penetrant myeloid leukemia in mice 11,14 .
Notwithstanding the established role of TP53 mutation in MPN transformation, TP53-mutant subclones are also present in 16% of chronic phase MPN (CP-MPN), and in most cases, this does not herald the development of TP53-sAML (ref. 15). However, little is known about the additional genetic and nongenetic determinants of clonal evolution following the acquisition of a TP53 mutation. Resolving this question requires unraveling multiple layers of intratumoral heterogeneity, including reliable identification of the TP53 mutation, loss of the WT allele and presence of CNA. Integrating this mutational landscape with cellular phenotype and transcriptional signatures will resolve aberrant hematopoietic differentiation and molecular properties of LSC in TP53-sAML. This collectively requires single-cell approaches, which combine molecular and phenotypic analysis of HSPCs with allelic-resolution mutation detection, an approach recently enabled by the TARGET-seq technology 16 .
Taken together, these findings support that TP53-sAML is associated with complex genetic intratumoral heterogeneity.
To characterize tumor phylogenies and subclonal structures, we performed TARGET-seq analysis 16 , a technology that allows allelic-resolution genotyping, whole transcriptome and immunophenotypic analysis from the same single-cell, on 17517 Lin -CD34 + HSPCs from 14 TP53-sAML patients (Extended Data Fig. 1a), 9 age-matched healthy donors (HDs) and 8 previously published myelofibrosis (MF) patients (Fig. 1a, gating strategy shown in Extended Data Fig. 2a). HSPCs WT for all mutations analyzed were present in 10 of 14 patients (Extended Data Fig. 2b-o), providing a valuable population of cells for intrapatient comparison with mutation-positive cells 17 . In all cases, the dominant clone showed loss of WT TP53 through the following four patterns of clonal evolution: (1) bi-allelic TP53 mutations by acquisition of a second mutation on the other TP53 allele, (2) hemizygous TP53 mutations (deleted TP53 WT allele), (3) parallel evolution with two clones harboring different TP53 alterations and (4) a JAK2 negative dominant clone with bi-allelic TP53 mutations in patients with previous JAK2-mutant MPN 18 (Fig. 1b-e and Extended Data Fig. 2b-o). Bi-allelic mutations were confirmed by single molecule Article https://doi.org/10.1038/s41588-023-01480-1 cellular hierarchy (Extended Data Fig. 5d,e), with erythroid-biased populations being highly enriched in immunophenotypically defined MEPs (Extended Data Fig. 5f). Taken together, these findings support an aberrant erythroid-biased differentiation trajectory in TP53-sAML.
To determine whether upregulation of erythroid-associated transcription was a more widespread phenomenon in TP53-mutant AML, we investigated erythroid-myeloid-associated transcription in the BeatAML and The Cancer Genome Atlas (TCGA) cohorts 26,27 . Erythroid scores were increased in TP53 mutant compared to TP53-WT AML, whereas there was no significant difference in myeloid scores ( Fig. 2e-f, Extended Data Fig. 5g-j and scores described in Supplementary Table 3). Concomitantly, patients with high erythroid scores also showed decreased TP53-target gene expression (Extended Data Fig. 5k). We next investigated the expression of key transcription factors for erythroid/ granulomonocytic commitment and found increased GATA1 expression in Lin -CD34 + TP53 multihit HSPCs, whereas CEBPA was only expressed at low levels (Fig. 2g). Analysis of the BeatAML cohort revealed increased GATA1 and reduced CEBPA expression in association with TP53 mutation (Extended Data Fig. 5l), with consequent reduction in the CEBPA/GATA1 expression ratio (Fig. 2h). Similar findings were observed in TP53 knock-out or mutant isogenic MOLM13 cell lines (Extended Data Fig. 5m) 28 . These observations suggest that the CEBPA/GATA1 expression ratio, an important transcription factor balance that affects erythroid versus myeloid differentiation in leukemia 29,30 , is disrupted by TP53 mutation.
As 'stemness scores' have previously been applied to determine prognosis in AML 31 , we next asked whether a single-cell defined TP53 multihit LSC signature might identify AML patients with adverse outcomes. Single-cell multi-omics allowed us to derive a 44-gene 'p53LSC-signature' (Supplementary Table 4) by comparing gene expression of HD, JAK2-mutant MF HSPC and TP53 WT pre-LSC to transcriptionally defined TP53-mutant LSCs (Fig. 2a,k). High p53LSC-signature score (Extended Data Fig. 6a,b) was strongly associated with TP53 mutation status, although some TP53-WT patients also showed a high p53LSC score. A high p53LSC score predicted poor survival in the independent BeatAML and TCGA cohorts, irrespective of TP53 mutational status ( Fig. 2l and Extended Data Fig. 6c-e). The p53LSC signature performed well as a predictor of survival, including in sAML patients, as compared to the previously published LSC17 score 31 and p53-mutant score generated using all TP53-mutant HSPC rather than LSCs (Extended Data Fig. 6f  , hemizygous mutations (c), parallel evolution (d) and JAK2 negative bi-allelic evolution (e). The numbers in parenthesis indicate the number of patients in each category. The size of the circles is proportional to each clone's size, indicated as a percentage of total Lin -CD34 + cells for one representative patient in each group; each clone is colored according to its genotype (related to Extended Data Fig. 2b-o) and red boxes indicate TP53 multihit clones. f, Representative examples from integrated mutation and CNA-based clonal hierarchies. Solid lines indicate the acquisition of a genetic hit (that is point mutation or CNA), whereas dotted lines indicate the specific genetic hit acquired in each step of the hierarchy (related to Extended Data Fig. 2p,q). g, Proportion of TP53 multihit cells classified as carrying clonal or subclonal CNAs in each patient, using a transcriptomic-based CNA clustering approach (inferCNV). h, Experimental strategy for xenotransplantation of CD34 + cells from TP53-sAML patients in immunodeficient mice. i, Percentage of cells carrying CNAs found in each PDX and corresponding Lin -CD34 + cells from the primary TP53-sAML sample transplanted (related to Extended Data Fig. 3; n = 1). j, Model of TP53-sAML genetic evolution. Created with BioRender.com. Article https://doi.org/10.1038/s41588-023-01480-1

TP53-WT cells display self-renewal and differentiation defects
TARGET-seq uniquely enabled phenotypic and molecular characterization of rare TP53-WT cells, referred to as pre-LSCs, which include both residual HSPCs that were WT for all mutations analyzed, as well as HSPCs that form part of the antecedent MPN clone. These pre-LSCs were obtained in sufficient numbers (>20 cells) from 9 of 14 TP53-sAML patients, including all patterns of clonal evolution ( Fig. 3a and Extended Data Fig. 7a). Pre-LSCs representing the antecedent MPN clone (positive for MPN-associated driver mutations) were more frequent (60.5%) than pre-LSCs that were WT for all mutations (39.5%). Pre-LSCs were enriched in HSC-associated genes and mapped onto HSC clusters in healthy and MF donor hematopoietic hierarchies (Fig. 3a,b). Index sorting revealed that pre-LSCs were strikingly enriched in the phenotypic HSC compartment, unlike TP53 multihit HSPCs ( Fig. 3c and Extended Data Fig. 3a). Pre-LSCs were rare, as reflected by a reduction in the numbers of phenotypic HSCs present within the Lin -CD34 + HSPC compartment in TP53-sAML compared to HDs (Extended Data Fig. 7b).
We reasoned that the HSC phenotype of pre-LSCs, with reduced frequency in progenitor compartments, might reflect impaired differentiation. To explore this hypothesis, we carried out scVelo analysis, which showed the absence of a transcriptional differentiation trajectory in pre-LSCs, unlike HD HSCs (Fig. 3d). Pre-LSCs showed increased expression of HSC and Wnt β-catenin genes and decreased cell cycle genes as compared to HD and MF cells ( Fig. 3e-g and Supplementary Table 3). To functionally confirm these findings, we sorted phenotypic HSCs (to purify pre-LSCs), as well as other progenitor cells, from HDs, MF and TP53-sAML patients for long-term culture-initiating cell (LTC-IC) and short-term cultures ( Fig. 3h and Extended Data Fig. 7c). Pre-LSC LTC-IC activity was similar to HDs and increased compared to MF patients, with preserved terminal differentiation capacity and confirmed TP53 WT genotype ( Fig. 3i and Extended Data Fig. 7d-g). In short-term liquid culture, pre-LSCs showed reduced clonogenicity, with retained CD34 expression and decreased proliferation ( Fig. 3j and Extended Data Fig. 7h-i). In summary, we identified rare and phenotypically distinct pre-LSCs from TP53-sAML samples, which were characterized by differentiation defects and distinct stemness, self-renewal and quiescence signatures. As these cells were TP53 WT and showed normal differentiation after prolonged ex vivo culture, we reasoned that these functional and molecular abnormalities are likely to be cell-extrinsically mediated. Indeed, pre-LSCs showed enrichment of gene signatures associated with certain cell-extrinsic inflammatory mediators (TNFα, IFNγ, TGFβ and IL2; Fig. 3k).

Inflammation promotes TP53-associated clonal dominance
To understand the transcriptional signatures associated with leukemic progression, we analyzed samples from 5 CP-MPN patients who subsequently developed TP53-sAML (pre-TP53-sAML) alongside 6 CP-MPN patients harboring TP53-mutated clones who remained in CP (CP TP53-MPN, median 4.43 years (2.62-5.94) of follow-up; Fig. 4a and Extended Data Fig. 8). Compared to TP53-sAML samples, CP TP53-MPN had a lower VAF and number of TP53 mutations (Extended Data Fig. 8a-d). The type, distribution and pathogenicity score of TP53 mutations were similar between chronic and acute stages (Extended Data Fig. 8e,f). All five pre-TP53-sAML samples and four of the six CP TP53-MPN were then analyzed by TARGET-seq (Fig. 4a). HSPC immunophenotype was similar for pre-TP53-sAML and CP TP53-MPN patients (Extended Data Fig. 9a-c), and clearly distinct from the TP53-sAML stage (Extended Data Fig. 9d). Heterozygous TP53 clones were identified in 3 pre-TP53-sAML patients and all 4 CP TP53-MPN ( Fig. 4b and Extended Data Fig. 9e-m). A minor homozygous TP53-mutated clone initially present in one CP TP53-MPN patient was undetectable after 4 years (Extended Data Fig. 9h). As TP53-heterozygous mutant HSPCs represent the direct genetic ancestors of TP53 'multihit' LSCs, we compared gene expression of heterozygous TP53-mutant HSPC from pre-TP53-sAML (n = 296) to CP TP53-MPN (n = 273; Fig. 4b, blue box) to identify putative mediators of transformation. TP53-heterozygous HSPC from pre-TP53-sAML patients showed downregulation of TNFα-and TGFβ-associated gene signatures, both of which are known to be associated with HSC attrition 32,33 , with upregulated expression of oxidative phosphorylation, DNA repair and interferon (IFN) response genes (Supplementary Table 5 and Fig. 4c-e), without changes in IFN receptor expression levels or concurrent IFN treatment (Extended Data Fig. 9n and Supplementary Table 1). Upregulation of inflammatory signatures was detected in TP53-homozygous cells from the same pre-TP53-sAML patients at a higher level than in TP53-heterozygous cells (Extended Data Fig. 9o). Collectively, these findings raise the possibility that inflammation might contribute to preleukemic clonal evolution toward TP53-sAML.
To evaluate the role of inflammation in TP53-driven leukemia progression, we performed competitive mouse transplantation experiments between CD45.1 + Vav-iCre Trp53 R172H/+ and CD45.2 + Trp53 +/+ BM cells followed by repeated poly(I:C) or lipopolysaccharide (LPS) intraperitoneal injections. These experiments recapitulate chronic inflammation through induction of multiple pro-inflammatory cytokines 34,35 known to be increased in the serum of patients with MPN 36 , including IFNγ ( Fig. 5a and Extended Data Fig. 10a). Trp53-mutant peripheral blood (PB) myeloid cells, BM HSCs (Lin -Sca1 + c-Kit + CD150 + CD48 − ) and LSKs (Lin -Sca1 + c-Kit + ) were selectively enriched upon poly(I:C) treatment (Fig. 5b,c and Extended Data Fig. 10b-e). Crucially, the fitness advantage of Trp53-mutant HSCs and LSKs was exerted both through an increase in the numbers of Trp53 R172H/+ HSPCs and a reduction in the numbers of WT competitors (Fig. 5d,e and Extended Data Fig. 10f,g). Treatment of chimeric mice with LPS ( Fig. 5a), which induces an inflammatory response mediated through the release of IL1β and IL6 (ref. 37), among others, led to a similar increase in the number of Trp53-mutant PB myeloid cells and LSKs ( Fig. 5f,g). These results indicate that a variety of inflammatory stimuli can promote expansion of the Trp53-mutant clone.
To determine how inflammation might alter hematopoietic differentiation and exert a selective pressure to drive the expansion of the Trp53-mutant clone, we established an inducible SCL-CreER T Trp53 R172H/+ mouse model (Fig. 5h). Poly(I:C) treatment led to . e,f, Expression of an erythroid (e) and myeloid (f) gene score in AML patients from the BeatAML dataset stratified by TP53 mutational status (n = 329 TP53 WT; n = 31 TP53 mutant). g, CEBPA (top) and GATA1 (bottom) expression in the same cells as in a and b. h, CEBPA and GATA1 expression ratio in the same patient cohort as in e and f. i,j, Proportion of immature erythroid (CD235a + CD71 + ) and myeloid (CD14 + , CD15 + or CD11b + ) cells (i) and ratio of CEBPA to GATA1 expression in total cells (j) after 12 d of differentiation of peripheral blood CD34 + cells from patients with MPN transduced with shRNA targeting TP53 or shCTR. n = 5 patients, three independent experiments. Barplot indicates mean ± s.e.m. and two-tailed paired t-test P value (related to Extended Data Fig. 5n). k, Schematic representation of the key analytical steps to derive a 44-gene TP53-LSC sAML signature. l, Kaplan-Meier analysis of AML patients (n = 322) from the BeatAML cohort stratified by p53-LSC signature score (high, above median; low, below median) derived in k (related to Extended Data Fig. 6). P indicates log-rank test P value and HR, hazard ratio. All boxplots represent the median, first and third quartiles, and whiskers correspond to 1.5 times the interquartile range; 'P' indicates Wilcoxon rank sum two-sided test P value in panels e,f,h. inflammation-associated changes in blood cell parameters, including anemia, leucopenia and thrombocytopenia (Extended Data Fig. 10h-j). Similar to the Vav-iCre model, poly(I:C) treatment promoted the selection of myeloid Trp53-mutant cells in the PB (Extended Data Fig. 10k), with a myeloid bias induced by the inflammatory stimulus in PB leukocytes specifically associated with Trp53-mutation (Fig. 5i,j and Extended Data Fig. 10l). Analysis of HSPCs showed the expected selection for Trp53-mutant HSCs and LSKs following Poly(I:C) treatment (Extended Data Fig. 10m). Numbers of WT competitor erythroid progenitors were reduced upon poly(I:C) treatment as expected 38 , whereas Trp53-mutation was associated with an increase in erythroid progenitors that was not impacted by inflammation ( Fig. 5k and Extended Data Fig. 10n) in line with the erythroid bias detected in patient samples. Finally, to determine the mechanisms by which inflammation might promote a fitness advantage for Trp53-mutated cells, we performed cell cycle and apoptosis    Table 3). b, Projection of TP53-WT (n = 880) pre-LSCs on HD (left) and MF (right) hematopoietic hierarchy (related to Fig. 2c and Extended Data Fig. 5d). c, Immunophenotype of Lin -CD34 + CD38 − cells from four representative sAML patients colored by genotype. Lin -CD34 + CD38 − CD90 + CD45RA − cells (HSCs) were enriched using the sorting strategy outlined in Extended Data Fig. 2a. d, scVelo analysis of differentiation trajectories of Lin -CD34 + cells from one HD (left) and two representative TP53-sAML patients (middle and right). Insets show HSC or pre-LSCs clusters. e-g, Scores of HSC (e), WNT β-catenin signaling (f) and cell-cycle (g) associated transcription in Lin -CD34 + CD38 − cells from HDs (n = 730 cells), MF (n = 1,106 cells) and pre-LSCs from TP53-sAML patients (n = 880 cells). Boxplots represent the median, first and third quartiles, and whiskers correspond to 1.5 times the interquartile range; the white square indicates the mean for each group. P indicates the Wilcoxon rank sum test P value. h-j, Functional analysis of pre-LSCs. Schematic representation of HSC in vitro assays (h), LTC-IC colony forming unit activity (i) and short-term in vitro liquid culture clonogenicity (j) of HSC from HDs (n = 4), MF (n = 3) and pre-LSCs from TP53-sAML patients (n = 3, samples used (IF0131, IF0391 and GR001) were known to have TP53-WT pre-LSC in the HSC compartment). Violin plot indicates points' density; dashed lines, median and quartiles, two independent experiments (i); barplot indicates mean ± s.e.m., three independent experiments with 30 colonies plated per experiment (j). P indicates two-tailed t-test P value. k, GSEA analysis of HALLMARK inflammatory pathways in pre-LSCs compared to HDs; positive NES in the heatmap represents significant (FDR q value < 0.25) enrichment in pre-LSCs, values indicate NES for each pathway.

Trp53-mutant HSPC
As exit from dormancy promotes DNA-damage-induced HSPCs attrition 41 , we reasoned that Trp53 mutation might rescue HSPCs that acquire DNA damage (and would otherwise undergo apoptosis) driven by chronic inflammation-associated proliferative stress. To explore this possibility, we carried out multiplex fluorescence in situ hybridization (M-FISH) karyotype analysis of Trp53 +/+ LSKs expanded in vitro from mice following poly(I:C) treatment and Trp53 R172H/+ LSKs from mice with or without poly(I:C) treatment. WT competitor LSK-derived cells from poly(I:C) treated mice were karyotypically normal. In contrast, we observed a striking increase in the frequency and number of karyotypic abnormalities in Trp53-mutated LSK-derived cells upon poly(I:C) treatment ( Fig. 6a-d). Collectively, these results support a model whereby chronic inflammation promotes the survival and genetic evolution of TP53-mutated cells while suppressing WT hematopoiesis, ultimately promoting the clonal expansion of TP53-mutant HSPCs (Fig. 6e).

Discussion
Here we unravel multilayered genetic, cellular and molecular intratumoral heterogeneity in TP53 mutation-driven disease transformation through single-cell multi-omic analysis. Allelic resolution genotyping of leukemic HSPCs revealed a strong selective pressure for gain of TP53 missense mutation, loss of the TP53-WT allele and acquisition of complex CNAs, including cases with parallel genetic evolution during TP53-sAML LSC expansion. Despite the known dominant negative and/or gain of function effect of certain TP53 mutations 28,42 , loss of the TP53-WT allele, a genetic event associated with a particularly dismal prognosis 2 , confers an additional fitness advantage to TP53-sAML LSCs. As CNA were universally present in TP53-sAML with a very high clonal burden, it is not possible, even with high-resolution single-cell analyses, to disentangle the impact of TP53-multi-hit mutation versus the effects of patient-specific CNA that were inextricably linked in all patients analyzed.
Three distinct clusters of HSPCs were identified in TP53-sAML, including one characterized by overexpression of erythroid genes, of particular note as erythroleukemia is a rare entity, associated with adverse outcomes and TP53 mutation 43,44 . Analysis of a large AML cohort also revealed overexpression of erythroid genes as a more widespread phenomenon in TP53-mutant AML, with disrupted balance of   Article https://doi.org/10.1038/s41588-023-01480-1 GATA1 and CEBPA expression. CEBPA knockout or mutation is reported to cause a myeloid to erythroid lineage switch with increased expression of GATA1 (refs. 29,30) and, in addition, GATA1 interacts with and inhibits p53 (ref. 45). Notably, our data do not distinguish whether this lineage-switch is primarily an instructive versus permissive effect of TP53-mutation 46 . A second 'TP53-sAML LSC' cluster allowed us to identify a p53LSC-signature, which can predict outcomes in AML independently of TP53 status. This powerful approach could be more broadly applied in cancer, whereby single multi-omic cell-derived gene scores can be used to stratify larger patient cohorts using bulk gene expression data.
A third TP53 WT 'pre-LSC' HSPC cluster was characterized by quiescence signatures and defective differentiation, reflecting the impaired hematopoiesis observed in patients with TP53-sAML. Through the integration of single-cell multi-omic analysis with in vitro and in vivo functional assays, we show that TP53-WT pre-LSCs are cell-extrinsically suppressed while chronic inflammation promotes the fitness advantage of TP53-mutant cells, ultimately leading to clonal selection (Fig. 6e). Inflammation is a cardinal regulator of HSC function with many effects on HSC fate and function 47 , including proliferation-induced DNA-damage and depletion of HSCs 41 . There is emerging evidence that clonal HSCs can become inflammation-adapted 47-49 and by altering the response to inflammatory challenges, mutations can thus confer a fitness advantage to HSCs. Here we demonstrate a hitherto unrecognized effect of TP53 mutations, which conferred a marked fitness advantage to HSPCs in the presence of chronic inflammation induced with both poly(I:C) as well as LPS. We provide evidence that TP53-mutant HSPCs showing dysregulated inflammation-associated gene expression are enriched in patients who will develop TP53-sAML. We propose that HSCs that would otherwise undergo inflammation-associated and DNA-damage-induced attrition are rescued by TP53 mutation, ultimately leading to the accumulation of HSCs that have acquired DNA damage, thus promoting genetic evolution that underlies disease progression. This hypothesis was strongly supported through in vivo experiments in which inflammation promoted genetic evolution of Trp53-mutant mouse HSPCs. Further studies are required to characterize the key inflammatory mediators and molecular mechanisms involved, which we believe are unlikely to be restricted to a single axis, with a myriad of inflammatory mediators overexpressed in MPN 50 . Furthermore, loss of the Trp53-WT allele confers an additional fitness advantage to Trp53-mutant HSPC following DNA damage as previously described 28 , providing an explanation for the selection for multihit TP53-mutant clones observed in patients. Consequently, we believe that approaches that target the inflammatory state, rather than a specific cytokine, are likely to be required to restrain disease progression, as reported for bromodomain inhibitors, which, when combined with JAK2 inhibition, markedly reduce the serum levels of inflammatory cytokines 51 . Collectively, our findings provide a crucial conceptual advance relating to the interplay between genetic and nongenetic determinants of TP53-mutation-associated disease transformation. This will facilitate the development of early detection and treatment strategies for TP53-mutant leukemia. Because TP53 is the most commonly mutated gene in human cancer 3,52 , we anticipate that these findings will be of broader relevance to other cancer types.

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41588-023-01480-1.  , according to the manufacturer's protocols. Raw sequence data in FASTQ format were analyzed using the following variant callers and as previously described 16,53 : BWA v-0.7.12 (read alignment); Picard-tools (marking duplicates); samtools v-1.2; v-1.139 (BAM file creation); GATK HaplotypeCaller v-3.4-46 GRVC v-1.1; snpEff v-4.0 (variant calling). Run quality control included %DP_100X (>95%), %DP_200X (>90%), number of reads per sample and % reads q30 forward and reverse (>85%), read quality mean (>30) and percentage of mapped reads (>75%). A minimum of ten reads was required for variant calling. Results were analyzed after alignment of the reads using two dedicated pipelines, SOPHiA DDM (Sophia Genetics) and an in-house software GRIO-Dx. All pathogenic variants were manually checked using Integrative Genomics Viewer software. The analysis is presented in Extended Data Figs. 1a and 8a.
Pathogenic scores for each TP53 variant (Extended Data Fig. 8e) were derived from the Catalog of Somatic Mutations in Cancer using the FATHMM-MKL algorithm. The FATHMM-MKL algorithm integrates functional annotations from ENCODE with nucleotide-based hidden Markov models to predict whether a somatic mutation is likely to have functional, molecular and phenotypic consequences. Scores greater than 0.7 indicate that a somatic mutation is likely pathogenic, while scores less than 0.5 indicate a neutral classification.
The type and location of TP53 mutations from this study, de novo AML patients and CHIP individuals represented in Extended Data

Sanger sequencing of patient-associated mutations in PDX models
Genomic DNA from PDX sorted populations (LMPP: hCD45 + Lin -CD34 + C D38 − CD45RA + CD90 − and GMP: hCD45 + Lin -CD34 + CD38 + CD45RA + CD123 + ) was extracted using QIAamp DNA Mini Kit (Qiagen). Sanger sequencing was performed with forward or reverse primers (Supplementary   Table 6a) targeting mutations identified by targeted bulk sequencing in the corresponding primary samples using Mix2seq kit (Eurofins Genomics) and sequences were analyzed with the ApE editor.

SNP array sample preparation, copy number variant and loss of heterozygosity analysis
Bulk genomic DNA from patients' MNCs was isolated using DNeasy Blood & Tissue Kit (Qiagen) as per the manufacturer's instructions. 250 ng of gDNA was used for hybridization on an Illumina Infinium OmniExpress v1.3 BeadChips platform.
To call mosaic copy number events in primary patient samples, genotyping intensity data generated were analyzed using the Illumina Infinium OmniExpress v1.3 BeadChips platform. Haplotype phasing, calculation of log R ratio (LRR) and B-allele frequency (BAF), and calling of mosaic events were performed using MoChA WDL pipeline v2021-01-20 (MoChA: a BCFtools extension to call mosaic chromosomal alterations starting from phased VCF files with either BAF and LRR or allelic depth) as previously described 59,60 . In brief, MoChA comprises the following steps: (1) filtering of constitutional duplications; (2) use of a parameterized hidden Markov model to evaluate the phased BAF for variants on a per-chromosome basis; (3) deploying a likelihood ratio test to call events; (4) defining event boundaries; (5) calling copy number and (6) estimating the cell fraction of mosaic events. A series of stringent filtering steps were applied to reduce the rate of false positive calls. To eliminate possible constitutional and germline duplications, we excluded calls with lod_baf_phase <10, those with length <500 kbp and rel_cov >2.5, and any gains with estimated cell fraction >80%, log(R) > 0.5 or length <24 Mb. Given that interstitial LOH are rare and likely artefactual, all LOH events <8 Mb were filtered 59 . Events on genomic regions reported to be prone to recurrent artifact 59 (chr6 < 58 Mb, chr7 > 61 Mb and chr2 > 50 Mb) were also filtered, and those where manual inspection demonstrated noise or sparsity in the array.
To find common genomic lesions on a focal and arm level, Infinium OmniExpress arrays were initially processed with Illumina Genome Studio v2.0.4. Following this, LRR data were extracted for all probes and array annotation was obtained from Illumina (InfiniumOmniExpress-24v1-3_A1). LRR data were then smoothed and segmentation called using the CBS algorithm from the DNACopy 61,62 v1.60.0 package in R. A minimum number of five probes was required to call a segment, and segments were analyzed using GenomicRanges 63 v1.38.0. Definitions of amplification, gain, loss and deletion events were as outlined in ref. 64. Segmentation data were then analyzed in GISTIC 65 v2.023.

Fluorescence-activated cell sorting (FACS) and single-cell isolation
Single-cell FACS-sorting was performed as previously described 16  Briefly, single cells were directly sorted into 384-well plates containing 2.07 μl of TARGET-seq lysis buffer 66 . Lineage-CD34 + cells were indexed for CD38, CD90, CD45RA, CD123 and CD117 markers, which allowed us to record the fluorescence levels of each marker for each single cell. The 7-aminoactinomycin D (7-AAD) was used for dead cell exclusion. Flow cytometry profiles of the human HSPC compartment (Extended Data Figs. 2 and 9) were analyzed using FlowJo software (version 10.1, BD Biosciences).

Single-cell TARGET-seq cDNA synthesis
Reverse transcription (RT) and PCR steps were performed as previously described 66 . Briefly, SMARTScribe (Takara, 639537) retrotranscriptase, RNAse inhibitor (Takara, 2313A) and a template-switching oligo were added to the cell lysate to perform the retrotranscription step. Immediately after, a PCR mix comprised of SeqAMP (Takara, 638509) and ISPCR primer (binding to a common adapter sequence in all cDNA molecules) was used for the PCR step with 24 cycles of amplification. Target-specific primers spanning patient-specific mutations were also added to RT and PCR steps (Supplementary Table 6a). After cDNA synthesis, cDNA from up to 384 single-cell libraries was pooled, purified using Ampure XP Beads (0.6:1 beads to cDNA ratio; Beckman Coulter) and resuspended in a final volume of 50 μl of EB buffer (Qiagen). The quality of cDNA traces was checked using a high-sensitivity DNA kit in a Bioanalyzer instrument (Agilent Technologies).

Whole transcriptome library preparation and sequencing
Pooled and bead-purified cDNA libraries were diluted to 0.2 ng μl −1 and used for tagmentation-based library preparation using a custom P5 primer and 14 cycles of PCR amplification 66 . Each indexed library was purified twice with Ampure XP beads (0.7:1 beads to cDNA ratio), quantified using Qubit dsDNA HS Assay Kit (Invitrogen, Q32854) and diluted to 4 nM. Libraries were sequenced on a HiSeq4000, HiSeqX or NextSeq instrument using a custom sequencing primer for read1 (P5_seq: GCCTGTCCGCGGAAGCAGT GGTATCAACGCAGAGTTGC*T, PAGE purified) with the following sequencing configuration: 15 bp R1; 8 bp index read; 69 bp R2 (NextSeq) or 150 bp R1; 8 bp index read; 150 bp R2 (HiSeq).

TARGET-seq single-cell genotyping
After RT-PCR, cDNA + amplicon mix was diluted 1:2 by adding 6.25 μl of DNAse/RNAse free water to each well of 384-well plate. Subsequently, a 1.5 μl aliquot from each single-cell derived library was used as input to generate a targeted and Illumina-compatible library for single-cell genotyping 66 . In the first PCR step, target-specific primers containing a plate-specific barcode (Supplementary Table 6b) were used to amplify the target regions of interest. In a subsequent PCR step, Illumina compatible adapters (PE1/PE2) containing single-direction indexes (Access Array Barcode Library for Illumina Sequencers-384, Single Direction, Fluidigm) were attached to pre-amplified amplicons, generating single-cell barcoded libraries. Amplicons from up to 3,072 libraries were pooled and purified with Ampure XP beads (0.8:1 ratio beads to product; Beckman Coulter). These steps were performed using Biomek FxP (Beckman Coulter), Mosquito (TTP Labtech) and VIAFLO 96/384 (INTEGRA Biosciences) liquid handling platforms. Purified pools were quantified using Qubit dsDNA HS Assay Kit (Invitrogen, Q32854) and diluted to a final concentration of 4 nM. Libraries were sequenced on a MiSeq or NextSeq instrument using custom sequencing primers as previously described 66 with the following sequencing configuration: 150 bp R1; 10 bp index read; 150 bp R2.

Targeted single-cell genotyping analysis
Data preprocessing. For each cell, the FASTQ file containing both targeted gDNA and cDNA-derived sequencing reads was aligned to the human reference genome (GRCh37/hg19) using Burrows-Wheeler Aligner (BWA v0.7.17) and STAR 67 (v2.6.1d). Custom perl scripts were used to demultiplex the gDNA and mRNA reads in the BAM file into separate SAM files based on targeted-sequencing primer coordinates (https://github.com/albarmeira/TARGET-seq). Next, Samtools 68 (v1.9) was used to concatenate the BAM header to the resulting SAM files before reconverting the SAM file to BAM format, which was subsequently sorted by genomic coordinates and indexed. Both gDNA and mRNA reads were tagged with the cell's unique identifier using Picard (v2.3.0) 'AddOrReplaceReadGroups' and duplicate reads were subsequently marked using Picard 'MarkDuplicates'. The sequencing reads overhanging into intronic regions in the mRNA reads were additionally hard-clipped using GATK (v4.1.2.0) SplitNCigarReads 69,70 .
Variant calling. Variants were called from the processed BAM files using GATK Mutect2 with the options (--tumor-lod-to-emit 2.0 --disable-read-filter NotDuplicateReadFilter --max-reads-peralignment-start) to increase the sensitivity of detecting low-frequency variants. The frequency of each nucleotide (A, C, G, T) and indels at each predefined variant site were also called using a Samtools mpileup as previously described 16 . Lastly, the coverage at each predefined variant site was computed using Bedtools 71 (v2.27.1).
To determine the coverage threshold of detection for each variant site, the coverage for 'blank' controls (empty wells) was first tabulated. A cut-off coverage outlier value was computed as having a coverage exceeding 1.5 times the length of the interquartile range from the 75th percentile. Next, a value of 30 was added to this outlier value to yield the final coverage threshold to be used for genotype assignment.
Genotype assignment. For each predefined variant site, the number of reads representing the reference and alternative (variant) alleles for indels (insertion and deletions) and single nucleotide variants (SNVs) were tabulated from the outputs of GATK Mutect2 and Samtools mpileup, respectively.
Here a genotype scoring system was introduced to assign each variant site into one of the following three possible genotypes: WT, heterozygous or homozygous mutant. Chi-square ( 2 ) test was first used to compare the observed frequency of reference and alternative alleles against the expected fraction of reference and alternative alleles corresponding to the three genotypes. The expected fraction of the reference alleles was 0.999, 0.5 and 0.001, and the expected fraction of the alternative alleles was 0.001, 0.5 and 0.999 for WT, heterozygous and homozygous mutant genotype, respectively. The 2 statistics were then tabulated for each fitted model and converted to genotype scores using the following formula: Score genotype = 1 log 10 (χ 2 + 1) Article https://doi.org/10.1038/s41588-023-01480-1 The genotype assigned to the variant site was based on the genotype model with the highest score.
Next, the variant (alternative) allele frequency (VAF) was computed and variant sites with 2 < VAF < 4 and 96 < VAF < 98 were reassigned as 'ambiguous'. For cells with no variants detected at the specific variant sites by the mutation callers (either due to the absence of the variants, that is WT genotype, or that such variants were present below the detection limit), a 'WT' genotype was assigned to those cells with coverage above the specific threshold and 'low coverage' to those cells with coverage below such threshold.
Taken together, each variant site was assigned one of the five following genotypes: WT, heterozygous, homozygous mutant, ambiguous or low coverage. Variants with ambiguous or low cover age assignments for a particular cell were excluded from the analysis.

Computational reconstruction of clonal hierarchies
Genotypes for each single cell were recoded for input to SCITE in a manner inspired by ref. 72; each mutation in each gene was coded as two loci, representing two different alleles. In the first recorded loci, all homozygous calls from each mutation where coded as heterozygous genotype calls. In the second recorded loci, all heterozygous and homozygous genotype calls in the original mutation matrix were coded as homozygous reference (that is, WT) and heterozygous, respectively. For example, if for a certain mutation 0 represents WT status, 1 encodes heterozygous and 2 refers to homozygous status, these would be encoded as (0,0), (1,0) and (1,1), respectively, where the first term in the parenthesis corresponds to the first loci and the subsequent to the second loci.
Then, SCITE was used (git revision 2016b31, downloaded from https://github.com/cbg-ethz/SCITE.git; ref. 73) to sample 1,000 mutation trees from the posterior for every single-cell genotype matrix corresponding to a particular patient, where all possible mutation trees are equally likely a priori. For patients in which several disease time points were available, all time points were merged for SCITE analysis. Clone assignment. For every patient's most common posterior tree, we assigned every cell to the tree node that matches the genotype of that particular cell. If an exact match was not found, then for every tree node, the loss of assigning a cell to that node was calculated using the following loss function: where M is a confusion matrix generated across all loci of a cell in which the first index represents the genotype that was measured for that particular cell (1 = homozygous reference, 2 = heterozygous, 3 = homozygous alternate), and the second index represents the genotype implied by the tree node. ADO = 0.01 and FD = 0.001 were used. Every cell was assigned to the node with the lowest loss l. For the trees  Table 8.

Computational validation of TP53 bi-allelic status from single-cell targeted genotyping datasets
To further validate the bi-allelic status of TP53 mutations in our dataset, the patterns of ADO in TARGET-seq single-cell genotyping data from patients carrying at least two different TP53 mutations were investigated (n = 6; Extended Data Fig. 1j).
If TP53 mutations are bi-allelic, the expected number of WT/HOM and HOM/WT would be higher than HOM/HOM cells considering TARGET-seq expected ADO rates (1-5%).

Single-cell 3′-biased RNA-sequencing data preprocessing
FASTQ files for each single cell were generated using bcl2fastq (version 2.20) with default parameters and the following read configuration: Y8N*, I8, Y63N*. Read 1 corresponds to a cell-specific barcode, index read corresponds to an i7 index sequence from each cDNA pool and read 2 corresponds to the cDNA molecule. PolyA tails were trimmed from demultiplexed FASTQ files with TrimGalore (version 0.4.1). Reads were then aligned to the human genome (hg19) using STAR (version 2.4.2a), and counts for each gene were obtained with FeatureCounts (version 1.4.5-p1; options-primary). Counts were then normalized by dividing each gene count by the total library size of each cell and multiplying this value by the median library size of all cells processed, as implemented in the 'normalize_UMIs' function from the SingCellaR package 74 (version 1.2.1; https://github.com/supatt-lab/SingCellaR). A summary of the preprocessing pipeline can be found at https:// github.com/albarmeira/TARGET-seq-WTA.
Quality control was performed using the following parameters: number of genes detected >500, percentage of ERCC-derived reads <35%, percentage of mitochondrial reads <0.25% and percentage of unmapped reads <75%. Cells with less than 2,000 reads in batch1, 5,000 reads in batch2 and 20,000 reads in batch3 were further excluded. This QC step was performed independently for each sequencing batch owing to differences in sequencing depth (mean library size: 42,949 batch 1, 93,580 batch 2 and 171,393 batch 3). After these QC steps, 7,123 cells passed QC for batch 1, 5,779 for batch 2 and 6,319 for batch 3 (79.3%, Article https://doi.org/10.1038/s41588-023-01480-1 68.9% and 80.3% of cells processed, respectively). Then, 2,734 cells from a previously published study 16 corresponding to 8 MF patients and 2 normal donor controls were further integrated, encompassing a final dataset of 21,955 cells in total.

Identification of highly variable genes
Highly variable genes above technical noise were identified by fitting a gamma generalized linear model (GLM) of the log 2 (mean expression level) and coefficient of variation for each gene, using the 'get_vari-able_genes_by_fitting_GLM_model' from SingCellaR package and the following options: mean_expr_cutoff = 1, disp_zscore_cutoff = 0.1, quantile_genes_expr_for_fitting = 0.6 and quantile_genes_cv2_for_ fitting = 0.2. Those genes with a coefficient of variation above the fitted model and expression cut-off were selected for further analysis, excluding those annotated as ribosomal or mitochondrial genes.

CNA inference from single-cell transcriptomes
InferCNV (v1.0.4) was used to identify CNAs in single-cell transcriptomes 75 (https://github.com/broadinstitute/inferCNV/wiki). Briefly, inferCNV creates genomic bins from gene expression matrices and computes the average level of expression for each of these bins. The expression across each bin is then compared to a set of normal control cells, and CNAs are predicted using a hidden Markov model. For each patient, inferCNV was performed with the following parameters: 'cutoff = 0.1, denoise = T, HMM = T', compared to the same set of normal donor control cells (n = 992). To identify CNA subclones, inferCNV in analysis_mode = 'subclusters' was used. CNAs identified by inferCNV were manually curated by removing those with size <10 kb, merging adjacent CNA calls with identical CNA status into larger CNA intervals and comparing them to SNP-Array bulk CNA calls. Finally, to generate combined TARGET-seq single-cell genotyping and CNA-based clonal hierarchies, the CNA status from each inferCNV cluster was assigned to its predominant genotype.

Dimensionality reduction, data integration and clustering
PCA was performed using 'runPCA' function from the SingCellaR R package, and Force-directed graph analysis was subsequently performed using the 'runFA2_ForceDirectedGraph' with the top 30 PCA dimensions to generate the plots in Extended Data Fig. 4a.
For the analysis of patient IF0131 presented in Extended Data Fig. 3m, PCA was performed using 'runPCA' function from the SingCellaR R package and then UMAP was performed using the 'runUMAP' function with the top ten PCA dimensions and the following options: n.neighbors = 20, uwot.metric = 'correlation', uwot.min. dist = 0.30, n.seed = 1.
Integration of TARGET-seq single-cell transcriptomes from 10,459 cells corresponding to 14 TP53-sAML samples was performed using 'runHarmony' function implemented in the SingCellaR package, using the patient ID as covariate and the following options: n.dims.use = 20, harmony.theta = 1, n.seed = 1. Diffusion map analysis was performed using 'runDiffusionMap' with the integrative Harmony embeddings and the following parameters: n.dims.use = 20, n.neighbors = 5, distance = 'euclidean'. Signature scores were calculated using 'plot_diffu-sionmap_label_by_gene_set' to generate the plots in Figs. 2a and 3a. Only cells with assigned genotypes 'TP53 multihit' and 'TP53-WT' are shown.

Pseudotime trajectory analysis
Monocle3 (ref. 76; https://cole-trapnell-lab.github.io/monocle3/) was used to infer differentiation trajectories from single-cell transcriptomes. Raw UMI count matrix and clustering annotations were extracted from the SingCellaR object to build a Monocle3 'cds' object. Next, we retrieved the first two components of the diffusion map (DC1 and DC2), and the 'learn_graph' function was used to calculate the trajectory on the two-dimensional diffusion map, using TP53-WT preleukemic cell cluster as the root node. Pseudotime was calculated using 'order_cells' function and overlayed on the diffusion map embeddings to generate the plot in Fig. 2b.

Differential expression analysis
Differentially expressed genes from TARGET-seq datasets were identified using a combination of nonparametric Wilcoxon test, to compare the expression values for each group, and Fisher's exact test, to compare the frequency of expression for each group, as previously described 17 . Logged normalized counts were used as input for this comparison, including genes expressed in at least two cells. Combined P values were calculated using Fisher's method and adjusted P values were derived using Benjamini-Hochberg procedure. Significance level was set at P-adjusted < 0.05. For the analysis presented in Extended Data Fig. 4b and Supplementary Table 2, the top 100 differentially expressed genes with log 2 (FC) > 0.3 and at least 20% expressing cells are shown. Differentially expressed genes identified between TP53-multihit versus TP53-WT cells were further assessed for the enrichment of known p53 target genes (337 curated p53 target genes from ref. 77) for the analysis presented in Extended Data Fig. 4c. We assessed the extent of overlap of these gene lists using the R package GeneOverlap. The overlapping genes were further assessed for the enrichment of p53-related pathways using the R package clusterProfiler.
For the analysis presented in Fig. 2k,l, only genes overexpressed in TP53 multihit cells and log 2 (FC) > 0.75 were included; for Fig. 4d, only those with log 2 (FC) > 1 were considered. Violin plots ( Fig. 4e and Extended Data Fig. 9n) from selected differentially expressed genes were generated using 'ggplot2' package in R.
For analysis involving >600 cells per group ( Fig. 3k and Extended Data Figs. 4d and 9o), GSEA was performed with 'identifyGSEAPre-rankedGene' function from SingCellaR R package with default options. Briefly, differential expression analysis was performed between two cell populations using the Wilcoxon rank sum test, and the resulting P values were adjusted for multiple testing using the Benjamini-Hochberg approach. Before the differential expression analysis, down-sampling was performed so that both cell populations had the same number of cells. Next, −log 10 (P value) transformation was performed and the resulting P values were multiplied by +1 or −1 if the corresponding log 2 (FC) was >0.1 or <−0.1, respectively. The gene list was ranked using this statistic in ascending order and used as input for GSEA analysis using 'fgsea' function from the fgsea R package with default options.

Projection of single-cell transcriptomes
A previously published human hematopoietic atlas was downloaded from https://github.com/GreenleafLab/MPAL-Single-Cell-2019 and used as a normal hematopoietic reference to project TP53-sAML and de novo AML transcriptions using Latent Semantic Index Projection 24 . Common genes to all datasets were selected, and then TP53-sAML or previously published de novo AML cells 25 were projected using 'projectLSI' function for the analysis presented in Fig. 2c,d. A previously published human MF atlas 78 was used as a reference to project TP53-sAML multihit cells in the analysis presented in Extended Data Fig. 5d,e, using previously defined force-directed graph embeddings. Article https://doi.org/10.1038/s41588-023-01480-1

Velocyto analysis
Loom files were generated for each single cell using velocyto (v0.17.13) with options -c and -U, to indicate that each BAM represents an independent cell and reads are counted instead of molecules (UMIs), respectively 79 . The individual loom files were subsequently merged using the combine function from the loompy Python module.
HDs with at least 300 cells with RNA-sequencing data and patients with at least 300 cells consisting of >50 preleukemic (TP53 WT) cells and >50 TP53 multihit cells were included for analysis. For each individual, the Seurat object was created from the merged loom file and processed for downstream RNA-velocity analysis 80 . Specifically, for each patient, the spliced RNA counts were normalized using regularized negative binomial regression with the SCTransform function 81 . Next, linear dimension reduction was performed using RunPCA function and the first 30 principal components were further used to perform nonlinear dimension reduction using the RunUMAP function. Ninety-six multiple rate kinetics (MURK) genes previously shown to possess coordinated step-change in transcription and hence violate the assumptions behind scVelo were removed 82 . The processed and MURK gene-filtered Seurat object was then saved in h5Seurat format using the SaveH5Seurat function and finally converted to h5ad format using the 'Convert' function.
AnnData object was created from the h5ad file using the scvelo python module for RNA velocity analysis 83 . Highly variable genes were identified and the corresponding spliced and unspliced RNA counts were normalized and log 2 -transformed using the scvelo.pp.filter_ and_normalize function. Next, the first-and second-order moments were computed for velocity estimation using the scvelo.pp.moments function. The velocities (directionalities) were computed based on the stochastic model as defined in the scvelo.t1.velocity function, and the velocities were subsequently projected on the UMAP embeddings generated from Seurat above. Finally, the UMAP embeddings were annotated using the HSPC and erythroid lineage signature scores 74 and TP53 mutation status. For each cell, the cell lineage signature score was computed using the average SCTransform expression values of the individual cell lineage genes.

Analysis of bulk BeatAML and TCGA gene expression datasets
Data retrieval and preprocessing. Two publicly available AML cohorts with genetic mutation and RNA-sequencing data available were used to validate findings from our single-cell analysis, namely BeatAML 26 and TCGA 27 . Gene expression values in FPKM (fragments per kilobase of transcript per million mapped reads) were retrieved from the National Cancer Institute (NIH) Genomic Data Commons (GDC) 84 . Gene expression values were then offset by 1 and log 2 transformed. TP53 point mutation status was retrieved from the cBio Cancer Genomics Portal (cBioPortal) 85 . Clinical data including survival data for BeatAML and TCGA were retrieved from the BeatAML data viewer (Vizome) and NIH GDC, respectively.
We selected samples from the BeatAML cohort with an AML diag nosis (540 de novo AML and 96 secondary AML) collected within 1 month of the patient's enrollment in the study, and with both TP53 mutation status and RNA-sequencing data available. For patients for whom multiple samples were available, samples were collapsed to obtain patient-level data. Specifically, the mean gene expression value for each gene from multiple samples was used to represent patient-level gene expression value. Furthermore, patients with at least one sample with a TP53 mutation were considered TP53-mutant. Analysis of TP53 VAF and reported karyotypic abnormalities indicated that the vast majority of patients could be classified as 'multi-hit', and therefore patients were classified as TP53-mutant or WT without taking into account TP53 allelic status. In total, 360 patients with TP53 mutation status (329 TP53 WT and 31 TP53 mutant) and RNAsequencing data available were included for analysis. Of these, 322 patients had concomitant survival data available (294 TP53 WT and 28 TP53 mutant).
The TCGA cohort consisted of 200 de novo AML patients represented by one sample each, of which 151 patients had TP53 mutation status (140 TP53 WT and 11 TP53 mutant) and RNA-sequencing data available, and were included for analysis. Of these, 132 patients had concomitant survival data available (124 TP53 WT and 8 TP53 mutant).
Cell lineage gene signature scores. For each sample, a given cell lineage gene signature score was computed as the mean expression values of the individual genes belonging to the cell lineage gene signature. Here the gene signature scores for two cell lineages were computed, namely myeloid and erythroid populations. Two gene sets for each cell lineage were compiled. The first gene set was based on cell lineage markers previously reported in the literature, whereas the second gene set was based on cell lineage markers derived from analyzing a published single-cell dataset 24 . Genes from each score are described in Supplementary Table 3.
For the former approach, six erythroid genes (KLF1, GATA1, ZFPM1, GATA2, GYPA and TFRC; Fig. 2e and Extended Data Figs. 5k,m and seven myeloid genes (FLI1, SPI1, CEBPA, CEBPB, CD33, MPO and IRF8; Fig. 2f) were identified. For the latter approach, the expression values of erythroid and myeloid cell clusters were first compared separately against all other cell clusters using Wilcoxon ranked sum test. The erythroid cluster consisted of the early and late erythroid populations, while the myeloid cluster consisted of granulocyte, monocyte and dendritic cell populations. Erythroid and myeloid-specific gene signatures were defined as genes having FDR values <0.05 and log 2 (FC) > 0.5 in ≥20 and 17 comparisons, respectively. In total, 100 erythroid genes and 135 myeloid genes were identified from this single-cell dataset (Supplementary Table 3) and were used to compute the scores presented in Extended Data Fig. 5g-j. TP53 target gene score. Genes downregulated in TP53-multihit compared to TP53-WT cells (defined as per 'differential expression analysis' section above; related to Extended Data Fig. 4b) and p53 targets positively regulated from ref. 77 were used to compute a TP53-target gene score presented in Extended Data Fig. 5k.

Prognostic signatures and Cox-regression survival models
LSC signature score. The 17-gene LSC17 gene set was retrieved from ref. 31. For each sample, the LSC17 score was defined as the linear combination of gene expression values weighted by their respective regression coefficients.
To identify TP53-sAML LSC signatures from our TARGET single-cell dataset, two different approaches were used. First, differentially expressed genes were identified as overexpressed in all Lin -CD34 + TP53-multihit cells regardless of their transcriptional classification (p53-all-cells) versus MF, HD and TP53-WT preleukemic cells; this gene set consists of 29 genes (Supplementary Table 4a). For the second approach, the same analysis was performed, but TP53-multihit cells transcriptionally defined as LSCs (falling in the LSC-like cluster; Fig. 2a, middle) were specifically selected; this gene-set is comprised of 51 genes (p53LSC; Supplementary Table 4a).
Next, lasso cox regression with tenfold cross-validation implemented in the glmnet R package (version 4.1-1) was used to identify p53-all-cells and p53-LSC genes that were associated with survival and to estimate their respective regression coefficients 86 . Specifically, Harrel's concordance measure (C-index) was used to assess the performance of each fitted model during cross-validation. The best model was defined as the fitted model with the highest C-index. Subsequently, the coefficient for each gene estimated using the best model was used to compute the gene signature scores. Only genes with nonzero coefficient values were included in the final gene set. In total, 9 and 44 genes were retained from the p53-all-cells and p53-LSC gene sets, respectively. For each sample, the gene signature score for each gene set was defined as the linear combination of gene expression values Article https://doi.org/10.1038/s41588-023-01480-1 weighted by their respective regression coefficients 31, 86 . The list of p53-LSC and p53-all-cells gene signatures is provided in Supplementary  Table 4b.
Survival analysis. For each gene expression signature, patients were first split using the median gene expression signature score. This resulted in two groups of patients, namely patients with high expression scores (greater than or equal to the median) and patients with low expression scores (lower than the median), exemplified in Extended Data Fig. 6a,b. The Cox proportional hazards regression model implemented by the survival R package (version 3.5-5) was fitted to estimate the hazard ratio associated with each feature. The log-rank test was used to test the differences between survival curves. The features analyzed here were LSC17, p53-all-cells and p53-LSC signatures. Patients with low gene expression signature scores (below median) and patients with TP53 WT status were specified as the reference groups in the model. Kaplan-Meier curves were plotted using the survminer R package (version 0.4.9) to visualize the probability of survival and sample size at a respective time interval. The medium was changed weekly and after 6 weeks of culture, cells were washed in IMDM + 20% fetal calf serum (FCS) and plated into 1.4 ml of cytokine-rich methylcellulose (StemCell Technologies, MethoCult H4435). Colonies were scored 14 days later under an inverted microscope, and each colony was classified according to its morphology as BFU-E (Burst-forming unit erythroid), CFU-G (Colony Forming Unit granulocyte), CFU-GM (granulocyte-macrophage), CFU-M (macrophage) or CFU-GEMM (granulocyte, erythrocyte, macrophage and megakaryocyte). Selected colonies were used for cytospin and genotyping as outlined below.

Short
LTC-IC colony genotyping. LTC-IC colonies were picked from methylcellulose media, washed, resuspended in 10 μl of PBS and transferred to individual wells in a 96-well PCR plate. 15 μl of lysis buffer (Triton X-100 0.4%, Qiagen Protease 0.1 AU ml −1 ) was added to each well, and samples were incubated at 56 °C for 10 min and 72 °C for 20 min. A 3 μl aliquot from each lysate was used as input to generate a targeted and Illumina-compatible library for colony genotyping. The preparation of single-cell genotyping libraries involves three PCR steps. In the first PCR step, target-specific primers spanning each mutation of interest are used for amplification (Supplementary Table 6a); in the second PCR step, nested target-specific primers (Supplementary  Table 6b) attached to universal CS1/CS2 adapters (forward adapter-CS1: ACACTGACGACATGGTTCTACA; reverse adapter-CS2: TACG-GTAGCAGAGACTTGGTCT) further enrich for target regions; and in the third PCR step, Illumina-compatible adapters containing sample-specific barcodes are used to generate sequencing libraries.
On day 12 of the culture, cells were stained with the antibodies detailed in Supplementary Table 7c (combination indicated as Panel C). DAPI was used for dead cell exclusion before acquisition on a FACSCanto II (BD Biosciences) instrument and on a BD FACS Diva software (version 8.0.2). Analysis of FACS data was performed using Kaluza (version 2.1, Beckman Coulter) software.

Quantitative real-time PCR in shRNA experiments
In TP53 knockdown experiments, RNA from either CD34 + cells sorted after transduction or bulk cells at day 12 of culture was extracted using Direct-Zol RNA MicroPrep Kit (Zymo Research) and reverse transcription was performed with SuperScript Vilo cDNA Synthesis Kit (Invitrogen). Quantitative RT-PCR was performed on a 7,500 real-time PCR Machine using SYBR-Green PCR Master Mix (Applied Biosystems). Expression levels were normalized to PPIA (housekeeping gene). Primers used are listed in Supplementary Table 6c.

Xenotransplantation
Purified CD34 + cells from AML patients were transplanted via retroorbital vein injection in sublethally irradiated (1.5 Gy) NOD.CB17-Prkdcscid IL2rgtm1/Bcgen mice (B-NDG, Envigo) (female, 8 weeks old, n = 1 for IF0131, n = 3 for GR001). All experiments were approved by the French National Ethical Committee on Animal Care (2020-007-23589). Blood cell counts were performed monthly by submandibular sampling of mice with blood chimerism assessed by flow cytometry using hCD34, hCD45 and mCD45 antibodies (Supplementary Table 7b, PDX PB panel). The following endpoints were applied: >50% of human blast cells in the blood, abnormalities of blood cell count (hemoglobin <7 g dl −1 , platelets <150 × 10 9 l −1 or white blood cells >60 × 10 9 l −1 ), altered general conditions or >15% of weight loss. At sacrifice, BM was stained with the antibodies listed in Supplementary Table 7b (PDX BM panel) and HSPC fractions were sorted on an Influx Cell sorter (BD Biosciences).

Evaluation of cell morphology
Cell morphology from PDX models (Extended Data Fig. 3d) and in vitro LTC-IC cultures (Extended Data Fig. 7f) was assessed after cytospin of 50-100,000 cells onto a glass slide (5 min at 500 r.p.m.) and May-Grünwald Giemsa staining, according to standard protocols. Images were obtained using an AxioPhot microscope (Zeiss).
Poly(I:C) and LPS were administered during weeks 6-8, 10-12 and 14-16 (setting 1), or during weeks 7-8, 11-12 and 15-16 (setting 2) post-transplantation. Within each round, injections were spaced one or two days apart. Blood cell counts and analysis of PB chimerism along with mature lymphoid and myeloid populations were performed every 2-4 weeks by submandibular sampling of mice, while BM chimerism and HSPC populations were analyzed 18-20 weeks after transplantation. The antibodies used are detailed in Supplementary  Table 7d. For dead cell exclusion, 7-AAD (Sigma-Aldrich) or DAPI (BD Biosciences) were used. FACS analyses were carried out on BD Fortessa or BD Fortessa X20 (BD Biosciences) and profiles were later analyzed using FlowJo (version 10.1, BD Biosciences) or Kaluza (version 2.1, Beckman Coulter) software.

LSK apoptosis and cell cycle
BM LSK cells (setting 2) were stained with Annexin-V and DAPI in Annexin V binding buffer 1X (BD Biosciences) for apoptosis analysis. BM LSK cell cycle was assessed by flow cytometry using Ki-67 and DAPI staining, after fixation and permeabilization (BD Cytofix/Cytoperm and Permeabilization Buffer Plus, BD Biosciences).
The analysis of the M-FISH hybridized cells was blinded. The cells on each slide were scored for the presence of structural aberrations (translocations, and/or derivative chromosomes and fragments) and/or numerical abnormalities. The presence of more than 40 chromosomes per cell was considered a numerical abnormality, except for cases where it could clearly be attributed to the presence of adjacent metaphases. Chromosome counts lower than 40 were not scored as numerical abnormalities for the impossibility to rule out technical issues (that is, metaphases bursting at the hypotonic step). We scored as follows: translocations and presence of one chromosome plus one or more extra chromosomal fragment(s)/derivative(s) as 'structural abnormalities' (except for sex chromosomes); presence of two chromosomes (or one in case of sex chromosomes) plus one or more extra chromosomal fragment(s)/derivative(s) as 'partial chromosome gains'; two chromosomes (or one in case of sex chromosomes) plus one or more extra chromosomes as 'whole chromosome gains'; two chromosomes plus two chromosomes with at least five different chromosomes present in number = 4n as 'tetraploidy or sub-tetraploidy'. Counts of numbers of karyotypic aberrations per cell were performed scoring every type of event occurring on one chromosome as a single event (that is, presence of four chromosomes is counted as one aberration).

IFNγ ELISA assay
WT mice were injected intraperitoneally with a single dose of 200 μg poly(I:C) and spleens were collected from injected mice and nontreated controls 4 h after injection. Spleens were processed into a single-cell suspension in 200 μl PBS, spun down at 500 g for 5 min and supernatant was collected and used as spleen serum. IFNγ levels were assessed using mouse IFNγ Quantikine ELISA assay (R&D Systems, MIF00) following the manufacturer's instructions. Optical densities of 450 nm and 540 nm were determined using Clariostar microplate reader (BMG Labtech).

Statistical analysis
Statistical analyses are detailed in figure legends (Figs. 2-6 and Extended Data Figs. 4-10) and performed using GraphPad Prism software (7 or later version) or R (version 3.6.1 and 4.0.5) software. The number of independent experiments, donors and replicates for each experiment are detailed in figure legends.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Nature Genetics
Article https://doi.org/10.1038/s41588-023-01480-1 Extended Data Fig. 2 | TARGET-seq sorting strategy and phylogenetic reconstruction of clonal hierarchies in TP53-sAML patients using a Bayesian model. a, Sorting strategy for TARGET-seq: Lineage -CD34 + cells were sorted into 384-well plates for subsequent library preparation. Selective enrichment of immunophenotypically defined populations (HSC: CD38 − CD90 + CD45RA − ; CD117 − ) is indicated with orange boxes. b-o, In each panel, corresponding to a different patient sample, the phylogenetic tree computed using SCITE is shown on the left and the number of cells mapping to each clone on the right. "pp" indicates the posterior probability of each consensus mutation tree, and the probability of each genotype transition is indicated inside each square for each mutation. The size of the circles is proportional to the size of each clone and is colored according to the genotype indicated. The number of cells mapping to each clone is indicated in each circle and the type of TP53 clonal evolution (biallelic mutation, hemizygous, parallel or JAK2-negative) below each patient's ID. (*) indicates patients for which the high clonality of the sample prevented the faithful reconstruction of the order of mutation acquisition. Horizontal lines indicate mutation acquisition where none of the experimentally-detected clones matched that particular combination of mutations and therefore the order of mutations cannot be reliably determined. Due to selective enrichment of certain subpopulations of cells (a), the numbers of cells assigned to each subclone in this figure is not necessarily representative of overall clonal burden, and early clones are likely over-represented due to selective enrichment of preleukemic HSCs. In contrast, the relative subclone percentages displayed in Fig. 1 for the same patients have been corrected according to each populations' frequency in the Lin − CD34 + compartment. p, Schematic representation of the strategy to reconstruct integrated clonal hierarchies based on single-cell TARGET-seq genotyping and inferCNV transcriptomic-based CNAs. q, Representative example of combined mutation and CNA hierarchies for patient IF0131, in which three cytogenetically-distinct subclones were detected. Corresponding cytogenetic lesions detected at the bulk level through high-density SNP arrays are shown in the bottom panels.

Nature Genetics
Article https://doi.org/10.1038/s41588-023-01480-1 Extended Data Fig. 9 | Clonal evolution and molecular signatures of TP53-mutant patients at chronic phase. a-b, Flow cytometry profiles of the Lin − CD34 + HSPC compartment in two CP TP53-MPN patients without evidence of clinical transformation (a) and in a representative paired chronic phase (b, up; pre-TP53-sAML) and acute phase (b, bottom; TP53-sAML) sample (Related to Fig. 4a). c-d, Percentage of immunophenotypic HSPC populations in normal donors (n = 8), CP TP53-MPN (n = 4) and pre-TP53-AML patients (n = 5) (c) or in the 5 paired pre-TP53-AML and TP53-AML samples (d). None of the population frequencies are significantly different (p < 0.05) between patient groups by multiple unpaired t-test analysis. In (c), barplot indicates mean ± s.e.m. e-h, Phylogenetic reconstruction of clonal hierarchies in CP TP53-MPN patients from single-cell TARGET-seq genotyping data. In each panel, the phylogenetic tree computed using SCITE is shown on the left, and the number of cells mapping to each clone for each patient, on the right. "pp" indicates posterior probability or each consensus mutation tree, and the probability of each genotype transition is indicated in the square for each mutation. For patient IF9118 (h), baseline (left) and 4 years of follow-up (right) samples are shown separately. i-m, Phylogenetic reconstruction of clonal hierarchies in pre-TP53-AML patients from single-cell TARGET-seq genotyping data (related to Extended Data Fig. 2). In panels (e-m), the size of the circles is proportional to each clone's size, and is colored according to the genotype indicated in the genotype key. Blue boxes indicate TP53heterozygous clones used for the analysis presented in Fig. 4c-e. n, Expression of interferon receptors in TP53-heterozygous cells from CP TP53-MPN (n = 273 cells) and pre-TP53-sAML patients (n = 296 cells). "p-adj" indicates adjusted p-value from combined Fisher's exact test and Wilcoxon tests, calculated using Fisher's method and adjusted using Benjamini & Hochberg procedure; "fc" indicates fold-change (related to Fig. 4d,e). Violin plots indicate log2(counts) distributions and each point represents the expression value of a single-cell. o, GSEA of inflammatory pathways in TP53-mutant heterozygous (n = 284) and homozygous (n = 622) cells from patients GH001 and GR005 at the pre-TP53-sAML stage. NES: Normalized Enrichment Score. FDR: False Discovery Rate.